
MyMigraine, 2024

Independent research with Duke University (advisor: Prof. Zbigniew Kabala) — Predictive modeling of migraine occurrence & duration using Wolfram Mathematica

Skills: Wolfram Mathematica, data cleaning, feature engineering, classification & regression, cross-validation, data visualization.

Context

Overview

Project: Independent research project on migraine prediction using machine learning.

Goal: Build and evaluate predictive models to forecast migraine occurrence and estimate attack duration.

Timeline: Independent study (self-directed)

Advisor

Advisor: Prof. Zbigniew Kabala (Duke University)

Collaboration: Independent researcher with periodic advisor feedback and review.

My Role

Researcher: Owned the end-to-end workflow (data cleaning, feature engineering, modeling, and evaluation) in Wolfram Mathematica.

Deliverables: A 10-page research article entitled "Predicting the Occurrence and the Duration of Migraine Attacks Using Machine Learning", plus reproducible Mathematica notebooks.

Problem

Migraine attacks are highly disruptive, but patients and clinicians currently have limited tools to anticipate onset or estimate duration. This project was personal: watching my mom suffer frequent attacks motivated me to explore practical forecasting approaches that could meaningfully improve daily management. Better short-term forecasts can improve symptom management, medication timing, and clinical decision-making.

This research asked: can machine learning models trained on combined patient-reported and sensor-derived data provide reliable signals to predict whether a migraine will occur in the next 24 hours, and estimate how long an attack will last?

Research

I performed a focused literature review and assembled a multi-source dataset composed of patient-reported diaries and auxiliary sensor-derived signals (where available). The research focused on three tasks: exploratory data analysis, feature engineering for temporal patterns, and building predictive models for both occurrence (classification) and duration (regression).

Data & preprocessing

  1. Aggregated and harmonized heterogeneous data sources (self-reports, timestamped event logs, and basic sensor summaries).
  2. Data cleaning: missing-value handling, outlier detection, timestamp alignment, and normalization using Mathematica's data pipelines.
  3. Feature engineering: derived temporal features (time-since-last-attack, circadian phase proxies), rolling-window statistics, and simple contextual covariates.
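As an illustration of the temporal features named in step 3, here is a minimal sketch. It is written in Python for readability and is not the project's Mathematica code. The diary format (a set of attack dates), the function name `temporal_features`, and the feature names are assumptions made for this example.

```python
from datetime import date, timedelta

def temporal_features(attack_days, start, end, window=7):
    """Build one feature row per calendar day in [start, end].

    attack_days: set of dates with a recorded attack (hypothetical diary format).
    Each row uses only information available up to and including that day.
    """
    rows = []
    last_attack = None
    day = start
    while day <= end:
        if day in attack_days:
            last_attack = day
        # Days elapsed since the most recent attack (None before the first one).
        since_last = (day - last_attack).days if last_attack is not None else None
        # Number of attack days in the trailing window that ends today.
        rolling_count = sum(
            (day - timedelta(days=i)) in attack_days for i in range(window)
        )
        rows.append({
            "day": day,
            "days_since_last_attack": since_last,
            "rolling_attack_count": rolling_count,
            "weekday": day.weekday(),  # simple calendar covariate, Monday = 0
        })
        day += timedelta(days=1)
    return rows

# Example with made-up dates.
attacks = {date(2024, 1, 2), date(2024, 1, 5)}
for row in temporal_features(attacks, date(2024, 1, 1), date(2024, 1, 8), window=3):
    print(row)
```

Because each row looks only backward in time, the features can be paired with a label for the following 24 hours without leaking future information.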

Modeling approach

  1. Framed occurrence as a binary classification problem (will an attack occur in the next 24 hours?).
  2. Framed duration as a regression problem for predicted attack length conditioned on occurrence.
  3. Implemented baseline and tree-based models (logistic regression, decision-tree ensembles) using Mathematica's built-in functions, with k-fold cross-validation for robust evaluation.
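The k-fold evaluation in step 3 can be sketched as follows. This is an illustrative Python version, not the Mathematica implementation used in the notebooks. The helper names (`kfold_indices`, `cross_validate`, `fit_majority`, `predict_majority`) and the toy data are assumptions for the example, and accuracy stands in for whichever metric the article reports.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Return per-fold accuracy for a model given as fit/predict callables."""
    scores = []
    for held_out in kfold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores

def fit_majority(X_train, y_train):
    """Naive baseline: remember the majority class (ties go to class 1)."""
    return int(2 * sum(y_train) >= len(y_train))

def predict_majority(model, x):
    """Predict the remembered majority class regardless of the input."""
    return model

# Toy data: 1 = attack within the next 24 hours, 0 = no attack.
y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
X = [[i] for i in range(len(y))]
scores = cross_validate(X, y, fit_majority, predict_majority, k=5)
print(sum(scores) / len(scores))
```

Any candidate model with the same fit/predict shape can be passed to `cross_validate` and compared against this baseline on identical folds. Shuffled folds are the simplest variant; for diary data ordered in time, a split that keeps training days before test days may be worth considering.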

All analyses and figures were produced reproducibly in Wolfram Mathematica notebooks; notebooks and the 10-page article document the pipeline, assumptions, and limitations.

Results

The analyses produced a detectable predictive signal for short-term migraine occurrence and identified several temporally informative features. Cross-validated evaluation indicated that the models performed consistently better than simple baselines on the assembled dataset, while interpretability checks (feature importances and sensitivity analyses) surfaced clinically plausible covariates. All results were produced from reproducible Mathematica notebooks to ensure transparency and repeatability.

Key findings

  1. Temporal features (lagged counts and rolling-window statistics) and simple circadian proxies were among the most informative predictors.
  2. Cross-validation and bootstrap resampling suggested the models outperform naive baselines on held-out data, though additional data would strengthen confidence.
  3. Interpretability analyses helped prioritize features that are actionable for patient self-management and clinician review.
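For readers unfamiliar with the bootstrap resampling mentioned in finding 2, a minimal percentile-bootstrap sketch follows. It is in Python for illustration only and does not reproduce the notebooks. The function name `bootstrap_ci` and the example values are hypothetical, not results from the study.

```python
import random

def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample n items with replacement and record the resampled mean.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return means[lo_idx], means[hi_idx]

# Made-up per-day outcomes: 1 where the model was right and the baseline
# was wrong, -1 for the reverse, 0 where both agreed.
paired_outcomes = [1, 0, 0, 1, -1, 0, 1, 0, 0, 1, 0, 0]
low, high = bootstrap_ci(paired_outcomes)
print(low, high)
```

An interval that sits clearly above zero would support a claim that the model beats the baseline; an interval that straddles zero is a sign that more data is needed.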

Artifacts

  1. 10-page research article: "Predicting the Occurrence and the Duration of Migraine Attacks Using Machine Learning" (analysis, methods, figures).
  2. Reproducible Mathematica notebooks containing data-cleaning scripts, model training code, and figure generation.

Product & Impact

To make the research useful, I implemented a lightweight mobile app that runs the model on user inputs (hormonal, environmental, dietary, and self-reported triggers) and returns a short-term probability of an upcoming attack plus a duration estimate. The demo interface was used as a DECA competition project to communicate the idea and show a working prototype.

During the DECA competition I presented a live demo of the app and pitched the product narrative. The project received a DECA International — Excellence Award (top 15%) for the prototype and presentation. The app also records user entries to further personalize predictions and to collect additional labeled data for model refinement.

Because the prototype was intended for demonstration and early user feedback, emphasis was placed on clear inputs, transparent output confidence, and privacy-conscious storage of user data (local-first with opt-in upload for model improvement).

Reflection

This independent study reinforced the importance of clean, well-documented data pipelines and careful validation when working with health signals. Wolfram Mathematica proved effective for rapid exploratory analysis and reproducible figure generation, while cross-validation and resampling were critical to avoid overfitting on limited datasets. The work felt personal—designing models that could help someone like my mom shaped my emphasis on clarity and clinical usefulness.

Key lessons: prioritize interpretability for clinical relevance, treat missing data thoughtfully, and collaborate with domain experts early to surface realistic covariates. Next steps would include expanding the dataset, incorporating richer sensor streams, and exploring time-series-specific models (e.g., sequence models) for improved short-term forecasting.