Data Pre-Processing

Users can bring their own dataset to train SPARE models. Follow these steps to prepare your dataset for training:

1. Load your dataset as a pandas DataFrame

import pandas as pd

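# low_memory=False parses the whole file before inferring column types, avoiding mixed-type columns
df = pd.read_csv('your_dataset.csv', low_memory=False)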
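Before going further, it can help to confirm that the file parsed as expected. The sketch below reads a small inline CSV instead of a file on disk; the column names (`PTID`, `Age`, `Sex`, `Diagnosis`, `MUSE_Volume_*`) and values are hypothetical placeholders, not a required schema.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for 'your_dataset.csv'
csv_text = """PTID,Age,Sex,Diagnosis,MUSE_Volume_4,MUSE_Volume_11
sub-001,71.2,F,1,1520.3,2304.8
sub-002,68.9,M,0,1610.7,2410.2
sub-003,75.4,F,1,1498.1,2250.6
"""

# low_memory=False parses the whole input before inferring dtypes,
# so a column is not split into mixed types across chunks
df = pd.read_csv(io.StringIO(csv_text), low_memory=False)

print(df.shape)   # (3, 6)
print(df.dtypes)  # check that volume columns came in as numbers
```

If a volume column shows up as `object` instead of `float64`, the file likely contains non-numeric entries in that column that should be cleaned before training.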

2. Define a column to predict and columns to use as predictors

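# Set to_predict to the name of your target column (keep only one of these lines):
to_predict = 'binary_variable'        # for a SPARE classification model
# to_predict = 'continuous_variable'  # for a SPARE regression model

# str.startswith returns a boolean mask, so use it to select the matching column names
predictors = df.columns[df.columns.str.startswith('MUSE_Volume_')].tolist()
# Categorical variables with more than 2 categories are currently not supported.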
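`df.columns.str.startswith(...)` on its own returns a boolean mask, one entry per column, rather than column names. The sketch below, using hypothetical toy columns, shows how that mask selects the predictor names, and adds a simple check (an illustrative safeguard, not part of the SPARE API) that a classification target has exactly two classes.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only
df = pd.DataFrame({
    "Diagnosis": [0, 1, 1, 0],
    "Age": [70.1, 72.4, 68.0, 75.3],
    "MUSE_Volume_4": [1520.3, 1498.1, 1610.7, 1555.0],
    "MUSE_Volume_11": [2304.8, 2250.6, 2410.2, 2333.3],
})

# Boolean mask: True where the column name starts with the prefix
mask = df.columns.str.startswith("MUSE_Volume_")

# Index df.columns with the mask to get the names themselves
predictors = df.columns[mask].tolist()
print(predictors)  # ['MUSE_Volume_4', 'MUSE_Volume_11']

# A binary classification target should have exactly two distinct values
to_predict = "Diagnosis"
n_classes = df[to_predict].nunique(dropna=True)
if n_classes != 2:
    raise ValueError(f"{to_predict!r} has {n_classes} categories; expected 2")
```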

3. (Optional) Select unique timepoints for a longitudinal dataset

import spare_scores.data_prep as data_prep

df = data_prep.smart_unique(df, to_predict=to_predict)
# selects unique timepoints in a way that optimizes SPARE training.
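The exact rule `smart_unique` uses to pick timepoints is not described here, so it is not reproduced. For comparison only, a much simpler strategy is to keep one visit per participant, for example the earliest, so no participant contributes several correlated rows. The sketch below does that with plain pandas; the `PTID` and `Date` column names are assumptions, and this naive baseline is not what `smart_unique` does.

```python
import pandas as pd

# Hypothetical longitudinal data: several visits per participant
df = pd.DataFrame({
    "PTID": ["s1", "s1", "s2", "s3", "s3", "s3"],
    "Date": pd.to_datetime(["2015-01-01", "2017-06-01", "2016-03-15",
                            "2014-09-09", "2016-09-09", "2018-09-09"]),
    "MUSE_Volume_4": [1520.3, 1501.2, 1610.7, 1498.1, 1480.0, 1466.4],
})

# Naive baseline: sort visits chronologically within each participant,
# then keep only the first (earliest) row per PTID
baseline = (
    df.sort_values(["PTID", "Date"])
      .drop_duplicates(subset="PTID", keep="first")
      .reset_index(drop=True)
)
print(baseline["PTID"].tolist())  # ['s1', 's2', 's3']
```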

4. (Optional) Match the two classification groups on age and sex

df = data_prep.age_sex_match(df, to_match=to_predict, p_threshold=0.15)
# Matches the two groups on age and sex in a way that optimizes SPARE training.
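A quick way to judge whether matching is needed, or whether it worked, is to compare age and sex across the two groups before and after the call. The sketch below only reports descriptive summaries with pandas; it does not reproduce the statistical test that `age_sex_match` applies with `p_threshold`, and the `Diagnosis`, `Age`, and `Sex` column names are hypothetical.

```python
import pandas as pd

# Hypothetical two-group data (Diagnosis is the binary target)
df = pd.DataFrame({
    "Diagnosis": [0, 0, 0, 1, 1, 1],
    "Age": [70.0, 72.0, 74.0, 71.0, 73.0, 75.0],
    "Sex": ["F", "M", "F", "F", "M", "F"],
})

# Mean age per group
age_by_group = df.groupby("Diagnosis")["Age"].mean()

# Proportion of female participants per group (mean of a boolean Series)
female_by_group = df["Sex"].eq("F").groupby(df["Diagnosis"]).mean()

summary = pd.DataFrame({"mean_age": age_by_group, "prop_female": female_by_group})
print(summary)
```

Large gaps in `mean_age` or `prop_female` between groups suggest that a classifier could partly learn age or sex instead of the target, which is the situation matching is meant to reduce.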