spare_scores source code

Submodules

spare_scores.classes module

class spare_scores.classes.MetaData(mdl_type: str, mdl_task: str, kernel: str, predictors: list, to_predict: str, key_var: str, params: Any = None, stats: Any = None, cv_folds: Any = None, scaler: Any = None, cv_results: Any = None)

Bases: object

Stores training information on its paired SPARE model

Parameters:
  • mdl_type (str) – Type of model to be used.

  • mdl_task (str) – Task of the model to be used.

  • kernel (str) – Kernel used for SVM.

  • predictors (list) – List of predictors used for modeling.

  • to_predict (str) – Target variable for modeling.

  • key_var (str) – Key variable for modeling.

cv_folds: Any = None
cv_results: Any = None
kernel: str
key_var: str
mdl_task: str
mdl_type: str
params: Any = None
predictors: list
scaler: Any = None
stats: Any = None
to_predict: str
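
The field listing above (six required fields followed by five that default to None) reads like a dataclass. The following is a minimal sketch under that assumption; `MetaDataSketch` and the sample values ("SVM", "Classification", "linear", the column names) are hypothetical stand-ins, not the package's actual definition or its accepted values.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class MetaDataSketch:
    """Hypothetical stand-in mirroring the documented MetaData fields."""

    mdl_type: str
    mdl_task: str
    kernel: str
    predictors: list
    to_predict: str
    key_var: str
    params: Any = None
    stats: Any = None
    cv_folds: Any = None
    scaler: Any = None
    cv_results: Any = None


meta = MetaDataSketch(
    mdl_type="SVM",
    mdl_task="Classification",
    kernel="linear",
    predictors=["ROI_1", "ROI_2"],
    to_predict="DX",
    key_var="ID",
)
# Optional fields stay None until something fills them in.
meta_dict = asdict(meta)
```

Converting to a dictionary with `asdict` is shown only as a convenient way to inspect the fields.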
class spare_scores.classes.SpareModel(model_type: str, predictors: list, target: str, key_var: str, verbose: int = 1, parameters: dict = {}, **kwargs: Any)

Bases: object

A class for managing different SPARE models. Additionally, the class can be initialized with any number of keyword arguments. These will be added as attributes to the class.

Methods:
train_model(df, **kwargs):

Calls the model’s train_model method and returns the result.

apply_model(df):

Calls the model’s apply_model method and returns the result.

set_parameters(**parameters):

Updates the model’s parameters with the provided values. This also changes the model’s attributes, while retaining the original ones.

Parameters:
  • model_type (str) – Type of model to be used.

  • target (str) – Target variable for modeling.

  • key_var (str) – Key variable for modeling.

  • verbose (int) – Verbosity level.

  • parameters (dict) – Additional parameters for the model.

  • predictors (list) – List of predictors used for modeling.

apply_model(df: DataFrame) → Any
get_parameters() → Any
set_parameters(**parameters: Any) → None
train_model(df: DataFrame, **kwargs: Any) → Any
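
The "keyword arguments become attributes" behavior described for this class can be illustrated with a small standalone sketch. `KwargsModelSketch` and the parameter names `k` and `n_repeats` are hypothetical; this shows the general pattern, not the package's implementation.

```python
from typing import Any


class KwargsModelSketch:
    """Hypothetical class showing the keyword-arguments-as-attributes pattern."""

    def __init__(self, model_type: str, predictors: list, **kwargs: Any) -> None:
        self.model_type = model_type
        self.predictors = predictors
        # Every extra keyword argument becomes an instance attribute.
        self.__dict__.update(kwargs)

    def set_parameters(self, **parameters: Any) -> None:
        # Overwrite the named attributes; anything not named is left untouched.
        self.__dict__.update(parameters)


model = KwargsModelSketch("SVM", ["ROI_1", "ROI_2"], k=5, n_repeats=1)
model.set_parameters(k=10)
```

After the call, `model.k` holds the new value while `model.n_repeats` and `model.model_type` keep their original ones.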

spare_scores.cli module

spare_scores.cli.main() → None

spare_scores.data_prep module

spare_scores.data_prep.age_sex_match(df1: DataFrame, df2: DataFrame | None = None, to_match: str = '', p_threshold: float = 0.15, verbose: int = 1, age_out_percentage: float = 20) → DataFrame

Match two groups for age and sex.

Parameters:
  • df1 (pandas.DataFrame) – the passed dataframe

  • df2 (pandas.DataFrame) – optional, if df1 and df2 are two groups to classify.

  • to_match (str) – a binary variable of two groups. Must be one of the columns in df. Ignored if df2 is given. If to_match is ‘Sex’, then only perform age matching.

  • p_threshold (float) – minimum p-value for matching. Default value = 0.15

  • verbose (int) – whether to output messages. (Will be deprecated later.)

  • age_out_percentage (float) – percentage of the larger group to randomly select a participant to take out from during the age matching. For example, if age_out_percentage = 20 and the larger group is significantly older, then exclude one random participant from the fifth quintile based on age. Default value = 20

Returns:

a trimmed pandas dataframe or a tuple of two dataframes with age/sex matched groups.

Return type:

pandas.DataFrame
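
The age_out_percentage description can be made concrete with a plain-Python sketch. It assumes the larger group has already been found to be significantly older, and it works on a list of (ID, age) tuples rather than a dataframe; `drop_one_from_oldest` is a hypothetical helper, not a function of this package.

```python
import math
import random


def drop_one_from_oldest(participants, age_out_percentage=20.0, rng=None):
    """Remove one random participant from the oldest `age_out_percentage` percent.

    `participants` is a list of (participant_id, age) tuples. Hypothetical
    helper: the statistical test that decides when to drop is omitted.
    """
    rng = rng or random.Random()
    # Size of the candidate pool: the oldest X percent, at least one person.
    pool_size = max(1, math.floor(len(participants) * age_out_percentage / 100))
    oldest_first = sorted(participants, key=lambda p: p[1], reverse=True)
    removed = rng.choice(oldest_first[:pool_size])
    remaining = [p for p in participants if p != removed]
    return remaining, removed


group = [(f"sub-{i:02d}", 60 + i) for i in range(10)]  # ages 60..69
remaining, removed = drop_one_from_oldest(group, 20, random.Random(0))
```

With ten participants and age_out_percentage = 20, the candidate pool is the two oldest participants, and one of them is removed at random. A matching loop would presumably repeat this step until the age difference is no longer significant at p_threshold.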

spare_scores.data_prep.check_test(df: DataFrame, meta_data: dict) → Tuple[str, list] | Tuple[str, None]

Checks testing dataframe for errors.

Parameters:
  • df (pandas.DataFrame) – a pandas dataframe containing testing data.

  • meta_data (dict) – a dictionary containing training information on its paired SPARE model.

spare_scores.data_prep.check_train(df: DataFrame, predictors: list, to_predict: str, verbose: int = 1, pos_group: str = '') → str | Tuple[DataFrame, list, str]

Checks training dataframe for errors.

Parameters:
  • df (pandas.DataFrame) – a pandas dataframe containing training data.

  • predictors (list) – a list of predictors for SPARE model training.

  • to_predict (str) – variable to predict.

  • pos_group (str) – group to assign a positive SPARE score (only for classification).

Returns:

a tuple containing 1) the filtered dataframe, 2) filtered predictors, 3) SPARE model type.

Return type:

[pandas.DataFrame, list, str]

spare_scores.data_prep.convert_cat_variables(df: DataFrame, predictors: list, meta_data: Any) → Any
spare_scores.data_prep.logging_basic_config(verbose: int = 1, content_only: bool = False, filename: str = '') → Any

Basic logging configuration for error exceptions

Parameters:
  • verbose (int) – verbosity level. Default value = 1

  • content_only (bool) – If set to True it will output only the needed content. Default value = False

  • filename (str) – input filename. Default value = ‘’

spare_scores.data_prep.smart_unique(df1: DataFrame, df2: DataFrame | None = None, to_predict: str = '') → str | DataFrame | tuple

Select unique data points in a way that optimizes SPARE training. For SPARE regression, preserve data points with extreme values. For SPARE classification, preserve data points that help age match.

Parameters:
  • df1 (pandas.DataFrame) – the passed dataframe

  • df2 (pandas.DataFrame) – optional, if df1 and df2 are two groups to classify.

  • to_predict (str) – variable to predict. Binary for classification and continuous for regression. Must be one of the columns in df. Ignored if df2 is given.

Returns:

a trimmed pandas dataframe or a tuple of two dataframes with only one time point per ID.

Return type:

pandas.DataFrame
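
For the regression case, one possible reading of "preserve data points with extreme values" is to keep, for each ID, the time point whose target value lies farthest from the overall median. The sketch below implements only that reading, on a list of dicts rather than a dataframe; `keep_most_extreme` and the selection rule are assumptions, and the package's actual criterion may differ.

```python
from statistics import median


def keep_most_extreme(rows, id_key, value_key):
    """Keep one row per ID: the one whose value is farthest from the median.

    `rows` is a list of dicts. Hypothetical helper illustrating one possible
    reading of "preserve data points with extreme values".
    """
    center = median(row[value_key] for row in rows)
    best = {}
    for row in rows:
        current = best.get(row[id_key])
        if current is None or abs(row[value_key] - center) > abs(
            current[value_key] - center
        ):
            best[row[id_key]] = row
    return list(best.values())


visits = [
    {"ID": "a", "Age": 61},
    {"ID": "a", "Age": 65},
    {"ID": "b", "Age": 70},
    {"ID": "b", "Age": 72},
    {"ID": "c", "Age": 90},
]
unique_visits = keep_most_extreme(visits, "ID", "Age")
```

Here the median age is 70, so subject "a" keeps the visit at 61 and subject "b" keeps the visit at 72, leaving one time point per ID.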

spare_scores.mlp module

class spare_scores.mlp.MLPModel(predictors: list, to_predict: str, key_var: str, verbose: int = 1, **kwargs: Any)

Bases: object

A class for managing MLP models. Additionally, the class can be initialized with any number of keyword arguments. These will be added as attributes to the class.

Parameters:
  • predictors (list) – List of predictors used for modeling.

  • to_predict (str) – Target variable for modeling.

  • key_var (str) – Key variable for modeling.

fit(df: DataFrame, verbose: int = 1) → dict

Trains the model using the provided dataframe and default parameters.

Parameters:
  • df (pandas.DataFrame) – the provided dataframe.

  • verbose (int) – the verbosity level

Returns:

A dictionary with the results from training.

Return type:

dict

get_stats(y: ndarray, y_hat: ndarray) → None

Return the stats from the training

Parameters:
  • y (np.ndarray) – original labels

  • y_hat (np.ndarray) – predicted values

output_stats() → None
predict(df: DataFrame) → ndarray

Predicts the result of the provided dataframe using the trained model.

Parameters:

df (pandas.DataFrame) – the provided dataframe.

Returns:

The predictions from the trained model regarding the provided dataframe.

Return type:

np.ndarray

set_parameters(**parameters: Any) → None

spare_scores.mlp_torch module

class spare_scores.mlp_torch.MLPDataset(X: list, y: list)

Bases: Dataset

A class for managing datasets that will be used for MLP training

Parameters:
  • X (list) – the first dimension of the provided data (input)

  • y (list) – the second dimension of the provided data (output)
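
A map-style PyTorch Dataset is typically defined by `__len__` and `__getitem__`. The sketch below shows that shape for paired inputs and outputs. It deliberately leaves out the torch base class and any tensor conversion so that it runs standalone; `PairedDatasetSketch` is a hypothetical name, not the class documented here.

```python
class PairedDatasetSketch:
    """Hypothetical map-style dataset pairing inputs X with outputs y."""

    def __init__(self, X: list, y: list) -> None:
        if len(X) != len(y):
            raise ValueError("X and y must have the same number of samples")
        self.X = X
        self.y = y

    def __len__(self) -> int:
        return len(self.y)

    def __getitem__(self, index: int):
        # One sample: its feature vector and the matching target.
        return self.X[index], self.y[index]


dataset = PairedDatasetSketch([[0.1, 0.2], [0.3, 0.4]], [0, 1])
first_features, first_label = dataset[0]
```

A class with this interface that also subclasses `torch.utils.data.Dataset` can be handed to a `DataLoader` for batching.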

class spare_scores.mlp_torch.MLPTorchModel(predictors: list, to_predict: str, key_var: str, verbose: int = 1, **kwargs: Any)

Bases: object

A class for managing MLP models.

Parameters:
  • predictors (list) – List of predictors used for modeling.

  • to_predict (str) – Target variable for modeling.

  • key_var (str) – Key variable for modeling.

Additionally, the class can be initialized with any number of keyword arguments. These will be added as attributes to the class.

find_best_threshold(y_hat: list, y: list) → Any

Returns best threshold value using the roc_curve

Parameters:
  • y_hat (list) – predicted values

  • y (list) – original labels

Returns:

the best threshold value

Return type:

List
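
The documentation states only that the threshold is derived from the ROC curve, not which criterion is applied. A common choice is Youden's J statistic (the threshold maximizing TPR minus FPR), sketched below in plain Python. Both the criterion and the helper name `best_threshold_youden` are assumptions.

```python
def best_threshold_youden(y_hat, y):
    """Return the score threshold that maximizes TPR - FPR (Youden's J).

    Hypothetical helper: a sample is called positive when its score is >= the
    threshold, and every distinct score is tried as a candidate.
    """
    positives = sum(1 for label in y if label == 1)
    negatives = len(y) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("y must contain both classes")
    best_t, best_j = None, float("-inf")
    for t in sorted(set(y_hat), reverse=True):
        tp = sum(1 for s, label in zip(y_hat, y) if s >= t and label == 1)
        fp = sum(1 for s, label in zip(y_hat, y) if s >= t and label == 0)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_t, best_j = t, j
    return best_t


scores = [0.1, 0.2, 0.7, 0.8]
labels = [0, 0, 1, 1]
threshold = best_threshold_youden(scores, labels)
```

In this example the two classes are perfectly separable, and the selected threshold is the lowest score that still excludes every negative sample.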

fit(df: DataFrame, verbose: int = 1, **kwargs: Any) → dict
get_all_stats(y_hat: list, y: list, classification: bool = True) → dict

Returns all stats from training in a dictionary

Parameters:

  • y (list) – ground truth y (1: AD, 0: CN) -> numpy

  • y_hat (list) – predicted y -> numpy; note that y_hat holds predicted values, e.g. [0.2, 0.8, 0.1 …]

Returns:

A dictionary with the Accuracy, F1 score, Sensitivity, Specificity, Balanced Accuracy, Precision, Recall

Return type:

dict
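
The listed classification metrics all follow from the four confusion-matrix counts. The sketch below computes them from predicted scores and 0/1 labels. The 0.5 cutoff, the dictionary keys, and the helper name `classification_stats` are assumptions for illustration; the method itself may binarize scores differently.

```python
def classification_stats(y_hat, y, threshold=0.5):
    """Compute the listed metrics from predicted scores and 0/1 labels.

    Hypothetical helper; the default threshold of 0.5 is an assumption.
    """
    preds = [1 if score >= threshold else 0 for score in y_hat]
    tp = sum(1 for p, t in zip(preds, y) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(preds, y) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(preds, y) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, y) if p == 0 and t == 1)

    def ratio(num, den):
        # Guard against empty classes instead of dividing by zero.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # identical to recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    return {
        "Accuracy": ratio(tp + tn, len(y)),
        "F1 score": ratio(2 * precision * sensitivity, precision + sensitivity),
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "Balanced Accuracy": (sensitivity + specificity) / 2,
        "Precision": precision,
        "Recall": sensitivity,
    }


stats = classification_stats([0.2, 0.8, 0.1, 0.6], [0, 1, 0, 0])
```

Sensitivity and Recall are the same quantity under two names, which is why they share one computation.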

object(trial: Any) → float
output_stats() → None
predict(df: DataFrame) → ndarray
set_parameters(**parameters: Any) → None
class spare_scores.mlp_torch.SimpleMLP(hidden_size: int = 256, classification: bool = True, dropout: float = 0.2, use_bn: bool = False, bn: str = 'bn')

Bases: Module

A class to create a simple MLP model.

Parameters:
  • num_features (int) – total number of features. Default value = 147.

  • hidden_size (int) – number of features that will be passed to normalization layers of the model. Default value = 256.

  • classification (bool) – If set to True, then the model will perform classification, otherwise, regression. Default value = True.

  • dropout (float) – the dropout value.

  • use_bn (bool) – if set to True, then the model will use the normalization layers, otherwise, the model will use the linear layers.

  • bn (str) – if set to ‘bn’ the model will use BatchNorm1d() for the hidden layers, otherwise, it will use InstanceNorm1d().

forward(x: Tensor) → Tensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

spare_scores.spare_scores module

spare_scores.svm module

class spare_scores.svm.SVMModel(predictors: list, to_predict: str, key_var: str, verbose: int = 1, **kwargs: Any)

Bases: object

A class for managing SVM models. Additionally, the class can be initialized with any number of keyword arguments. These will be added as attributes to the class.

Methods:
train_model(df, **kwargs):

Trains the model using the provided dataframe.

apply_model(df):

Applies the trained model on the provided dataframe and returns the predictions.

set_parameters(**parameters):

Updates the model’s parameters with the provided values. This also changes the model’s attributes, while retaining the original ones.

Parameters:
  • predictors (list) – List of predictors used for modeling.

  • to_predict (str) – Target variable for modeling.

  • key_var (str) – Key variable for modeling.

correct_reg_bias(fold: Any, y_test: list) → Any
fit(df: DataFrame, verbose: int = 1, **kwargs: Any) → dict
get_stats(y_test: ndarray, y_score: ndarray) → None
output_stats() → None
predict(df: DataFrame, verbose: int = 1) → ndarray
prepare_sample(df: DataFrame, fold: Any, scaler: Any, classify: Any = None) → Any
run_CV(df: DataFrame) → None
set_parameters(**parameters: Any) → None
train_initialize(df: DataFrame, to_predict: str) → None

spare_scores.util module

spare_scores.util.add_file_extension(filename: str, extension: str) → str

Adds a file extension to the given filename

Parameters:
  • filename (str) – The path to the file

  • extension (str) – The wanted extension (e.g. .txt, .csv, etc.)

Returns:

The filename

Return type:

str
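
The description does not say what happens when the filename already carries the extension. The sketch below assumes the extension is appended only when it is missing; both that behavior and the helper name `add_extension_if_missing` are assumptions, not the package's implementation.

```python
def add_extension_if_missing(filename: str, extension: str) -> str:
    """Append `extension` unless `filename` already ends with it (assumed behavior)."""
    # Accept both ".csv" and "csv" as the extension argument.
    if not extension.startswith("."):
        extension = "." + extension
    if filename.endswith(extension):
        return filename
    return filename + extension


named = add_extension_if_missing("results", ".csv")
unchanged = add_extension_if_missing("results.csv", ".csv")
```

Under this assumption the call is idempotent: applying it twice gives the same filename as applying it once.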

spare_scores.util.check_file_exists(filename: str, logger: Any) → Any

Checks if file exists

Parameters:
  • filename (str) – The file that will be searched

  • logger (logging.basicConfig) – Output logger

Returns:

True if file exists, False otherwise

Return type:

bool

spare_scores.util.convert_to_number_if_possible(string: str) → float | str

Converts the input string to a float if possible

Parameters:

string (str) – the input string

Returns:

float if the string is numeric, the same string if it’s not

Return type:

float or str
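
The documented contract (a float when the string is numeric, the unchanged string otherwise) maps naturally onto a try/except around `float()`. This is a minimal sketch of that pattern; `to_number_if_possible` is a hypothetical name, and the package may define "numeric" more narrowly than `float()` does.

```python
def to_number_if_possible(string: str):
    """Return float(string) when it parses, otherwise the original string."""
    try:
        return float(string)
    except ValueError:
        # Not a number: hand the input back untouched.
        return string


as_number = to_number_if_possible("3.5")
as_text = to_number_if_possible("linear")
```

Note that `float()` also accepts strings such as "1e-3", "inf", and "nan", so a stricter notion of "numeric" would need an extra check.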

spare_scores.util.expspace(span: list) → ndarray
spare_scores.util.is_unique_identifier(df: DataFrame, column_names: list) → bool

Checks if the passed column names form a unique identifier for the passed dataframe

Parameters:
  • df (pandas.DataFrame) – The passed dataframe

  • column_names (list) – The passed column names

Returns:

True if the passed column names uniquely identify each row of the dataframe, False otherwise

Return type:

bool
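
A set of columns acts as a unique identifier when no two rows share the same combination of values in those columns. The sketch below checks this on a list of dicts so that it runs without pandas; `columns_identify_rows` is a hypothetical helper, and the package's own check may be implemented differently.

```python
def columns_identify_rows(rows, column_names):
    """True when no two rows share the same values in `column_names`."""
    seen = set()
    for row in rows:
        key = tuple(row[name] for name in column_names)
        if key in seen:
            return False
        seen.add(key)
    return True


scans = [
    {"ID": "a", "Visit": 1},
    {"ID": "a", "Visit": 2},
    {"ID": "b", "Visit": 1},
]
id_alone = columns_identify_rows(scans, ["ID"])
id_and_visit = columns_identify_rows(scans, ["ID", "Visit"])
```

Here ID alone is not unique because subject "a" has two visits, while the pair (ID, Visit) is. On a pandas dataframe, the same test can be written as `not df.duplicated(subset=column_names).any()`.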

spare_scores.util.load_df(df: DataFrame | str) → DataFrame

Fast loader for dataframes

Parameters:

df (Union[pd.DataFrame, str]) – Either pd.DataFrame or path to the .csv file

Returns:

The dataframe

Return type:

pd.DataFrame

spare_scores.util.load_examples(file_name: str = '') → Any

Loads example data and models in the package.

Parameters:

file_name (str) – either the name of the example data saved as .csv or the name of the SPARE model saved as .pkl.gz.

Returns:

the resulting dataframe

Return type:

None or pandas.DataFrame

spare_scores.util.load_model(mdl_path: str) → Any

Loads the model from the passed path

Parameters:

mdl_path (str) – the path to the weights of the model

spare_scores.util.save_file(result: Any, output: str, action: str, logger: Any) → None

Saves the results in a file depending on the action

Parameters:
  • result (Either .csv or pandas.DataFrame depending on the action) – The results that will be dumped into the file

  • output (str) – The output filename

  • action (str) – Either ‘train’ or ‘test’ depending on the action

  • logger (logging.basicConfig) – Output logger

Module contents