spare_scores package

Submodules

spare_scores.classes module

class spare_scores.classes.MetaData(mdl_type: str, mdl_task: str, kernel: str, predictors: list, to_predict: str, key_var: str, params: Any = None, stats: Any = None, cv_folds: Any = None, scaler: Any = None, cv_results: Any = None)[source]

Bases: object

Stores training information on its paired SPARE model

Parameters:

mdl_type (str) – Type of model to be used.
mdl_task (str) – Task of the model to be used.
kernel (str) – Kernel used for SVM.
predictors (list) – List of predictors used for modeling.
to_predict (str) – Target variable for modeling.
key_var (str) – Key variable for modeling.

cv_folds: Any = None

cv_results: Any = None

kernel: str

key_var: str

mdl_task: str

mdl_type: str

params: Any = None

predictors: list

scaler: Any = None

stats: Any = None

to_predict: str
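
The field list above can be read as a plain record type. The following is a minimal sketch, not the package's own class: MetaDataSketch is a hypothetical stand-in that only mirrors the documented field names and defaults, and the example values ("Classification", "ROI_1", "DX", "ID") are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class MetaDataSketch:
    """Hypothetical stand-in mirroring the documented MetaData fields."""

    mdl_type: str
    mdl_task: str
    kernel: str
    predictors: list
    to_predict: str
    key_var: str
    # The remaining fields are optional and default to None, as documented.
    params: Any = None
    stats: Any = None
    cv_folds: Any = None
    scaler: Any = None
    cv_results: Any = None


# Illustrative values only; real values depend on the trained model.
meta = MetaDataSketch(
    mdl_type="SVM",
    mdl_task="Classification",
    kernel="linear",
    predictors=["ROI_1", "ROI_2"],
    to_predict="DX",
    key_var="ID",
)
```

Only the six required fields need to be supplied; the optional ones stay None until training fills them in.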

class spare_scores.classes.SpareModel(model_type: str, predictors: list, target: str, key_var: str, verbose: int = 1, parameters: dict = {}, **kwargs: Any)[source]

Bases: object

A class for managing different SPARE models. Additionally, the class can be initialized with any number of keyword arguments. These will be added as attributes to the class.

Methods:

train_model(df, **kwargs) – Calls the model’s train_model method and returns the result.
apply_model(df) – Calls the model’s apply_model method and returns the result.
set_parameters(**parameters) – Updates the model’s parameters with the provided values. This also changes the model’s attributes, while retaining the original ones.

Parameters:

model_type (str) – Type of model to be used.
predictors (list) – List of predictors used for modeling.
target (str) – Target variable for modeling.
key_var (str) – Key variable for modeling.
verbose (int) – Verbosity level.
parameters (dict) – Additional parameters for the model.

apply_model(df: DataFrame) → Any[source]

get_parameters() → Any[source]

set_parameters(**parameters: Any) → None[source]

train_model(df: DataFrame, **kwargs: Any) → Any[source]
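
The keyword-argument behavior described for this class (extra keyword arguments become attributes, and set_parameters updates some attributes while leaving the rest in place) can be sketched as follows. KwargsModelSketch is a hypothetical class written for illustration; how the package implements this internally is an assumption.

```python
from typing import Any


class KwargsModelSketch:
    """Hypothetical class showing the keyword-arguments-as-attributes pattern."""

    def __init__(self, predictors: list, to_predict: str, key_var: str, **kwargs: Any) -> None:
        self.predictors = predictors
        self.to_predict = to_predict
        self.key_var = key_var
        # Every extra keyword argument becomes an instance attribute.
        self.__dict__.update(kwargs)

    def set_parameters(self, **parameters: Any) -> None:
        # Update or add the named attributes; all others keep their values.
        self.__dict__.update(parameters)


# "k" and "task" are made-up keyword arguments used only for this example.
model = KwargsModelSketch(["ROI_1"], "Age", "ID", k=5, task="Regression")
model.set_parameters(k=10)
```

After the call, model.k holds the new value while model.task and the positional attributes are unchanged.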

spare_scores.cli module

spare_scores.cli.main() → None[source]

spare_scores.data_prep module

spare_scores.data_prep.age_sex_match(df1: DataFrame, df2: DataFrame | None = None, to_match: str = '', p_threshold: float = 0.15, verbose: int = 1, age_out_percentage: float = 20) → DataFrame[source]

Match two groups for age and sex.

Parameters:

df1 (pandas.DataFrame) – the passed dataframe
df2 (pandas.DataFrame) – optional, if df1 and df2 are two groups to classify.
to_match (str) – a binary variable of two groups. Must be one of the columns in df. Ignored if df2 is given. If to_match is ‘Sex’, then only perform age matching.
p_threshold (float) – minimum p-value for matching. Default value = 0.15
verbose (int) – whether to output messages. (Will be deprecated later.)
age_out_percentage (float) – percentage of the larger group to randomly select a participant to take out from during the age matching. For example, if age_out_percentage = 20 and the larger group is significantly older, then exclude one random participant from the fifth quintile based on age. Default value = 20

Returns:

a trimmed pandas dataframe or a tuple of two dataframes with age/sex matched groups.

Return type:

pandas.DataFrame
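
The age_out_percentage description can be illustrated with a small standalone sketch. It shows a single exclusion step only: one random participant is removed from the oldest (or youngest) tail of the larger group. The function name drop_one_from_age_tail, the older and seed parameters, and the use of plain lists instead of a dataframe are assumptions for illustration; the statistical test that decides whether another step is needed is not shown.

```python
import math
import random


def drop_one_from_age_tail(ages: list, age_out_percentage: float = 20,
                           older: bool = True, seed: int = 0) -> list:
    """Remove one random participant from the oldest (or youngest) tail."""
    # Size of the tail to sample from, at least one participant.
    tail_size = max(1, math.floor(len(ages) * age_out_percentage / 100))
    # Positions sorted by age; the tail is the last or first tail_size entries.
    order = sorted(range(len(ages)), key=lambda i: ages[i])
    tail = order[-tail_size:] if older else order[:tail_size]
    drop = random.Random(seed).choice(tail)
    return [age for i, age in enumerate(ages) if i != drop]


ages = [61, 64, 66, 70, 72, 75, 77, 80, 84, 90]
trimmed = drop_one_from_age_tail(ages, age_out_percentage=20)
```

With ten participants and age_out_percentage = 20, the candidate pool is the two oldest participants, and exactly one of them is removed.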

spare_scores.data_prep.check_test(df: DataFrame, meta_data: dict) → Tuple[str, list] | Tuple[str, None][source]

Checks testing dataframe for errors.

Parameters:

df (pandas.DataFrame) – a pandas dataframe containing testing data.
meta_data (dict) – a dictionary containing training information on its paired SPARE model.

spare_scores.data_prep.check_train(df: DataFrame, predictors: list, to_predict: str, verbose: int = 1, pos_group: str = '') → str | Tuple[DataFrame, list, str][source]

Checks training dataframe for errors.

Parameters:

df (pandas.DataFrame) – a pandas dataframe containing training data.
predictors (list) – a list of predictors for SPARE model training.
to_predict (str) – variable to predict.
pos_group (str) – group to assign a positive SPARE score (only for classification).

Returns:

a tuple containing 1) the filtered dataframe, 2) filtered predictors, 3) SPARE model type.

Return type:

[pandas.DataFrame, list, str]

spare_scores.data_prep.convert_cat_variables(df: DataFrame, predictors: list, meta_data: Any) → Any[source]

spare_scores.data_prep.logging_basic_config(verbose: int = 1, content_only: bool = False, filename: str = '') → Any[source]

Basic logging configuration for error exceptions

Parameters:

verbose (int) – input verbose. Default value = 1
content_only (bool) – If set to True it will output only the needed content. Default value = False
filename (str) – input filename. Default value = ‘’

spare_scores.data_prep.smart_unique(df1: DataFrame, df2: DataFrame | None = None, to_predict: str = '') → str | DataFrame | tuple[source]

Select unique data points in a way that optimizes SPARE training. For SPARE regression, preserve data points with extreme values. For SPARE classification, preserve data points that help age match.

Parameters:

df1 (pandas.DataFrame) – the passed dataframe
df2 (pandas.DataFrame) – optional, if df1 and df2 are two groups to classify.
to_predict (str) – variable to predict. Binary for classification and continuous for regression. Must be one of the columns in df. Ignored if df2 is given.

Returns:

a trimmed pandas dataframe or a tuple of two dataframes with only one time point per ID.

Return type:

pandas.DataFrame
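
For the regression case, "preserve data points with extreme values" could look like the sketch below, which keeps one row per ID. This is an illustration under a stated assumption: "extreme" is taken to mean farthest from the overall mean of the target, which may differ from the package's actual rule. The function name and the list-of-dicts input are also hypothetical.

```python
from statistics import mean


def keep_most_extreme_per_id(rows: list, id_key: str, value_key: str) -> list:
    """Keep one row per ID: the row whose value lies farthest from the overall mean."""
    center = mean(row[value_key] for row in rows)
    best = {}
    for row in rows:
        current = best.get(row[id_key])
        # Replace the stored row only if this one is strictly more extreme.
        if current is None or abs(row[value_key] - center) > abs(current[value_key] - center):
            best[row[id_key]] = row
    return list(best.values())


rows = [
    {"ID": "a", "Age": 50}, {"ID": "a", "Age": 58},
    {"ID": "b", "Age": 71}, {"ID": "b", "Age": 79},
    {"ID": "c", "Age": 65},
]
unique_rows = keep_most_extreme_per_id(rows, "ID", "Age")
```

Here the overall mean is 64.6, so the younger visit is kept for "a" and the older visit for "b".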

spare_scores.mlp module

class spare_scores.mlp.MLPModel(predictors: list, to_predict: str, key_var: str, verbose: int = 1, **kwargs: Any)[source]

Bases: object

A class for managing MLP models. Additionally, the class can be initialized with any number of keyword arguments. These will be added as attributes to the class.

Parameters:

predictors (list) – List of predictors used for modeling.
to_predict (str) – Target variable for modeling.
key_var (str) – Key variable for modeling.

fit(df: DataFrame, verbose: int = 1) → dict[source]

Trains the model using the provided dataframe and default parameters.

Parameters:

df (pandas.DataFrame) – the provided dataframe.
verbose (int) – the verbosity level

Returns:

A dictionary with the results from training.

Return type:

dict

get_stats(y: ndarray, y_hat: ndarray) → None[source]

Return the stats from the training

Parameters:

y (np.ndarray) – original labels
y_hat (np.ndarray) – predicted values

output_stats() → None[source]

predict(df: DataFrame) → ndarray[source]

Predicts the result of the provided dataframe using the trained model.

Parameters:

df (pandas.DataFrame) – the provided dataframe.

Returns:

The predictions from the trained model regarding the provided dataframe.

Return type:

np.ndarray

set_parameters(**parameters: Any) → None[source]

spare_scores.mlp_torch module

class spare_scores.mlp_torch.MLPDataset(X: list, y: list)[source]

Bases: Dataset

A class for managing datasets that will be used for MLP training

Parameters:

X (list) – the first dimension of the provided data (input)
y (list) – the second dimension of the provided data (output)

class spare_scores.mlp_torch.MLPTorchModel(predictors: list, to_predict: str, key_var: str, verbose: int = 1, **kwargs: Any)[source]

Bases: object

A class for managing MLP models.

Parameters:

predictors (list) – List of predictors used for modeling.
to_predict (str) – Target variable for modeling.
key_var (str) – Key variable for modeling.

Additionally, the class can be initialized with any number of keyword arguments. These will be added as attributes to the class.

find_best_threshold(y_hat: list, y: list) → Any[source]

Returns best threshold value using the roc_curve

Parameters:

y_hat (list) – predicted values
y (list) – original labels

Returns:

the best threshold value

Return type:

List
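
One common way to pick a threshold from an ROC curve is to maximize the difference between the true positive rate and the false positive rate (Youden's J statistic). The sketch below implements that criterion in plain Python. Whether find_best_threshold uses this exact criterion is an assumption, and best_threshold_youden is a hypothetical name.

```python
def best_threshold_youden(y_hat: list, y: list) -> float:
    """Return the score threshold that maximizes TPR - FPR (Youden's J).

    Assumes y contains at least one positive (1) and one negative (0) label.
    """
    positives = sum(1 for label in y if label == 1)
    negatives = len(y) - positives
    best_j, best_t = -1.0, None
    # Each distinct predicted score is a candidate threshold
    # (a sample is predicted positive when its score >= threshold).
    for t in sorted(set(y_hat), reverse=True):
        tp = sum(1 for s, label in zip(y_hat, y) if s >= t and label == 1)
        fp = sum(1 for s, label in zip(y_hat, y) if s >= t and label == 0)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_j, best_t = j, t
    return best_t


threshold = best_threshold_youden([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

Ties are resolved in favor of the highest threshold, because candidates are visited in descending order and only a strictly larger J replaces the current best.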

fit(df: DataFrame, verbose: int = 1, **kwargs: Any) → dict[source]

get_all_stats(y_hat: list, y: list, classification: bool = True) → dict[source]

Returns all stats from training in a dictionary

Parameters:

y (list) – ground truth y (1: AD, 0: CN), as a numpy array.
y_hat (list) – predicted y, as a numpy array. Note that y_hat holds predicted values, e.g. [0.2, 0.8, 0.1 …].

Returns:

A dictionary with the Accuracy, F1 score, Sensitivity, Specificity, Balanced Accuracy, Precision, Recall

Return type:

dict
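
The metrics listed above follow from the confusion matrix once the predicted values are turned into 0/1 labels. The sketch below computes them in plain Python. The 0.5 threshold, the function name classification_stats, and the exact dictionary keys are assumptions for illustration; the package may use a different threshold (for example the one from find_best_threshold) and different key names.

```python
def classification_stats(y_hat: list, y: list, threshold: float = 0.5) -> dict:
    """Compute common classification metrics from predicted scores and 0/1 labels."""
    pred = [1 if s >= threshold else 0 for s in y_hat]
    tp = sum(1 for p, t in zip(pred, y) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, y) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, y) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, y) if p == 0 and t == 1)
    # Guard each ratio against an empty denominator.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # identical to recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "Accuracy": (tp + tn) / len(y),
        "F1 score": f1,
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "Balanced Accuracy": (sensitivity + specificity) / 2,
        "Precision": precision,
        "Recall": sensitivity,
    }


stats = classification_stats([0.2, 0.8, 0.1, 0.6], [0, 1, 0, 0])
```

Sensitivity and Recall are the same quantity, which is why both keys carry the same value.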

object(trial: Any) → float[source]

output_stats() → None[source]

predict(df: DataFrame) → ndarray[source]

set_parameters(**parameters: Any) → None[source]

class spare_scores.mlp_torch.SimpleMLP(hidden_size: int = 256, classification: bool = True, dropout: float = 0.2, use_bn: bool = False, bn: str = 'bn')[source]

Bases: Module

A class to create a simple MLP model.

Parameters:

num_features (int) – total number of features. Default value = 147.
hidden_size (int) – number of features that will be passed to normalization layers of the model. Default value = 256.
classification (bool) – If set to True, then the model will perform classification, otherwise, regression. Default value = True.
dropout (float) – the dropout value.
use_bn (bool) – if set to True, then the model will use the normalization layers, otherwise, the model will use the linear layers.
bn (str) – if set to ‘bn’ the model will use BatchNorm1d() for the hidden layers, otherwise, it will use InstanceNorm1d().

forward(x: Tensor) → Tensor[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

spare_scores.spare module

spare_scores.spare.spare_test(df: DataFrame | str, mdl_path: str | Tuple[dict, dict], key_var: str = '', output: str = '', spare_var: str = 'SPARE_score', verbose: int = 1, logs: str = '') → DataFrame[source]

Applies a trained SPARE model on a test dataset

Parameters:

df (pandas.DataFrame) – either a pandas dataframe or a path to a saved csv containing the test sample.
mdl_path (str) – either a path to a saved SPARE model (‘.pkl.gz’ file extension expected) or a tuple of SPARE model and meta_data.
key_var (str) – The name of the key variable to be used. If not given, and the saved model does not contain it, the first column of the dataset is considered the primary key of the dataset.
output (str) – path to save the calculated scores. ‘.csv’ file extension optional. If None is given, no data will be saved.
spare_var (str) – The name of the variable to be predicted. If not given, the name ‘SPARE_score’ will be used.
verbose (int) – Verbosity. Int, higher is more verbose. [0,1,2]
logs (str) – Where to save log file. If not given, logs will only be printed out.

Returns:

A dictionary with three keys, ‘status_code’, ‘status’ and ‘data’. ‘status’ is either ‘OK’ or the error message. ‘data’ is the pandas dataframe containing predicted SPARE scores, or None / error object if unsuccessful. ‘status_code’ is either 0, 1 or 2. 0 is success, 1 is warning, 2 is error.

Return type:

dict

spare_scores.spare.spare_train(df: DataFrame | str, to_predict: str, model_type: str = 'SVM', pos_group: str = '', key_var: str = '', data_vars: list = [], ignore_vars: list = [], kernel: str = 'linear', output: str = '', verbose: int = 1, logs: str = '', **kwargs: Any) → dict[source]

Trains a SPARE model, either classification or regression

Parameters:

df (pandas.DataFrame) – either a pandas dataframe or a path to a saved csv containing training data.
to_predict (str) – variable to predict. Binary for classification and continuous for regression. Must be one of the columns in df.
pos_group (str) – group to assign a positive SPARE score (only for classification).
key_var (str) – The key variable to be used for training. If not given, the first column of the dataset is considered the primary key of the dataset.
data_vars (list) – a list of predictors for the training. All must be present in columns of df.
ignore_vars (list) – The list of predictors to be ignored for training. Can be a list, or empty.
kernel (str) – ‘linear’ or ‘rbf’ (only linear is supported currently in regression).
output (str) – path to save the trained model. ‘.pkl.gz’ file extension optional. If None is given, no model will be saved.
verbose (int) – Verbosity. Int, higher is more verbose. [0,1,2]
logs (str) – Where to save log file. If not given, logs will only be printed out.

Returns:

A dictionary with three keys, ‘status_code’, ‘status’ and ‘data’. ‘status’ is either ‘OK’ or the error message. ‘data’ is a dictionary containing the trained model and metadata if successful, or None / error object if unsuccessful. ‘status_code’ is either 0, 1 or 2. 0 is success, 1 is warning, 2 is error.

Return type:

dict
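
A caller will usually want to branch on the returned dictionary before using its ‘data’ entry. The helper below is a minimal sketch of that check, based only on the three keys and the status codes described above. unwrap_spare_result is a hypothetical name and is not part of the package, and the sample dictionary is made up for the example.

```python
import warnings
from typing import Any


def unwrap_spare_result(result: dict) -> Any:
    """Return result['data'] on success, warn on status_code 1, raise on 2."""
    code = result["status_code"]
    if code == 2:
        # Error: 'status' carries the error message.
        raise RuntimeError(f"SPARE call failed: {result['status']}")
    if code == 1:
        # Warning: the data is still returned, but the message is surfaced.
        warnings.warn(f"SPARE call returned a warning: {result['status']}")
    return result["data"]


# A made-up result shaped like the documented return value.
ok = unwrap_spare_result({"status_code": 0, "status": "OK", "data": {"placeholder": True}})
```

Raising on code 2 keeps a failed call from being mistaken for a trained model further down the pipeline.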

spare_scores.svm module

class spare_scores.svm.SVMModel(predictors: list, to_predict: str, key_var: str, verbose: int = 1, **kwargs: Any)[source]

Bases: object

A class for managing SVM models. Additionally, the class can be initialized with any number of keyword arguments. These will be added as attributes to the class.

Methods:

train_model(df, **kwargs) – Trains the model using the provided dataframe.
apply_model(df) – Applies the trained model on the provided dataframe and returns the predictions.
set_parameters(**parameters) – Updates the model’s parameters with the provided values. This also changes the model’s attributes, while retaining the original ones.

Parameters:

predictors (list) – List of predictors used for modeling.
to_predict (str) – Target variable for modeling.
key_var (str) – Key variable for modeling.

correct_reg_bias(fold: Any, y_test: list) → Any[source]

fit(df: DataFrame, verbose: int = 1, **kwargs: Any) → dict[source]

get_stats(y_test: ndarray, y_score: ndarray) → None[source]

output_stats() → None[source]

param_search(mdl_i: Any, X_train: list, y_train: list, scoring: Any) → Any[source]

predict(df: DataFrame, verbose: int = 1) → ndarray[source]

prepare_sample(df: DataFrame, fold: Any, scaler: Any, classify: Any = None) → Any[source]

run_CV(df: DataFrame) → None[source]

set_parameters(**parameters: Any) → None[source]

train_initialize(df: DataFrame, to_predict: str) → None[source]

spare_scores.util module

spare_scores.util.add_file_extension(filename: str, extension: str) → str[source]

Adds file extension to needed file

Parameters:

filename (str) – The path to the file
extension (str) – The wanted extension (e.g. .txt, .csv)

Returns:

The filename

Return type:

str
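
A helper of this kind is often written so that the extension is appended only when it is missing. The sketch below shows that behavior; whether add_file_extension skips filenames that already carry the extension is an assumption, and add_file_extension_sketch is a hypothetical name.

```python
def add_file_extension_sketch(filename: str, extension: str) -> str:
    """Append extension to filename unless the filename already ends with it."""
    if not filename.endswith(extension):
        filename += extension
    return filename


with_ext = add_file_extension_sketch("my_model", ".pkl.gz")
unchanged = add_file_extension_sketch("scores.csv", ".csv")
```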

spare_scores.util.check_file_exists(filename: str | None, logger: Any) → Any[source]

Checks if file exists

Parameters:

filename (str) – The file that will be searched
logger (logging.basicConfig) – Output logger

Returns:

True if file exists, False otherwise

Return type:

bool

spare_scores.util.convert_to_number_if_possible(string: str) → float | str[source]

Converts the input string to a float if possible

Parameters:

string (str) – the input string

Returns:

float if the string is numeric, the same string if it’s not

Return type:

float or str
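
The documented contract (a float when the string is numeric, the unchanged string otherwise) can be sketched with a try/except around float(). This is an illustration under that reading, not the package's code, and to_number_if_possible is a hypothetical name.

```python
from typing import Union


def to_number_if_possible(string: str) -> Union[float, str]:
    """Return float(string) when it parses as a number, else the original string."""
    try:
        return float(string)
    except ValueError:
        return string


number = to_number_if_possible("3.5")
text = to_number_if_possible("linear")
```

Note that float() also accepts strings such as "nan" and "inf", so a stricter check would be needed if those should stay strings.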

spare_scores.util.expspace(span: list) → ndarray[source]

spare_scores.util.is_unique_identifier(df: DataFrame, column_names: list) → bool[source]

Checks whether the given columns form a unique identifier for the passed dataframe

Parameters:

df (pandas.DataFrame) – The passed dataframe
column_names (list) – The passed column names

Returns:

True if the passed dataframe is a unique identifier, False otherwise

Return type:

bool
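
The uniqueness check can be pictured as: no two rows share the same combination of values in column_names. The sketch below expresses that with plain dictionaries instead of a dataframe so that it runs without pandas; the function name and the sample rows are hypothetical. With pandas, a comparable test is `not df.duplicated(subset=column_names).any()`.

```python
def rows_uniquely_identified(rows: list, column_names: list) -> bool:
    """True when no two rows share the same values in column_names."""
    keys = [tuple(row[name] for name in column_names) for row in rows]
    return len(keys) == len(set(keys))


visits = [
    {"ID": "a", "Visit": 1},
    {"ID": "a", "Visit": 2},
    {"ID": "b", "Visit": 1},
]
id_only = rows_uniquely_identified(visits, ["ID"])
id_and_visit = rows_uniquely_identified(visits, ["ID", "Visit"])
```

In this example "ID" alone repeats, while the pair ("ID", "Visit") identifies each row.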

spare_scores.util.load_df(df: DataFrame | str) → DataFrame[source]

Fast loader for dataframes

Parameters:

df (Union[pd.DataFrame, str]) – Either pd.DataFrame or path to the .csv file

Returns:

The dataframe

Return type:

pd.DataFrame

spare_scores.util.load_examples(file_name: str = '') → Any[source]

Loads example data and models in the package.

Parameters:

file_name (str) – either the name of the example data saved as .csv or the name of the SPARE model saved as .pkl.gz.

Returns:

the resulting dataframe

Return type:

None or pandas.DataFrame

spare_scores.util.load_model(mdl_path: str) → Any[source]

Loads the model from the passed path

Parameters:

mdl_path (str) – the path to the weights of the model
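
Since SPARE models are saved with a ‘.pkl.gz’ extension, a reasonable reading is that the file is a gzip-compressed pickle. The round trip below sketches that format with the standard library. The file layout, the helper names save_pickle_gz and load_pickle_gz, and the (model, meta_data) tuple contents are assumptions for illustration.

```python
import gzip
import os
import pickle
import tempfile
from typing import Any


def save_pickle_gz(obj: Any, path: str) -> None:
    """Write obj as a gzip-compressed pickle."""
    with gzip.open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_pickle_gz(path: str) -> Any:
    """Read back an object written by save_pickle_gz."""
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_model.pkl.gz")
    # Placeholder (model, meta_data) pair; real contents come from training.
    save_pickle_gz(({"weights": [0.1, 0.2]}, {"mdl_type": "SVM"}), path)
    model, meta_data = load_pickle_gz(path)
```

Unpickling executes code stored in the file, so only model files from a trusted source should be loaded this way.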

spare_scores.util.save_file(result: Any, output: str, action: str, logger: Any) → None[source]

Saves the results in a file depending on the action

Parameters:

result (Either .csv or pandas.DataFrame depending on the action) – The results that will be dumped into the file
output (str) – The output filename
action (str) – Either ‘train’ or ‘test’ depending on the action
logger (logging.basicConfig) – Output logger
