spare_scores package
Submodules
spare_scores.classes module
- class spare_scores.classes.MetaData(mdl_type: str, mdl_task: str, kernel: str, predictors: list, to_predict: str, key_var: str, params: Any = None, stats: Any = None, cv_folds: Any = None, scaler: Any = None, cv_results: Any = None)
Bases: object
Stores training information on its paired SPARE model.
- Parameters:
mdl_type (str) – Type of model to be used.
mdl_task (str) – Task of the model to be used.
kernel (str) – Kernel used for SVM.
predictors (list) – List of predictors used for modeling.
to_predict (str) – Target variable for modeling.
key_var (str) – Key variable for modeling.
- cv_folds: Any = None
- cv_results: Any = None
- kernel: str
- key_var: str
- mdl_task: str
- mdl_type: str
- params: Any = None
- predictors: list
- scaler: Any = None
- stats: Any = None
- to_predict: str
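The field listing above is consistent with a dataclass-style container, although that is an assumption rather than something this page states. A minimal sketch of such a container follows; `MetaDataSketch` is a hypothetical stand-in, and the example values (`"SVM"`, `"ROI_1"`, `"Age"`, `"ID"`) are made up for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class MetaDataSketch:
    """Hypothetical stand-in mirroring the documented MetaData fields."""

    mdl_type: str
    mdl_task: str
    kernel: str
    predictors: list
    to_predict: str
    key_var: str
    # The remaining fields default to None, as in the documented signature.
    params: Any = None
    stats: Any = None
    cv_folds: Any = None
    scaler: Any = None
    cv_results: Any = None


# Illustrative values only; none of these names come from the package.
meta = MetaDataSketch(
    mdl_type="SVM",
    mdl_task="Regression",
    kernel="linear",
    predictors=["ROI_1", "ROI_2"],
    to_predict="Age",
    key_var="ID",
)
meta_dict = asdict(meta)  # plain dict view of the same information
```

A dict view like `meta_dict` may be convenient where a function expects training information as a dictionary, but whether the package converts it this way is not confirmed here.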
- class spare_scores.classes.SpareModel(model_type: str, predictors: list, target: str, key_var: str, verbose: int = 1, parameters: dict = {}, **kwargs: Any)
Bases: object
A class for managing different SPARE models. Additionally, the class can be initialized with any number of keyword arguments. These will be added as attributes to the class.
- Methods:
- train_model(df, **kwargs):
Calls the model’s train_model method and returns the result.
- apply_model(df):
Calls the model’s apply_model method and returns the result.
- set_parameters(**parameters):
Updates the model’s parameters with the provided values. This also changes the model’s attributes, while retaining the original ones.
- Parameters:
model_type (str) – Type of model to be used.
predictors (list) – List of predictors used for modeling.
target (str) – Target variable for modeling.
key_var (str) – Key variable for modeling.
verbose (int) – Verbosity level.
parameters (dict) – Additional parameters for the model.
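The "keyword arguments become attributes" behaviour described for this class (and for the MLP and SVM model classes) can be sketched in a few lines. `KwargsModelSketch` is a hypothetical class written for this example, not the package's implementation, and the extra argument `k` is invented.

```python
from typing import Any


class KwargsModelSketch:
    """Hypothetical class: extra keyword arguments become attributes."""

    def __init__(self, predictors: list, target: str, key_var: str,
                 verbose: int = 1, **kwargs: Any) -> None:
        self.predictors = predictors
        self.target = target
        self.key_var = key_var
        self.verbose = verbose
        # Any additional keyword argument is stored as an attribute.
        self.__dict__.update(kwargs)

    def set_parameters(self, **parameters: Any) -> None:
        # Overwrite the named attributes; untouched ones keep their values.
        self.__dict__.update(parameters)


model = KwargsModelSketch(["ROI_1", "ROI_2"], "Age", "ID", k=5)
model.set_parameters(verbose=0, k=10)
```

After the call to `set_parameters`, `model.k` and `model.verbose` hold the new values while `model.target` and `model.key_var` are unchanged.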
spare_scores.cli module
spare_scores.data_prep module
- spare_scores.data_prep.age_sex_match(df1: DataFrame, df2: DataFrame | None = None, to_match: str = '', p_threshold: float = 0.15, verbose: int = 1, age_out_percentage: float = 20) -> DataFrame
Match two groups for age and sex.
- Parameters:
df1 (pandas.DataFrame) – the passed dataframe
df2 (pandas.DataFrame) – optional, if df1 and df2 are two groups to classify.
to_match (str) – a binary variable of two groups. Must be one of the columns in df1. Ignored if df2 is given. If to_match is ‘Sex’, then only perform age matching.
p_threshold (float) – minimum p-value for matching. Default value = 0.15
verbose (int) – whether to output messages. (Will be deprecated later.)
age_out_percentage (float) – percentage of the larger group, taken from the extreme end of its age distribution, from which one participant is randomly selected for removal during age matching. For example, if age_out_percentage = 20 and the larger group is significantly older, then one random participant from the fifth (oldest) quintile of age is excluded. Default value = 20
- Returns:
a trimmed pandas dataframe or a tuple of two dataframes with age/sex matched groups.
- Return type:
pandas.DataFrame
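To make the role of age_out_percentage concrete, here is a sketch of a single removal step on a plain list of ages. It assumes the candidate pool is the most extreme `age_out_percentage` percent of the larger group. The helper `drop_one_from_extreme` and the `older` and `rng` arguments are hypothetical, and the statistical test that decides when matching stops is deliberately left out.

```python
import random
from typing import List, Optional, Tuple


def drop_one_from_extreme(
    ages: List[float],
    age_out_percentage: float = 20,
    older: bool = True,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], float]:
    """Remove one randomly chosen age from the extreme end of a group."""
    rng = rng or random.Random()
    # Size of the candidate pool, e.g. 20% of the group (at least one person).
    pool_size = max(1, round(len(ages) * age_out_percentage / 100))
    # Indices sorted from the most extreme age inward.
    order = sorted(range(len(ages)), key=lambda i: ages[i], reverse=older)
    removed_idx = rng.choice(order[:pool_size])
    kept = [age for i, age in enumerate(ages) if i != removed_idx]
    return kept, ages[removed_idx]


larger_group = [float(a) for a in range(60, 80)]  # 20 hypothetical ages
kept, removed = drop_one_from_extreme(larger_group, 20, rng=random.Random(0))
```

With 20 participants and a percentage of 20, the pool holds the four oldest ages, so `removed` is always one of 76, 77, 78 or 79. A full matching loop would presumably repeat such a step until the age difference between groups is no longer significant at p_threshold.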
- spare_scores.data_prep.check_test(df: DataFrame, meta_data: dict) -> Tuple[str, list] | Tuple[str, None]
Checks testing dataframe for errors.
- Parameters:
df (pandas.DataFrame) – a pandas dataframe containing testing data.
meta_data (dict) – a dictionary containing training information on its paired SPARE model.
- spare_scores.data_prep.check_train(df: DataFrame, predictors: list, to_predict: str, verbose: int = 1, pos_group: str = '') -> str | Tuple[DataFrame, list, str]
Checks training dataframe for errors.
- Parameters:
df (pandas.DataFrame) – a pandas dataframe containing training data.
predictors (list) – a list of predictors for SPARE model training.
to_predict (str) – variable to predict.
pos_group (str) – group to assign a positive SPARE score (only for classification).
- Returns:
a tuple containing 1) the filtered dataframe, 2) the filtered predictors, 3) the SPARE model type.
- Return type:
[pandas.DataFrame, list, str]
- spare_scores.data_prep.convert_cat_variables(df: DataFrame, predictors: list, meta_data: Any) -> Any
- spare_scores.data_prep.logging_basic_config(verbose: int = 1, content_only: bool = False, filename: str = '') -> Any
Basic logging configuration for error exceptions
- Parameters:
verbose (int) – verbosity level. Default value = 1
content_only (bool) – If set to True it will output only the needed content. Default value = False
filename (str) – input filename. Default value = ‘’
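One way the three options could map onto Python's standard logging module is sketched here. Everything specific in it is an assumption: the `LEVELS` mapping, the format strings, the logger name, and the reading of filename as a log file path are not taken from the package, and `make_logger` is a hypothetical helper.

```python
import logging
import sys

# Assumed mapping from verbosity to log level; not taken from the package.
LEVELS = {0: logging.WARNING, 1: logging.INFO, 2: logging.DEBUG}


def make_logger(verbose: int = 1, content_only: bool = False,
                filename: str = "") -> logging.Logger:
    """Build a logger from the three documented options (hypothetical)."""
    logger = logging.getLogger("spare_sketch")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(LEVELS.get(verbose, logging.INFO))
    # content_only drops the level/function prefix and keeps the message.
    if content_only:
        fmt = "%(message)s"
    else:
        fmt = "%(levelname)s (%(funcName)s): %(message)s"
    # Log to a file when a filename is given, otherwise to stderr.
    if filename:
        handler: logging.Handler = logging.FileHandler(filename)
    else:
        handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter(fmt))
    logger.addHandler(handler)
    return logger


logger = make_logger(verbose=2, content_only=True)
logger.debug("debug messages are shown at verbose=2")
```

The sketch configures a named logger rather than the root logger so that it does not change logging behaviour elsewhere in a program.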
- spare_scores.data_prep.smart_unique(df1: DataFrame, df2: DataFrame | None = None, to_predict: str = '') -> str | DataFrame | tuple
Select unique data points in a way that optimizes SPARE training. For SPARE regression, preserve data points with extreme values. For SPARE classification, preserve data points that help age match.
- Parameters:
df1 (pandas.DataFrame) – the passed dataframe
df2 (pandas.DataFrame) – optional, if df1 and df2 are two groups to classify.
to_predict (str) – variable to predict. Binary for classification and continuous for regression. Must be one of the columns in df1. Ignored if df2 is given.
- Returns:
a trimmed pandas dataframe or a tuple of two dataframes with only one time point per ID.
- Return type:
pandas.DataFrame
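For the regression case, "preserve data points with extreme values" could be read as keeping, for each ID, the time point whose target value lies farthest from the overall mean. The sketch below implements that reading on plain Python records instead of a DataFrame to stay dependency-free. The selection rule, the helper name `keep_most_extreme`, and the sample data are all assumptions.

```python
from statistics import mean
from typing import Dict, List


def keep_most_extreme(rows: List[dict], id_col: str, target: str) -> List[dict]:
    """Keep, per ID, the row whose target lies farthest from the overall mean."""
    overall = mean(row[target] for row in rows)
    best: Dict[object, dict] = {}
    for row in rows:
        current = best.get(row[id_col])
        # Replace the stored row only if this one is strictly more extreme.
        if current is None or abs(row[target] - overall) > abs(current[target] - overall):
            best[row[id_col]] = row
    return list(best.values())


# Hypothetical longitudinal data: participant "a" has three visits.
visits = [
    {"ID": "a", "Age": 61.0},
    {"ID": "a", "Age": 64.0},
    {"ID": "a", "Age": 70.0},
    {"ID": "b", "Age": 50.0},
    {"ID": "c", "Age": 66.0},
]
unique_visits = keep_most_extreme(visits, "ID", "Age")
```

Here the overall mean age is 62.2, so the visit at age 70 is kept for participant "a" and each participant ends up with exactly one row.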
spare_scores.mlp module
- class spare_scores.mlp.MLPModel(predictors: list, to_predict: str, key_var: str, verbose: int = 1, **kwargs: Any)
Bases: object
A class for managing MLP models. Additionally, the class can be initialized with any number of keyword arguments. These will be added as attributes to the class.
- Parameters:
predictors (list) – List of predictors used for modeling.
to_predict (str) – Target variable for modeling.
key_var (str) – Key variable for modeling.
- fit(df: DataFrame, verbose: int = 1) -> dict
Trains the model using the provided dataframe and default parameters.
- Parameters:
df (pandas.DataFrame) – the provided dataframe.
verbose (int) – the verbosity level
- Returns:
A dictionary with the results from training.
- Return type:
dict
- get_stats(y: ndarray, y_hat: ndarray) -> None
Return the stats from the training
- Parameters:
y (np.ndarray) – original labels
y_hat (np.ndarray) – predicted values
spare_scores.mlp_torch module
- class spare_scores.mlp_torch.MLPDataset(X: list, y: list)
Bases: Dataset
A class for managing datasets that will be used for MLP training
- Parameters:
X (list) – the first dimension of the provided data (input)
y (list) – the second dimension of the provided data (output)
- class spare_scores.mlp_torch.MLPTorchModel(predictors: list, to_predict: str, key_var: str, verbose: int = 1, **kwargs: Any)
Bases: object
A class for managing MLP models.
- Parameters:
predictors (list) – List of predictors used for modeling.
to_predict (str) – Target variable for modeling.
key_var (str) – Key variable for modeling.
Additionally, the class can be initialized with any number of keyword arguments. These will be added as attributes to the class.
- find_best_threshold(y_hat: list, y: list) -> Any
Returns best threshold value using the roc_curve
- Parameters:
y_hat (list) – predicted values
y (list) – original labels
- Returns:
the best threshold value
- Return type:
List
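The page does not say which criterion picks the "best" point on the ROC curve. A common choice is Youden's J statistic (true positive rate minus false positive rate), sketched here in plain Python. The criterion and the helper name `best_threshold_youden` are assumptions, not the documented behaviour.

```python
from typing import Sequence


def best_threshold_youden(y_hat: Sequence[float], y: Sequence[int]) -> float:
    """Return the score threshold that maximizes TPR - FPR (Youden's J)."""
    positives = sum(1 for label in y if label == 1)
    negatives = len(y) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present in y")
    best_t = min(y_hat)
    best_j = float("-inf")
    for t in sorted(set(y_hat)):
        # Predict positive when the score reaches the candidate threshold.
        tp = sum(1 for s, label in zip(y_hat, y) if s >= t and label == 1)
        fp = sum(1 for s, label in zip(y_hat, y) if s >= t and label == 0)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_t, best_j = t, j
    return best_t


scores = [0.1, 0.2, 0.3, 0.4, 0.7, 0.9]
labels = [0, 0, 0, 1, 1, 1]
threshold = best_threshold_youden(scores, labels)
```

In this example the classes separate perfectly at a score of 0.4, which is the only candidate with J = 1.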
- get_all_stats(y_hat: list, y: list, classification: bool = True) -> dict
Returns all stats from training in a dictionary
- Parameters:
y (list) – ground truth y (1: AD, 0: CN) -> numpy
y_hat (list) – predicted y -> numpy; note that y_hat holds predicted values, e.g. [0.2, 0.8, 0.1 …]
- Returns:
A dictionary with the Accuracy, F1 score, Sensitivity, Specificity, Balanced Accuracy, Precision, Recall
- Return type:
dict
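The listed classification statistics all follow from the confusion matrix. The sketch below derives them from predicted scores using a fixed cut-off of 0.5. The cut-off, the dictionary keys, and the helper name `classification_stats` are assumptions made for illustration.

```python
from typing import Dict, Sequence


def classification_stats(y_hat: Sequence[float], y: Sequence[int],
                         threshold: float = 0.5) -> Dict[str, float]:
    """Compute confusion-matrix statistics from predicted scores."""
    preds = [1 if score >= threshold else 0 for score in y_hat]
    tp = sum(1 for p, t in zip(preds, y) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(preds, y) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(preds, y) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, y) if p == 0 and t == 1)

    def ratio(num: float, den: float) -> float:
        # Guard against empty denominators (e.g. no positive predictions).
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    return {
        "Accuracy": ratio(tp + tn, len(y)),
        "F1 score": ratio(2 * precision * sensitivity, precision + sensitivity),
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "Balanced Accuracy": (sensitivity + specificity) / 2,
        "Precision": precision,
        "Recall": sensitivity,  # recall is another name for sensitivity
    }


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.6, 0.4, 0.1, 0.2]
stats = classification_stats(y_score, y_true)
```

For this toy input the confusion matrix is TP = 3, FN = 1, FP = 1, TN = 5, giving an accuracy of 0.8 and a sensitivity of 0.75.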
- class spare_scores.mlp_torch.SimpleMLP(hidden_size: int = 256, classification: bool = True, dropout: float = 0.2, use_bn: bool = False, bn: str = 'bn')
Bases: Module
A class to create a simple MLP model.
- Parameters:
num_features (int) – total number of features. Default value = 147.
hidden_size (int) – number of features that will be passed to normalization layers of the model. Default value = 256.
classification (bool) – If set to True, then the model will perform classification, otherwise, regression. Default value = True.
dropout (float) – the dropout value.
use_bn (bool) – if set to True, then the model will use the normalization layers, otherwise, the model will use the linear layers.
bn (str) – if set to ‘bn’ the model will use BatchNorm1d() for the hidden layers, otherwise, it will use InstanceNorm1d().
- forward(x: Tensor) -> Tensor
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
spare_scores.spare_scores module
spare_scores.svm module
- class spare_scores.svm.SVMModel(predictors: list, to_predict: str, key_var: str, verbose: int = 1, **kwargs: Any)
Bases: object
A class for managing SVM models. Additionally, the class can be initialized with any number of keyword arguments. These will be added as attributes to the class.
- Methods:
- train_model(df, **kwargs):
Trains the model using the provided dataframe.
- apply_model(df):
Applies the trained model on the provided dataframe and returns the predictions.
- set_parameters(**parameters):
Updates the model’s parameters with the provided values. This also changes the model’s attributes, while retaining the original ones.
- Parameters:
predictors (list) – List of predictors used for modeling.
to_predict (str) – Target variable for modeling.
key_var (str) – Key variable for modeling.
spare_scores.util module
- spare_scores.util.add_file_extension(filename: str, extension: str) -> str
Adds the file extension to the filename where needed
- Parameters:
filename (str) – The path to the file
extension (str) – The wanted extension (e.g. .txt, .csv, etc.)
- Returns:
The filename
- Return type:
str
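A helper of this kind can be sketched as "append the extension unless it is already there". The exact rule the package applies is not stated on this page, so the behaviour below, including the normalisation of a missing leading dot, is an assumption, and `add_extension_if_missing` is a hypothetical name.

```python
def add_extension_if_missing(filename: str, extension: str) -> str:
    """Append `extension` unless the filename already ends with it."""
    # Accept both ".csv" and "csv" (an assumption made for convenience).
    if not extension.startswith("."):
        extension = "." + extension
    if filename.endswith(extension):
        return filename
    return filename + extension


with_ext = add_extension_if_missing("results", ".csv")
unchanged = add_extension_if_missing("results.csv", ".csv")
compound = add_extension_if_missing("model", "pkl.gz")
```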
- spare_scores.util.check_file_exists(filename: str, logger: Any) -> Any
Checks if file exists
- Parameters:
filename (str) – The file that will be searched
logger (logging.basicConfig) – Output logger
- Returns:
True if file exists, False otherwise
- Return type:
bool
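Following the description above, an existence check that reports through a logger might look like the sketch below. The log messages and the helper name `file_exists` are invented for the example, and the real function may log or return something different.

```python
import logging
import os
import tempfile


def file_exists(filename: str, logger: logging.Logger) -> bool:
    """Return True if the path exists, logging the outcome either way."""
    exists = os.path.exists(filename)
    if exists:
        logger.info("Found %s", filename)
    else:
        logger.warning("File not found: %s", filename)
    return exists


log = logging.getLogger("file_check_sketch")
with tempfile.TemporaryDirectory() as tmp_dir:
    real_path = os.path.join(tmp_dir, "data.csv")
    with open(real_path, "w") as handle:
        handle.write("ID,Age\n")
    found = file_exists(real_path, log)
    missing = file_exists(os.path.join(tmp_dir, "absent.csv"), log)
```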
- spare_scores.util.convert_to_number_if_possible(string: str) -> float | str
Converts the input string to a float if possible
- Parameters:
string (str) – the input string
- Returns:
float if the string is numeric, the same string if it’s not
- Return type:
float or str
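The usual Python idiom for this kind of conversion is to attempt `float()` and fall back to the original string on failure. The sketch below uses a hypothetical name, `to_number_if_possible`, and is not necessarily how the package implements it.

```python
from typing import Union


def to_number_if_possible(string: str) -> Union[float, str]:
    """Return float(string) when it parses, otherwise the string itself."""
    try:
        return float(string)
    except ValueError:
        return string


number = to_number_if_possible("3.5")
text = to_number_if_possible("abc")
scientific = to_number_if_possible("1e3")
```

Note that `float()` also accepts scientific notation and strings such as "nan" or "inf", so with this idiom those inputs come back as floats too.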
- spare_scores.util.is_unique_identifier(df: DataFrame, column_names: list) -> bool
Checks if the passed column names form a unique identifier for the passed dataframe
- Parameters:
df (pandas.DataFrame) – The passed dataframe
column_names (list) – The passed column names
- Returns:
True if the passed columns uniquely identify each row of the dataframe, False otherwise
- Return type:
bool
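One plausible reading of this check is that the combination of values in column_names must be different for every row. The sketch below expresses that on plain Python records, not on a DataFrame, to stay dependency-free. The helper `columns_identify_rows` and the sample data are hypothetical.

```python
from typing import List


def columns_identify_rows(rows: List[dict], column_names: List[str]) -> bool:
    """True when every row has a distinct combination of the given columns."""
    keys = [tuple(row[name] for name in column_names) for row in rows]
    # Duplicated combinations collapse in the set, shrinking its size.
    return len(set(keys)) == len(keys)


# Hypothetical longitudinal records: "ID" repeats, ("ID", "Visit") does not.
records = [
    {"ID": "a", "Visit": 1},
    {"ID": "a", "Visit": 2},
    {"ID": "b", "Visit": 1},
]
id_only = columns_identify_rows(records, ["ID"])
id_and_visit = columns_identify_rows(records, ["ID", "Visit"])
```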
- spare_scores.util.load_df(df: DataFrame | str) -> DataFrame
Fast loader for dataframes
- Parameters:
df (Union[pd.DataFrame, str]) – Either pd.DataFrame or path to the .csv file
- Returns:
The dataframe
- Return type:
pd.DataFrame
- spare_scores.util.load_examples(file_name: str = '') -> Any
Loads example data and models in the package.
- Parameters:
file_name (str) – either name of the example data saved as .csv or name of the SPARE model saved as .pkl.gz.
- Returns:
the resulted dataframe
- Return type:
None or pandas.DataFrame
- spare_scores.util.load_model(mdl_path: str) -> Any
Loads the model from the passed path
- Parameters:
mdl_path (str) – the path to the weights of the model
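Since SPARE models are described on this page as saved in .pkl.gz files, loading one presumably amounts to reading a gzip-compressed pickle. The sketch below shows that round trip with a stand-in dictionary in place of a real model. The helper name `load_gzipped_pickle` and the payload are assumptions, and the actual file layout is not confirmed here.

```python
import gzip
import os
import pickle
import tempfile
from typing import Any


def load_gzipped_pickle(path: str) -> Any:
    """Read a gzip-compressed pickle file and return the stored object."""
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)


# Round trip with a stand-in object instead of a real SPARE model.
payload = {"mdl_type": "SVM", "predictors": ["ROI_1", "ROI_2"]}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.pkl.gz")
    with gzip.open(path, "wb") as handle:
        pickle.dump(payload, handle)
    restored = load_gzipped_pickle(path)
```

Unpickling can execute arbitrary code, so files of this kind should only be loaded from trusted sources.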
- spare_scores.util.save_file(result: Any, output: str, action: str, logger: Any) -> None
Saves the results in a file depending on the action
- Parameters:
result (Either .csv or pandas.DataFrame depending on the action) – The results that will be dumped into the file
output (str) – The output filename
action (str) – Either ‘train’ or ‘test’ depending on the action
logger (logging.basicConfig) – Output logger