ML Models
This document explains the inner workings of our ML models: two MLP implementations and one SVM.
MLP models
We use a total of 2 MLP models. One is implemented with sklearn and the other with torch.
MLP with Sklearn
The MLP model implemented with sklearn can be found in mlp.py. The MLP model class takes 4 arguments, plus extra parameters if needed.
def __init__(
    self,
    predictors: list,
    to_predict: str,
    key_var: str,
    verbose: int = 1,
    **kwargs: Any,
) -> None
The usage of each parameter is explained in usage.rst. Valid extra parameters are “k”, “n_repeats”, “task” and “param_grid”. We check every parameter before use so that training and testing always run with valid settings.
Note
If you notice any bugs with training or testing, please report them by opening an issue.
If the extra parameters are not passed, we use our default ones.
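A minimal sketch of how extra parameters could be validated and merged with defaults is shown below. The helper name resolve_extra_params, the default values for “k”, “n_repeats” and “param_grid”, and the accepted task strings are assumptions made for illustration; the actual checks may differ.

```python
from typing import Any

# Extra parameters accepted by the sklearn MLP, as listed above.
VALID_EXTRA = {"k", "n_repeats", "task", "param_grid"}

# Placeholder defaults (assumed values, not the project's real ones).
# Only the regression default for "task" comes from this document.
DEFAULTS = {"k": 5, "n_repeats": 1, "task": "regression", "param_grid": None}


def resolve_extra_params(**kwargs: Any) -> dict:
    """Reject unknown or invalid extras, then fill in missing ones with defaults."""
    unknown = set(kwargs) - VALID_EXTRA
    if unknown:
        raise ValueError(f"Unknown extra parameters: {sorted(unknown)}")

    params = {**DEFAULTS, **kwargs}

    # Assumed task names; the real class may use different strings.
    if params["task"] not in {"regression", "classification"}:
        raise ValueError(f"Invalid task: {params['task']!r}")
    if not isinstance(params["k"], int) or params["k"] < 2:
        raise ValueError("k must be an integer >= 2")
    if not isinstance(params["n_repeats"], int) or params["n_repeats"] < 1:
        raise ValueError("n_repeats must be an integer >= 1")
    return params


params = resolve_extra_params(task="classification", k=3)
print(params)
```

Rejecting unknown keys early means a misspelled parameter name raises an error instead of silently falling back to a default.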
For optimization, we use sklearn’s GridSearchCV. As explained above, you can specify the param grid;
otherwise we use our default. As metrics, we use AUC, Accuracy, Sensitivity, Specificity, Precision,
Recall and F1 for classification and MAE, RMSE and R2 for regression. Also note that the model performs
regression by default, so always make sure that you specify the task if you want to perform classification.
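As a rough illustration of the grid search described here, the sketch below tunes an sklearn MLP regressor with GridSearchCV. The synthetic data, the grid values, the MAE scoring and the mapping of “k” and “n_repeats” onto RepeatedKFold are assumptions, not the defaults used in mlp.py.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, RepeatedKFold
from sklearn.neural_network import MLPRegressor

# Small synthetic regression problem, used only for this illustration.
X, y = make_regression(n_samples=120, n_features=5, noise=0.1, random_state=0)

# Hypothetical param grid; the project's default grid is not shown in this document.
param_grid = {
    "hidden_layer_sizes": [(8,), (16,)],
    "alpha": [1e-4, 1e-2],
}

# Assumption: "k" is the number of folds and "n_repeats" the number of repetitions.
k, n_repeats = 3, 2
cv = RepeatedKFold(n_splits=k, n_repeats=n_repeats, random_state=0)

search = GridSearchCV(
    MLPRegressor(max_iter=300, random_state=0),
    param_grid=param_grid,
    cv=cv,
    scoring="neg_mean_absolute_error",  # MAE, negated so that higher is better
)
search.fit(X, y)

print(search.best_params_)
```

For a classification task, the same pattern would apply with MLPClassifier and a classification scorer.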
You can always get all the stats with the get_stats function and print them with output_stats.
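For reference, most of the metrics named in this section can be computed as in the following sketch. It uses plain Python, assumes binary labels encoded as 0 and 1, and leaves out AUC because that needs predicted scores rather than labels. The function names are illustrative and do not describe how get_stats is implemented.

```python
import math


def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # identical to recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": sensitivity,
        "f1": f1,
    }


def regression_metrics(y_true, y_pred):
    """MAE, RMSE and R2 for paired numeric sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1 - ss_res / ss_tot if ss_tot else 0.0,
    }


clf = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
reg = regression_metrics([1, 2, 3, 4], [1, 2, 3, 6])
print(clf)
print(reg)
```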
MLP with torch
The MLP model implemented with torch can be found in mlp_torch.py. The torch MLP model class takes 4 arguments, plus extra parameters if needed.
def __init__(
    self,
    predictors: list,
    to_predict: str,
    key_var: str,
    verbose: int = 1,
    **kwargs: Any,
) -> None
As you can see, these are the same parameters as in the sklearn implementation. Valid extra parameters
are “task”, “bs” (batch size) and “num_epochs”. We use the same metrics as the sklearn model, but here
we perform optimization with optuna instead of GridSearchCV. Each fit creates an optuna study that tries
to maximize the metric value if the task is classification, or to minimize the error if the task is
regression.
You can always get all the stats with the get_stats function and print them with output_stats.
Note that the MLP network itself is implemented in a separate class in the same file, named SimpleMLP.
We also implemented a class to manage the data we pass to our MLP model. This class lives in the
same file under the name MLPDataset.
SVM model
The SVM model can be found in svm.py. The SVM model class takes 4 arguments, plus extra parameters if needed.
def __init__(
    self,
    predictors: list,
    to_predict: str,
    key_var: str,
    verbose: int = 1,
    **kwargs: Any,
) -> None
As you can see again, these are the same parameters as for the other models. Valid extra parameters are
“kernel”, “k”, “n_repeats”, “task” and “param_grid”. If the task is classification and the kernel is
linear, we use LinearSVC; if the kernel is not linear (e.g. rbf), we use SVC. If the task is regression, we use
LinearSVR with the squared epsilon-insensitive loss. As metrics, we use AUC, Accuracy, Sensitivity,
Specificity, Precision, Recall and F1 for classification and MAE, RMSE and R2 for regression.
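The estimator selection described above can be sketched like this. The function name make_svm and the task strings are hypothetical, and all other estimator settings are left at sklearn's defaults, which may not match svm.py.

```python
from sklearn.svm import SVC, LinearSVC, LinearSVR


def make_svm(task: str, kernel: str):
    """Pick an sklearn SVM estimator from the task and kernel."""
    if task == "classification":
        if kernel == "linear":
            return LinearSVC()
        # Any non-linear kernel (for example "rbf") goes through SVC.
        return SVC(kernel=kernel)
    # Regression uses the squared epsilon-insensitive loss.
    return LinearSVR(loss="squared_epsilon_insensitive")


for task, kernel in [
    ("classification", "linear"),
    ("classification", "rbf"),
    ("regression", "linear"),
]:
    print(task, kernel, type(make_svm(task, kernel)).__name__)
```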
You can always get all the stats with the get_stats function and print them with output_stats.