Cancer Imaging Phenomics Toolkit (CaPTk)
1.9.0
This application trains and cross-validates machine learning models, and runs inference to generate predictions. It currently supports a variety of classifiers for binary classification tasks and provides several approaches for feature selection.
REQUIREMENTS: An input features file (CSV) and, if training, a target label file (CSV).
USAGE:
${CaPTk_InstallDir}/bin/TrainingModule -f C:/TestFeatures.csv -l C:/TestLabels.csv -o C:/OutputDirectory -t crossvalidate -c 1 -k 10
${CaPTk_InstallDir}/bin/TrainingModule -f C:/TestFeatures.csv -l C:/TestLabels.csv -o C:/OutputDirectory -t train -c 2 -s 5 -n 2
${CaPTk_InstallDir}/bin/TrainingModule -f C:/TestFeatures.csv -m C:/ModelDirectory/ -o C:/OutputDirectory -t test 3
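The invocations above can also be assembled programmatically. The following is a minimal Python sketch, not part of CaPTk: the function name build_command, its keyword arguments, and the example install directory are hypothetical, and it uses only the -f, -l, -m, -o, -t, -c and -k flags shown in the usage lines. It assumes that a label file is needed for every mode except 'test', and that 'test' needs a model directory, as the examples suggest.

```python
from pathlib import PurePosixPath

# Execution modes accepted by -t, as documented on this page.
VALID_MODES = {"cv", "crossvalidate", "train", "test"}


def build_command(install_dir, features, output_dir, mode,
                  labels=None, model_dir=None, classifier=None, folds=None):
    """Return an argument list for TrainingModule (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # Assumption: testing loads a saved model, so -m is required.
    if mode == "test" and model_dir is None:
        raise ValueError("test mode needs a model directory (-m)")
    # Assumption: training and cross-validation both need labels (-l).
    if mode != "test" and labels is None:
        raise ValueError(f"{mode} mode needs a label file (-l)")

    exe = str(PurePosixPath(install_dir) / "bin" / "TrainingModule")
    argv = [exe, "-f", features]
    if mode == "test":
        argv += ["-m", model_dir]
    else:
        argv += ["-l", labels]
    argv += ["-o", output_dir, "-t", mode]
    if classifier is not None:
        argv += ["-c", str(classifier)]
    if folds is not None:
        argv += ["-k", str(folds)]
    return argv


if __name__ == "__main__":
    # "/opt/CaPTk" is a placeholder for ${CaPTk_InstallDir}.
    print(" ".join(build_command(
        "/opt/CaPTk", "C:/TestFeatures.csv", "C:/OutputDirectory",
        "crossvalidate", labels="C:/TestLabels.csv", classifier=1, folds=10)))
```

The returned list can be passed to subprocess.run as-is, which avoids shell quoting issues with paths that contain spaces.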
c is the classifier type (-c 1 for Linear SVM, -c 2 for RBF SVM, -c 3 for Polynomial SVM, -c 4 for Sigmoid SVM, -c 5 for Chi-squared SVM, -c 6 for Intersection SVM, -c 7 for Random Forest, -c 8 for SGD SVM, -c 9 for Boosted Trees).
s is the feature selection type (-s 1 for Effect-size FS, -s 2 for Forward FS, -s 3 for Recursive Feature Elimination, -s 4 for Random Forest based FS, -s 5 for RELIEF-F FS).
t is the execution mode ('cv' or 'crossvalidate' for cross-validation, 'train' for model training only, 'test' for testing only).
k is the number of folds for cross-validation.
x is the maximum number of features to select during feature selection; up to that many features can be included. A value of 0 behaves differently depending on the feature selection method. For Forward FS, Effect-size FS, and Recursive Feature Elimination, it produces the best set overall. For Random Forest FS and RELIEF-F FS, it selects all features, ordered by importance.
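Because the classifier and feature selection types are passed as bare integers, a small lookup table can make scripts easier to read. The sketch below is illustrative only: the string keys (such as "rbf_svm" and "relief_f") and the function selection_flags are hypothetical names chosen for this example, while the integer codes are the ones listed above.

```python
# Integer codes for -c, copied from the list above; the keys are made-up labels.
CLASSIFIERS = {
    "linear_svm": 1,
    "rbf_svm": 2,
    "polynomial_svm": 3,
    "sigmoid_svm": 4,
    "chi_squared_svm": 5,
    "intersection_svm": 6,
    "random_forest": 7,
    "sgd_svm": 8,
    "boosted_trees": 9,
}

# Integer codes for -s, copied from the list above; the keys are made-up labels.
FEATURE_SELECTORS = {
    "effect_size": 1,
    "forward": 2,
    "rfe": 3,
    "random_forest": 4,
    "relief_f": 5,
}


def selection_flags(classifier, selector=None):
    """Translate readable names into -c / -s arguments (hypothetical helper)."""
    if classifier not in CLASSIFIERS:
        raise ValueError(
            f"unknown classifier {classifier!r}; choose from {sorted(CLASSIFIERS)}")
    flags = ["-c", str(CLASSIFIERS[classifier])]
    if selector is not None:
        if selector not in FEATURE_SELECTORS:
            raise ValueError(
                f"unknown selector {selector!r}; choose from {sorted(FEATURE_SELECTORS)}")
        flags += ["-s", str(FEATURE_SELECTORS[selector])]
    return flags


if __name__ == "__main__":
    # RBF SVM with RELIEF-F feature selection.
    print(selection_flags("rbf_svm", "relief_f"))
```

Keeping the codes in one place means a change in the documented numbering only has to be mirrored in these two dictionaries.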