Cancer Imaging Phenomics Toolkit (CaPTk)  1.9.0
Feature Extraction

Pre-Existing Parameter Configurations

The Feature Extraction module ships with a set of pre-configured parameter files for 3D and 2D images, because the two use different extraction paradigms. These configurations give users a template to start from, which they can then customize for their own dataset(s).

When invoking the module from the command line (CLI), pass the appropriate file after the -p parameter.

See the subsequent sections for details.

Default 3D Image Feature Extraction Parameters

This configuration is primarily targeted at brain tumor image feature extraction; users should customize it for their own dataset.

For a description of the parameters and related explanations, please see ${CaPTk_InstallDir}/data/features/1_params_default.csv or https://github.com/CBICA/CaPTk/blob/master/src/applications/FeatureExtraction/data/1_params_default.csv.

Default 2D Image Feature Extraction Parameters

This configuration is primarily targeted at mammogram image feature extraction; users should customize it for their own dataset.

For a description of the parameters and related explanations, please see ${CaPTk_InstallDir}/data/features/2_params_default_lattice.csv or https://github.com/CBICA/CaPTk/blob/master/src/applications/FeatureExtraction/data/2_params_default_lattice.csv.

Customizing Parameters

The user will need to pass a custom parameter file to Feature Extraction. For details on creating such a file, please see https://github.com/CBICA/CaPTk/tree/master/src/applications/FeatureExtraction#the-parameter-file.

Batch Mode

An example of the list file for batch processing can be found in ${CaPTk_InstallDir}/share/featureExtractionBatch/batch_featureExtraction.csv or https://github.com/CBICA/CaPTk/blob/master/src/applications/FeatureExtraction/data/batchMode/batch_featureExtraction.csv.

Extracted Features

Each feature family below lists its specific features, its parameters (name, range and default value), and a description with formulas and comments.
Intensity Features (First-Order Statistics)
  • Minimum
  • Maximum
  • Mean
  • Standard Deviation
  • Variance
  • Skewness
  • Kurtosis
Parameters: N.A.
  • Minimum Intensity = \( \min_{k} I_{k} \), where \( I_{k} \) is the intensity of the pixel or voxel at index k.
  • Maximum Intensity = \( \max_{k} I_{k} \), where \( I_{k} \) is the intensity of the pixel or voxel at index k.
  • Mean = \( \frac{\sum(X_{i})}{N} \), where N is the number of voxels/pixels.
  • Standard Deviation = \( \sqrt{\frac{\sum(X-\mu)^{2}}{N}}\) where \(\mu\) is the mean of the data.
  • Variance = \( \frac{\sum(X-\mu)^{2}}{N} \) where \(\mu\) is the mean intensity.
  • Skewness = \( \frac{\sum_{i=1}^{N}(X_{i} - \bar{X})^{3}/N} {s^{3}} \) where \(\bar{X}\) is the mean, s is the standard deviation and N is the number of pixels/voxels.
  • Kurtosis = \( \frac{\sum_{i=1}^{N}(X_{i} - \bar{X})^{4}/N}{s^{4}} \) where \(\bar{X}\) is the mean, s is the standard deviation and N is the number of pixels/voxels.

All features in this family are extracted from the raw intensities.
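The first-order formulas above can be checked with a minimal NumPy sketch. It assumes the image and the ROI mask are arrays of the same shape; the function name `first_order_features` is hypothetical and this is not CaPTk's implementation. Like the formulas, it uses population statistics (division by N), and the kurtosis is not excess-corrected.

```python
import numpy as np


def first_order_features(image, mask):
    """First-order statistics of the raw intensities inside a binary ROI mask."""
    x = np.asarray(image, dtype=float)[np.asarray(mask, dtype=bool)]
    n = x.size
    mu = x.sum() / n                      # Mean = sum(X_i) / N
    var = np.sum((x - mu) ** 2) / n       # population variance (divides by N, not N - 1)
    s = np.sqrt(var)                      # standard deviation
    return {
        "Minimum": x.min(),
        "Maximum": x.max(),
        "Mean": mu,
        "StandardDeviation": s,
        "Variance": var,
        "Skewness": (np.sum((x - mu) ** 3) / n) / s ** 3,
        "Kurtosis": (np.sum((x - mu) ** 4) / n) / s ** 4,   # no "- 3" correction
    }


feats = first_order_features([[1, 2], [3, 4]], [[1, 1], [1, 1]])
print(feats["Mean"], feats["Variance"])   # 2.5 1.25
```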
Histogram-based
  • Bin Frequency
Parameters: Bins (range: N.A., default: 10)
  • Takes the number of bins as input and outputs the number of pixels/voxels in each bin.
All features in this family are extracted from the discretized intensities.
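A minimal sketch of the bin-frequency output, assuming equal-width bins spanning the minimum and maximum ROI intensity. That binning rule is an assumption for illustration; CaPTk's actual discretization may differ, and `bin_frequency` is a hypothetical name.

```python
import numpy as np


def bin_frequency(image, mask, bins=10):
    """Count ROI pixels/voxels per intensity bin (equal-width bins between ROI min and max)."""
    x = np.asarray(image, dtype=float)[np.asarray(mask, dtype=bool)]
    counts, edges = np.histogram(x, bins=bins, range=(x.min(), x.max()))
    return counts, edges


counts, edges = bin_frequency(np.arange(10).reshape(2, 5), np.ones((2, 5)))
print(counts)   # [1 1 1 1 1 1 1 1 1 1]
```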
Volumetric
  • Volume/Area
Parameters: Dimensions (range: 2D:3D, default: 3D); Axis (range: x,y,z, default: z)
  • Volume/Area (depending on image dimension) and number of voxels/pixels in the ROI.
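As a small illustration, the volume (3D) or area (2D) is the ROI voxel/pixel count multiplied by the physical size of one voxel/pixel. The sketch assumes a binary mask and a spacing tuple in mm; `roi_size` is a hypothetical helper.

```python
import numpy as np


def roi_size(mask, spacing):
    """Return (number of ROI voxels/pixels, physical volume/area)."""
    mask = np.asarray(mask, dtype=bool)
    count = int(mask.sum())
    # each voxel/pixel covers prod(spacing): mm^3 for 3D, mm^2 for 2D (spacing in mm)
    return count, count * float(np.prod(spacing))


print(roi_size(np.ones((2, 3, 4)), (0.5, 0.5, 2.0)))   # (24, 12.0)
```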
Morphologic
  • Elongation
  • Perimeter
  • Roundness
  • Eccentricity
  • Ellipse Diameter
  • Equivalent Spherical Radius
Parameters: Dimensions (range: 2D:3D, default: 3D); Axis (range: x,y,z, default: z)
  • Elongation = \( \sqrt{\frac{i_{2}}{i_{1}}} \) where \(i_{n}\) are the second moments of particle around its principal axes.
  • Perimeter = \( 2 \pi r \) where r is the radius of the circle enclosing the shape.
  • Roundness = \( A_{s}/A_{c} \), where \( A_{s} \) is the area of the shape and \( A_{c} \) is the area of a circle with the same perimeter.
  • Eccentricity = \( \sqrt{1 - \frac{a*b}{c^{2}}} \) where c is the longest semi-principal axis of an ellipsoid fitted on an ROI, and a and b are the 2nd and 3rd longest semi-principal axes of the ellipsoid.
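The roundness and eccentricity definitions translate directly into code. This is a minimal sketch assuming the area, perimeter and semi-principal axes have already been measured; the function names are hypothetical.

```python
import math


def roundness(area, perimeter):
    """A_s / A_c, where A_c is the area of a circle with the same perimeter."""
    radius = perimeter / (2 * math.pi)     # circle whose circumference equals the perimeter
    circle_area = math.pi * radius ** 2
    return area / circle_area


def eccentricity(a, b, c):
    """sqrt(1 - a*b / c^2); c is the longest semi-principal axis, a and b the other two."""
    return math.sqrt(1 - (a * b) / c ** 2)


print(roundness(1.0, 4.0))   # unit square: pi / 4, roughly 0.785
```

A perfect circle gives roundness 1 and a sphere (a = b = c) gives eccentricity 0, which makes both handy sanity checks.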
Edge Enhancing Index
Parameters: Normalizing factor \( \eta \) (default: 5)
The edge-enhancing index of an image I is defined as:
  • E(I) = \( \Big(\frac{\lambda_{1}-\lambda_{2}}{\lambda_{1}+\lambda_{2}+\eta}\Big)^2 \)
where \( \lambda_{1} \) and \( \lambda_{2} \) (\( \lambda_{1}>\lambda_{2} \)) are the eigenvalues of the diffusion tensor matrix of image I and \( \eta \) is a normalizing factor. Available only for 2D images.
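Given a 2×2 symmetric tensor, the index follows from its two eigenvalues. How CaPTk builds the tensor from the image is not described here, so this minimal sketch takes the tensor as input; `edge_enhancing_index` is a hypothetical name.

```python
import numpy as np


def edge_enhancing_index(tensor, eta=5.0):
    """E = ((l1 - l2) / (l1 + l2 + eta))^2 for the eigenvalues l1 > l2 of a 2x2 tensor."""
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    lam2, lam1 = np.linalg.eigvalsh(np.asarray(tensor, dtype=float))
    return float(((lam1 - lam2) / (lam1 + lam2 + eta)) ** 2)


print(edge_enhancing_index([[10.0, 0.0], [0.0, 0.0]]))   # (10 / 15)^2, roughly 0.444
```

An isotropic tensor (equal eigenvalues) gives 0, while a strongly oriented one pushes the index toward 1.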
Local Binary Pattern (LBP)
Parameters: Radius (range: 1, default: 1); Neighborhood (range: 2:4:8, default: 8)
The pixel-wise LBP codes are computed using N neighbors on a circle of radius R around each pixel, with a rotation-invariant implementation. The output value is the mean of the LBP map.
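A minimal sketch of a rotation-invariant LBP for the defaults R = 1 and N = 8, returning the mean of the code map. For simplicity it samples the eight pixels of the 3×3 ring directly instead of interpolating points on a circle, and it only codes interior pixels. Both are simplifications, not necessarily what CaPTk does. `lbp_mean` is a hypothetical name.

```python
import numpy as np


def lbp_mean(image):
    """Mean of a rotation-invariant LBP map with R = 1 and N = 8 (3x3 ring)."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # the 8 neighbours of the 3x3 ring, listed in circular order
    ring = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(ring):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit

    def min_rotation(code):
        # rotation invariance: smallest value over all circular shifts of the 8-bit code
        return min(((code >> k) | (code << (8 - k))) & 0xFF for k in range(8))

    ri_codes = np.vectorize(min_rotation, otypes=[np.int64])(codes)
    return float(ri_codes.mean())
```

Because every code is reduced to its smallest circular shift, a single bright neighbour yields the same value (1) whichever side it sits on.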
Grey Level Co-occurrence Matrix (GLCM)
  • Energy (Angular Second Moment)
  • Contrast (Inertia)
  • Joint Entropy
  • Homogeneity (Inverse Difference Moment)
  • Correlation
  • Variance
  • Sum Average
  • AutoCorrelation
Parameter names: Bins, Radius, Dimensions, Offset, Axis
Ranges: N.A.; N.A.; 2D:3D; Individual/Average/Combined; x,y,z
Defaults: 10; 13; 2; 3D; Average; z
For a given image, a Grey Level Co-occurrence Matrix is created, and \( g(i,j) \) represents an element of the matrix.
  • Energy = \( \sum_{i,j}g(i, j)^2 \)
  • Contrast = \( \sum_{i,j}(i - j)^2g(i, j) \)
  • Joint Entropy = \( -\sum_{i,j}g(i, j) \log_2 g(i, j) \)
  • Homogeneity = \( \sum_{i,j}\frac{1}{1 + (i - j)^2}g(i, j) \)
  • Correlation = \( \sum_{i,j}\frac{(i - \mu)(j - \mu)g(i, j)}{\sigma^2} \)
  • Sum Average = \( \sum_{i,j}i \cdot g(i, j) = \sum_{i,j}j \cdot g(i, j)\)(due to matrix symmetry)
  • Variance = \( \sum_{i,j}(i - \mu)^2 \cdot g(i, j) = \sum_{i,j}(j - \mu)^2 \cdot g(i, j)\) (due to matrix symmetry)
  • AutoCorrelation = \( \frac{\sum_{i,j}(i \cdot j)\, g(i, j)-\mu_t^2}{\sigma_t^2} \), where \( \mu_t \) and \( \sigma_t \) are the mean and standard deviation of the row (or column, due to symmetry) sums.
All features are estimated within the ROI of an image, considering 26-connected neighboring voxels in the 3D volume. The GLCM and its corresponding features are computed for all offsets using an existing ITK filter. The Offset option controls how offsets are handled: Individual reports features for each individual offset; Average computes the features for each offset and averages them, giving a single value per feature; Combined merges the GLCM matrices generated across offsets and calculates a single set of features from the merged matrix.
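To make the GLCM formulas concrete, here is a minimal 2D sketch in NumPy. It is not CaPTk's ITK-based implementation. It builds a symmetric, normalized GLCM for a single offset and evaluates several of the features above. The helper names are hypothetical, the image is assumed to be already discretized into integer bins starting at 0, and the whole image is treated as the ROI.

```python
import numpy as np


def glcm_2d(q, offset=(0, 1), levels=None):
    """Symmetric, normalised GLCM of a quantised 2D image for one (dy, dx) offset."""
    q = np.asarray(q, dtype=np.int64)
    levels = int(q.max()) + 1 if levels is None else levels
    dy, dx = offset
    h, w = q.shape
    # pair every pixel with its neighbour at (dy, dx), keeping both inside the image
    a = q[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    b = q[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    m = np.zeros((levels, levels), dtype=float)
    np.add.at(m, (a.ravel(), b.ravel()), 1)
    m = m + m.T                      # count both directions -> symmetric matrix
    return m / m.sum()


def glcm_features(g):
    i, j = np.indices(g.shape)
    mu = np.sum(i * g)                       # equals sum(j * g) by symmetry
    var = np.sum((i - mu) ** 2 * g)
    nz = g > 0                               # skip zero entries in the entropy log
    return {
        "Energy": np.sum(g ** 2),
        "Contrast": np.sum((i - j) ** 2 * g),
        "JointEntropy": -np.sum(g[nz] * np.log2(g[nz])),
        "Homogeneity": np.sum(g / (1.0 + (i - j) ** 2)),
        # correlation is undefined when the variance is zero (e.g. a constant image)
        "Correlation": float("nan") if var == 0 else np.sum((i - mu) * (j - mu) * g) / var,
        "SumAverage": mu,
        "Variance": var,
    }


print(glcm_features(glcm_2d([[0, 1], [0, 1]]))["Contrast"])   # 1.0
```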
Grey Level Run-Length Matrix (GLRLM)
  • SRE
  • LRE
  • GLN
  • RLN
  • LGRE
  • HGRE
  • SRLGE
  • SRHGE
  • LRLGE
  • LRHGE
Parameter names: Bins, Radius, Dimensions, Axis, Offset
Ranges: N.A.; N.A.; 2D:3D; x,y,z; Individual/Average/Combined
Defaults: 10; 13; 2; 3D; z; Average; 1
For a given image, a run-length matrix \( p(i,j) \) is defined as the number of runs with pixels of grey level i and run length j. In the formulas below, \( n_r \) is the total number of runs, M is the number of grey levels, N is the maximum run length, and \( p_g(i) = \sum_{j}^{N} p(i,j) \) is the number of runs with grey level i. Please note that some features are only extracted in DebugMode (by using the "-d" parameter from the command line); these are features that are mathematically formulated in previously published material but are not completely aligned with the Image Biomarker Standardisation Initiative.
  • [COMPLETE MODE] Short Run Emphasis (SRE) = \( \frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}\frac{p(i,j)}{j^2} \)
  • [COMPLETE MODE] Long Run Emphasis (LRE) = \( \frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}p(i,j) \cdot j^2 \)
  • [COMPLETE MODE] Grey Level Non-uniformity (GLN) = \( \frac{1}{n_r}\sum_{i}^{M}\Big(\sum_{j}^{N}p(i,j) \Big)^2 \)
  • [COMPLETE MODE] Run Length Non-uniformity (RLN) = \( \frac{1}{n_r}\sum_{j}^{N}\Big(\sum_{i}^{M}p(i,j) \Big)^2 \)
  • Low Grey-Level Run Emphasis (LGRE)= \( \frac{1}{n_r}\sum_{i}^{M}\frac{p_g(i)}{i^2} \)
  • High Grey-Level Run Emphasis (HGRE)= \( \frac{1}{n_r}\sum_{i}^{M}p_g(i) \cdot i^2 \)
  • Short Run Low Grey-Level Emphasis (SRLGE)= \(\frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}\frac{p(i,j)}{i^2 \cdot j^2} \)
  • Short Run High Grey-Level Emphasis (SRHGE) = \( \frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}\frac{p(i,j) \cdot i^2 }{j^2}\)
  • [COMPLETE MODE] Long Run Low Grey-Level Emphasis (LRLGE) = \( \frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}\frac{p(i,j) \cdot j^2 }{i^2} \)
  • [COMPLETE MODE] Long Run High Grey-Level Emphasis (LRHGE) = \( \frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}p(i,j) \cdot i^2 \cdot j^2 \)
All features are estimated within the ROI of an image, considering 26-connected neighboring voxels in the 3D volume. The GLRLM and its corresponding features are computed for all offsets using an existing ITK filter. The Offset option controls how offsets are handled: Individual reports features for each individual offset; Average computes the features for each offset and averages them, giving a single value per feature; Combined merges the GLRLM matrices generated across offsets and calculates a single set of features from the merged matrix.
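A minimal 2D sketch of the run-length matrix and a few of its features, assuming horizontal runs only (a single direction) over an already-quantized image treated entirely as the ROI. Bin values 0..M-1 are mapped to grey levels 1..M so that the \( 1/i^2 \) terms are defined; that mapping is an assumption of this sketch. The function names are hypothetical and this is not the ITK filter CaPTk uses.

```python
import numpy as np


def glrlm_horizontal(q, levels=None):
    """p[i, j-1] = number of horizontal runs of bin value i with length j."""
    q = np.asarray(q, dtype=np.int64)
    levels = int(q.max()) + 1 if levels is None else levels
    p = np.zeros((levels, q.shape[1]), dtype=float)
    for row in q:
        run_value, run_len = row[0], 1
        for v in row[1:]:
            if v == run_value:
                run_len += 1
            else:
                p[run_value, run_len - 1] += 1     # close the current run
                run_value, run_len = v, 1
        p[run_value, run_len - 1] += 1             # close the last run in the row
    return p


def glrlm_features(p):
    n_r = p.sum()                                  # total number of runs
    i = np.arange(1, p.shape[0] + 1)               # grey levels 1..M
    j = np.arange(1, p.shape[1] + 1)[None, :]      # run lengths 1..N
    p_g = p.sum(axis=1)                            # runs per grey level
    return {
        "SRE": np.sum(p / j ** 2) / n_r,
        "LRE": np.sum(p * j ** 2) / n_r,
        "GLN": np.sum(p_g ** 2) / n_r,
        "RLN": np.sum(p.sum(axis=0) ** 2) / n_r,
        "LGRE": np.sum(p_g / i ** 2) / n_r,
        "HGRE": np.sum(p_g * i ** 2) / n_r,
    }


p = glrlm_horizontal([[1, 1, 2], [2, 2, 2]])
print(glrlm_features(p)["LRE"])   # (4 + 1 + 9) / 3, roughly 4.667
```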
Neighborhood Grey-Tone Difference Matrix (NGTDM)
  • Coarseness
  • Contrast
  • Busyness
  • Complexity
  • Strength
Parameter names: Bins, Dimensions, Axis
Ranges: N.A.; 2D:3D; x,y,z
Defaults: 10; 13; 3D; N.A.; 1
  • Coarseness = \( \Big[ \epsilon + \sum_{i=0}^{G_{k}} p_{i}s(i) \Big]^{-1} \)
  • Contrast = \( \Big[\frac{1}{N_{s}(N_{s}-1)}\sum_{i}^{G_{k}}\sum_{j}^{G_{k}}p_{i}p_{j}(i-j)^2\Big]\Big[\frac{1}{n^2}\sum_{i}^{G_{k}}s(i)\Big] \)
  • Busyness = \( \Big[\sum_{i}^{G_{k}}p_{i}s(i)\Big]\Big/ \Big[\sum_{i}^{G_{k}}\sum_{j}^{G_{k}}\big|i p_{i} - j p_{j}\big|\Big] \)
  • Complexity = \( \sum_{i}^{G_{k}}\sum_{j}^{G_{k}} \Big[ \frac{(|i-j|)}{(n^{2}(p_{i}+p_{j}))} \Big] \Big[ p_{i}s(i)+p_{j}s(j) \Big]\)
  • Strength = \( \Big[\sum_{i}^{G_{k}}\sum_{j}^{G_{k}}(p_{i}+p_{j})(i-j)^{2}\Big]/\Big[\epsilon + \sum_{i}^{G_{k}} s(i)\Big]\)
where \( p_{i} \) is the probability of occurrence of a voxel of intensity i and \( s(i) \) is the NGTDM value of intensity i, calculated as \( s(i) = \sum |i - A_{i}| \) over the voxels of intensity i, where \( A_{i} \) is the average intensity of the surrounding voxels, excluding the central voxel.
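The following is a minimal 2D sketch of \( s(i) \), \( p_{i} \) and coarseness. It assumes an already-quantized image, an 8-pixel neighborhood, interior pixels only, and a hypothetical \( \epsilon = 10^{-6} \). CaPTk's neighborhood size, border handling and \( \epsilon \) may differ, and the function names are hypothetical.

```python
import numpy as np


def ngtdm_2d(q, levels=None):
    """Return (s, p): NGTDM values s(i) and occurrence probabilities p_i."""
    q = np.asarray(q, dtype=float)
    h, w = q.shape
    levels = int(q.max()) + 1 if levels is None else levels
    center = q[1:-1, 1:-1]                       # interior pixels only
    # sum of the 3x3 block around each interior pixel, minus the centre itself
    block_sum = sum(q[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    a = (block_sum - center) / 8.0               # A_i: mean of the 8 neighbours
    idx = center.astype(np.int64).ravel()
    s = np.zeros(levels)
    counts = np.zeros(levels)
    np.add.at(s, idx, np.abs(center - a).ravel())   # s(i) = sum |i - A_i|
    np.add.at(counts, idx, 1)
    return s, counts / counts.sum()


def ngtdm_coarseness(s, p, eps=1e-6):
    return 1.0 / (eps + np.sum(p * s))
```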
Grey Level Size-Zone Matrix (GLSZM)
  • SZE
  • LZE
  • GLN
  • ZSN
  • ZP
  • LGZE
  • HGZE
  • SZLGE
  • SZHGE
  • LZLGE
  • LZHGE
  • GLV
  • ZLV
Parameter names: Bins, Radius, Dimensions, Axis
Ranges: N.A.; N.A.; 2D:3D; x,y,z
Defaults: 10; 13; 2; 3D; z; 4
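For a given image, a size-zone matrix \( p(i,j) \) is defined as the number of zones (connected regions of equal grey level) with grey level i and size j. In the formulas below, \( n_r \) is the total number of zones, M is the number of grey levels, N is the largest zone size, and \( p_g(i) = \sum_{j}^{N} p(i,j) \) is the number of zones with grey level i.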
  • Small Zone Emphasis (SZE) = \( \frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}\frac{p(i,j)}{j^2} \)
  • Large Zone Emphasis (LZE) = \( \frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}p(i,j) \cdot j^2 \)
  • Grey-Level Non-uniformity (GLN) = \( \frac{1}{n_r}\sum_{i}^{M}\Big(\sum_{j}^{N}p(i,j) \Big)^2 \)
  • Zone-Size Non-uniformity (ZSN) = \( \frac{1}{n_r}\sum_{j}^{N}\Big(\sum_{i}^{M}p(i,j) \Big)^2 \)
  • Zone Percentage (ZP) = \( \frac{n_{r}}{n_p} \), where \( n_r \) is the total number of zones and \( n_p \) is the number of pixels in the image.
  • Low Grey-Level Zone Emphasis (LGZE) = \( \frac{1}{n_r}\sum_{i}^{M}\frac{p_g(i)}{i^2} \)
  • High Grey-Level Zone Emphasis (HGZE) = \( \frac{1}{n_r}\sum_{i}^{M}p_g(i) \cdot i^2 \)
  • Small Zone Low Grey-Level Emphasis (SZLGE) = \( \frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}\frac{p(i,j)}{i^2 \cdot j^2} \)
  • Small Zone High Grey-Level Emphasis (SZHGE) = \( \frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}\frac{p(i,j) \cdot i^2 }{j^2}\)
  • Large Zone Low Grey-Level Emphasis (LZLGE) = \( \frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}\frac{p(i,j) \cdot j^2 }{i^2} \)
  • Large Zone High Grey-Level Emphasis (LZHGE) = \( \frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}p(i,j) \cdot i^2 \cdot j^2 \)
All features are estimated within the ROI in an image, considering 26-connected neighboring voxels in the 3D volume.
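In 2D, the analogue of 26-connectivity is 8-connectivity. The minimal sketch below labels zones with a breadth-first flood fill and computes a few of the features above. It assumes an already-quantized image treated entirely as the ROI; the function names are hypothetical and this is not CaPTk's implementation.

```python
from collections import deque

import numpy as np


def glszm_2d(q, levels=None):
    """p[i, j-1] = number of 8-connected zones of bin value i with size j."""
    q = np.asarray(q, dtype=np.int64)
    h, w = q.shape
    levels = int(q.max()) + 1 if levels is None else levels
    seen = np.zeros((h, w), dtype=bool)
    p = np.zeros((levels, h * w), dtype=float)
    for y in range(h):
        for x in range(w):
            if seen[y, x]:
                continue
            level, size = q[y, x], 0
            queue = deque([(y, x)])
            seen[y, x] = True
            while queue:                              # flood-fill one zone
                cy, cx = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and not seen[ny, nx] and q[ny, nx] == level:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            p[level, size - 1] += 1
    return p


def glszm_features(p, n_pixels):
    n_z = p.sum()                                     # total number of zones
    j = np.arange(1, p.shape[1] + 1)[None, :]         # zone sizes 1..N
    return {
        "SZE": np.sum(p / j ** 2) / n_z,
        "LZE": np.sum(p * j ** 2) / n_z,
        "GLN": np.sum(p.sum(axis=1) ** 2) / n_z,
        "ZSN": np.sum(p.sum(axis=0) ** 2) / n_z,
        "ZP": n_z / n_pixels,
    }
```

With 8-connectivity, the checkerboard `[[0, 1], [1, 0]]` forms two diagonal zones of size 2. With 4-connectivity it would form four zones of size 1.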
Gabor Wavelets
  • Mean
  • Standard deviation
  • Variance
  • Maximum
  • Sum
Parameter names: Radius, Direction, Level, Gamma, Fmax
Ranges and defaults: N.A.; 1; N.A.; 4; \( \sqrt{2} \); 0.25
For a given image, Gabor filters are created for the specified number of directions, with the maximum sinusoid frequency determined by \( F_{max} \). The \( \gamma \) parameter determines the spatial aspect ratio, and the levels parameter determines the number of levels of wavelet decomposition. Features are extracted from each resulting image, one per direction and level.
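A minimal sketch of a Gabor filter bank and the per-response statistics. Several details here are assumptions, not taken from CaPTk:
  • only the real (even) part of the filter is used;
  • the envelope width is \( \sigma = 0.56/f \), the usual one-octave-bandwidth rule;
  • each level lowers the frequency by a factor of \( \sqrt{2} \) from \( F_{max} \);
  • directions are spread evenly over \( \pi \);
  • convolution is done via zero-padded FFT.
All function names are hypothetical.

```python
import numpy as np


def gabor_kernel(freq, theta, gamma):
    """Real part of a 2D Gabor filter; gamma is the spatial aspect ratio."""
    sigma = 0.56 / freq                        # assumed one-octave bandwidth rule
    half = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)     # rotate into the filter direction
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * freq * xr)


def convolve_same(image, kernel):
    """Zero-padded FFT convolution, cropped to the input size (odd-sized kernel)."""
    h, w = image.shape
    kh, kw = kernel.shape
    shape = (h + kh - 1, w + kw - 1)
    full = np.real(np.fft.ifft2(np.fft.fft2(image, shape) * np.fft.fft2(kernel, shape)))
    return full[kh // 2:kh // 2 + h, kw // 2:kw // 2 + w]


def gabor_features(image, directions, levels, gamma=np.sqrt(2), f_max=0.25):
    img = np.asarray(image, dtype=float)
    feats = {}
    for level in range(levels):
        freq = f_max / np.sqrt(2) ** level         # assumed frequency spacing per level
        for d in range(directions):
            theta = np.pi * d / directions
            r = convolve_same(img, gabor_kernel(freq, theta, gamma))
            feats[(level, d)] = {"Mean": r.mean(), "StandardDeviation": r.std(),
                                 "Variance": r.var(), "Maximum": r.max(), "Sum": r.sum()}
    return feats
```

The result holds one feature set per (level, direction) pair, mirroring the description above. For example, `gabor_features(img, directions=4, levels=2)` returns 8 entries.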
Power Spectrum
  • Beta
Parameters: Center (range: N.A., default: center of image)

A discrete Fourier transform is applied to each image to decompose its information into signals of varying frequencies, and the power spectrum is computed for each signal. The slope of a line fitted to the relation between spatial frequency and average power is then calculated.
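A minimal sketch of the slope computation, under stated assumptions: power is averaged over integer-radius rings around the spectrum center (zero frequency after `fftshift`), and the line is fitted in log-log space, as is common for power-law spectra. Neither the ring averaging nor the log-log fit is confirmed by the text above, and `power_spectrum_beta` is a hypothetical name.

```python
import numpy as np


def power_spectrum_beta(image):
    """Slope of a log-log line fit to the radially averaged power spectrum."""
    img = np.asarray(image, dtype=float)
    img = img - img.mean()                                   # drop the DC component
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = power.shape
    cy, cx = h // 2, w // 2                                  # zero frequency after fftshift
    yy, xx = np.indices(power.shape)
    radius = np.rint(np.hypot(yy - cy, xx - cx)).astype(int)  # integer frequency rings
    freqs = np.arange(1, min(cy, cx))                        # skip DC, stay inside Nyquist
    avg_power = np.array([power[radius == f].mean() for f in freqs])
    slope, _intercept = np.polyfit(np.log(freqs), np.log(avg_power), 1)
    return float(slope)
```

A flat (white-noise) spectrum gives a slope near 0. Images dominated by low frequencies give negative slopes.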

Lattice-based
  • Selected features
  • Feature Maps
Parameters:
  • FullImage (range: 0-1, default: 0)
  • Window (range: (mm) 0:ImageSize, default: 6.3)
  • Step (range: (mm) 0:ImageSize, default: 6.3)
  • Boundary (range: NoPadding:ZeroPadding, default: NoPadding)
  • PatchBoundary (range: Full:ROI:None, default: Full)
When activated, this option performs feature calculation on multiple local square (for 2D images) or cubic (for 3D images) regions defined by a lattice virtually overlaid on the image.
Please see ${CaPTk_InstallDir}/data/features/2_params_default_lattice.csv for detailed descriptions.

A detailed explanation of the mathematics can be found on the Image Biomarker Standardization Initiative (IBSI) page.

The parameterization of the lattice-based strategy for feature extraction is defined by:

  • The grid spacing representing the distance between consecutive lattice points (Default: 6.3mm).
  • The size of the local region centered at each lattice point (Default: 6.3mm).
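The two lattice parameters can be illustrated with a minimal 2D sketch that places lattice points every `step_mm` and cuts a square window around each one. It assumes isotropic pixel spacing in mm and interprets NoPadding as "skip windows that would cross the image border". That interpretation, the rounding of millimetres to whole pixels, and the name `lattice_patches` are all assumptions.

```python
import numpy as np


def lattice_patches(image, spacing_mm, step_mm=6.3, window_mm=6.3):
    """Return [((row, col), patch), ...] for square windows centred on lattice points."""
    img = np.asarray(image)
    step = max(1, int(round(step_mm / spacing_mm)))        # grid spacing in pixels
    half = max(0, int(round(window_mm / spacing_mm / 2)))  # half window size in pixels
    h, w = img.shape
    patches = []
    # assumed NoPadding: only lattice points whose whole window fits inside the image
    for cy in range(half, h - half, step):
        for cx in range(half, w - half, step):
            patch = img[cy - half:cy + half + 1, cx - half:cx + half + 1]
            patches.append(((cy, cx), patch))
    return patches
```

With 1 mm pixels, the defaults round to a 6-pixel step and a 7×7 window, so a 20×20 image yields a 3×3 lattice of patches. Each local feature is then computed per patch.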

For documentation related to COLLAGE features, please see https://collageradiomics.readthedocs.io/en/latest/#