Cancer Imaging Phenomics Toolkit (CaPTk)  1.8.1
Feature Extraction

# Pre-Existing Parameter Configurations

The Feature Extraction module ships with a set of pre-configured parameter files for 3D and 2D images, which follow different extraction paradigms. These configurations give users a template to start from, which they can then customize for their own dataset(s).

During invocation from the CLI, the appropriate file should be passed after the -p parameter.
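As a minimal sketch of that invocation, the helper below assembles an argument list around the `-p` parameter. Only `-p` and the `-d` debug flag (mentioned in the GLRLM description) come from this documentation. The executable name `FeatureExtraction`, the file name `my_custom_params.csv`, and the helper itself are assumptions for illustration; any further arguments your installation requires would be passed through `extra_args`.

```python
import shlex


def build_feature_extraction_command(param_file, extra_args=(), debug=False,
                                     executable="FeatureExtraction"):
    """Assemble a command line; only -p and -d are taken from the documentation.

    The executable name is an assumption and may differ on your system.
    """
    command = [executable, "-p", str(param_file)]
    if debug:
        command.append("-d")  # DebugMode flag described in the GLRLM section
    # Caller-supplied arguments are appended unchanged.
    command.extend(str(arg) for arg in extra_args)
    return command


command = build_feature_extraction_command("my_custom_params.csv", debug=True)
print(shlex.join(command))
```

Returning a list rather than a string keeps the command safe to hand to `subprocess.run` without shell quoting issues.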

See the subsequent sections for details.

## Default 3D Image Feature Extraction Parameters

This configuration primarily targets feature extraction from brain tumor images; users should customize it for their own datasets.

## Customizing Parameters

The user will need to pass a custom parameter file to Feature Extraction. For details on creating such a file, please see https://github.com/CBICA/CaPTk/tree/master/src/applications/FeatureExtraction#the-parameter-file.

An example of the list file for batch processing can be found in {CaPTk_InstallDir}/share/featureExtractionBatch/batch_featureExtraction.csv or https://github.com/CBICA/CaPTk/blob/master/src/applications/FeatureExtraction/data/batchMode/batch_featureExtraction.csv.

# Extracted Features

Each feature family below lists its specific features, its parameter names, ranges and defaults (entries separated by semicolons), and its description, formulas and comments.

## Intensity Features (First-Order Statistics)

• Specific Features: Minimum; Maximum; Mean; Standard Deviation; Variance; Skewness; Kurtosis
• Parameter Name: N.A.
• Range: N.A.
• Default: N.A.

Formulas:

• Minimum Intensity = $$\min_{k}(I_{k})$$, where $$I_{k}$$ is the intensity of the pixel or voxel at index k.
• Maximum Intensity = $$\max_{k}(I_{k})$$, where $$I_{k}$$ is the intensity of the pixel or voxel at index k.
• Mean = $$\frac{\sum_{i}X_{i}}{N}$$, where N is the number of voxels/pixels.
• Standard Deviation = $$\sqrt{\frac{\sum(X-\mu)^{2}}{N}}$$, where $$\mu$$ is the mean of the data.
• Variance = $$\frac{\sum(X-\mu)^{2}}{N}$$, where $$\mu$$ is the mean intensity.
• Skewness = $$\frac{\sum_{i=1}^{N}(X_{i} - \bar{X})^{3}/N}{s^{3}}$$, where $$\bar{X}$$ is the mean, s is the standard deviation and N is the number of pixels/voxels.
• Kurtosis = $$\frac{\sum_{i=1}^{N}(X_{i} - \bar{X})^{4}/N}{s^{4}}$$, where $$\bar{X}$$ is the mean, s is the standard deviation and N is the number of pixels/voxels.

All features in this family are extracted from the raw intensities.

## Histogram-based

• Specific Features: Bin Frequency
• Parameter Name: Bins
• Range: N.A.
• Default: 10

Takes the number of bins as input and outputs the number of pixels in each bin. All features in this family are extracted from the discretized intensities.

## Volumetric

• Specific Features: Volume/Area
• Parameter Name: Dimensions; Axis
• Range: 2D:3D; x,y,z
• Default: 3D; z

Volume/Area (depending on image dimension) and number of voxels/pixels in the ROI.

## Morphologic

• Specific Features: Elongation; Perimeter; Roundness; Eccentricity; Ellipse Diameter; Equivalent Spherical Radius
• Parameter Name: Dimensions; Axis
• Range: 2D:3D; x,y,z
• Default: 3D; z

Formulas:

• Elongation = $$\sqrt{\frac{i_{2}}{i_{1}}}$$, where $$i_{n}$$ are the second moments of the particle around its principal axes.
• Perimeter = $$2 \pi r$$, where r is the radius of the circle enclosing the shape.
• Roundness = $$A_{s}/A_{c} = \text{(area of the shape)}/\text{(area of a circle)}$$, where the circle has the same perimeter as the shape.
• Eccentricity = $$\sqrt{1 - \frac{a \cdot b}{c^{2}}}$$, where c is the longest semi-principal axis of an ellipsoid fitted on an ROI, and a and b are the 2nd and 3rd longest semi-principal axes of the ellipsoid.

## Edge Enhancing Index

• Parameter Name: Normalizing factor $$\eta$$
• Default: 5

The edge-enhancing index of an image I is defined as $$E(I) = \Big(\frac{\lambda_{1}-\lambda_{2}}{\lambda_{1}+\lambda_{2}+\eta}\Big)^{2}$$, where $$\lambda_{1}$$ and $$\lambda_{2}$$ (with $$\lambda_{1}>\lambda_{2}$$) are the eigenvalues of the diffusion tensor matrix of image I and $$\eta$$ is a normalizing factor. Available only in 2D.

## Local Binary Pattern (LBP)

• Parameter Name: Radius; Neighborhood
• Range: 1; 2:4:8
• Default: 1; 8

The pixel-wise LBP codes are computed using N neighbors on a circle of radius R around each pixel, with a rotation-invariant implementation. The output value corresponds to the mean of the LBP map.

## Grey Level Co-occurrence Matrix (GLCM)

• Specific Features: Energy (Angular Second Moment); Contrast (Inertia); Joint Entropy; Homogeneity (Inverse Difference Moment); Correlation; Variance; SumAverage; Auto Correlation
• Parameter Name: Bins; Radius; Dimensions; Offset; Axis
• Range: N.A.; N.A.; 2D:3D; Individual/Average/Combined; x,y,z
• Default: 10; 13; 2; 3D; Average; z

For a given image, a Grey Level Co-occurrence Matrix is created, and $$g(i,j)$$ represents an element of this matrix.

• Energy = $$\sum_{i,j}g(i, j)^2$$
• Contrast = $$\sum_{i,j}(i - j)^2g(i, j)$$
• Joint Entropy = $$-\sum_{i,j}g(i, j) \log_2 g(i, j)$$
• Homogeneity = $$\sum_{i,j}\frac{1}{1 + (i - j)^2}g(i, j)$$
• Correlation = $$\sum_{i,j}\frac{(i - \mu)(j - \mu)g(i, j)}{\sigma^2}$$
• Sum Average = $$\sum_{i,j}i \cdot g(i, j) = \sum_{i,j}j \cdot g(i, j)$$ (due to matrix symmetry)
• Variance = $$\sum_{i,j}(i - \mu)^2 \cdot g(i, j) = \sum_{i,j}(j - \mu)^2 \cdot g(i, j)$$ (due to matrix symmetry)
• AutoCorrelation = $$\frac{\sum_{i,j}(i \cdot j) g(i, j)-\mu_t^2}{\sigma_t^2}$$, where $$\mu_t$$ and $$\sigma_t$$ are the mean and standard deviation of the row (or column, due to symmetry) sums.

All features are estimated within the ROI in an image, considering 26-connected neighboring voxels in the 3D volume. The GLCM and the features above are calculated for all offsets using an existing ITK filter. The Individual option gives features for each individual offset; Average computes the average across all offsets and assigns a single value to each feature; Combined merges the GLCM matrices generated across offsets and calculates a single set of features from the merged matrix.

## Grey Level Run-Length Matrix (GLRLM)

• Specific Features: SRE; LRE; GLN; RLN; LGRE; HGRE; SRLGE; SRHGE; LRLGE; LRHGE
• Parameter Name: Bins; Radius; Dimensions; Axis; Offset
• Range: N.A.; N.A.; 2D:3D; x,y,z; Individual/Average/Combined
• Default: 10; 13; 2; 3D; z; Average; 1

For a given image, a run-length matrix $$p(i, j)$$ is defined as the number of runs with pixels of gray level i and run length j; $$n_r$$ is the total number of runs and $$p_g(i) = \sum_{j} p(i,j)$$. Please note that some features are only extracted in DebugMode (by using the "-d" parameter from the command line); these are features that are mathematically formulated in previously published material but are not completely aligned with the Image Biomarker Standardisation Initiative.

• [COMPLETE MODE] Short Run Emphasis (SRE) = $$\frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}\frac{p(i,j)}{j^2}$$
• [COMPLETE MODE] Long Run Emphasis (LRE) = $$\frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}p(i,j) \cdot j^2$$
• [COMPLETE MODE] Grey Level Non-uniformity (GLN) = $$\frac{1}{n_r}\sum_{i}^{M}\Big(\sum_{j}^{N}p(i,j) \Big)^2$$
• [COMPLETE MODE] Run Length Non-uniformity (RLN) = $$\frac{1}{n_r}\sum_{j}^{N}\Big(\sum_{i}^{M}p(i,j) \Big)^2$$
• Low Grey-Level Run Emphasis (LGRE) = $$\frac{1}{n_r}\sum_{i}^{M}\frac{p_g(i)}{i^2}$$
• High Grey-Level Run Emphasis (HGRE) = $$\frac{1}{n_r}\sum_{i}^{M}p_g(i) \cdot i^2$$
• Short Run Low Grey-Level Emphasis (SRLGE) = $$\frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}\frac{p(i,j)}{i^2 \cdot j^2}$$
• Short Run High Grey-Level Emphasis (SRHGE) = $$\frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}\frac{p(i,j) \cdot i^2 }{j^2}$$
• [COMPLETE MODE] Long Run Low Grey-Level Emphasis (LRLGE) = $$\frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}\frac{p(i,j) \cdot j^2 }{i^2}$$
• [COMPLETE MODE] Long Run High Grey-Level Emphasis (LRHGE) = $$\frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}p(i,j) \cdot i^2 \cdot j^2$$

All features are estimated within the ROI in an image, considering 26-connected neighboring voxels in the 3D volume. The GLRLM and the features above are calculated for all offsets using an existing ITK filter. The Individual option gives features for each individual offset; Average computes the average across all offsets and assigns a single value to each feature; Combined merges the GLRLM matrices generated across offsets and calculates a single set of features from the merged matrix.

## Neighborhood Grey-Tone Difference Matrix (NGTDM)

• Specific Features: Coarseness; Contrast; Busyness; Complexity; Strength
• Parameter Name: Bins; Dimensions; Axis
• Range: N.A.; 2D:3D; x,y,z
• Default: 10; 13; 3D; N.A.; 1

Formulas:

• Coarseness = $$\Big[ \epsilon + \sum_{i=0}^{G_{k}} p_{i}s(i) \Big]^{-1}$$
• Contrast = $$\Big[\frac{1}{N_{s}(N_{s}-1)}\sum_{i}^{G_{k}}\sum_{j}^{G_{k}}p_{i}p_{j}(i-j)^2\Big]\Big[\frac{1}{n^2}\sum_{i}^{G_{k}}s(i)\Big]$$
• Busyness = $$\Big[\sum_{i}^{G_{k}}p_{i}s(i)\Big]\Big/ \Big[\sum_{i}^{G_{k}}\sum_{j}^{G_{k}}\lvert i p_{i} - j p_{j}\rvert\Big]$$
• Complexity = $$\sum_{i}^{G_{k}}\sum_{j}^{G_{k}} \Big[ \frac{\lvert i-j \rvert}{n^{2}(p_{i}+p_{j})} \Big] \Big[ p_{i}s(i)+p_{j}s(j) \Big]$$
• Strength = $$\Big[\sum_{i}^{G_{k}}\sum_{j}^{G_{k}}(p_{i}+p_{j})(i-j)^{2}\Big]\Big/\Big[\epsilon + \sum_{i}^{G_{k}} s(i)\Big]$$

Here $$p_{i}$$ is the probability of occurrence of a voxel of intensity i, and $$s(i)$$ is the NGTDM value of intensity i, calculated as $$\sum \lvert i - A_{i} \rvert$$, where $$A_{i}$$ is the average intensity of the surrounding voxels, excluding the central voxel.

## Grey Level Size-Zone Matrix (GLSZM)

• Specific Features: SZE; LZE; GLN; ZSN; ZP; LGZE; HGZE; SZLGE; SZHGE; LZLGE; LZHGE; GLV; ZLV
• Parameter Name: Bins; Radius; Dimensions; Axis
• Range: N.A.; N.A.; 2D:3D; x,y,z
• Default: 10; 13; 2; 3D; z; 4

For a given image, a size-zone matrix $$p(i, j)$$ is defined as the number of connected zones with gray level i and zone size j; $$p_g(i) = \sum_{j} p(i,j)$$.

• Small Zone Emphasis (SZE) = $$\frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}\frac{p(i,j)}{j^2}$$
• Large Zone Emphasis (LZE) = $$\frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}p(i,j) \cdot j^2$$
• Gray-Level Non-uniformity (GLN) = $$\frac{1}{n_r}\sum_{i}^{M}\Big(\sum_{j}^{N}p(i,j) \Big)^2$$
• Zone-Size Non-uniformity (ZSN) = $$\frac{1}{n_r}\sum_{j}^{N}\Big(\sum_{i}^{M}p(i,j) \Big)^2$$
• Zone Percentage (ZP) = $$\frac{n_{r}}{n_p}$$, where $$n_r$$ is the total number of zones and $$n_p$$ is the number of pixels in the image.
• Low Grey-Level Zone Emphasis (LGZE) = $$\frac{1}{n_r}\sum_{i}^{M}\frac{p_g(i)}{i^2}$$
• High Grey-Level Zone Emphasis (HGZE) = $$\frac{1}{n_r}\sum_{i}^{M}p_g(i) \cdot i^2$$
• Small Zone Low Grey-Level Emphasis (SZLGE) = $$\frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}\frac{p(i,j)}{i^2 \cdot j^2}$$
• Small Zone High Grey-Level Emphasis (SZHGE) = $$\frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}\frac{p(i,j) \cdot i^2 }{j^2}$$
• Large Zone Low Grey-Level Emphasis (LZLGE) = $$\frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}\frac{p(i,j) \cdot j^2 }{i^2}$$
• Large Zone High Grey-Level Emphasis (LZHGE) = $$\frac{1}{n_r}\sum_{i}^{M}\sum_{j}^{N}p(i,j) \cdot i^2 \cdot j^2$$

All features are estimated within the ROI in an image, considering 26-connected neighboring voxels in the 3D volume.

## Gabor Wavelets

• Specific Features: Mean; Standard deviation; Variance; Maximum; Sum
• Parameter Name: Radius; Direction; Level; Gamma; Fmax
• Range: N.A.
• Default: 1; N.A.; 4; $$\sqrt{2}$$; 0.25

For a given image, Gabor filters are created for the number of directions specified, with the maximum sinusoid frequency determined by Fmax. The $$\gamma$$ (Gamma) parameter determines the spatial aspect ratio, and the Level parameter determines the number of levels of wavelet decomposition. Features are extracted from each image generated for each direction and level.

## Power Spectrum

• Specific Features: Beta
• Parameter Name: Center
• Range: N.A.
• Default: center of image

A discrete Fourier transform is applied to each image to decompose its information into signals of varying frequencies. The power spectrum is computed for each signal. The relation between spatial frequency and average power is then plotted, and the slope of a line fitted to this relation is calculated.

## Lattice-based

• Specific Features: Selected features; Feature Maps
• Parameter Name: FullImage; Window; Step; Boundary; PatchBoundary
• Range: 0-1; (mm) 0:ImageSize; (mm) 0:ImageSize; NoPadding:ZeroPadding; Full:ROI:None
• Default: 0; 6.3; 6.3; NoPadding; Full

When activated, this option performs feature calculation on multiple local square (for 2D images) or cubic (for 3D images) regions defined by a lattice virtually overlaid on the image. Please see {CaPTk_InstallDir}/data/features/2_params_default_lattice.csv for detailed descriptions.
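The first-order intensity formulas listed under Extracted Features can be sketched in plain Python as follows. This is an illustrative reimplementation of the population (divide-by-N) formulas, not CaPTk's own code; the function name `first_order_stats`, the sample ROI values, and the choice to report 0.0 for skewness and kurtosis of a constant ROI are assumptions.

```python
import math


def first_order_stats(values):
    """Population first-order statistics of a flat list of ROI intensities."""
    n = len(values)
    if n == 0:
        raise ValueError("ROI contains no voxels")
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n  # divides by N, not N-1
    std = math.sqrt(variance)
    if std == 0:
        # Assumption: a constant ROI has undefined skewness/kurtosis; report 0.0.
        skewness = kurtosis = 0.0
    else:
        skewness = sum((x - mean) ** 3 for x in values) / n / std ** 3
        kurtosis = sum((x - mean) ** 4 for x in values) / n / std ** 4
    return {
        "minimum": min(values),
        "maximum": max(values),
        "mean": mean,
        "std": std,
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


roi = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical ROI intensities
stats = first_order_stats(roi)
print(stats["mean"], stats["variance"], stats["skewness"], stats["kurtosis"])
```

Note that the kurtosis here is the plain fourth standardized moment as written in the formula, without subtracting 3.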

A detailed explanation of the mathematics and other details can be found on the Image Biomarker Standardisation Initiative (IBSI) page.
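To make the GLCM formulas listed under Extracted Features more concrete, here is a minimal sketch that builds a symmetric, normalized co-occurrence matrix for a single offset on a small 2D image and evaluates Energy, Contrast, Joint Entropy and Homogeneity. It is not the ITK filter that CaPTk uses: the function names, the single-offset restriction, the absence of ROI masking, and the toy image are all assumptions made for illustration.

```python
import math


def glcm(image, offset, levels):
    """Symmetric, normalized GLCM of a 2D list of grey levels for one offset."""
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1  # count both directions so the matrix is symmetric
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("offset yields no voxel pairs inside the image")
    return [[v / total for v in row] for row in counts]


def glcm_features(g):
    """Evaluate four of the listed GLCM features on a normalized matrix g."""
    energy = contrast = entropy = homogeneity = 0.0
    for i, row in enumerate(g):
        for j, p in enumerate(row):
            energy += p * p
            contrast += (i - j) ** 2 * p
            homogeneity += p / (1 + (i - j) ** 2)
            if p > 0:  # skip empty cells, where p * log2(p) is taken as 0
                entropy -= p * math.log2(p)
    return {"energy": energy, "contrast": contrast,
            "joint_entropy": entropy, "homogeneity": homogeneity}


image = [[0, 1], [0, 1]]  # toy image already discretized into 2 grey levels
features = glcm_features(glcm(image, offset=(0, 1), levels=2))
print(features)
```

In terms of the Offset parameter, this corresponds to one entry of the Individual option; the Average option would repeat the calculation for every offset and average each feature.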

The parameterization of the lattice-based strategy for feature extraction is defined by:

• The grid spacing representing the distance between consecutive lattice points (Default: 6.3mm).
• The size of the local region centered at each lattice point (Default: 6.3mm).
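The two parameters above can be illustrated with a short sketch that enumerates lattice points over an image of a given physical extent. The placement rule is an assumption: the first window is taken to start at the image origin, and only windows lying fully inside the image are kept (one possible reading of NoPadding). CaPTk's actual lattice placement may differ, and the function name `lattice_centers` is hypothetical.

```python
import itertools


def lattice_centers(extent_mm, step_mm=6.3, window_mm=6.3):
    """Return lattice-point coordinates (mm) whose windows fit inside the image.

    extent_mm: physical image size per axis, e.g. (x_mm, y_mm) or (x_mm, y_mm, z_mm).
    step_mm:   grid spacing between consecutive lattice points.
    window_mm: edge length of the square/cubic region centered at each point.
    """
    if step_mm <= 0 or window_mm <= 0:
        raise ValueError("step_mm and window_mm must be positive")
    half = window_mm / 2.0
    tolerance = 1e-9  # guards against floating-point round-off at the border
    per_axis = []
    for length in extent_mm:
        centers = []
        k = 0
        # Keep a center only while its window ends inside the image extent.
        while half + k * step_mm + half <= length + tolerance:
            centers.append(half + k * step_mm)
            k += 1
        per_axis.append(centers)
    # The Cartesian product of per-axis centers gives the full 2D/3D lattice.
    return list(itertools.product(*per_axis))


centers = lattice_centers((20.0, 13.0))  # hypothetical 20 mm x 13 mm 2D image
print(len(centers), centers[0], centers[-1])
```

With the default spacing equal to the window size, neighboring windows touch without overlapping; a smaller step would produce overlapping regions.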