Wednesday, March 23, 2016

My Submission For The Second National Data Science Bowl Competition



This is my post describing the preprocessing, segmentation, and final evaluation pipeline for the Second National Data Science Bowl, a competition hosted on Kaggle. While I placed 38th on the final leaderboard, I think some of the methods I used are interesting enough for a short blog post. The preprocessing could have been greatly improved with more outlier handling, especially given the small dataset (train: 500 patients, validation: 200 patients, test: 440 patients). But my intention in this post is to highlight the way I used hypercolumns, extracted from VGG-16 (a state-of-the-art CNN, essentially 16 weight layers of the popular alternating convolution and max-pooling architecture), together with a recurrent neural network (LSTM), which performed quite well. All work was done using Keras (a deep learning framework).

The problem required segmentation of the left ventricle (LV) blood pool from all the time frames (a breath-hold cardiac cycle consisting of 30 frames) of each slice (short-axis view), obtained via MRI and stored as DICOM images, i.e. data in both the time and spatial domains. Doing this enables calculation of the end-systolic and end-diastolic volumes, which in turn give the ejection fraction, a parameter used by medical professionals to assess heart health. The specifics of the data and problem description are well illustrated on the Kaggle website. The evaluation metric is the Continuous Ranked Probability Score (CRPS) over both end-systolic and end-diastolic volume; a submission consists of 600 cumulative probability values per case, one for each volume threshold from 0 to 599 ml.
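Since CRPS drives every modeling choice below, a minimal sketch of the per-case score may help (the function name and the step-function example are mine):

```python
import numpy as np

def crps(pred_cdf, true_volume):
    """CRPS for one case: mean squared difference between the predicted
    CDF over thresholds 0..599 ml and the step function of the true volume."""
    thresholds = np.arange(600)
    heaviside = (thresholds >= true_volume).astype(float)
    return np.mean((pred_cdf - heaviside) ** 2)

# A perfect step-function prediction scores 0; anything softer scores more.
perfect = (np.arange(600) >= 120).astype(float)
print(crps(perfect, 120.0))   # -> 0.0
```

The final submission score is this quantity averaged over all cases and both phases (systole and diastole).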

At the start of the competition, two tutorial-style guides were also provided. From the Fourier-based tutorial I took a valuable method for locating the LV blood pool centroid: treat each slice as a separate 2D+T signal, apply an FFT along time, keep only the first harmonic, and finally threshold the result to reveal the region of interest; a mask is created from this region and its centroid computed. This process removes all static components of the image, such as the chest cavity. I adapted this method, but applied it only to the middle slice. The working example used the Sunnybrook Cardiac dataset (an external dataset with labeled LV blood pools, allowed per the contest rules). I also leveraged this external dataset while training my DNN.
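The Fourier step described above can be sketched as follows (thresholding at the 95th percentile is my assumption; the tutorial's exact threshold may differ):

```python
import numpy as np

def fourier_roi_centroid(frames, quantile=0.95):
    """Locate the LV blood pool via the first temporal harmonic.

    frames: array of shape (T, H, W) -- one short-axis slice over a
    cardiac cycle.  Static anatomy (chest wall, etc.) lives in the DC
    component; the beating ventricle dominates the first harmonic.
    """
    spectrum = np.fft.fft(frames, axis=0)
    harmonic = np.abs(spectrum[1])               # first-harmonic magnitude image
    mask = harmonic > np.quantile(harmonic, quantile)
    ys, xs = np.nonzero(mask)
    return ys.mean(), xs.mean()                  # centroid of the moving region
```

On real data some morphological clean-up of the thresholded mask is usually needed before taking the centroid.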

Figure 1.  Hilbert Transformed Image

From the 250 labeled Sunnybrook images, I trained a VGG-16 DNN. The DICOM images were converted to 128x128 JPEGs by cropping, with the center taken as the centroid of the labeled mask. Segmentation quality was evaluated with the Dice metric. Owing to the small dataset, training was augmented with random shifts along the x and y axes and rotations of 0 to 10 degrees, generated in real time (note: in this case the response variable, the mask, needs the same transformation as the predictor image, which is why I did not use the default augmentation routines available in Keras).
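A joint augmentation routine along these lines, using `scipy.ndimage` (the shift range and interpolation orders here are my choices, not necessarily the ones used then), might look like:

```python
import numpy as np
from scipy.ndimage import shift, rotate

def augment_pair(image, mask, max_shift=8, max_angle=10, rng=None):
    """Apply one random shift/rotation to an image AND its mask.

    The mask must receive exactly the same transform as the image,
    which is why the stock Keras augmentation (images only) did not fit.
    """
    rng = rng or np.random
    dy, dx = rng.uniform(-max_shift, max_shift, size=2)
    angle = rng.uniform(0, max_angle)
    image = rotate(shift(image, (dy, dx), order=1), angle, reshape=False, order=1)
    mask = rotate(shift(mask, (dy, dx), order=0), angle, reshape=False, order=0)
    return image, (mask > 0.5).astype(mask.dtype)   # keep the mask binary

def pair_generator(images, masks, batch_size=32):
    """Yield freshly augmented (X, y) batches indefinitely."""
    n = len(images)
    while True:
        idx = np.random.randint(0, n, batch_size)
        batch = [augment_pair(images[i], masks[i]) for i in idx]
        X = np.stack([b[0] for b in batch])[:, None]       # (batch, 1, 128, 128)
        y = np.stack([b[1] for b in batch]).reshape(batch_size, -1)
        yield X, y
```

The flattened mask targets match a network whose reshaped output is the segmentation mask, as described later.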



After applying the Fourier-based method to the middle slice of each patient, the images needed converting to a uniform spacing, which was done using the pixel spacing available in the DICOM header, as it varied from patient to patient. The images were then cropped to 128x128 (in hindsight this should have been larger, perhaps 196x196, to ensure large diastolic blood pools stayed within the image boundaries). From these images, an additional 200 samples from the apex slice, 200 from the base slice, and 200 from the middle slice were manually labeled and used to train another, similar VGG-16 model, with weights initialized from the model trained on the Sunnybrook dataset. This final segmentation model, also trained with heavy augmentation, used Adam as the optimizer. I was able to get a final Dice metric of 0.104.
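The spacing normalization and crop can be sketched like so (a center-based crop is shown here for simplicity; in the pipeline the crop was centered on the Fourier centroid):

```python
import numpy as np
from scipy.ndimage import zoom

def normalize_and_crop(image, pixel_spacing, target_spacing=1.0, size=128):
    """Resample to a fixed mm-per-pixel, then crop/pad to size x size.

    pixel_spacing: (row_mm, col_mm) from the DICOM header -- it differs
    between patients, so without this step a pixel would not represent
    the same physical area everywhere.
    """
    factors = (pixel_spacing[0] / target_spacing,
               pixel_spacing[1] / target_spacing)
    image = zoom(image, factors, order=1)
    out = np.zeros((size, size), dtype=image.dtype)
    h, w = image.shape
    cy, cx = h // 2, w // 2                      # crop around the midpoint
    y0, x0 = max(cy - size // 2, 0), max(cx - size // 2, 0)
    patch = image[y0:y0 + size, x0:x0 + size]
    out[:patch.shape[0], :patch.shape[1]] = patch
    return out
```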
Remove Redundant components from image
Crop with centroid center



The output from the segmentation model, when reshaped, gives a mask of the region around the LV blood pool. This mask served as a region proposal. During labeling, the endocardium (red), papillary muscles (blue), and epicardium (green) were all labeled as LV blood pool. To my knowledge, however, the region between the endocardium and the epicardium does not form part of the LV blood pool: the blood pool is the region within the endocardium, yet my labeling included the epicardial wall as well. Since I needed the blood pool, wall, and papillary muscles separated, I applied simple K-means (number of clusters = 2) to only the predicted region from the segmentation, using hypercolumns extracted from layers 1 and 3 of the VGG-16 model.
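A hand-rolled stand-in for that two-cluster step (I initialize the two centres at the feature extremes rather than randomly; a library KMeans would do the same job):

```python
import numpy as np

def two_cluster(features, mask, iters=20):
    """Split the proposed LV region into two clusters (blood pool vs. rest).

    features: hypercolumn array of shape (C, H, W);
    mask: boolean (H, W) region proposal from the segmentation net.
    Returns an (H, W) map: -1 outside the mask, cluster id 0/1 inside.
    """
    X = features[:, mask].T.astype(float)        # (n_pixels, C)
    centers = np.stack([X.min(0), X.max(0)])     # deterministic extreme init
    for _ in range(iters):
        d = ((X[:, None] - centers[None]) ** 2).sum(-1)   # squared distances
        labels = d.argmin(1)
        for k in (0, 1):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(0)
    out = np.full(mask.shape, -1)
    out[mask] = labels
    return out
```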
Manual label: epicardium (green), endocardium (red), papillary muscle (blue)
(Left) Segmented region. (Right) Two clusters using hypercolumns on the segmented region
In summary, hypercolumns, a concept loosely inspired by cortical columns (a real structure in the brain), are the stacked outputs of intermediate layers of a CNN such as VGG-16. The features in these layers are activated by various edges and shapes, and some examination is needed to find which layers serve a particular purpose. An excellent practical guide using Theano functions can be found in this blog post. My results on the competition dataset below clearly show the two clusters on the predicted output region, further separating the LV blood pool from all the other redundant components.
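Concretely, the stacking step can be sketched in plain NumPy (nearest-neighbour upsampling here; bilinear, as in the original paper, would be smoother):

```python
import numpy as np

def hypercolumn(feature_maps, out_size=128):
    """Stack intermediate-layer activations into per-pixel feature vectors.

    feature_maps: list of arrays shaped (channels, h, w), one per chosen
    layer (e.g. the outputs of VGG-16 conv blocks 1 and 3).  Each map is
    upsampled to the input resolution, then all channels are concatenated,
    giving one long descriptor per pixel.
    """
    columns = []
    for fmap in feature_maps:
        c, h, w = fmap.shape
        ry, rx = out_size // h, out_size // w
        up = fmap.repeat(ry, axis=1).repeat(rx, axis=2)   # nearest-neighbour upsample
        columns.append(up)
    return np.concatenate(columns, axis=0)                # (sum of channels, 128, 128)
```

These per-pixel descriptors are what the K-means step above clusters within the proposed region.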
Segmentation and Hypercolumn Cluster on Apex Slice


Five clusters on an image from the basal slice
Two clusters on an image of the middle slice
Two clusters on an image of the base slice
The real difficulty in segmenting the blood pool arises with the basal and apex slices, owing to the shape of the heart: the blood pool extends toward the mitral valve in basal slices and has a very small area in the apex slice. Dealing with these slices, which proved the most difficult, was my primary motivation for using hypercolumns.

Heart left ventricle (red); hypercolumn clusters

With the clustered hypercolumn images in hand, I wanted the final model to learn features from sequences of images going from the base slice to the apex slice. Since there were 30 time frames in each slice, I used the VGG-16 model trained only on the Sunnybrook data to find the particular frames on the middle slice, via the maximum and minimum blood pool areas, that coincided with end-diastole and end-systole. After this step, feature vectors were constructed for the final model; these required zero-padding so that the number of samples (slices) was 18, since the number of slices varied from 1 to 18 in the training data. The final layer was a Dense layer of 600 neurons with sigmoid activation. The end-systolic and end-diastolic volumes provided in the competition dataset were converted to CDFs and used as the training targets for this final model.
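The padding and target construction are simple enough to sketch (array shapes as described above; the function names are mine):

```python
import numpy as np

def pad_slices(slice_features, max_slices=18):
    """Zero-pad a patient's per-slice feature vectors to a fixed length.

    slice_features: (n_slices, 1024) with n_slices between 1 and 18; the
    sequence model needs a fixed (18, 1024) input, so missing slices
    become rows of zeros.
    """
    out = np.zeros((max_slices, slice_features.shape[1]))
    out[:len(slice_features)] = slice_features
    return out

def volume_to_cdf(volume_ml, n_bins=600):
    """Turn a scalar volume label into the 600-value step CDF that the
    600-neuron sigmoid output layer is trained against."""
    return (np.arange(n_bins) >= volume_ml).astype(float)
```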

This input representation (18, 1024) was used to train a recurrent neural network (LSTM). An LSTM carries a hidden state forward through its internal gating mechanisms, so the output at each step depends on the current input and the hidden state accumulated from the previous steps. Since the slices are spatially ordered from base to apex, a recurrent network can capture relevant features from the slice sequence. It performed better than a simple fully connected Dense layer or a 1-D convolution. I also tried a TimeDistributed Dense layer, which converged fast but over-fitted badly. Stacking LSTMs did not improve the CRPS, while dropout and additional augmentation, again over all slices, helped. The final prediction was also averaged over the transformed input samples, which improved the score. In addition, global histogram equalization was applied to the images and the clustered results added to the final training dataset; this also slightly helped the final CRPS score.
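To make the hidden-state point concrete, here is a single LSTM step in NumPy (the standard Hochreiter and Schmidhuber formulation; the weight shapes are my convention, not the Keras layout):

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step: gates computed from the current input x and the
    previous hidden state h_prev; c is the carried cell memory.
    W: (4*units, input_dim), U: (4*units, units), b: (4*units,)."""
    z = W @ x + U @ h_prev + b
    i, f, g, o = np.split(z, 4)                  # input, forget, candidate, output
    sigm = lambda v: 1.0 / (1.0 + np.exp(-v))
    c = sigm(f) * c_prev + sigm(i) * np.tanh(g)  # forget old memory, add new
    h = sigm(o) * np.tanh(c)                     # expose a gated view of the cell
    return h, c

def run_sequence(xs, W, U, b, units):
    """Fold a padded slice sequence through the cell, base to apex."""
    h, c = np.zeros(units), np.zeros(units)
    for x in xs:                                 # xs: e.g. (18, 1024) features
        h, c = lstm_step(x, h, c, W, U, b)
    return h                                     # final state summarizes all slices
```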

Finally, the training set was partitioned into two sets with some overlap (278 samples each), each serving as validation for the other, for a total of three sets. The final prediction on the test set combined the LSTM models trained independently on these three sets. Doing so gave me better results than a single model trained on the whole derived dataset.
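Combining the models amounts to averaging their per-case CDFs; a sketch (the clipping and monotonicity clean-up are my additions, a common safeguard when averaging sigmoid outputs):

```python
import numpy as np

def ensemble_cdf(cdf_list):
    """Average the 600-value CDFs from independently trained models,
    then clip to [0, 1] and enforce that the result is non-decreasing."""
    avg = np.mean(cdf_list, axis=0)
    return np.maximum.accumulate(np.clip(avg, 0.0, 1.0))
```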



References:

Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).

Hariharan, Bharath, et al. "Hypercolumns for object segmentation and fine-grained localization." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.

