CISUC

An automatic mammogram system: from screening to diagnosis

Authors

Abstract

Breast cancer is the most common type of cancer in women worldwide, and the leading cause of death from cancer in women, especially those between 40 and 55 years of age. Screening mammography is performed in the asymptomatic population to detect early signs of breast cancer such as masses, calcifications, bilateral asymmetry and architectural distortion. Diagnostic mammography is performed on patients who have already demonstrated abnormal clinical findings. Both screening and diagnostic mammography are traditionally performed by radiologists who visually inspect mammograms. Manual inspection is a tiring and tedious task prone to human error, which has encouraged the search for Computer-Aided Detection and Diagnosis (CAD) techniques.

The present thesis describes an effort to develop image processing and machine learning methods to help radiologists in the analysis of mammogram images. Contributions were made in the different phases, including: (1) pre-processing, (2) screening, (3) detection of suspicious regions, (4) characterization of suspicious regions, and (5) classification. All the techniques were thoroughly evaluated using INbreast, a database of full-field digital mammogram images that, along with the images, contains metadata such as breast density, BI-RADS (Breast Imaging Reporting And Data System) assessment and very accurate segmentations of suspicious regions.

Image manipulation can have a strong impact on the success of subsequent tasks. A typical pre-processing step applied to mammogram images is the removal of the pectoral muscle region. Two methods for segmentation of the pectoral muscle are presented in this thesis: one based on polar coordinates and the shortest path (SPPC), and one based on the shortest path with endpoints learnt by SVMs (SPLE). After pectoral muscle subtraction, the mammogram exam goes through the screening block.
Since it has been observed that CAD performance depends on breast density, breasts are first classified as fatty or dense. Then, for each breast type, a specific classification block is designed to determine whether the breast exam is suspicious. An extensive evaluation was performed by testing a large set of features in combination with several classifiers. The best density classification results were achieved with a k-Nearest Neighbours (kNN) classifier using a feature vector consisting of statistical features extracted from both views. The best classifiers selected for the classification of images as suspicious or non-suspicious were also kNNs, but using Gabor features.

Based on the outcome of the screening stage, non-suspicious patients return to the regular screening program recommended in each country, while suspicious exams are sent to diagnosis. During this analysis it may be useful to direct the attention of the specialist to regions of the image that may be problematic. The two most common findings seen in mammogram images are calcifications and masses. Due to their different characteristics (size, intensity, shape, border contrast, etc.), different methods were used for each type of finding. An algorithm based on Bayesian surprise was developed for calcification detection, while an Iris filter followed by a closed contour segmentation method performed in the original coordinate system was used to detect and segment masses.

Calcifications and masses, when they exist, can be either benign or malignant. BI-RADS describes important factors for determining malignancy, including distribution and morphology for calcifications, and margins, shape and density for masses. A review of the features used in the literature revealed a lack of consensus on an adequate set of automatic features for the characterization of these findings.
Therefore, a large portion of the existing features were evaluated on the INbreast database using the Pearson correlation, the distance correlation and the Maximal Information Coefficient. These metrics were used to select appropriate subsets of features.

Based on the level of suspicion, lesions can be placed into one of seven BI-RADS categories: 0 when the exam is not conclusive, 1 for no findings, 2 for benign findings, 3 for probably benign findings, 4 for suspicious findings, 5 when there is a high probability of malignancy, and 6 for proven cancer. When more than one finding is present in the mammogram, the overall BI-RADS in the medical report corresponds to the finding with the highest BI-RADS. The typical learning settings described in the literature do not optimally represent this particular setting. For this reason, a new learning paradigm is proposed, named max-ordinal learning (MOL), which sits between supervised and semi-supervised classification. For every observation, some information about the label is available. However, in a subset of the examples, the knowledge is incomplete. This corresponds to the worst-case classification of the individual views of the example. A formalization of the max-ordinal learning paradigm led to two new learning schemes, MOL.LA and MOL.CD. MOL.CD uses coordinate descent in the space of the models, while in MOL.LA the focus is on the partitioning of training instances into the two subsets. The experimental evaluation showed that the methodologies developed give better results than traditional approaches.

All the described techniques were thoroughly evaluated both independently (assuming all the previous steps are correct) and in combination. It could be concluded that pectoral muscle detection, screening, mass segmentation, feature extraction and BI-RADS classification are ready to be used in practice.
The calcification and mass detection algorithms, however, need to be improved in order to provide higher sensitivities with fewer false positives.

The impact of the conducted research will be reflected in its ability to improve the quality of breast cancer detection and to shorten the time needed to reach a diagnosis, with the corresponding beneficial implications for treatment possibilities and patient psychological well-being. Radiologists will also benefit, since they can make better use of their time by concentrating on more difficult cases.
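The BI-RADS aggregation rule described in the abstract (the report-level category is that of the finding with the highest BI-RADS) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names `overall_birads` and `is_consistent` are hypothetical, and treating the inconclusive category 0 as an ordinary number under `max` is a simplifying assumption. The second function shows one possible reading of the incomplete label knowledge that motivates max-ordinal learning: only the overall label is observed, so per-view labels are constrained to be no higher than it, with at least one view reaching it.

```python
from typing import Iterable, Sequence

# BI-RADS categories 0..6, as enumerated in the abstract.
VALID_BIRADS = range(0, 7)


def overall_birads(finding_scores: Iterable[int]) -> int:
    """Report-level BI-RADS: the highest category among the findings.

    An exam without findings maps to category 1 ("no findings").
    Assumption: category 0 is compared numerically like the others.
    """
    scores = list(finding_scores)
    for score in scores:
        if score not in VALID_BIRADS:
            raise ValueError(f"invalid BI-RADS category: {score}")
    if not scores:
        return 1
    return max(scores)


def is_consistent(view_labels: Sequence[int], overall: int) -> bool:
    """Check a candidate per-view labelling against the observed overall label.

    Hypothetical reading of the max-ordinal setting: no view exceeds the
    overall label, and at least one view attains it.
    """
    return len(view_labels) > 0 and max(view_labels) == overall


if __name__ == "__main__":
    print(overall_birads([2, 4, 3]))
    print(is_consistent([2, 4], overall=4))
    print(is_consistent([2, 3], overall=4))
```

Under this reading, a training example whose overall label is 4 tells the learner that some view is a 4, but not which one, which is the sense in which the label knowledge is incomplete.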

Keywords

3D, BIRADS, CT, DDSM, Deep Learning, Evolutionary, Feature selection, GLCM, InBreast, Interpretable, Matlab, NeuralNetwork, PET, SIFT, VGG, breast, mammography, medicalReport, mimic, monotonicClassification, ordinal, shape, tri-training

PhD Thesis

An automatic mammogram system: from screening to diagnosis 2014
