Lung cancer (LC) is one of the most common cancers and one of the leading causes of cancer death worldwide, with an estimated 2.1 million cases globally.
Knowing a patient’s expected survival period has many benefits. It allows both patients and caregivers to plan resources, time, and the intensity of care, and to choose the best possible treatment path.
In a recent study, a team of scientists at Penn State Great Valley built a survival prediction model that uses deep learning techniques to tackle both the classification and the regression forms of the cancer survival problem. The model could help doctors and healthcare workers make better treatment decisions for lung cancer patients.
In the study, the model predicted survival expectancy with more than 71% accuracy, far better than traditional machine learning models, which reached a 61% accuracy rate.
Youakim Badr, associate professor of data analytics, said, “This is a high-performance system that is highly accurate and is aimed at helping doctors make these important decisions about providing care to their patients. Of course, this tool can’t be used as a substitute for a doctor in making decisions on lung cancer treatments.”
Robin G. Qiu, professor of information science and engineering and an affiliate of the Institute for Computational and Data Sciences, said, “The model can analyze a large amount of data — typically called features in machine learning — that describe the patients and the disease to understand how a combination of factors affect lung cancer survival periods. Features can include information such as types of cancer, size of tumors, the speed of tumor growth, and demographic data.”
For the study, the scientists used data from the Surveillance, Epidemiology, and End Results (SEER) program. The SEER dataset provides early diagnosis information for cancer patients in the United States, and the program’s cancer registries cover almost 35% of U.S. cancer patients.
Shreyesh Doppalapudi, a graduate-student research assistant and first author of the paper, said, “Deep learning architecture is better suited to processing large, diverse datasets such as the SEER program. Working on these types of datasets requires robust computational capacity.” In this study, the researchers relied on ICDS’s Roar supercomputer.
Doppalapudi added, “With about 800,000 to 900,000 entries in the SEER dataset, manually finding these associations in the data with an entire team would be extremely difficult without assistance from machine learning.”
- Shreyesh Doppalapudi et al. Lung cancer survival period prediction and understanding: Deep learning approaches. DOI: 10.1016/j.ijmedinf.2020.104371