
PERFORMANCE OF MACHINE LEARNING ALGORITHMS FOR LUNG CANCER PREDICTION: A COMPARATIVE STUDY
Abstract
This study compares the performance of five machine learning algorithms (logistic regression, support vector machines, random forests, gradient boosting, and neural networks) for lung cancer prediction using demographic, lifestyle, and medical data from the UCI Machine Learning Repository. Gradient boosting and random forests achieved high accuracy (89% and 87%, respectively) and strong AUC-ROC scores (0.93 and 0.92), while neural networks reached a slightly higher accuracy of 90% but offered limited interpretability. Key predictors included smoking history, chronic disease, and respiratory symptoms, consistent with established risk factors. The two ensemble methods, gradient boosting and random forests, provided the best balance of accuracy and interpretability, highlighting their potential for clinical use in early lung cancer detection.
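The abstract reports two metrics, accuracy and AUC-ROC, which measure different things. Accuracy depends on a decision threshold. AUC-ROC is threshold-free: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. As a result, a model can lead on one metric without leading on the other. The following is a minimal, dependency-free sketch under stated assumptions: binary labels (1 = lung cancer, 0 = no lung cancer), predicted probabilities from any classifier, and a 0.5 threshold. The function names and toy values are hypothetical and are not the authors' code or data.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of cases whose thresholded prediction matches the true label.

    The 0.5 threshold is an assumption; the study does not state its threshold.
    """
    correct = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return correct / len(y_true)


def auc_roc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive outranks a random negative.

    This is the normalized Mann-Whitney U statistic; tied scores count as half a win.
    The pairwise loop is O(n_pos * n_neg), which is fine for a sketch.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy labels and scores, not results from the study.
    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.4, 0.35, 0.8, 0.6, 0.1]
    print(round(accuracy(labels, scores), 3))  # 0.667 (4 of 6 correct at threshold 0.5)
    print(round(auc_roc(labels, scores), 3))   # 0.889 (8 of 9 positive/negative pairs ranked correctly)
```

In the toy example, two cases fall on the wrong side of the 0.5 threshold, which lowers accuracy. Most positive/negative pairs are still ranked correctly, so AUC-ROC remains higher. For a full comparison of the five model families, the same two functions could be applied to each model's held-out predictions.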
Zenodo DOI: https://doi.org/10.5281/zenodo.14160193
Keywords
Lung cancer prediction, Machine learning algorithms, Comparative analysis
Copyright License
Copyright (c) 2024 Nur Hossain, Nafis Anjum, Murshida Alam, Md Habibur Rahman, Md Siam Taluckder, Md Nad Vi Al Bony, S M Shadul Islam Rishad, Afrin Hoque Jui

This work is licensed under a Creative Commons Attribution 4.0 International License.