Survey of SMS Spam Detection Techniques: A Taxonomy
Keywords:
NLP, Deep learning, SMS spam detection, Machine learning, Text classification
Abstract
Short Message Service (SMS) spam remains a significant threat to users and businesses, with spammers constantly adopting more sophisticated techniques. This paper comprehensively surveys SMS spam detection methods, categorizing existing approaches into five primary groups: rule-based methods, traditional machine learning techniques, deep learning models, hybrid models, and ensemble methods. Each category is examined in detail, highlighting its strengths, limitations, and evolution. Rule-based methods, though historically significant, are limited by their inability to handle new or evolving spam tactics. Traditional machine learning techniques, such as Naive Bayes and support vector machines (SVM), offer improved accuracy but depend on handcrafted features. In contrast, deep learning models, including recurrent neural networks (RNN) and convolutional neural networks (CNN), excel in feature extraction and adaptability yet face challenges with model complexity and the need for large labeled datasets. Hybrid and ensemble methods combine the benefits of various models to improve performance, reduce bias, and enhance robustness. This review aims to provide a structured overview of the state of SMS spam detection, identify emerging trends, and suggest future research directions, including improving generalization, reducing data dependency, and exploring the integration of contextual information. The findings underscore the need for continued innovation to address the evolving landscape of SMS spam.
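The contrast between the first two categories can be made concrete with a minimal sketch. It assumes a toy dataset, a hypothetical keyword list (`SPAM_KEYWORDS`), and a bag-of-words multinomial Naive Bayes classifier with add-one smoothing. None of these are taken from the surveyed papers; they only illustrate why a fixed rule misses wording it was never written for, while a learned model can generalize from labeled examples.

```python
import math
from collections import Counter

# Toy training data invented for illustration; not drawn from any surveyed corpus.
TRAIN = [
    ("win a free prize now", "spam"),
    ("free entry claim your prize", "spam"),
    ("urgent claim cash reward now", "spam"),
    ("are we still meeting for lunch", "ham"),
    ("call me when you get home", "ham"),
    ("see you at the meeting tomorrow", "ham"),
]

# Hypothetical hand-written rule list (an assumption, not from the paper).
SPAM_KEYWORDS = {"free", "prize", "winner"}


def rule_based(message):
    """Flag a message as spam if it contains any fixed keyword."""
    tokens = set(message.lower().split())
    return "spam" if SPAM_KEYWORDS & tokens else "ham"


def train_naive_bayes(examples):
    """Collect per-class word counts for a multinomial Naive Bayes model."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocab


def predict_naive_bayes(model, message):
    """Return the class with the highest log-posterior score."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)  # class prior
        for word in message.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (n_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(TRAIN)
```

On this toy data, `rule_based("urgent cash reward waiting")` returns `"ham"` because none of the fixed keywords appear, whereas `predict_naive_bayes(model, "urgent cash reward waiting")` returns `"spam"` because those words occurred only in the spam examples. The learned model still depends on the chosen representation (here, whitespace-separated lowercase tokens), which is the handcrafted-feature dependency the abstract refers to.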
Copyright (c) 2024 Hussein Al-Kaabi, Ali Darroudi Darroudi, Ali Kadhim Jasim (Author)
This work is licensed under a Creative Commons Attribution 4.0 International License.