Several machine learning-based models have been proposed to identify botnets in cyberspace. The performance of some of these detection schemes has been traced to the relevance of the features used by the classification models. Aside from feature selection, another area that has received little attention in botnet research is addressing class imbalance in the dataset before classification. Until very recently, botnet research suffered from a scarcity of large benchmark datasets. The CTU-13 dataset is a reference botnet dataset that has been identified as large and realistic. However, it is high-dimensional and heavily class-imbalanced. There is a need for a model that addresses data imbalance together with other pre-processing steps, such as imputing missing values, removing redundant features, and data transformation, before carrying out botnet classification. In the proposed framework, the Synthetic Minority Over-sampling Technique (SMOTE) is used to handle the data imbalance. The study thus provides a methodological framework that can be used to achieve improved botnet detection on the selected CTU-13 botnet dataset.
Keywords: Cyber Malware, Data Imbalance, Machine Learning Algorithms, Classification Problem.
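The pre-processing steps named in the abstract (imputing missing values, removing redundant features, data transformation, then SMOTE) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy flow data, the choice of median imputation and min-max scaling, and the pure-Python SMOTE stand-in are all assumptions made for illustration. In practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used instead.

```python
import math
import random
from statistics import median


def impute_median(rows):
    """Replace None entries with the median of the observed values in that column."""
    cols = list(zip(*rows))
    meds = [median([v for v in col if v is not None]) for col in cols]
    return [[meds[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def drop_redundant(rows):
    """Drop constant columns and exact duplicates of an earlier column."""
    keep, seen = [], set()
    for j, col in enumerate(zip(*rows)):
        if len(set(col)) > 1 and col not in seen:
            keep.append(j)
            seen.add(col)
    return [[row[j] for j in keep] for row in rows]


def min_max_scale(rows):
    """Rescale every column to [0, 1]; assumes no constant columns remain."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - lo[j]) / (hi[j] - lo[j]) for j, v in enumerate(row)] for row in rows]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (needs >= 2 samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def preprocess_and_balance(rows, labels, minority_label=1, k=3, seed=0):
    """Impute, drop redundant features, scale, then oversample the minority class."""
    X = min_max_scale(drop_redundant(impute_median(rows)))
    minority = [x for x, y in zip(X, labels) if y == minority_label]
    n_new = (len(X) - len(minority)) - len(minority)
    new_points = smote(minority, n_new, k=k, seed=seed)
    return X + new_points, list(labels) + [minority_label] * len(new_points)


# Hypothetical toy flows: column 2 is constant, column 3 duplicates column 0,
# and one value is missing. Label 1 marks the (minority) botnet class.
rows = [
    [0.0, 10.0, 7.0, 0.0], [1.0, 12.0, 7.0, 1.0], [2.0, None, 7.0, 2.0],
    [3.0, 11.0, 7.0, 3.0], [4.0, 13.0, 7.0, 4.0], [5.0, 14.0, 7.0, 5.0],
    [8.0, 30.0, 7.0, 8.0], [9.0, 32.0, 7.0, 9.0], [10.0, 31.0, 7.0, 10.0],
]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_bal, y_bal = preprocess_and_balance(rows, labels)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

Scaling is placed before SMOTE in this sketch because the neighbour search is distance-based, so unscaled features with large ranges would otherwise dominate it. When evaluating a classifier, oversampling would typically be applied to the training split only, so that synthetic points do not leak into the test data.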