Method for detecting malicious software based on the nearest neighbors algorithm
Date
2017
Authors
Лисенко, С.М.
Гуменюк, В.В.
Lysenko, S.M.
Gumenyuk, V.V.
Publisher
Khmelnytskyi National University
Abstract
This paper presents a method for detecting malicious software (malware) based on the k-nearest neighbors
algorithm, which classifies software as malicious or benign. The method
analyzes the behavior of software in a computer system. During operation, each
process under examination is represented as a vector in which each element is the value of a statistical
indicator reflecting the number of system calls the process makes during its execution. The method
makes it possible to respond to new threats, protecting computer systems from both known and
unknown malware. The malware detection system works by detecting anomalies: it compares
the behavior of the process under examination with a database of known software behaviors in computer
systems.
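The vector representation described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `syscall_count_vector`, the sample trace, and the fixed vocabulary of system call names are all hypothetical, and the sketch assumes a process trace is available as a plain list of system call names.

```python
from collections import Counter


def syscall_count_vector(trace, vocabulary):
    """Count how often each system call in `vocabulary` occurs in `trace`.

    `trace` is a list of system call names observed for one process;
    `vocabulary` fixes the order of the vector's elements.
    """
    counts = Counter(trace)
    # Counter returns 0 for calls that never appeared in the trace.
    return [counts[name] for name in vocabulary]


# Hypothetical trace and vocabulary, for illustration only.
trace = ["open", "read", "read", "write", "close"]
vocabulary = ["close", "open", "read", "write", "socket"]
vector = syscall_count_vector(trace, vocabulary)
```

Each position in `vector` corresponds to one system call, so processes with different traces map into the same vector space and can be compared directly.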
Abstract: This article describes a method for malware detection based on the k-nearest neighbors (kNN) algorithm, which performs binary classification of processes as malicious or benign. The still-dominant method of malware detection is signature detection, which relies on detecting unique patterns of data inside scanned files. New malware specimens tend to appear faster than malware specialists can create new signatures, so signature-based detection methods suffer from a low detection rate for new types of malware. The field of machine learning provides more reliable means of protection against such threats. The method described in this article applies the k-nearest neighbor classification algorithm, successfully used in the field of text classification, to the task of detecting malicious software. It treats each process as a document to be classified. Feature vectors extracted from each process contain weighted frequencies of each system call performed during the process's runtime. Each unknown process is compared with processes from a database of known benign processes, which is built during the training phase. When the average similarity to the k most similar processes reaches a certain threshold, the process is classified as benign; otherwise it is classified as malicious. Cosine similarity was used to measure the similarity of feature vectors. Two weighting methods for feature vector extraction, tf-idf and frequency, were compared. The performance of the method was shown to depend on the choice of weighting method, the value of k, and the threshold value; different trade-offs can be achieved by varying these parameters. Tf-idf weighting was shown to provide the best balance between detection rate and false positive rate. Frequency weighting, on the other hand, provides a higher detection rate and makes the training database easier to build, at the cost of a much higher false positive rate.
Different weighting methods and similarity measures merit further investigation. Experiments have shown that the proposed method is applicable and can be implemented in software, with a detection rate of 96-98% and a false positive rate of 3-6%. The method allows detection of new, previously unknown types of malware, protecting computer systems from both known and unknown malware.
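The classification rule summarized in the abstract (tf-idf weighting, cosine similarity, and a threshold on the average similarity to the k most similar benign processes) can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the abstract does not give the exact tf-idf formula, so a common smoothed variant is assumed here, and the function names, default `k`, default `threshold`, and sample traces are all hypothetical.

```python
import math
from collections import Counter


def fit_idf(benign_traces):
    """Build an idf function from the benign training traces.

    Assumed variant: smoothed idf = ln((1 + N) / (1 + df)) + 1, so that
    system calls unseen during training still receive a finite weight.
    """
    n = len(benign_traces)
    df = Counter(call for trace in benign_traces for call in set(trace))
    return lambda call: math.log((1 + n) / (1 + df[call])) + 1.0


def weigh(trace, idf):
    """Return a sparse tf-idf vector {system call: weight} for one trace."""
    counts = Counter(trace)
    return {call: (c / len(trace)) * idf(call) for call, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(call, 0.0) for call, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_benign(trace, benign_vectors, idf, k=3, threshold=0.8):
    """Classify as benign when the mean similarity to the k most similar
    benign vectors reaches `threshold`; otherwise treat as malicious."""
    vec = weigh(trace, idf)
    top = sorted((cosine(vec, b) for b in benign_vectors), reverse=True)[:k]
    if not top:
        return False  # no benign reference data to compare against
    return sum(top) / len(top) >= threshold


# Hypothetical training database of benign system call traces.
benign = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "read", "read", "close"],
]
idf = fit_idf(benign)
benign_vectors = [weigh(t, idf) for t in benign]
```

With this sketch, `is_benign(["open", "read", "close"], benign_vectors, idf, k=1, threshold=0.9)` accepts a trace that matches a known benign process, while a trace made only of calls absent from the database has zero similarity to every benign vector and is rejected. Replacing `weigh` with raw frequencies would give the frequency weighting that the abstract compares against tf-idf.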
Keywords
k-nearest neighbors algorithm, malicious software, categorization, system calls, k-Nearest neighbor classifier, malware, attack, document categorization
Bibliographic description
Lysenko, S.M. Method for detecting malicious software based on the nearest neighbors algorithm [Text] / S. M. Lysenko, V. V. Gumenyuk // Herald of Khmelnytskyi National University. Technical Sciences. – 2017. – No. 6. – pp. 96-101.