Malware Detection using machine learning

Analysis modules:

  • Static: Features are extracted from the PE file headers (mainly the Optional Header)

  • Dynamic: Features are the API calls traced using Cuckoo Sandbox
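The dynamic module first needs the API calls out of the sandbox output. Below is a minimal sketch that assumes a Cuckoo 2.x-style JSON report, where each entry of `behavior.processes` has a `calls` list and each call has an `api` field. Check this layout against your Cuckoo version before relying on it. The function names are hypothetical, not part of this project.

```python
import json
from typing import List


def load_report(path: str) -> dict:
    """Load a Cuckoo JSON report from disk (path is supplied by the caller)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def api_call_sequence(report: dict) -> List[str]:
    """Return API names process by process, in the order each process made them.

    Assumes the Cuckoo 2.x layout: report["behavior"]["processes"][*]["calls"][*]["api"].
    Missing keys are treated as empty, so an incomplete report yields [].
    """
    calls: List[str] = []
    for process in report.get("behavior", {}).get("processes", []):
        for call in process.get("calls", []):
            api = call.get("api")
            if api:
                calls.append(api)
    return calls
```

Calls are concatenated per process. If a global timeline matters, sort on a per-call timestamp field instead, assuming your report version includes one.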

Dataset construction

  • Static

Malware samples were acquired from MalwareBazaar, while benign samples were collected from multiple online hosting websites (e.g. CNET). We then used the pefile module in Python to parse the PE headers and extract the relevant features (chosen using benchmarks). Yara capabilities, digital signature information, and packing were also used as features.
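The project parsed headers with pefile, which exposes fields as attributes such as `pe.OPTIONAL_HEADER.SizeOfCode` after `pe = pefile.PE(path)`. The sketch below reads the same kind of fields directly with `struct` so it runs on the standard library alone. The field selection, the whole-file entropy check used as a packing signal, and the 7.0 threshold are all assumptions for illustration, not the project's benchmarked feature set. Yara and signature features are not covered.

```python
import math
import struct
from collections import Counter
from typing import Dict

# Byte offsets of selected fields from the start of the Optional Header.
# From SectionAlignment onward PE32 and PE32+ share offsets: PE32's
# BaseOfData + 32-bit ImageBase (8 bytes) match PE32+'s 64-bit ImageBase.
# This field selection is hypothetical.
_OPTIONAL_HEADER_FIELDS = {
    "MajorLinkerVersion": (2, "<B"),
    "SizeOfCode": (4, "<I"),
    "SizeOfInitializedData": (8, "<I"),
    "AddressOfEntryPoint": (16, "<I"),
    "SectionAlignment": (32, "<I"),
    "FileAlignment": (36, "<I"),
    "MajorOperatingSystemVersion": (40, "<H"),
    "SizeOfImage": (56, "<I"),
    "SizeOfHeaders": (60, "<I"),
    "CheckSum": (64, "<I"),
    "Subsystem": (68, "<H"),
    "DllCharacteristics": (70, "<H"),
}

PE32_MAGIC, PE32_PLUS_MAGIC = 0x10B, 0x20B


def pe_header_features(data: bytes) -> Dict[str, int]:
    """Read a few COFF and Optional Header fields from raw PE bytes."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)  # offset of the PE signature
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    coff = e_lfanew + 4  # COFF File Header follows the 4-byte signature
    opt = coff + 20      # COFF File Header is 20 bytes long
    (n_sections,) = struct.unpack_from("<H", data, coff + 2)
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic not in (PE32_MAGIC, PE32_PLUS_MAGIC):
        raise ValueError(f"unexpected Optional Header magic {magic:#x}")
    features = {"NumberOfSections": n_sections, "Is64Bit": int(magic == PE32_PLUS_MAGIC)}
    for name, (offset, fmt) in _OPTIONAL_HEADER_FIELDS.items():
        (features[name],) = struct.unpack_from(fmt, data, opt + offset)
    return features


def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, between 0.0 and 8.0."""
    if not data:
        return 0.0
    total = len(data)
    return sum((c / total) * math.log2(total / c) for c in Counter(data).values())


def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    # Hypothetical heuristic: packed or compressed content tends to have high entropy.
    return byte_entropy(data) >= threshold
```

Per-section entropy (which pefile provides via `section.get_entropy()`) is usually a sharper packing signal than whole-file entropy. The whole-buffer version just keeps the sketch short.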

  • Dynamic

We adapted the APIMDS dataset from hksecurity, converting it from a dataset of API call sequences into a dataset of binary values over a predetermined set of features.
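A minimal sketch of the sequence-to-binary conversion, assuming each predetermined feature is simply "this API appears at least once in the sequence". The API names in `FEATURE_APIS` are hypothetical placeholders. The project's actual feature list is not given here.

```python
from typing import Iterable, List, Sequence

# Hypothetical feature list; substitute the project's predetermined APIs.
FEATURE_APIS = ["CreateFileW", "WriteFile", "RegSetValueExW", "InternetOpenUrlA", "VirtualAllocEx"]


def to_binary_vector(api_sequence: Iterable[str], features: Sequence[str] = FEATURE_APIS) -> List[int]:
    """Map an API call sequence to one 0/1 value per feature (presence, not count or order)."""
    seen = set(api_sequence)
    return [1 if api in seen else 0 for api in features]
```

For example, `to_binary_vector(["WriteFile", "CreateFileW", "WriteFile"])` yields `[1, 1, 0, 0, 0]`. Order and repetition are discarded by design, which gives every sample a fixed-length row that standard classifiers can consume.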

Algorithm used

We compared multiple algorithms using 10-fold stratified cross-validation and settled on the Extreme Gradient Boosting (XGBoost) classification algorithm, as it achieved the highest F1 score.
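The selection step can be sketched with the standard library alone. In practice scikit-learn's `StratifiedKFold` and the xgboost package's `XGBClassifier` would fill these roles. Here, two hypothetical toy trainers stand in for the compared algorithms, so the sketch only shows the selection logic: a stratified 10-fold split, F1 on each held-out fold, and picking the model with the highest mean.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Row = List[int]
Predictor = Callable[[Row], int]
Trainer = Callable[[List[Row], List[int]], Predictor]


def stratified_folds(y: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, dealing each class out round-robin
    so every fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        indices = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive (malware) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_val_f1(train: Trainer, X: List[Row], y: List[int], k: int = 10, seed: int = 0) -> float:
    """Mean F1 over k stratified folds: train on k-1 folds, score the held-out one."""
    scores = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        predict = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(f1_score([y[i] for i in test_idx], [predict(X[i]) for i in test_idx]))
    return mean(scores)


def select_best(trainers: Dict[str, Trainer], X: List[Row], y: List[int], k: int = 10) -> Tuple[str, Dict[str, float]]:
    """Return the name of the trainer with the highest mean F1, plus all scores."""
    scores = {name: cross_val_f1(train, X, y, k) for name, train in trainers.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy trainers standing in for the real candidate algorithms.
def train_majority(X: List[Row], y: List[int]) -> Predictor:
    label = max(set(y), key=y.count)
    return lambda row: label


def train_stump(X: List[Row], y: List[int]) -> Predictor:
    # Use the single binary feature that agrees with the label most often.
    best = max(range(len(X[0])), key=lambda j: sum(row[j] == t for row, t in zip(X, y)))
    return lambda row: row[best]
```

Stratification matters here because a fold that happened to contain very few malware samples would give a noisy or undefined F1 for that fold.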

Project interfaces

Static

Dynamic

Install

Copyright (c) 2022 Mohamed Benchikh