Malware Detection using machine learning
Analysis modules:
- Static: features are extracted from the PE file headers (mainly the Optional Header).
- Dynamic: features are the API calls traced using Cuckoo Sandbox.
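The project reads PE headers with pefile. As a dependency-free sketch of what that parsing involves, the function below reads the COFF File Header and a subset of Optional Header fields directly with `struct`, using the offsets from the PE format specification. The function name `optional_header_features` and the chosen field subset are illustrative assumptions, not the project's actual feature list.

```python
import struct

# Fields starting at Optional Header offset 32 share the same layout in PE32 and PE32+
_WINDOWS_FIELDS = (
    "SectionAlignment", "FileAlignment",
    "MajorOperatingSystemVersion", "MinorOperatingSystemVersion",
    "MajorImageVersion", "MinorImageVersion",
    "MajorSubsystemVersion", "MinorSubsystemVersion",
    "Win32VersionValue", "SizeOfImage", "SizeOfHeaders", "CheckSum",
    "Subsystem", "DllCharacteristics",
)
_STANDARD_SIZES = (
    "SizeOfCode", "SizeOfInitializedData", "SizeOfUninitializedData",
    "AddressOfEntryPoint", "BaseOfCode",
)


def optional_header_features(data: bytes) -> dict:
    """Extract a hypothetical subset of COFF and Optional Header fields from raw PE bytes."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset 0x3C in the DOS header) points at the "PE\0\0" signature
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # COFF File Header: 20 bytes immediately after the 4-byte signature
    (machine, n_sections, _timestamp, _symtab_ptr, _n_symbols,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    opt = e_lfanew + 24  # Optional Header starts after signature (4) + COFF header (20)
    magic, major_linker, minor_linker, *sizes = struct.unpack_from("<HBBIIIII", data, opt)
    if magic not in (0x10B, 0x20B):  # PE32 or PE32+
        raise ValueError(f"unexpected Optional Header magic {magic:#x}")
    features = {
        "Machine": machine,
        "NumberOfSections": n_sections,
        "SizeOfOptionalHeader": opt_size,
        "Characteristics": characteristics,
        "Magic": magic,
        "MajorLinkerVersion": major_linker,
        "MinorLinkerVersion": minor_linker,
    }
    features.update(zip(_STANDARD_SIZES, sizes))
    # Skip BaseOfData/ImageBase, whose layout differs between PE32 and PE32+
    features.update(zip(_WINDOWS_FIELDS,
                        struct.unpack_from("<IIHHHHHHIIIIHH", data, opt + 32)))
    return features
```

With pefile the same values are exposed as attributes of `pe.FILE_HEADER` and `pe.OPTIONAL_HEADER` after `pe = pefile.PE(path)`, which also handles malformed files more gracefully than this sketch.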
Dataset construction
- Static
Malware samples were acquired from MalwareBazaar, while benign samples were collected from multiple software hosting websites (e.g. CNET). We then used the pefile module in Python to parse the PE headers and extract the relevant features (chosen using benchmarks). We also used YARA capabilities, digital signatures, and packing as additional features.
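The README does not say how packing was detected. One common heuristic, shown here only as an assumed stand-in, is byte entropy: compressed or encrypted sections approach 8 bits per byte. The 7.0 threshold and the function names are hypothetical choices for illustration.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def looks_packed(section_bodies, threshold: float = 7.0) -> bool:
    """Hypothetical heuristic: flag a binary if any section's bytes look near-random."""
    return any(shannon_entropy(body) > threshold for body in section_bodies)
```

The section bytes could come from pefile via `section.get_data()` for each entry in `pe.sections`. Entropy alone yields false positives (e.g. embedded compressed resources), so it is best treated as one feature among several rather than a verdict.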
- Dynamic
We modified the APIMDS dataset from hksecurity, converting it from a dataset of API call sequences into a dataset of binary values over a predetermined set of features.
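A minimal sketch of that conversion, assuming each predetermined feature is simply "this API appears somewhere in the sequence". The `FEATURE_APIS` list below is hypothetical; the project's actual feature set is not listed here.

```python
# Hypothetical feature list: stand-in for the project's predetermined APIs
FEATURE_APIS = ["CreateFileW", "WriteFile", "RegSetValueExW", "InternetOpenA", "VirtualAllocEx"]


def api_presence_vector(api_calls, feature_apis=FEATURE_APIS):
    """Encode one API call sequence as 0/1 flags over a fixed feature list."""
    seen = set(api_calls)
    return [1 if api in seen else 0 for api in feature_apis]


def build_binary_dataset(sequences, feature_apis=FEATURE_APIS):
    """Turn a list of API call sequences into rows of binary features."""
    return [api_presence_vector(seq, feature_apis) for seq in sequences]
```

For example, `api_presence_vector(["CreateFileW", "WriteFile", "WriteFile", "Sleep"])` gives `[1, 1, 0, 0, 0]`: call order and repetition are discarded, and APIs outside the feature list (here `Sleep`) are ignored.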
Algorithm used
We compared multiple algorithms using stratified 10-fold cross-validation and settled on the Extreme Gradient Boosting (XGBoost) classifier, as it achieved the highest F1 score.
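The comparison procedure can be sketched in plain Python: split the samples into 10 folds that each keep the malware/benign ratio, train on 9 folds, score F1 on the held-out fold, and average. This is an illustrative implementation under those assumptions, not the project's code; `make_model` is a hypothetical factory returning any object with scikit-learn-style `fit`/`predict`, such as xgboost's `XGBClassifier`.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds preserve class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    rng = random.Random(seed)
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal each class's samples round-robin so every fold gets its share
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        test = set(folds[f])
        train = [i for i in range(len(labels)) if i not in test]
        yield train, sorted(test)


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2TP / (2TP + FP + FN), with malware as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def cross_val_f1(make_model, X, y, k=10, seed=0):
    """Mean F1 over k stratified folds; a fresh model is trained per fold."""
    scores = []
    for train, test in stratified_kfold(y, k, seed):
        model = make_model()
        model.fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in test])
        scores.append(f1_score([y[i] for i in test], preds))
    return sum(scores) / k
```

Running `cross_val_f1` once per candidate algorithm and keeping the highest mean F1 mirrors the selection described above. In practice, scikit-learn's `StratifiedKFold(n_splits=10)` with `cross_val_score(..., scoring="f1")` provides the same procedure.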
Project interfaces
Static
Dynamic
Install
Copyright (c) 2022 Mohamed Benchikh