CrowdSource: Automated inference of high level malware functionality from low-level symbols using a crowd trained machine learning model

Joshua Saxe; Rafael Turner; Kristina Blokhin

doi:10.1109/MALWARE.2014.6999417

2014 9th International Conference on Malicious and Unwanted Software: "The Americas" (MALWARE)

Year: 2014, Pages: 68-75

Authors

Joshua Saxe, Invincea Labs
Rafael Turner, Invincea Labs
Kristina Blokhin, Invincea Labs

Abstract

In this paper we introduce CrowdSource, a statistical natural language processing system designed to make rapid inferences about malware functionality based on printable character strings extracted from malware binaries. CrowdSource “learns” a mapping between low-level language and high-level software functionality by leveraging millions of web technical documents from StackExchange, a popular network of technical question and answer sites, using this mapping to infer malware capabilities. This paper describes our approach and provides an evaluation of its accuracy and performance, demonstrating that it can detect at least 14 high-level malware capabilities in unpacked malware binaries with an average per-capability f-score of 0.86 and at a rate of tens of thousands of binaries per day on commodity hardware.
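The abstract names two mechanics that can be illustrated in isolation: pulling printable character strings out of a binary, and averaging a per-capability f-score. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a "printable string" is a run of at least four printable ASCII bytes (the convention of the Unix `strings` tool), and that "f-score" means the F1 score (harmonic mean of precision and recall); the abstract confirms neither. The function names `extract_strings`, `f_score`, and `mean_f_score` are hypothetical, and the sample bytes are invented for illustration. The learned mapping from strings to capabilities is not sketched here.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII of at least `min_len` bytes.

    Assumption: printable means bytes 0x20-0x7e, as in Unix `strings`.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def f_score(tp: int, fp: int, fn: int) -> float:
    """F1 score from true-positive, false-positive, false-negative counts.

    Assumption: the paper's "f-score" is F1; returns 0.0 when undefined.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_f_score(counts: dict[str, tuple[int, int, int]]) -> float:
    """Unweighted mean of per-capability F1 scores.

    `counts` maps a capability name to its (tp, fp, fn) tuple.
    """
    return sum(f_score(*c) for c in counts.values()) / len(counts)


# Invented sample bytes: two long printable runs separated by binary noise,
# plus a two-character run ("ab") that falls below the length threshold.
sample = b"\x00\x01CreateRemoteThread\x00\xff\x10ab\x00http://example.com/gate.php\x90\x90"
strings_found = extract_strings(sample)
```

On the invented sample, `strings_found` holds the API name and the URL, while the short run `ab` is dropped by the length threshold. A real pipeline would feed such strings to the trained model rather than inspect them by hand.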

Related Articles

Malware Detection Systems Based on API Log Data Mining
2015 IEEE 39th Annual Computer Software and Applications Conference (COMPSAC)
Mining Malware to Detect Variants
2014 Fifth Cybercrime and Trustworthy Computing Conference (CTC)
CARDINAL: similarity analysis to defeat malware compiler variations
2016 11th International Conference on Malicious and Unwanted Software (MALWARE)
RePEconstruct: reconstructing binaries with self-modifying code and import address table destruction
2016 11th International Conference on Malicious and Unwanted Software (MALWARE)
BinGraph: Discovering mutant malware using hierarchical semantic signatures
2012 7th International Conference on Malicious and Unwanted Software
Indoor Location Estimation with Reduced Calibration Exploiting Unlabeled Data via Hybrid Generative/Discriminative Learning
IEEE Transactions on Mobile Computing
An improved RepLKNet-based malware detection method
2023 11th International Conference on Information Technology: IoT and Smart City (ITIoTSC)
Cross-Architecture Binary Function Fingerprinting
IEEE Security & Privacy
Email Spam Classification Using LBSVM
2024 9th International Conference on Intelligent Computing and Signal Processing (ICSP)
BobGAT: Towards Inferring Software Bill of Behavior with Pre-Trained Graph Attention Networks
2024 IEEE 6th International Conference on Trust, Privacy and Security in Intelligent Systems, and Applications (TPS-ISA)