Binary Black-box Evasion Attacks Against Deep Learning-based Static Malware Detectors with Adversarial Byte-Level Language Model
M. Ebrahimi, N. Zhang, J. Hu, M. T. Raza, H. Chen
AAAI Conference on Artificial Intelligence, Workshop on Robust, Secure, and Efficient Machine Learning (RSEML), February 8-9, 2021
AAAI RSEML'21 (DLS)
Anti-malware engines are the first line of defense against malicious software. While widely used, feature engineering-based anti-malware engines are vulnerable to unseen (zero-day) attacks. Recently, deep learning-based static anti-malware detectors have succeeded in identifying unseen attacks without requiring feature engineering or dynamic analysis. However, these detectors are susceptible to malware variants with slight perturbations, known as adversarial examples. Generating effective adversarial examples is useful for revealing the vulnerabilities of such systems. Current methods for launching such attacks require access to the specifications of the targeted anti-malware model, the confidence score of the anti-malware response, or dynamic malware analysis, all of which are either unrealistic or expensive. We propose MalRNN, a novel deep learning-based approach that automatically generates evasive malware variants without any of these restrictions. Our approach features an adversarial example generation process that learns a language model via a generative sequence-to-sequence recurrent neural network to augment malware binaries. MalRNN effectively evades three recent deep learning-based malware detectors and outperforms current benchmark methods. Findings from applying MalRNN to a real dataset with eight malware categories are discussed.
@inproceedings{ebrahimi2021malrnn, title={Binary Black-box Evasion Attacks Against Deep Learning-based Static Malware Detectors with Adversarial Byte-Level Language Model},
author={Ebrahimi, Mohammadreza and Zhang, Ning and Hu, James and Raza, Muhammad Taqi and Chen, Hsinchun}, booktitle={AAAI Conference on Artificial Intelligence, Workshop on Robust, Secure, and Efficient Machine Learning (RSEML)},
year={2021}, publisher={AAAI}}
Detecting Cyber Threats in Non-English Hacker Forums: An Adversarial Cross-Lingual Knowledge Transfer Approach
M. Ebrahimi, S. Samtani, Y. Chai, H. Chen
IEEE Symposium on Security and Privacy (IEEE S&P), Deep Learning and Security Workshop, San Francisco, May 2020.
IEEE S&P'20 (DLS)
The frequency of devastating cyber-attacks has made cybersecurity a grand societal challenge. Many cybersecurity professionals are closely examining the international Dark Web to proactively pinpoint potential cyber threats. However, the Dark Web contains hundreds of thousands of non-English posts. While machine translation (MT) is the prevailing approach to processing non-English text, applying MT to hacker forum text results in mistranslations. In this study, we draw upon Long Short-Term Memory (LSTM), Cross-Lingual Knowledge Transfer (CLKT), and Generative Adversarial Network (GAN) principles to design a novel Adversarial CLKT (A-CLKT) approach. A-CLKT operates on untranslated text to retain the original semantics of the language and leverages collective knowledge about cyber threats across languages to create a language-invariant representation without any manual feature engineering or external resources. Three experiments demonstrate that A-CLKT outperforms state-of-the-art machine learning, deep learning, and CLKT algorithms in identifying cyber threats in French and Russian forums.
Detecting Cyber Threats in Non-English Dark Net Markets: A Cross-Lingual Transfer Learning Approach
M. Ebrahimi, M. Surdeanu, S. Samtani, H. Chen
IEEE International Conference on Intelligence and Security Informatics (ISI), pp. 85-90, Miami, FL, Nov. 8-10, 2018 (Best Paper Award Runner-Up).
IEEE ISI'18
Recent advances in proactive cyber threat intelligence rely on early detection of cyber threats in hacker communities. Dark Net Markets (DNMs) are growing platforms in the hacker community that provide hackers with highly specialized tools and products that may not be found on other platforms. While text classification techniques have been used for cyber threat detection in English DNMs, the task is hindered on non-English platforms by the language barrier and the lack of ground-truth data. Current approaches apply monolingual models to machine-translated data to overcome these challenges. However, translation errors can degrade classification results. The abundance of data in English DNMs can instead be leveraged to learn non-English threats without machine translation. In this study, we show that a deep cross-lingual model that jointly learns a common language representation from two languages significantly outperforms a monolingual model trained on machine-translated data for identifying cyber threats in non-English DNMs. Unlike most studies, our approach does not require any external data source such as bilingual word embeddings or bilingual lexicons. Our experiments on Russian DNMs show that this approach can achieve better performance than state-of-the-art methods for non-English cyber threat detection in the malicious hacker community.
@INPROCEEDINGS{8587404, author={M. {Ebrahimi} and M. {Surdeanu} and S. {Samtani} and H. {Chen}},
booktitle={2018 IEEE International Conference on Intelligence and Security Informatics (ISI)},
title={Detecting Cyber Threats in Non-English Dark Net Markets: A Cross-Lingual Transfer Learning Approach}, year={2018},
pages={85--90}}
Recognizing Predatory Chat Documents using Semi-supervised Anomaly Detection
M. Ebrahimi, C. Y. Suen, O. Ormandjieva, A. Krzyzak
23rd Document Recognition and Retrieval Conference (DRR 2016), pp. 1-9, San Francisco, CA, February 14-18, 2016.
DRR'16
Chat logs are informative documents available to today's social network providers. Providers and law enforcement tend to use these large logs, in anonymized form, for automatic online Sexual Predator Identification (SPI), a relatively new application area. The task plays an important role in protecting children and juveniles from exploitation by online predators. Pattern recognition techniques help law enforcement automatically identify harmful conversations in cyberspace. These techniques usually require a large volume of high-quality training instances of both predatory and non-predatory documents. However, collecting non-predatory documents is not practical in real-world applications, since this category contains a large variety of documents covering many topics, including politics, sports, science, and technology. We used a new semi-supervised approach to mitigate this problem by adapting an anomaly detection technique, the One-Class Support Vector Machine (SVM), which does not require non-predatory samples for training. We compared the performance of this approach against other state-of-the-art methods that use both positive and negative instances. We observed that although the anomaly detection approach uses only one class label for training (a very desirable property in practice), its performance is comparable to that of binary SVM classification. In addition, this approach outperforms the classic two-class Naïve Bayes algorithm, which we used as our baseline, in terms of both classification accuracy and precision.
@article{ebrahimi2016recognizing, title={Recognizing predatory chat documents using semi-supervised anomaly detection},
author={Ebrahimi, Mohammadreza and Suen, Ching Y and Ormandjieva, Olga and Krzyzak, Adam}, journal={Electronic Imaging},
volume={2016}, number={17}, pages={1--9}, year={2016}, publisher={Society for Imaging Science and Technology}}
Identifying High-Impact Opioid Products and Key Sellers in Dark Net Marketplaces: An Interpretable Text Analytics Approach
P. Du, M. Ebrahimi, N. Zhang, H. Chen, R. A. Brown, S. Samtani
IEEE International Conference on Intelligence and Security Informatics (ISI), pp. 110-115, Shenzhen, China, Jul. 1-3, 2019.
IEEE ISI'19
As Internet-based applications become increasingly ubiquitous, drug retailing on Dark Net Marketplaces (DNMs) has raised public health and law enforcement concerns due to its highly accessible and anonymous nature. To combat illegal drug transactions on DNMs, authorities often require agents to impersonate DNM customers in order to identify key actors within the community. This process can be costly in time and resources. Prior research on DNMs has provided a better understanding of DNM characteristics and drug sellers' behavior. Building on this existing work, researchers can further leverage predictive analytics techniques to take proactive measures and reduce the associated costs. To this end, we propose a systematic analytical approach to identify key opioid sellers in DNMs. Using machine learning and text analysis, this research predicts high-impact opioid products in two major DNMs. By linking the high-impact products to their sellers, we then identify the key opioid sellers within the communities. This work aims to help law enforcement authorities formulate strategies by providing specific targets within DNMs and to reduce the time and resources required to prosecute criminals and eliminate them from the market.
@inproceedings{du2019identifying, title={Identifying High-Impact Opioid Products and Key Sellers in Dark Net Marketplaces:
An Interpretable Text Analytics Approach}, author={Du, Po-Yi and Ebrahimi, Mohammadreza and Zhang, Ning and Chen, Hsinchun and
Brown, Randall A and Samtani, Sagar}, booktitle={2019 IEEE International Conference on Intelligence and Security Informatics (ISI)},
pages={110--115}, year={2019}, organization={IEEE}}
Dark-Net Ecosystem Cyber-Threat Intelligence (CTI) Tool
N. Arnold, M. Ebrahimi, N. Zhang, B. Lazarine, M. Patton, H. Chen, S. Samtani
IEEE International Conference on Intelligence and Security Informatics (ISI), pp. 92-97, Shenzhen, China, IEEE, Jul. 1-3, 2019.
IEEE ISI'19
The frequency and costs of cyber-attacks are increasing each year. By the end of 2019, the total cost of data breaches is expected to reach $2.1 trillion, driven by the ever-growing online presence of enterprises and their consumers. The tools to perform these attacks, as well as the breached data, can often be purchased on the Dark-net. Many threat actors in this realm use its various platforms to broker, discuss, and strategize about these cyber-threat assets. To combat these attacks, researchers are developing Cyber-Threat Intelligence (CTI) tools to proactively monitor the ever-growing online hacker community. This paper details the creation and use of a CTI tool that leverages a social network to identify cyber-threats across major Dark-net data sources. Through this network, emerging threats can be quickly identified so that proactive or reactive security measures can be implemented.
@inproceedings{arnold2019dark, title={Dark-Net Ecosystem Cyber-Threat Intelligence (CTI) Tool}, author={Arnold, Nolan and
Ebrahimi, Mohammadreza and Zhang, Ning and Lazarine, Ben and Patton, Mark and Chen, Hsinchun and Samtani, Sagar},
booktitle={2019 IEEE International Conference on Intelligence and Security Informatics (ISI)}, pages={92--97}, year={2019},
organization={IEEE}}
Identifying, Collecting, and Presenting Hacker Community Data: Forums, IRC, Carding Shops, and DNMs
P. Du, N. Zhang, M. Ebrahimi et al.
IEEE International Conference on Intelligence and Security Informatics (ISI), pp. 70-75, Miami, FL, Nov. 8-10, 2018.
IEEE ISI'18
Cyber-attacks cost the global economy over $450 billion annually. To combat this issue, researchers and practitioners have put enormous effort into developing Cyber Threat Intelligence (CTI), the process of identifying emerging threats and key hackers. However, reliance on internal network data has resulted in inherently reactive intelligence. CTI experts have stressed the importance of proactively studying the large, ever-evolving online hacker community. Despite their CTI value, collecting data from hacker community platforms is a non-trivial task. In this paper, we summarize our efforts in systematically identifying and automatically collecting large-scale data from hacker forums, carding shops, Internet Relay Chat (IRC), and Dark Net Marketplaces (DNMs). We also present our efforts to provide this data to the larger CTI community via the AZSecure Hacker Assets Portal (www.azsecure-hap.com). With our methodology, we collected data from 102 platforms, for a total of 43,981,647 records. To the best of our knowledge, this compilation of hacker community data is the largest such collection in academia.
@inproceedings{du2018identifying, title={Identifying, Collecting, and Presenting Hacker Community Data: Forums, IRC,
Carding Shops, and DNMs}, author={Du, Po-Yi and Zhang, Ning and Ebrahimi, Mohammadreza and Samtani, Sagar and Lazarine, Ben and
Arnold, Nolan and Dunn, Rachael and Suntwal, Sandeep and Angeles, Guadalupe and Schweitzer, Robert and others},
booktitle={2018 IEEE International Conference on Intelligence and Security Informatics (ISI)}, pages={70--75}, year={2018},
organization={IEEE}}
Detecting and Investigating Crime by Means of Data Mining: A General Crime Matching Framework
M. Keyvanpour, M. Javideh, M. Ebrahimi
World Conference on Information Technology 2010, Procedia Computer Science, Volume 3, pp. 872-880, Edited by Adem Karahoca, Sezer, 2011.
WorldCIST
Data mining is a way to extract knowledge from (usually large) data sets; in other words, it is an approach to discovering hidden relationships in data using artificial intelligence methods. The wide range of data mining applications has made it an important field of research. Criminology is one of the most important fields for applying data mining. Criminology is a process that aims to identify crime characteristics; crime analysis, in particular, involves exploring and detecting crimes and their relationships with criminals. The high volume of crime data and the complexity of the relationships within it make criminology an appropriate field for applying data mining techniques. Identifying crime characteristics is the first step toward further analysis. The knowledge gained from data mining approaches is a very useful tool that can help and support police forces. This paper discusses an approach based on data mining techniques to extract important entities from police narrative reports written in plain text. Using this approach, law enforcement agencies can automatically enter crime data into a database. We also applied a Self-Organizing Map (SOM) clustering method to crime analysis, and finally we use the clustering results to perform crime matching.
@article{keyvanpour2011detecting, title={Detecting and investigating crime by means of data mining: a general crime matching
framework}, author={Keyvanpour, Mohammad Reza and Javideh, Mostafa and Ebrahimi, Mohammad Reza}, journal={Procedia Computer
Science}, volume={3}, pages={872--880}, year={2011}, publisher={Elsevier}}