MULTILINGUAL HACKER ASSET IDENTIFICATION LEXICON
Version 1, 2020, Mohammadreza Ebrahimi, Ashley Ireson
This is a working lexicon, and we will provide updates as we come upon additional relevant sources. Contributors are welcome!
While renowned cybersecurity lexicons of general terms exist, to our knowledge there is no multilingual lexicon specifically for hacker asset identification. We therefore constructed a customized lexicon of 1,059 hacker asset identifiers by compiling and modifying five publicly available lexicons and by incorporating indicators suggested by two subject matter experts who reviewed product descriptions from large dark web platforms. The compilation process is detailed in the following seven steps:
- Step 1: Start with an extensive, existing lexicon. “Explore Terms,” a cybersecurity lexicon compiled by the Department of Homeland Security (DHS) (NICCS 2019), was constructed as part of the National Initiative for Cybersecurity Careers and Studies (NICCS) program. To our knowledge, it is the most credible and one of the most comprehensive lexicons. It also complements other lexicons, such as the NISTIR (National Institute of Standards and Technology Internal Reports). While this lexicon provides a good starting point, it lacks some hacker asset indicators (e.g., ‘XSS’ (cross-site scripting), ‘zero-day,’ and ‘ransomware’).
- Step 2: Add other lexicons. To extend the coverage of jargon and acronyms related to cyber threat detection, we expanded this lexicon with three smaller but more up-to-date publicly available lexicons from three websites (Arvatz 2017; DarkOwl 2019; Motherboard 2019).
- Step 3: Append common indicators. In consultation with a cybersecurity expert, we added common threat indicators. The expert contributed almost 55 threat-related keywords that they frequently observed on the dark web (e.g., dump, dox, Citadel, and Mirai).
- Step 4: Include malware variants. We added 642 malicious file variants from the New Jersey Cybersecurity and Communications Integration Cell (NJCCIC) (“Cyber Threat Profiles” 2020), including botnets (54), mobile malware (94), ransomware (234), Trojan variants (161), and other malware types (99).
- Step 5: Add related terms. To add terms related to hacker asset identification on the dark web, two cybersecurity experts scanned 15,000 product descriptions from two large, well-known dark web platforms and added 148 items that did not exist in the previous steps. For example, carding terminology (e.g., fullz and dump) was added at this step.
- Step 6: Remove overly general terms. After compilation, to improve the accuracy of the lexicon, we removed some very general terms (e.g., architecture) that come from the DHS lexicon and may appear in contexts other than cybersecurity.
- Step 7: Translate the lexicon. We translated the English indicators into Russian, Italian, and French to complete our lexicon. Acronyms such as RAT (remote access Trojan) could not be translated, so we kept them in English; we have observed that, in the hacker context, English words are also used in other languages.
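
For readers who want to reproduce a similar workflow, the merge, filter, and acronym-handling steps above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used to build the lexicon: the function names (`normalize`, `compile_lexicon`, `is_acronym`, `translate_term`), the case-insensitive duplicate rule, the all-uppercase acronym heuristic, and the toy term lists (built only from examples mentioned in the steps) are all assumptions.

```python
from typing import Dict, Iterable, List


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so duplicate terms compare equal (assumed rule)."""
    return " ".join(term.lower().split())


def compile_lexicon(sources: Iterable[Iterable[str]],
                    general_terms: Iterable[str]) -> List[str]:
    """Merge term lists in order (Steps 1-5), skipping duplicates and
    overly general terms (Step 6). The first spelling seen is kept."""
    blocked = {normalize(t) for t in general_terms}
    seen = set()
    merged = []
    for terms in sources:
        for term in terms:
            key = normalize(term)
            if not key or key in seen or key in blocked:
                continue
            seen.add(key)
            merged.append(term.strip())
    return merged


def is_acronym(term: str) -> bool:
    """Heuristic assumption: an all-uppercase alphabetic token is an acronym."""
    return term.isalpha() and term.isupper()


def translate_term(term: str, lang: str,
                   table: Dict[str, Dict[str, str]]) -> str:
    """Step 7: keep acronyms in English; otherwise look the term up in a
    caller-supplied translation table, falling back to the English term."""
    if is_acronym(term):
        return term
    return table.get(normalize(term), {}).get(lang, term)


# Toy stand-ins for the source lists; the real lists are far larger.
sources = [
    ["architecture"],                   # base lexicon (Step 1)
    ["XSS", "zero-day", "ransomware"],  # additional lexicons (Step 2)
    ["dump", "dox", "Citadel", "Mirai"],  # expert keywords (Step 3)
    ["Mirai"],                          # malware variants (Step 4)
    ["fullz", "dump", "RAT"],           # marketplace terms (Step 5)
]
lexicon = compile_lexicon(sources, general_terms=["architecture"])
print(lexicon)
print([t for t in lexicon if is_acronym(t)])
```

Keeping the merge order-preserving makes it easy to trace each retained term back to the step that first introduced it. The translation table is left to the caller because the translations themselves come from manual work, not from code.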