SCT Topic 10: Advanced Strategies

Stealthiness Strategies

Before launching an attack, the attacker should check whether AV (anti-virus) software or filters detect it; otherwise, their actions may trigger warnings or leave traces on the target system.

Encoding (e.g., msfencode, possibly chaining several encoders at once): changing the encoding of the file means the signature checks used in malware analysis may not be triggered.
Transient malware: e.g., malware that runs only in memory
Mimicry: imitate a legitimate application process to evade the anti-virus
Packers: obfuscate the malicious code and unpack it with a routine at runtime

Note: Offline versions of AVs offer less functionality

Persistence Strategies

Attackers also want to keep their malware running on the target system

Payloads in Metasploit (e.g., reverse shell, Meterpreter) for further interaction
Scheduled tasks: relaunch the malware every time the user reboots the system
Backdoors: leave a backdoor for further interaction and exploitation
In BeEF, a stored XSS remains valid whenever a victim visits the page

It is also possible to perform “privilege escalation” and “sandbox evasion”

Keep up to date

Keep track of what bad guys are doing

Technical Reports
MITRE CVEs - https://cve.mitre.org/about/documents.html
Google Project Zero
Exploit DB
https://www.reddit.com/r/netsec/

Adversarial Machine Learning

Definition of Machine Learning

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E

Tom Mitchell, Machine Learning

Machine Learning for Security

5 Phases of ML for Security

Data Collection
Pre-processing and Feature Engineering
Model Selection and Training
Testing and Evaluation
Evaluation against Time Evolution and Adversaries

Machine Learning Algorithms Categories

Classification: given a labeled dataset, find a model that separates instances into classes
Regression: given some points, try to generalise and predict real-valued numbers
Clustering: given an unlabeled dataset, try to group similar elements

Performance metrics

I used to be confused by the metrics used in ML. I think the difficult part is how to define Positive and Negative.

Assume we are doing malware classification, so our goal is to determine whether a sample is malware or not.

If we decide that a program is malware, we call it a Positive result; otherwise, it is a Negative result.

Thus, a True Positive means we think a program is malware, and it actually is malware
a True Negative means we think a program is goodware, and it actually is goodware
a False Positive means we think a program is malware, but it is actually goodware; this is also called a false alarm
a False Negative means we think a program is goodware, but it is actually malware
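
To make the four cases concrete, here is a minimal sketch that counts them for a handful of programs. The labels and the helper name `confusion_counts` are made up for illustration; "malware" is treated as the Positive class, as in the definitions above.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive="malware"):
    """Count TP, TN, FP, FN, treating `positive` as the Positive class."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # We said "malware": right if it really is, otherwise a false alarm
            counts["TP" if actual == positive else "FP"] += 1
        else:
            # We said "goodware": a miss if it was malware, otherwise correct
            counts["FN" if actual == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "TN", "FP", "FN")}

# Hypothetical ground truth and classifier decisions for six programs
y_true = ["malware", "malware", "goodware", "goodware", "goodware", "malware"]
y_pred = ["malware", "goodware", "goodware", "malware", "goodware", "malware"]

result = confusion_counts(y_true, y_pred)
print(result)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```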

Precision and Recall

Precision = TP / (TP + FP). It answers “Of the samples you flagged as malware, how many really are malware?”; TP + FP is the total number of samples labeled as malware (positive) by the ML model

Recall = TP / (TP + FN). It answers “How much of the malware did you find?”; TP + FN is all the actual malware in the dataset

F1-Score is defined as the harmonic mean of Precision and Recall

F1-Score = 2 x (Precision x Recall) / (Precision + Recall)
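
As a quick worked example, the three formulas above can be written out as follows. The counts (80 TP, 20 FP, 40 FN) are invented, and returning 0.0 when a denominator is zero is just one common convention, not something the definitions require.

```python
def precision(tp, fp):
    """TP / (TP + FP); returns 0.0 if nothing was flagged (assumed convention)."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """TP / (TP + FN); returns 0.0 if there is no actual malware (assumed convention)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Hypothetical counts from a malware classifier
tp, fp, fn = 80, 20, 40

p = precision(tp, fp)   # 80 / 100
r = recall(tp, fn)      # 80 / 120
f1 = f1_score(p, r)

print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# precision=0.800 recall=0.667 f1=0.727
```

Because it is a harmonic mean, F1 is pulled towards the lower of the two values, so a classifier cannot get a high F1 by being good at only one of them.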

Accuracy = (TP + TN) / (TP + FP + TN + FN), the fraction of all decisions made by the ML model that are correct
- Is Accuracy a good metric in Security?
  - Accuracy can be misleading when datasets are very imbalanced
  - In a malware dataset, the ratio of goodware to malware could be something like 1000 : 1, so benign programs are far easier to find than malware, and a model that labels everything as benign can easily reach over 99% accuracy without detecting any malware
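
The imbalance problem is easy to check with a small sketch. It assumes a made-up dataset with exactly 1000 goodware samples and 1 malware sample, and a useless "classifier" that always answers benign (0 = goodware, 1 = malware).

```python
# Hypothetical imbalanced dataset: 1000 goodware (0) and 1 malware (1)
n_goodware, n_malware = 1000, 1
y_true = [0] * n_goodware + [1] * n_malware

# A "classifier" that always predicts goodware
y_pred = [0] * len(y_true)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)

print(f"accuracy = {accuracy:.4f}")  # accuracy = 0.9990
print(f"recall   = {recall:.4f}")    # recall   = 0.0000
```

Accuracy looks excellent even though not a single malware sample is found, which is why Precision and Recall are more informative here.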

Evasion

An attacker may try to evade detection or poison training data

Spam Filtering

Features: presence/absence of words
Attacks: bad word obfuscation / good word insertion
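
Both attacks can be illustrated on a toy filter. This is only a sketch: the word weights, the threshold, and the messages are invented, and a real spam filter would learn its weights from data rather than use a hand-written table.

```python
import re

# Hypothetical per-word weights: positive looks spammy, negative looks legitimate
WEIGHTS = {
    "viagra": 3.0, "free": 1.5, "winner": 2.0,      # "bad" words
    "meeting": -2.5, "report": -2.0, "thanks": -1.5,  # "good" words
}
THRESHOLD = 2.0  # assumed decision threshold

def features(text):
    """Presence/absence features: the set of known words found in the message."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return {token for token in tokens if token in WEIGHTS}

def score(text):
    """Sum the weights of the known words that are present (each counted once)."""
    return sum(WEIGHTS[word] for word in features(text))

def is_spam(text):
    return score(text) >= THRESHOLD

original = "Free viagra for every winner"
# Bad word obfuscation: the spammy words no longer match any feature
obfuscated = "Fr3e v1agra for every w1nner"
# Good word insertion: legitimate-looking words pull the score down
padded = original + " thanks meeting report"

for message in (original, obfuscated, padded):
    print(f"{score(message):5.1f}  spam={is_spam(message)}  {message!r}")
```

In this toy setup the original message scores 6.5 and is flagged, while the obfuscated one scores 0.0 and the padded one 0.5, so both slip under the threshold.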

Adversarial Machine Learning: Taxonomy

Test-time Evasion
Training-time Poisoning
Inference Attacks
Model Stealing

Membership Inference

Shadow Model Estimation