SCT Topic 10: Advanced Strategies
Stealthiness Strategies
Before launching the attack, the attacker should check whether AV (anti-virus) software or filters would detect it; otherwise, their actions will leave warnings or traces on the target system
- Encoding (e.g., msfencode, possibly chaining several encoders at once): changing the encoding of the file means the signature check used in malware analysis may not be triggered
- Transient malware, e.g., running only in memory
- Mimicry: imitate a legitimate application process to escape anti-virus detection
- Packers: obfuscate (e.g., compress or encrypt) the malicious code and unpack it with a routine at runtime
Note: Offline versions of AVs offer less functionality
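Why encoding can defeat signature checks can be illustrated with a toy byte-signature scanner. This is a minimal sketch, assuming a hypothetical signature `EVIL_MARKER_123` and harmless sample bytes; it uses a single-byte XOR as a stand-in for an encoder and does not reproduce what msfencode actually does. Real AV engines also rely on heuristics and emulation, so defeating a literal byte match is not the same as evading them.

```python
# Toy signature scanner: flags data if it contains a known byte signature verbatim.
# SIGNATURES and the sample bytes are hypothetical placeholders, not real malware.
SIGNATURES = [b"EVIL_MARKER_123"]


def signature_scan(data: bytes) -> bool:
    """Return True if any known signature appears verbatim in the data."""
    return any(sig in data for sig in SIGNATURES)


def xor_encode(data: bytes, key: int) -> bytes:
    """Single-byte XOR (stand-in encoder): reversible, applying it twice restores the input."""
    return bytes(b ^ key for b in data)


sample = b"header...EVIL_MARKER_123...footer"
encoded = xor_encode(sample, 0x5A)

print(signature_scan(sample))               # True: raw bytes match the signature
print(signature_scan(encoded))              # False: encoded bytes no longer contain it
print(xor_encode(encoded, 0x5A) == sample)  # True: decoding restores the original
```

The last line is also why packers ship an unpacking routine: the original bytes must be restored before the code can run, which is the moment runtime and memory-based detection can still catch it.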
Persistence Strategies
Attackers also want to keep their malware running on the target system
- Payloads in Metasploit (e.g., reverse shell, Meterpreter) for further interaction
- Scheduled tasks, launched every time the user reboots the system
- Backdoors, left behind for further interaction and exploitation
- In BeEF, a stored XSS remains valid whenever the victim visits the page
It is also possible to perform “privilege escalation” and “sandbox evasion”
Keep up to date
Keep track of what the bad guys are doing
- Technical Reports
- MITRE CVEs - https://cve.mitre.org/about/documents.html
- Google Project Zero
- Exploit DB
- https://www.reddit.com/r/netsec/
Adversarial Machine Learning
- Definition of machine learning
- “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” - Tom Mitchell, Machine Learning
Machine Learning for Security
- 5 Phases of ML for Security
- Data Collection
- Pre-processing and Feature Engineering
- Model Selection and Training
- Testing and Evaluation
- Evaluation against Time Evolution and Adversaries
Machine Learning Algorithm Categories
- Classification: given a labeled dataset, find a model that separates instances into classes
- Regression: given some points, try to generalise and predict real-valued numbers
- Clustering: given an unlabeled dataset, try to group similar elements
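To make the three categories concrete, here is a minimal pure-Python sketch with one textbook instance of each: nearest-centroid classification, an ordinary least-squares line fit for regression, and 1-D k-means (k = 2) for clustering. These particular algorithms are chosen here for illustration, not prescribed by the notes, and the toy feature values and "benign"/"malware" labels are hypothetical.

```python
from statistics import mean


# Classification (labeled data): nearest-centroid on 1-D features.
def train_centroids(xs, ys):
    """Return the mean feature value of each class label."""
    return {label: mean(x for x, y in zip(xs, ys) if y == label) for label in set(ys)}


def classify(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Regression (real-valued output): ordinary least squares fit of y = a*x + b.
def fit_line(xs, ys):
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx


# Clustering (no labels): 1-D k-means with k = 2.
# Assumes at least two distinct values so neither group is ever empty.
def kmeans_1d(xs, iters=10):
    c0, c1 = min(xs), max(xs)  # deterministic initialisation at the extremes
    for _ in range(iters):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = mean(g0), mean(g1)
    return sorted(g0), sorted(g1)


features = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["benign"] * 3 + ["malware"] * 3
print(classify(train_centroids(features, labels), 4.5))  # malware
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))              # (2.0, 1.0)
print(kmeans_1d([1, 2, 3, 10, 11, 12]))                  # ([1, 2, 3], [10, 11, 12])
```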
Performance metrics
I used to be confused by the metrics used in ML; I think the difficult point is how to define Positive and Negative.
Assume we are doing malware classification, so our task is to determine whether a sample is malware or not.
If we decide a program is malware, we call it a Positive result; otherwise, it is a Negative result.
- Thus, a True Positive means we think a program is malware, and it actually is malware
- A True Negative means we think a program is goodware, and it actually is goodware
- A False Positive means we think a program is malware, but it is actually goodware; we also call this a false alarm
- A False Negative means we think a program is goodware, but it is actually malware
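The four cases can be counted directly from a list of predictions and ground-truth labels. A minimal sketch, assuming `True` encodes malware (Positive) and `False` encodes goodware (Negative); the sample lists are hypothetical.

```python
def confusion_counts(predicted, actual):
    """Count (TP, TN, FP, FN), treating True as 'malware' (Positive)."""
    tp = tn = fp = fn = 0
    for pred, truth in zip(predicted, actual):
        if pred and truth:
            tp += 1  # flagged malware, really malware
        elif not pred and not truth:
            tn += 1  # flagged goodware, really goodware
        elif pred and not truth:
            fp += 1  # false alarm: flagged malware, really goodware
        else:
            fn += 1  # missed: flagged goodware, really malware
    return tp, tn, fp, fn


predicted = [True, True, False, False, True, False]
actual = [True, False, False, True, True, False]
print(confusion_counts(predicted, actual))  # (2, 2, 1, 1)
```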
Precision and Recall
- Precision = TP / (TP + FP). It answers “when you flag a sample as malware, how often are you right?”; TP + FP is the total number of samples the ML model labels as malware (positive)
- Recall = TP / (TP + FN). It answers “how much of the malware did you find?”; TP + FN is all the actual malware in the dataset
- F1-Score is defined as the harmonic mean of Precision and Recall: F1-Score = 2 x (Precision x Recall) / (Precision + Recall)
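The formulas translate directly into code. This sketch uses hypothetical counts (80 TP, 20 FP, 40 FN); returning 0.0 when a denominator is zero is a convention chosen here, since the metrics are undefined in that case.

```python
def precision(tp, fp):
    """Fraction of samples flagged as malware that really are malware."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Fraction of all real malware that was flagged."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical counts: 80 correct detections, 20 false alarms, 40 missed malware.
tp, fp, fn = 80, 20, 40
p, r = precision(tp, fp), recall(tp, fn)
print(p, r, f1_score(p, r))  # precision 0.8, recall ~0.667, F1 ~0.727
```

Because F1 is a harmonic mean, it stays closer to the weaker of the two metrics than an arithmetic mean would (here about 0.727 rather than about 0.733).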
Accuracy = (TP + TN) / (TP + FP + TN + FN), the fraction of all decisions made by the ML model that are correct
- Is Accuracy a good metric in security?
- Accuracy can be misleading when datasets are very imbalanced
- In a malware dataset, the ratio of goodware to malware could be around 1000 : 1, so benign programs are far easier to find than malware; a classifier that labels every sample as goodware would then reach over 99% accuracy while detecting no malware at all
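The accuracy pitfall can be checked numerically. A minimal sketch with a hypothetical dataset of 10,000 goodware and 10 malware samples (the 1000 : 1 ratio from the notes) and a trivial classifier that labels everything as goodware:

```python
# Hypothetical imbalanced dataset: True = malware (Positive), False = goodware.
n_goodware, n_malware = 10_000, 10
actual = [False] * n_goodware + [True] * n_malware
predicted = [False] * len(actual)  # trivial "always goodware" classifier

tp = sum(p and a for p, a in zip(predicted, actual))
tn = sum(not p and not a for p, a in zip(predicted, actual))
fn = sum(not p and a for p, a in zip(predicted, actual))

accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn)
print(f"accuracy={accuracy:.4f} recall={recall:.1f}")  # accuracy=0.9990 recall=0.0
```

The classifier scores about 99.9% accuracy yet finds none of the malware, which is why precision, recall, and F1 are usually preferred for imbalanced security data.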
Evasion
An attacker may try to evade detection or poison training data
- Spam filtering: features are the presence/absence of words; attacks are bad-word obfuscation / good-word insertion
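Both spam-filter attacks can be shown against a toy word-presence filter. This is a hypothetical sketch: the word lists, the scoring rule (+1 per spammy word, -1 per legitimate word), and the threshold are assumptions for illustration, not a real spam filter.

```python
# Hypothetical vocabularies for a toy word-presence spam filter.
SPAM_WORDS = {"viagra", "lottery", "winner", "free"}
GOOD_WORDS = {"meeting", "report", "thanks", "schedule"}


def spam_score(message: str) -> int:
    """+1 per distinct spammy word present, -1 per distinct legitimate word present."""
    words = set(message.lower().split())
    return len(words & SPAM_WORDS) - len(words & GOOD_WORDS)


def is_spam(message: str, threshold: int = 1) -> bool:
    return spam_score(message) >= threshold


original = "lottery winner claim your free prize"
obfuscated = "l0ttery w1nner claim your fr3e prize"             # bad-word obfuscation
padded = original + " thanks for the meeting report schedule"   # good-word insertion

print(is_spam(original), is_spam(obfuscated), is_spam(padded))  # True False False
```

Obfuscation removes the features the filter looks for, while good-word insertion leaves them in place and outweighs them; both change only the test-time input, not the trained model.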
Adversarial Machine Learning: Taxonomy
- Test-time Evasion
- Training-time Poisoning
- Inference Attacks
- Model Stealing
- Membership Inference
- Shadow Model Estimation