As part of our partnership with Hugging Face, Foundation AI operates custom-fit signatures that detect deserialization risks in common model file formats such as .pt and .pkl, in addition to scanning for traditional malware.
ClamAV scans every file in the repository against known malware patterns (known as "signatures"). A single file can match multiple signatures.
If the signature in your alert begins with the string Py.Malware, it is for model-specific malware.
Each signature looks for import(s) of a specific Python module within the serialized model file. These modules are considered suspicious in the context of ML models.
In the example above, Py.Malware.NetAccess_webbrowser_ANY_GLOBAL indicates that the model is importing the webbrowser module during deserialization, which constitutes a potential network access threat (NetAccess).
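To see what "importing a module during deserialization" looks like inside a pickle, the sketch below lists the module and attribute references in a pickle byte stream using Python's standard pickletools module. It only illustrates the idea and is not how the ClamAV signatures are implemented. The helper name referenced_globals is hypothetical, and the STACK_GLOBAL handling is a heuristic that assumes the two most recently pushed strings are the module and attribute names.

```python
import pickle
import pickletools
import webbrowser

# Opcodes that push a string onto the pickle stack.
STRING_OPCODES = {
    "STRING", "BINSTRING", "SHORT_BINSTRING",
    "UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8",
}


def referenced_globals(data: bytes) -> list[tuple[str, str]]:
    """Return (module, name) pairs referenced by GLOBAL/STACK_GLOBAL opcodes.

    The stream is only disassembled, never loaded, so nothing is executed.
    """
    refs = []
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in STRING_OPCODES:
            recent_strings.append(arg)
        elif opcode.name == "GLOBAL":
            # pickletools reports the argument as "module name".
            module, _, name = arg.partition(" ")
            refs.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Heuristic: module and name are the last two strings pushed.
            refs.append((recent_strings[-2], recent_strings[-1]))
    return refs


# Pickling a function stores a reference to it, not its code.
payload_v2 = pickle.dumps(webbrowser.open, protocol=2)  # uses GLOBAL
payload_v4 = pickle.dumps(webbrowser.open, protocol=4)  # uses STACK_GLOBAL
print(referenced_globals(payload_v2))
print(referenced_globals(payload_v4))
```

A stream that references webbrowser in this way is the kind of content a NetAccess signature is meant to catch.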
All model malware signatures follow the {risk category}_{module}_{function}_{opcode} naming convention.
Code Execution (CodeExec): Allows attackers to execute arbitrary code within the target environment, potentially leading to full system compromise, unauthorized access, and data manipulation.
System Access (SysAccess): Allows attackers to execute system-level commands on the host OS, which can result in unauthorized access, privilege escalation, and control over the system.
Network Access (NetAccess): Allows attackers to exploit weaknesses in network communications or remote access, enabling unauthorized access, data interception, or remote system control.
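The naming convention and the risk categories above can be combined in a small parser, sketched below. The function and class names are hypothetical. The sketch assumes the signature name has exactly the shape shown in the example (a Py.Malware. prefix followed by the four underscore-separated parts) and that the function and opcode parts contain no underscores, which may not hold for every signature.

```python
from dataclasses import dataclass

PREFIX = "Py.Malware."

# Abbreviations taken from the risk category list above.
RISK_CATEGORIES = {
    "CodeExec": "Code Execution",
    "SysAccess": "System Access",
    "NetAccess": "Network Access",
}


@dataclass(frozen=True)
class ModelSignature:
    risk_category: str
    module: str
    function: str
    opcode: str


def parse_model_signature(name: str) -> ModelSignature:
    """Split a model malware signature name into its four parts."""
    if not name.startswith(PREFIX):
        raise ValueError(f"not a model-specific signature: {name!r}")
    body = name[len(PREFIX):]
    category, sep, rest = body.partition("_")
    # Split from the right so that a module name may contain underscores.
    parts = rest.rsplit("_", 2)
    if not sep or len(parts) != 3 or category not in RISK_CATEGORIES:
        raise ValueError(f"unexpected signature format: {name!r}")
    module, function, opcode = parts
    return ModelSignature(RISK_CATEGORIES[category], module, function, opcode)


print(parse_model_signature("Py.Malware.NetAccess_webbrowser_ANY_GLOBAL"))
```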
Our signatures are intended to flag only modules that have virtually no justification for use during model deserialization (e.g., subprocess, requests.post, socketserver), but false positives may occur.
This helps confirm whether the file has been flagged by other security vendors and provides additional context about the threat.
The file hash is available in the SHA256 field on the Hugging Face file details page.
If one does not exist, ask the provider to release one!
Suppose an alert flags "system access via os.environ" in a .jsonl dataset containing code snippet data.
Because the file is not pickle-based, it's unlikely that there is a pickle deserialization risk. Furthermore, because the dataset intentionally contains code snippets, it's reasonable for it to contain instances of "os.environ" (though inspecting the actual references is always a good measure).
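Inspecting the actual references in a case like this can be done with a few lines of Python. The sketch below is one possible approach and not part of the scanner: it reports which lines of a .jsonl file mention a given string and, where a line parses as a JSON object, which top-level string fields contain it. The function name and the "code" field in the sample data are hypothetical.

```python
import json


def find_references(jsonl_lines, needle):
    """Return (line_number, field_names) for each line mentioning needle.

    field_names is None when the line is not valid JSON.
    """
    hits = []
    for line_no, line in enumerate(jsonl_lines, start=1):
        line = line.strip()
        if not line or needle not in line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            hits.append((line_no, None))
            continue
        fields = []
        if isinstance(record, dict):
            fields = [
                key for key, value in record.items()
                if isinstance(value, str) and needle in value
            ]
        hits.append((line_no, fields))
    return hits


sample = [
    '{"code": "import os\\nprint(os.environ)"}',
    '{"code": "x = 1"}',
]
print(find_references(sample, "os.environ"))
```

For a real dataset, the lines can come from an open file handle, for example find_references(open(path, encoding="utf-8"), "os.environ").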
If the signature in your alert does not begin with the string Py.Malware, it is a standard ClamAV malware signature.
ClamAV has been detecting malware since 2002! Millions of unique signatures are available out of the box, detecting trojans, botnets, crypto miners, loaders, and other malware.
This helps confirm whether the file has been flagged by other security vendors and provides additional context about the threat.
The file hash is available in the SHA256 field on the Hugging Face file details page.
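If you have already downloaded the file, you can also compute its SHA-256 hash locally and compare it with the SHA256 field. The following is a minimal sketch using Python's standard hashlib module. The function name is hypothetical, and the file is read in chunks so that large model files do not need to fit in memory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex-encoded SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned hex string should match the SHA256 field exactly if the local copy is identical to the hosted file.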
Each ClamAV signature is designed to detect a specific malware pattern within a file's contents.
You can inspect the underlying pattern by decoding the signature with sigtool.
For example, if the scanner in Hugging Face detected a virus called Eicar-Signature (a known test file in malware detection), we could decode that signature like so:
The final line contains the actual string that was detected in the file (X5O ... +H*).
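The decoding step described above can be scripted. The sketch below wraps sigtool from Python and assumes that ClamAV is installed locally, that sigtool is on the PATH, and that a signature database containing the signature is available. It pipes the output of sigtool's --find-sigs option into --decode-sigs. The helper names are hypothetical.

```python
import subprocess


def sigtool_commands(signature_name):
    """Build the two sigtool invocations: find the signature, then decode it."""
    return [
        ["sigtool", f"--find-sigs={signature_name}"],
        ["sigtool", "--decode-sigs"],
    ]


def decode_signature(signature_name):
    """Return sigtool's decoded description of the named signature.

    Raises FileNotFoundError if sigtool is not installed, and
    subprocess.CalledProcessError if either command fails.
    """
    find_cmd, decode_cmd = sigtool_commands(signature_name)
    found = subprocess.run(find_cmd, capture_output=True, text=True, check=True)
    # Feed the matching signature lines to the decoder on standard input.
    decoded = subprocess.run(
        decode_cmd, input=found.stdout, capture_output=True, text=True, check=True
    )
    return decoded.stdout
```

On a machine with ClamAV installed, print(decode_signature("Eicar-Signature")) would show the decoded fields for that signature. Note that --find-sigs treats its argument as a regular expression, so names containing special characters may need escaping.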