What is serialization and deserialization? How does it work in Python?
Serialization transforms an object into a byte stream. Deserialization is the inverse process: taking a byte stream to create an object.
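As a minimal sketch of that round trip, the snippet below uses the standard library's `pickle` module. The sample dictionary and variable names are made up for illustration.

```python
import pickle

# Hypothetical sample object used only for illustration.
original = {"user": "alice", "scores": [1, 2, 3]}

# Serialization: object -> byte stream.
payload = pickle.dumps(original)

# Deserialization: byte stream -> a new, equal object.
restored = pickle.loads(payload)

print(type(payload).__name__)  # bytes
print(restored == original)    # True
```

The `payload` bytes are what would be written to a file or sent over the network.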
The reason for saving objects to byte streams and restoring them is to communicate objects through the filesystem and the network. For example, in a distributed system, the server may receive objects from a client, or send objects back to it.
Why is deserialization unsafe?
Deserialization is unsafe for one simple reason: you cannot trust the bytes passed to you. While there is no problem with serializing data (e.g., sending it to someone else), deserialization takes an arbitrary byte stream and converts it into an object. There is no guarantee that the resulting object is safe to use, and the byte stream can include code that may compromise your system.
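To make the risk concrete, here is a small sketch using the standard `pickle` module. The `Exploit` class is hypothetical, and the harmless expression `6 * 7` stands in for whatever an attacker would actually run.

```python
import pickle


class Exploit:
    # Hypothetical attacker-controlled class. __reduce__ tells pickle which
    # callable to invoke, and with which arguments, when the data is loaded.
    def __reduce__(self):
        # Harmless stand-in: an attacker could name any importable callable.
        return (eval, ("6 * 7",))


malicious_bytes = pickle.dumps(Exploit())

# The receiver only calls pickle.loads, yet eval("6 * 7") runs during loading.
result = pickle.loads(malicious_bytes)
print(result)  # 42, not an Exploit instance
```

The receiver never needs the `Exploit` class: the byte stream itself names the callable to run, which is why loading untrusted bytes amounts to running untrusted code.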
Unsafe deserialization is a common software weakness. MITRE, in their Common Weakness Enumeration (CWE) system, references it under CWE-502: Deserialization of Untrusted Data.
This blog post illustrates how unsafe deserialization works with Python and the standard
What Python modules are vulnerable to unsafe deserialization?
While it's hard to enumerate all Python modules that serialize/deserialize data, the most used are:
Note that the documentation of all these modules mentions security concerns and warns developers to deserialize data only from trusted sources. In very specific cases, it might be safe to deserialize data (e.g., when loading data you previously saved on your local machine). In the vast majority of cases, deserializing data is unsafe and strongly discouraged.
How to safely serialize and deserialize data?
Unfortunately, there is no silver bullet. The safest approach is not to rely on deserialization at all and instead use an API that exchanges only the data you need.
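One way to read "exchange only the data you need" is to send plain JSON and rebuild the object yourself from explicitly validated fields. The sketch below assumes a hypothetical `UserProfile` message with two fields; the names and validation rules are illustrative, not a prescribed API.

```python
import json
from dataclasses import dataclass


@dataclass
class UserProfile:
    # Hypothetical message type; only these two fields are ever exchanged.
    name: str
    age: int


def parse_user_profile(raw: bytes) -> UserProfile:
    """Build a UserProfile from JSON bytes, rejecting anything unexpected."""
    # json.loads only produces dicts, lists, strings, numbers, booleans, None.
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != {"name", "age"}:
        raise ValueError("unexpected payload shape")
    name, age = data["name"], data["age"]
    if not isinstance(name, str):
        raise ValueError("name must be a string")
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("age must be an integer")
    return UserProfile(name=name, age=age)


profile = parse_user_profile(b'{"name": "alice", "age": 30}')
print(profile)  # UserProfile(name='alice', age=30)
```

Unlike unpickling, parsing JSON never invokes callables named by the sender, and the receiving code decides which fields and types it accepts.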
Automatically detect unsafe deserialization
Codiga provides IDE plugins and integrations with GitHub, GitLab, or Bitbucket to detect unsafe deserialization for multiple Python modules.
The Codiga static code analysis detects unsafe deserialization in your IDE or code reviews; here is a dedicated rule. This rule detects unsafe deserialization from the following Python modules:
To use this rule consistently, all you need to do is install the integration in your IDE (for VS Code or JetBrains) or code management system and add a codiga.yml file at the root of your project with the following content:
It will then check all your Python code against 100+ rules that detect unsafe and insecure code and suggest fixes for each of them.