During execution, all variables are stored in memory. For example, define a dict:

d = dict(name='Bob', age=20, score=88)

Variables can be changed at any time, for example by changing name to 'Bill', but as soon as the program ends, the memory occupied by its variables is reclaimed by the operating system. If the modified 'Bill' was not saved to disk, name is initialized to 'Bob' again the next time the program runs.

The process of converting a variable in memory into a form that can be stored or transmitted is called serialization. Python calls it pickling; other languages call it serialization, marshalling, flattening, and so on.

After serialization, the serialized content can be written to disk or transferred over the network to another machine.

Conversely, reading the contents of a variable back into memory from a serialized object is called deserialization, or unpickling.

Python provides the pickle module for serialization. First, we try to serialize an object and write it to a file:

In [1]: import pickle

In [2]: d = dict(name='Bob', age=20, score=88)

In [3]: pickle.dumps(d)
Out[3]: b'\x80\x04\x95$\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x04name\x94\x8c\x03Bob\x94\x8c\x03age\x94K\x14\x8c\x05score\x94KXu.'

The pickle.dumps() function serializes an object into bytes, which can then be written to a file. Alternatively, pickle.dump() serializes the object and writes it directly to a file-like object:

In [5]: f = open('dump.txt', 'wb')

In [6]: d = dict(name='Bob', age=20, score=88)

In [7]: pickle.dump(d, f)

In [8]: f.close()

Open the dump.txt file and you will see a jumble of bytes: this is the object's internal information as saved by Python.

To read an object from disk back into memory, we can first read the content into a bytes object and then deserialize it with pickle.loads(), or we can call pickle.load() to deserialize the object directly from a file-like object. Let's open another Python command line and deserialize the object we just saved:

In [23]: f = open('dump.txt', 'rb')

In [24]: d = pickle.load(f)

In [25]: f.close()

In [26]: d
Out[26]: {'name': 'Bob', 'age': 20, 'score': 88}

Variable content is back!

Of course, this variable and the original variable are completely unrelated objects, they just have the same content.
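
The whole round trip can be condensed into one script. This is a minimal sketch, assuming a temporary directory is an acceptable stand-in for writing dump.txt into the working directory; the names path and restored are illustrative only.

```python
import os
import pickle
import tempfile

d = dict(name='Bob', age=20, score=88)

# Assumption: use a temporary directory so the sketch leaves no file behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'dump.txt')

    # The with statement closes the file even if pickling raises.
    with open(path, 'wb') as f:
        pickle.dump(d, f)

    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored == d)  # True: the content is the same
print(restored is d)  # False: it is a separate object
```

Using with instead of an explicit f.close() is a design choice rather than a requirement; it simply guarantees the file is closed.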

The problem with pickle, as with the language-specific serialization of every other programming language, is that it works only with Python, and data pickled by one version of Python may not load in another. Use pickle only for data that is not important, where a failed deserialization does no harm.
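
Part of the version problem comes from the pickle protocol number: an interpreter that predates a protocol cannot read data written with it (protocol 4 arrived in Python 3.4, protocol 5 in Python 3.8). The sketch below only illustrates the protocol parameter of dumps(); the variable names are illustrative, and choosing protocol 2 is an assumption about which older readers matter.

```python
import pickle

d = dict(name='Bob', age=20, score=88)

# For protocol 2 and later, the stream starts with b'\x80'
# followed by one byte holding the protocol number.
data_v4 = pickle.dumps(d, protocol=4)
print(data_v4[:2])  # b'\x80\x04'

# An older protocol gives up newer features for wider compatibility.
data_v2 = pickle.dumps(d, protocol=2)
print(data_v2[:2])  # b'\x80\x02'

# Both streams load back to the same content.
print(pickle.loads(data_v2) == pickle.loads(data_v4) == d)  # True
```

This addresses only the stream format. Whether the pickled classes still exist and match on the loading side is a separate matter.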

JSON

If we want to pass objects between different programming languages, we have to serialize them to a standard format such as XML. A better choice is JSON, because JSON is represented as a string that can be read by all languages and easily stored on disk or transmitted over a network. JSON is not only a standard format; it is generally faster to process than XML and can be read directly in a web page.

The objects JSON represents are standard JavaScript objects. JSON types and Python's built-in data types correspond as follows:

JSON type     Python type
{}            dict
[]            list
"string"      str
1234.56       int or float
true/false    True/False
null          None
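
The mapping can be checked directly. In this small sketch the JSON text and its key names are made up for illustration; each value is decoded and its Python type name recorded.

```python
import json

# Hypothetical JSON text containing one value of each JSON type.
text = ('{"obj": {"k": "v"}, "arr": [1, 2], "s": "string", '
        '"i": 1234, "f": 1234.56, "t": true, "n": null}')

data = json.loads(text)

# Map each key to the name of the Python type it was decoded to.
types = {key: type(value).__name__ for key, value in data.items()}
print(types)
```

A JSON number without a fraction or exponent becomes an int; otherwise it becomes a float.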

Python's built-in json module provides very complete conversion between Python objects and JSON. Let's first look at how to turn a Python object into JSON:

In [27]: import json

In [28]: d = dict(name='Bob', age=20, score=88)

In [29]: json.dumps(d)
Out[29]: '{"name": "Bob", "age": 20, "score": 88}'

In [30]: type(json.dumps(d))
Out[30]: str

The dumps() function returns a str whose content is standard JSON. Similarly, the dump() function writes JSON directly to a file-like object.

To deserialize JSON into a Python object, use loads() or the corresponding load(). The former deserializes a JSON string; the latter reads a string from a file-like object and deserializes it:

In [31]: json_str = '{"age": 20, "score": 88, "name": "Bob"}'

In [32]: json.loads(json_str)
Out[32]: {'age': 20, 'score': 88, 'name': 'Bob'}

In [33]: type(json.loads(json_str))
Out[33]: dict
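
The file-like variants dump() and load() work the same way. This is a minimal sketch, assuming an in-memory io.StringIO is an acceptable stand-in for a real text file; buffer and restored are illustrative names.

```python
import io
import json

d = dict(name='Bob', age=20, score=88)

# Assumption: an in-memory text stream plays the role of a file.
buffer = io.StringIO()
json.dump(d, buffer)          # write JSON text into the stream
print(buffer.getvalue())      # {"name": "Bob", "age": 20, "score": 88}

buffer.seek(0)                # rewind before reading
restored = json.load(buffer)  # read JSON text back into a dict
print(restored == d)          # True
```

With a real file, open it in text mode, for example open(path, 'w', encoding='utf-8').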

Since the JSON standard specifies UTF-8 as the encoding of JSON, we can always convert correctly between a Python str and a JSON string.

JSON advanced

Python dict objects can be serialized directly to a JSON {}, but most of the time we prefer to represent objects with a class, for example by defining a Student class and then serializing it:

import json

class Student(object):
    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score

s = Student('Bob', 20, 88)
print(json.dumps(s))

Run the code and you get a merciless TypeError:

Traceback (most recent call last):
  ...
TypeError: Object of type Student is not JSON serializable

The reason for the error is that a Student object is not something that can be serialized to JSON.

If even an instance of a class could not be serialized to JSON, that would hardly be reasonable!

Don't worry. Looking closely at the parameter list of dumps(), we find that besides the required first parameter obj, dumps() also accepts many optional parameters: docs.python.org/3/library/j…

These optional parameters let us customize the JSON serialization. The previous code failed to serialize the Student class instance to JSON because, by default, the dumps() method doesn’t know how to turn the Student instance into a JSON {} object.
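
Before turning to default, here is a small sketch of a few other optional parameters of dumps(): indent, sort_keys and separators. The variable names are illustrative, and these three are only a sample of what the documentation lists.

```python
import json

d = dict(name='Bob', age=20, score=88)

# indent pretty-prints; sort_keys orders the keys alphabetically.
pretty = json.dumps(d, indent=2, sort_keys=True)
print(pretty)

# separators controls the item and key/value delimiters;
# dropping the spaces gives the most compact output.
compact = json.dumps(d, separators=(',', ':'))
print(compact)  # {"name":"Bob","age":20,"score":88}
```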

The optional parameter default is what turns an arbitrary object into something that can be serialized to JSON. We just need to write a conversion function for Student and pass it in:

In [40]: s.name
Out[40]: 'Bob'

In [41]: s.age
Out[41]: 20

In [42]: s.score
Out[42]: 88

def student2dict(std):
    return {
        'name': std.name,
        'age': std.age,
        'score': std.score
    }

Thus, the Student instance is first converted to dict by student2dict() and then smoothly serialized to JSON:

print(json.dumps(s, default=student2dict))

However, the next time we meet an instance of a Teacher class, it still cannot be serialized to JSON. We could write another conversion function, but we can be lazy and turn an instance of any class into a dict:

print(json.dumps(s, default=lambda obj: obj.__dict__))

This works because instances of a class usually have a __dict__ attribute, a dict that stores the instance variables. There are a few exceptions, such as classes that define __slots__.
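
One possible way to cope with the __slots__ exception is a fallback inside the default function. The sketch below is illustrative only: Teacher, Point and to_jsonable are hypothetical names, and it assumes __slots__ is a tuple of names declared on the class itself (inherited slots are not handled).

```python
import json

class Teacher(object):
    """Ordinary class: instances carry a __dict__."""
    def __init__(self, name):
        self.name = name

class Point(object):
    """Class with __slots__: instances have no __dict__."""
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

def to_jsonable(obj):
    # Preferred path: reuse the instance dictionary when it exists.
    if hasattr(obj, '__dict__'):
        return vars(obj)
    # Fallback (assumption: __slots__ is a tuple of attribute names).
    if hasattr(obj, '__slots__'):
        return {name: getattr(obj, name) for name in obj.__slots__}
    raise TypeError('Object of type %s is not JSON serializable'
                    % type(obj).__name__)

print(json.dumps(Teacher('Alice'), default=to_jsonable))  # {"name": "Alice"}
print(json.dumps(Point(1, 2), default=to_jsonable))       # {"x": 1, "y": 2}
```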

In the same way, to deserialize JSON into a Student instance, loads() first produces a dict, and then the object_hook function we pass in converts that dict into a Student instance:

def dict2student(d):
    return Student(d['name'], d['age'], d['score'])

The running results are as follows:

In [48]: json_str = '{"age": 20, "score": 88, "name": "Bob"}'

In [49]: def dict2student(d):
    ...:     return Student(d['name'], d['age'], d['score'])
    ...:

In [50]: print(json.loads(json_str, object_hook=dict2student))
<__main__.Student object at 0x1065c6f70>

What gets printed is the deserialized Student instance.
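
Putting both directions together gives a full round trip. This sketch repeats the Student class and the two conversion functions from above so that it runs on its own; restored is an illustrative name.

```python
import json

class Student(object):
    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score

def student2dict(std):
    # Student -> dict, used by dumps() through the default parameter.
    return {'name': std.name, 'age': std.age, 'score': std.score}

def dict2student(d):
    # dict -> Student, used by loads() through the object_hook parameter.
    return Student(d['name'], d['age'], d['score'])

s = Student('Bob', 20, 88)
json_str = json.dumps(s, default=student2dict)
restored = json.loads(json_str, object_hook=dict2student)

print(json_str)  # {"name": "Bob", "age": 20, "score": 88}
print(type(restored).__name__, restored.name, restored.age, restored.score)
```

Note that object_hook is called for every JSON object in the input, including nested ones, so dict2student as written only suits input that consists of a single flat student object.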

Practice

json.dumps() provides an ensure_ascii parameter. When serializing Chinese text to JSON, observe how this parameter affects the result:

import json

obj = dict(name='小明', age=20)
s = json.dumps(obj, ensure_ascii=True)
print(s)
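
For comparison, here is a sketch that runs the exercise with both settings. It assumes a Chinese name such as '小明' as the test value; escaped and readable are illustrative names.

```python
import json

obj = dict(name='小明', age=20)

# ensure_ascii=True (the default) escapes every non-ASCII character.
escaped = json.dumps(obj, ensure_ascii=True)
print(escaped)   # {"name": "\u5c0f\u660e", "age": 20}

# ensure_ascii=False keeps the characters as they are.
readable = json.dumps(obj, ensure_ascii=False)
print(readable)  # {"name": "小明", "age": 20}

# Both forms decode back to the same Python object.
print(json.loads(escaped) == json.loads(readable) == obj)  # True
```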

Summary

The Python-specific serialization module is pickle, but if you want to make serialization more generic and Web standards-compliant, you can use the JSON module.

The dumps() and loads() functions of the json module are examples of well-defined interfaces. In ordinary use we only need to pass in one required argument. When the default serialization or deserialization does not meet our needs, we can pass in more parameters to customize the rules, which keeps the interface easy to use while still making it fully extensible and flexible.