Sometimes we cannot transfer data or store it for later use because it exists only as a structured, in-memory object. Data serialization is the process of converting such structured data into a format that can be stored and shared easily. A second reason to serialize data is to reduce its size.
Python has many tools for serializing data. Besides plain data, we can also serialize code and data structures for later use.
The following modules are widely used for data serialization in Python: pickle, cPickle, JSON, and YAML.
In this section, we will look at each of these serializers and their pros and cons.
Pickle is one of the first serialization modules written purely in Python. With pickle, data is converted into bytes, which can then be stored locally and shared anywhere. Moreover, sometimes we have to load a large dataset into program memory, so it is useful to load it only once and then store it as a pickle.
A pickle usually loads much faster than rebuilding the data from its original source. Following is a program that pickles data. In this example, we take a Python dictionary and convert it into a pickle.
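A minimal sketch of this step is given here, since the original listing is not available. The dictionary contents and the variable names are assumptions chosen for illustration; only `pickle.dumps()` comes from the standard library.

```python
import pickle

# Hypothetical sample data; any picklable Python object would work here.
data = {"name": "Alice", "scores": [90, 85, 77]}

# dumps() serializes the object and returns the result as a bytes object.
pickled = pickle.dumps(data)

print(type(pickled))  # <class 'bytes'>
print(pickled)        # the raw pickle bytes (not human-readable)
```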
From the output, we can see that the dictionary has been converted into bytes. The dumps() function serializes the data into a bytes object, while dump() (without the "s") writes the pickle to a file. To recover the data, we have to use the loads() function as follows:
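The round trip might look like the sketch below. The sample dictionary is again a made-up example. Note that unpickling can execute arbitrary code, so `loads()` should only be called on data from a trusted source.

```python
import pickle

# Hypothetical sample data, pickled first so the example is self-contained.
data = {"name": "Alice", "scores": [90, 85, 77]}
pickled = pickle.dumps(data)

# loads() rebuilds an equal Python object from the pickle bytes.
restored = pickle.loads(pickled)

print(restored)
print(restored == data)  # True
```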
This method will retrieve the data as it was earlier.
Storing the file locally:
We can store the pickle on our local disk for later use with ordinary file handling. Following is the code to implement the method:
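One possible version is sketched below. The file name and its location in the system temporary directory are assumptions for the example; any writable path would do. The file must be opened in binary mode ("wb" and "rb") because a pickle is bytes, not text.

```python
import os
import pickle
import tempfile

# Hypothetical sample data and an arbitrary file location for this example.
data = {"name": "Alice", "scores": [90, 85, 77]}
path = os.path.join(tempfile.gettempdir(), "example_data.pkl")

# dump() writes the pickle straight into a file opened in binary write mode.
with open(path, "wb") as f:
    pickle.dump(data, f)

# load() reads the pickle back from a file opened in binary read mode.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded == data)  # True
```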
Advantages:
- Easy to use.
- Requires little memory to store.
Disadvantages:
- It works well on small data but can be unreliable on large objects. For example, a pickled machine learning model such as KNN, which stores all of its training data, may come back distorted when it is loaded.
- Cannot be used from other programming languages.
The cPickle module supports serialization and deserialization of objects in the same way as pickle. It can be up to 1000 times faster because, unlike the original pickle module, which is written in Python, cPickle is written in C.
In Python 3.x there is no separate cPickle module. The C implementation is named _pickle, and the standard pickle module already uses it automatically when it is available. If you still want to refer to it as cPickle, import it with: import _pickle as cPickle
You don’t have to install anything in Python 3.x, as the module ships with CPython and is ready to use. The functionality is the same as that of pickle. Let us look at the code:
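The following is a small sketch under the assumption that the code runs on CPython 3, where the C accelerator is exposed as `_pickle`. The sample data is hypothetical. Because the bytes are in the same pickle format, the standard pickle module can read them as well.

```python
import pickle
import _pickle as cPickle  # C implementation bundled with CPython 3

# Hypothetical sample data.
data = {"name": "Alice", "scores": [90, 85, 77]}

# Same API as pickle: dumps() returns bytes, loads() rebuilds the object.
pickled = cPickle.dumps(data)
restored = cPickle.loads(pickled)

print(restored == data)               # True
print(pickle.loads(pickled) == data)  # True
```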
The only difference in the code is the import statement, which changes pickle to cPickle.
- Does not distort data.
- Can be unpickled in other modern languages.
- JSON can be deserialized in every programming language.
- Not reliable for storing the data.
YAML is another type of serializer. Like the JSON serializer, it stores the data in readable form, but it uses the YAML format, which is slightly different and more feature-rich than JSON. The data is not altered in the process.
Moreover, it is fast: it converts the data into YAML quickly. For safety, YAML from an untrusted source should be loaded with safe_load() rather than load(). Before dumping, we have to install the module with pip. The package is published as PyYAML and is installed with the command: pip install pyyaml
After installing it on your computer, you can run the following program:
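As a rough sketch, assuming the PyYAML package is installed, dumping and reloading a list could look like this. The list contents are invented for the example, and `default_flow_style=False` is passed explicitly so that the one-item-per-line layout does not depend on the PyYAML version.

```python
import yaml  # provided by the PyYAML package

# Hypothetical sample data.
items = ["apple", "banana", "cherry"]

# dump() without a stream argument returns the YAML document as a string.
text = yaml.dump(items, default_flow_style=False)
print(text)
# - apple
# - banana
# - cherry

# safe_load() parses the YAML text back into plain Python objects.
restored = yaml.safe_load(text)
print(restored == items)  # True
```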
If you have ever seen a YAML file, the output will look familiar. A YAML file stores each list element on its own line, with a hyphen “-” at the start. The output of our program is as follows:
Each line holds one element of the list, and when the data is loaded it becomes a list again.
We can also save it locally by using the file handling technique as follows:
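A sketch of the file-based variant follows, again assuming PyYAML is installed. The file name and the use of the system temporary directory are illustrative choices. Unlike pickle, YAML is text, so the file is opened in text mode.

```python
import os
import tempfile

import yaml  # provided by the PyYAML package

# Hypothetical sample data and an arbitrary file location for this example.
items = ["apple", "banana", "cherry"]
path = os.path.join(tempfile.gettempdir(), "example_items.yaml")

# Passing a file object as the second argument makes dump() write to it.
with open(path, "w") as f:
    yaml.dump(items, f, default_flow_style=False)

# safe_load() accepts an open file as well as a string.
with open(path) as f:
    loaded = yaml.safe_load(f)

print(loaded == items)  # True
```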
Of all the serialization methods above, cPickle and YAML are the most efficient. cPickle is more efficient than YAML because it is written in C, and it also stores the data as bytes, which are not easily human-readable.
eval() is a built-in Python function that evaluates a string containing a Python expression and returns the result. If we have a string that holds a list or dictionary, eval() converts it into a Python object, and it can even call functions. Because it runs whatever code the string contains, it should only be used on trusted input. Following are some examples of this function:
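A few illustrative calls are sketched below. The strings are made-up examples. The last lines show `ast.literal_eval()` from the standard library, which is not part of the original text but is a safer alternative when the string should contain only literals such as lists and dictionaries.

```python
import ast

# Hypothetical strings holding Python literals.
list_text = "[1, 2, 3]"
dict_text = "{'a': 1, 'b': 2}"

# eval() turns the text into real Python objects.
numbers = eval(list_text)
mapping = eval(dict_text)
print(type(numbers), numbers)  # <class 'list'> [1, 2, 3]
print(type(mapping), mapping)  # <class 'dict'> {'a': 1, 'b': 2}

# eval() can also execute function calls contained in the string.
total = eval("sum([1, 2, 3])")
print(total)  # 6

# literal_eval() accepts only literals, so it cannot run arbitrary code.
safe_numbers = ast.literal_eval(list_text)
print(safe_numbers == numbers)  # True
```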
The pickling method and eval method are extensively used in machine learning and data science.