Serialization in Python

Python Serialization: Learn Python Serialization in Detail with Real Life Examples
Written by Paayi Tech | 20-Apr-2019

Sometimes we cannot transfer or store data for later use because it exists only as a structured object in memory. Data serialization is the procedure that converts such structured data into a format that can be easily stored and shared. A second reason for serializing data is to reduce its size.

In Python, there are many tools for serializing data. Besides plain data, we can also serialize code and data structures for later use.

 

Following are some modules that are widely used for data serialization in Python.

Pickle
cPickle
JSON
YAML


In this section, we will see all of these serializers and their pros and cons.

Pickle:

Pickle is one of the first serialization modules built into Python. With pickle, data is converted into bytes, which can then be stored locally or shared. Moreover, when a program has to load a large data set into memory, it is useful to build it only once and then store it as a pickle.

A pickle usually loads much faster than rebuilding the data from its original source. Following is a program that pickles data. In this example, we take a Python dictionary and convert it into a pickle.

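import pickle

# A small dictionary to serialize.
data = {'Pikachu': 'electric', 'Bulbasaur': 'grass', 'Charizard': 'fire'}

# dumps() returns the pickled object as bytes. The exact bytes depend on
# the pickle protocol; the output below was produced with protocol 3.
s_data = pickle.dumps(data)
print(s_data)

Output

b'\x80\x03}q\x00(X\x07\x00\x00\x00Pikachuq\x01X\x08\x00\x00\x00electricq\x02X\t\x00\x00\x00Bulbasaurq\x03X\x05\x00\x00\x00grassq\x04X\t\x00\x00\x00Charizardq\x05X\x04\x00\x00\x00fireq\x06u.'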

 

From the output, we can see that the data has been converted into bytes. The dumps() function dumps the data into a bytes object. To recover the data, we use the loads() function as follows:

import pickle

data = {'Pikachu': 'electric', 'Bulbasaur': 'grass', 'Charizard': 'fire'}

s_data = pickle.dumps(data)    # serialize to bytes
r_data = pickle.loads(s_data)  # rebuild the dictionary from the bytes
print(r_data)

This method will retrieve the data as it was earlier.
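
Pickle is not limited to flat dictionaries of strings. The following minimal sketch (the team structure is made up for illustration) round-trips a nested object containing tuples, a set and integer keys, and checks that the values and types survive:

```python
import pickle

# Hypothetical nested data mixing several built-in types.
team = {
    1: ("Pikachu", "electric"),     # integer key, tuple value
    2: ("Charizard", "fire"),
    "types": {"electric", "fire"},  # a set
}

restored = pickle.loads(pickle.dumps(team))

print(restored == team)   # the copy is equal in value
print(restored is team)   # but it is a separate object
print(type(restored[1]), type(restored["types"]))
```

The restored object is an independent copy, so changing it does not affect the original.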

 

Storing the file locally:

We can store the pickle on the local disk for later use with ordinary file handling. Following is the code:

import pickle

data = {'Pikachu': 'electric', 'Bulbasaur': 'grass', 'Charizard': 'fire'}

# Pickles are binary, so the file is opened in 'wb' / 'rb' mode.
with open('data.pickle', 'wb') as out:
    pickle.dump(data, out)

with open('data.pickle', 'rb') as inFile:
    b = pickle.load(inFile)

print(b)
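
The idea of building a large object once and then reusing the stored pickle can be sketched as a small caching helper. The names load_or_build and build_table are hypothetical, and the "expensive" computation is only a stand-in:

```python
import os
import pickle
import tempfile


def load_or_build(path, build):
    """Return the object cached at `path`; build and cache it if missing."""
    if os.path.exists(path):
        with open(path, "rb") as in_file:
            return pickle.load(in_file)
    obj = build()
    with open(path, "wb") as out:
        pickle.dump(obj, out)
    return obj


calls = []


def build_table():
    # Stand-in for an expensive computation; records each call.
    calls.append(1)
    return {n: n * n for n in range(1000)}


cache_path = os.path.join(tempfile.mkdtemp(), "table.pickle")
first = load_or_build(cache_path, build_table)   # builds and writes the cache
second = load_or_build(cache_path, build_table)  # reads the cache instead
print(first == second, len(calls))
```

On the second call the table is read from disk, so build_table runs only once.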

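Pros:

  1. Fast
  2. Easy to use
  3. Compact binary format, so little space is needed to store the data.

Cons:

  1. It works well on small data, but pickling very large objects is costly: a machine learning model such as KNN, which stores all of its training data, produces a large file, and it may not load correctly under a different version of Python or of the library that created it.
  2. Cannot be used from other programming languages.
  3. Not safe with untrusted input: loading a pickle can execute arbitrary code.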
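
The size of a pickle also depends on the protocol version, which dumps() and dump() accept as an optional protocol argument. A small sketch comparing the oldest, text-based protocol with the newest one available in the running interpreter (the sample data is arbitrary):

```python
import pickle

data = {"numbers": list(range(1000)), "name": "Pikachu"}

old = pickle.dumps(data, protocol=0)                        # ASCII-based, verbose
new = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)  # newest binary protocol

print(len(old), len(new))
print(pickle.loads(old) == pickle.loads(new) == data)
```

Both byte strings load back to the same dictionary; only their encoding differs.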

 

cPickle:

The cPickle module supports serialization and deserialization of objects with the same interface as pickle. Unlike the original pickle module, which is written in Python, cPickle is written in C, which can make it up to 1000 times faster.

In Python 3.x, cPickle no longer exists as a separate module: the standard pickle module uses the C implementation, named _pickle, automatically whenever it is available. If you still want to refer to it explicitly, you can import it as follows:

import _pickle as cpickle

You don't have to install anything in Python 3.x, as the module ships with the interpreter. The functionality is the same as that of pickle. Let us look at the code:

import _pickle as cpickle

data = {'Pikachu': 'electric', 'Bulbasaur': 'grass', 'Charizard': 'fire'}

s_data = cpickle.dumps(data)    # serialize to bytes
r_data = cpickle.loads(s_data)  # rebuild the dictionary
print(r_data)

Storing the data in a file works the same way:

import _pickle as cpickle

data = {'Pikachu': 'electric', 'Bulbasaur': 'grass', 'Charizard': 'fire'}

with open('data.pickle', 'wb') as out:
    cpickle.dump(data, out)

with open('data.pickle', 'rb') as inFile:
    b = cpickle.load(inFile)

print(b)

 

The only difference in the code is the import statement and the change of name from pickle to cpickle.
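
To see which implementation a plain import pickle is using, one can compare its functions with those of the C module. This is a minimal sketch that assumes CPython; _pickle is an internal module and may be missing on other interpreters, which the try/except accounts for:

```python
import pickle

try:
    import _pickle  # C accelerator module in CPython; may be absent elsewhere
    uses_c = pickle.dumps is _pickle.dumps
except ImportError:
    uses_c = False

print("pickle uses the C implementation:", uses_c)

# Either way, the round trip behaves the same.
data = {"Pikachu": "electric"}
round_trip = pickle.loads(pickle.dumps(data))
print(round_trip == data)
```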

Pros:

  1. Fast
  2. Reliable
  3. Does not alter the data
  4. Produces the same format as pickle, so either module can un-pickle the other's output. Like pickle, the format is specific to Python.

 

JSON:

JSON, also known as JavaScript Object Notation, is another way of serialization. JSON serializes the data as readable text. Unlike a pickle, which is not human-readable, a JSON file can be opened and inspected by the user. Following is the method to serialize the data:

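import json

data = [1,2,3,4,5, [1,2,3,4,5,6], {1:1,2:2,3:3}]

s_data = json.dumps(data)    # a str containing JSON text
r_data = json.loads(s_data)  # back to Python objects; the dict keys come back as strings
print(s_data)
print(r_data)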

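Pros:

  1. JSON can be deserialized in virtually every programming language.

Cons:

  1. Not reliable for storing arbitrary Python data: only strings, numbers, booleans, null, lists and dictionaries with string keys are supported, so tuples come back as lists and integer keys come back as strings.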
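
The limitation above can be seen in a short round trip. This is an illustrative sketch with made-up data; the final dictionary comprehension is just one possible way to restore the integer keys by hand:

```python
import json

data = {1: "one", 2: (2, 2)}

restored = json.loads(json.dumps(data))
print(restored)  # keys are now strings, the tuple is now a list

# Converting the keys back to integers by hand.
fixed = {int(k): v for k, v in restored.items()}
print(fixed)
```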

 

YAML:

YAML is another type of serializer. Like JSON, it stores the data as readable text, but in the YAML format, which is a little different and more expressive than JSON. The data comes back unchanged when loaded: for example, integer dictionary keys remain integers.
Moreover, it is simple to use: one call converts the data into YAML, and yaml.safe_load() reads it back without executing anything. Before dumping we have to install the module with the pip command, as follows:

pip3 install pyyaml

After installing it on your computer, you can run the following program:

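import yaml

data = [1,2,3,4,5, [1,2,3,4,5,6], {1:1,2:2,3:3}]

# default_flow_style=None keeps the inner list and dict on one line each;
# newer PyYAML releases otherwise write them in block style.
s_data = yaml.dump(data, default_flow_style=None)
# safe_load() builds only basic Python objects; a bare yaml.load()
# needs an explicit Loader argument in recent PyYAML versions.
r_data = yaml.safe_load(s_data)
print(r_data)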

If you have ever seen a YAML file, this will look familiar. YAML stores each element of a list on its own line, with a hyphen "-" at the start. Printing s_data from the program above gives the following:

Output

- 1
- 2
- 3
- 4
- 5
- [1, 2, 3, 4, 5, 6]
- {1: 1, 2: 2, 3: 3}

Each line holds one element of the list, and when the data is loaded it becomes a list again.

 

We can also save it locally by using the file handling technique as follows:

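import yaml

data = [1,2,3,4,5, [1,2,3,4,5,6], {1:1,2:2,3:3}]

# YAML is text, so the file is opened in text mode.
with open('data.yml', 'w') as out:
    yaml.dump(data, out)

with open('data.yml', 'r') as inFile:
    # safe_load() avoids the Loader argument that yaml.load() requires in recent PyYAML.
    b = yaml.safe_load(inFile)
print(b)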

 

Conclusion:

Of the serialization methods above, cpickle and YAML stand out. Cpickle is the more efficient of the two because it is written in C, and it stores the data as bytes that are not human-readable, whereas YAML keeps the data readable and portable.
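
Efficiency claims like these are easy to check on your own data. The following rough sketch compares pickle and JSON, both from the standard library, on an arbitrary sample dictionary. The timings depend on the machine and the data, so no expected numbers are given:

```python
import json
import pickle
import timeit

# Arbitrary sample data; replace it with something representative.
data = {str(i): [i, i * 2, i * 3] for i in range(1000)}

pickle_bytes = pickle.dumps(data)
json_bytes = json.dumps(data).encode("utf-8")

# Time 100 serializations with each module.
pickle_time = timeit.timeit(lambda: pickle.dumps(data), number=100)
json_time = timeit.timeit(lambda: json.dumps(data), number=100)

print(f"pickle: {len(pickle_bytes)} bytes, {pickle_time:.4f} s")
print(f"json:   {len(json_bytes)} bytes, {json_time:.4f} s")
```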

ast.literal_eval():

It is a Python function that safely evaluates a string containing a Python literal, such as a number, string, tuple, list, dictionary, set, boolean or None, and returns the corresponding Python object. Unlike the built-in eval(), it does not execute arbitrary code or call functions. Following are some examples of this function:

from ast import literal_eval as le

print(le("{'Pikachu': 'electric', 'level': 25}"))

Output

{'Pikachu': 'electric', 'level': 25}

 

 

from ast import literal_eval as le
 
ls = "[1,2,3,4,5]"
 
print("Before eval func")
print(type(ls))
 
print("after eval func")
print(type(le(ls)))

Output

Before eval func
<class 'str'>
after eval func
<class 'list'>
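
Because literal_eval() accepts only literals, a string that tries to call a function is rejected rather than run. A minimal sketch (the sample strings are made up for illustration):

```python
from ast import literal_eval

# A nested literal is converted into the matching Python object.
parsed = literal_eval("{'Pikachu': ['electric', 25], 'ok': True}")
print(parsed)

# Strings containing function calls raise ValueError instead of executing.
rejected = []
for text in ("__import__('os').getcwd()", "len([1, 2, 3])"):
    try:
        literal_eval(text)
    except ValueError:
        rejected.append(text)

print("rejected:", rejected)
```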

The pickling method and eval method are extensively used in machine learning and data science.




