Can Python do typed JSON serialization?

Creating a Python typed JSON serialization library

Orrbenyamini
Published in Dev Genius
8 min read · Aug 21, 2022

Foreword

About two years ago, some friends asked me to help them build a platform that streamlines the donation of second-hand items between individuals while keeping everyone anonymous.
It sounded like a pretty noble cause, so of course I was happy to help.

They asked me to help them get the project started by building the data layer and the REST API. I recommended Python as the programming language, since they didn't have much programming experience and I thought it would let them get going quickly.

I'm not a Python developer, and I have very little experience with Python.
My main programming language over the years has been Java,
so dynamically typed languages are not exactly what I'm used to working with.

When I started writing the data model and the DB CRUD operations, I expected to find a well-known solution for serializing class instances to JSON and back while preserving the runtime type of the deserialized object, similar to Gson or Jackson in Java.
I was surprised to struggle to find a comprehensive solution for this problem.

I decided to take this as a challenge and create a typed JSON serialization library for Python: Jsonic (see the Jsonic GitHub repository).

In this article I'll walk through some of my first attempts at typed serialization in Python, from which Jsonic grew.

Problem definition

Create a Python JSON serialization library that supports serializing a concrete object O1 of class C to a JSON dictionary representation J, and deserializing J back into a concrete object O2 of class C such that O1 === O2,
where O1 === O2 is defined as:

For every instance attribute F1 of O1, there is an instance attribute F2 of O2 with the same name as F1 such that:

  1. If F1 is of type int or str, then F1 == F2
  2. If F1 is a class instance, then F1 === F2

For simplicity, we will ignore cases where F1 is a dict, list, tuple, set, complex,
or any type other than int and str (we will, however, support nested class instances).
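To make the === relation concrete, here is a minimal sketch of it as a recursive check. The function name typed_equal and the Point class are hypothetical (they are not part of Jsonic), and like the definition above, the sketch only handles int and str leaves plus nested class instances.

```python
def typed_equal(o1: object, o2: object) -> bool:
    """Hypothetical sketch of the O1 === O2 relation from the problem definition."""
    # O2 must be an instance of exactly the same class as O1.
    if type(o1) is not type(o2):
        return False
    # Rule 1: int and str values are compared with ==.
    if isinstance(o1, (int, str)):
        return o1 == o2
    # Rule 2: every instance attribute of O1 needs a same-named attribute
    # in O2 that is itself typed-equal (recursing into nested instances).
    o2_attrs = vars(o2)
    for name, f1 in vars(o1).items():
        if name not in o2_attrs or not typed_equal(f1, o2_attrs[name]):
            return False
    return True


class Point:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y


print(typed_equal(Point(1, 2), Point(1, 2)))        # True
print(typed_equal(Point(1, 2), {"x": 1, "y": 2}))   # False: a dict is not a Point
```

The exact type check is what separates this from a plain value comparison: a dict with the right keys is not enough, which is precisely the failure mode of the attempts below.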

First attempt — Naive approach

First, let's define a simple class that we will use to test our serialization and deserialization logic:

Figure 1

Our User class has 3 fields (name, id and age) and a method named describe that returns a description of the User as a string.
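Figure 1 is not reproduced here, so as a stand-in, a minimal User class matching that description might look like the following. The exact wording returned by describe is an assumption.

```python
class User:
    def __init__(self, name: str, id: int, age: int):
        self.name = name
        self.id = id  # this parameter shadows the built-in id() inside __init__ only
        self.age = age

    def describe(self) -> str:
        # Hypothetical description format; the original figure is not available.
        return f"User {self.name} (id {self.id}) is {self.age} years old"


print(User("Bob", 1, 18).describe())  # User Bob (id 1) is 18 years old
```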
Let's try a simple approach to serializing and deserializing a User instance:

In this solution we use Python's json.dumps function to serialize the User instance to a JSON string, and the json.loads function to try to deserialize that string back into a User instance.

  • json.dumps seems to work pretty well for our use case, as its output is well-formatted JSON (line 11), but our goal is to serialize a User instance to a JSON dict representation, not a JSON string. You can see in line 8 that dumps gives us a str output.
  • json.loads returns an instance of type dict (line 17), not an instance of class User as our goal requires.
  • When we try to call deserialized_user.describe, we get an exception (an AttributeError), since deserialized_user is a dict rather than a User instance and has no method named describe.
  • json.dumps(self, default=lambda o: o.__dict__): the default argument defines a function to call when json encounters an object it cannot otherwise serialize. We pass lambda o: o.__dict__, which returns the object's __dict__ attribute in these cases. We will get back to this argument later.
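The naive round trip can be sketched as follows, using a stripped-down User class (an assumption, since the original code is not reproduced here). It shows both problems: json.dumps produces a str, and json.loads produces a plain dict without a describe method.

```python
import json


class User:
    def __init__(self, name: str, id: int, age: int):
        self.name = name
        self.id = id
        self.age = age

    def describe(self) -> str:
        return f"User {self.name} is {self.age} years old"


user = User("Bob", 1, 18)

# default is called for objects json cannot encode natively;
# here it falls back to the instance's attribute dict.
serialized_user = json.dumps(user, default=lambda o: o.__dict__)
print(type(serialized_user), serialized_user)
# <class 'str'> {"name": "Bob", "id": 1, "age": 18}

deserialized_user = json.loads(serialized_user)
print(type(deserialized_user))  # <class 'dict'>

try:
    deserialized_user.describe()
except AttributeError as error:
    # A dict has no describe method, so the call fails.
    print(f"AttributeError: {error}")
```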

Our first approach failed, but we can still leverage Python's built-in json serialization to accomplish our goal.

Second attempt — deserialization class method

We can leverage Python's **kwargs capability, which allows a function to take an arbitrary number of keyword arguments,
in combination with ** dictionary unpacking, to create a User instance from the dict returned by the json.loads call:
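As a quick reminder of the two features involved, this generic sketch (the build function is purely illustrative) shows that ** in a function signature collects keyword arguments into a dict, while ** at a call site unpacks a dict into keyword arguments.

```python
def build(**kwargs) -> dict:
    # **kwargs collects every keyword argument into a dict.
    return kwargs


data = {"name": "Bob", "id": 1, "age": 18}

# **data unpacks the dict, so this call is equivalent to
# build(name="Bob", id=1, age=18).
print(build(**data))  # {'name': 'Bob', 'id': 1, 'age': 18}
print(build(**data) == build(name="Bob", id=1, age=18))  # True
```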

Figure 4

We added 2 methods to our User class:

  • serialize: We use json.dumps to serialize the User instance to a JSON string and then json.loads to parse it back as a JSON dictionary. This serialization logic works well for our use case.
  • deserialize: This is a class method that receives the JSON dict representation of a User instance and constructs a new User instance using the User class constructor, which now accepts an arbitrary number of named arguments.
    It passes the unpacked dict it received as a parameter to the constructor.
    For a dict with the values {"name": "Bob", "id": 1, "age": 18}, the equivalent User instantiation call would be:
User(name="Bob", id=1, age=18)
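Since Figure 4 is not reproduced here, this is a minimal sketch of the second attempt under our own assumptions: the constructor keeps the three named parameters and adds **kwargs so that extra keys are tolerated, and the describe format is hypothetical.

```python
import json


class User:
    def __init__(self, name: str, id: int, age: int, **kwargs):
        # Extra keyword arguments are accepted and ignored.
        self.name = name
        self.id = id
        self.age = age

    def describe(self) -> str:
        return f"User {self.name} is {self.age} years old"

    def serialize(self) -> dict:
        # Dump to a JSON string, then load it back as a plain dict.
        return json.loads(json.dumps(self, default=lambda o: o.__dict__))

    @classmethod
    def deserialize(cls, json_dict: dict) -> "User":
        # Unpack the dict into keyword arguments for the constructor.
        return cls(**json_dict)


serialized_user = User("Bob", 1, 18).serialize()
print(serialized_user)  # {'name': 'Bob', 'id': 1, 'age': 18}

deserialized_user = User.deserialize(serialized_user)
print(deserialized_user.describe())  # User Bob is 18 years old
```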

Let’s test our code:

  • We create a User instance and serialize it to JSON (line 5), and this time we get a JSON dict representation rather than a str, as our goal requires (line 8)
  • We then deserialize the JSON dict back to a User instance using the deserialize method (line 13), and we do get an instance of the User class (line 17).
    We call the describe method, and this time it works as expected (lines 19–20)

Our solution works for this use case, but let's test another one, with an Address class nested inside our User class:

Figure 7

Let's test our solution on this class:

  • deserialized_user is indeed a User instance (line 22)
  • but the address attribute of deserialized_user is a dict and not an Address instance (line 25), so we can't call deserialized_user.address.describe() (lines 27–28)
  • Our solution does not work for nested classes. Let's try to fix this.
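A minimal sketch of the nested case, reusing the second attempt's generic deserialize, reproduces the problem. The Address fields (street and city) and the describe string are assumptions, since Figure 7 is not reproduced here.

```python
import json


class Address:
    def __init__(self, street: str, city: str, **kwargs):
        self.street = street
        self.city = city

    def describe(self) -> str:
        return f"{self.street}, {self.city}"


class User:
    def __init__(self, name: str, id: int, age: int, address: Address, **kwargs):
        self.name = name
        self.id = id
        self.age = age
        self.address = address

    def serialize(self) -> dict:
        # default is also applied to the nested Address instance.
        return json.loads(json.dumps(self, default=lambda o: o.__dict__))

    @classmethod
    def deserialize(cls, json_dict: dict) -> "User":
        # The nested "address" value is passed through as a plain dict.
        return cls(**json_dict)


user = User("Bob", 1, 18, Address("Main St 1", "Springfield"))
deserialized_user = User.deserialize(user.serialize())

print(isinstance(deserialized_user, User))        # True
print(type(deserialized_user.address).__name__)   # dict, not Address
```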

When deserializing the User instance, we will first deserialize the Address instance explicitly into a variable and pass it to the User class constructor, together with the name, id and age parameters:

Figure 10

Let's test the new deserialize method:

  • deserialized_user is an instance of class User, as before (line 8)
  • but now deserialized_user.address is also an instance of the Address class, as required (line 11), so we can call deserialized_user.address.describe() (lines 16–17)
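A minimal sketch of that fix, with the same assumed Address fields as before: User.deserialize first turns the nested dict into an Address through Address.deserialize, then passes it to the constructor together with name, id and age.

```python
import json


class Address:
    def __init__(self, street: str, city: str):
        self.street = street
        self.city = city

    def describe(self) -> str:
        return f"{self.street}, {self.city}"

    @classmethod
    def deserialize(cls, json_dict: dict) -> "Address":
        return cls(**json_dict)


class User:
    def __init__(self, name: str, id: int, age: int, address: Address):
        self.name = name
        self.id = id
        self.age = age
        self.address = address

    def serialize(self) -> dict:
        return json.loads(json.dumps(self, default=lambda o: o.__dict__))

    @classmethod
    def deserialize(cls, json_dict: dict) -> "User":
        # Explicitly deserialize the nested class first...
        address = Address.deserialize(json_dict["address"])
        # ...then pass it along with the simple fields.
        return cls(
            name=json_dict["name"],
            id=json_dict["id"],
            age=json_dict["age"],
            address=address,
        )


user = User("Bob", 1, 18, Address("Main St 1", "Springfield"))
deserialized_user = User.deserialize(user.serialize())
print(isinstance(deserialized_user.address, Address))  # True
print(deserialized_user.address.describe())            # Main St 1, Springfield
```

Every class with a nested class needs its own hand-written deserialize like this one, which is the scalability problem discussed next.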

The serialize method does not really depend on any instance state and can easily be converted from an instance method into a function that receives an object and serializes it. The deserialize method, however, requires an explicit implementation for each class that contains nested classes.
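Because serialize only relies on json.dumps and each object's __dict__, it can be lifted out of the class into a standalone function, sketched here with hypothetical Address and User classes. Note that the nested Address comes out as a plain dict with no trace of its class, which is why deserialization is the hard part.

```python
import json


def serialize(obj: object) -> dict:
    # The encoder calls default for every object it cannot encode natively,
    # including nested instances, so nesting is handled automatically.
    return json.loads(json.dumps(obj, default=lambda o: o.__dict__))


class Address:
    def __init__(self, street: str, city: str):
        self.street = street
        self.city = city


class User:
    def __init__(self, name: str, id: int, age: int, address: Address):
        self.name = name
        self.id = id
        self.age = age
        self.address = address


result = serialize(User("Bob", 1, 18, Address("Main St 1", "Springfield")))
print(result)
# {'name': 'Bob', 'id': 1, 'age': 18, 'address': {'street': 'Main St 1', 'city': 'Springfield'}}
```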

While this approach works, it is easy to see that it does not scale:
we would need to implement the deserialize method for every class we want to deserialize.

Let's gather what we have learned and make a third and final attempt at supporting serialization and deserialization of nested classes, without the need for boilerplate code in each class.

Third attempt — serializing type name and deserializing in recursion

Let's start with serialization. Our serialization is based on json.dumps.
json.dumps has an important argument named default, which the official docs describe as follows:

If specified, default should be a function that gets called for objects that can’t otherwise be serialized. It should return a JSON encodable version of the object or raise a TypeError. If not specified, TypeError is raised.
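A quick illustration of that behavior with a hypothetical Point class: without default, json.dumps raises a TypeError for the instance; with a default that returns a JSON-encodable dict (and raises TypeError for anything else, as the docs ask), encoding succeeds.

```python
import json


class Point:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y


def encode_point(o: object) -> dict:
    # Called only for objects json cannot encode on its own.
    if isinstance(o, Point):
        return {"x": o.x, "y": o.y}
    # Follow the documented contract for unsupported objects.
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")


try:
    json.dumps(Point(1, 2))
except TypeError as error:
    print("without default:", error)

print(json.dumps(Point(1, 2), default=encode_point))  # {"x": 1, "y": 2}
```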

Until now, we passed a function that simply returns the object's __dict__ attribute.
We can make better use of the default parameter by passing a custom function that handles serializing class instances.
Let's create a serialize function that receives an object and serializes it to a dict as we did before, but also adds an attribute to each serialized class instance holding its fully qualified class name, so that we can instantiate the relevant class when deserializing.

  • serialize function: Our serialize function is very similar to what we did before, but instead of passing json.dumps's default parameter a function that just returns o.__dict__ (to be called for objects that otherwise cannot be serialized), we pass our _serialize_object function.
  • _serialize_object function: This function returns a dict populated with the object's attributes, plus one important extra attribute: the fully qualified name of the object's class (type_name). When deserializing this JSON, it tells us which class the JSON should be deserialized into.
  • _full_type_name function: extracts the object's fully qualified type name, including its module and class name. We will use this qualifier to look up the relevant class and create an instance of it when deserializing.

Let's see what output we get when serializing a User instance with our new serialize function:

We can see that our JSON contains all the expected fields, plus _serialized_type for both User and its nested Address.
__main__ is the name of the module, and Address and User are the class names.

Let's create a deserialize function that takes a JSON dict created by our serialize function and deserializes it back into a class instance:

  1. First of all, we validate that an attribute named _serialized_type exists; otherwise we don't know which class to deserialize into. If it is missing, we raise a TypeError.
  2. In lines 6–7 we get the fully qualified type name and then extract the class object itself using the _get_type_by_name function (I will not get into the details of this function's implementation).
  3. In lines 9–17 we create deserialized_dict, containing all of the object's items and mapping each key to its deserialized value. We skip the _serialized_type item, and for items of type dict we make a recursive call to deserialize their value. For other types we just use the actual value (remember, we are only dealing with int and str item types for simplicity).
  4. Line 20 retrieves the signature of our class constructor.
    In lines 19–31 we create an initialization dict mapping each of our class constructor's arguments to its deserialized value. We skip the self argument (lines 23–24) and the args and kwargs arguments (lines 25–27).
    If the deserialized values dict is missing a value for some constructor argument, we raise an AttributeError (lines 29–30).
  5. Lastly, in lines 33–35 we initialize and return a class instance by calling cls with the initialization dict, which maps each constructor argument name to its corresponding deserialized value.
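A minimal sketch of these steps follows. The article does not show _get_type_by_name, so the version here is our own assumption: it only supports top-level classes, splitting the name at the last dot, importing the module part with importlib and fetching the class with getattr. The input dict is written by hand in the shape produced by serialize, using __name__ so the module part matches wherever the sketch runs; the Address fields and describe strings are hypothetical.

```python
import importlib
import inspect


def _get_type_by_name(full_type_name: str) -> type:
    # Hypothetical helper for top-level classes: "pkg.module.Class" -> class.
    module_name, _, class_name = full_type_name.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


def deserialize(json_dict: dict) -> object:
    # 1. Without the type tag we cannot know which class to build.
    if "_serialized_type" not in json_dict:
        raise TypeError("missing _serialized_type attribute")
    # 2. Resolve the class object from its fully qualified name.
    cls = _get_type_by_name(json_dict["_serialized_type"])

    # 3. Deserialize values, recursing into nested dicts (class instances).
    deserialized_dict = {
        key: deserialize(value) if isinstance(value, dict) else value
        for key, value in json_dict.items()
        if key != "_serialized_type"
    }

    # 4. Map each constructor parameter to its deserialized value.
    init_args = {}
    for name, param in inspect.signature(cls.__init__).parameters.items():
        if name == "self":
            continue
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # skip *args and **kwargs
        if name not in deserialized_dict:
            raise AttributeError(f"missing value for constructor argument {name!r}")
        init_args[name] = deserialized_dict[name]

    # 5. Build the instance through its constructor.
    return cls(**init_args)


class Address:
    def __init__(self, street: str, city: str):
        self.street = street
        self.city = city

    def describe(self) -> str:
        return f"{self.street}, {self.city}"


class User:
    def __init__(self, name: str, id: int, age: int, address: Address):
        self.name = name
        self.id = id
        self.age = age
        self.address = address

    def describe(self) -> str:
        return f"User {self.name} is {self.age} years old"


json_user = {
    "name": "Bob", "id": 1, "age": 18,
    "address": {"street": "Main St 1", "city": "Springfield",
                "_serialized_type": f"{__name__}.Address"},
    "_serialized_type": f"{__name__}.User",
}
deserialized_user = deserialize(json_user)
print(deserialized_user.describe())          # User Bob is 18 years old
print(deserialized_user.address.describe())  # Main St 1, Springfield
```

Going through the constructor rather than writing into __dict__ directly keeps any validation in __init__ in play, at the cost of requiring every constructor argument to appear in the JSON.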

Let's test our deserialize function:

It works just as we wanted!
deserialized_user is an instance of class User (line 5), and deserialized_user.address is an instance of class Address (line 8).
In lines 10–11 we can see that the output of deserialized_user.describe is as expected, and in lines 13–14 we can see that the output of deserialized_user.address.describe is also as expected!

Conclusion

We have successfully solved the problem we defined for ourselves.
Our serialize and deserialize functions support serializing a class instance to JSON and back into a newly created instance of the same class, for nested classes with str and int attributes.
Unlike our second attempt, they do not require boilerplate code for each class.

This problem is highly simplified, though.
Visit the Jsonic GitHub repository if you want to see how I handled other requirements, such as:

  • More built-in types, like list, dict and more
  • Registering custom serializers for specific types, like datetime
  • Functionality similar to a Java transient field, i.e. a field that should not be part of the serialization
  • Runtime deserialization type validation (when deserializing a class instance, pass the expected type of the deserialized instance and raise an error if it does not match the type of the deserialization result)
  • Serialization/deserialization to a JSON string instead of a JSON dict representation
  • Serializing attributes that are not passed to the constructor but should be serialized nevertheless, and mapping custom constructor argument names to the corresponding class attributes

To use Jsonic in your own project:

pip install py-jsonic

Feel free to read through the Jsonic GitHub repository and contribute code if you have neat ideas for improving Jsonic.

Thanks for reading, see you in the next one!
