Recently, I found out that object serialization in Django REST Framework is very slow. For a framework of this size, built specifically for REST APIs, that is a questionable problem to have. To be honest, I didn’t pay much attention to performance in Django REST Framework until I read a good article by Haki Benita about why serialization in DRF is so slow.

Object Serialization

Serialization can be defined as a mechanism that allows complex data such as querysets and model instances to be converted to native Python datatypes, which can then be easily rendered into JSON, XML or other content types. Serializers also provide deserialization, which means parsed data can be converted back into complex types after the incoming data has first been validated. Serializers in DRF work very similarly to Django’s Form and ModelForm classes. DRF provides a Serializer class, which gives us a powerful, generic way to control the output of our responses, as well as a ModelSerializer class, which provides a useful shortcut for creating serializers that deal with model instances and querysets. So the question is: what causes serialization in DRF to become so slow? If I may say so, this is actually a problem in DRF itself.

ModelSerializer vs. Regular Serializer

The funny thing is, there is evidence for this: a comparison of object serialization in Python. If you look at the comparison table in that article, you will find that DRF serialization comes in last, taking 0.78 seconds to serialize many objects and 0.57 seconds to serialize a single object. So why does this happen?

When we use the ModelSerializer class, which acts as a kind of proxy over the model, the performance of our API decreases. This is because ModelSerializer builds its fields from the model definition, and whatever we declare there, whether it is a relationship to another model (such as a one-to-one relationship) or just a verbose_name, gets wrapped in another callable instead of being evaluated once up front. So what’s the difference? It lies in how that wrapping is done. With ModelSerializer, Django ends up going through its lazy() function, which according to its documentation does the following: “Turn any callable into a lazy evaluated callable. result classes or types is required - at least one is needed so that the automatic forcing of the lazy evaluation code is triggered. Results are not memoized; the function is evaluated on every access.”
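To make the “not memoized” part concrete, here is a minimal sketch in plain Python. It is not Django’s actual lazy() implementation, which is more involved; LazyValue, expensive_label and memoized_label are hypothetical names I made up for illustration. It only shows the difference between re-running a function on every access and storing its result once.

```python
from functools import lru_cache

# Counters so we can see how often each function body really runs.
calls = {"lazy": 0, "memoized": 0}


def expensive_label():
    # Hypothetical stand-in for work such as resolving a translated verbose_name.
    calls["lazy"] += 1
    return "user name"


class LazyValue:
    """Simplified stand-in for a lazy proxy: nothing is stored,
    so the wrapped function runs again on every access."""

    def __init__(self, func):
        self._func = func

    def __str__(self):
        return self._func()


@lru_cache(maxsize=None)
def memoized_label():
    # Same work, but the result is cached after the first call.
    calls["memoized"] += 1
    return "user name"


label = LazyValue(expensive_label)
for _ in range(5000):
    str(label)        # re-evaluates expensive_label() each time
    memoized_label()  # evaluated once, then served from the cache

print(calls)  # {'lazy': 5000, 'memoized': 1}
```

Both versions return the same string, but the non-memoized one pays the cost 5000 times. That is the kind of repeated work I suspect adds up when many objects are serialized.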

My first assumption focuses on the part that says the results are not memoized. For your information, memoization is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again. This is very different from using a regular Serializer, where we declare the fields ourselves and save object instances through our own create() and update() methods. Let’s take a look:

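# Assumption: User is Django's built-in auth user model.
from django.contrib.auth.models import User
from rest_framework import serializers


class UserSerializer(serializers.Serializer):
    # Every field is declared explicitly instead of being derived from the model.
    id = serializers.IntegerField()
    username = serializers.CharField(max_length=254)
    email = serializers.CharField(max_length=254)

    def create(self, validated_data):
        # Called by serializer.save() when no existing instance was passed in.
        return User.objects.create_user(**validated_data)

    def update(self, instance, validated_data):
        # Called by serializer.save() when an existing instance was passed in;
        # keep the current value for any field missing from the input.
        instance.id = validated_data.get('id', instance.id)
        instance.username = validated_data.get('username', instance.username)
        instance.email = validated_data.get('email', instance.email)
        instance.save()
        return instance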

Based on the code, when we use a regular Serializer, every field is declared explicitly, so each one is already evaluated before the object is serialized into a format such as JSON. This is in sharp contrast to ModelSerializer, where we only pass in our database model and don’t have to bother writing code like the one above. The connection to memoization lies in that evaluation and wrapping process. With ModelSerializer, the field definitions are derived from the model and go through lazy evaluation, whose results are not cached, so the same work is repeated on every access. With a regular Serializer, the fields and the create() and update() methods we write are plain Python that is defined once and reused. Nothing is literally stored in a cache here; the gain comes from not repeating the lazy evaluation. This is supported by the experiments written up by Haki Benita: with ModelSerializer it took around 13 seconds to serialize 5,000 objects, whereas a regular Serializer took only about 2 seconds for the same number of objects.

Other Ways to Fix It?

There are several other ways to improve performance in DRF. Based on my experience, which one to pick depends on the case:

  • If we have a great many model objects, and I really mean a great many, and we are required to use ModelSerializer, the solution is to add the read_only_fields attribute, so that DRF can skip the work that is only needed for writable fields.
  • If our model is composed of several dependencies or depends on other models, say through a nested relationship, we can use a general solution: the select_related() and prefetch_related() methods. Besides that, using these two methods keeps our program away from the N + 1 database query problem.
  • Apart from these two cases, we can serialize objects using a regular Serializer. Tom Christie (author of Django REST Framework) even said that “You don’t always need to use serializers. For performance critical views you might consider dropping the serializers entirely and simply use .values() in your database queries”
  • If we are still required to use Django REST Framework, we can replace DRF’s default serialization with other packages such as pickle or serpy.
  • If performance is our main concern, then in my opinion it would be better not to use Django REST Framework to build the API at all. There are other frameworks such as FastAPI, which is said to be one of the fastest Python API frameworks, with performance comparable to Go or Express (I’ve never tried FastAPI myself, but after reading its documentation, FastAPI already uses asynchronous methods).
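The N + 1 problem from the second point can be sketched without Django at all. The example below uses sqlite3 from the standard library instead of the Django ORM, so it is only an illustration of the idea: the author and book tables and the run() helper are hypothetical. The first loop issues one query for the books plus one query per book for its author, while the second fetches everything with a single JOIN, which is roughly what select_related() does for foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES author(id)
    );
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'A1', 1), (2, 'A2', 1), (3, 'B1', 2);
    """
)

executed = []  # every SQL statement sent through run()


def run(sql, params=()):
    # Hypothetical helper: record the statement, then execute it.
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# N + 1 pattern: one query for the books, then one more per book.
naive = []
for title, author_id in run("SELECT title, author_id FROM book ORDER BY id"):
    name = run("SELECT name FROM author WHERE id = ?", (author_id,))[0][0]
    naive.append({"title": title, "author": name})
n_plus_one_queries = len(executed)

# JOIN pattern: the related rows come back in the same single query.
executed.clear()
rows = run(
    "SELECT book.title, author.name FROM book "
    "JOIN author ON author.id = book.author_id ORDER BY book.id"
)
joined = [{"title": title, "author": name} for title, name in rows]
join_queries = len(executed)

print(n_plus_one_queries, join_queries)  # 4 1
```

Both approaches produce the same data, but the query count of the first one grows with the number of rows, which is exactly what hurts when a nested serializer walks over a large queryset.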