So, during the past week, I often found reports from users saying that when they tried to apply for a job, especially one in the Internship category, their application would automatically be rejected. I tried to reproduce the issue by checking whether the application was really rejected, whether the user had already applied to a similar job, checking the API response, and checking whether the posted job had expired. Nonetheless, I couldn’t find the root cause. So I dug some more to reproduce the issue, and what piqued my interest was something in the system, specifically in the database: a phenomenon known in SQL as a dirty read (in our case, more precisely, a stale read that turns into a lost update: one transaction reads a value that another transaction is about to change, and then overwrites the other’s write). To be honest, this was my first experience finding something like this. In general, this phenomenon shows up under concurrent writes, and it becomes much more likely once a system or application has been scaled out horizontally (adding servers or instances): two copies of our code (take the example of adapters for interacting with the database) run at the same time and overwrite each other’s work. This kind of thing shouldn’t happen if, in the database, we have created a “rule” that makes these operations behave as atomic transactions.

The Issue

Let’s take the issue mentioned earlier. There is one model with several functions that process an application’s status: rejected (0: the quota remains untouched), applied (1: the quota decreases by one), hired (1), and shortlisted (1). Now, let’s walk through the problem of the user who was automatically rejected when trying to apply for a job.
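To make those status values concrete, here is a tiny sketch of the mapping; the dictionary name and values are my own illustration of the description above, not the real model:

```python
# Illustrative mapping from application status to quota consumed.
# Names and values are assumptions based on the description above.
QUOTA_COST = {"rejected": 0, "applied": 1, "hired": 1, "shortlisted": 1}

job_quota = 2
for status in ("shortlisted", "rejected"):
    job_quota -= QUOTA_COST[status]  # rejected must cost nothing

print(job_quota)  # 1: only the shortlisted candidate consumed a slot
```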

Let’s say the job has a limitation (one job has a maximum quota of only 2 shortlisted/hired candidates). Here is the flow and the details of how this issue can occur when 3 users apply for the same job simultaneously:

  • Jobseeker A applies to the job and is shortlisted. The job’s quota is reduced by 1, so now there is 1 quota left
  • Jobseeker B then applies to the job and falls into the rejected category. Even so, the quota is reduced by 1 again, so now there are 0 quotas left
  • Jobseeker C applies to the job and is automatically rejected, because the quota has (incorrectly) been maxed out

So what happened? The expectation is that the job should still have 1 quota left, but the stored value is stale: in the second step, a user who should not have consumed the quota reduced it anyway, and the quota ran out. This is the stale-read/lost-update condition (the “dirty read” mentioned earlier). Actually, our system already uses queue-based priority (roughly: we calculate a user’s profile completeness and place them in a high/low priority queue for jobseeker-employer matching), but in this case there were 6 high-priority candidates, and all of them applied for the same job at roughly the same time. The queue itself is quite helpful: since the system only processes those 6 users, we can roll back the shortlisted/rejected process through the queue again, or the employer can fix it manually. But what if 100 users apply simultaneously? A queue alone would make requests take a long time, and this is where distributed locking in the code and the database design comes in.
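The interleaving above can be boiled down to a few lines: both “transactions” read the quota before either one writes, so one write is based on a stale value. Plain dicts stand in for database rows here; all names are illustrative:

```python
# Two concurrent read-modify-write cycles on the same row, interleaved.
job = {"quota": 2}

read_a = job["quota"]      # transaction A reads 2
read_b = job["quota"]      # transaction B also reads 2 -- already stale

job["quota"] = read_a - 1  # A writes: quota becomes 1
job["quota"] = read_b - 1  # B writes: quota becomes 1 again, A's update is lost

print(job["quota"])  # 1, even though two decrements were processed
```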

First Solution

Here I’m trying to apply this to Django models. As far as I know, Django models don’t take database locks by default (cmiiw). So the safest first approach, and one I’ve implemented several times, is to use the transaction.atomic method in each function that performs writes: updating, inserting, or deleting rows. My understanding (correct me if I’m wrong) is that everything inside the block then commits or rolls back as one unit: while the transaction for object A is in progress, its writes are held until the update completes, so other requests don’t collide with object A’s half-finished state. This can be seen in the following code (the quota field is illustrative):

from django.db import transaction
from django.shortcuts import render
 
def filter_user_status(request, id: int):
    with transaction.atomic():
        qs = User.objects.get(id=id)
        if qs.status == "hired":
            # a hired candidate consumes one quota slot
            qs.quota -= 1
            qs.save()
        elif qs.status == "rejected":
            # a rejected candidate must not consume quota
            qs.save()
        ...
    return render(request, ..., {"qs": qs})

With this in place, we take preventive action so that operations on objects in the “hired” or “rejected” category don’t interfere with other processes. By using the transaction.atomic method, we instruct the database that every time we retrieve the user by their id and update them, the whole block must complete (or roll back) before its effects become visible, and the rows it writes stay locked until the transaction finishes. In this way, the issue can be avoided. However, this first solution of course has its own trade-offs once the application receives a large enough number of requests or users. Among others:

  • There will be useless and unnecessary worker operations, because we take the lock over the whole function: the “rejected” branch keeps a worker waiting while the “hired” operation finishes, even though it never needed the lock. This of course has an impact on system resources
  • Performance degradation. Looking at the code above, the lock covers the entire function scope, so every caller serializes on it even when it only touches a small part
  • There is no fine-grained savepoint: if something fails mid-block, the whole transaction rolls back, and it is hard to tell where the point of failure was

This first solution is fairly safe as long as there are not too many incoming transactions/requests. You could say this method is quite safe for our database, and taking locks up front like this is called “Pessimistic Locking”.
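As a database-agnostic sketch of the pessimistic idea (acquire the write lock before reading, then do the whole read-modify-write while holding it), here is a small example with the standard-library sqlite3 module; the table and column names are made up. `BEGIN IMMEDIATE` takes the write lock up front, so a concurrent writer would block until we commit:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions manually
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, quota INTEGER)")
conn.execute("INSERT INTO job VALUES (1, 2)")

conn.execute("BEGIN IMMEDIATE")  # pessimistic: lock before reading
quota = conn.execute("SELECT quota FROM job WHERE id = 1").fetchone()[0]
conn.execute("UPDATE job SET quota = ? WHERE id = 1", (quota - 1,))
conn.execute("COMMIT")  # lock released only after the read-modify-write finished

remaining = conn.execute("SELECT quota FROM job WHERE id = 1").fetchone()[0]
print(remaining)  # 1
```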

Second Approach

Based on 2 articles I’ve read, there are two more tools here: the select_for_update() method, and adding a new field to our model database (commonly called version) that acts as a savepoint or rollback point if something goes wrong or the program catches an exception. Strictly speaking, select_for_update() is Django’s built-in row-level pessimistic lock, while the version field is what the articles call “Optimistic Locking”: instead of blocking, we detect a conflicting write and retry.

from django.db import transaction
 
# using a new attribute named `version`
def filter_user_status(request, id: int, expected_version: int):
    with transaction.atomic():
        qs = User.objects.get(id=id)
        if qs.status == "hired" and qs.version == expected_version:
            # only write if nobody bumped the version since our read
            qs.quota -= 1
            qs.version += 1
            qs.save()
        ...
    return qs
 
# or using select_for_update() method
def filter_user_status(request, id: int):
    with transaction.atomic():
        # the selected row stays locked until the transaction commits
        qs = User.objects.select_for_update().get(id=id)
        if qs.status == "hired":
            qs.quota -= 1
            qs.save()
        ...
    return qs

Some of the points that I’ve gained from this latest method:

  • By using the select_for_update() method, the lock is scoped to the selected rows instead of the whole function, so the “hired” and “rejected” operations no longer have to queue up behind each other waiting for whichever finishes first. Thus, this method eliminates the problem of unnecessary worker operations
  • Manually driving transaction state (rollback, toggling autocommit) is not allowed inside a transaction.atomic block; this seems to be built into Django’s APIs (I’d barely known about this issue), because when I tried it, my try-catch caught an error. This makes sense: if we rolled back inside the block, the earlier statements of that transaction would never be committed or recorded in the database. Instead, we can add a new field called version, which is arguably (probably) a safe way to “roll back” to a known commit. The versioning also means that once an object has been recorded as “hired” at a given version, other writers see the version mismatch and neither overwrite it nor wait for it
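Stripped of Django, the version idea is just a conditional UPDATE: write only if the version we read is still current, and treat zero affected rows as “another writer got there first”. A sketch with the standard-library sqlite3 module (the table, column, and function names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_user (id INTEGER PRIMARY KEY, status TEXT, version INTEGER)")
conn.execute("INSERT INTO app_user VALUES (1, 'applied', 0)")

def mark_hired(conn, user_id, expected_version):
    # The WHERE clause is the optimistic check: it matches only if
    # nobody bumped the version since we read it.
    cur = conn.execute(
        "UPDATE app_user SET status = 'hired', version = version + 1 "
        "WHERE id = ? AND version = ?",
        (user_id, expected_version),
    )
    return cur.rowcount == 1

first = mark_hired(conn, 1, 0)   # our read was still fresh
second = mark_hired(conn, 1, 0)  # stale: version is already 1
print(first, second)  # True False
```

Treating a `False` return as “re-read and retry” is what makes this optimistic: nobody ever blocks.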

Alternative Solution

The two approaches above can both be called “exclusive locking”, because we try to give certain functions in our system their own independent isolation. Django itself recommends this route even though the trade-off is quite large: we need to pay attention to which queryset functions we modify and what abstraction each piece of code runs under. Besides these two, there are other options worth considering:

  • Implementing custom global locks in our database session, depending on the database we’re using. For this one, I’ve actually written a custom method in the past for this solution:
from django.conf import settings
from django.db import DEFAULT_DB_ALIAS
from django.db.transaction import Atomic, get_connection
 
class TransactionAtomic(Atomic):
    # custom context manager that inherits from Django's Atomic
    def __init__(self, *models, using=None, savepoint=True):
        if using is None:
            using = DEFAULT_DB_ALIAS
        # on Django < 3.2, drop the `durable` argument
        super().__init__(using, savepoint, durable=False)
        self.models = models
 
    def __enter__(self):
        super().__enter__()
        # sqlite3 doesn't support table locking, just sayin'
        if settings.DATABASES[self.using]["ENGINE"] != "django.db.backends.sqlite3":
            cursor = get_connection(self.using).cursor()
            try:
                for model in self.models:
                    # take a table-level lock for each model's table
                    cursor.execute(f"LOCK TABLE {model._meta.db_table}")
            finally:
                cursor.close()

And what actually surprised me is that this method is not recommended when there are a lot of incoming API requests, since a table-level lock blocks every other writer to that table and causes performance degradation.

  • Avoid race conditions at a low level, in the model design itself. When writing database models, avoid concurrent entities (I call these “shared entities”: an entity that several writers operate on simultaneously), since they become a blocker whenever the state changes
  • Use a cache. Whether it’s Redis, Memcached or whatever suits your needs (admittedly, I don’t know this method very well). When API calls to the system grow bigger, I think we need a separation of concerns for objects that are read repeatedly
  • Use the built-in features provided by each database. Postgres and MySQL already provide row-level locking at the query level (SELECT … FOR UPDATE)
  • Get a better understanding of the try-catch-except mechanism. To be honest, I don’t fully understand this one either (lol): whether it means taking preventive action in the program code without the two solutions described above, waiting until each lock on the object finishes its operation, or using a retry-timeout mechanism
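For the retry-timeout mechanism in that last point, here is a minimal sketch in plain Python; `fake_acquire` is a stand-in for whatever real lock attempt you use (for example, a cache-based lock), so everything in it is illustrative:

```python
import time

def with_retry(try_acquire, attempts=5, base_delay=0.01):
    """Retry a lock attempt with exponential backoff, giving up after `attempts`."""
    for attempt in range(attempts):
        if try_acquire():
            return True
        time.sleep(base_delay * (2 ** attempt))  # back off before the next try
    return False

# Toy acquire function that only succeeds on the third call.
calls = {"count": 0}
def fake_acquire():
    calls["count"] += 1
    return calls["count"] >= 3

acquired = with_retry(fake_acquire)
print(acquired, calls["count"])  # True 3
```

The backoff matters under load: without it, 100 simultaneous applicants would hammer the lock in lockstep instead of spreading out their retries.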

Thank you for taking the time to read!