Race Condition in DRF
Situation
In Django REST Framework, I have an endpoint that receives an object via POST and decides, based on three fields of that object, whether the object is new or already exists. My model is Product and the three fields are brand, serial, and manufacturer. If an object with those three values already exists, the endpoint updates its other fields; if not, it creates a new object. I have not implemented any unique constraints in my database based on these three fields. I could do that using unique_together, like this:
from django.db import models


class MyModel(models.Model):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)

    class Meta:
        unique_together = ('first_name', 'last_name',)
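To make the effect of such a constraint concrete, here is a minimal sketch that uses Python's built-in sqlite3 module rather than Django. The product table, its columns (including price), and the save_product helper are assumptions for illustration that mirror the Product model described above. With a composite UNIQUE constraint in place, the database itself rejects a second row with the same three values, so a failed insert can fall back to an update:

```python
import sqlite3

# Hypothetical table mirroring the Product model; all names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE product (
        id INTEGER PRIMARY KEY,
        brand TEXT NOT NULL,
        serial TEXT NOT NULL,
        manufacturer TEXT NOT NULL,
        price REAL,
        UNIQUE (brand, serial, manufacturer)
    )
    """
)


def save_product(brand, serial, manufacturer, price):
    """Insert the row; if the unique constraint rejects it, update instead.

    Returns True when a new row was created, False when one was updated.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO product (brand, serial, manufacturer, price) "
                "VALUES (?, ?, ?, ?)",
                (brand, serial, manufacturer, price),
            )
        return True
    except sqlite3.IntegrityError:
        # The database refused the duplicate, so update the existing row.
        with conn:
            conn.execute(
                "UPDATE product SET price = ? "
                "WHERE brand = ? AND serial = ? AND manufacturer = ?",
                (price, brand, serial, manufacturer),
            )
        return False


created_first = save_product("Acme", "SN-1", "Acme Corp", 10.0)
created_second = save_product("Acme", "SN-1", "Acme Corp", 12.5)
row_count = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
stored_price = conn.execute("SELECT price FROM product").fetchone()[0]
```

The point of the sketch is that the duplicate check is made by the database at write time, not by application code before the write.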
At first, I wrote the logic by hand, something like this:
- receive the request data and pass it to the serializer
- read the three fields from validated_data
- check if a model instance exists with those three values
- if it exists: update the other fields
- if it does not exist: create a new object with the received data
- finally, return the created or updated object with the proper status code (201 CREATED or 200 OK)
from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView

from .models import Product
from .serializers import ProductSerializer


class ProductView(APIView):
    def post(self, request):
        serializer = ProductSerializer(data=request.data)
        if serializer.is_valid():
            # Extracting the values of the fields
            brand = serializer.validated_data.get('brand')
            serial = serializer.validated_data.get('serial')
            manufacturer = serializer.validated_data.get('manufacturer')
            # Check if object already exists
            existing_instance = Product.objects.filter(
                brand=brand,
                serial=serial,
                manufacturer=manufacturer
            ).first()
            if existing_instance:
                # Object exists, update it
                serializer_instance = ProductSerializer(existing_instance, data=request.data)
                if serializer_instance.is_valid():
                    serializer_instance.save()
                    return Response(serializer_instance.data, status=status.HTTP_200_OK)
                return Response(serializer_instance.errors, status=status.HTTP_400_BAD_REQUEST)
            else:
                # Object doesn't exist, create it
                serializer.save()
                return Response(serializer.data, status=status.HTTP_201_CREATED)
        return Response(serializer.errors, status=status.HTTP_400_BAD_REQUEST)
Problem
I ran into a race condition. I deployed the code and the client used it, but after some time we observed that there were still duplicates, despite the checks that decide between updating and creating.
I also tried the update_or_create method from the Django ORM, using something like this:
from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView

from .models import Product
from .serializers import ProductSerializer


class ProductView(APIView):
    def post(self, request):
        serializer = ProductSerializer(data=request.data)
        if serializer.is_valid():
            # Extracting the values of the fields
            brand = serializer.validated_data.get('brand')
            serial = serializer.validated_data.get('serial')
            manufacturer = serializer.validated_data.get('manufacturer')
            # Try to update the object, or create it if it doesn't exist
            instance, created = Product.objects.update_or_create(
                brand=brand,
                serial=serial,
                manufacturer=manufacturer,
                defaults=serializer.validated_data
            )
            if created:
                # Object was created
                return Response(serializer.data, status=status.HTTP_201_CREATED)
            else:
                # Object was updated
                return Response(serializer.data, status=status.HTTP_200_OK)
        return Response(serializer.errors, status=status.HTTP_400_BAD_REQUEST)
Investigation
I debugged the code, but it was working as expected, so I suspected that the duplicates might be the result of a race condition, where two incoming requests clash.
The system receives two duplicate objects as two simultaneous requests:
- checks the first one
- finds no duplicates
- checks the second one
- finds no duplicates again
- creates the first object
- creates the second object
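The interleaving above can be reproduced deterministically in a few lines of plain Python. This is only a sketch: the products list stands in for the Product table, and the threading.Barrier is an assumption added purely to force both "requests" to finish their check before either one writes.

```python
import threading

products = []  # stands in for the Product table
both_checked = threading.Barrier(2)  # forces the unlucky interleaving


def create_if_missing(key):
    # Step 1: check for an existing row (both threads find none).
    exists = key in products
    # Wait until the other request has finished its check as well.
    both_checked.wait()
    # Step 2: act on the now-stale result of the check.
    if not exists:
        products.append(key)


key = ("Acme", "SN-1", "Acme Corp")
threads = [
    threading.Thread(target=create_if_missing, args=(key,))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

duplicate_count = products.count(key)
print(duplicate_count)  # 2: both requests created the object
```

Each request behaves correctly on its own; the duplicate appears only because the check and the write are two separate steps.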
Solution
After reading about race conditions, atomic transactions, and some relevant Django conventions, I figured there is more than one solution to this problem, such as:
- using select_for_update
- locking mechanisms
- rate limiting
- idempotent operations
- concurrency control in the front end
- atomic database transactions (related to the next option)
- optimistic concurrency control
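As an illustration of the locking idea from the list, here is a minimal sketch in plain Python. The products list again stands in for the Product table, and table_lock and create_or_update are hypothetical names. The lock turns the check and the write into one indivisible step, so only one of several simultaneous requests creates the object:

```python
import threading

products = []  # stands in for the Product table
table_lock = threading.Lock()  # hypothetical lock guarding the "table"


def create_or_update(key, results):
    # Holding the lock makes check + write a single critical section.
    with table_lock:
        if key in products:
            results.append("updated")
        else:
            products.append(key)
            results.append("created")


key = ("Acme", "SN-1", "Acme Corp")
results = []
threads = [
    threading.Thread(target=create_or_update, args=(key, results))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

created_count = results.count("created")
updated_count = results.count("updated")
```

A threading.Lock only protects a single process. With several worker processes the lock has to live somewhere shared, such as the database, because a transaction on its own does not necessarily stop two concurrent requests from both passing the existence check.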
I eventually rewrote my view to look like this:
from django.db import transaction
from rest_framework import status, viewsets
from rest_framework.response import Response

from .models import Product
from .serializers import ProductSerializer


class ProductViewset(viewsets.ModelViewSet):
    queryset = Product.objects.all()
    serializer_class = ProductSerializer

    def create(self, request, *args, **kwargs):
        serializer = self.get_serializer(data=request.data)
        serializer.is_valid(raise_exception=True)
        serial = serializer.validated_data["serial"]
        manufacturer = serializer.validated_data["manufacturer"]
        brand = serializer.validated_data["brand"]
        # enter a transaction block
        with transaction.atomic():
            # if it's new, create
            if not Product.objects.filter(
                serial=serial, manufacturer=manufacturer, brand=brand
            ).exists():
                self.perform_create(serializer)
                status_code = status.HTTP_201_CREATED
            else:  # if it already exists, update
                instance = Product.objects.get(
                    serial=serial, manufacturer=manufacturer, brand=brand
                )
                # update other fields
                for field, value in serializer.validated_data.items():
                    setattr(instance, field, value)
                instance.save()
                status_code = status.HTTP_200_OK
        return Response(serializer.data, status=status_code)
I did *NOT* implement unique constraints at the database level.
Takeaways
- A race condition is a real risk and must be considered.
- There are several ways to handle it; choose one based on your specific requirements.