The strange case of the MySQL server has gone away

A while ago, I had to track down the root cause of a strange issue in a Django-based service that I was involved with: our database operations were systematically failing whenever the service had been idle for a long period. Each time this happened, the following message appeared in the logs:

OperationalError: (2006, 'MySQL server has gone away')

Afterwards, the service would no longer function properly until it was manually restarted.

After a bit of Internet research, I gathered that this issue was most likely caused by reusing a persistent database connection that had timed out on the server side. MySQL closes a connection once it has been idle for longer than the server's wait_timeout setting. However, as we were not setting CONN_MAX_AGE in our Django configuration, we should not have been reusing a database connection in the first place: the default value of CONN_MAX_AGE is 0, in which case database connections should not persist.
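For reference, here is a minimal sketch of where this setting lives in a Django settings module. The database name is a hypothetical placeholder, and the remaining connection parameters (user, password, host) are left out; only CONN_MAX_AGE matters for this story.

```python
# settings.py (sketch). "myservice" is a hypothetical placeholder name.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "myservice",
        # Number of seconds a connection may be reused. 0 is Django's
        # default: the connection is closed at the end of each request.
        "CONN_MAX_AGE": 0,
    }
}
```

A positive value keeps a connection open for that many seconds, and None keeps it open indefinitely. If persistent connections are enabled, the value should presumably stay below the server's wait_timeout.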

After digging into Django’s codebase, I realized that the function that closes old database connections was only called at the start and at the end of each HTTP request:

signals.request_started.connect(close_old_connections)
signals.request_finished.connect(close_old_connections)

This was the crux of the problem. Our service was a simple Django management command that consumed requests from a Kafka queue. As it never received any HTTP traffic, those signals were never triggered and the old database connection was never closed. Django therefore kept reusing the same connection for subsequent database operations, even after it had timed out due to inactivity.
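The failure mode can be reproduced with a toy model. The sketch below is not Django code: FakeServer, FakeConnection and FakeWrapper are hypothetical stand-ins, and time is passed in explicitly so that the behaviour is deterministic. It only illustrates that a maximum age of 0 has no effect unless something actually invokes the cleanup hook.

```python
class FakeServer:
    """Stands in for MySQL: drops sessions idle longer than wait_timeout."""

    def __init__(self, wait_timeout):
        self.wait_timeout = wait_timeout


class FakeConnection:
    """A session that fails once the server has timed it out."""

    def __init__(self, server, now):
        self.server = server
        self.last_used = now

    def query(self, now):
        if now - self.last_used > self.server.wait_timeout:
            raise ConnectionError("MySQL server has gone away")
        self.last_used = now
        return "ok"


class FakeWrapper:
    """Toy connection holder: opens lazily, closes only when asked to."""

    def __init__(self, server, max_age=0):
        self.server = server
        self.max_age = max_age
        self.conn = None
        self.close_at = None

    def close_if_obsolete(self, now):
        # The cleanup hook: drop the connection once it has outlived max_age.
        if self.conn is not None and now >= self.close_at:
            self.conn = None

    def query(self, now):
        # Reuse the existing connection if there is one, else open a new one.
        if self.conn is None:
            self.conn = FakeConnection(self.server, now)
            self.close_at = now + self.max_age
        return self.conn.query(now)


wrapper = FakeWrapper(FakeServer(wait_timeout=10), max_age=0)
wrapper.query(now=0)  # opens a connection and succeeds

try:
    wrapper.query(now=100)  # nobody ran the hook: the stale connection is reused
except ConnectionError as exc:
    print(exc)

wrapper.close_if_obsolete(now=100)  # run the hook by hand
print(wrapper.query(now=100))  # a fresh connection is opened
```

Until the hook runs, every later query keeps failing in this model, which matches the behaviour we saw: the service stayed broken until it was restarted.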

The fix was to manually call the function that closes old database connections immediately after receiving a new request from Kafka and before performing any database operation. This ensures that a connection which has outlived CONN_MAX_AGE is closed, so that the next database operation opens a fresh one.

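from django import db

# Receive the next request from Kafka.
msg = kafka_consumer.poll()

# Close connections that have outlived CONN_MAX_AGE before touching the database.
db.close_old_connections()

# The ORM opens a fresh connection here if the old one was closed.
MyModel.objects.get()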
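In a long-running consumer, the same idea would typically live inside the polling loop. The following is a minimal sketch rather than our actual code: consume, poll, handle and max_messages are hypothetical names, the cleanup function is passed in as a parameter so that the loop can be exercised without a database, and poll is assumed to return None when no message arrived.

```python
def consume(poll, handle, close_old_connections, max_messages=None):
    """Poll for messages, refreshing stale DB connections before each one.

    Returns the number of messages handled. max_messages is only there
    to make the loop stoppable in tests; None means run forever.
    """
    processed = 0
    while max_messages is None or processed < max_messages:
        msg = poll()
        if msg is None:
            continue  # nothing arrived this time, poll again
        # Mirror what Django does on request_started for HTTP traffic.
        close_old_connections()
        handle(msg)
        processed += 1
    return processed
```

In the management command this could be wired up as consume(kafka_consumer.poll, handle_message, django.db.close_old_connections), where handle_message is whatever function performs the database work.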
