Decoding Django Sessions in PostgreSQL

When solving a problem that requires you to link a user’s session data to their actual user object, Postgres comes in handy.

Decoding Django Sessions in PostgreSQL

Sessions are an important part of any HTTP-based web framework. They allow web servers to keep track of the identities of repeat HTTP clients without requiring them to re-authenticate for each request. There are several different ways to keep track of sessions. Some do not require the server to persist session data (like JSON Web Tokens), while others do.

Django, a popular Python-based web framework, ships with a default session backend that does store persistent session data. There are several storage and caching options; you can elect to simply store sessions in the SQL database and look them up each time, store them in a cache like Redis or Memcached, or use both, with the caching engine set up in front of the database store. If you use one of the options that ultimately stores sessions in SQL the table will contain your user's sessions.

Screenshots in this post come from Arctype.

Session Schema

When perusing through your application’s data, you may come across a problem that requires you to link a user’s session data to their actual User entry (the table). This happened to me recently and when I took a look at the session table's schema definition, I was surprised that the is not stored as a column. There are important design decisions as to why this is the case, but it is inconvenient to SQL'ers like myself.

Decoding Django Sessions in PostgreSQL
No auth_user id.

is the key that clients are provided. Generally, clients making a request will include the as part of a cookie. When the web server receives the request, it finds , if it exists, then queries to see if the key is known. If it is, it will then look at the correlated and retrieve metadata about the user and their session.

This is how you are able to access something like in a Django request. The is fetched from the decoded , the built-in User object is populated based on the stored , and then the User object is available for use throughout the project's views.

Some quick Googling showed me that by default session data is stored as JSON. I was already aware of Postgres’ excellent JSON abilities (if you are not, check out this blog post), so I suspected this was something we could work with within the bounds of Postgres. This was great news for someone like myself who spends a lot of time in Postgres.

Building the Query

First Look

Decoding Django Sessions in PostgreSQL

As you can see in the first image, session_data doesn’t appear to be JSON. The metadata stored as JSON is hiding behind base64 encoding. Fortunately, we can easily base64 decode in Postgres.

Decoding from Base64

Decoding Django Sessions in PostgreSQL

This is hardly more readable. We need to convert from binary data into text.

Encoding to Text

The “encode” function in Postgres allows you to “Encode binary data into a textual representation.”

Decoding Django Sessions in PostgreSQL
Screenshot from Arctype.

Now, we can finally see something human readable. Here is one of the full decoded results in text format:

11fcbb0d460fd406e83b60ae082991818a1321a4:{"_auth_user_hash":"39308b9542b9305fc038d28a51088905e14246a1","_auth_user_backend":"x.alternate_auth.Backend","_auth_user_id":"52135"}

Extracting JSON

What we have here is a JSON blob prefaced by a colon and a hash of sorts. We’re only interested in the JSON blob. A quick way to extract just the text past the hash and colon is to find the position of the first colon and extract all characters after it.

To accomplish this, we can utilize both the function, which returns n characters at the end of a and the function, which returns the position of a character in a string. will only return the position of the first instance of the string for which you are searching.

The function accepts a negative index. A negative index extracts characters from the right side of the string EXCEPT the characters indicated by the negative index.

To further construct this query, we are going to break it out into 2 parts using CTEs. CTEs are helpful when you have built and selected a nontrivial column and you need to use it more than once. If we continued with only one , we'd have to type the part multiple times. This is messy and if you decide to change how you wanted to parse the encoded data, you'd have to change the function calls in 2 places.

Here is our updated query which extracts the JSON part.

Decoding Django Sessions in PostgreSQL

Full result example:

{"_auth_user_hash":"396db3c0f4ba3d35b350a","_auth_user_backend":"x.alternate_auth.Backend","_auth_user_id":"52646"}

JSON Validation

Now that the column is parsable as JSON we can continue. However, if you try to cast text to JSON in Postgres when the text is not valid JSON, Postgres will throw an error and stop your query. In my database, some of the sessions were not parseable JSON. Here’s a way to quickly make sure the text looks like parseable JSON.

where
substring(decoded, position(':' in decoded) + 1, 1) = '{'
and right(decoded, 1) = '}'`

Any string that does not begin and end with curly braces will be filtered out.

This does not guarantee it will be able to parse, but for my database of several million sessions it did the job. You could write a custom Postgres function to verify JSON parsability, but it would be slower.

JSON Casting

With a clause to exclude invalid session metadata, it's time to cast our string to Postgres' JSON type and extract the key from the JSON. Depending on your Django configuration, this key could be different. Once an object is cast to JSON type, you can query a JSON value by key using the syntax.

Decoding Django Sessions in PostgreSQL
Screenshot from Arctype.

String Cleanup

We’re getting close! When casting from JSON to , Postgres adds double quotes around it. Ultimately we want the user_id field to be an , but Postgres will not parse a string that includes double quotes into an . Even JavaScript won't allow that!

The function with the will strip the specified character from both ends of a string, leaving us with a clean string that can easily be cast into an integer.

Final Query

Here is our final query after trimming the excess double quotes and casting to .

Decoding Django Sessions in PostgreSQL

Now, as the sample result shows, we have linked to Django id.

Here is the full query in copyable form:

with step1 as (
select
session_key,
encode(decode(session_data, 'base64'), 'escape') :: text as decoded
from
django_session
)
select
session_key,
trim(
both '"'
from
(
right(
decoded,
0 - position(':' in decoded)
) :: json -> '_auth_user_id'
) :: text
) :: int as user_id
from
step1
where
substring(decoded, position(':' in decoded) + 1, 1) = '{'
and right(decoded, 1) = '}'

Using a Materialized View for Quick Querying

If your database has a lot of users you’ll notice this query is very slow. Making a materialized view will allow you to repeatedly query the result from a persistent view without re-running the SQL.

When you create the materialized view (and anytime you refresh it), the source code for the view will be run and it will be populated with rows from the result. Be sure to refresh the view when you need up-to-date data!

create materialized view mv_django_session_user as
with step1 as (

To refresh:

refresh materialized view mv_django_session_user;

Summary

Encoding and string manipulation in Postgres is a little more tedious than it would be in common languages used for web applications like Python, Ruby, or PHP, but it is very satisfying to build a view entirely in Postgres that quickly extracts the exact data you want and allows you to directly join to other tables.

The next time you need to extract data encoded by a web framework or another 3rd-party, check Postgres for the answer!

Screenshots in this post come from Arctype.

The modern SQL editor built for collaboration www.arctype.com

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store