Storing Semi-Structured Data in PostgreSQL

Traditionally, relational databases required all tables to have a fixed data schema, i.e. a set of attributes (such as a “user” table with attributes “firstName”, “lastName” and “email”). You could change the schema by adding and removing attributes, but the available attributes were always know at a given point in time.

For many use cases, this was acceptable. However, some applications have the need to store semi-structured data: entries where the attributes are not known ahead of time.

Since 2012, PostgreSQL allows to store semi-structured data inside tables in the JSON notation:

{
  "ID": "123",
  "Country": "Denmark",
  "Title": "How to store JSON in PostgreSQL"
}

This allows having different sets of attributes for different rows in your table while leaving your table schema constant.

Extracting JSON data into their own columns

But what if you later decide that some of the attributes stored in your JSON column should rather have their own column so you can more easily define foreign keys, triggers and all the other useful things a relational database allows you to?

There is a simple way to extract values from JSON data into their own column.

First, create the new column with the appropriate data type (e.g., VARCHAR):

ALTER TABLE user ADD COLUMN country VARCHAR;

Then run an update statement over your whole table to populate the new column:

UPDATE user SET country = (rawData->>'Country')::VARCHAR WHERE ID >= 0;

This assumes your JSON data was stored in the column “rawData”. It is converted to a VARCHAR with the double-colon operator (::) and stored in the new column “country”.

Bernhard Knasmüller on Software Development