In this post I’ll be writing about the database model and also show some examples of the types of operations we’ll be doing on our JSON data. This post is the second part of a series, so you will probably want to read the previous post first.

The database schema

Here’s an overview of how our database schema looks like:

Database schema

The schema has been simplified in order to make for simpler examples. In our application we have users, which in turn have 0 or more clients. Each client will have 0 or more forms that the user will be filling out for them. Here’s a script you can use to generate the tables.

For our purposes, the interesting part is the data column of the forms table. That column will store the entire form. The form JSON document looks like this:

{
    "Id": "Some guid. Generated by our software. More on this later.",
    "ExpenseAmount": 500,
    "Company": {
        "Id": 1,
        "FullName": "Some name",
        "AccountNumber": "Some account number"
    },
    "Printed": false,
    "Filed": false,
    "FiledDate": null,
    "Filer": {
        "FilerName": "Filer name",
        "License": "Some license number",
        "CompanyName": "Company whose filing this in behalf of client"
    }
};

This is of course a fake schema that will probably make no sense as a form or application, but the basic functionality I want to go over is there. Keep in mind, though, that as of Postgres 9.3 the JSON datatype is missing many operators, so you will need to cast your values from text.

The fun stuff

Before moving forward, I’d advise you see the documentation about JSON operators, specifically Table 9-39. It’ll help out making sense of what the -> and ->> operators do and the subtle difference between them. With that in mind, let’s dive into some cool things we can do with JSON fields:

-- Get all forms that have already been printed out
select * from forms where (data->>'Printed')::bool = true;

Lets go over the query. First, we use data->>'Printed' to retrieve the value of Printed as text. Since the JSON data type is missing many operators, we can’t just simply do data->'Printed' = true as it will fail to cast that JSON object to bool. To work around this, we use the ::bool operator to cast the value of the property as text to a boolean . In Postgres we do casts by appending ::<type> to a value. In our case we want to cast the text value returned by data->>'Printed' to a bool. Note that we wrap the entire data->>'Printed' part in parentheses to avoid Postgres from mistakingly casting the text 'Printed' to bool. You will see this pattern repeat itself in all of the following examples.

Other things we can do:

-- Get all forms that were filed before a date 
-- (our dates are serialized in this format: "2012-12-31T16:54:11")
select * from forms where 
to_timestamp(data->>'FiledDate', 'YYYY-MM-DD""T""HH24:MI:SS.MS""-04:00""') < '2014-01-01'; 

Here we cast our FiledDate to a timestamp using the pattern in which our dates are stored. This is quite handy, as different JSON libraries serialize dates differently. For more information on to_timestamp check the official docs.

-- Check that a field isn't null or empty string 
select * from forms where nullif(data->'Company'->>'FullName', '') is not null; 

Note that in this example we use -> to get the Company property and then we retrieve the text value of it’s FullName property by using ->>. The -> operator returns the JSON object (Company), which then allows us to use the ->> operator to retrieve the text value of its FullName property. Also note that we’re using the nullif function to make sure we treat empty strings as nulls, making our comparison far easier to write and read. For more information on nullif check the official docs.

We can also do standard text comparisons. Note that =, LIKE, and other text comparisons are case sensitive in Postgres. For a case-insensitive comparison either use tolower on both values or use ILIKE. For example:

-- Get all forms that belong to clients whose AccountNumber begins with 'Z1'
select * from forms where data->'Company'->>'AccountNumber' LIKE 'Z1%';

-- Get all forms that belong to clients whose name begins with 'A' or 'a'
select * from forms where data->'Company'->>'FullName' ILIKE 'A%';

We can also do numerical comparisons, as long as we provide the correct casts. In the following example we have to cast to int in order to filter (note the parentheses wrapping the expression):

-- Get all forms for clients whose Expenses are over $100
select * from forms where (data->>'ExpenseAmount')::int > 100;

Maybe you want to generate a chart with daily filing activity. Personally, when I realized I could do this, it blew my mind. Yes, we’re grouping by a value inside a JSON object:

select to_date(data->>'FiledDate', 'YYYY-MM-DD""T""HH24:MI:SS.MS""-04:00""') as point, 
       count(id) as amount
from forms 
where nullif(data->>'FiledDate', '') is not null
group by to_date(data->>'FiledDate', 'YYYY-MM-DD""T""HH24:MI:SS.MS""-04:00""');

I’m using to_date in this example because our fictional chart only needs the date part without time information. For more information on to_date check the official docs.

Maybe you need to join tables against JSON data. Postgres’ got you covered as well. This also blew my mind the first time I did it:

-- Super contrived example, but you get the idea.
select * 
from 
    client inner join 
    forms on 
        client.id = (forms.data->'Company'->>'Id')::int

Limitations

One thing that you can’t do yet using regular queries is updating values within a JSON document. In our case, we don’t need to do this because our application will generate a new JSON document each time, so we update the entire document in one go. For example, if we wanted to update the Filed property, we can’t do:

update forms set data->'Filed' = true;

Instead we have to provide the entire JSON document to the query and cast it to JSON this way:

update forms set data = '{
    "Id": "f2fef415-0d91-42cf-af2f-3ab29eb3d44b"
    "ExpenseAmount": 500,
    "Company": {
        "Id": 1,
        "FullName": "Fake Company",
        "AccountNumber": "Z12345"
    },
    "Printed": false,
    "Filed": true,
    "FiledDate": "2014-03-04T09:33:36",
    "Filer": {
        "FilerName": "Employee McEmployerson",
        "License": "AYX0129",
        "CompanyName": "Filing, Ltd."
    }
}'::json where id = 1;

Inserting data is done in the same fasion.

insert into forms (client, data) values 
  (1, '{
    "Id": "f2fef415-0d91-42cf-af2f-3ab29eb3d44b"
    "ExpenseAmount": 500,
    "Company": {
        "Id": 1,
        "FullName": "Fake Company",
        "AccountNumber": "Z12345"
    },
    "Printed": false,
    "Filed": true,
    "FiledDate": "2014-03-04T09:33:36",
    "Filer": {
        "FilerName": "Employee McEmployerson",
        "License": "AYX0129",
        "CompanyName": "Filing, Ltd."
    }
}'::json);

Wrapping up

Hopefully this gives you an idea of how we can use regular ol’ sql queries to retrieve JSON data. There’s some functionality missing from the JSON data type, no doubt. Don’t let this discourage you, though. The JSON data type is evolving and there are changes planned for the 9.4 release to add operators and other great stuff to it. For those of you concerned about performance, you can use expression indexes to index your JSON fields. If I can find the time, I’ll add a post at the end of the series with some numbers on this (I make no promises, though).

In the next part of this series I’ll show you how I implemented the data access layer on a C# application. Stay tuned.


Here are the links to all the parts of this series: