JSON and Relational Databases – Part One


JSON data and relational databases have traditionally been diametric opposites. One is very free-form with minimal rules on formatting and the other is very strict and formal. So many folks were surprised when relational databases started adding a JSON data type to their products.

In this multi-part blog series, we will start with an overview of using JSON in a relational database. Then we will dig deeper into the issues raised to see what, if any, progress has been made in resolving the possible conflicts and issues. It is obvious that neither the RDBMS nor JSON is going away, but can they coexist?

Why JSON?

JSON is popular. Very popular. It is the interchange format of choice for many developers, is easy to parse, and is ridiculously flexible. For some, the ability to encapsulate a JSON document in a row is a necessity. Others have found that JSON allows some schema flexibility or ‘fudge factor’ in situations where database schema changes are expensive.

Unstructured data has a lot of appeal in environments where defining the data itself, especially at the beginning of a project, is hard to do, or where there are too many known unknowns. But this sort of never-ending incrementalism is not optimal for database operations.

The Achilles' heel of unstructured data is that it is hard to apply rigor to the data. Making a key/value pair a required item, or requiring a certain data type or format, is not part of the JSON standard. And it is easy for a team to end up recording electronic mail addresses under inconsistent keys such as e-mail, EMail, or email, as there is no ‘native’ enforcement mechanism.
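As a hypothetical illustration, three records written by different parts of the same application could easily end up looking like this, and nothing in JSON itself will flag the mismatch:

{"name": "Alice", "e-mail": "alice@example.com"}
{"name": "Bob", "EMail": "bob@example.com"}
{"name": "Carol", "email": "carol@example.com"}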

Why not JSON?

Structured data has a lot going for it, and the relational model fits most business data well. JSON breaks normalization rules.

In the structured world of relational databases, all columns are carefully thought out, typed, and arranged. So the idea of a free-form column of unstructured data does not sit well with long-time database practices.

With a traditional relational database, you know that a column of integers is going to be nothing but integers. This makes searching that column very easy if it is indexed. And if you try to put a string into that column of integers, the server will refuse to accept that row of data. It is a lot less expensive to keep bad data out than to go back later and clean it up.
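A minimal sketch of that strictness, assuming a hypothetical readings table with an integer column and MySQL running in strict SQL mode:

CREATE TABLE readings (id INT PRIMARY KEY, temperature INT);

INSERT INTO readings VALUES (1, 72);     -- accepted
INSERT INTO readings VALUES (2, 'warm'); -- rejected: 'warm' is not an integer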

But altering tables to add or qualify columns can be slow, expensive, and unsettling. The alter table process can be thought of as the server making a copy of the existing table (goodbye memory and/or disk during the process), making the change on the new version of the table, and then loading the data into the new table. The actual process has a lot more intricacies than that and of course, larger tables take more time to alter. Yup, the free-form JSON stuff looks a lot more attractive now, doesn’t it?

But you do not need a JSON data type!

Before JSON data types were introduced, it was possible to save a JSON document as a text string. It was very simple to put the entire document in a column and be done with it. Sadly, this approach made it hard to search for data embedded in the document. Regular expression searches are not exactly speedy. Changing one part of the JSON document meant the whole document had to be rewritten. And usually, you had to pull the entire document out of the table and parse it in an application. This approach works, but not well.
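A sketch of that older pattern, assuming a hypothetical orders table with a plain TEXT column:

CREATE TABLE orders (id INT PRIMARY KEY, doc TEXT);

-- Finding a value means pattern matching the raw string; this cannot use a
-- normal index and has to scan every row
SELECT id FROM orders WHERE doc LIKE '%"status": "shipped"%';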

JSON with relational

So all but the most devoted to the relational model can see the benefits of having a JSON column, or two, in a row of a table. In MySQL, you get one gigabyte of space for each JSON column – nothing to be sneezed at.
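Declaring one is as simple as using the JSON type. The car table used in the examples later in this post might look something like this (the exact column names here are assumptions for illustration):

CREATE TABLE car (
  id  INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  car JSON  -- free-form document: make, model, type, options, and so on
);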

Both the PostgreSQL and MySQL implementations of JSON data types provide some extra benefits. First, you do not have to update the entire document when you make minor changes – just the changed bits. The server also has many functions to extract and modify data efficiently on the server itself.
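For example, here is a minimal MySQL sketch (the table and key names are the assumed ones from above) that touches only one value instead of replacing the whole document:

-- JSON_SET() changes only the "type" key and leaves the rest of the document alone
UPDATE car SET car = JSON_SET(car, '$.type', 'SUV') WHERE id = 1;

-- Pull a single value back out without parsing the document in an application
SELECT JSON_UNQUOTE(JSON_EXTRACT(car, '$.type')) FROM car WHERE id = 1;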

There is a performance hit when searching JSON data (see the part on indexing below), but the extra flexibility is handy. For example, look at how MySQL stores Dual Passwords in the mysql.user table.

{"additional_password": "$A$005$;H7\u001b\u001b\u0006<`qFRU\tNRxT Z\u0003Ya/iej8Az8LoXGTv.dtf9K3RdJuaLFtXZHBs3/DntG2"}

In this case, there is a temporary second password on a user account, so you do not need a dedicated column in the table to take care of this data. This information is around only for the accounts that need it. JSON_TABLE() is a powerful command! Sadly, PostgreSQL 15 withdrew its JSON_TABLE() at the last minute, but hopefully it will be in version 16. Simply stated, JSON_TABLE() temporarily converts the unstructured JSON data into structured data so it can be processed with Structured Query Language (SQL). This allows you to use CTEs, Window Functions, or aggregate functions on that data.

mysql> select country_name, IndyYear from countryinfo,
json_table(doc,"$" columns (country_name char(20) path "$.Name",
IndyYear int path "$.IndepYear")) as stuff
where IndyYear > 1992;
+----------------+----------+
| country_name   | IndyYear |
+----------------+----------+
| Czech Republic |     1993 |
| Eritrea        |     1993 |
| Palau          |     1994 |
| Slovakia       |     1993 |
+----------------+----------+
4 rows in set, 67 warnings (0.00 sec)

Generated columns and indexes

MySQL takes a different approach to indexing JSON data and provides two choices. If you need to extract a value from a JSON key/value pair into its own column, you can use generated columns.

ALTER TABLE car ADD car_type VARCHAR(30)
AS (JSON_UNQUOTE(car->"$.type")) STORED;

Then this column can be indexed for quick searches.
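A minimal sketch of adding that index (the index name is an assumption):

ALTER TABLE car ADD INDEX idx_car_type (car_type);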

The second MySQL option is Multi-Valued indexes. These are secondary indexes defined on a column that stores an array of values. Usually, there is a one-to-one relationship between an index entry and a row of data, but Multi-Valued indexes have a one-to-many relationship that is intended for indexing JSON arrays.
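A sketch, with assumed table and key names, of how such an index is declared and queried – each zip code in the JSON array gets its own index entry:

CREATE TABLE customers (
  id       BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  custinfo JSON,
  INDEX zips ( (CAST(custinfo->'$.zipcode' AS UNSIGNED ARRAY)) )
);

-- MEMBER OF() lets the optimizer use the multi-valued index
SELECT * FROM customers WHERE 94507 MEMBER OF (custinfo->'$.zipcode');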

With PostgreSQL, you can simply create a GIN index on the values of a specified key in the JSON document.

CREATE INDEX books_tags_idx ON books USING gin ((data->'tags'));
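Assuming data is a jsonb column whose tags key holds an array, a containment query like the one below can then be answered with the help of that index:

-- Find books tagged "fiction"; the @> containment operator can use the GIN index
SELECT * FROM books WHERE data->'tags' @> '["fiction"]';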

Rigor

MySQL implemented a subset of the JSON-Schema.org proposal to ensure that your JSON data is what you want before it is inserted into the database. This way you can require certain fields, as well as check data types and data ranges. The JSON_SCHEMA_VALID() function checks the incoming JSON document against a supplied schema to see if it complies. Used in conjunction with a constraint check, you can keep bad or incomplete data out of your JSON column.
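A minimal sketch, assuming a hypothetical people table whose JSON document must contain a name string and a non-negative numeric age:

CREATE TABLE people (
  doc JSON,
  -- reject any document that does not satisfy the schema
  CHECK (JSON_SCHEMA_VALID('{
    "type": "object",
    "required": ["name", "age"],
    "properties": {
      "name": {"type": "string"},
      "age":  {"type": "number", "minimum": 0}
    }
  }', doc))
);

-- Fails the constraint because "age" is missing
INSERT INTO people VALUES ('{"name": "Alice"}');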

Conclusion

JSON will remain popular for data interchange for a long time, and relational databases will need to adapt to its usage. The old relational arguments about abstaining from JSON will fade as efforts like those from JSON-Schema.org provide the ability to ensure the completeness of data. Altering tables will never be completely instantaneous, even though features like MySQL’s instant ADD COLUMN have proven useful.

And the need to add or alter columns in a table will always be around. The flexibility a JSON column provides could offer some relief in shops where continuous operation with zero downtime is the goal, and that may be enough of a trade-off to justify it within the relational model.
