One of the routine operations when administering PostgreSQL is periodic updates to the database system’s architecture. PostgreSQL does a good job of allowing one to update a schema, add types, functions, triggers, or alter a table by adding and removing columns and updating column data types, etc., in a reliable manner. However, there is no built-in mechanism to help identify the differences, let alone generate the necessary SQL, to accomplish updates in an easy manner from the development to the production environment.
So let’s talk about possible approaches to schema changes.
Using logical dump manifests
The easiest way to identify changes between schemas from one database to another is to compare schema dump manifests.
The following example demonstrates an approach one can take looking for differences between schema on different databases:
EXAMPLE:
-- create database schemas create database db01 create database db01
-- db01: version 1 create table t1 ( c1 int, c2 text, c4 date ); create table t2( c1 int, c2 varchar(3), c3 timestamp, c4 date );
-- db02: version 2 create table t1 ( c1 serial primary key, c2 varchar(256), c3 date default now() ); create table t2( c1 serial primary key, c2 varchar(3), c3 varchar(50), c4 timestamp with time zone default now(), c5 int references t1(c1) ); create index on t2 (c5);
# generate schema dumps pg_dump -s db01 -Fc > db01.db pg_dump -s db02 -Fc > db02.db
# generate manifests pg_restore -l db01.db > db01_manifest.ini pg_restore -l db02.db > db02_manifest.ini
This snippet demonstrates looking for differences by comparing the md5 checksums:
# EX 1: generate checksums md5sum \ <(tail +16 db01_manifest.ini | cut -d ' ' -f 4-) \ <(tail +16 db02_manifest.ini | cut -d ' ' -f 4-) # output $ 9d76c028259f2d8bed966308c256943e /dev/fd/63 $ ba124f9410ea623085c237dc4398388a /dev/fd/62
This next snippet diffs the differences between the two manifests identifying only those objects and attributes that have changed. Notice that redundant information, the first 16 lines, are skipped:
# EX 2: perform diff diff \ > <(tail +16 db01_manifest.ini | cut -d ' ' -f 4-) \ > <(tail +16 db02_manifest.ini | cut -d ' ' -f 4-)
This resultant diff shows the changes made between the two schemas:
1a2,3 > SEQUENCE public t1_c1_seq postgres > SEQUENCE OWNED BY public t1_c1_seq postgres 2a5,12 > SEQUENCE public t2_c1_seq postgres > SEQUENCE OWNED BY public t2_c1_seq postgres > DEFAULT public t1 c1 postgres > DEFAULT public t2 c1 postgres > CONSTRAINT public t1 t1_pkey postgres
> CONSTRAINT public t2 t2_pkey postgres
> INDEX public t2_c5_idx postgres
> FK CONSTRAINT public t2 t2_c5_fkey postgres
The good news is that there are a number of existing tools that can reconcile differences between a proposed schema design and the target schema:
- Commercial offerings can differentiate schema between databases in an elegant and efficient manner. Researching, ala Google, yields the most popular technologies one can use.
- In regards to open source solutions, there are a number of projects capable of diffing Postgres database schemas.
Working with the apgdiff extension
The following is an example implementation of the open source tool apgdiff.
Apgdiff can be found in the Postgres community repository. It compares two schema dump files and creates an SQL output file that is, for the most part, suitable for upgrades of old schemata:
Package: apgdiff Version: 2.7.0-1.pgdg18.04+1 Architecture: all Maintainer: Debian PostgreSQL Maintainers <team+postgresql@tracker.debian.org> Installed-Size: 173 Depends: default-jre-headless | java2-runtime-headless Homepage: https://www.apgdiff.com/ Priority: optional Section: database Filename: pool/main/a/apgdiff/apgdiff_2.7.0-1.pgdg18.04+1_all.deb Size: 154800 SHA256: 9a83fcf54aed00e1a28c3d00eabe1c166977af1e26e91035e15f88b5215b181b SHA1: ea713acb55898f07374dadd1bebb09ec2fa4b589 MD5sum: e70a97903cb23b8df8a887da4c54e945
The following example demonstrates how one can update differences between the development environment and the production database schema using apgdiff.
EXAMPLE:
apt install -y apgdiff
# EX 1: dump as SQL statements pg_dump -s db01 -Fp > db01.sql pg_dump -s db02 -Fp > db02.sql createdb db03 --template=db01 apgdiff --ignore-start-with db01.sql db02.sql > db01-db02.sql # “psql -1” encapsulates statements within a transaction psql -1 -f db01-db02.sql db03
# EX 2: uses logical dumps # notice the dumps are standard logical dumps and includes data pg_dump db01 -Fc > db01.db pg_dump db02 -Fc > db02.db createdb db03 --template=db01 # this invocation assumes the resultant diff doesn’t require editing apgdiff --ignore-start-with \ <(pg_restore -s -f - db01.db) \ <(pg_restore -s -f - db02.db) \ | psql -1 db03
There’s more you can accomplish with these simple approaches. By incorporating variations of these, one can create fairly sophisticated shell scripts with little code and, with a little luck, not that much effort.