While working as a DBA, we perform many regular tasks, and one of them is upgrading our database systems. There are some techniques to perform a PostgreSQL database upgrade, such as data dump and import, logical replication, or in-site upgrade using pg_upgrade.
The last is a good option when you can afford some downtime and desire to reuse the same server where your current instance is running.
The process using pg_upgrade is well documented, and you can easily find the instructions with little googling.
However, there are a few occasions when you face some unusual conditions that require additional research. In this blog post, I will show you a particular example I faced when working with an upgrade exercise using pg_upgrade.
As a regular procedure, the following steps were followed:
- Install the same PostgreSQL packages you already have installed in the current version for the new one.
- Run pg_upgrade with the –check flag to verify the compatibility between the current and the new version clusters.
- If all is good, perform the upgrade removing the –check flag.
You might consider some extra steps, such as verifying the existence of gin/gist indexes, unknown datatypes, or planning the upgrade of any standby server using the rsync approach. These are not in this blog’s scope to keep this example simple.
Ideally, all the three above steps execute with no issues, and you get your new PostgreSQL version up and running.
There are some situations where the second step, the –check one, fails, and you need to investigate and fix it before moving forward. A “regular” case is you missed installing a required package in the new version. You will end with something like the following:
Performing Consistency Checks on Old Live Server ------------------------------------------------ Checking cluster versions ok Checking database user is the install user ok Checking database connection settings ok Checking for prepared transactions ok Checking for system-defined composite types in user tables ok Checking for reg* data types in user tables ok Checking for contrib/isn with bigint-passing mismatch ok Checking for tables WITH OIDS ok Checking for invalid "sql_identifier" user columns ok Checking for presence of required libraries fatal
In this case, the pg_upgrade –check command will provide a hint:
Your installation references loadable libraries that are missing from the new installation. You can add these libraries to the new installation, or remove the functions using them from the old installation. A list of problem libraries is in the file: loadable_libraries.txt
For example, if checking the content of the loadable_libraries.txt file, you see something like the next:
$ cat loadable_libraries.txt could not load library "$libdir/pg_repack": ERROR: could not access file "$libdir/pg_repack": No such file or directory
You can immediately identify the missing package as the pg_repack, so you need to install it for the new version.
Well, sometimes, this information is not quite “evident.” Let’s review the case.
Example case
When working with the upgrade exercise, the goal was to move from PostgreSQL 11 to PostgreSQL 12. This was a specific requirement, but the situation could also happen when upgrading to a different version.
Following the main process, I installed the same packages as the existing version 11 to version 12 and verified.
ii postgresql-11 11.18-1.pgdg20.04+1 amd64 The World's Most Advanced Open Source Relational Database ii postgresql-12 12.13-1.pgdg20.04+1 amd64 The World's Most Advanced Open Source Relational Database ii postgresql-common 246.pgdg20.04+1 all PostgreSQL database-cluster manager ii postgresql-client-11 11.18-1.pgdg20.04+1 amd64 front-end programs for PostgreSQL 11 ii postgresql-client-12 12.13-1.pgdg20.04+1 amd64 front-end programs for PostgreSQL 12 ii postgresql-client-common 246.pgdg20.04+1 all manager for multiple PostgreSQL client versions
Looking good, then we can execute the second step and verify the cluster’s compatibility.
Mmmh, something is missing. Let’s check what could be.
Ok, a “randomness” library doesn’t sound like a usual library, but we can review some details to find it.
I could miss some packages for the new version; let’s check again.
The same packages are installed for both versions.
The next idea is to verify the extensions we have installed in the source database.
Here, nothing looks “related” to the randomness library.
Just to double-check check, we can even try to find the related libraries from the PostgreSQL $libdir directory. For example, with the following bash snippet.
for lib in $(psql -qtA pgbench -c"select extname||'.so' from pg_extension") ; do find $(<pg version bin path>/pg_config --pkglibdir) -name $lib -type f ; done
Checked for version 11:
And the same for version 12:
Everything seems OK, so where is the randomness library coming from?
There is something I am missing. I tried pg_dump with the –binary-upgrade flag to review some other details regarding the extensions
pg_dump -C --schema-only --quote-all-identifiers --binary-upgrade dbname=<DATABASE_NAME> -f dump_binary_upgrade.sql
From the PostgreSQL documentation we can see the following statement about this flag:
–binary-upgrade
This option is for use by in-place upgrade utilities. Its use for other purposes is not recommended or supported. The behavior of the option may change in future releases without notice.
But the intention is not to restore. Just take a look at the extra information we can get.
The –binary-upgrade option will produce a “verbose” output related to the extensions. The CREATE EXTENSION IF NOT EXISTS commands will be replaced with DROP EXTENSION IF EXISTS …, SELECT pg_catalog.binary_upgrade_create_empty_extension() calls.
Also, you will see a section with the explicit creation of the C language functions for the extensions. An example is the next for the pg_stat_statements extension:
-- -- Name: pg_stat_statements_reset(); Type: FUNCTION; Schema: public; Owner: postgres -- CREATE FUNCTION "public"."pg_stat_statements_reset"() RETURNS "void" LANGUAGE "c" PARALLEL SAFE AS '$libdir/pg_stat_statements', 'pg_stat_statements_reset'; -- For binary upgrade, handle extension membership the hard way ALTER EXTENSION "pg_stat_statements" ADD FUNCTION "public"."pg_stat_statements_reset"(); ALTER FUNCTION "public"."pg_stat_statements_reset"() OWNER TO "postgres";
Looking at the following line:
AS '$libdir/pg_stat_statements', 'pg_stat_statements_reset';
You can identify the physical library ($libdir/pg_stat_statements) used to build the Postgres function and the C function (‘pg_stat_statements_reset‘) within that library.
For some extra good material about writing extensions in PostgreSQL, you can take a look at the blog post Writing PostgreSQL Extensions is Fun – C Language, it really worth it.
Well, while doing this review about the extension’s definition, I saw the next one:
-- -- Name: bytea_size("bytea"); Type: FUNCTION; Schema: public; Owner: postgres -- CREATE FUNCTION "public"."bytea_size"("bytea") RETURNS integer LANGUAGE "c" STRICT AS 'randomness', 'bytea_size'; ALTER FUNCTION "public"."bytea_size"("bytea") OWNER TO "postgres";
Aha! There is the randomness library call! A custom C function is using it.
We should remember PostgreSQL includes support for some procedural languages with its base distribution, such as PL/pgSQL, PL/Tcl, PL/Perl, and PL/Python. So, having C functions is perfectly supported; however, it is not what we might say the usual.
As described in the PostgreSQL documentation, there are a few paths where the libraries can be placed:
The following algorithm is used to locate the shared object file based on the name given in the CREATE FUNCTION command:
1. If the name is an absolute path, the given file is loaded.
2. If the name starts with the string $libdir, that part is replaced by the PostgreSQL package library directory name, which is determined at build time.
3. If the name does not contain a directory part, the file is searched for in the path specified by the configuration variable dynamic_library_path.
4. Otherwise (the file was not found in the path, or it contains a non-absolute directory part), the dynamic loader will try to take the name as given, which will most likely fail. (It is unreliable to depend on the current working directory.)
For this case, I searched all the paths and did not find the randomness. No library. It seems somebody was playing around to build a C function and at some point, removed the library but left the Postgres function behind (this was a DEV environment).
After confirming this “public“, “bytea_size” function is not used, the solution was to DROP it, then retry the pg_upgrade –check command.
After this point, the pg_upgrade was successful.
Conclusion
As we all already know, PostgreSQL is really powerful, and its core capabilities can be extended with custom functions and extensions. Writing your own in C language functions requires some extra knowledge, but you can definitely get very interesting results.
Remember, every time a custom C function is created, PostgreSQL itself doesn’t compile the C code, which should be done beforehand, but the CREATE FUNCTION command is recorded literally in the system catalogs. So it will be referenced for any other load calls, including the pg_upgrade command.