Missing Library: A pg_upgrade History

Missing Library: A pg_upgrade

While working as a DBA, we perform many regular tasks, and one of them is upgrading our database systems. There are some techniques to perform a PostgreSQL database upgrade, such as data dump and import, logical replication, or in-site upgrade using pg_upgrade.

The last is a good option when you can afford some downtime and desire to reuse the same server where your current instance is running.

The process using pg_upgrade is well documented, and you can easily find the instructions with little googling.

However, there are a few occasions when you face some unusual conditions that require additional research. In this blog post, I will show you a particular example I faced when working with an upgrade exercise using pg_upgrade.

As a regular procedure, the following steps were followed:

  • Install the same PostgreSQL packages you already have installed in the current version for the new one.
  • Run pg_upgrade with the –check flag to verify the compatibility between the current and the new version clusters.
  • If all is good, perform the upgrade removing the –check flag.

You might consider some extra steps, such as verifying the existence of gin/gist indexes, unknown datatypes, or planning the upgrade of any standby server using the rsync approach. These are not in this blog’s scope to keep this example simple.

Ideally, all the three above steps execute with no issues, and you get your new PostgreSQL version up and running.

There are some situations where the second step, the –check one, fails, and you need to investigate and fix it before moving forward. A “regular” case is you missed installing a required package in the new version. You will end with something like the following:

Performing Consistency Checks on Old Live Server
------------------------------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for system-defined composite types in user tables  ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for tables WITH OIDS                               ok
Checking for invalid "sql_identifier" user columns          ok
Checking for presence of required libraries                 fatal

In this case, the pg_upgrade –check command will provide a hint:

Your installation references loadable libraries that are missing from the
new installation.  You can add these libraries to the new installation,
or remove the functions using them from the old installation.  A list of
problem libraries is in the file:
    loadable_libraries.txt

For example, if checking the content of the loadable_libraries.txt file, you see something like the next:

$ cat loadable_libraries.txt
could not load library "$libdir/pg_repack": ERROR: could not access file "$libdir/pg_repack": No such file or directory

You can immediately identify the missing package as the pg_repack, so you need to install it for the new version.

Well, sometimes, this information is not quite “evident.” Let’s review the case.

Example case

When working with the upgrade exercise, the goal was to move from PostgreSQL 11 to PostgreSQL 12. This was a specific requirement, but the situation could also happen when upgrading to a different version.

Following the main process, I installed the same packages as the existing version 11 to version 12 and verified.

ii  postgresql-11                   11.18-1.pgdg20.04+1               amd64        The World's Most Advanced Open Source Relational Database
ii  postgresql-12                   12.13-1.pgdg20.04+1               amd64        The World's Most Advanced Open Source Relational Database
ii  postgresql-common               246.pgdg20.04+1                   all          PostgreSQL database-cluster manager
ii  postgresql-client-11            11.18-1.pgdg20.04+1               amd64        front-end programs for PostgreSQL 11
ii  postgresql-client-12            12.13-1.pgdg20.04+1               amd64        front-end programs for PostgreSQL 12
ii  postgresql-client-common        246.pgdg20.04+1                   all          manager for multiple PostgreSQL client versions

Looking good, then we can execute the second step and verify the cluster’s compatibility.

verify the cluster's compatibility

Mmmh, something is missing. Let’s check what could be.

Ok, a “randomness” library doesn’t sound like a usual library, but we can review some details to find it.

I could miss some packages for the new version; let’s check again.

postgresql packages

The same packages are installed for both versions.

The next idea is to verify the extensions we have installed in the source database.

verify the extensions postgresql

Here, nothing looks “related” to the randomness library.

Just to double-check check, we can even try to find the related libraries from the PostgreSQL $libdir directory. For example, with the following bash snippet.

for lib in $(psql -qtA pgbench -c"select extname||'.so' from pg_extension") ; do find $(<pg version bin path>/pg_config --pkglibdir) -name $lib -type f ; done

Checked for version 11:

And the same for version 12:

Everything seems OK, so where is the randomness library coming from?

There is something I am missing. I tried pg_dump with the –binary-upgrade flag to review some other details regarding the extensions

pg_dump -C --schema-only --quote-all-identifiers --binary-upgrade dbname=<DATABASE_NAME> -f dump_binary_upgrade.sql

From the PostgreSQL documentation we can see the following statement about this flag:

–binary-upgrade

This option is for use by in-place upgrade utilities. Its use for other purposes is not recommended or supported. The behavior of the option may change in future releases without notice.

But the intention is not to restore. Just take a look at the extra information we can get.

The –binary-upgrade option will produce a “verbose” output related to the extensions. The CREATE EXTENSION IF NOT EXISTS commands will be replaced with DROP EXTENSION IF EXISTS …, SELECT pg_catalog.binary_upgrade_create_empty_extension() calls.

Also, you will see a section with the explicit creation of the C language functions for the extensions. An example is the next for the pg_stat_statements extension:

--
-- Name: pg_stat_statements_reset(); Type: FUNCTION; Schema: public; Owner: postgres
-- CREATE FUNCTION "public"."pg_stat_statements_reset"() RETURNS "void"
    LANGUAGE "c" PARALLEL SAFE
    AS '$libdir/pg_stat_statements', 'pg_stat_statements_reset'; -- For binary upgrade, handle extension membership the hard way
ALTER EXTENSION "pg_stat_statements" ADD FUNCTION "public"."pg_stat_statements_reset"(); ALTER FUNCTION "public"."pg_stat_statements_reset"() OWNER TO "postgres";

Looking at the following line:

    AS '$libdir/pg_stat_statements', 'pg_stat_statements_reset';

You can identify the physical library ($libdir/pg_stat_statements) used to build the Postgres function and the C function (‘pg_stat_statements_reset‘) within that library.

For some extra good material about writing extensions in PostgreSQL, you can take a look at the blog post Writing PostgreSQL Extensions is Fun – C Language, it really worth it.

Well, while doing this review about the extension’s definition, I saw the next one:

--
-- Name: bytea_size("bytea"); Type: FUNCTION; Schema: public; Owner: postgres
-- CREATE FUNCTION "public"."bytea_size"("bytea") RETURNS integer
    LANGUAGE "c" STRICT
    AS 'randomness', 'bytea_size'; ALTER FUNCTION "public"."bytea_size"("bytea") OWNER TO "postgres";

Aha! There is the randomness library call! A custom C function is using it.

We should remember PostgreSQL includes support for some procedural languages with its base distribution, such as  PL/pgSQL, PL/Tcl, PL/Perl, and PL/Python. So, having C functions is perfectly supported; however, it is not what we might say the usual.

As described in the PostgreSQL documentation, there are a few paths where the libraries can be placed:

The following algorithm is used to locate the shared object file based on the name given in the CREATE FUNCTION command:

1. If the name is an absolute path, the given file is loaded.
2. If the name starts with the string $libdir, that part is replaced by the PostgreSQL package library directory name, which is determined at build time.
3. If the name does not contain a directory part, the file is searched for in the path specified by the configuration variable dynamic_library_path.
4. Otherwise (the file was not found in the path, or it contains a non-absolute directory part), the dynamic loader will try to take the name as given, which will most likely fail. (It is unreliable to depend on the current working directory.)

For this case, I searched all the paths and did not find the randomness. No library. It seems somebody was playing around to build a C function and at some point, removed the library but left the Postgres function behind (this was a DEV environment).

After confirming this “public“, “bytea_size” function is not used, the solution was to DROP it, then retry the pg_upgrade –check command.

bytea_size postgresql

After this point, the pg_upgrade was successful.

Conclusion

As we all already know, PostgreSQL is really powerful, and its core capabilities can be extended with custom functions and extensions. Writing your own in C language functions requires some extra knowledge, but you can definitely get very interesting results.

Remember, every time a custom C function is created, PostgreSQL itself doesn’t compile the C code, which should be done beforehand, but the CREATE FUNCTION command is recorded literally in the system catalogs. So it will be referenced for any other load calls, including the pg_upgrade command.

Leave a Reply

Your email address will not be published. Required fields are marked *

Subscribe to our Newsletter