pg_upgrade of 11 -> 13: free(): invalid pointer
I’m continuing my upgrade journey, this time from 11 to 13, and the process is dying while restoring the database schemas, always on the same DB:
—
Performing Upgrade
------------------
Analyzing all rows in the new cluster ok
Freezing all rows in the new cluster ok
Deleting files from new pg_xact ok
Copying old pg_xact to new server ok
Setting next transaction ID and epoch for new cluster ok
Deleting files from new pg_multixact/offsets ok
Copying old pg_multixact/offsets to new server ok
Deleting files from new pg_multixact/members ok
Copying old pg_multixact/members to new server ok
Setting next multixact ID and offset for new cluster ok
Resetting WAL archives ok
Setting frozenxid and minmxid counters in new cluster ok
Restoring global objects in the new cluster ok
Restoring database schemas in the new cluster
messages
*failure*
Consult the last few lines of "pg_upgrade_dump_16387.log" for
the probable cause of the failure.
Failure, exiting
—
The log contains the following (the details are different each time):
—
pg_restore: WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
pg_restore: creating COMMENT "public.FUNCTION "st_isempty"("rast" "public"."raster")"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 5338; 0 0 COMMENT FUNCTION "st_isempty"("rast" "public"."raster") postgres
pg_restore: error: could not execute query: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Command was: COMMENT ON FUNCTION "public"."st_isempty"("rast" "public"."raster") IS 'args: rast - Returns true if the raster is empty (width = 0 and height = 0). Otherwise, returns false.';
—
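Because the details differ between runs, it can help to pull the failing TOC entry and statement out of each pg_upgrade_dump_*.log mechanically and compare them across attempts. The following is a minimal Python sketch, not something pg_upgrade provides; the function name `last_failed_step` is hypothetical, and the regular expressions assume the pg_restore message formats shown in the log above.

```python
import re
import sys
from typing import Optional, Tuple

# "pg_restore: from TOC entry 5338; 0 0 COMMENT FUNCTION ... postgres"
TOC_RE = re.compile(r"^\s*pg_restore: from TOC entry (\d+); (.*)$")
# "Command was: COMMENT ON FUNCTION ..."
CMD_RE = re.compile(r"^\s*Command was: (.*)$")


def last_failed_step(log_text: str) -> Tuple[Optional[int], Optional[str], Optional[str]]:
    """Return (toc_entry, toc_description, command) for the last failure
    reported in a pg_restore log; any part that is not found is None."""
    toc_entry: Optional[int] = None
    toc_desc: Optional[str] = None
    command: Optional[str] = None
    for line in log_text.splitlines():
        m = TOC_RE.match(line)
        if m:
            # A new failure block starts here; forget any earlier command.
            toc_entry, toc_desc, command = int(m.group(1)), m.group(2), None
            continue
        m = CMD_RE.match(line)
        if m:
            command = m.group(1)
    return toc_entry, toc_desc, command


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 last_failed_step.py pg_upgrade_dump_16387.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        print(last_failed_step(fh.read()))
```

If the TOC entry changes from run to run, that by itself suggests the statement being restored is not what triggers the crash.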
And the pgsql13 server log contains:
—
2020-11-17 11:51:40.953 EST [96545] LOG: database system is ready to accept connections
free(): invalid pointer
2020-11-17 11:51:42.880 EST [96545] LOG: server process (PID 96575) was terminated by signal 6: Aborted
2020-11-17 11:51:42.880 EST [96545] LOG: terminating any other active server processes
2020-11-17 11:51:42.880 EST [96582] WARNING: terminating connection because of crash of another server process
2020-11-17 11:51:42.880 EST [96582] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-11-17 11:51:42.880 EST [96582] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2020-11-17 11:51:42.884 EST [96545] LOG: all server processes terminated; reinitializing
2020-11-17 11:51:42.904 EST [96545] LOG: received fast shutdown request
2020-11-17 11:51:42.905 EST [96585] LOG: database system was interrupted; last known up at 2020-11-17 11:51:42 EST
2020-11-17 11:51:42.906 EST [96585] LOG: database system was not properly shut down; automatic recovery in progress
2020-11-17 11:51:42.906 EST [96585] LOG: redo starts at E0/DB6B2960
2020-11-17 11:51:42.907 EST [96545] LOG: abnormal database system shutdown
2020-11-17 11:51:42.909 EST [96545] LOG: database system is shut down
—
So I’m assuming it’s that free() call. The servers have PostGIS 3.0 on them, all installed from the repo, and are running CentOS 8.
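One detail in the server log is worth checking mechanically: the backend killed by signal 6 (PID 96575) is not the one that logged the WARNING (PID 96582), so the COMMENT statement in the dump log may be a bystander rather than the trigger. On Linux, glibc prints `free(): invalid pointer` and then calls abort(), which the postmaster reports as signal 6 (SIGABRT). A minimal Python sketch that pairs each crashed PID with the backends terminated because of it might look like this; `summarize_crashes` is a hypothetical helper, and the patterns assume the log line formats shown above.

```python
import re
import signal
import sys
from typing import List, NamedTuple

# "[96545] LOG: server process (PID 96575) was terminated by signal 6: Aborted"
CRASH_RE = re.compile(r"server process \(PID (\d+)\) was terminated by signal (\d+)")
# "[96582] WARNING: terminating connection because of crash of another server process"
BYSTANDER_RE = re.compile(
    r"\[(\d+)\] WARNING:\s+terminating connection because of crash of another server process"
)


class Crash(NamedTuple):
    pid: int               # backend that actually died
    signal_name: str       # e.g. SIGABRT for signal 6 on Linux
    bystanders: List[int]  # backends terminated only because of this crash


def summarize_crashes(log_text: str) -> List[Crash]:
    crashes: List[Crash] = []
    for line in log_text.splitlines():
        m = CRASH_RE.search(line)
        if m:
            signum = int(m.group(2))
            try:
                name = signal.Signals(signum).name
            except ValueError:  # signal number unknown on this platform
                name = "signal {}".format(signum)
            crashes.append(Crash(int(m.group(1)), name, []))
            continue
        m = BYSTANDER_RE.search(line)
        if m and crashes:
            # Attribute the terminated backend to the most recent crash.
            crashes[-1].bystanders.append(int(m.group(1)))
    return crashes


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for crash in summarize_crashes(fh.read()):
            print(crash)
```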
On 11/17/20 8:59 AM, Jeremy Wilson wrote:
Was this after a clean install of the corrected RPMs?
--
Adrian Klaver
adrian.klaver@aklaver.com
On Nov 17, 2020, at 12:18 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 11/17/20 8:59 AM, Jeremy Wilson wrote:
Was this after a clean install of the corrected RPMs?
Yes, this is a fresh install of CentOS 8 and installed using the updated repo and RPMs.
On Tue, Nov 17, 2020 at 11:59:10AM -0500, Jeremy Wilson wrote:
My guess is that this is a crash in the PostGIS shared library. I would
ask the PostGIS team if they know of any crash cases, and if not, I
think you need to do a pg_dump of the database and test-load it into a
new database to see what query makes it fail, and then load debug
symbols and do a backtrace of the stack at the point of the crash.
Yeah, not fun.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
The usefulness of a cup is in its emptiness, Bruce Lee
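For the backtrace step, one option is to run gdb non-interactively against a core file from the crashed backend. This assumes that a core file was written (where it lands depends on the system's core-dump configuration, e.g. kernel.core_pattern, systemd-coredump or abrt) and that the matching debuginfo packages are installed. The sketch below only builds and runs that gdb command; the binary path `/usr/pgsql-13/bin/postgres` is an assumed PGDG-style layout, and `gdb_backtrace_cmd` is a hypothetical helper name.

```python
import shlex
import subprocess
import sys
from typing import List

# ASSUMPTION: PGDG-style install layout; adjust to the actual installation.
DEFAULT_POSTGRES_BIN = "/usr/pgsql-13/bin/postgres"


def gdb_backtrace_cmd(core_file: str, postgres_bin: str = DEFAULT_POSTGRES_BIN,
                      full: bool = False) -> List[str]:
    """Build a non-interactive gdb command that loads the postgres binary and
    the core file left by the crashed backend, then prints its stack."""
    return ["gdb", "-batch", "-ex", "bt full" if full else "bt", postgres_bin, core_file]


if __name__ == "__main__" and len(sys.argv) > 1:
    cmd = gdb_backtrace_cmd(sys.argv[1])
    print("running:", " ".join(shlex.quote(arg) for arg in cmd))
    # Without debug symbols, frames may show only addresses or "??",
    # which is why the debuginfo packages need to be installed first.
    subprocess.run(cmd, check=False)
```

Passing `full=True` switches to `bt full`, which also prints the local variables of each frame; that is often what the PostGIS developers will ask for.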
On Tue, Nov 17, 2020 at 02:44:47PM -0500, Bruce Momjian wrote:
Actually, pg_dump --schema-only is what you want to dump and load into a
separate database. No need to dump the data.
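Bruce's schema-only suggestion can be scripted roughly as follows. This is a sketch under stated assumptions: the ports (5432 for the 11 cluster, 5433 for the 13 cluster), the scratch database name and the output file name are hypothetical, and `messages` is taken to be the failing database only because that name appears in the pg_upgrade output. Running the PostgreSQL 13 client binaries mirrors pg_upgrade, which uses the new cluster's pg_dump. Loading through a single psql session with `--echo-all` means only one backend is doing work, so the last statement echoed before the connection drops is most likely the one that crashed it.

```python
import subprocess
import sys
from typing import List

# ASSUMPTIONS (hypothetical values, adjust them): the old 11 cluster listens
# on 5432, the new 13 cluster on 5433, and "messages" is the failing database.
OLD_PORT, NEW_PORT = 5432, 5433
SOURCE_DB, SCRATCH_DB = "messages", "messages_schema_test"
SCHEMA_FILE = "messages_schema.sql"


def dump_schema_cmd(db: str, port: int, outfile: str) -> List[str]:
    # --schema-only: object definitions (including COMMENTs), no table data
    return ["pg_dump", "--schema-only", "-p", str(port), "-d", db, "-f", outfile]


def create_scratch_cmd(db: str, port: int) -> List[str]:
    return ["createdb", "-p", str(port), db]


def load_schema_cmd(db: str, port: int, infile: str) -> List[str]:
    # ON_ERROR_STOP stops psql at the first error; --echo-all prints each
    # statement before it is sent, so the culprit is the last one printed.
    return ["psql", "-p", str(port), "-d", db, "-v", "ON_ERROR_STOP=1",
            "--echo-all", "-f", infile]


if __name__ == "__main__" and len(sys.argv) > 1 and sys.argv[1] == "run":
    subprocess.run(dump_schema_cmd(SOURCE_DB, OLD_PORT, SCHEMA_FILE), check=True)
    subprocess.run(create_scratch_cmd(SCRATCH_DB, NEW_PORT), check=True)
    result = subprocess.run(load_schema_cmd(SCRATCH_DB, NEW_PORT, SCHEMA_FILE))
    print("psql exit status:", result.returncode)
```

pg_upgrade itself dumps with `--binary-upgrade`, so a clean load of a plain schema-only dump does not by itself rule the problem out.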
On Nov 17, 2020, at 11:44 AM, Bruce Momjian <bruce@momjian.us> wrote:
These kinds of problems have almost always been due to multiple versions of dependencies being installed simultaneously. So, packaging fun. You'll get one version of PostGIS compiled against one train of dependencies and another version compiled against another train; for the upgrade, both trains end up installed simultaneously, and things break.
P
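Paul's explanation can be checked, at least heuristically, by comparing the shared-library dependencies of the PostGIS library files that get loaded into the same backend. The Python sketch below runs `ldd` on each file given on the command line and reports any library base name (for example `libproj` or `libgeos_c`) that resolves to more than one file, which is the "two trains of dependencies" situation. The helper names are hypothetical; which PostGIS files to pass (look in the directory printed by `pg_config --pkglibdir`) is left to the reader.

```python
import re
import subprocess
import sys
from collections import defaultdict
from typing import Dict, Iterable, Set

# ldd prints each resolved dependency as:
#   libproj.so.15 => /some/dir/libproj.so.15 (0x00007f...)
LDD_RE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)")


def library_base(soname: str) -> str:
    """Strip the .so suffix and version numbers: 'libproj.so.15' -> 'libproj'."""
    return re.sub(r"\.so(\.\d+)*$", "", soname)


def find_conflicts(ldd_outputs: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each library base name to the files it resolves to across all
    ldd outputs, keeping only names that resolve to more than one file."""
    resolved: Dict[str, Set[str]] = defaultdict(set)
    for output in ldd_outputs:
        for line in output.splitlines():
            m = LDD_RE.match(line)  # "not found" and vdso lines do not match
            if m:
                soname, path = m.groups()
                resolved[library_base(soname)].add(path)
    return {lib: paths for lib, paths in resolved.items() if len(paths) > 1}


def ldd(path: str) -> str:
    return subprocess.run(["ldd", path], stdout=subprocess.PIPE,
                          universal_newlines=True, check=True).stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    for lib, paths in sorted(find_conflicts(ldd(p) for p in sys.argv[1:]).items()):
        print(lib, "->", ", ".join(sorted(paths)))
```

A hit is not proof of the crash, and the same library reached through a symlinked directory (e.g. /lib64 versus /usr/lib64) can show up as a false positive, but two different major versions of PROJ, GEOS or GDAL in the output would be a strong hint.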