Index corruption issue after migration from RHEL 7 to RHEL 9 (PostgreSQL 11 streaming replication)
Hi Team,
We are facing an issue related to index corruption after migrating our
PostgreSQL 11 setup from *RHEL 7* to *RHEL 9* using *streaming replication*
(base backup method).
After bringing up the standby on RHEL 9, we observed that certain tables
are not returning results when queries use indexed scans. Upon
investigation, the following check confirms index corruption:
The same indexes work fine on the RHEL 7 (primary) side. However, on
the RHEL 9 replica, queries that rely on this index return zero
records.
Rebuilding the indexes fixed the issue temporarily, but we have many
indexes and our DB size is more than 10 TB.
*Environment details:*
- PostgreSQL Version: 11.15
- OS on primary: RHEL 7.9
- OS on standby: RHEL 9.6
- Replication Type: Streaming replication (initialized using pg_basebackup)
- Data Directory initialized from RHEL 7 base backup
*Issue Summary:*
- Indexes appear and are the same size as on prod on the standby after the base backup restore.
- We are able to read the data from the tables without index scans on the RHEL 9 standby.
- No filesystem or WAL errors observed in logs.
Could this be related to OS-level binary or page layout differences between
RHEL 7 and RHEL 9 for PostgreSQL 11 binaries?
Any insights or recommended actions would be greatly appreciated.
Thanks & Regards,
*krishna.*
Hi Krishna,
This might be due to the glibc upgrade between RHEL 7 and RHEL 8, which
includes updated collation data that changes sort order and thus invalidates
existing indexes, forcing them to be rebuilt (on a RHEL > 7 system) if they
contain any values which sort differently under the new collation than the
old one.
More details can be found here, including a query to identify affected
indexes: https://wiki.postgresql.org/wiki/Locale_data_changes
"[RHEL] *Version 8 uses the new locale data*. Therefore, caution will be
necessary when upgrading."
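The wiki's query is the authoritative one; as a rough illustration of the idea only (this is a hypothetical sketch, not the wiki's exact query), something like the following lists indexes whose keys include collatable column types, which are the candidates a glibc collation change can invalidate:

```sql
-- Hypothetical sketch, not the wiki's exact query: find indexes whose
-- keys include collatable types (text, varchar, ...), i.e. candidates
-- for REINDEX after a glibc collation change.
SELECT DISTINCT i.indrelid::regclass   AS table_name,
                i.indexrelid::regclass AS index_name
FROM pg_index i
JOIN pg_attribute a ON a.attrelid = i.indexrelid
JOIN pg_type t      ON t.oid = a.atttypid
WHERE t.typcollation <> 0   -- non-zero means the type is collatable
ORDER BY 1, 2;
```

Run it against the standby; expression indexes and per-column collations complicate the picture, so treat the wiki's query as the reference.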
Thanks, Chris.
On 10/22/25 02:03, Bala M wrote:
Hi Team,
We are facing an issue related to index corruption after migrating our
PostgreSQL 11 setup from *RHEL 7* to *RHEL 9* using *streaming
replication* (base backup method).
Postgres 11 is two years past EOL per:
https://www.postgresql.org/support/versioning/
--
Adrian Klaver
adrian.klaver@aklaver.com
On Wed, Oct 22, 2025 at 6:53 AM Bala M <krishna.pgdba@gmail.com> wrote:
- PostgreSQL Version: 11.15
Ouch! Not only is the major version end of life, but that's not even the
latest revision of 11. At this point, you should use logical replication to
migrate from your v11 to a shiny new Postgres v18 on your RHEL9 box. As
Chris pointed out, the indexes are a libc problem, but your bigger problem
is being on v11.15. If you want to trade complexity for time, you could
consider using pg_dump to do the upgrade instead of logical replication.
Cheers,
Greg
--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
On Wed, Oct 22, 2025 at 1:35 PM Greg Sabino Mullane <htamfids@gmail.com>
wrote:
On Wed, Oct 22, 2025 at 6:53 AM Bala M <krishna.pgdba@gmail.com> wrote:
- PostgreSQL Version: 11.15
Ouch! Not only is the major version end of life, but that's not even the
latest revision of 11.
Do pre-built PG11 binaries even exist for RHEL9?
--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
On 10/22/25 11:18, Ron Johnson wrote:
Do pre-built PG11 binaries even exist for RHEL9?
I believe you can get them here:
https://yum.postgresql.org/repopackages/#archivedrepos
I don't use RH so I can't verify.
--
Adrian Klaver
adrian.klaver@aklaver.com
- *Acceptable downtime:* ~1 day
- *Logical replication:* Not feasible due to the number of schemas, tables, and overall data volume
I'm not sure why this is not feasible. Can you expand on this?
* For a *15 TB database* with roughly *1 day downtime*, what would be the
most reliable approach to migrate from *RHEL 7 → RHEL 9* while avoiding
collation/index corruption issues?
pg_dump is the most reliable, and the slowest. Keep in mind that only the
actual data needs to move over (not the indexes, which get rebuilt after
the data is loaded). You could also mix-n-match pg_logical and pg_dump if
you have a few tables that are super large. Whether either approach fits in
your 24 hour window is hard to say without you running some tests.
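As a quick sanity check on that window, a back-of-envelope sketch (my arithmetic, assuming a decimal 15 TB; index rebuild time comes on top of raw data transfer):

```shell
# Average sustained rate needed to move ~15 TB of data in a 24-hour window.
# Decimal TB assumed; pg_dump/restore also spends time rebuilding indexes.
bytes=$((15 * 1000 * 1000 * 1000 * 1000))   # 15 TB in bytes
secs=$((24 * 3600))                          # seconds in the window
rate_mb=$((bytes / secs / 1000000))          # integer MB/s
echo "${rate_mb} MB/s"                       # prints: 173 MB/s
```

So the pipeline has to sustain roughly 170+ MB/s end to end, which is why measuring one big table first is good advice.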
* Would using *pg_upgrade* (with --check and --clone options) be safe when
moving between OS versions with different glibc libraries?
No, you cannot use pg_upgrade for this. It can move your system across
Postgres versions, but not across servers/operating systems.
* If we temporarily remain on PostgreSQL 11, is it *mandatory to rebuild
all indexes* after restoring the base backup on RHEL 9 to ensure data
consistency? Would running REINDEX DATABASE across all databases be
sufficient?
Yes, and yes.
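If you stay on v11, the rebuild can be driven by the stock reindexdb wrapper; a minimal sketch (run per cluster during a maintenance window — REINDEX takes locks that block writes, and v11's reindexdb has no parallel option, so it proceeds one index at a time):

```shell
# Rebuild every index in every database of the cluster (PostgreSQL 11).
# Serial and lock-heavy: schedule during a maintenance window.
reindexdb --all --echo
```

The --echo flag prints each REINDEX statement as it runs, which helps track progress on a 10 TB+ cluster.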
* Are there any *community-tested procedures or best practices* for
migrating large (15 TB+) environments between RHEL 7 and RHEL 9 with
minimal downtime?
Yes - logical replication is both battle-tested and best practice for such
an upgrade. But with such a large downtime window, investigate pg_dump to
v18. You can find a large table and dump just that one table to start
getting some measurements, e.g. run from the new server:
pg_dump -h my_rhel7_server -d mydb -t mybigtable | psql -h localhost -d mydb -f -
Make sure log_min_duration_statement is set on the new server to help you
see how long each step takes.
Cheers,
Greg
On 10/23/25 07:51, Bala M wrote:
Hi All,
Thank you for the valuable responses regarding the *index corruption
issue* we observed during our test migration of a *PostgreSQL 11*
environment from *RHEL 7 to RHEL 9* using *streaming replication*.
Based on the replies, I understand that it was suggested to upgrade to a
newer PostgreSQL version (e.g., v16 or v18) and use *logical replication*
or *pg_dump/restore* for a clean rebuild.
I am not sure version 18 would be a good choice at this time; it has just
been released and has no bug-fix releases against it yet. Given the
other complications in your upgrade, an OS upgrade and a multi-version jump,
I don't think you want to add a brand new version to the mix. Postgres
17, on the other hand, has been out a year and has had six bug-fix/security
releases. It is probably a more stable target version.
--
Adrian Klaver
adrian.klaver@aklaver.com
On Thu, Oct 23, 2025 at 11:21 AM Greg Sabino Mullane <htamfids@gmail.com>
wrote:
pg_dump is the most reliable, and the slowest.
Last year, I did a dump/restore of a 4.3 TB (inclusive of indices; heavy on
poorly-compressible BYTEA) database from RHEL 6 + 9.6.24 to RHEL 8 +
14.latest. It took just under 11 hours.
Gzip Level = 1
Remote database size: 4307406 MB
RemoteThreads: 16
LocalThreads: 24
SharedBuffs: 32 GB
MaintWorkMem: 3 GB
CheckPoint: 30 min
MaxWalSize: 36 GB
WalBuffs: 128 MB
Both systems were SAN-attached ESX VMs on the same virtual network
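For anyone mapping that shorthand to concrete settings, this is presumably (my guess at the abbreviations, not Ron's actual file) the restore-time postgresql.conf it corresponds to; RemoteThreads/LocalThreads would be pg_dump/pg_restore --jobs counts, which require a directory-format dump:

```ini
# Assumed mapping of the shorthand above to postgresql.conf GUCs:
shared_buffers = 32GB           # "SharedBuffs"
maintenance_work_mem = 3GB      # "MaintWorkMem": speeds CREATE INDEX during restore
checkpoint_timeout = 30min      # "CheckPoint"
max_wal_size = 36GB             # "MaxWalSize": fewer forced checkpoints while loading
wal_buffers = 128MB             # "WalBuffs"
```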
Hi,
This happens because of the glibc version difference between RHEL X and RHEL Y. At this point you either have to rebuild all indexes (sorry!) or redo the upgrade via logical replication (if it works for your app's behaviour).
Devrim
On Fri, 24 Oct 2025 at 07:17, Devrim Gündüz <devrim@gunduz.org> wrote:
This happens because of the glibc version differrence between RHEL X and RHEL Y. At this point you either have to rebuild all indexes (sorry!) or redo the upgrade via logical replication (if it works for your app's behaviour)
There was a mention in [1] about a wiki page which includes a query to
identify which indexes could be affected. While that *could* be all
indexes, it does seem unlikely.
David
[1]: /messages/by-id/CAOg7f80JJCMHXhNLUjnHMwBFV-f9Msv4L=PML0CdaZ4n47uEeA@mail.gmail.com
Just wanted to clarify: this failed because the OP used streaming replication, which may not work between OS versions? But logical replication would have been just fine, between OS and PG versions, or even to Windows? I always assumed streaming would “just work” as long as it’s the same major PG version and Linux-to-Linux, regardless of OS/glibc version. That’s an awesome piece of design work, BTW; congratulations and thank you for that. It never occurred to me that there could be an OS influencing factor like the glibc version for streaming replication.
I almost took the same streaming path about two months ago to go from RH7 to RH8; instead opted to take some downtime overnight and do a dump/restore to go from RH7 to RH8 because logical replication was not an option and it seemed safer. Sounds like I chose the better path by chance.
Thank you for sharing your knowledge.
Scot Kreienkamp | Applications Infrastructure Architect | La-Z-Boy Corporate
On Thu, Oct 23, 2025 at 4:24 PM Scot Kreienkamp <
Scot.Kreienkamp@la-z-boy.com> wrote:
I always assumed streaming would “just work” as long as it’s the same
major PG version and Linux-to-Linux regardless of OS/glibc version. [...] It
never occurred to me that there could be an OS influencing factor like the
glibc version for streaming replication.
In addition to the locale checking when things are accessed, when you bring
up a database cluster it checks some pg_controldata entries to make sure
they match what the server's source code was built with. If any of them
are off, it won't run against those databases.
As a simple example that happens sometimes: if your main Linux PG install
increased the block size at compile time, a different PG binary
built with the default sizes will fail trying to read data from the
modified one. Because all these compile options have to match, sometimes
you can't migrate a database built with one Linux distribution to another.
When that happens it's sometimes possible to hack together a custom build
that matches the origin primary better, but now you're into packaging your
own PG binaries.
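Those compile-time values can be compared directly on the two hosts; a sketch (assumes pg_controldata is on PATH and PGDATA points at the cluster's data directory):

```shell
# Print the compile-time/layout fields that must match for a binary to
# open this data directory; diff the output from the two servers.
pg_controldata "$PGDATA" | grep -E 'version number|block size|identifier|segment'
```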
On Thu, Oct 23, 2025 at 11:49 AM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
I am not sure version 18 would be a good choice at this time, it has just
been released and has no bug fix releases against it yet. Given the
other complications in your upgrade, OS upgrade and multi-version jump, I
don't think you want to add a brand new version to the mix. Postgres
17 on the other hand has been out a year and has had 6 bug/security fixes.
It is probably a more stable target version.
I hear your concern, but I'm pretty confident in v18. Additionally, they
are on version 11 (eleven!) so obviously major upgrades are a rare thing,
so might as well buy themselves another year. :)
Cheers,
Greg
On Thu, Oct 23, 2025 at 10:51 AM Bala M <krishna.pgdba@gmail.com> wrote:
Any advice, recommendations, or shared experiences from others who have
performed similar migrations would be greatly appreciated.
Some related advice: put some system in place to make sure you are always
running the latest revision in whatever major version you end up. See:
https://www.postgresql.org/support/versioning/
In particular, note this part:
*The community considers performing minor upgrades to be less risky than
continuing to run an old minor version.*
You also want to put something in place to make sure your major version
does not fall so far behind again. You don't need to upgrade every year,
but certainly target a major upgrade every 2-3 years. As you will discover,
the major upgrade process is going to be much easier than this current
upgrade we are talking about in this thread.
Cheers,
Greg
On Fri, Oct 24, 2025 at 8:59 AM Greg Sabino Mullane <htamfids@gmail.com>
wrote:
*The community considers performing minor upgrades to be less risky than
continuing to run an old minor version.*
Not only that, but applying minor version update RPMs is *fast*.
Last weekend, this took 12 seconds to run:
alias suip='sudo -iupostgres'
suip pg_ctl stop -mfast -wt9999 \
&& sudo yum install -y Pg17.6_RHEL8/*rpm \
&& suip pg_ctl start -wt9999 \
&& psql -Xc "select version()"
On 10/24/25 05:53, Greg Sabino Mullane wrote:
I hear your concern, but I'm pretty confident in v18.
Which is pretty much the criteria for a GA release: confidence that it
is ready to face the general public. That is not the same thing as it
being guaranteed bug free. Only time facing the manipulations of said
public proves how many bugs there are and how bad they are.
Additionally, they are on version 11 (eleven!) so obviously major upgrades
are a rare thing, so might as well buy themselves another year. :)
Major upgrade being the operative phrase here. The OP and company are
going to face enough new hoops to jump through in the move from 11;
adding a new release to that mix is pushing it a little too far in my
estimation.
--
Adrian Klaver
adrian.klaver@aklaver.com
On Fri, Oct 24, 2025 at 10:54 AM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
"Never trust a .0 release with important data" is just as true in 2025 as
it was in 1985.
That's a chicken and egg problem, though, isn't it?
--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
On 10/24/25 08:00, Ron Johnson wrote:
On Fri, Oct 24, 2025 at 10:54 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
"Never trust a .0 release with important data" is just as true in 2025
as it was in 1985. That's a chicken and egg problem, though, isn't it?
There is nothing stopping you from setting up a test instance and
kicking the tires on a new release to see if your setup will work
correctly. Though the issue in this case is deciding what part of the 15
TB you would test against or finding the resources to replicate the
entire setup.
On Thu, 23 Oct 2025 at 17:21, Greg Sabino Mullane <htamfids@gmail.com> wrote:
pg_dump is the most reliable, and the slowest.
Long ago I had a similar problem and did a "running with scissors"
restore. This means:
1.- Prepare the normal configuration, test, etc. for the new version.
2.- Prepare a restore configuration, with fsync=off, wal_level=minimal,
whatever option gives you any speed advantage.
As the target was empty, if the restore failed we could just clean and restart.
3.- Dump, boot with the restore configuration, restore, clean shutdown,
switch to the production configuration, boot again and carry on.
Time has passed and I lost my notes, but I remember the restore was much
faster than doing it with the normal production configuration. Given
current machine speeds, it may be doable.
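A sketch of what such a throwaway restore configuration might look like today (settings are my assumptions, not Francisco's lost notes; every one of these is crash-unsafe and only acceptable because a failed restore is simply wiped and rerun):

```ini
# "Running with scissors": for a disposable restore target ONLY.
fsync = off                  # no durability; a crash means re-running the restore
synchronous_commit = off
full_page_writes = off
wal_level = minimal          # least WAL; disables replication from this node
max_wal_senders = 0          # required when wal_level = minimal
autovacuum = off             # skip during bulk load; re-enable for production
```

After the clean shutdown, every one of these must be reverted before the cluster serves production traffic.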
Francisco Olarte.