Handling glibc v2.28 breaking changes

Started by Pradeep Chhetri · almost 4 years ago · 6 messages · general
#1 Pradeep Chhetri
pradeepchhetri4444@gmail.com

Hello everyone,

I am sure this has been discussed multiple times in the past, but I would
like to raise it again. I have a 3-node cluster of Postgres v9.6. All
nodes are currently running on Debian 9 (with glibc v2.24), and I need to
upgrade them to Debian 10 (with glibc v2.28) without downtime. To work
around the glibc collation issue, I am evaluating whether I can compile
glibc v2.24 on Debian 10, pin Postgres to this manually compiled glibc,
and upgrade the Linux distribution in a rolling fashion. I would like to
know how others have achieved such distro upgrades without downtime. I am
new to Postgres, so please pardon my ignorance.

Thank you for your help.
Best regards,
Pradeep

#2 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Pradeep Chhetri (#1)
Re: Handling glibc v2.28 breaking changes

You are going to have to be more specific, as upgrading a distro involves
downtime. I'm guessing you mean downtime for Postgres as a whole; still,
at least one of the instances is going to be down while its OS is being
upgraded. So:

1) How is the 3-node cluster configured?

2) What is the locale for the Postgres instances?

3) What is acceptable downtime in the process?

4) Are you using ICU collation?

Also you might want to look at:

https://wiki.postgresql.org/wiki/Locale_data_changes
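
For 2) and 4), something along these lines (an untested sketch, run in
psql) will show what each database is using:

    -- Per-database locale settings; a datcollate like 'en_US.UTF-8'
    -- means glibc (libc) collations are in play.
    SELECT datname, datcollate, datctype FROM pg_database;

    -- Or, for the database you are connected to:
    SHOW lc_collate;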

--
Adrian Klaver
adrian.klaver@aklaver.com

#3 Pradeep Chhetri
pradeepchhetri4444@gmail.com
In reply to: Adrian Klaver (#2)
Re: Handling glibc v2.28 breaking changes

Hi Adrian,

Thank you for your quick response.

By zero downtime, I meant that at least one of the three nodes is up at
any time to handle reads and writes.

> How is the 3-node cluster configured?

These 3 nodes are configured as one primary, one sync replica and one
async replica, managed via stolon.

> What is the locale for the Postgres instances?

We are using en_US.UTF-8 collation.

> What is acceptable downtime in the process?

We want to keep it as short as possible, since these are customer-facing
clusters.

> Are you using ICU collation?

As far as I know, ICU collation support arrived in Postgres v10, but we
are still running v9.6, so I guess that is not an option unless we upgrade
the cluster first.

I am open to options, including changing the architecture, upgrading the
cluster first, or evaluating logical replication, but our primary goal is
to achieve this with minimal downtime.

Thank you for your help.
Best regards,
Pradeep

#4 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Pradeep Chhetri (#1)
Re: Handling glibc v2.28 breaking changes

Don't use an old glibc.

You will want to move to a different machine or upgrade the operating
system, so you will have some downtime anyway.

You could consider upgrading in several steps:

- pg_upgrade to v14 on the current operating system
- use replication, then switch over to move to a current operating system
  on a different machine
- REINDEX CONCURRENTLY all indexes on string expressions

You could get data corruption and bad query results between the second and
the third steps, so keep that interval short.
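
A rough way to find the indexes that last step is about, adapted from the
query on the wiki page Adrian linked (an untested sketch):

    -- Indexes that depend on a libc collation other than C/POSIX;
    -- these are the reindex candidates after the OS switch.
    SELECT DISTINCT s.indrelid::regclass::text   AS table_name,
                    s.indexrelid::regclass::text AS index_name
    FROM (SELECT indexrelid, indrelid, indcollation[i] AS coll
          FROM pg_index, generate_subscripts(indcollation, 1) g(i)) s
    JOIN pg_collation c ON s.coll = c.oid
    WHERE c.collname NOT IN ('C', 'POSIX');

    -- Then, per index ("my_index" is a placeholder); REINDEX ...
    -- CONCURRENTLY needs v12+, which the pg_upgrade step provides.
    REINDEX INDEX CONCURRENTLY my_index;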

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#5 Nick Cleaton
nick@cleaton.net
In reply to: Laurenz Albe (#4)
Re: Handling glibc v2.28 breaking changes

We did something like this, with the addition of a step where we used a
new-OS replica to run amcheck's bt_index_check() over all of the btree
indexes, to find the ones actually corrupted by the libc upgrade with our
data. It was a small fraction of them, and we were able to fit an offline
reindex of those btrees and of all texty non-btree indexes into an
acceptable downtime window, with REINDEX CONCURRENTLY of everything else
as a lower priority after the upgrade.
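
Roughly what that check looked like (a simplified, untested sketch;
bt_index_check() raises an error at the first problem it finds in an
index, so each index gets its own exception block):

    -- amcheck ships in contrib from v10; this assumes the cluster is
    -- already on v14 after the pg_upgrade step.
    CREATE EXTENSION IF NOT EXISTS amcheck;

    DO $$
    DECLARE
        idx regclass;
    BEGIN
        FOR idx IN
            SELECT c.oid
            FROM pg_class c
            JOIN pg_index i ON i.indexrelid = c.oid
            JOIN pg_am am   ON am.oid = c.relam
            WHERE am.amname = 'btree'
              AND c.relpersistence <> 't'      -- skip temp indexes
              AND i.indisvalid
        LOOP
            BEGIN
                PERFORM bt_index_check(idx);   -- read-only btree check
            EXCEPTION WHEN OTHERS THEN
                RAISE NOTICE 'possibly corrupt: % (%)', idx, SQLERRM;
            END;
        END LOOP;
    END;
    $$;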

#6 Pradeep Chhetri
pradeepchhetri4444@gmail.com
In reply to: Nick Cleaton (#5)
Re: Handling glibc v2.28 breaking changes

Thank you Laurenz and Nick. That sounds like a good plan to me.

Best Regards,
Pradeep
