pgsql: Remove the restriction that the relmap must be 512 bytes.

Started by Robert Haasalmost 4 years ago16 messageshackers

robertmhaas@gmail.com

almost 4 years ago

Remove the restriction that the relmap must be 512 bytes.

Instead of relying on the ability to atomically overwrite the
entire relmap file in one shot, write a new one and durably
rename it into place. Removing the struct padding and the
calculation showing why the map is exactly 512 bytes, and change
the maximum number of entries to a nearby round number.

Patch by me, reviewed by Andres Freund and Dilip Kumar.

Discussion: /messages/by-id/CA+TgmoZq5=LWDK7kHaUbmWXxcaTuw_QwafgG9dr-BaPym_U8WQ@mail.gmail.com
Discussion: /messages/by-id/CAFiTN-ttOXLX75k_WzRo9ar=VvxFhrHi+rJxns997F+yvkm==A@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/d8cd0c6c95c0120168df93aae095df4e0682a08a

Modified Files
--------------
doc/src/sgml/monitoring.sgml | 4 +-
src/backend/utils/activity/wait_event.c | 4 +-
src/backend/utils/cache/relmapper.c | 94 +++++++++++++++++++--------------
src/include/utils/wait_event.h | 2 +-
4 files changed, 58 insertions(+), 46 deletions(-)

Michael Paquier

michael@paquier.xyz

almost 4 years ago

In reply to: Robert Haas (#1)

Re: pgsql: Remove the restriction that the relmap must be 512 bytes.

Hi Robert,

On Tue, Jul 26, 2022 at 07:10:22PM +0000, Robert Haas wrote:

Remove the restriction that the relmap must be 512 bytes.

Instead of relying on the ability to atomically overwrite the
entire relmap file in one shot, write a new one and durably
rename it into place. Removing the struct padding and the
calculation showing why the map is exactly 512 bytes, and change
the maximum number of entries to a nearby round number.

Patch by me, reviewed by Andres Freund and Dilip Kumar.

Discussion: /messages/by-id/CA+TgmoZq5=LWDK7kHaUbmWXxcaTuw_QwafgG9dr-BaPym_U8WQ@mail.gmail.com
Discussion: /messages/by-id/CAFiTN-ttOXLX75k_WzRo9ar=VvxFhrHi+rJxns997F+yvkm==A@mail.gmail.com

The CI on Windows is blowing up here and there after something that
looks to come from this commit, as of this backtrace:
00000000`007fe300 00000001`405c62dd postgres!errfinish(
char * filename = 0x00000001`40bf1513 "fd.c",
int lineno = 0n756,
char * funcname = 0x00000001`40bf14e0 "durable_rename")+0x41b
[c:\cirrus\src\backend\utils\error\elog.c @ 683]
00000000`007fe360 00000001`4081647b postgres!durable_rename(
char * oldfile = 0x00000000`007fe430 "base/16384/pg_filenode.map.tmp",
char * newfile = 0x00000000`007fe830 "base/16384/pg_filenode.map",
int elevel = 0n21)+0x22d [c:\cirrus\src\backend\storage\file\fd.c @
753]
00000000`007fe3b0 00000001`408166c9 postgres!write_relmap_file(
struct RelMapFile * newmap = 0x00000000`007fecb0,
bool write_wal = true,
bool send_sinval = true,
bool preserve_files = true,
unsigned int dbid = 0x4000,
unsigned int tsid = 0x67f,
char * dbpath = 0x00000000`0090b1c0 "base/16384")+0x38b
[c:\cirrus\src\backend\utils\cache\relmapper.c @ 971]

Here is one of them, kicked by the CF bot, but I have seen similar
crashes with some of my own things (see the txt file in crashlog, in a
manual VACUUM):
https://cirrus-ci.com/task/5240408958566400

Thanks,
--
Michael

Thomas Munro

thomas.munro@gmail.com

almost 4 years ago

In reply to: Michael Paquier (#2)

Re: pgsql: Remove the restriction that the relmap must be 512 bytes.

On Wed, Jul 27, 2022 at 4:35 PM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Jul 26, 2022 at 07:10:22PM +0000, Robert Haas wrote:

Remove the restriction that the relmap must be 512 bytes.

The CI on Windows is blowing up here and there after something that
looks to come from this commit, as of this backtrace:
00000000`007fe300 00000001`405c62dd postgres!errfinish(
char * filename = 0x00000001`40bf1513 "fd.c",
int lineno = 0n756,
char * funcname = 0x00000001`40bf14e0 "durable_rename")+0x41b
[c:\cirrus\src\backend\utils\error\elog.c @ 683]

And here's what the error looks like:

2022-07-26 19:38:04.321 GMT [8020][client backend]
[pg_regress/vacuum][8/349:4527] PANIC: could not rename file
"global/pg_filenode.map.tmp" to "global/pg_filenode.map": Permission
denied

Someone else still has the old file open, so we can't rename the new
one to its name? On Windows that should have gone through pgrename()
in dirmod.c, which would retry 100 times with a 100ms sleep between.
Since every backend reads that file (I added an elog() and saw it 2289
times during make check), I guess you can run out of luck.

/me thinks

Thomas Munro

thomas.munro@gmail.com

almost 4 years ago

In reply to: Thomas Munro (#3)

Re: pgsql: Remove the restriction that the relmap must be 512 bytes.

On Wed, Jul 27, 2022 at 5:01 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Wed, Jul 27, 2022 at 4:35 PM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Jul 26, 2022 at 07:10:22PM +0000, Robert Haas wrote:

Remove the restriction that the relmap must be 512 bytes.

The CI on Windows is blowing up here and there after something that
looks to come from this commit, as of this backtrace:
00000000`007fe300 00000001`405c62dd postgres!errfinish(
char * filename = 0x00000001`40bf1513 "fd.c",
int lineno = 0n756,
char * funcname = 0x00000001`40bf14e0 "durable_rename")+0x41b
[c:\cirrus\src\backend\utils\error\elog.c @ 683]

And here's what the error looks like:

2022-07-26 19:38:04.321 GMT [8020][client backend]
[pg_regress/vacuum][8/349:4527] PANIC: could not rename file
"global/pg_filenode.map.tmp" to "global/pg_filenode.map": Permission
denied

Someone else still has the old file open, so we can't rename the new
one to its name? On Windows that should have gone through pgrename()
in dirmod.c, which would retry 100 times with a 100ms sleep between.
Since every backend reads that file (I added an elog() and saw it 2289
times during make check), I guess you can run out of luck.

/me thinks

Maybe we just have to rearrange the locking slightly? Something like
the attached.

Thomas Munro

thomas.munro@gmail.com

almost 4 years ago

In reply to: Thomas Munro (#4)

Re: pgsql: Remove the restriction that the relmap must be 512 bytes.

On Wed, Jul 27, 2022 at 6:06 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Wed, Jul 27, 2022 at 5:01 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Someone else still has the old file open, so we can't rename the new
one to its name? On Windows that should have gone through pgrename()
in dirmod.c, which would retry 100 times with a 100ms sleep between.
Since every backend reads that file (I added an elog() and saw it 2289
times during make check), I guess you can run out of luck.

/me thinks

Maybe we just have to rearrange the locking slightly? Something like
the attached.

Erm, let me try that again, this time with the CloseTransientFile()
also under the lock, so that we never have a file handle without a
lock.

Michael Paquier

michael@paquier.xyz

almost 4 years ago

In reply to: Thomas Munro (#5)

Re: pgsql: Remove the restriction that the relmap must be 512 bytes.

On Wed, Jul 27, 2022 at 06:21:07PM +1200, Thomas Munro wrote:

Erm, let me try that again, this time with the CloseTransientFile()
also under the lock, so that we never have a file handle without a
lock.

Right. The whole write_relmap_file() already happens while taking
RelationMappingLock, so that seems like a good idea for consistency at
the end (even if I remember that there is a patch floating around to
improve the concurrency of pgrename, which may become an easier move
now that we require Windows 10).

I have tested three runs and that was working here even if the
issue is sporadic, so more runs may be better to have more
confidence.
--
Michael

Robert Haas

robertmhaas@gmail.com

almost 4 years ago

In reply to: Michael Paquier (#6)

Re: pgsql: Remove the restriction that the relmap must be 512 bytes.

On Wed, Jul 27, 2022 at 6:25 AM Michael Paquier <michael@paquier.xyz> wrote:

Right. The whole write_relmap_file() already happens while taking
RelationMappingLock, so that seems like a good idea for consistency at
the end (even if I remember that there is a patch floating around to
improve the concurrency of pgrename, which may become an easier move
now that we require Windows 10).

I have tested three runs and that was working here even if the
issue is sporadic, so more runs may be better to have more
confidence.

OK, I committed Thomas's patch, after taking the liberty of adding an
explanatory comment.

--
Robert Haas
EDB: http://www.enterprisedb.com

Alvaro Herrera

alvherre@2ndquadrant.com

almost 4 years ago

In reply to: Robert Haas (#1)

Re: pgsql: Remove the restriction that the relmap must be 512 bytes.

On 2022-Jul-26, Robert Haas wrote:

Remove the restriction that the relmap must be 512 bytes.

Instead of relying on the ability to atomically overwrite the
entire relmap file in one shot, write a new one and durably
rename it into place. Removing the struct padding and the
calculation showing why the map is exactly 512 bytes, and change
the maximum number of entries to a nearby round number.

Another thing that seems to have happened here is that catversion ought
to have been touched and wasn't. Trying to start a cluster that was
initdb'd with the previous code enters an infinite loop that dies each
time with

2022-07-27 19:17:27.589 CEST [2516547] LOG: database system is ready to accept connections
2022-07-27 19:17:27.589 CEST [2516730] FATAL: could not read file "global/pg_filenode.map": read 512 of 524
2022-07-27 19:17:27.589 CEST [2516731] FATAL: could not read file "global/pg_filenode.map": read 512 of 524
2022-07-27 19:17:27.589 CEST [2516547] LOG: autovacuum launcher process (PID 2516730) exited with exit code 1
2022-07-27 19:17:27.589 CEST [2516547] LOG: terminating any other active server processes

Perhaps we should still do a catversion bump now, since one hasn't
happened since the commit.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"El número de instalaciones de UNIX se ha elevado a 10,
y se espera que este número aumente" (UPM, 1972)

Robert Haas

robertmhaas@gmail.com

almost 4 years ago

In reply to: Alvaro Herrera (#8)

Re: pgsql: Remove the restriction that the relmap must be 512 bytes.

On Wed, Jul 27, 2022 at 1:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Another thing that seems to have happened here is that catversion ought
to have been touched and wasn't. Trying to start a cluster that was
initdb'd with the previous code enters an infinite loop that dies each
time with

2022-07-27 19:17:27.589 CEST [2516547] LOG: database system is ready to accept connections
2022-07-27 19:17:27.589 CEST [2516730] FATAL: could not read file "global/pg_filenode.map": read 512 of 524
2022-07-27 19:17:27.589 CEST [2516731] FATAL: could not read file "global/pg_filenode.map": read 512 of 524
2022-07-27 19:17:27.589 CEST [2516547] LOG: autovacuum launcher process (PID 2516730) exited with exit code 1
2022-07-27 19:17:27.589 CEST [2516547] LOG: terminating any other active server processes

Perhaps we should still do a catversion bump now, since one hasn't
happened since the commit.

Hmm, interesting. I didn't think about bumping catversion because I
didn't change anything in the catalogs. I did think about changing the
magic number for the file at one point, but unlike some of our other
constants, there's no indication that this one is intended to be used
as a version number. But in retrospect it would have been good to
change something somewhere. If you want me to bump catversion now, I
can. If you or someone else wants to do it, that's also fine.

--
Robert Haas
EDB: http://www.enterprisedb.com

#10

Tom Lane

tgl@sss.pgh.pa.us

almost 4 years ago

In reply to: Robert Haas (#9)

Re: pgsql: Remove the restriction that the relmap must be 512 bytes.

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Jul 27, 2022 at 1:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Another thing that seems to have happened here is that catversion ought
to have been touched and wasn't.

Hmm, interesting. I didn't think about bumping catversion because I
didn't change anything in the catalogs. I did think about changing the
magic number for the file at one point, but unlike some of our other
constants, there's no indication that this one is intended to be used
as a version number. But in retrospect it would have been good to
change something somewhere. If you want me to bump catversion now, I
can. If you or someone else wants to do it, that's also fine.

If there's a magic number, then I'd (a) change that and (b) adjust
whatever comments led you to think you shouldn't. Bumping catversion
is a good fallback choice when there's not any more-proximate version
indicator, but here there is.

regards, tom lane

#11

Robert Haas

robertmhaas@gmail.com

almost 4 years ago

In reply to: Tom Lane (#10)

Re: pgsql: Remove the restriction that the relmap must be 512 bytes.

On Wed, Jul 27, 2022 at 1:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

If there's a magic number, then I'd (a) change that and (b) adjust
whatever comments led you to think you shouldn't. Bumping catversion
is a good fallback choice when there's not any more-proximate version
indicator, but here there is.

Maybe I just got cold feet because it doesn't ever have seem to have
been changed before, because the definition says:

#define RELMAPPER_FILEMAGIC 0x592717 /* version ID value */

And the fact that "version" is in there sure makes it seem like a
version number, now that I look again.

I'll add 1 to the value.

--
Robert Haas
EDB: http://www.enterprisedb.com

#12

Robert Haas

robertmhaas@gmail.com

almost 4 years ago

In reply to: Robert Haas (#11)

Re: pgsql: Remove the restriction that the relmap must be 512 bytes.

On Wed, Jul 27, 2022 at 2:13 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jul 27, 2022 at 1:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

If there's a magic number, then I'd (a) change that and (b) adjust
whatever comments led you to think you shouldn't. Bumping catversion
is a good fallback choice when there's not any more-proximate version
indicator, but here there is.

Maybe I just got cold feet because it doesn't ever have seem to have
been changed before, because the definition says:

#define RELMAPPER_FILEMAGIC 0x592717 /* version ID value */

And the fact that "version" is in there sure makes it seem like a
version number, now that I look again.

I'll add 1 to the value.

Hmm, but that doesn't actually produce nice behavior. It just goes
into an infinite loop, like this:

2022-07-27 14:21:12.826 EDT [32849] LOG: database system was
interrupted; last known up at 2022-07-27 14:21:12 EDT
2022-07-27 14:21:12.860 EDT [32849] LOG: database system was not
properly shut down; automatic recovery in progress
2022-07-27 14:21:12.861 EDT [32849] LOG: invalid record length at
0/14B3BB8: wanted 24, got 0
2022-07-27 14:21:12.861 EDT [32849] LOG: redo is not required
2022-07-27 14:21:12.864 EDT [32850] LOG: checkpoint starting:
end-of-recovery immediate wait
2022-07-27 14:21:12.865 EDT [32850] LOG: checkpoint complete: wrote 3
buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.001 s, sync=0.001 s, total=0.002 s; sync files=2,
longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB;
lsn=0/14B3BB8, redo lsn=0/14B3BB8
2022-07-27 14:21:12.868 EDT [31930] LOG: database system is ready to
accept connections
2022-07-27 14:21:12.869 EDT [32853] FATAL: relation mapping file
"global/pg_filenode.map" contains invalid data
2022-07-27 14:21:12.869 EDT [32854] FATAL: relation mapping file
"global/pg_filenode.map" contains invalid data
2022-07-27 14:21:12.870 EDT [31930] LOG: autovacuum launcher process
(PID 32853) exited with exit code 1
2022-07-27 14:21:12.870 EDT [31930] LOG: terminating any other active
server processes
2022-07-27 14:21:12.870 EDT [31930] LOG: background worker "logical
replication launcher" (PID 32854) exited with exit code 1
2022-07-27 14:21:12.871 EDT [31930] LOG: all server processes
terminated; reinitializing

While I agree that changing a version identifier that is more closely
related to what got changed is better than changing a generic one, I
think people won't like an infinite loop that spews messages into the
log at top speed as a way of telling them about the problem.

So now I'm back to being unsure what to do here.

--
Robert Haas
EDB: http://www.enterprisedb.com

#13

Alvaro Herrera

alvherre@2ndquadrant.com

almost 4 years ago

In reply to: Robert Haas (#12)

Re: pgsql: Remove the restriction that the relmap must be 512 bytes.

On 2022-Jul-27, Robert Haas wrote:

Hmm, but that doesn't actually produce nice behavior. It just goes
into an infinite loop, like this:

2022-07-27 14:21:12.869 EDT [32853] FATAL: relation mapping file
"global/pg_filenode.map" contains invalid data

This seems almost identical what happens without the version number
change, so I wouldn't call it much of an improvement.

While I agree that changing a version identifier that is more closely
related to what got changed is better than changing a generic one, I
think people won't like an infinite loop that spews messages into the
log at top speed as a way of telling them about the problem.

So now I'm back to being unsure what to do here.

I vote to go for the catversion bump for now.

Maybe it's possible to patch the relmapper code afterwards, so that if a
version mismatch is detected the server is stopped with a reasonable
message. An hypothetical improvement would be to keep the code to read
the old version and upgrade the file automatically, but given the number
of times that this file has changed, it's likely pointless effort.

Therefore, my proposal is to add a comment next to the struct definition
suggesting to bump catversion and call it a day.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#14

Tom Lane

tgl@sss.pgh.pa.us

almost 4 years ago

In reply to: Alvaro Herrera (#13)

Re: pgsql: Remove the restriction that the relmap must be 512 bytes.

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Hmm, but that doesn't actually produce nice behavior. It just goes
into an infinite loop, like this:
So now I'm back to being unsure what to do here.

I vote to go for the catversion bump for now.

What this is showing us is that any form of corruption in the relmapper
file causes very unpleasant behavior. We probably had better do something
about that, independently of this issue.

In the meantime, I still think bumping the file magic number is a better
answer. It won't make the behavior any worse for un-updated code than
it is already.

regards, tom lane

#15

Robert Haas

robertmhaas@gmail.com

almost 4 years ago

In reply to: Tom Lane (#14)

Re: pgsql: Remove the restriction that the relmap must be 512 bytes.

On Wed, Jul 27, 2022 at 3:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Hmm, but that doesn't actually produce nice behavior. It just goes
into an infinite loop, like this:
So now I'm back to being unsure what to do here.

I vote to go for the catversion bump for now.

What this is showing us is that any form of corruption in the relmapper
file causes very unpleasant behavior. We probably had better do something
about that, independently of this issue.

I'm not sure how important that is, but it certainly wouldn't hurt.

In the meantime, I still think bumping the file magic number is a better
answer. It won't make the behavior any worse for un-updated code than
it is already.

But it also won't make it any better, so why even bother? The goal of
catversion bumps is to replace crashes or unpredictable misbehavior
with a nice error message that tells you exactly what the problem is.
Here we'd just be replacing an infinite series of crashes with an
infinite series of crashes with a slightly different error message.
It's probably worth comparing those error messages:

FATAL: could not read file "global/pg_filenode.map": read 512 of 524
FATAL: relation mapping file "global/pg_filenode.map" contains invalid data

The first message is what you get now. The second message is what you
get with the proposed change to the magic number. I would argue that
the second message is actually worse than the first one, because the
first one actually gives you some hint what the problem is, whereas
the second one really just says that an unspecified bad thing
happened.

In short, I think Alvaro's idea is unprincipled but solves the problem
of making it clear that a new initdb is required. Your idea is
principled but does not solve any problem.

--
Robert Haas
EDB: http://www.enterprisedb.com

#16

Tom Lane

tgl@sss.pgh.pa.us

almost 4 years ago

In reply to: Robert Haas (#15)

Re: pgsql: Remove the restriction that the relmap must be 512 bytes.

Robert Haas <robertmhaas@gmail.com> writes:

In short, I think Alvaro's idea is unprincipled but solves the problem
of making it clear that a new initdb is required. Your idea is
principled but does not solve any problem.

[ shrug... ] Fair enough.

regards, tom lane

pgsql: Remove the restriction that the relmap must be 512 bytes.

Attachments:

Attachments: