pg_walfile_name uses XLByteToPrevSeg

Started by Ashutosh Bapat almost 4 years ago, 5 messages
#1Ashutosh Bapat
ashutosh.bapat.oss@gmail.com

Hi All,
pg_walfile_name() returns the WAL file name corresponding to the given
WAL location. Per
https://www.postgresql.org/docs/14/functions-admin.html
---
pg_walfile_name ( lsn pg_lsn ) → text

Converts a write-ahead log location to the name of the WAL file
holding that location.
---

The function uses XLByteToPrevSeg(), which gives the name of the previous
WAL file if the location falls on a WAL segment boundary. I find
this misleading, since the given LSN falls into the segment provided
by XLByteToSeg() and not XLByteToPrevSeg().
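
For reference, the difference between the two macros comes down to
subtracting one from the location before dividing by the segment size.
Below is a minimal sketch in C, assuming the default 16MB segment size.
The names seg_of() and prev_seg_of() are hypothetical stand-ins for
XLByteToSeg() and XLByteToPrevSeg(), not the PostgreSQL definitions
themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default WAL segment size (16MB); the real value is set at initdb. */
#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

/* Hypothetical stand-in for XLByteToSeg(): segment containing byte `lsn`. */
static inline uint64_t
seg_of(uint64_t lsn, uint64_t seg_size)
{
    assert(seg_size > 0);
    return lsn / seg_size;
}

/* Hypothetical stand-in for XLByteToPrevSeg(): segment containing byte `lsn - 1`. */
static inline uint64_t
prev_seg_of(uint64_t lsn, uint64_t seg_size)
{
    assert(seg_size > 0);
    /* Unsigned arithmetic: lsn - 1 wraps around when lsn is 0. */
    return (lsn - 1) / seg_size;
}
```

For an LSN strictly inside a segment both functions agree. Exactly on a
boundary, such as 0/1000000 with 16MB segments, seg_of() gives segment 1
while prev_seg_of() gives segment 0.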

And it gives some surprising results as well
---
#select pg_walfile_name('0/0'::pg_lsn);
pg_walfile_name
--------------------------
00000001FFFFFFFF000000FF
(1 row)
----
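
The odd file name for '0/0' can be reproduced outside the server. The
following is a rough sketch, assuming 16MB segments and timeline 1;
walfile_name_sketch() is a hypothetical helper that mimics the
previous-segment arithmetic and the three 8-digit hex fields of a WAL
file name, and is not the server's own code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed default WAL segment size (16MB). */
#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

/*
 * Hypothetical helper: build a WAL file name for `lsn` using the
 * "previous segment" rule, i.e. the segment that contains byte lsn - 1.
 */
static inline void
walfile_name_sketch(char *buf, size_t buflen, uint32_t tli,
                    uint64_t lsn, uint64_t seg_size)
{
    uint64_t segs_per_id;
    uint64_t segno;

    assert(seg_size > 0);

    /* Number of segments per 4GB of WAL address space (256 for 16MB). */
    segs_per_id = UINT64_C(0x100000000) / seg_size;

    /* Unsigned subtraction: for lsn == 0 this wraps to 2^64 - 1. */
    segno = (lsn - 1) / seg_size;

    snprintf(buf, buflen, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
             tli,
             (uint32_t) (segno / segs_per_id),
             (uint32_t) (segno % segs_per_id));
}
```

Calling walfile_name_sketch(buf, sizeof buf, 1, 0, WAL_SEG_SIZE) yields
00000001FFFFFFFF000000FF: the subtraction wraps around, so the computed
segment number is the largest possible one rather than segment 0.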

Comment in the code says
---
/*
* Compute an xlog file name given a WAL location,
* such as is returned by pg_stop_backup() or pg_switch_wal().
*/
Datum
pg_walfile_name(PG_FUNCTION_ARGS)
---
XLByteToPrevSeg() may be in line with the comment, but I don't think
that's what the documentation conveys.

--
Best Wishes,
Ashutosh

#2Robert Haas
robertmhaas@gmail.com
In reply to: Ashutosh Bapat (#1)
Re: pg_walfile_name uses XLByteToPrevSeg

On Fri, Feb 4, 2022 at 9:05 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

And it gives some surprising results as well
---
#select pg_walfile_name('0/0'::pg_lsn);
pg_walfile_name
--------------------------
00000001FFFFFFFF000000FF
(1 row)
----

Yeah, that seems wrong.

--
Robert Haas
EDB: http://www.enterprisedb.com

#3Nathan Bossart
nathandbossart@gmail.com
In reply to: Robert Haas (#2)
Re: pg_walfile_name uses XLByteToPrevSeg

On Fri, Feb 04, 2022 at 09:17:54AM -0500, Robert Haas wrote:

On Fri, Feb 4, 2022 at 9:05 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

And it gives some surprising results as well
---
#select pg_walfile_name('0/0'::pg_lsn);
pg_walfile_name
--------------------------
00000001FFFFFFFF000000FF
(1 row)
----

Yeah, that seems wrong.

It looks like it's been this way for a while (704ddaa).
pg_walfile_name_offset() has the following comment:

* Note that a location exactly at a segment boundary is taken to be in
* the previous segment. This is usually the right thing, since the
* expected usage is to determine which xlog file(s) are ready to archive.

I see a couple of discussions about this as well [0] [1].

[0]: /messages/by-id/1154384790.3226.21.camel@localhost.localdomain
[1]: /messages/by-id/15952.1154827205@sss.pgh.pa.us

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#4Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Nathan Bossart (#3)
Re: pg_walfile_name uses XLByteToPrevSeg

At Fri, 4 Feb 2022 14:50:57 -0800, Nathan Bossart <nathandbossart@gmail.com> wrote in

On Fri, Feb 04, 2022 at 09:17:54AM -0500, Robert Haas wrote:

On Fri, Feb 4, 2022 at 9:05 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

And it gives some surprising results as well
---
#select pg_walfile_name('0/0'::pg_lsn);
pg_walfile_name
--------------------------
00000001FFFFFFFF000000FF
(1 row)
----

Yeah, that seems wrong.

It looks like it's been this way for a while (704ddaa).
pg_walfile_name_offset() has the following comment:

* Note that a location exactly at a segment boundary is taken to be in
* the previous segment. This is usually the right thing, since the
* expected usage is to determine which xlog file(s) are ready to archive.

I see a couple of discussions about this as well [0] [1].

[0] /messages/by-id/1154384790.3226.21.camel@localhost.localdomain
[1] /messages/by-id/15952.1154827205@sss.pgh.pa.us

Yes, it's a deliberate design choice, or a kind of
questionable-but-unoverturnable decision. I think there are many
external tools aware of this behavior.

It is also described in the documentation.

https://www.postgresql.org/docs/current/functions-admin.html

When the given write-ahead log location is exactly at a write-ahead
log file boundary, both these functions return the name of the
preceding write-ahead log file. This is usually the desired behavior
for managing write-ahead log archiving behavior, since the preceding
file is the last one that currently needs to be archived.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#5Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#4)
Re: pg_walfile_name uses XLByteToPrevSeg

At Mon, 07 Feb 2022 13:21:53 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Fri, 4 Feb 2022 14:50:57 -0800, Nathan Bossart <nathandbossart@gmail.com> wrote in

On Fri, Feb 04, 2022 at 09:17:54AM -0500, Robert Haas wrote:

On Fri, Feb 4, 2022 at 9:05 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

And it gives some surprising results as well
---
#select pg_walfile_name('0/0'::pg_lsn);
pg_walfile_name
--------------------------
00000001FFFFFFFF000000FF
(1 row)
----

Yeah, that seems wrong.

It looks like it's been this way for a while (704ddaa).
pg_walfile_name_offset() has the following comment:

* Note that a location exactly at a segment boundary is taken to be in
* the previous segment. This is usually the right thing, since the
* expected usage is to determine which xlog file(s) are ready to archive.

I see a couple of discussions about this as well [0] [1].

[0] /messages/by-id/1154384790.3226.21.camel@localhost.localdomain
[1] /messages/by-id/15952.1154827205@sss.pgh.pa.us

Yes, it's a deliberate design choice, or a kind of
questionable-but-unoverturnable decision. I think there are many
external tools aware of this behavior.

It is also described in the documentation.

https://www.postgresql.org/docs/current/functions-admin.html

When the given write-ahead log location is exactly at a write-ahead
log file boundary, both these functions return the name of the
preceding write-ahead log file. This is usually the desired behavior
for managing write-ahead log archiving behavior, since the preceding
file is the last one that currently needs to be archived.

I forgot to mention, but I don't think we need to handle the
wrap-around case in this function.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center