BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

Started by PG Bug reporting formabout 1 year ago26 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 18828
Logged by: Robins Tharakan
Email address: tharakan@gmail.com
PostgreSQL version: Unsupported/Unknown
Operating system: Ubuntu
Description:

Passing an empty string to the recently added extension function
pg_get_logical_snapshot_meta() triggers PANIC / Abort.

SQL
===
CREATE EXTENSION pg_logicalinspect;
SELECT pg_get_logical_snapshot_meta('');

Commit
======
.
.
Checking (56ba0463d38~950) - 3c7d78427e - fail
Checking (56ba0463d38~1110) - dc68515968 - Pass
Checking (56ba0463d38~1030) - 67a54b9e83 - Pass
Checking (56ba0463d38~990) - e2fd615ecc - Pass
Checking (56ba0463d38~970) - 665785d85f - fail
Checking (56ba0463d38~980) - b360d1762b - fail
Checking (56ba0463d38~985) - 04bec894a0 - fail
Checking (56ba0463d38~987) - d5ca15ee54 - fail
Checking (56ba0463d38~988) - 2453196107 - fail
Checking (56ba0463d38~989) - 7cdfeee320 - fail
Current=7cdfeee320e72162b62dddddee638e713c2b8680 - Commit Found.

Error Log
=========
2025-03-02 04:28:02.355 ACDT [3826851] PANIC: could not open file
"pg_logical/snapshots/": Is a directory
2025-03-02 04:28:02.355 ACDT [3826851] STATEMENT: select
pg_get_logical_snapshot_meta('');
2025-03-02 04:28:02.896 ACDT [3745459] LOG: client backend (PID 3826851)
was terminated by signal 6: Aborted
2025-03-02 04:28:02.896 ACDT [3745459] DETAIL: Failed process was running:
select pg_get_logical_snapshot_meta('');
2025-03-02 04:28:02.896 ACDT [3745459] LOG: terminating any other active
server processes

Output
======
smith@camry:~/proj/sqith$ psql -c "select
pg_get_logical_snapshot_meta('');"
PANIC: could not open file "pg_logical/snapshots/": Is a directory
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

PG Bug reporting form <noreply@postgresql.org> writes:

Passing an empty string to the recently added extension function
pg_get_logical_snapshot_meta() triggers PANIC / Abort.

Thanks for noticing. It looks like this function is far too trusting
of its input. Another way to crash it is to point to a directory:

regression=# SELECT pg_get_logical_snapshot_meta('../snapshots');
PANIC: could not open file "pg_logical/snapshots/../snapshots": Is a directory
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

This example incidentally shows that it's not sanitizing the request
to prevent reference to files outside the snapshots directory, which
possibly would be a security bug down the road. I realize that we
restrict access to the function, but still we shouldn't be *this*
trusting of the argument.

Possibly some of the defense for this ought to be in
SnapBuildRestoreSnapshot: it looks like that function isn't
considering the possibility that OpenTransientFile will succeed on a
directory. Does it have any other callers passing
less-than-fully-trustworthy paths?

Another interesting question is why the error is being promoted
to PANIC. That sure seems unnecessary, and there's a comment
in SnapBuildRestoreSnapshot that claims it won't happen.

regards, tom lane

#3Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Tom Lane (#2)
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

On Sat, Mar 1, 2025 at 12:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

PG Bug reporting form <noreply@postgresql.org> writes:

Passing an empty string to the recently added extension function
pg_get_logical_snapshot_meta() triggers PANIC / Abort.

Thanks for noticing. It looks like this function is far too trusting
of its input. Another way to crash it is to point to a directory:

regression=# SELECT pg_get_logical_snapshot_meta('../snapshots');
PANIC: could not open file "pg_logical/snapshots/../snapshots": Is a directory
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

This example incidentally shows that it's not sanitizing the request
to prevent reference to files outside the snapshots directory, which
possibly would be a security bug down the road. I realize that we
restrict access to the function, but still we shouldn't be *this*
trusting of the argument.

Agreed.

Possibly some of the defense for this ought to be in
SnapBuildRestoreSnapshot: it looks like that function isn't
considering the possibility that OpenTransientFile will succeed on a
directory. Does it have any other callers passing
less-than-fully-trustworthy paths?

SnapBuildRestoreSnapshot() is called by pg_logicalinspect functions
(pg_get_logical_snapshot_meta() and pg_get_logical_snapshot_info())
and SnapBuildRestore(). For the latter, SnapBuildRestore() surely
passes the file name, so it would not be a problem.

One way to fix this issue is that SnapBuildRestoreSnapshot() takes an
LSN corresponding to the serialized snapshot instead of a path, and it
constructs the .snap file path correctly. This fix would also prevent
it from being a security bug.

pg_logicalinspect functions, pg_get_logical_snapshot_info() and
pg_get_logical_snapshot_meta(), would also need to be changed so that
they extract the LSN from the given file name. If the given file name
is incorrect, it should raise an error.

Another interesting question is why the error is being promoted
to PANIC. That sure seems unnecessary, and there's a comment
in SnapBuildRestoreSnapshot that claims it won't happen.

This PANIC occurred because fsync_fname() with isdir=false was called
for the directory, pg_logical/snapshots/' in the above case
(data_sync_retry is false by default). I think you referred to the
following comment in SnapBuildRestoreSnapshot():

/* ----
* Make sure the snapshot had been stored safely to disk, that's normally
* cheap.
* Note that we do not need PANIC here, nobody will be able to use the
* slot without fsyncing, and saving it won't succeed without an fsync()
* either...
* ----
*/
fsync_fname(path, false);
fsync_fname(PG_LOGICAL_SNAPSHOTS_DIR, true);

These comments have been present since the initial logical decoding
commit, whereas data_sync_retry was introduced at a later stage. While
the comment's reference to PANIC suggests that a critical section
isn't necessary here, the actual PANIC occurred in an unanticipated
manner. SnapBuildRestoreSnapshot() works correctly as long as it takes
a correct file path but 7cdfeee320e72 violated this assumption.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#4Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Tom Lane (#2)
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

Hi,

On Sat, Mar 01, 2025 at 03:47:12PM -0500, Tom Lane wrote:

PG Bug reporting form <noreply@postgresql.org> writes:

Passing an empty string to the recently added extension function
pg_get_logical_snapshot_meta() triggers PANIC / Abort.

Thanks for the report!

Thanks for noticing. It looks like this function is far too trusting
of its input. Another way to crash it is to point to a directory:

regression=# SELECT pg_get_logical_snapshot_meta('../snapshots');
PANIC: could not open file "pg_logical/snapshots/../snapshots": Is a directory
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

This example incidentally shows that it's not sanitizing the request
to prevent reference to files outside the snapshots directory, which
possibly would be a security bug down the road. I realize that we
restrict access to the function, but still we shouldn't be *this*
trusting of the argument.

Indeed that's bad.

Possibly some of the defense for this ought to be in
SnapBuildRestoreSnapshot: it looks like that function isn't
considering the possibility that OpenTransientFile will succeed on a
directory. Does it have any other callers passing
less-than-fully-trustworthy paths?

I don't think so. The other caller is in snapbuild.c where the input is
build based on a lsn.

Now I wonder if pg_get_logical_snapshot_meta() (and pg_get_logical_snapshot_info()
which suffers for the same issues reported above) should take a lsn as input
parameter. That's how it has been proposed initially and that would simplify
things. Thoughts?

Another interesting question is why the error is being promoted
to PANIC. That sure seems unnecessary, and there's a comment
in SnapBuildRestoreSnapshot that claims it won't happen.

Yeah. That's the fsync_fname() call in SnapBuildRestoreSnapshot() that promotes
to PANIC due to (data_sync_elevel(ERROR)). This behavior has been introduced in
9ccdd7f66e3 (that made the comment in SnapBuildRestore() (at that time) inaccurate).

I'm not sure we should change the logic here, maybe just update the comment?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#5Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Bertrand Drouvot (#4)
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

On Mon, Mar 3, 2025 at 12:50 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On Sat, Mar 01, 2025 at 03:47:12PM -0500, Tom Lane wrote:

PG Bug reporting form <noreply@postgresql.org> writes:

Passing an empty string to the recently added extension function
pg_get_logical_snapshot_meta() triggers PANIC / Abort.

Thanks for the report!

Thanks for noticing. It looks like this function is far too trusting
of its input. Another way to crash it is to point to a directory:

regression=# SELECT pg_get_logical_snapshot_meta('../snapshots');
PANIC: could not open file "pg_logical/snapshots/../snapshots": Is a directory
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

This example incidentally shows that it's not sanitizing the request
to prevent reference to files outside the snapshots directory, which
possibly would be a security bug down the road. I realize that we
restrict access to the function, but still we shouldn't be *this*
trusting of the argument.

Indeed that's bad.

Possibly some of the defense for this ought to be in
SnapBuildRestoreSnapshot: it looks like that function isn't
considering the possibility that OpenTransientFile will succeed on a
directory. Does it have any other callers passing
less-than-fully-trustworthy paths?

I don't think so. The other caller is in snapbuild.c where the input is
build based on a lsn.

Now I wonder if pg_get_logical_snapshot_meta() (and pg_get_logical_snapshot_info()
which suffers for the same issues reported above) should take a lsn as input
parameter. That's how it has been proposed initially and that would simplify
things. Thoughts?

Yeah, that's one solution. But it would make pg_logicalinspect
functions hard to work with the pg_ls_loigcalsnapdir(). Users would
need to parse the filename by themselve and pass the LSN to them. I've
attached a draft patch for a different approach where we extract LSN
from the given file name in pg_logicalinspect functions. Thoughts?

Another interesting question is why the error is being promoted
to PANIC. That sure seems unnecessary, and there's a comment
in SnapBuildRestoreSnapshot that claims it won't happen.

Yeah. That's the fsync_fname() call in SnapBuildRestoreSnapshot() that promotes
to PANIC due to (data_sync_elevel(ERROR)). This behavior has been introduced in
9ccdd7f66e3 (that made the comment in SnapBuildRestore() (at that time) inaccurate).

I'm not sure we should change the logic here, maybe just update the comment?

I think that we don't need to change the logic. It might be a good
idea to clarify the comment, though.

Regards,

--
Masahiko Sawada

Amazon Web Services: https://aws.amazon.com

Attachments:

fix_pg_logicalinspect.patchapplication/octet-stream; name=fix_pg_logicalinspect.patchDownload+33-18
#6Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Masahiko Sawada (#5)
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

Hi,

On Mon, Mar 03, 2025 at 12:42:01PM -0800, Masahiko Sawada wrote:

On Mon, Mar 3, 2025 at 12:50 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Now I wonder if pg_get_logical_snapshot_meta() (and pg_get_logical_snapshot_info()
which suffers for the same issues reported above) should take a lsn as input
parameter. That's how it has been proposed initially and that would simplify
things. Thoughts?

Yeah, that's one solution. But it would make pg_logicalinspect
functions hard to work with the pg_ls_loigcalsnapdir(). Users would
need to parse the filename by themselve and pass the LSN to them. I've
attached a draft patch for a different approach where we extract LSN
from the given file name in pg_logicalinspect functions. Thoughts?

Yeah I agree that your proposal is more user friendly. Thanks for the patch,
you beat me on it! (I was waiting that we agree on the approach before working
on it).

Looking at your patch, some comments:

=== 1

+       if (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2)
+               ereport(ERROR,

sscanf would not return an error if the file being used is say "0-40796E18.foo":

postgres=# SELECT pg_get_logical_snapshot_info('0-40796E18.foo');
pg_get_logical_snapshot_info
-------------------------------------------------------------------------
(consistent,751,751,0/40796AF8,0/40796AF8,0,f,f,0/0,0,0,,2,"{751,752}")

I think that as long as it finds the &hi and &lo then it does not return an
error.

PFA 0001 a new version to also parse the ".snap".

=== 2

+ errmsg("invalid serialized snapshot file name \"%s\"", filename));

In the extension doc we mention "a snapshot file" so I think it makes sense to
remove "serialized" from the error message: also done in 0001 attached.

=== 3

+ * Extract LSN from the given serialized snapshot

s/Extract LSN/Extract the LSN/? (done in 0001 attached).

=== 4

Also adding some regression tests in 0001 attached to validate the imputs
and the permissions.

Yeah. That's the fsync_fname() call in SnapBuildRestoreSnapshot() that promotes
to PANIC due to (data_sync_elevel(ERROR)). This behavior has been introduced in
9ccdd7f66e3 (that made the comment in SnapBuildRestore() (at that time) inaccurate).

I'm not sure we should change the logic here, maybe just update the comment?

I think that we don't need to change the logic. It might be a good
idea to clarify the comment, though.

Agree. I think that we can just get rid of it: done in 0002 attached.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v2-0001-Fix-bug-in-pg_logicalinspect-functions.patchtext/x-diff; charset=us-asciiDownload+159-19
v2-0002-Remove-obsolete-comment-in-SnapBuildRestoreSnapsh.patchtext/x-diff; charset=us-asciiDownload+0-4
#7Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#6)
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

Hi,

On Tue, Mar 04, 2025 at 07:38:14AM +0000, Bertrand Drouvot wrote:

PFA 0001 a new version to also parse the ".snap".

PFA v3 (a slightly different version) to "correctly" use the new report_error
introduced in v2.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v3-0001-Fix-bug-in-pg_logicalinspect-functions.patchtext/x-diff; charset=us-asciiDownload+159-19
v3-0002-Remove-obsolete-comment-in-SnapBuildRestoreSnapsh.patchtext/x-diff; charset=us-asciiDownload+0-4
#8Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#7)
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

Hi,

On Tue, Mar 04, 2025 at 10:45:54AM +0000, Bertrand Drouvot wrote:

Hi,

On Tue, Mar 04, 2025 at 07:38:14AM +0000, Bertrand Drouvot wrote:

PFA 0001 a new version to also parse the ".snap".

PFA v3 (a slightly different version) to "correctly" use the new report_error
introduced in v2.

It looks like that CheckPointSnapBuild() also rely on sscanf() to check
for the snapshot file extension. As seen, up-thread we can't rely on sscanf()
to do so.

Attached a c file to show the "issue":

$ gcc -o sscanf_file_ext sscanf_file_ext.c
$ ./sscanf_file_ext 0-40796E18.snap
File: "0-40796E18.snap"
Parse result: 2 (2 means success)

$ ./sscanf_file_ext 0-40796E18.foo
File: "0-40796E18.foo"
Parse result: 2 (2 means success)

So it looks like that, in reality, in CheckPointSnapBuild() we report a parsing
message and "continue" even if the file name extension is not ".snap".

That's not a big deal because, in pratice, it still removes the .$pid.tmp files
(given they have the "hex" part well formated). But, it does not look like it was
the original intent, so maybe we should also re-think CheckPointSnapBuild() and
open a dedicated thread? (If so, I'm happy to do so).

Note that there might be other places that may need the same kind of attention:
pg_archivecleanup.c?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

sscanf_file_ext.ctext/x-csrc; charset=us-asciiDownload
#9Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Bertrand Drouvot (#7)
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

On Tue, Mar 4, 2025 at 2:45 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On Tue, Mar 04, 2025 at 07:38:14AM +0000, Bertrand Drouvot wrote:

PFA 0001 a new version to also parse the ".snap".

PFA v3 (a slightly different version) to "correctly" use the new report_error
introduced in v2.

Thank you for the patch! I agree to have some regression tests for that.

Looking at the 0001 patch, it seems that the pg_logicalinspect
functions still accepts invalid filenames as follows:

postgres(1:2713976)=# select
pg_get_logical_snapshot_info('0-40796E18.foo.snap');
ERROR: could not open file "pg_logical/snapshots/0-40796E18.snap": No
such file or directory

Regarding the 0002 patch, TBH I'm confused about what the original
comment intended to mean. I thought that it means that we don't need
to use a critical section for fsync_fname() calls to promote the fsync
error. If that's right, it would be better to update the comment
instead of removing it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#10Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Bertrand Drouvot (#8)
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

On Tue, Mar 4, 2025 at 7:05 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On Tue, Mar 04, 2025 at 10:45:54AM +0000, Bertrand Drouvot wrote:

Hi,

On Tue, Mar 04, 2025 at 07:38:14AM +0000, Bertrand Drouvot wrote:

PFA 0001 a new version to also parse the ".snap".

PFA v3 (a slightly different version) to "correctly" use the new report_error
introduced in v2.

It looks like that CheckPointSnapBuild() also rely on sscanf() to check
for the snapshot file extension. As seen, up-thread we can't rely on sscanf()
to do so.

Attached a c file to show the "issue":

$ gcc -o sscanf_file_ext sscanf_file_ext.c
$ ./sscanf_file_ext 0-40796E18.snap
File: "0-40796E18.snap"
Parse result: 2 (2 means success)

$ ./sscanf_file_ext 0-40796E18.foo
File: "0-40796E18.foo"
Parse result: 2 (2 means success)

So it looks like that, in reality, in CheckPointSnapBuild() we report a parsing
message and "continue" even if the file name extension is not ".snap".

That's not a big deal because, in pratice, it still removes the .$pid.tmp files
(given they have the "hex" part well formated). But, it does not look like it was
the original intent, so maybe we should also re-think CheckPointSnapBuild() and
open a dedicated thread? (If so, I'm happy to do so).

I believe that the situation is a bit different in these cases; in
CheckPointSnapBuild () we iterate over all files in
pg_logical/snapshots directory and extract the LSN from the filenames
whereas in pg_logicalinspect case, user can pass arbitrary filename.
Therefore, as for the former cases, it might make sense to check the
file extension as well if we want to prevent a problem where someone
injects files into pg_logical/snapshots directory. But it might be too
much and I guess that there are other places where we assume that
there are no malformed files injected under $PGDATA.

Note that there might be other places that may need the same kind of attention:
pg_archivecleanup.c?

In pg_archivecleanup.c, we use sscanf() two places: for partial WAL
files and for backup history files. IIUC we check the format of the
filename using IsPartialXLogFileName() and IsBackupHistoryFileName()
before using sscanf() so I think it's okay.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#11Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Masahiko Sawada (#9)
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

Hi,

On Tue, Mar 04, 2025 at 12:11:07PM -0800, Masahiko Sawada wrote:

On Tue, Mar 4, 2025 at 2:45 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On Tue, Mar 04, 2025 at 07:38:14AM +0000, Bertrand Drouvot wrote:

PFA 0001 a new version to also parse the ".snap".

PFA v3 (a slightly different version) to "correctly" use the new report_error
introduced in v2.

Thank you for the patch! I agree to have some regression tests for that.

Looking at the 0001 patch, it seems that the pg_logicalinspect
functions still accepts invalid filenames as follows:

postgres(1:2713976)=# select
pg_get_logical_snapshot_info('0-40796E18.foo.snap');
ERROR: could not open file "pg_logical/snapshots/0-40796E18.snap": No
such file or directory

Indeed, thanks for looking at it! Fixed in v4 attached. Note that the pfree()
in parse_snapshot_filename() is not needed per say because the function is
currently executed in a short-lived memory context. It's there for safety reason
in case it's called outside those SQL apis in the future.

Regarding the 0002 patch, TBH I'm confused about what the original
comment intended to mean. I thought that it means that we don't need
to use a critical section for fsync_fname() calls to promote the fsync
error.

Yeah I think the same.

If that's right, it would be better to update the comment
instead of removing it.

hmm I'm not sure what to say then. It was saying that there is not need to promote
to PANIC because it won't be usable anyway (if not fsynced).

Since 9ccdd7f66e3 we do promote to PANIC so I thought the most simple to avoid
any extra confusion is to remove the comment (since it's wrong since 9ccdd7f66e3).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v4-0001-Fix-bug-in-pg_logicalinspect-functions.patchtext/x-diff; charset=us-asciiDownload+180-19
#12Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#11)
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

Hi,

On Tue, Mar 04, 2025 at 09:45:54PM +0000, Bertrand Drouvot wrote:

Indeed, thanks for looking at it! Fixed in v4 attached. Note that the pfree()
in parse_snapshot_filename() is not needed per say because the function is
currently executed in a short-lived memory context. It's there for safety reason
in case it's called outside those SQL apis in the future.

After sleeping on it, PFA a simplified version.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v5-0001-Fix-bug-in-pg_logicalinspect-functions.patchtext/x-diff; charset=us-asciiDownload+173-19
#13Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Bertrand Drouvot (#12)
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

On Tue, Mar 4, 2025 at 10:44 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On Tue, Mar 04, 2025 at 09:45:54PM +0000, Bertrand Drouvot wrote:

Indeed, thanks for looking at it! Fixed in v4 attached. Note that the pfree()
in parse_snapshot_filename() is not needed per say because the function is
currently executed in a short-lived memory context. It's there for safety reason
in case it's called outside those SQL apis in the future.

After sleeping on it, PFA a simplified version.

Thank you for updating the patch.

I think we don't need to even do palloc() for the buffer as we can use
the char[MAXPGPATH] instead. I've attached the patch to improve the
parse_snapshot_filename() function and add some regression tests.
Please review these changes.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

improve_parse_snapshot_filename.patchapplication/octet-stream; name=improve_parse_snapshot_filename.patchDownload+24-21
#14Robins Tharakan
tharakan@gmail.com
In reply to: Masahiko Sawada (#13)
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

On Thu, 6 Mar 2025 at 17:13, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Mar 4, 2025 at 10:44 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

After sleeping on it, PFA a simplified version.

Running tests on v5 patch for a few hours was uneventful (which is good).

I think we don't need to even do palloc() for the buffer as we can use
the char[MAXPGPATH] instead. I've attached the patch to improve the
parse_snapshot_filename() function and add some regression tests.
Please review these changes.

I've updated that with this patch too. Will update if something comes up.

-
robins

#15Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Masahiko Sawada (#13)
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

Hi,

On Wed, Mar 05, 2025 at 10:42:35PM -0800, Masahiko Sawada wrote:

On Tue, Mar 4, 2025 at 10:44 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On Tue, Mar 04, 2025 at 09:45:54PM +0000, Bertrand Drouvot wrote:

Indeed, thanks for looking at it! Fixed in v4 attached. Note that the pfree()
in parse_snapshot_filename() is not needed per say because the function is
currently executed in a short-lived memory context. It's there for safety reason
in case it's called outside those SQL apis in the future.

After sleeping on it, PFA a simplified version.

Thank you for updating the patch.

I think we don't need to even do palloc() for the buffer as we can use
the char[MAXPGPATH] instead.

Sure.

I've attached the patch to improve the
parse_snapshot_filename() function and add some regression tests.
Please review these changes.

Thanks for the patch!

=== 1

-parse_snapshot_filename(const char *filename)
+parse_snapshot_filename(char *filename)

Why?

=== 2

-   if (sscanf(filename, "%X-%X", &hi, &lo) != 2)
+   if (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2)

We could replace (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2) with
(sscanf(filename, "%X-%X.foo", &hi, &lo) != 2) and the regression tests would
still pass.

So, I think it's better to remove the .snap here as it could give the "wrong"
impression that it's "useful".

The attached removes the .snap and adds a comment like:

"
* Note: We deliberately don't use "%X-%X.snap" because sscanf only counts
* converted values (%X), not literal text matches.
"

I think it makes sense to document this behavior.

=== 3

+   /*
+    * Bring back the LSN to the snapshot file format and compare
+    * it to  the given name to see if the extracted LSN is sane.
+    */
+   sprintf(tmpfname, "%X-%X.snap", hi, lo);
+   if (strcmp(tmpfname, filename) != 0)

The idea was also to ensure that there are no extra characters between the LSN
values and the .snap extension: Adding this as an extra comment in the attached.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v6-0001-Fix-bug-in-pg_logicalinspect-functions.patchtext/x-diff; charset=us-asciiDownload+184-19
#16Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Bertrand Drouvot (#15)
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

On Thu, Mar 6, 2025 at 12:11 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On Wed, Mar 05, 2025 at 10:42:35PM -0800, Masahiko Sawada wrote:

On Tue, Mar 4, 2025 at 10:44 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On Tue, Mar 04, 2025 at 09:45:54PM +0000, Bertrand Drouvot wrote:

Indeed, thanks for looking at it! Fixed in v4 attached. Note that the pfree()
in parse_snapshot_filename() is not needed per say because the function is
currently executed in a short-lived memory context. It's there for safety reason
in case it's called outside those SQL apis in the future.

After sleeping on it, PFA a simplified version.

Thank you for updating the patch.

I think we don't need to even do palloc() for the buffer as we can use
the char[MAXPGPATH] instead.

Sure.

I've attached the patch to improve the
parse_snapshot_filename() function and add some regression tests.
Please review these changes.

Thanks for the patch!

=== 1

-parse_snapshot_filename(const char *filename)
+parse_snapshot_filename(char *filename)

Why?

Sorry, that's a leftover that I attempted. We should use 'const'.

=== 2

-   if (sscanf(filename, "%X-%X", &hi, &lo) != 2)
+   if (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2)

We could replace (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2) with
(sscanf(filename, "%X-%X.foo", &hi, &lo) != 2) and the regression tests would
still pass.

So, I think it's better to remove the .snap here as it could give the "wrong"
impression that it's "useful".

The attached removes the .snap and adds a comment like:

"
* Note: We deliberately don't use "%X-%X.snap" because sscanf only counts
* converted values (%X), not literal text matches.
"

Yes, but we include the suffix when doing sscanf() in many other
places. I don't see the specific benefit of doing it in another way
only in this case.

I think it makes sense to document this behavior.

=== 3

+   /*
+    * Bring back the LSN to the snapshot file format and compare
+    * it to  the given name to see if the extracted LSN is sane.
+    */
+   sprintf(tmpfname, "%X-%X.snap", hi, lo);
+   if (strcmp(tmpfname, filename) != 0)

The idea was also to ensure that there are no extra characters between the LSN
values and the .snap extension:

Right. How about the following comment?

Bring back the extracted LSN to the snapshot file format and compare
it to the given filename. This check strictly checks if the given filename
follows the format of the snapshot filename.

FYI I've analyzed the idea of extracting LSN from the filename and
bringing it back to the format to validate the given filename. IIUC we
strictly validate the given filename. For example, we accept
'0-ABC.snap' but not '0-0ABC.snap, which is probably fine. Also, I was
concerned that sscanf() could be affected by locale, but it's fine
particularly in this case as we scan only hex numbers.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#17Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Masahiko Sawada (#16)
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

Hi,

On Thu, Mar 06, 2025 at 11:35:58AM -0800, Masahiko Sawada wrote:

On Thu, Mar 6, 2025 at 12:11 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

=== 1

-parse_snapshot_filename(const char *filename)
+parse_snapshot_filename(char *filename)

Why?

Sorry, that's a leftover that I attempted. We should use 'const'.

No problem at all! Okay, re-added in the attached.

=== 2

-   if (sscanf(filename, "%X-%X", &hi, &lo) != 2)
+   if (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2)

We could replace (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2) with
(sscanf(filename, "%X-%X.foo", &hi, &lo) != 2) and the regression tests would
still pass.

So, I think it's better to remove the .snap here as it could give the "wrong"
impression that it's "useful".

The attached removes the .snap and adds a comment like:

"
* Note: We deliberately don't use "%X-%X.snap" because sscanf only counts
* converted values (%X), not literal text matches.
"

Yes, but we include the suffix when doing sscanf() in many other
places. I don't see the specific benefit of doing it in another way
only in this case.

The idea was to not give the "wrong" impression that it's "useful" to add the
suffix. That said I agree, it would be better to also update the other places
where the suffix is being used. I think I'll start a dedicated thread for
that (to at least put some comments around those places).

As far our current patch then I propose to add a comment though, like in the
attached?

=== 3

+   /*
+    * Bring back the LSN to the snapshot file format and compare
+    * it to  the given name to see if the extracted LSN is sane.
+    */
+   sprintf(tmpfname, "%X-%X.snap", hi, lo);
+   if (strcmp(tmpfname, filename) != 0)

The idea was also to ensure that there are no extra characters between the LSN
values and the .snap extension:

Right. How about the following comment?

Bring back the extracted LSN to the snapshot file format and compare
it to the given filename. This check strictly checks if the given filename
follows the format of the snapshot filename.

Yeah sounds good, done that way in the attached.

FYI I've analyzed the idea of extracting LSN from the filename and
bringing it back to the format to validate the given filename. IIUC we
strictly validate the given filename. For example, we accept
'0-ABC.snap' but not '0-0ABC.snap',

Yeah, and so '0A-ABC.snap' for example.

which is probably fine.

I think so, as long as the snapshot files are generated with %X (and not %08X
for example) in SnapBuildSerialize():

"
sprintf(path, "%s/%X-%X.snap",
PG_LOGICAL_SNAPSHOTS_DIR,
LSN_FORMAT_ARGS(lsn));
"

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v7-0001-Fix-bug-in-pg_logicalinspect-functions.patchtext/x-diff; charset=us-asciiDownload+183-19
#18Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Bertrand Drouvot (#17)
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

On Thu, Mar 6, 2025 at 11:02 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On Thu, Mar 06, 2025 at 11:35:58AM -0800, Masahiko Sawada wrote:

On Thu, Mar 6, 2025 at 12:11 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

=== 1

-parse_snapshot_filename(const char *filename)
+parse_snapshot_filename(char *filename)

Why?

Sorry, that's a leftover that I attempted. We should use 'const'.

No problem at all! Okay, re-added in the attached.

=== 2

-   if (sscanf(filename, "%X-%X", &hi, &lo) != 2)
+   if (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2)

We could replace (sscanf(filename, "%X-%X.snap", &hi, &lo) != 2) with
(sscanf(filename, "%X-%X.foo", &hi, &lo) != 2) and the regression tests would
still pass.

So, I think it's better to remove the .snap here as it could give the "wrong"
impression that it's "useful".

The attached removes the .snap and adds a comment like:

"
* Note: We deliberately don't use "%X-%X.snap" because sscanf only counts
* converted values (%X), not literal text matches.
"

Yes, but we include the suffix when doing sscanf() in many other
places. I don't see the specific benefit of doing it in another way
only in this case.

The idea was to not give the "wrong" impression that it's "useful" to add the
suffix. That said I agree, it would be better to also update the other places
where the suffix is being used. I think I'll start a dedicated thread for
that (to at least put some comments around those places).

As far our current patch then I propose to add a comment though, like in the
attached?

=== 3

+   /*
+    * Bring back the LSN to the snapshot file format and compare
+    * it to  the given name to see if the extracted LSN is sane.
+    */
+   sprintf(tmpfname, "%X-%X.snap", hi, lo);
+   if (strcmp(tmpfname, filename) != 0)

The idea was also to ensure that there are no extra characters between the LSN
values and the .snap extension:

Right. How about the following comment?

Bring back the extracted LSN to the snapshot file format and compare
it to the given filename. This check strictly checks if the given filename
follows the format of the snapshot filename.

Yeah sounds good, done that way in the attached.

FYI I've analyzed the idea of extracting LSN from the filename and
bringing it back to the format to validate the given filename. IIUC we
strictly validate the given filename. For example, we accept
'0-ABC.snap' but not '0-0ABC.snap',

Yeah, and so '0A-ABC.snap' for example.

which is probably fine.

I think so, as long as the snapshot files are generated with %X (and not %08X
for example) in SnapBuildSerialize():

"
sprintf(path, "%s/%X-%X.snap",
PG_LOGICAL_SNAPSHOTS_DIR,
LSN_FORMAT_ARGS(lsn));
"

Thank you for updating the patch. I've made some cosmetic changes and
attached a new version patch. Could you review it?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

v8-0001-pg_logicalinspect-Add-validation-check-for-input-.patchapplication/octet-stream; name=v8-0001-pg_logicalinspect-Add-validation-check-for-input-.patchDownload+183-19
#19Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Masahiko Sawada (#18)
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

Hi,

On Fri, Mar 07, 2025 at 03:47:14PM -0800, Masahiko Sawada wrote:

Thank you for updating the patch. I've made some cosmetic changes and
attached a new version patch. Could you review it?

LGTM, thanks!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#20Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Bertrand Drouvot (#19)
Re: BUG #18828: Crash when pg_get_logical_snapshot_meta() passed empty string

On Sat, Mar 8, 2025 at 12:02 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Hi,

On Fri, Mar 07, 2025 at 03:47:14PM -0800, Masahiko Sawada wrote:

Thank you for updating the patch. I've made some cosmetic changes and
attached a new version patch. Could you review it?

LGTM, thanks!

Pushed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#21Nathan Bossart
nathandbossart@gmail.com
In reply to: Masahiko Sawada (#20)
#22Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Nathan Bossart (#21)
#23Nathan Bossart
nathandbossart@gmail.com
In reply to: Masahiko Sawada (#22)
#24Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Nathan Bossart (#23)
#25David Rowley
dgrowleyml@gmail.com
In reply to: Masahiko Sawada (#22)
#26Masahiko Sawada
sawada.mshk@gmail.com
In reply to: David Rowley (#25)