Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid

Started by Fujii Masao · 9 months ago · 9 messages · hackers
#1 Fujii Masao
masao.fujii@gmail.com

Hi,

It looks like pg_dump --filter can mistakenly treat invalid object types
in the filter file as valid ones. For example, the invalid type "table-data"
(probably a typo for "table_data") is incorrectly recognized as "table",
and pg_dump runs without error when it should fail.

--------------------------------------------
$ cat filter.txt
exclude table-data one

$ pg_dump --filter filter.txt
--
-- PostgreSQL database dump
--
...

$ echo $?
0
--------------------------------------------

This happens because pg_dump (filter_get_keyword() in pg_dump/filter.c)
identifies tokens as sequences of ASCII alphabetic characters, treating
non-alphabetic characters (like hyphens) as token boundaries. As a result,
"table-data" is parsed as "table".

To fix this, I've attached a patch that updates pg_dump --filter so that
it treats tokens as strings of non-space characters separated by spaces
or line endings, ensuring invalid types like "table-data" are correctly
rejected. Thoughts?

With the patch:
--------------------------------------------
$ cat filter.txt
exclude table-data one

$ pg_dump --filter filter.txt
pg_dump: error: invalid format in filter read from file "filter.txt"
on line 1: unsupported filter object type: "table-data"
--------------------------------------------

Regards,

--
Fujii Masao

Attachments:

v1-0001-pg_dump-Fix-incorrect-parsing-of-object-types-in-.patch (application/octet-stream; +25, -16)
#2 Srinath Reddy Sadipiralla
srinath2133@gmail.com
In reply to: Fujii Masao (#1)
Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid

On Sat, Aug 2, 2025 at 3:18 PM Fujii Masao <masao.fujii@gmail.com> wrote:

> It looks like pg_dump --filter can mistakenly treat invalid object types
> in the filter file as valid ones. For example, the invalid type "table-data"
> (probably a typo for "table_data") is incorrectly recognized as "table",
> and pg_dump runs without error when it should fail.
> [...]

Hi Fujii-san, +1 for the patch. I have reviewed and tested it; LGTM.

--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/

#3 Xuneng Zhou
xunengzhou@gmail.com
In reply to: Fujii Masao (#1)
Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid

Hi Fujii-san,

Thanks for working on this.

On Sat, Aug 2, 2025 at 5:48 PM Fujii Masao <masao.fujii@gmail.com> wrote:

> It looks like pg_dump --filter can mistakenly treat invalid object types
> in the filter file as valid ones. [...]
>
> To fix this, I've attached a patch that updates pg_dump --filter so that
> it treats tokens as strings of non-space characters separated by spaces
> or line endings, ensuring invalid types like "table-data" are correctly
> rejected. Thoughts?

After testing, the patch LGTM. I noticed two very small possible nits:

1) Comment wording

The loop now calls isspace((unsigned char)*ptr), so a token ends at
any whitespace, not just at ASCII space (0x20). Could we revise the
comment—from
“strings of non-space characters bounded by space characters”
to something like
“strings of non-space characters bounded by whitespace”
—to match the behavior?

2) Variable name

const char *keyword = filter_get_token(&str, &size);
keyword = filter_get_token(&str, &size);

After the patch, filter_get_token() no longer returns a keyword
(letters-only identifier); it now returns any non-whitespace token.
Renaming the variable from keyword to token (or similar) might make
the intent clearer.
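
As a side note on why the (unsigned char) cast in that isspace() call matters: passing a plain char with a negative value to isspace() is undefined behavior on platforms where char is signed. A tiny illustration (the helper name is invented):

```c
#include <ctype.h>

/*
 * Safe whitespace test for bytes read from a char buffer. The cast to
 * unsigned char avoids undefined behavior when plain char is signed
 * and the byte is >= 0x80 (e.g. in multibyte-encoded input).
 */
static int
is_ws(char c)
{
	return isspace((unsigned char) c) != 0;
}
```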

Best,
Xuneng

#4 Fujii Masao
masao.fujii@gmail.com
In reply to: Xuneng Zhou (#3)
Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid

On Sun, Aug 3, 2025 at 3:03 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:

> After testing, the patch LGTM. I noticed two very small possible nits:

Thanks for the review!

> 1) Comment wording
>
> The loop now calls isspace((unsigned char)*ptr), so a token ends at
> any whitespace, not just at ASCII space (0x20). Could we revise the
> comment—from
> “strings of non-space characters bounded by space characters”
> to something like
> “strings of non-space characters bounded by whitespace”
> —to match the behavior?

I agree with the change. But the phrase "strings of non-space characters
bounded by whitespace" is a bit redundant, and "strings of non-whitespace
characters" is sufficient, isn't it? So I used that wording in the updated
patch I've attached.

> 2) Variable name
>
> const char *keyword = filter_get_token(&str, &size);
> keyword = filter_get_token(&str, &size);
>
> After the patch, filter_get_token() no longer returns a keyword
> (letters-only identifier); it now returns any non-whitespace token.
> Renaming the variable from keyword to token (or similar) might make
> the intent clearer.

This also got me thinking, if we simply define keywords as strings of
non-whitespace characters, maybe we don't need to change the term "keyword"
to "token" at all. I've updated the patch with that in mind. Thoughts?

Regards,

--
Fujii Masao

Attachments:

v2-0001-pg_dump-Fix-incorrect-parsing-of-object-types-in-.patch (application/octet-stream; +15, -10)
#5 Xuneng Zhou
xunengzhou@gmail.com
In reply to: Fujii Masao (#4)
Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid

On Mon, Aug 4, 2025 at 11:18 PM Fujii Masao <masao.fujii@gmail.com> wrote:

> This also got me thinking, if we simply define keywords as strings of
> non-whitespace characters, maybe we don't need to change the term "keyword"
> to "token" at all. I've updated the patch with that in mind. Thoughts?

+1, this looks more elegant to me.

Best,
Xuneng

#6 Daniel Gustafsson
daniel@yesql.se
In reply to: Fujii Masao (#4)
Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid

On 4 Aug 2025, at 17:18, Fujii Masao <masao.fujii@gmail.com> wrote:

I missed this thread while being on vacation, thanks for finding and fixing
this!

> This also got me thinking, if we simply define keywords as strings of
> non-whitespace characters, maybe we don't need to change the term "keyword"
> to "token" at all. I've updated the patch with that in mind. Thoughts?

Agreed, this should work fine, and it aligns the code somewhat with read_pattern,
which is a good thing.

+ * in line buffer. Returns NULL when the buffer is empty or no keyword exists.

Since "is empty" could be interpreted as being a null pointer, maybe we should
add an if (!*line) check (or an Assert) before we dereference the passed-in
buffer?

--
Daniel Gustafsson

#7 Fujii Masao
masao.fujii@gmail.com
In reply to: Daniel Gustafsson (#6)
Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid

On Tue, Aug 5, 2025 at 4:52 PM Daniel Gustafsson <daniel@yesql.se> wrote:

> Since "is empty" could be interpreted as being a null pointer, maybe we should
> add an if (!*line) check (or an Assert) before we dereference the passed-in
> buffer?

Thanks for the review!

I've added Assert(*line != NULL) at the start of filter_get_keyword().
Updated patch attached.
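
For readers following along without the patch, the overall shape being discussed is roughly this (a simplified sketch only, not the committed filter.c code; the function name and exact signature here are invented):

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the next whitespace-delimited keyword in *line,
 * storing its length in *size and advancing *line past it. Returns
 * NULL when no keyword remains. The assertion documents that the
 * caller must never pass a NULL cursor.
 */
static const char *
filter_get_keyword_sketch(const char **line, size_t *size)
{
	const char *ptr;
	const char *start;

	assert(*line != NULL);

	/* skip leading whitespace */
	ptr = *line;
	while (isspace((unsigned char) *ptr))
		ptr++;

	if (*ptr == '\0')
		return NULL;			/* nothing left on this line */

	/* the keyword is everything up to the next whitespace or NUL */
	start = ptr;
	while (*ptr != '\0' && !isspace((unsigned char) *ptr))
		ptr++;

	*size = (size_t) (ptr - start);
	*line = ptr;
	return start;
}
```

Called on the sample line "exclude table-data one", this yields the tokens "exclude", "table-data", and "one" in turn, so the unknown object type reaches the error path intact.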

Regards,

--
Fujii Masao

Attachments:

v3-0001-pg_dump-Fix-incorrect-parsing-of-object-types-in-.patch (application/octet-stream; +18, -10)
#8 Daniel Gustafsson
daniel@yesql.se
In reply to: Fujii Masao (#7)
Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid

On 6 Aug 2025, at 06:49, Fujii Masao <masao.fujii@gmail.com> wrote:

> I've added Assert(*line != NULL) at the start of filter_get_keyword().
> Updated patch attached.

LGTM.

--
Daniel Gustafsson

#9 Fujii Masao
masao.fujii@gmail.com
In reply to: Daniel Gustafsson (#8)
Re: Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid

On Thu, Aug 7, 2025 at 9:17 PM Daniel Gustafsson <daniel@yesql.se> wrote:

> On 6 Aug 2025, at 06:49, Fujii Masao <masao.fujii@gmail.com> wrote:
>
>> I've added Assert(*line != NULL) at the start of filter_get_keyword().
>> Updated patch attached.
>
> LGTM.

I've pushed the patch. Thanks!

Regards,

--
Fujii Masao