Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid
Hi,
It looks like pg_dump --filter can mistakenly treat invalid object types
in the filter file as valid ones. For example, the invalid type "table-data"
(probably a typo for "table_data") is incorrectly recognized as "table",
and pg_dump runs without error when it should fail.
--------------------------------------------
$ cat filter.txt
exclude table-data one
$ pg_dump --filter filter.txt
--
-- PostgreSQL database dump
--
...
$ echo $?
0
--------------------------------------------
This happens because pg_dump (filter_get_keyword() in pg_dump/filter.c)
identifies tokens as sequences of ASCII alphabetic characters, treating
non-alphabetic characters (like hyphens) as token boundaries. As a result,
"table-data" is parsed as "table".
To fix this, I've attached the patch that updates pg_dump --filter so that
it treats tokens as strings of non-space characters separated by spaces
or line endings, ensuring invalid types like "table-data" are correctly
rejected. Thought?
With the patch:
--------------------------------------------
$ cat filter.txt
exclude table-data one
$ pg_dump --filter filter.txt
pg_dump: error: invalid format in filter read from file "filter.txt"
on line 1: unsupported filter object type: "table-data"
--------------------------------------------
Regards,
--
Fujii Masao
Attachments:
v1-0001-pg_dump-Fix-incorrect-parsing-of-object-types-in-.patchapplication/octet-stream; name=v1-0001-pg_dump-Fix-incorrect-parsing-of-object-types-in-.patchDownload+25-16
On Sat, Aug 2, 2025 at 3:18 PM Fujii Masao <masao.fujii@gmail.com> wrote:
Hi,
It looks like pg_dump --filter can mistakenly treat invalid object types
in the filter file as valid ones. For example, the invalid type
"table-data"
(probably a typo for "table_data") is incorrectly recognized as "table",
and pg_dump runs without error when it should fail.--------------------------------------------
$ cat filter.txt
exclude table-data one$ pg_dump --filter filter.txt
--
-- PostgreSQL database dump
--
...$ echo $?
0
--------------------------------------------This happens because pg_dump (filter_get_keyword() in pg_dump/filter.c)
identifies tokens as sequences of ASCII alphabetic characters, treating
non-alphabetic characters (like hyphens) as token boundaries. As a result,
"table-data" is parsed as "table".To fix this, I've attached the patch that updates pg_dump --filter so that
it treats tokens as strings of non-space characters separated by spaces
or line endings, ensuring invalid types like "table-data" are correctly
rejected. Thought?With the patch:
--------------------------------------------
$ cat filter.txt
exclude table-data one$ pg_dump --filter filter.txt
pg_dump: error: invalid format in filter read from file "filter.txt"
on line 1: unsupported filter object type: "table-data"
--------------------------------------------
Hi Fujii-san , +1 for the patch , I have reviewed and tested it and LGTM.
--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
Hi Fujii-san,
Thanks for working on this.
On Sat, Aug 2, 2025 at 5:48 PM Fujii Masao <masao.fujii@gmail.com> wrote:
Hi,
It looks like pg_dump --filter can mistakenly treat invalid object types
in the filter file as valid ones. For example, the invalid type "table-data"
(probably a typo for "table_data") is incorrectly recognized as "table",
and pg_dump runs without error when it should fail.--------------------------------------------
$ cat filter.txt
exclude table-data one$ pg_dump --filter filter.txt
--
-- PostgreSQL database dump
--
...$ echo $?
0
--------------------------------------------This happens because pg_dump (filter_get_keyword() in pg_dump/filter.c)
identifies tokens as sequences of ASCII alphabetic characters, treating
non-alphabetic characters (like hyphens) as token boundaries. As a result,
"table-data" is parsed as "table".To fix this, I've attached the patch that updates pg_dump --filter so that
it treats tokens as strings of non-space characters separated by spaces
or line endings, ensuring invalid types like "table-data" are correctly
rejected. Thought?With the patch:
--------------------------------------------
$ cat filter.txt
exclude table-data one$ pg_dump --filter filter.txt
pg_dump: error: invalid format in filter read from file "filter.txt"
on line 1: unsupported filter object type: "table-data"
--------------------------------------------
After testing, the patch LGTM. I noticed two very small possible nits:
1) Comment wording
The loop now calls isspace((unsigned char)*ptr), so a token ends at
any whitespace, not just at ASCII space (0x20). Could we revise the
comment—from
“strings of non-space characters bounded by space characters”
to something like
“strings of non-space characters bounded by whitespace”
—to match the behavior?
2) Variable name
const char *keyword = filter_get_token(&str, &size);
keyword = filter_get_token(&str, &size);
After the patch, filter_get_token() no longer returns a keyword
(letters-only identifier); it now returns any non-whitespace token.
Renaming the variable from keyword to token (or similar) might make
the intent clearer..
Best,
Xuneng
On Sun, Aug 3, 2025 at 3:03 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
After testing, the patch LGTM. I noticed two very small possible nits:
Thanks for the review!
1) Comment wording
The loop now calls isspace((unsigned char)*ptr), so a token ends at
any whitespace, not just at ASCII space (0x20). Could we revise the
comment—from
“strings of non-space characters bounded by space characters”
to something like
“strings of non-space characters bounded by whitespace”
—to match the behavior?
I agree with the change. But the phrase "strings of non-space characters
bounded by whitespace" is a bit redundant, and "strings of non-whitespace
characters" is sufficient, isn't it? So I used that wording in the updated
patch I've attached.
2) Variable name
const char *keyword = filter_get_token(&str, &size);
keyword = filter_get_token(&str, &size);After the patch, filter_get_token() no longer returns a keyword
(letters-only identifier); it now returns any non-whitespace token.
Renaming the variable from keyword to token (or similar) might make
the intent clearer..
This also got me thinking, if we simply define keywords as strings of
non-whitespace characters, maybe we don't need to change the term "keyword"
to "token" at all. I've updated the patch with that in mind. Thoughts?
Regards,
--
Fujii Masao
Attachments:
v2-0001-pg_dump-Fix-incorrect-parsing-of-object-types-in-.patchapplication/octet-stream; name=v2-0001-pg_dump-Fix-incorrect-parsing-of-object-types-in-.patchDownload+15-10
On Mon, Aug 4, 2025 at 11:18 PM Fujii Masao <masao.fujii@gmail.com> wrote:
On Sun, Aug 3, 2025 at 3:03 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
After testing, the patch LGTM. I noticed two very small possible nits:
Thanks for the review!
1) Comment wording
The loop now calls isspace((unsigned char)*ptr), so a token ends at
any whitespace, not just at ASCII space (0x20). Could we revise the
comment—from
“strings of non-space characters bounded by space characters”
to something like
“strings of non-space characters bounded by whitespace”
—to match the behavior?I agree with the change. But the phrase "strings of non-space characters
bounded by whitespace" is a bit redundant, and "strings of non-whitespace
characters" is sufficient, isn't it? So I used that wording in the updated
patch I've attached.2) Variable name
const char *keyword = filter_get_token(&str, &size);
keyword = filter_get_token(&str, &size);After the patch, filter_get_token() no longer returns a keyword
(letters-only identifier); it now returns any non-whitespace token.
Renaming the variable from keyword to token (or similar) might make
the intent clearer..This also got me thinking, if we simply define keywords as strings of
non-whitespace characters, maybe we don't need to change the term "keyword"
to "token" at all. I've updated the patch with that in mind. Thoughts?
+1, this looks more elegant to me.
Best,
Xuneng
On 4 Aug 2025, at 17:18, Fujii Masao <masao.fujii@gmail.com> wrote:
I missed this thread while being on vacation, thanks for finding and fixing
this!
This also got me thinking, if we simply define keywords as strings of
non-whitespace characters, maybe we don't need to change the term "keyword"
to "token" at all. I've updated the patch with that in mind. Thoughts?
Agreed, this should work fine, and it aligns the code somwhat with read_pattern
which is a good thing.
+ * in line buffer. Returns NULL when the buffer is empty or no keyword exists.
Since "is empty" could be interpreted as being a null pointer, maybe we should
add a if (!*line) check (or an Assert) before we dereference the passed in
buffer?
--
Daniel Gustafsson
On Tue, Aug 5, 2025 at 4:52 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 4 Aug 2025, at 17:18, Fujii Masao <masao.fujii@gmail.com> wrote:
I missed this thread while being on vacation, thanks for finding and fixing
this!This also got me thinking, if we simply define keywords as strings of
non-whitespace characters, maybe we don't need to change the term "keyword"
to "token" at all. I've updated the patch with that in mind. Thoughts?Agreed, this should work fine, and it aligns the code somwhat with read_pattern
which is a good thing.+ * in line buffer. Returns NULL when the buffer is empty or no keyword exists.
Since "is empty" could be interpreted as being a null pointer, maybe we should
add a if (!*line) check (or an Assert) before we dereference the passed in
buffer?
Thanks for the review!
I've added Assert(*line != NULL) at the start of filter_get_keyword().
Updated patch attached.
Regards,
--
Fujii Masao
Attachments:
v3-0001-pg_dump-Fix-incorrect-parsing-of-object-types-in-.patchapplication/octet-stream; name=v3-0001-pg_dump-Fix-incorrect-parsing-of-object-types-in-.patchDownload+18-10
On 6 Aug 2025, at 06:49, Fujii Masao <masao.fujii@gmail.com> wrote:
I've added Assert(*line != NULL) at the start of filter_get_keyword().
Updated patch attached.
LGTM.
--
Daniel Gustafsson
On Thu, Aug 7, 2025 at 9:17 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 6 Aug 2025, at 06:49, Fujii Masao <masao.fujii@gmail.com> wrote:
I've added Assert(*line != NULL) at the start of filter_get_keyword().
Updated patch attached.LGTM.
I've pushed the patch. Thanks!
Regards,
--
Fujii Masao