Bug in pg_dump --filter? - Invalid object types can be misinterpreted as valid
Hi,
It looks like pg_dump --filter can mistakenly treat invalid object types
in the filter file as valid ones. For example, the invalid type "table-data"
(probably a typo for "table_data") is incorrectly recognized as "table",
and pg_dump runs without error when it should fail.
--------------------------------------------
$ cat filter.txt
exclude table-data one
$ pg_dump --filter filter.txt
--
-- PostgreSQL database dump
--
...
$ echo $?
0
--------------------------------------------
This happens because pg_dump (filter_get_keyword() in pg_dump/filter.c)
identifies tokens as sequences of ASCII alphabetic characters, treating
non-alphabetic characters (like hyphens) as token boundaries. As a result,
"table-data" is parsed as "table".
To fix this, I've attached a patch that updates pg_dump --filter so that
it treats tokens as strings of non-space characters separated by spaces
or line endings, ensuring invalid types like "table-data" are correctly
rejected. Thoughts?
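
For readers who want to see the two scanning rules side by side, here is a minimal standalone sketch. It is not the code in filter.c: the function names are hypothetical, and unlike filter_get_keyword() these helpers take the line by value and do not advance the caller's pointer. Only the character tests are taken from the diff.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Pre-patch rule: a keyword starts with an ASCII letter and continues
 * over letters and '_'. Any other character, such as '-', ends it.
 * (Hypothetical helper name; only the character tests mirror the diff.)
 */
const char *
scan_alpha_keyword(const char *line, int *size)
{
	const char *ptr = line;
	const char *result = NULL;

	*size = 0;

	/* Skip leading whitespace */
	while (isspace((unsigned char) *ptr))
		ptr++;

	if (isalpha((unsigned char) *ptr))
	{
		result = ptr++;
		while (isalpha((unsigned char) *ptr) || *ptr == '_')
			ptr++;
		*size = (int) (ptr - result);
	}
	return result;
}

/*
 * Patched rule: a token is any run of non-whitespace characters, so
 * only whitespace or the end of the string ends it.
 */
const char *
scan_nonspace_token(const char *line, int *size)
{
	const char *ptr = line;
	const char *result = NULL;

	*size = 0;

	/* Skip leading whitespace */
	while (isspace((unsigned char) *ptr))
		ptr++;

	if (*ptr != '\0' && !isspace((unsigned char) *ptr))
	{
		result = ptr++;
		while (*ptr != '\0' && !isspace((unsigned char) *ptr))
			ptr++;
		*size = (int) (ptr - result);
	}
	return result;
}
```

Given the input "table-data one", the first helper stops at the hyphen and reports a 5-character keyword ("table"), while the second reports the full 10-character token ("table-data"), which can then fail the object type check.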
With the patch:
--------------------------------------------
$ cat filter.txt
exclude table-data one
$ pg_dump --filter filter.txt
pg_dump: error: invalid format in filter read from file "filter.txt"
on line 1: unsupported filter object type: "table-data"
--------------------------------------------
Regards,
--
Fujii Masao
Attachments:
v1-0001-pg_dump-Fix-incorrect-parsing-of-object-types-in-.patch (application/octet-stream)
From ed6f7a0586f79179d12365c258531c726b2cf556 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Sat, 2 Aug 2025 16:15:37 +0900
Subject: [PATCH v1] pg_dump: Fix incorrect parsing of object types in pg_dump
--filter.
Previously, pg_dump --filter could misinterpret invalid object types
in the filter file as valid ones. For example, the invalid object type
"table-data" (likely a typo for the valid "table_data") could be
mistakenly recognized as "table", causing pg_dump to succeed
when it should have failed.
This happened because pg_dump identified tokens as sequences of
ASCII alphabetic characters, treating non-alphabetic characters
(like hyphens) as token boundaries. As a result, "table-data" was
parsed as "table".
To fix this, pg_dump --filter now treats tokens as strings of
non-space characters separated by spaces or line endings,
ensuring invalid types like "table-data" are correctly rejected.
Back-patch to v17, where the --filter option was introduced.
---
src/bin/pg_dump/filter.c | 26 ++++++++++++---------
src/bin/pg_dump/t/005_pg_dump_filterfile.pl | 14 +++++++----
2 files changed, 25 insertions(+), 15 deletions(-)
diff --git a/src/bin/pg_dump/filter.c b/src/bin/pg_dump/filter.c
index 7214d514137..71b4bd9a73a 100644
--- a/src/bin/pg_dump/filter.c
+++ b/src/bin/pg_dump/filter.c
@@ -169,31 +169,35 @@ pg_log_filter_error(FilterStateData *fstate, const char *fmt,...)
}
/*
- * filter_get_keyword - read the next filter keyword from buffer
+ * filter_get_token - read the next filter token from buffer
*
- * Search for keywords (limited to ascii alphabetic characters) in
- * the passed in line buffer. Returns NULL when the buffer is empty or the first
- * char is not alpha. The char '_' is allowed, except as the first character.
+ * Search for tokens (strings of non-space characters bounded by
+ * space characters or end of line) in the passed in line buffer.
+ * Returns NULL when the buffer is empty or no token exists.
* The length of the found keyword is returned in the size parameter.
*/
static const char *
-filter_get_keyword(const char **line, int *size)
+filter_get_token(const char **line, int *size)
{
const char *ptr = *line;
const char *result = NULL;
- /* Set returned length preemptively in case no keyword is found */
+ /* Set returned length preemptively in case no token is found */
*size = 0;
- /* Skip initial whitespace */
+ /* Skip initial space characters */
while (isspace((unsigned char) *ptr))
ptr++;
- if (isalpha((unsigned char) *ptr))
+ /*
+ * Grab one token that's the string of non-space characters bounded by
+ * space characters or end of line.
+ */
+ if (*ptr != '\0' && !isspace((unsigned char) *ptr))
{
result = ptr++;
- while (isalpha((unsigned char) *ptr) || *ptr == '_')
+ while (*ptr != '\0' && !isspace((unsigned char) *ptr))
ptr++;
*size = ptr - result;
@@ -414,7 +418,7 @@ filter_read_item(FilterStateData *fstate,
* First we expect sequence of two keywords, {include|exclude}
* followed by the object type to operate on.
*/
- keyword = filter_get_keyword(&str, &size);
+ keyword = filter_get_token(&str, &size);
if (!keyword)
{
pg_log_filter_error(fstate,
@@ -433,7 +437,7 @@ filter_read_item(FilterStateData *fstate,
fstate->exit_nicely(1);
}
- keyword = filter_get_keyword(&str, &size);
+ keyword = filter_get_token(&str, &size);
if (!keyword)
{
pg_log_filter_error(fstate, _("missing filter object type"));
diff --git a/src/bin/pg_dump/t/005_pg_dump_filterfile.pl b/src/bin/pg_dump/t/005_pg_dump_filterfile.pl
index f05e8a20e05..d788406add4 100644
--- a/src/bin/pg_dump/t/005_pg_dump_filterfile.pl
+++ b/src/bin/pg_dump/t/005_pg_dump_filterfile.pl
@@ -418,10 +418,16 @@ command_fails_like(
qr/invalid filter command/,
"invalid syntax: incorrect filter command");
-# Test invalid object type
+# Test invalid object type.
+#
+# This test also verifies that tokens are correctly recognized as sequences of
+# non-space characters separated by spaces or line endings. If the parser
+# incorrectly treats non-space delimiters (like hyphens) as token boundaries,
+# "table-data" might be misread as the valid object type "table". To catch such
+# issues, "table-data" is used here as an intentionally invalid object type.
open $inputfile, '>', "$tempdir/inputfile.txt"
or die "unable to open filterfile for writing";
-print $inputfile "include xxx";
+print $inputfile "exclude table-data one";
close $inputfile;
command_fails_like(
@@ -432,8 +438,8 @@ command_fails_like(
'--filter' => "$tempdir/inputfile.txt",
'postgres'
],
- qr/unsupported filter object type: "xxx"/,
- "invalid syntax: invalid object type specified, should be table, schema, foreign_data or data"
+ qr/unsupported filter object type: "table-data"/,
+ "invalid syntax: invalid object type specified"
);
# Test missing object identifier pattern
--
2.50.1
Hi Fujii-san, +1 for the patch. I have reviewed and tested it, and it LGTM.
--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
Hi Fujii-san,
Thanks for working on this.
After testing, the patch LGTM. I noticed two very small possible nits:
1) Comment wording
The loop now calls isspace((unsigned char)*ptr), so a token ends at
any whitespace, not just at ASCII space (0x20). Could we revise the
comment from
"strings of non-space characters bounded by space characters"
to something like
"strings of non-space characters bounded by whitespace"
to match the behavior?
2) Variable name
const char *keyword = filter_get_token(&str, &size);
keyword = filter_get_token(&str, &size);
After the patch, filter_get_token() no longer returns a keyword
(letters-only identifier); it now returns any non-whitespace token.
Renaming the variable from keyword to token (or similar) might make
the intent clearer.
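
To illustrate the first nit, here is a small sketch (the helper name is hypothetical) that uses the same isspace() test as the patched loop. In the default "C" locale, isspace() is true for space, tab, newline, vertical tab, form feed, and carriage return, so any of these ends a token, not only 0x20.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Return the length of the leading run of non-whitespace characters in s.
 * This mirrors the loop condition used by the patch; it is an illustration
 * only, not a function from filter.c.
 */
size_t
leading_token_length(const char *s)
{
	size_t		n = 0;

	while (s[n] != '\0' && !isspace((unsigned char) s[n]))
		n++;
	return n;
}
```

A tab or a carriage return after "table" ends the token in the same way a plain space does, which is why "whitespace" describes the behavior more accurately than "space characters".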
Best,
Xuneng
On Sun, Aug 3, 2025 at 3:03 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> After testing, the patch LGTM. I noticed two very small possible nits:
Thanks for the review!
> 1) Comment wording
> The loop now calls isspace((unsigned char)*ptr), so a token ends at
> any whitespace, not just at ASCII space (0x20). Could we revise the
> comment from
> "strings of non-space characters bounded by space characters"
> to something like
> "strings of non-space characters bounded by whitespace"
> to match the behavior?
I agree with the change. But the phrase "strings of non-space characters
bounded by whitespace" is a bit redundant, and "strings of non-whitespace
characters" is sufficient, isn't it? So I used that wording in the updated
patch I've attached.
> 2) Variable name
> const char *keyword = filter_get_token(&str, &size);
> keyword = filter_get_token(&str, &size);
> After the patch, filter_get_token() no longer returns a keyword
> (letters-only identifier); it now returns any non-whitespace token.
> Renaming the variable from keyword to token (or similar) might make
> the intent clearer.
This also got me thinking, if we simply define keywords as strings of
non-whitespace characters, maybe we don't need to change the term "keyword"
to "token" at all. I've updated the patch with that in mind. Thoughts?
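
As a rough illustration of why this definition is enough, here is a sketch of matching a whole keyword against a list of object types. The helper names are hypothetical and are not the functions filter.c uses, and the list is only the subset of types mentioned in this thread, not pg_dump's full set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * True if the keyword of the given length equals 'expected' exactly.
 * Comparing lengths first means a longer keyword can never match a
 * shorter type name that happens to be its prefix.
 */
bool
keyword_matches(const char *kw, int size, const char *expected)
{
	return strlen(expected) == (size_t) size &&
		strncmp(kw, expected, (size_t) size) == 0;
}

/* Illustrative subset of object types named in this thread */
bool
is_known_object_type(const char *kw, int size)
{
	static const char *const known[] = {
		"table", "table_data", "schema", "foreign_data"
	};
	size_t		i;

	for (i = 0; i < sizeof(known) / sizeof(known[0]); i++)
	{
		if (keyword_matches(kw, size, known[i]))
			return true;
	}
	return false;
}
```

Once the scanner hands over the entire 10-character keyword "table-data", the lookup fails because no known type has that spelling, whereas the 5-character "table" produced by the old scanner would have matched.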
Regards,
--
Fujii Masao
Attachments:
v2-0001-pg_dump-Fix-incorrect-parsing-of-object-types-in-.patch (application/octet-stream)
From 999aa4e428a87fc7f27c54856f511b781953024c Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Mon, 4 Aug 2025 23:00:19 +0900
Subject: [PATCH v2] pg_dump: Fix incorrect parsing of object types in pg_dump
--filter.
Previously, pg_dump --filter could misinterpret invalid object types
in the filter file as valid ones. For example, the invalid object type
"table-data" (likely a typo for the valid "table_data") could be
mistakenly recognized as "table", causing pg_dump to succeed
when it should have failed.
This happened because pg_dump identified keywords as sequences of
ASCII alphabetic characters, treating non-alphabetic characters
(like hyphens) as keyword boundaries. As a result, "table-data" was
parsed as "table".
To fix this, pg_dump --filter now treats keywords as strings of
non-whitespace characters, ensuring invalid types like "table-data"
are correctly rejected.
Back-patch to v17, where the --filter option was introduced.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Srinath Reddy <srinath2133@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwFzPKUwiV5C-NLBqz1oK1+z9K8cgrF+LcxFem-p3_Ftug@mail.gmail.com
Backpatch-through: 17
---
src/bin/pg_dump/filter.c | 10 +++++-----
src/bin/pg_dump/t/005_pg_dump_filterfile.pl | 14 ++++++++++----
2 files changed, 15 insertions(+), 9 deletions(-)
diff --git a/src/bin/pg_dump/filter.c b/src/bin/pg_dump/filter.c
index 7214d514137..5dd02fc31fc 100644
--- a/src/bin/pg_dump/filter.c
+++ b/src/bin/pg_dump/filter.c
@@ -171,9 +171,8 @@ pg_log_filter_error(FilterStateData *fstate, const char *fmt,...)
/*
* filter_get_keyword - read the next filter keyword from buffer
*
- * Search for keywords (limited to ascii alphabetic characters) in
- * the passed in line buffer. Returns NULL when the buffer is empty or the first
- * char is not alpha. The char '_' is allowed, except as the first character.
+ * Search for keywords (strings of non-whitespace characters) in the passed
+ * in line buffer. Returns NULL when the buffer is empty or no keyword exists.
* The length of the found keyword is returned in the size parameter.
*/
static const char *
@@ -189,11 +188,12 @@ filter_get_keyword(const char **line, int *size)
while (isspace((unsigned char) *ptr))
ptr++;
- if (isalpha((unsigned char) *ptr))
+ /* Grab one keyword that's the string of non-whitespace characters */
+ if (*ptr != '\0' && !isspace((unsigned char) *ptr))
{
result = ptr++;
- while (isalpha((unsigned char) *ptr) || *ptr == '_')
+ while (*ptr != '\0' && !isspace((unsigned char) *ptr))
ptr++;
*size = ptr - result;
diff --git a/src/bin/pg_dump/t/005_pg_dump_filterfile.pl b/src/bin/pg_dump/t/005_pg_dump_filterfile.pl
index f05e8a20e05..5c69ec31c39 100644
--- a/src/bin/pg_dump/t/005_pg_dump_filterfile.pl
+++ b/src/bin/pg_dump/t/005_pg_dump_filterfile.pl
@@ -418,10 +418,16 @@ command_fails_like(
qr/invalid filter command/,
"invalid syntax: incorrect filter command");
-# Test invalid object type
+# Test invalid object type.
+#
+# This test also verifies that keywords are correctly recognized as strings of
+# non-whitespace characters. If the parser incorrectly treats non-whitespace
+# delimiters (like hyphens) as keyword boundaries, "table-data" might be
+# misread as the valid object type "table". To catch such issues,
+# "table-data" is used here as an intentionally invalid object type.
open $inputfile, '>', "$tempdir/inputfile.txt"
or die "unable to open filterfile for writing";
-print $inputfile "include xxx";
+print $inputfile "exclude table-data one";
close $inputfile;
command_fails_like(
@@ -432,8 +438,8 @@ command_fails_like(
'--filter' => "$tempdir/inputfile.txt",
'postgres'
],
- qr/unsupported filter object type: "xxx"/,
- "invalid syntax: invalid object type specified, should be table, schema, foreign_data or data"
+ qr/unsupported filter object type: "table-data"/,
+ "invalid syntax: invalid object type specified"
);
# Test missing object identifier pattern
--
2.50.1
On Mon, Aug 4, 2025 at 11:18 PM Fujii Masao <masao.fujii@gmail.com> wrote:
> This also got me thinking, if we simply define keywords as strings of
> non-whitespace characters, maybe we don't need to change the term "keyword"
> to "token" at all. I've updated the patch with that in mind. Thoughts?
+1, this looks more elegant to me.
Best,
Xuneng
I missed this thread while being on vacation, thanks for finding and fixing
this!
On 4 Aug 2025, at 17:18, Fujii Masao <masao.fujii@gmail.com> wrote:
> This also got me thinking, if we simply define keywords as strings of
> non-whitespace characters, maybe we don't need to change the term "keyword"
> to "token" at all. I've updated the patch with that in mind. Thoughts?
Agreed, this should work fine, and it aligns the code somewhat with read_pattern,
which is a good thing.
> + * in line buffer. Returns NULL when the buffer is empty or no keyword exists.
Since "is empty" could be interpreted as being a null pointer, maybe we should
add an if (!*line) check (or an Assert) before we dereference the passed in
buffer?
--
Daniel Gustafsson
On Tue, Aug 5, 2025 at 4:52 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> Since "is empty" could be interpreted as being a null pointer, maybe we should
> add an if (!*line) check (or an Assert) before we dereference the passed in
> buffer?
Thanks for the review!
I've added Assert(*line != NULL) at the start of filter_get_keyword().
Updated patch attached.
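
For illustration, a standalone sketch of the function shape with the added check follows. It uses the standard assert() as a stand-in for PostgreSQL's Assert(), and the function name is hypothetical. Advancing *line past the keyword is an assumption inferred from the two back-to-back calls on &str shown in the diff, not something the hunks above show directly.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Read the next keyword (a string of non-whitespace characters) from
 * *line. Returns a pointer to its first character and stores its length
 * in *size, or returns NULL with *size set to 0 when none is found.
 */
const char *
get_keyword_checked(const char **line, int *size)
{
	const char *ptr;
	const char *result = NULL;

	/* Stand-in for Assert(*line != NULL): the buffer must not be NULL */
	assert(*line != NULL);

	ptr = *line;

	/* Set returned length preemptively in case no keyword is found */
	*size = 0;

	/* Skip initial whitespace */
	while (isspace((unsigned char) *ptr))
		ptr++;

	/* Grab one keyword that's the string of non-whitespace characters */
	if (*ptr != '\0' && !isspace((unsigned char) *ptr))
	{
		result = ptr++;
		while (*ptr != '\0' && !isspace((unsigned char) *ptr))
			ptr++;
		*size = (int) (ptr - result);
	}

	/* Assumption: leave the caller's cursor just past what was consumed */
	*line = ptr;
	return result;
}
```

Calling this twice on "exclude table-data one" yields "exclude" (length 7) and then "table-data" (length 10), leaving the cursor at " one". Like the standard assert(), which is compiled out when NDEBUG is defined, an assertion of this kind documents the caller's contract rather than handling bad input at run time.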
Regards,
--
Fujii Masao
Attachments:
v3-0001-pg_dump-Fix-incorrect-parsing-of-object-types-in-.patch (application/octet-stream)
From 00ec1c7de4b6cab469a0bbc52363d1d38148aa6d Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Wed, 6 Aug 2025 13:38:47 +0900
Subject: [PATCH v3] pg_dump: Fix incorrect parsing of object types in pg_dump
--filter.
Previously, pg_dump --filter could misinterpret invalid object types
in the filter file as valid ones. For example, the invalid object type
"table-data" (likely a typo for the valid "table_data") could be
mistakenly recognized as "table", causing pg_dump to succeed
when it should have failed.
This happened because pg_dump identified keywords as sequences of
ASCII alphabetic characters, treating non-alphabetic characters
(like hyphens) as keyword boundaries. As a result, "table-data" was
parsed as "table".
To fix this, pg_dump --filter now treats keywords as strings of
non-whitespace characters, ensuring invalid types like "table-data"
are correctly rejected.
Back-patch to v17, where the --filter option was introduced.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>
Reviewed-by: Srinath Reddy <srinath2133@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAHGQGwFzPKUwiV5C-NLBqz1oK1+z9K8cgrF+LcxFem-p3_Ftug@mail.gmail.com
Backpatch-through: 17
---
src/bin/pg_dump/filter.c | 13 ++++++++-----
src/bin/pg_dump/t/005_pg_dump_filterfile.pl | 14 ++++++++++----
2 files changed, 18 insertions(+), 9 deletions(-)
diff --git a/src/bin/pg_dump/filter.c b/src/bin/pg_dump/filter.c
index 7214d514137..e3cdcf40975 100644
--- a/src/bin/pg_dump/filter.c
+++ b/src/bin/pg_dump/filter.c
@@ -171,9 +171,8 @@ pg_log_filter_error(FilterStateData *fstate, const char *fmt,...)
/*
* filter_get_keyword - read the next filter keyword from buffer
*
- * Search for keywords (limited to ascii alphabetic characters) in
- * the passed in line buffer. Returns NULL when the buffer is empty or the first
- * char is not alpha. The char '_' is allowed, except as the first character.
+ * Search for keywords (strings of non-whitespace characters) in the passed
+ * in line buffer. Returns NULL when the buffer is empty or no keyword exists.
* The length of the found keyword is returned in the size parameter.
*/
static const char *
@@ -182,6 +181,9 @@ filter_get_keyword(const char **line, int *size)
const char *ptr = *line;
const char *result = NULL;
+ /* The passed buffer must not be NULL */
+ Assert(*line != NULL);
+
/* Set returned length preemptively in case no keyword is found */
*size = 0;
@@ -189,11 +191,12 @@ filter_get_keyword(const char **line, int *size)
while (isspace((unsigned char) *ptr))
ptr++;
- if (isalpha((unsigned char) *ptr))
+ /* Grab one keyword that's the string of non-whitespace characters */
+ if (*ptr != '\0' && !isspace((unsigned char) *ptr))
{
result = ptr++;
- while (isalpha((unsigned char) *ptr) || *ptr == '_')
+ while (*ptr != '\0' && !isspace((unsigned char) *ptr))
ptr++;
*size = ptr - result;
diff --git a/src/bin/pg_dump/t/005_pg_dump_filterfile.pl b/src/bin/pg_dump/t/005_pg_dump_filterfile.pl
index f05e8a20e05..5c69ec31c39 100644
--- a/src/bin/pg_dump/t/005_pg_dump_filterfile.pl
+++ b/src/bin/pg_dump/t/005_pg_dump_filterfile.pl
@@ -418,10 +418,16 @@ command_fails_like(
qr/invalid filter command/,
"invalid syntax: incorrect filter command");
-# Test invalid object type
+# Test invalid object type.
+#
+# This test also verifies that keywords are correctly recognized as strings of
+# non-whitespace characters. If the parser incorrectly treats non-whitespace
+# delimiters (like hyphens) as keyword boundaries, "table-data" might be
+# misread as the valid object type "table". To catch such issues,
+# "table-data" is used here as an intentionally invalid object type.
open $inputfile, '>', "$tempdir/inputfile.txt"
or die "unable to open filterfile for writing";
-print $inputfile "include xxx";
+print $inputfile "exclude table-data one";
close $inputfile;
command_fails_like(
@@ -432,8 +438,8 @@ command_fails_like(
'--filter' => "$tempdir/inputfile.txt",
'postgres'
],
- qr/unsupported filter object type: "xxx"/,
- "invalid syntax: invalid object type specified, should be table, schema, foreign_data or data"
+ qr/unsupported filter object type: "table-data"/,
+ "invalid syntax: invalid object type specified"
);
# Test missing object identifier pattern
--
2.50.1
On 6 Aug 2025, at 06:49, Fujii Masao <masao.fujii@gmail.com> wrote:
> I've added Assert(*line != NULL) at the start of filter_get_keyword().
> Updated patch attached.
LGTM.
--
Daniel Gustafsson
On Thu, Aug 7, 2025 at 9:17 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> LGTM.
I've pushed the patch. Thanks!
Regards,
--
Fujii Masao