Tighten up a few overly lax regexes in pg_dump's tap tests
There are a few regexes in pg_dump's tap tests that are neglecting to
escape the dot in "schema.table" type expressions. This could result
in the test passing when it shouldn't. It seems worth putting that
right.
Patch attached.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
tighten_up_pg_dump_tap_tests.patchapplication/octet-stream; name=tighten_up_pg_dump_tap_tests.patchDownload+29-29
On 4 Feb 2019, at 11:54, David Rowley <david.rowley@2ndquadrant.com> wrote:
There are a few regexes in pg_dump's tap tests that are neglecting to
escape the dot in "schema.table" type expressions. This could result
in the test passing when it shouldn't. It seems worth putting that
right.
+1 for tightening it up, and the patch looks good to me.
We may also want to use the + metacharacter instead of * in a few places, since
the intent is to always match something, where matching nothing should be
considered an error:
- qr/^ALTER TEXT SEARCH DICTIONARY dump_test.alt_ts_dict1 OWNER TO .*;/m,
+ qr/^ALTER TEXT SEARCH DICTIONARY dump_test\.alt_ts_dict1 OWNER TO .*;/m,
cheers ./daniel
On Mon, Feb 04, 2019 at 01:12:48PM +0100, Daniel Gustafsson wrote:
+1 for tightening it up, and the patch looks good to me.
We may also want to use the + metacharacter instead of * in a few places, since
the intent is to always match something, where matching nothing should be
considered an error:- qr/^ALTER TEXT SEARCH DICTIONARY dump_test.alt_ts_dict1 OWNER TO .*;/m, + qr/^ALTER TEXT SEARCH DICTIONARY dump_test\.alt_ts_dict1 OWNER TO .*;/m,
Some tests are missing the update, and it seems to me that
tightening things up is a good thing, still we ought to do it
consistently. Some places missing the update:
- ALTER OPERATOR FAMILY
- ALTER OPERATOR CLASS
- ALTER SEQUENCE
- ALTER TABLE (ONLY, partitioned table)
- BLOB load
- COMMENT ON
- COPY
- INSERT INTO
- CREATE COLLATION
test_pg_dump's 001_base.pl needs also a refresh.
--
Michael
On Tue, 5 Feb 2019 at 13:15, Michael Paquier <michael@paquier.xyz> wrote:
Some tests are missing the update, and it seems to me that
tightening things up is a good thing, still we ought to do it
consistently. Some places missing the update:
- ALTER OPERATOR FAMILY
- ALTER OPERATOR CLASS
- ALTER SEQUENCE
- ALTER TABLE (ONLY, partitioned table)
- BLOB load
- COMMENT ON
- COPY
- INSERT INTO
- CREATE COLLATION
I think I've done all the ones that use \E. Those \Q ones don't need
any escaping. I assume that's the ones you've found. I didn't do an
exhaustive check though.
test_pg_dump's 001_base.pl needs also a refresh.
Yeah, I thought about looking, but I thought it might be a bottomless pit.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Feb 05, 2019 at 01:50:50PM +1300, David Rowley wrote:
I think I've done all the ones that use \E. Those \Q ones don't need
any escaping. I assume that's the ones you've found. I didn't do an
exhaustive check though.
Oh, good point. I didn't know that anything between \Q and \E are
interpreted as literal characters. Instead of the approach you are
proposing, perhaps it would make sense to extend this way of doing
things then? For example some tests with CREATE CONVERSION do so. It
looks much more portable than having to escape every dot.
test_pg_dump's 001_base.pl needs also a refresh.
Yeah, I thought about looking, but I thought it might be a bottomless pit.
I just double-checked this one, and the regex patterns there look
all correct.
--
Michael
On Tue, 5 Feb 2019 at 14:41, Michael Paquier <michael@paquier.xyz> wrote:
Instead of the approach you are
proposing, perhaps it would make sense to extend this way of doing
things then? For example some tests with CREATE CONVERSION do so. It
looks much more portable than having to escape every dot.
I'm not particularly excited either way, but here's a patch with it that way.
I did leave a couple untouched as there was quite a bit of escaping
going on already. I didn't think switching between \Q and \E would
have made those ones any more pleasing to the eye.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
tighten_up_pg_dump_tap_tests_v2.patchapplication/octet-stream; name=tighten_up_pg_dump_tap_tests_v2.patchDownload+29-29
On Tue, 5 Feb 2019 at 01:12, Daniel Gustafsson <daniel@yesql.se> wrote:
We may also want to use the + metacharacter instead of * in a few places, since
the intent is to always match something, where matching nothing should be
considered an error:- qr/^ALTER TEXT SEARCH DICTIONARY dump_test.alt_ts_dict1 OWNER TO .*;/m, + qr/^ALTER TEXT SEARCH DICTIONARY dump_test\.alt_ts_dict1 OWNER TO .*;/m,
I looked for instances of * alone and didn't see any. I only saw ones
prefixed with ".", in which case, isn't that matching 1 or more chars
already?
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Monday, February 4, 2019, David Rowley <david.rowley@2ndquadrant.com>
wrote:
On Tue, 5 Feb 2019 at 01:12, Daniel Gustafsson <daniel@yesql.se> wrote:
We may also want to use the + metacharacter instead of * in a few
places, since
the intent is to always match something, where matching nothing should be
considered an error:- qr/^ALTER TEXT SEARCH DICTIONARY
dump_test.alt_ts_dict1 OWNER TO .*;/m,
+ qr/^ALTER TEXT SEARCH DICTIONARY
dump_test\.alt_ts_dict1 OWNER TO .*;/m,
I looked for instances of * alone and didn't see any. I only saw ones
prefixed with ".", in which case, isn't that matching 1 or more chars
already?
No. In Regex the following are equivalent:
.* == .{0,}
.+ == .{1,}
. == .{1}
A “*” by itself would either be an error or, assuming the preceding
character is a space (so it visually looks alone) would be zero or more
consecutive spaces.
In the above “...OWNER TO<space>;” is a valid match.
David J.
On Tue, Feb 05, 2019 at 05:46:55PM +1300, David Rowley wrote:
I did leave a couple untouched as there was quite a bit of escaping
going on already. I didn't think switching between \Q and \E would
have made those ones any more pleasing to the eye.
- qr/^ALTER TABLE dump_test.test_table ENABLE ROW LEVEL SECURITY;/m,
+ qr/^\QALTER TABLE dump_test.test_table ENABLE ROW LEVEL SECURITY;/m,
I would have just appended the \E for consistency at the end of the
strings. Except that it looks fine. No need to send an updated
version, it seems that you have all the spots. I'll do an extra pass
on it tomorrow and see if I can commit it.
--
Michael
On 5 Feb 2019, at 06:55, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Monday, February 4, 2019, David Rowley <david.rowley@2ndquadrant.com <mailto:david.rowley@2ndquadrant.com>> wrote:
On Tue, 5 Feb 2019 at 01:12, Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> wrote:We may also want to use the + metacharacter instead of * in a few places, since
the intent is to always match something, where matching nothing should be
considered an error:- qr/^ALTER TEXT SEARCH DICTIONARY dump_test.alt_ts_dict1 OWNER TO .*;/m, + qr/^ALTER TEXT SEARCH DICTIONARY dump_test\.alt_ts_dict1 OWNER TO .*;/m,I looked for instances of * alone and didn't see any. I only saw ones
prefixed with ".", in which case, isn't that matching 1 or more chars
already?No. In Regex the following are equivalent:
.* == .{0,}
.+ == .{1,}
. == .{1}A “*” by itself would either be an error or, assuming the preceding character is a space (so it visually looks alone) would be zero or more consecutive spaces.
Sorry for being a bit unclear in my original email, it’s as David says above:
.* matches zero or more characters and .+ matches 1 or more characters.
In the above “...OWNER TO<space>;” is a valid match.
Indeed, so we should move to matching with .+ to force an owner.
cheers ./daniel
On Tue, Feb 05, 2019 at 04:26:18PM +0900, Michael Paquier wrote:
On Tue, Feb 05, 2019 at 05:46:55PM +1300, David Rowley wrote:
I did leave a couple untouched as there was quite a bit of escaping
going on already. I didn't think switching between \Q and \E would
have made those ones any more pleasing to the eye.
Indeed. I have stuck with your version here.
- qr/^ALTER TABLE dump_test.test_table ENABLE ROW LEVEL SECURITY;/m, + qr/^\QALTER TABLE dump_test.test_table ENABLE ROW LEVEL SECURITY;/m, I would have just appended the \E for consistency at the end of the strings. Except that it looks fine. No need to send an updated version, it seems that you have all the spots. I'll do an extra pass on it tomorrow and see if I can commit it.
And done after checking the whole set.
--
Michael
On 6 Feb 2019, at 09:39, Michael Paquier <michael@paquier.xyz> wrote:
And done after checking the whole set.
I still think we should enforce one-or-more matching on the OWNER part as well,
since matching zero would be a syntax error. There are more .* matches but
I’ve only touched the ones which match SQL, since there is a defined grammar to
rely on there. The attached patch does that on top of your commit.
cheers ./daniel
Attachments:
pg_dump_testregex.patchapplication/octet-stream; name=pg_dump_testregex.patch; x-unix-mode=0644Download+20-21
On Wed, 6 Feb 2019 at 21:39, Michael Paquier <michael@paquier.xyz> wrote:
And done after checking the whole set.
Thanks for pushing.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Feb 06, 2019 at 10:50:27AM +0100, Daniel Gustafsson wrote:
I still think we should enforce one-or-more matching on the OWNER part as well,
since matching zero would be a syntax error. There are more .* matches but
I’ve only touched the ones which match SQL, since there is a defined grammar to
rely on there. The attached patch does that on top of your commit.
- regexp => qr/^COMMENT ON DATABASE postgres IS .*;/m,
+ regexp => qr/^COMMENT ON DATABASE postgres IS .+;/m,
So... With what's on HEAD the current regex means that all these are
correct commands:
COMMENT ON DATABASE postgres is ;
COMMENT ON DATABASE postgres is foo;
COMMENT ON DATABASE postgres is foo;
And for the first one that's obviously wrong.
So what you are suggesting is to actually make the first pattern
something to complain about, right? And ".+" this makes sure that at
least one character is present, while for ".*" it is fine to have zero
characters. What you are suggesting looks right if I understood that
right.
--
Michael
On 6 Feb 2019, at 12:37, Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Feb 06, 2019 at 10:50:27AM +0100, Daniel Gustafsson wrote:
I still think we should enforce one-or-more matching on the OWNER part as well,
since matching zero would be a syntax error. There are more .* matches but
I’ve only touched the ones which match SQL, since there is a defined grammar to
rely on there. The attached patch does that on top of your commit.- regexp => qr/^COMMENT ON DATABASE postgres IS .*;/m, + regexp => qr/^COMMENT ON DATABASE postgres IS .+;/m, So... With what's on HEAD the current regex means that all these are correct commands: COMMENT ON DATABASE postgres is ; COMMENT ON DATABASE postgres is foo; COMMENT ON DATABASE postgres is foo; And for the first one that's obviously wrong.So what you are suggesting is to actually make the first pattern
something to complain about, right? And ".+" this makes sure that at
least one character is present, while for ".*" it is fine to have zero
characters. What you are suggesting looks right if I understood that
right.
Correct. One could argue that the regex is still suboptimal since “COMMENT ON
DATABASE postgres IS ;” will be matched as well, but there I think the tradeoff
for readability wins.
cheers ./daniel
On Wed, Feb 06, 2019 at 03:41:20PM +0100, Daniel Gustafsson wrote:
Correct. One could argue that the regex is still suboptimal since “COMMENT ON
DATABASE postgres IS ;” will be matched as well, but there I think the tradeoff
for readability wins.
Okay, that looks like an improvement anyway, so committed after going
over the tests for similar problems, and there was one for CREATE
DATABASE and DROP ROLE. It is possible to have a regex which tells as
at least one non-whitespace character, but from what I can see these
don't really improve the readability.
--
Michael
Moin,
On Wed, February 6, 2019 8:10 pm, Michael Paquier wrote:
On Wed, Feb 06, 2019 at 03:41:20PM +0100, Daniel Gustafsson wrote:
Correct. One could argue that the regex is still suboptimal since
“COMMENT ON
DATABASE postgres IS ;” will be matched as well, but there I think the
tradeoff
for readability wins.Okay, that looks like an improvement anyway, so committed after going
over the tests for similar problems, and there was one for CREATE
DATABASE and DROP ROLE. It is possible to have a regex which tells as
at least one non-whitespace character, but from what I can see these
don't really improve the readability.
Sorry for being late to the party, but it just occured to me that while
".+" is definitely an improvement over ".*", it isn't foolproof either,
as it also matches ";".
Thus:
regexp => qr/^COMMENT ON DATABASE postgres IS .+;/m,
matches things like:
'COMMENT ON DATABASE postgres IS ;;'
'COMMENT ON DATABASE postgres IS ;'
'COMMENT ON DATABASE postgres IS --;'
etc.
I'm not sure it is really necessary to deal with these cases, but one
possibility would be to pre-compute regexps:
$QR_COMMENT = qr/[^ ;]+/;
$QR_IDENTIFIER = qr/[^ ;]+/; # etc
or whataver is the thing that should actually be matched here and use them
like so:
regexp => qr/^COMMENT ON DATABASE postgres IS $QR_COMMENT;/m,
That way it is easily changable and quite readable.
Oh, one more question. Shouldn't these regexps that start with "^" also
end with "$"? Or can there be output like:
'COMMENT ON DATABASE postgres IS $QR_IDENTIFIER; SELECT 1;'
?
Best regards,
Tels
On 7 Feb 2019, at 13:55, Tels <nospam-pg-abuse@bloodgate.com> wrote:
On Wed, February 6, 2019 8:10 pm, Michael Paquier wrote:On Wed, Feb 06, 2019 at 03:41:20PM +0100, Daniel Gustafsson wrote:
Correct. One could argue that the regex is still suboptimal since
“COMMENT ON
DATABASE postgres IS ;” will be matched as well, but there I think the
tradeoff
for readability wins.Okay, that looks like an improvement anyway, so committed after going
over the tests for similar problems, and there was one for CREATE
DATABASE and DROP ROLE. It is possible to have a regex which tells as
at least one non-whitespace character, but from what I can see these
don't really improve the readability.Sorry for being late to the party, but it just occured to me that while
".+" is definitely an improvement over ".*", it isn't foolproof either,
as it also matches ";”.
Correct. The idea was to maintain readability while making the regex a bit
better, without any claims to make it perfect. One can argue whether or not
that’s enough, but IMO keeping the tests as readable as possible is preferrable
when it comes to the tests in question, as they aren’t matching against user
supplied SQL but machine generated.
cheers ./daniel
On Thu, Feb 07, 2019 at 10:10:32AM +0900, Michael Paquier wrote:
On Wed, Feb 06, 2019 at 03:41:20PM +0100, Daniel Gustafsson wrote:
Correct. One could argue that the regex is still suboptimal since “COMMENT ON
DATABASE postgres IS ;” will be matched as well, but there I think the tradeoff
for readability wins.Okay, that looks like an improvement anyway, so committed after going
over the tests for similar problems, and there was one for CREATE
DATABASE and DROP ROLE. It is possible to have a regex which tells as
at least one non-whitespace character, but from what I can see these
don't really improve the readability.
Are you talking about \w+, or [^[:space:]]+, [^[:blank:]]+, or...?
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On 7 Feb 2019, at 18:20, David Fetter <david@fetter.org> wrote:
On Thu, Feb 07, 2019 at 10:10:32AM +0900, Michael Paquier wrote:
On Wed, Feb 06, 2019 at 03:41:20PM +0100, Daniel Gustafsson wrote:
Correct. One could argue that the regex is still suboptimal since “COMMENT ON
DATABASE postgres IS ;” will be matched as well, but there I think the tradeoff
for readability wins.Okay, that looks like an improvement anyway, so committed after going
over the tests for similar problems, and there was one for CREATE
DATABASE and DROP ROLE. It is possible to have a regex which tells as
at least one non-whitespace character, but from what I can see these
don't really improve the readability.Are you talking about \w+, or [^[:space:]]+, [^[:blank:]]+, or…?
Personally I feel that expanding these test regexes to catch more low-risk
bugs, at the cost of readability, is a slippery slope towards implementing a
SQL parser in regexes. That was my $0.02 for not attempting to make these
bulletproof.
cheers ./daniel