pg_restore (libpq? parser?) bug in 8

Started by Philip Warnerover 21 years ago15 messages
#1Philip Warner
pjw@rhyme.com.au

CREATE FUNCTION xxx() RETURNS integer
AS $$ begin return 1;
2004-08-12 01:38:48 EST<zzz,birds>: ERROR: unterminated dollar-quoted
string at or near "$$ begin return 1;" at character 115

Just realized the problem; pg_restore uses a trivial parser to work out
when statements start/end. It knows about quotes but not about dollar-quotes.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 03 5330 3172 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp.mit.edu:11371 |/

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#1)
Re: pg_restore (libpq? parser?) bug in 8

Philip Warner <pjw@rhyme.com.au> writes:

Just realized the problem; pg_restore uses a trivial parser to work out
when statements start/end. It knows about quotes but not about dollar-quotes.

Blech. Do you have time to fix it?

regards, tom lane

#3Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#2)
Re: pg_restore (libpq? parser?) bug in 8

At 03:33 AM 12/08/2004, Tom Lane wrote:

Do you have time to fix it?

Should do; I'll add the die-on-error option as well.

Con someone confirm how dollar quoting works:

'$[tag]$'

where tag is alpha chars? any chars? \n? \r?

and closing tag must match.

All dollar-quotes inside any kind of quotes can be ignored.

Is there any circumstance where an unquoted '$' is valid?

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 03 5330 3172 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp.mit.edu:11371 |/

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#3)
Re: pg_restore (libpq? parser?) bug in 8

Philip Warner <pjw@rhyme.com.au> writes:

Con someone confirm how dollar quoting works:
'$[tag]$'
where tag is alpha chars? any chars? \n? \r?

IIRC the tag is either empty or anything that looks like a
(dollar-sign-less) identifier. But check the rules in scan.l to be sure.

Is there any circumstance where an unquoted '$' is valid?

$n parameter identifiers (which is why the tag cannot start with
a digit). Also, I believe $ can be embedded in identifiers (ie,
it can be a non-first character). So a dollar quote can't be
adjacent to a preceding identifier.

IIRC we tried to do ad-hoc code for dollar quoting in psql, and gave it
up as a bad job. You might need to bite the bullet and implement a flex
lexer.

Why exactly does pg_restore need to parse the SQL anyway?

regards, tom lane

#5Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#4)
Re: pg_restore (libpq? parser?) bug in 8

At 12:15 PM 12/08/2004, Tom Lane wrote:

Why exactly does pg_restore need to parse the SQL anyway?

It just looks for complete statements. From memory it relates to the
possibility that TOC entries can have more than one statement, or it may
relate to handling COPY statements. I think it has to look for
PQresultStatus(...) == PGRES_COPY_IN for each statement it executes, so it
needs to pass statements one at a time.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 03 5330 3172 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp.mit.edu:11371 |/

#6Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#4)
Re: pg_restore (libpq? parser?) bug in 8

At 12:15 PM 12/08/2004, Tom Lane wrote:

IIRC we tried to do ad-hoc code for dollar quoting in psql, and gave it
up as a bad job. You might need to bite the bullet and implement a flex
lexer.

Looks like the psql side of things is not ideal (see other post).

Any idea how backslashes should handled in and out of the tag?

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 03 5330 3172 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp.mit.edu:11371 |/

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#5)
Re: pg_restore (libpq? parser?) bug in 8

Philip Warner <pjw@rhyme.com.au> writes:

At 12:15 PM 12/08/2004, Tom Lane wrote:

Why exactly does pg_restore need to parse the SQL anyway?

It just looks for complete statements. From memory it relates to the
possibility that TOC entries can have more than one statement, or it may
relate to handling COPY statements. I think it has to look for
PQresultStatus(...) == PGRES_COPY_IN for each statement it executes, so it
needs to pass statements one at a time.

Hm. But we could assume that a COPY will be all by itself in a TOC
entry, couldn't we?

regards, tom lane

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#6)
Re: pg_restore (libpq? parser?) bug in 8

Philip Warner <pjw@rhyme.com.au> writes:

Any idea how backslashes should handled in and out of the tag?

Backslashes aren't legal in the tag, I'm quite sure. In the body of the
quoted string they are not special in any way.

regards, tom lane

#9Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#7)
Re: pg_restore (libpq? parser?) bug in 8

At 12:42 PM 12/08/2004, Tom Lane wrote:

Hm. But we could assume that a COPY will be all by itself in a TOC
entry, couldn't we?

Maybe. I know I hit a couple of nasty examples in the original code. Isn't
the COPY combined with the data? If so, we still have to scan for it's end.
The existing scanner is pretty trivial. The dollar-quoting will not make it
much worse (I hope). But I'll hold off on the changes if you want to
experiment -- I used to use my own DBs + the regression DB for testing.

Another possible issue - if I pass two statements in one string to libpq,
separated by semicolons, will it cope? If so, has that been true since 7.0?
If the answers are ('no',_), or ('yes', 'no') then that explains why
pg_restore has to parse statements - the TOC entry can have more than one
statement.

Sorry to be vague, it's a long time since I wrote the code.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 03 5330 3172 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp.mit.edu:11371 |/

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#9)
Re: pg_restore (libpq? parser?) bug in 8

Philip Warner <pjw@rhyme.com.au> writes:

At 12:42 PM 12/08/2004, Tom Lane wrote:

Hm. But we could assume that a COPY will be all by itself in a TOC
entry, couldn't we?

Maybe. I know I hit a couple of nasty examples in the original code. Isn't
the COPY combined with the data? If so, we still have to scan for it's end.
The existing scanner is pretty trivial.

Agreed. But we only emit dollar quoting in CREATE FUNCTION entries, and
I don't really see why you need to parse those with any accuracy. I
think we could do something here with making assumptions based on the
known TOC entry type about what might be in it.

Another possible issue - if I pass two statements in one string to libpq,
separated by semicolons, will it cope? If so, has that been true since 7.0?

Yes, and yes, except that if the first one gets an error the second will
not be executed. Per the other thread, that's probably a behavior change
we don't want.

regards, tom lane

#11Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#4)
Re: pg_restore (libpq? parser?) bug in 8

At 12:15 PM 12/08/2004, Tom Lane wrote:

You might need to bite the bullet and implement a flex
lexer.

I'd like to avoid this if I can; AFAICT, for statement detection on
pg_restore, I can require white space before the $tag. Since I also skip
over all quoted text, the bodies of functions are ignored. The only issues
will be attribute names with ' $' in them, but they will be quoted as well
(so ignored).

So to recognize a tag, I look for a '$' after white space, and assume it's
a tag start. If I subsequently read an invalid tag char, I just go back
into scan mode on that character and assume the '$...' was some other valid
sql element.

From other threads, it sounds like removing the statement detection code
entirely is not an option.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 03 5330 3172 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp.mit.edu:11371 |/

#12Andrew Dunstan
andrew@dunslane.net
In reply to: Philip Warner (#11)
Re: pg_restore (libpq? parser?) bug in 8

Philip Warner said:

At 12:15 PM 12/08/2004, Tom Lane wrote:

You might need to bite the bullet and implement a flex
lexer.

I'd like to avoid this if I can; AFAICT, for statement detection on
pg_restore, I can require white space before the $tag. Since I also
skip over all quoted text, the bodies of functions are ignored. The
only issues will be attribute names with ' $' in them, but they will
be quoted as well (so ignored).

So to recognize a tag, I look for a '$' after white space, and assume
it's a tag start. If I subsequently read an invalid tag char, I just
go back into scan mode on that character and assume the '$...' was
some other valid sql element.

From other threads, it sounds like removing the statement detection
code
entirely is not an option.

See the discussions that culminated here before Tom bit the bullet and
implemented a flex scanner for psql:

http://archives.postgresql.org/pgsql-patches/2004-02/msg00182.php

It's not as easy as you might think.

I had a thought that might short-circuit this - do we even need pg_dump to
use dollar quoting for non-text output, such as is required by pg_restore?
IIRC, the idea was to have text dumps that didn't undo what the user had
done in using dollar quoting, but I am not sure that consideration applies
to non-text output. Turning it off should be very easy - we already have the
switch.

Thoughts?

cheers

andrew

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#12)
Re: pg_restore (libpq? parser?) bug in 8

"Andrew Dunstan" <andrew@dunslane.net> writes:

I had a thought that might short-circuit this - do we even need pg_dump to
use dollar quoting for non-text output, such as is required by pg_restore?

That's a possibility, but I'd rather work around it by finding a way for
pg_restore not to need to parse the dollar-quoted literal in the first
place. I haven't heard a reason why it needs to parse the contents of a
CREATE FUNCTION entry. If we guarantee not to put dollar-quoted
literals into the TOC types it *does* need to parse, ISTM we're good.

regards, tom lane

#14Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#13)
Re: pg_restore (libpq? parser?) bug in 8

At 11:42 PM 12/08/2004, Tom Lane wrote:

That's a possibility, but I'd rather work around it by finding a way for
pg_restore not to need to parse the dollar-quoted literal in the first
place.

Two possibilities:

1) Parse the tags (I have the code working): it's not that hard, the only
trick bit being recognizing the tags in the first place. I have assumed
that any bare unquoted string that is not preceded by valid identifier name
chars, and which starts with a '$' may be a dollar quote. This seems valid
to me.

2) We could avoid special coding for TOC entry types (eg. pg_restore
knowing 'FUNCTION' TOC entries should not be parsed), by changing the TOC
data to include a flag/counter (set by pg_dump) indicating that the entry
contains > 1 statements. Then we don't hard code knowledge of TOC entry
types, and function definitions will not be parsed. Old dump files would be
treated as multi-statement, and still be parsed.

If my assumption in (1) is valid, then I have a very mild preference for
it, but am happy with either.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 03 5330 3172 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp.mit.edu:11371 |/

#15Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Philip Warner (#1)
Re: pg_restore (libpq? parser?) bug in 8

Added to open items list:

* fix dollar quoting problems in pg_restore

---------------------------------------------------------------------------

Philip Warner wrote:

CREATE FUNCTION xxx() RETURNS integer
AS $$ begin return 1;
2004-08-12 01:38:48 EST<zzz,birds>: ERROR: unterminated dollar-quoted
string at or near "$$ begin return 1;" at character 115

Just realized the problem; pg_restore uses a trivial parser to work out
when statements start/end. It knows about quotes but not about dollar-quotes.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 03 5330 3172 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp.mit.edu:11371 |/

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073