Nasty tsvector can make dumps unrestorable

Started by Stuart Bishop over 18 years ago, 6 messages
#1 Stuart Bishop
stuart@stuartbishop.net
1 attachment(s)

To continue our streak of bad luck, here is the second tsearch2 bug we found
this week.

The attached script creates a tsvector with a value that can be dumped using
pg_dump, but not loaded again using pg_restore. This causes restores of a
dump containing this value to fail.

This script has only been tested with PG 8.2.5 under Ubuntu Feisty so far,
although we found the original problem under 8.2.4 on Ubuntu Dapper.

Also reported in the Ubuntu bug tracker at:

https://bugs.launchpad.net/ubuntu/+source/postgresql-8.2/+bug/146382

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/

Attachments:

unrestorabledb.sql (text/x-sql)
#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stuart Bishop (#1)
Re: Nasty tsvector can make dumps unrestorable

Stuart Bishop <stuart@stuartbishop.net> writes:

> The attached script creates a tsvector with a value that can be dumped using
> pg_dump, but not loaded again using pg_restore. This causes restores of a
> dump containing this value to fail.

Hmm, sorta looks like tsvectorout should be doubling backslashes?

regards, tom lane

#3 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#2)
1 attachment(s)
Re: [BUGS] Nasty tsvector can make dumps unrestorable

Tom Lane wrote:

> Stuart Bishop <stuart@stuartbishop.net> writes:
>
> > The attached script creates a tsvector with a value that can be dumped using
> > pg_dump, but not loaded again using pg_restore. This causes restores of a
> > dump containing this value to fail.
>
> Hmm, sorta looks like tsvectorout should be doubling backslashes?

I think the larger question is why tsvectorin() requires double
backslashes at all. From the code and the regression test usage, they
seem to be used for marking single quotes in phrases:

SELECT E'''1 \\''2'' 3'::tsvector;
tsvector
-------------
'3' '1 ''2'
(1 row)

My guess is that the '' is used to start/stop phrases, and \\'' puts a
literal '' in the phrase.
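For readers following the escaping rules, here is a rough C model of how a quoted lexeme could be read back. It is a hypothetical sketch, not the actual tsvectorin() code: the function name `parse_quoted_lexeme` is invented, weights, positions and multibyte input are ignored, and it assumes that inside quotes a doubled `'` is one literal quote while a backslash makes the next character literal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified model of reading one quoted tsvector lexeme.
 * Assumptions (not the real tsvectorin code):
 *   - the input starts with a single quote;
 *   - inside the quotes, '' is one literal quote and \c is a literal c;
 *   - weights, positions and multibyte handling are ignored.
 * Writes the decoded lexeme to out (size outsz) and returns true on
 * success, or false if the closing quote is missing.
 */
static bool parse_quoted_lexeme(const char *in, char *out, size_t outsz)
{
    size_t n = 0;

    if (outsz == 0 || *in != '\'')
        return false;
    in++;                               /* skip the opening quote */

    while (*in != '\0') {
        char c;

        if (*in == '\\') {
            if (in[1] == '\0')
                return false;           /* backslash at end of input */
            c = in[1];                  /* next char is taken literally */
            in += 2;
        } else if (*in == '\'') {
            if (in[1] != '\'') {        /* a lone quote closes the lexeme */
                out[n] = '\0';
                return true;
            }
            c = '\'';                   /* doubled quote -> one quote */
            in += 2;
        } else {
            c = *in++;
        }

        if (n + 1 >= outsz)
            return false;               /* output buffer too small */
        out[n++] = c;
    }
    return false;                       /* ran out of input: unterminated */
}
```

Under these assumptions `'''x'` and `'\'x'` both decode to the two characters `'x`, `'\x'` silently loses its backslash, and `'\'` is rejected because the backslash consumes the closing quote. That last case is one way a value printed without doubled backslashes could fail to reload.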

I have developed the attached patch which doubles backslashes on output:

test=> INSERT INTO Foo(bar) VALUES (E'\\\\x');
INSERT 0 1
test=> select * from foo;
bar
-------
'\\x'
(1 row)
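The output side of the patch can be modeled on its own. The sketch below is hypothetical (the name `quote_lexeme` and its buffer interface are invented, and multibyte characters, weights and positions are ignored); it wraps a lexeme in single quotes and doubles every embedded `'` and `\`, which is the behavior the attached diff adds to the output loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of quoting one lexeme for tsvector output, in the
 * spirit of the attached patch: wrap it in single quotes and double every
 * embedded quote and backslash. Returns the output length, or 0 if out
 * (size outsz) is too small. Not the actual tsvectorout code.
 */
static size_t quote_lexeme(const char *lexeme, char *out, size_t outsz)
{
    size_t n = 0;

    /* Worst case: every char doubled, plus two quotes and a NUL. */
    if (outsz < 2 * strlen(lexeme) + 3)
        return 0;

    out[n++] = '\'';
    for (const char *p = lexeme; *p != '\0'; p++) {
        if (*p == '\'' || *p == '\\')
            out[n++] = *p;              /* emit the special char twice */
        out[n++] = *p;
    }
    out[n++] = '\'';
    out[n] = '\0';
    return n;
}
```

With this rule the lexeme `\x` prints as `'\\x'`, matching the psql output above, and `'x` prints as `'''x'`.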

However, I am still unclear if the dump code is correct because I don't
see the backslash preserved in \\'' cases, just \\\\ cases:

test=> CREATE TABLE Foo(bar tsvector);
CREATE
test=> INSERT INTO Foo(bar) VALUES (E'\\''x');
INSERT 0 1
test=> select * from foo;
bar
-------
'''x'
(1 row)

and pg_dump outputs:

COPY foo (bar) FROM stdin;
'''x'
\.

While the COPY will load into the table, this doesn't:

test=> INSERT INTO Foo(bar) VALUES (E'''''x');
ERROR: syntax error in tsvector: "''x"

I am confused.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachments:

/pgpatches/ts_backslash (text/x-diff)
Index: src/backend/utils/adt/tsvector.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/tsvector.c,v
retrieving revision 1.6
diff -c -c -r1.6 tsvector.c
*** src/backend/utils/adt/tsvector.c	23 Oct 2007 00:51:23 -0000	1.6
--- src/backend/utils/adt/tsvector.c	9 Nov 2007 23:59:06 -0000
***************
*** 345,350 ****
--- 345,352 ----
  
  			if (t_iseq(curin, '\''))
  				*curout++ = '\'';
+ 			else if (t_iseq(curin, '\\'))
+ 				*curout++ = '\\';
  
  			while (len--)
  				*curout++ = *curin++;

#4 Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#3)
Re: [BUGS] Nasty tsvector can make dumps unrestorable

Bruce Momjian wrote:

> However, I am still unclear if the dump code is correct because I don't
> see the backslash preserved in \\'' cases, just \\\\ cases:
>
> test=> CREATE TABLE Foo(bar tsvector);
> CREATE
> test=> INSERT INTO Foo(bar) VALUES (E'\\''x');
> INSERT 0 1
> test=> select * from foo;
> bar
> -------
> '''x'
> (1 row)
>
> and pg_dump outputs:
>
> COPY foo (bar) FROM stdin;
> '''x'
> \.
>
> While the COPY will load into the table, this doesn't:
>
> test=> INSERT INTO Foo(bar) VALUES (E'''''x');
> ERROR: syntax error in tsvector: "''x"
>
> I am confused.

These two are not equivalent. What happens if you try this?

INSERT INTO Foo(bar) VALUES (E'''''''x''');

cheers

andrew
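The inequivalence comes from two layers of escaping: the SQL E'...' literal is decoded first, and only the resulting text reaches the tsvector input function. A hypothetical C sketch of just the first layer (the name `decode_estring_body` is invented, and only doubled quotes and backslash-escaped characters are handled, not `\n`, octal, or other E-string escapes) makes the difference visible. Its input is the text between the literal's opening and closing quotes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, heavily simplified decoder for the body of an E'...'
 * literal (the text between its opening and closing quotes).
 * Handles only  '' -> '  and  \c -> c  (so \\ -> \ and \' -> ').
 * Other E-string escapes such as \n or octal codes are NOT interpreted.
 * Returns the decoded length, or (size_t) -1 on malformed input or if
 * out (size outsz) is too small.
 */
static size_t decode_estring_body(const char *body, char *out, size_t outsz)
{
    size_t n = 0;

    if (outsz == 0)
        return (size_t) -1;

    while (*body != '\0') {
        char c;

        if (body[0] == '\'') {
            if (body[1] != '\'')
                return (size_t) -1;     /* a lone quote would end the literal */
            c = '\'';                   /* doubled quote -> one quote */
            body += 2;
        } else if (body[0] == '\\') {
            if (body[1] == '\0')
                return (size_t) -1;     /* dangling backslash */
            c = body[1];                /* backslash escapes the next char */
            body += 2;
        } else {
            c = *body++;
        }

        if (n + 1 >= outsz)
            return (size_t) -1;         /* leave room for the terminator */
        out[n++] = c;
    }
    out[n] = '\0';
    return n;
}
```

So `E'''''x'` hands the tsvector parser `''x`, the text in the error message, whereas `E'''''''x'''` hands it `'''x'`, the same text that appears in the COPY data.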

#5 Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#4)
Re: [BUGS] Nasty tsvector can make dumps unrestorable

Andrew Dunstan wrote:

> > While the COPY will load into the table, this doesn't:
> >
> > test=> INSERT INTO Foo(bar) VALUES (E'''''x');
> > ERROR: syntax error in tsvector: "''x"
> >
> > I am confused.
>
> These two are not equivalent. What happens if you try this?
>
> INSERT INTO Foo(bar) VALUES (E'''''''x''');

test=> INSERT INTO Foo(bar) VALUES (E'''''''x''');
INSERT 0 1
test=> select * from foo;
bar
-------
'''x'
(1 row)

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#3)
Re: [BUGS] Nasty tsvector can make dumps unrestorable

Bruce Momjian <bruce@momjian.us> writes:

> However, I am still unclear if the dump code is correct because I don't
> see the backslash preserved in \\'' cases, just \\\\ cases:
>
> test=> INSERT INTO Foo(bar) VALUES (E'\\''x');

You're just confused. That produces a word whose contents are the
two characters 'x, so either '\'x' or '''x' would be legitimate
output.

However, I'd prefer to see Teodor fix this, because it needs to be
back-patched too, and I'm not entirely sure if there are other
consequences.

regards, tom lane