Nasty tsvector can make dumps unrestorable
To continue our streak of bad luck, here is the second tsearch2 bug we found
this week.
The attached script creates a tsvector with a value that can be dumped using
pg_dump, but not loaded again using pg_restore. This causes restores of a
dump containing this value to fail.
This script only tested with PG 8.2.5 under Ubuntu Feisty so far, although
we found the original problem under 8.2.4 on Ubuntu Dapper.
Also reported in the Ubuntu bug tracker at:
https://bugs.launchpad.net/ubuntu/+source/postgresql-8.2/+bug/146382
--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/
Attachments:
Stuart Bishop <stuart@stuartbishop.net> writes:
The attached script creates a tsvector with a value that can be dumped using
pg_dump, but not loaded again using pg_restore. This causes restores of a
dump containing this value to fail.
Hmm, sorta looks like tsvectorout should be doubling backslashes?
regards, tom lane
Tom Lane wrote:
Stuart Bishop <stuart@stuartbishop.net> writes:
The attached script creates a tsvector with a value that can be dumped using
pg_dump, but not loaded again using pg_restore. This causes restores of a
dump containing this value to fail.Hmm, sorta looks like tsvectorout should be doubling backslashes?
I think the larger question is why tsvectorin() requires
double-backslashes? It seems it is for marking of single-quotes in
phrases, from what I can tell from the code and regression test usage:
SELECT E'''1 \\''2'' 3'::tsvector;
tsvector
-------------
'3' '1 ''2'
(1 row)
My guess is that the '' is used to start/stop phrases, and \\'' puts a
literal '' in the phrase.
I have developed the attached patch which doubles backslashes on output:
test=> INSERT INTO Foo(bar) VALUES (E'\\\\x');
INSERT 0 1
test=> select * from foo;
bar
-------
'\\x'
(1 row)
However, I am still unclear if the dump code is correct because I don't
see the backslash preserved in \\'' cases, just \\\\ cases:
test=> CREATE TABLE Foo(bar tsvector);
CREATE
test=> INSERT INTO Foo(bar) VALUES (E'\\''x');
INSERT 0 1
test=> select * from foo;
bar
-------
'''x'
(1 row)
and pg_dump outputs:
COPY foo (bar) FROM stdin;
'''x'
\.
While the COPY will load into the table, this doesn't:
test=> INSERT INTO Foo(bar) VALUES (E'''''x');
ERROR: syntax error in tsvector: "''x"
I am confused.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Attachments:
/pgpatches/ts_backslashtext/x-diffDownload
Index: src/backend/utils/adt/tsvector.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/tsvector.c,v
retrieving revision 1.6
diff -c -c -r1.6 tsvector.c
*** src/backend/utils/adt/tsvector.c 23 Oct 2007 00:51:23 -0000 1.6
--- src/backend/utils/adt/tsvector.c 9 Nov 2007 23:59:06 -0000
***************
*** 345,350 ****
--- 345,352 ----
if (t_iseq(curin, '\''))
*curout++ = '\'';
+ else if (t_iseq(curin, '\\'))
+ *curout++ = '\\';
while (len--)
*curout++ = *curin++;
Bruce Momjian wrote:
However, I am still unclear if the dump code is correct because I don't
see the backslash preserved in \\'' cases, just \\\\ cases:test=> CREATE TABLE Foo(bar tsvector);
CREATE
test=> INSERT INTO Foo(bar) VALUES (E'\\''x');
INSERT 0 1
test=> select * from foo;
bar
-------
'''x'
(1 row)and pg_dump outputs:
COPY foo (bar) FROM stdin;
'''x'
\.While the COPY will load into the table, this doesn't:
test=> INSERT INTO Foo(bar) VALUES (E'''''x');
ERROR: syntax error in tsvector: "''x"I am confused.
These two are not equivalent. What happens if you try this?
INSERT INTO Foo(bar) VALUES (E'''''''x''');
cheers
andrew
Andrew Dunstan wrote:
While the COPY will load into the table, this doesn't:
test=> INSERT INTO Foo(bar) VALUES (E'''''x');
ERROR: syntax error in tsvector: "''x"I am confused.
These two are not equivalent. What happens if you try this?
INSERT INTO Foo(bar) VALUES (E'''''''x''');
test=> INSERT INTO Foo(bar) VALUES (E'''''''x''');
INSERT 0 1
test=> select * from foo;
bar
-------
'''x'
(1 row)
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes:
However, I am still unclear if the dump code is correct because I don't
see the backslash preserved in \\'' cases, just \\\\ cases:
test=> INSERT INTO Foo(bar) VALUES (E'\\''x');
You're just confused. That produces a word whose contents are the
two characters 'x, so either '\'x' or '''x' would be legitimate
output.
However, I'd prefer to see Teodor fix this, because it needs to be
back-patched too, and I'm not entirely sure if there are other
consequences.
regards, tom lane