Use of backslash in tsearch2
Bruce Momjian wrote:
I backed out the patch, attached, and it has fixed the regression
problem. What has me confused is that is looks like it is checking for
', then putting \, which doesn't make a lot of sense, but the regression
output is corrected, so I just don't get it. Here is an example:test=> SELECT E'''1 \\''2''';
?column?
----------
'1 \'2'My only guess is that the output is somehow a single-quoted string
itself, and in fact \' should become ''. Is that right? Basically they
are doing \' in their output, and it should be doing '', but then the
query above would be wrong and shouldn't be using \'.
As part of the move to support standard-conforming strings and treat
backslash literally, I reviewed the tsearch2 code and found two place
that seemed to use \' rather than '', and generated the attached patch.
('' is standards conforming.) However, when I fixed the code, the
regression tests failed.
Teodor, are the new attached regression results correct? If so, I will
apply the patch and update the expected file.
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Attachments:
/bjm/difftext/x-diffDownload
Index: contrib/tsearch2/query.c
===================================================================
RCS file: /cvsroot/pgsql/contrib/tsearch2/query.c,v
retrieving revision 1.25
diff -c -c -r1.25 query.c
*** contrib/tsearch2/query.c 19 May 2006 02:38:47 -0000 1.25
--- contrib/tsearch2/query.c 19 May 2006 04:37:35 -0000
***************
*** 748,754 ****
{
if ( t_iseq(op, '\'') )
{
! *(in->cur) = '\'';
in->cur++;
}
COPYCHAR(in->cur,op);
--- 748,754 ----
{
if ( t_iseq(op, '\'') )
{
! *(in->cur) = '\\';
in->cur++;
}
COPYCHAR(in->cur,op);
Index: contrib/tsearch2/tsvector.c
===================================================================
RCS file: /cvsroot/pgsql/contrib/tsearch2/tsvector.c,v
retrieving revision 1.18
diff -c -c -r1.18 tsvector.c
*** contrib/tsearch2/tsvector.c 19 May 2006 02:38:47 -0000 1.18
--- contrib/tsearch2/tsvector.c 19 May 2006 04:37:39 -0000
***************
*** 529,535 ****
outbuf = (char *) repalloc((void *) outbuf, ++lenbuf);
curout = outbuf + pos;
! *curout++ = '\'';
}
while(len--)
*curout++ = *curin++;
--- 529,535 ----
outbuf = (char *) repalloc((void *) outbuf, ++lenbuf);
curout = outbuf + pos;
! *curout++ = '\\';
}
while(len--)
*curout++ = *curin++;
/rtmp/regression.diffstext/x-diffDownload
*** ./expected/tsearch2.out Wed May 31 10:05:31 2006
--- ./results/tsearch2.out Mon Aug 21 20:01:12 2006
***************
*** 59,83 ****
SELECT E'''1 \\''2'''::tsvector;
tsvector
----------
! '1 \'2'
(1 row)
SELECT E'''1 \\''2''3'::tsvector;
tsvector
-------------
! '3' '1 \'2'
(1 row)
SELECT E'''1 \\''2'' 3'::tsvector;
tsvector
-------------
! '3' '1 \'2'
(1 row)
SELECT E'''1 \\''2'' '' 3'' 4 '::tsvector;
tsvector
------------------
! '4' ' 3' '1 \'2'
(1 row)
select '''w'':4A,3B,2C,1D,5 a:8';
--- 59,83 ----
SELECT E'''1 \\''2'''::tsvector;
tsvector
----------
! '1 ''2'
(1 row)
SELECT E'''1 \\''2''3'::tsvector;
tsvector
-------------
! '3' '1 ''2'
(1 row)
SELECT E'''1 \\''2'' 3'::tsvector;
tsvector
-------------
! '3' '1 ''2'
(1 row)
SELECT E'''1 \\''2'' '' 3'' 4 '::tsvector;
tsvector
------------------
! '4' ' 3' '1 ''2'
(1 row)
select '''w'':4A,3B,2C,1D,5 a:8';
***************
*** 138,144 ****
SELECT E'''1 \\''2'''::tsquery;
tsquery
---------
! '1 \'2'
(1 row)
SELECT '!1'::tsquery;
--- 138,144 ----
SELECT E'''1 \\''2'''::tsquery;
tsquery
---------
! '1 ''2'
(1 row)
SELECT '!1'::tsquery;
***************
*** 336,342 ****
SELECT E'1&(''2''&('' 4''&(\\|5 | ''6 \\'' !|&'')))'::tsquery;
tsquery
------------------------------------------
! '1' & '2' & ' 4' & ( '|5' | '6 \' !|&' )
(1 row)
SELECT '''the wether'':dc & '' sKies '':BC & a:d b:a';
--- 336,342 ----
SELECT E'1&(''2''&('' 4''&(\\|5 | ''6 \\'' !|&'')))'::tsquery;
tsquery
------------------------------------------
! '1' & '2' & ' 4' & ( '|5' | '6 '' !|&' )
(1 row)
SELECT '''the wether'':dc & '' sKies '':BC & a:d b:a';
======================================================================
Import Notes
Reply to msg id not found: 200605190450.k4J4oHQ16195@candle.pha.pa.us
Bruce Momjian <bruce@momjian.us> writes:
As part of the move to support standard-conforming strings and treat
backslash literally, I reviewed the tsearch2 code and found two place
that seemed to use \' rather than '', and generated the attached patch.
I thought we had decided that that code should not be changed. It has
nothing to do with SQL literals, and changing it will create unnecessary
backwards-compatibility problems.
regards, tom lane
Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
As part of the move to support standard-conforming strings and treat
backslash literally, I reviewed the tsearch2 code and found two place
that seemed to use \' rather than '', and generated the attached patch.I thought we had decided that that code should not be changed. It has
nothing to do with SQL literals, and changing it will create unnecessary
backwards-compatibility problems.
I don't remember any comment regarding that. I think it does relate to
SQL literals because it is creating a literal inside a literal. Also, at
the time this was a core-only discussion and I am hoping from a comment
from the tsearch2 folks.
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
Bruce Momjian wrote:
I backed out the patch, attached, and it has fixed the regression
problem. What has me confused is that is looks like it is checking for
', then putting \, which doesn't make a lot of sense, but the regression
output is corrected, so I just don't get it. Here is an example:test=> SELECT E'''1 \\''2''';
?column?
----------
'1 \'2'My only guess is that the output is somehow a single-quoted string
itself, and in fact \' should become ''. Is that right? Basically they
are doing \' in their output, and it should be doing '', but then the
query above would be wrong and shouldn't be using \'.As part of the move to support standard-conforming strings and treat
backslash literally, I reviewed the tsearch2 code and found two place
that seemed to use \' rather than '', and generated the attached patch.
('' is standards conforming.) However, when I fixed the code, the
regression tests failed.Teodor, are the new attached regression results correct? If so, I will
apply the patch and update the expected file.
Updated patch attached. The previous one was reversed.
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Attachments:
/pgpatches/tsearch2text/x-diffDownload
Index: contrib/tsearch2/query.c
===================================================================
RCS file: /cvsroot/pgsql/contrib/tsearch2/query.c,v
retrieving revision 1.26
diff -c -c -r1.26 query.c
*** contrib/tsearch2/query.c 19 May 2006 04:39:47 -0000 1.26
--- contrib/tsearch2/query.c 21 Aug 2006 23:58:20 -0000
***************
*** 748,754 ****
{
if ( t_iseq(op, '\'') )
{
! *(in->cur) = '\\';
in->cur++;
}
COPYCHAR(in->cur,op);
--- 748,754 ----
{
if ( t_iseq(op, '\'') )
{
! *(in->cur) = '\'';
in->cur++;
}
COPYCHAR(in->cur,op);
Index: contrib/tsearch2/tsvector.c
===================================================================
RCS file: /cvsroot/pgsql/contrib/tsearch2/tsvector.c,v
retrieving revision 1.20
diff -c -c -r1.20 tsvector.c
*** contrib/tsearch2/tsvector.c 11 Jul 2006 18:26:10 -0000 1.20
--- contrib/tsearch2/tsvector.c 21 Aug 2006 23:58:20 -0000
***************
*** 529,535 ****
outbuf = (char *) repalloc((void *) outbuf, ++lenbuf);
curout = outbuf + pos;
! *curout++ = '\\';
}
while(len--)
*curout++ = *curin++;
--- 529,535 ----
outbuf = (char *) repalloc((void *) outbuf, ++lenbuf);
curout = outbuf + pos;
! *curout++ = '\'';
}
while(len--)
*curout++ = *curin++;
Teodor, are the new attached regression results correct? If so, I will
apply the patch and update the expected file.
Patch isn't full, simple test (values are took from regression.diffs):
# create table tt (tv tsvector, tq tsquery);
# insert into tt values (E'''1 \\''2''', NULL);
# insert into tt values (E'''1 \\''2''3', NULL);
# insert into tt values ( E'''1 \\''2'' 3', NULL);
# insert into tt values ( E'''1 \\''2'' '' 3'' 4 ', NULL);
# insert into tt values ( NULL, E'''1 \\''2''');
# insert into tt values ( NULL, E'''1 \\''2''');
# insert into tt values ( NULL, E'1&(''2''&('' 4''&(\\|5 | ''6 \\'' !|&'')))');
# insert into tt values ( NULL, E'1&(''2''&('' 4''&(\\|5 | ''6 \\'' !|&'')))');
and try dump table and restore:
ERROR: syntax error
CONTEXT: COPY tt, line 5, column tq: "'1 ''2'"
PS I'm not subscribed to -patches, so I post to -hackers
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Patch isn't full, simple test (values are took from regression.diffs):
and try dump table and restore:
ERROR: syntax error
CONTEXT: COPY tt, line 5, column tq: "'1 ''2'"
Attached cumulative patch fixes problem, but I have some doubts, is it really
needed?
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Attachments:
Thanks. Yes, it is need for two reasons. In 8.2 you can set
standard_conforming_strings to on, meaning \' is really treated as \ and
', and because some encodings now can't support \' for security reasons,
though I don't think tsearch2 supports those multibyte encodings.
Anyway, applied to 8.2 only, not backpatched. Thanks.
---------------------------------------------------------------------------
Teodor Sigaev wrote:
Patch isn't full, simple test (values are took from regression.diffs):
and try dump table and restore:
ERROR: syntax error
CONTEXT: COPY tt, line 5, column tq: "'1 ''2'"Attached cumulative patch fixes problem, but I have some doubts, is it really
needed?--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
[ application/x-tar is not supported, skipping... ]
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +