Text search segmentation fault
While trying to create a new dictionary for use with PostgreSQL text
search, I get a segfault. My Postgres version is 8.3.5
The dictionary I use is the "Norwegian Bokm�l & Nynorsk (Norway) pack
for OOo 2.x" downloaded from
http://wiki.services.openoffice.org/wiki/Dictionaries#Norwegian_.28Norway.29
I've also uploaded the dictionary and affix-file used to this location:
http://folk.uio.no/tommygi/postgres/nb_no.dict
http://folk.uio.no/tommygi/postgres/nb_no.affix
These are unpacked from the zip-file I got from the above location, and
converted to UTF-8 with the following commands:
- iconv -f latin1 -t utf-8 nb_NO.dic > nb_no.dict
- iconv -f latin1 -t utf-8 nb_NO.aff > nb_no.affix
The command I use is this:
CREATE TEXT SEARCH DICTIONARY no_ispell (
TEMPLATE = ispell,
DictFile = nb_no,
AffFile = nb_no,
StopWords = norwegian
);
I've uploaded a strace of the process at
http://folk.uio.no/tommygi/postgres/strace.txt
This is captured with strace -p<pid>. Since this is about the extent of
my knowledge of strace, please let me know if there's any other options
I should specify.
Relevant parts of the postgres log file is included below.
[2009-01-29 13:55:11.721 CET] [pgtest02] [:] [] [26466] [0] LOG: server
process (PID 12002) was terminated by signal 11: Segmentation fault
[2009-01-29 13:55:11.721 CET] [pgtest02] [:] [] [26466] [0] LOG:
terminating any other active server processes
[2009-01-29 13:55:11.725 CET] [pgtest02] [:] [] [26466] [0] LOG: all
server processes terminated; reinitializing
[2009-01-29 13:55:11.931 CET] [pgtest02] [:] [] [12018] [0] LOG:
database system was interrupted; last known up at 2009-01-29 13:42:05 CET
[2009-01-29 13:55:11.931 CET] [pgtest02] [:] [] [12018] [0] LOG:
database system was not properly shut down; automatic recovery in progress
[2009-01-29 13:55:11.934 CET] [pgtest02] [:] [] [12018] [0] LOG: record
with zero length at 0/4A2F9A0
[2009-01-29 13:55:11.934 CET] [pgtest02] [:] [] [12018] [0] LOG: redo
is not required
[2009-01-29 13:55:11.986 CET] [pgtest02] [fotoportal:postgres]
[192.168.6.137(49650)] [12019] [0] FATAL: the database system is in
recovery mode
[2009-01-29 13:55:11.988 CET] [pgtest02] [fotoportal:postgres]
[192.168.6.137(49655)] [12020] [0] FATAL: the database system is in
recovery mode
[2009-01-29 13:55:12.010 CET] [pgtest02] [:] [] [12023] [0] LOG:
autovacuum launcher started
[2009-01-29 13:55:12.011 CET] [pgtest02] [:] [] [26466] [0] LOG:
database system is ready to accept connections
--
Tommy Gildseth
DBA, Gruppe for databasedrift
Universitetet i Oslo, USIT
m: +47 45 86 38 50
t: +47 22 85 29 39
Tommy,
I tried your example and didn't find any problem.
My postgresql version is 8.3.3 and I didn't use stopwords, since I don't
have them.
arxiv=# select version();
version
------------------------------------------------------------------------
PostgreSQL 8.3.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.2.3
(1 row)
arxiv=# select ts_lexize('no_ispell', 'overbuljongterningpakkmesterassistent');
ts_lexize
----------------------------------------------
{over,buljong,terning,pakk,mester,assistent}
(1 row)
Time: 922.364 ms
Oleg
On Thu, 29 Jan 2009, Tommy Gildseth wrote:
While trying to create a new dictionary for use with PostgreSQL text search,
I get a segfault. My Postgres version is 8.3.5The dictionary I use is the "Norwegian Bokm?l & Nynorsk (Norway) pack for OOo
2.x" downloaded from
http://wiki.services.openoffice.org/wiki/Dictionaries#Norwegian_.28Norway.29I've also uploaded the dictionary and affix-file used to this location:
http://folk.uio.no/tommygi/postgres/nb_no.dict
http://folk.uio.no/tommygi/postgres/nb_no.affixThese are unpacked from the zip-file I got from the above location, and
converted to UTF-8 with the following commands:
- iconv -f latin1 -t utf-8 nb_NO.dic > nb_no.dict
- iconv -f latin1 -t utf-8 nb_NO.aff > nb_no.affixThe command I use is this:
CREATE TEXT SEARCH DICTIONARY no_ispell (
TEMPLATE = ispell,
DictFile = nb_no,
AffFile = nb_no,
StopWords = norwegian
);I've uploaded a strace of the process at
http://folk.uio.no/tommygi/postgres/strace.txt
This is captured with strace -p<pid>. Since this is about the extent of my
knowledge of strace, please let me know if there's any other options I should
specify.Relevant parts of the postgres log file is included below.
[2009-01-29 13:55:11.721 CET] [pgtest02] [:] [] [26466] [0] LOG: server
process (PID 12002) was terminated by signal 11: Segmentation fault
[2009-01-29 13:55:11.721 CET] [pgtest02] [:] [] [26466] [0] LOG: terminating
any other active server processes
[2009-01-29 13:55:11.725 CET] [pgtest02] [:] [] [26466] [0] LOG: all server
processes terminated; reinitializing
[2009-01-29 13:55:11.931 CET] [pgtest02] [:] [] [12018] [0] LOG: database
system was interrupted; last known up at 2009-01-29 13:42:05 CET
[2009-01-29 13:55:11.931 CET] [pgtest02] [:] [] [12018] [0] LOG: database
system was not properly shut down; automatic recovery in progress
[2009-01-29 13:55:11.934 CET] [pgtest02] [:] [] [12018] [0] LOG: record with
zero length at 0/4A2F9A0
[2009-01-29 13:55:11.934 CET] [pgtest02] [:] [] [12018] [0] LOG: redo is not
required
[2009-01-29 13:55:11.986 CET] [pgtest02] [fotoportal:postgres]
[192.168.6.137(49650)] [12019] [0] FATAL: the database system is in recovery
mode
[2009-01-29 13:55:11.988 CET] [pgtest02] [fotoportal:postgres]
[192.168.6.137(49655)] [12020] [0] FATAL: the database system is in recovery
mode
[2009-01-29 13:55:12.010 CET] [pgtest02] [:] [] [12023] [0] LOG: autovacuum
launcher started
[2009-01-29 13:55:12.011 CET] [pgtest02] [:] [] [26466] [0] LOG: database
system is ready to accept connections
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Could you provide a backtrace? Do you use unchanged norwegian.stop file?
I'm not able to reproduce the bug - postgres just works.
Tommy Gildseth wrote:
While trying to create a new dictionary for use with PostgreSQL text
search, I get a segfault. My Postgres version is 8.3.5
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
It works on one of my servers:
SELECT version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 8.3.4 on i686-pc-linux-gnu, compiled by GCC cc (GCC) 3.2.3
20030502 (Red Hat Linux 3.2.3-59)
The one it fails on is running:
SELECT version();
version
----------------------------------------------------------------------------------------------------------------
PostgreSQL 8.3.5 on x86_64-unknown-linux-gnu, compiled by GCC cc (GCC)
3.2.3 20030502 (Red Hat Linux 3.2.3-59)
Oleg Bartunov wrote:
Tommy,
I tried your example and didn't find any problem.
My postgresql version is 8.3.3 and I didn't use stopwords, since I don't
have them.arxiv=# select version();
version
------------------------------------------------------------------------
PostgreSQL 8.3.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.2.3
(1 row)arxiv=# select ts_lexize('no_ispell',
'overbuljongterningpakkmesterassistent');
ts_lexize
----------------------------------------------
{over,buljong,terning,pakk,mester,assistent}
(1 row)Time: 922.364 ms
Oleg
On Thu, 29 Jan 2009, Tommy Gildseth wrote:
While trying to create a new dictionary for use with PostgreSQL text
search, I get a segfault. My Postgres version is 8.3.5The dictionary I use is the "Norwegian Bokm?l & Nynorsk (Norway) pack
for OOo 2.x" downloaded from
http://wiki.services.openoffice.org/wiki/Dictionaries#Norwegian_.28Norway.29I've also uploaded the dictionary and affix-file used to this location:
http://folk.uio.no/tommygi/postgres/nb_no.dict
http://folk.uio.no/tommygi/postgres/nb_no.affixThese are unpacked from the zip-file I got from the above location,
and converted to UTF-8 with the following commands:
- iconv -f latin1 -t utf-8 nb_NO.dic > nb_no.dict
- iconv -f latin1 -t utf-8 nb_NO.aff > nb_no.affixThe command I use is this:
CREATE TEXT SEARCH DICTIONARY no_ispell (
TEMPLATE = ispell,
DictFile = nb_no,
AffFile = nb_no,
StopWords = norwegian
);I've uploaded a strace of the process at
http://folk.uio.no/tommygi/postgres/strace.txt
This is captured with strace -p<pid>. Since this is about the extent
of my knowledge of strace, please let me know if there's any other
options I should specify.Relevant parts of the postgres log file is included below.
[2009-01-29 13:55:11.721 CET] [pgtest02] [:] [] [26466] [0] LOG:
server process (PID 12002) was terminated by signal 11: Segmentation
fault
[2009-01-29 13:55:11.721 CET] [pgtest02] [:] [] [26466] [0] LOG:
terminating any other active server processes
[2009-01-29 13:55:11.725 CET] [pgtest02] [:] [] [26466] [0] LOG: all
server processes terminated; reinitializing
[2009-01-29 13:55:11.931 CET] [pgtest02] [:] [] [12018] [0] LOG:
database system was interrupted; last known up at 2009-01-29 13:42:05 CET
[2009-01-29 13:55:11.931 CET] [pgtest02] [:] [] [12018] [0] LOG:
database system was not properly shut down; automatic recovery in
progress
[2009-01-29 13:55:11.934 CET] [pgtest02] [:] [] [12018] [0] LOG:
record with zero length at 0/4A2F9A0
[2009-01-29 13:55:11.934 CET] [pgtest02] [:] [] [12018] [0] LOG: redo
is not required
[2009-01-29 13:55:11.986 CET] [pgtest02] [fotoportal:postgres]
[192.168.6.137(49650)] [12019] [0] FATAL: the database system is in
recovery mode
[2009-01-29 13:55:11.988 CET] [pgtest02] [fotoportal:postgres]
[192.168.6.137(49655)] [12020] [0] FATAL: the database system is in
recovery mode
[2009-01-29 13:55:12.010 CET] [pgtest02] [:] [] [12023] [0] LOG:
autovacuum launcher started
[2009-01-29 13:55:12.011 CET] [pgtest02] [:] [] [26466] [0] LOG:
database system is ready to accept connectionsRegards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
--
Tommy Gildseth
DBA, Gruppe for databasedrift
Universitetet i Oslo, USIT
m: +47 45 86 38 50
t: +47 22 85 29 39
Yes, originaly I used a customized norwegian.stop-file, but I changed
that to the one that comes with PostgreSQL, and got the same error.
How do I make a backtrace?
Teodor Sigaev wrote:
Could you provide a backtrace? Do you use unchanged norwegian.stop file?
I'm not able to reproduce the bug - postgres just works.Tommy Gildseth wrote:
While trying to create a new dictionary for use with PostgreSQL text
search, I get a segfault. My Postgres version is 8.3.5
--
Tommy Gildseth
I tried without specifying a StopWords-list as well, but same thing happens.
Teodor Sigaev wrote:
Could you provide a backtrace? Do you use unchanged norwegian.stop file?
I'm not able to reproduce the bug - postgres just works.Tommy Gildseth wrote:
While trying to create a new dictionary for use with PostgreSQL text
search, I get a segfault. My Postgres version is 8.3.5
--
Tommy Gildseth
How do I make a backtrace?
- if you have coredump, just execute gdb /PATH1/postgres gdb /PATH2/core and
type bt. Linux doesn't make core by default, so you allow to do it by ulimit -c
unlimited for postgresql user
- connect to db, and attach gdb to backend process: gdb /PATH1/postgres
BACKEND_PID and type run in gdb, next, execute CREATE DICTIONARY and type bt in gdb
Teodor Sigaev wrote:
Could you provide a backtrace? Do you use unchanged norwegian.stop file?
I'm not able to reproduce the bug - postgres just works.Tommy Gildseth wrote:
While trying to create a new dictionary for use with PostgreSQL text
search, I get a segfault. My Postgres version is 8.3.5
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
to get coredump:
sudo su - postgres
ulimit -c unlimited
pg_ctl restart
Also, Teodor - mind the fact, that his machine is 64, and you've
tested it on 32bits.
Teodor Sigaev wrote:
How do I make a backtrace?
- if you have coredump, just execute gdb /PATH1/postgres gdb /PATH2/core
and type bt. Linux doesn't make core by default, so you allow to do it
by ulimit -c unlimited for postgresql user
- connect to db, and attach gdb to backend process: gdb /PATH1/postgres
BACKEND_PID and type run in gdb, next, execute CREATE DICTIONARY and
type bt in gdb
I am completely unfamiliar with gdb, but hopefully, this is what you are
after?
(gdb) continue
Continuing.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 182898870848 (LWP 606)]
0x00000000005a6f65 in makeCompoundFlags ()
(gdb) bt
#0 0x00000000005a6f65 in makeCompoundFlags ()
#1 0x00000000005a712a in mkSPNode ()
#2 0x00000000005a709b in mkSPNode ()
#3 0x00000000005a709b in mkSPNode ()
#4 0x00000000005a709b in mkSPNode ()
#5 0x00000000005a71a2 in mkSPNode ()
#6 0x00000000005a734d in NISortDictionary ()
#7 0x00000000005a4dae in dispell_init ()
#8 0x0000000000638823 in OidFunctionCall1 ()
#9 0x00000000004f4800 in verify_dictoptions ()
#10 0x00000000004f48da in DefineTSDictionary ()
#11 0x000000000059c4fe in PortalRunUtility ()
#12 0x000000000059c5dc in PortalRunMulti ()
#13 0x000000000059bf28 in PortalRun ()
#14 0x00000000005977d2 in exec_simple_query ()
#15 0x000000000059a874 in PostgresMain ()
#16 0x0000000000572821 in BackendRun ()
#17 0x0000000000572112 in BackendStartup ()
#18 0x000000000057034c in ServerLoop ()
#19 0x000000000056f938 in PostmasterMain ()
#20 0x0000000000529f5b in main ()
--
Tommy Gildseth
if it's
static uint32
makeCompoundFlags(IspellDict *Conf, int affix)
{
uint32 flag = 0;
char *str = Conf->AffixData[affix];
while (str && *str)
{
flag |= Conf->flagval[(unsigned int) *str];
str++;
}
return (flag & FF_DICTFLAGMASK);
}
Than I have quite few notes about that function:
- affix is not checked on entry, and should be unsigned,
- for sake of safety uint32_t should be used instead of unsigned int,
in the cast
- there should be some safety limit for lenght of str,
I reproduced the bug with a help of Grzegorz's point for 64-bit box. So, patch
is attached and I'm going to commit it
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Attachments:
unc.patchtext/plain; name=unc.patchDownload+10-10
Than I have quite few notes about that function:
- affix is not checked on entry, and should be unsigned,
Could be Assert( affix>=0 && affix < Conf->nAffixData )
- for sake of safety uint32_t should be used instead of unsigned int,
in the cast
see patch
- there should be some safety limit for lenght of str,
It's a C-string
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
On Thu, Jan 29, 2009 at 3:26 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
I reproduced the bug with a help of Grzegorz's point for 64-bit box. So,
patch is attached and I'm going to commit it
:)
To be honest, looking through that file, I am quite worried about few
points. I don't know too much about insights of ispell, but I see few
suspicious things in mkSPNode too.
I generally don't want to get involve in reviewing code for stuff I
don't know, But if Teodor (and Oleg) don't mind, I can raise my
points, and see if anything useful comes out of it.
Also, about that patch - it doesn't seem to apply cleanly to 8.4,
perhaps that file has changed too much (I based my 'review' above on
8.4's code)
--
GJ
On Thu, Jan 29, 2009 at 3:32 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
Than I have quite few notes about that function:
- affix is not checked on entry, and should be unsigned,Could be Assert( affix>=0 && affix < Conf->nAffixData )
wouldn't that crash pg backend too ?
The structure that this file parses, does it come straight from ispell
file, or is it being already parsed (and checked for errors) ?
--
GJ
Teodor Sigaev <teodor@sigaev.ru> writes:
I reproduced the bug with a help of Grzegorz's point for 64-bit box.
Hmm, seems it's not so much a "64 bit" error as a "signed vs unsigned
char" issue? Does this affect the old contrib/tsearch2 code?
Please try to make the commits in the next eight hours, as we have
release wraps scheduled for tonight.
regards, tom lane
Teodor Sigaev <teodor@sigaev.ru> writes:
I reproduced the bug with a help of Grzegorz's point for 64-bit box. So, patch
is attached and I'm going to commit it
...
! Conf->flagval[(unsigned int) *s] = (unsigned char) val;
...
! Conf->flagval[*(unsigned char*) s] = (unsigned char) val;
Maybe I'm missing something but I don't understand how this fixes the problem.
s is a "char*" so type punning it to an unsigned char * before dereferencing
it is really the same as casting it to unsigned char directly and casting it
to unsigned int really ought to have done the same thing anyways.
All of the changes are of this type so I can't see how your patch could have
fixed the problem.
And in general casting the pointer before dereferencing it is a whole lot
scarier code which should raise eyebrows a lot faster than just a simple cast
to unsigned char like you had it originally.
What really boggles me is why you don't just use unsigned chars everywhere and
remove all of these casts. or would that just move the casts to strcmp and
company?
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
Tom Lane wrote:
Teodor Sigaev <teodor@sigaev.ru> writes:
I reproduced the bug with a help of Grzegorz's point for 64-bit box.
Hmm, seems it's not so much a "64 bit" error as a "signed vs unsigned
char" issue?
Yes, but I don't understand why it worked in 32-bit box.
Does this affect the old contrib/tsearch2 code?
Will check.
Please try to make the commits in the next eight hours, as we have
release wraps scheduled for tonight.
Minor versions or beta of 8.4? if last, I'd like to commit btree_gin and
fast_update_gin. For both patches all pointed issues was resolved and Jeff,
seems, haven't objections.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Gregory Stark <stark@enterprisedb.com> writes:
Maybe I'm missing something but I don't understand how this fixes the problem.
s is a "char*" so type punning it to an unsigned char * before dereferencing
it is really the same as casting it to unsigned char directly
No, it isn't. If char is signed then you'll get quite different results
from a high-bit-set byte value, because sign extension will happen
before the value is reinterpreted as unsigned.
regards, tom lane
On Thu, 2009-01-29 at 19:00 +0300, Teodor Sigaev wrote:
Please try to make the commits in the next eight hours, as we have
release wraps scheduled for tonight.Minor versions or beta of 8.4?
Minor versions.
--
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org
To be honest, looking through that file, I am quite worried about few
points. I don't know too much about insights of ispell, but I see few
suspicious things in mkSPNode too.
I generally don't want to get involve in reviewing code for stuff I
don't know, But if Teodor (and Oleg) don't mind, I can raise my
points, and see if anything useful comes out of it.
If you see bug/mistake/suspicious point, please, don't be quiet
Also, about that patch - it doesn't seem to apply cleanly to 8.4,
perhaps that file has changed too much (I based my 'review' above on
8.4's code)
will tweak
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/