BUG #1473: Backend bus error, possibly due to ANALYZE

Started by Brian B.about 21 years ago23 messagesbugs
Jump to latest
#1Brian B.
brian-pgsql@bbdab.org

The following bug has been logged online:

Bug reference: 1473
Logged by: Brian B.
Email address: brian-pgsql@bbdab.org
PostgreSQL version: 8.0.1
Operating system: FreeBSD 4.11-STABLE
Description: Backend bus error, possibly due to ANALYZE
Details:

Hello,

I am loading some spam/ham data/tokens, to be used for the dspam anti-spam
software, into PostgreSQL. After a few hours of inserting and updating the
existing data, the backend crashes with a signal 10 (bus error). I am also
running an ANALYZE command to help keep the indexes up to date every 10
minutes. I have rebuilt PostgreSQL with debugging enabled and have a
resulting core dump. I am not sure who to send this to, however. The
backtrace using the core file is as follows:

#0 0x80c4b52 in transformExpr (pstate=0x86fa630, expr=0x85ec148)
at parse_expr.c:81
#1 0x80c5126 in transformExpr (pstate=0x86fa630, expr=0x85ec1d0)
at parse_expr.c:270
#2 0x80c5126 in transformExpr (pstate=0x86fa630, expr=0x85ec258)
at parse_expr.c:270
#3 0x80c5126 in transformExpr (pstate=0x86fa630, expr=0x85ec2e0)
at parse_expr.c:270

[... repeat above several thousand times ...]

#8139 0x80c5126 in transformExpr (pstate=0x86fa630, expr=0x86fa530)
at parse_expr.c:270
#8140 0x80c50c2 in transformExpr (pstate=0x86fa630, expr=0x86fa558)
at parse_expr.c:257
#8141 0x80c40e3 in transformWhereClause (pstate=0x86fa630, clause=0x86fa558,

constructName=0x81f8e75 "WHERE") at parse_clause.c:1013
#8142 0x80b178d in transformSelectStmt (pstate=0x86fa630, stmt=0x86fa580)
at analyze.c:1826
#8143 0x80af817 in transformStmt (pstate=0x86fa630, parseTree=0x86fa580,
extras_before=0xbfbfe6f0, extras_after=0xbfbfe6f4) at analyze.c:371
#8144 0x80af5ba in do_parse_analyze (parseTree=0x86fa580, pstate=0x86fa630)
at analyze.c:245
#8145 0x80af4ba in parse_analyze (parseTree=0x86fa580, paramTypes=0x0,
numParams=0) at analyze.c:169
#8146 0x816fc9e in pg_analyze_and_rewrite (parsetree=0x86fa580,
paramTypes=0x0, numParams=0) at postgres.c:555
#8147 0x8170061 in exec_simple_query (
query_string=0x83cd018 "SELECT uid, token, spam_hits, innocent_hits FROM
dspam_token_data WHERE uid = '1000' AND token IN
('913609592713134710','6655100721407275609','-7433241509246025544','61795992
61633421689','588386968494"...)
at postgres.c:872
#8148 0x8172713 in PostgresMain (argc=4, argv=0x82e4258,
username=0x82e4238 "dspam") at postgres.c:3007
#8149 0x814dc49 in BackendRun (port=0x82e2000) at postmaster.c:2816
#8150 0x814d47e in BackendStartup (port=0x82e2000) at postmaster.c:2452
#8151 0x814b9ba in ServerLoop () at postmaster.c:1199
#8152 0x814b31a in PostmasterMain (argc=3, argv=0xbfbff3d0) at
postmaster.c:918
#8153 0x811f2b9 in main (argc=3, argv=0xbfbff3d0) at main.c:268
(gdb)

I have compiled PostgreSQL with pthread for pl/python capability.

Thanks,

Brian

#2Neil Conway
neilc@samurai.com
In reply to: Brian B. (#1)
Re: BUG #1473: Backend bus error, possibly due to ANALYZE

On Thu, 2005-02-10 at 02:37 +0000, Brian B. wrote:

I am loading some spam/ham data/tokens, to be used for the dspam anti-spam
software, into PostgreSQL. After a few hours of inserting and updating the
existing data, the backend crashes with a signal 10 (bus error). I am also
running an ANALYZE command to help keep the indexes up to date every 10
minutes.

As far as I can tell, ANALYZE is not at issue here.

I have rebuilt PostgreSQL with debugging enabled and have a
resulting core dump. I am not sure who to send this to, however. The
backtrace using the core file is as follows:

#0 0x80c4b52 in transformExpr (pstate=0x86fa630, expr=0x85ec148)
at parse_expr.c:81
#1 0x80c5126 in transformExpr (pstate=0x86fa630, expr=0x85ec1d0)
at parse_expr.c:270
#2 0x80c5126 in transformExpr (pstate=0x86fa630, expr=0x85ec258)
at parse_expr.c:270
#3 0x80c5126 in transformExpr (pstate=0x86fa630, expr=0x85ec2e0)
at parse_expr.c:270

[... repeat above several thousand times ...]

It seems what's happening here is that dspam is submitting a query with
many thousands of elements in the IN clause. In the parser, we transform
"foo IN (a, b, c)" into "foo = a OR foo = b OR foo = c", and then
recurse for each element of the OR expression and eventually run out of
stack space. (Note that this will actually be worse in HEAD, since a
refactoring I applied will mean we consume two stack frames for each
expression.)

A workaround would be to increase PostgreSQL's stack size.

Perhaps it would be worth considering representing IN lists as a
distinct expression type, at least in the parser. Then the
transformExpr() code would look like:

foreach (element of IN list)
transformExpr(element);
... do whatever else ...

so we wouldn't need to recurse. We could then transform the new
expression type into a list of OR clauses at this point.

-Neil

#3John Hansen
john@geeknet.com.au
In reply to: Neil Conway (#2)
Re: BUG #1473: Backend bus error, possibly due to ANALYZE

It seems what's happening here is that dspam is submitting a
query with many thousands of elements in the IN clause. In
the parser, we transform "foo IN (a, b, c)" into "foo = a OR
foo = b OR foo = c", and then recurse for each element of the
OR expression and eventually run out of stack space. (Note
that this will actually be worse in HEAD, since a refactoring
I applied will mean we consume two stack frames for each
expression.)

A workaround would be to increase PostgreSQL's stack size.

Perhaps it would be worth considering representing IN lists
as a distinct expression type, at least in the parser. Then the
transformExpr() code would look like:

Just like I showed earlier on large IN () lists are useless....
Instead I use the UNNEST function I posted earlier (see http://archives.postgresql.org/pgsql-hackers/2004-11/msg00327.php) like so:

Select id from table inner join unnest(array[1,2,3,4,...]) as d(id) using(id);

Not only does it not crash the backend,. But it also proved to be faster, tho admittedly not much.

... John

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brian B. (#1)
Re: BUG #1473: Backend bus error, possibly due to ANALYZE

"Brian B." <brian-pgsql@bbdab.org> writes:

I am loading some spam/ham data/tokens, to be used for the dspam anti-spam
software, into PostgreSQL. After a few hours of inserting and updating the
existing data, the backend crashes with a signal 10 (bus error).

Judging by the trace, this doesn't depend on your data at all, just on
the specific query and the declarations of the tables it references.
Please see if you can extract a self-contained test case that way.

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Neil Conway (#2)
Re: BUG #1473: Backend bus error, possibly due to ANALYZE

Neil Conway <neilc@samurai.com> writes:

It seems what's happening here is that dspam is submitting a query with
many thousands of elements in the IN clause. In the parser, we transform
"foo IN (a, b, c)" into "foo = a OR foo = b OR foo = c", and then
recurse for each element of the OR expression and eventually run out of
stack space.

There is a check_stack_depth call in there, so this could only be the
explanation if max_stack_depth is set too high for the actual
stack depth limit. What's the platform, and what ulimit values is the
postmaster started under?

Perhaps it would be worth considering representing IN lists as a
distinct expression type, at least in the parser. Then the
transformExpr() code would look like:

foreach (element of IN list)
transformExpr(element);
... do whatever else ...

so we wouldn't need to recurse. We could then transform the new
expression type into a list of OR clauses at this point.

Waste of time unless you were to propagate this representation all the
way through; as described above you'd merely be postponing the stack
depth problem to a later phase.

regards, tom lane

#6Brian B.
brian-pgsql@bbdab.org
In reply to: Tom Lane (#5)
Re: BUG #1473: Backend bus error, possibly due to ANALYZE

Hello,

On Thu, Feb 10, 2005 at 10:08:14AM -0500, Tom Lane wrote:

Neil Conway <neilc@samurai.com> writes:

It seems what's happening here is that dspam is submitting a query with
many thousands of elements in the IN clause. In the parser, we transform
"foo IN (a, b, c)" into "foo = a OR foo = b OR foo = c", and then
recurse for each element of the OR expression and eventually run out of
stack space.

There is a check_stack_depth call in there, so this could only be the
explanation if max_stack_depth is set too high for the actual
stack depth limit. What's the platform, and what ulimit values is the
postmaster started under?

FreeBSD 4.11 on x86 using PostgreSQL 8.0.1

% limits
Resource limits (current):
cputime infinity secs
filesize 512000 kb
datasize 524288 kb
stacksize 65536 kb
coredumpsize 307200 kb
memoryuse-cur 458752 kb
memorylocked-cur 458752 kb
maxprocesses 512
openfiles 7351
sbsize infinity bytes
vmemoryuse infinity kb

max_stack_depth is at the default commented value of 2048

I can bump this value up to test it if desired, it just takes
a while to get to the point of the backend crash scenario.

I feed about 2300 good pieces of email using the dspam corpus
feed program. Around 200 email messages left to feed,
the backend will want to crash around that point. I will try
to narrow down what query is being ran at that point. Perhaps
I can extract it from the core dump.

Thanks,

Brian B.

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brian B. (#6)
Re: BUG #1473: Backend bus error, possibly due to ANALYZE

"Brian B." <brian-pgsql@bbdab.org> writes:

On Thu, Feb 10, 2005 at 10:08:14AM -0500, Tom Lane wrote:

There is a check_stack_depth call in there, so this could only be the
explanation if max_stack_depth is set too high for the actual
stack depth limit. What's the platform, and what ulimit values is the
postmaster started under?

FreeBSD 4.11 on x86 using PostgreSQL 8.0.1

% limits
Resource limits (current):
stacksize 65536 kb

Hmm, 64meg should certainly be far past where check_stack_depth will
start to complain, so there's something else going on here. (BUT:
the ulimits you see interactively aren't necessarily what the postmaster
was started under. You might want to confirm that the results are the
same if you put the limits call into the boot script that launches
the postmaster...)

the backend will want to crash around that point. I will try
to narrow down what query is being ran at that point. Perhaps
I can extract it from the core dump.

See debug_query_string.

regards, tom lane

#8Neil Conway
neilc@samurai.com
In reply to: Tom Lane (#7)
Re: BUG #1473: Backend bus error, possibly due to ANALYZE

Tom Lane wrote:

Hmm, 64meg should certainly be far past where check_stack_depth will
start to complain, so there's something else going on here.

Right; if those really are the stack size limits for the crashing
backend, I guess my initial analysis must have been mistaken. But I'm
mystified as to what else the problem could possibly be...

-Neil

#9Brian B.
brian-pgsql@bbdab.org
In reply to: Tom Lane (#7)
Re: BUG #1473: Backend bus error, possibly due to ANALYZE

On Thu, Feb 10, 2005 at 05:24:12PM -0500, Tom Lane wrote:

Hmm, 64meg should certainly be far past where check_stack_depth will
start to complain, so there's something else going on here. (BUT:
the ulimits you see interactively aren't necessarily what the postmaster
was started under. You might want to confirm that the results are the
same if you put the limits call into the boot script that launches
the postmaster...)

I put limits in the startup script for PostgreSQL and came up with the same
64MB stacksize value.

See debug_query_string.

I've got the gdb output of printing debug_query_string and the text saved
from script(1) is about 267KB in size. Should I send this to you off-list?

Thanks,

Brian B.

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brian B. (#9)
Re: BUG #1473: Backend bus error, possibly due to ANALYZE

"Brian B." <brian-pgsql@bbdab.org> writes:

See debug_query_string.

I've got the gdb output of printing debug_query_string and the text saved
from script(1) is about 267KB in size. Should I send this to you off-list?

Please. Also I'll need the schemas for the tables the query uses.

regards, tom lane

#11Brian B.
brian-pgsql@bbdab.org
In reply to: Tom Lane (#10)
Re: BUG #1473: Backend bus error, possibly due to ANALYZE

On Fri, Feb 11, 2005 at 04:56:16PM -0500, Tom Lane wrote:

regression=# create function infinite_recurse() returns int as '
regression'# select infinite_recurse()' language sql;
regression=# \set VERBOSITY terse
regression=# select infinite_recurse();
ERROR: stack depth limit exceeded

and see if you get the proper error or a core dump?

This makes the backend core dump in about 5-10 seconds. It looks like your
analysis is correct. Excellent work. I am not sure if it's my particular
FreeBSD installation that is screwing up or if it's FreeBSD in general.
Did you happen to test with FreeBSD on one of your test machines?

If it dumps core, then the thing to look at is check_stack_depth() in
src/backend/tcop/postgres.c. Maybe your compiler is bogusly optimizing
the address arithmetic there?

It is possible. Using gcc 2.95.4. I will eliminate all the optimization
options when I recompile PostgreSQL and see what happens.

Thanks,

Brian B.

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brian B. (#11)
Re: BUG #1473: Backend bus error, possibly due to ANALYZE

"Brian B." <brian-pgsql@bbdab.org> writes:

Did you happen to test with FreeBSD on one of your test machines?

No, I don't have any BSD machines here. However, I've added this test
case to the regression tests, so in a few hours we'll have a spectrum
of results from the PG build farm.

regards, tom lane

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#12)
Re: BUG #1473: Backend bus error, possibly due to ANALYZE

Did you happen to test with FreeBSD on one of your test machines?

No, I don't have any BSD machines here. However, I've added this test
case to the regression tests, so in a few hours we'll have a spectrum
of results from the PG build farm.

FWIW, I see a "pass" from buildfarm member cockatoo, which claims to be
FreeBSD 4.10-STABLE gcc 2.95.4 x86 ... that's at least pretty close to
your setup, no?

http://www.pgbuildfarm.org/cgi-bin/show_status.pl

regards, tom lane

#14Brian B.
brian-pgsql@bbdab.org
In reply to: Tom Lane (#13)
Re: BUG #1473: Backend bus error, possibly due to ANALYZE

On Fri, Feb 11, 2005 at 06:46:55PM -0500, Tom Lane wrote:

Did you happen to test with FreeBSD on one of your test machines?

No, I don't have any BSD machines here. However, I've added this test
case to the regression tests, so in a few hours we'll have a spectrum
of results from the PG build farm.

FWIW, I see a "pass" from buildfarm member cockatoo, which claims to be
FreeBSD 4.10-STABLE gcc 2.95.4 x86 ... that's at least pretty close to
your setup, no?

Yup ... that's really close. I'm going to do a few tests to try to figure
out what's going on with my environment and let you know if I come up with
anything odd.

Thanks for your time!

Brian B.

#15Brian B.
brian-pgsql@bbdab.org
In reply to: Brian B. (#14)
Re: BUG #1473: Backend bus error, possibly due to ANALYZE

On Fri, Feb 11, 2005 at 06:54:31PM -0500, I wrote:

Yup ... that's really close. I'm going to do a few tests to try to figure
out what's going on with my environment and let you know if I come up with
anything odd.

I think I have figured out the culprit.

I use the FreeBSD PostgreSQL port and I set the option to use threads so
that pl/python will work with PostgreSQL. If I unset this option and rebuild
the port, I will receive the normal error message but no crash. Just to
make sure it was not the port's fault, I build the source by hand without
any port-specific patches. The same behavior occurs.

So I am not sure if, again, it's my libc_r or something gets messed up in
the PostgreSQL code dealing with the stack/recursion, when I have pthread
libs linked in. I am guessing it is just my particular environment, though.

Thanks,

Brian B.

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brian B. (#15)
Re: BUG #1473: Backend bus error, possibly due to ANALYZE

"Brian B." <brian-pgsql@bbdab.org> writes:

I use the FreeBSD PostgreSQL port and I set the option to use threads so
that pl/python will work with PostgreSQL.

What option is that, exactly?

It's entirely possible that something has decided that the backend is
going to be multithreaded and is only giving the "main" thread a
1MB-or-so stack. If so, I would regard this as a build error. We do
not want threading libraries linked into the backend.

regards, tom lane

#17Brian B.
brian-pgsql@bbdab.org
In reply to: Tom Lane (#16)
Re: BUG #1473: Backend bus error, possibly due to ANALYZE

On Fri, Feb 11, 2005 at 09:46:50PM -0500, Tom Lane wrote:

"Brian B." <brian-pgsql@bbdab.org> writes:

I use the FreeBSD PostgreSQL port and I set the option to use threads so
that pl/python will work with PostgreSQL.

What option is that, exactly?

It's entirely possible that something has decided that the backend is
going to be multithreaded and is only giving the "main" thread a
1MB-or-so stack. If so, I would regard this as a build error. We do
not want threading libraries linked into the backend.

Apologies, due to message revising, I forgot to include the explanation of what
this port setting entails.

When setting the "LIBC_R" option, the FreeBSD port essentially sets CFLAGS to
-D_THREAD_SAFE and LDFLAGS to -pthread. This is probably due to the Python
procedural handler not being able to link with PostgreSQL until PostgreSQL
is built pthread-aware. I could be wrong on all of this, but it seems to
work as such. Unfortunately, this has some unforeseen broken behavior for
PostgreSQL that was sorta hard to debug. :)

Perhaps there is another way around it. I think the situation now is that I
should converse with the FreeBSD port developer rather than using up your time.

Thank you,

Brian B.

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brian B. (#17)
Re: BUG #1473: Backend bus error, possibly due to ANALYZE

"Brian B." <brian-pgsql@bbdab.org> writes:

When setting the "LIBC_R" option, the FreeBSD port essentially sets CFLAGS to
-D_THREAD_SAFE and LDFLAGS to -pthread. This is probably due to the Python
procedural handler not being able to link with PostgreSQL until PostgreSQL
is built pthread-aware.

Hmm, is that a FreeBSD-specific restriction? I've not had any such
trouble on Linux or Mac OS X.

regards, tom lane

#19Brian B.
brian-pgsql@bbdab.org
In reply to: Tom Lane (#18)
Re: BUG #1473: Backend bus error, possibly due to ANALYZE

On Fri, Feb 11, 2005 at 10:05:36PM -0500, Tom Lane wrote:

"Brian B." <brian-pgsql@bbdab.org> writes:

When setting the "LIBC_R" option, the FreeBSD port essentially sets CFLAGS to
-D_THREAD_SAFE and LDFLAGS to -pthread. This is probably due to the Python
procedural handler not being able to link with PostgreSQL until PostgreSQL
is built pthread-aware.

Hmm, is that a FreeBSD-specific restriction? I've not had any such
trouble on Linux or Mac OS X.

I was citing that behavior from memory. It actually builds/installs OK until
one decides to add the handler to a database:

$ createlang plpythonu
createlang: language installation failed: ERROR: could not load library
"/usr/local/lib/postgresql/plpython.so": dlopen
'/usr/local/lib/postgresql/plpython.so' failed.
(/usr/local/lib/python2.4/config/libpython2.4.so: Undefined symbol
"pthread_attr_destroy")

If I build the backend with the pthread stuff, this step succeeds.

Thanks,

Brian B.

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brian B. (#19)
Re: BUG #1473: Backend bus error, possibly due to ANALYZE

BTW, something that would be interesting is to figure out what the
thread stack size actually is (I assume this is available in the FreeBSD
docs) and experiment to find what is the maximum value max_stack_depth
can be set to without letting infinite_recurse() dump core.

regards, tom lane

#21Brian B.
brian-pgsql@bbdab.org
In reply to: Tom Lane (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brian B. (#21)
#23Michael Fuhr
mike@fuhr.org
In reply to: Tom Lane (#22)