Re: PG crash on simple query, story continues

Started by Maksim Likharev over 22 years ago, 9 messages
#1Maksim Likharev
mlikharev@aurigin.com

After upgrading to 7.3.3 we get the following:

signal 11
#0 0x254f38 in pfree ()
#1 0x1fde44 in convert_to_scalar ()
#2 0x1faafc in scalarineqsel ()
#3 0x1fd574 in mergejoinscansel ()
#4 0x14fec8 in cost_mergejoin ()
#5 0x16b820 in create_mergejoin_path ()
#6 0x155048 in sort_inner_and_outer ()
#7 0x154dd0 in add_paths_to_joinrel ()
#8 0x1567cc in make_join_rel ()
#9 0x15669c in make_jointree_rel ()
#10 0x14dd28 in make_fromexpr_rel ()
#11 0x14d6d0 in make_one_rel ()
#12 0x15d328 in subplanner ()
#13 0x15d218 in query_planner ()
#14 0x15f29c in grouping_planner ()
#15 0x15d93c in subquery_planner ()
#16 0x15d5e4 in planner ()
#17 0x1a6a94 in pg_plan_query ()
#18 0x1a712c in pg_exec_query_string ()
#19 0x1a8fd8 in PostgresMain ()
#20 0x172698 in DoBackend ()
#21 0x171ac4 in BackendStartup ()
#22 0x16ff14 in ServerLoop ()
#23 0x16f780 in PostmasterMain ()
#24 0x128e60 in main ()

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, July 07, 2003 10:14 PM
To: Maksim Likharev
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] PG crash on simple query, story continues

"Maksim Likharev" <mlikharev@aurigin.com> writes:

SELECT p.docid FROM prod.t_documents AS p
INNER JOIN t_tempdocs AS t
ON p.docid = t.docid
LEFT OUTER JOIN prod.t_refs AS ct
ON ct.docid = p.docid;

here is a stack trace:
00252174 AllocSetAlloc (3813b0, 15, 251fe0, 20, 0, ffbee2f8) + 194
002532e4 MemoryContextAlloc (3813b0, 15, 11, 7efefeff, 81010100, ff00) + 68
0020dc0c varcharin (ffbee378, ffbee378, 20dae4, 0, 0, ffbee3f0) + 128
00243570 FunctionCall3 (ffbee4a8, 3c1ce8, 0, 324, 0, ffbee5c4) + 11c
0023e6c4 get_attstatsslot (3d6410, 413, 324, 2, 0, ffbee5c4) + 2b0
001f8cb4 scalarineqsel (3bb978, 42a, 0, 3bffa8, 40f0e8, 413) + 288
001fb824 mergejoinscansel (3bb978, 3c0080, 3c0968, 3c0970, 0, 1) + 23c

Hmm, it would seem there's something flaky about your pg_statistic
entries. Could we see the pg_stats rows for the columns mentioned
in this query?

regards, tom lane

#2Maksim Likharev
mlikharev@aurigin.com
In reply to: Maksim Likharev (#1)

Hi, I have a very interesting suspicion.
See my comments (the lines starting with "!"):

convert_string_datum

...
! this is my case
if (!lc_collate_is_c())
{
    /* Guess that transformed string is not much bigger than original */
    xfrmsize = strlen(val) + 32;    /* arbitrary pad value here... */

    ! I would say a very interesting approach,
    ! why not just
    xfrmsize = strxfrm(xfrmstr, NULL, 0);

    ! fine
    xfrmstr = (char *) palloc(xfrmsize);
    ! fine
    xfrmlen = strxfrm(xfrmstr, val, xfrmsize);

    ! if an error happened, xfrmlen will be (size_t)-1
    if (xfrmlen >= xfrmsize) {
        ! yep, did not make it
        /* Oops, didn't make it */
        pfree(xfrmstr);

        ! what are we allocating here? 0 bytes
        xfrmstr = (char *) palloc(xfrmlen + 1);

        ! BOOM
        xfrmlen = strxfrm(xfrmstr, val, xfrmlen + 1);
    }
    pfree(val);
    val = xfrmstr;
}

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Maksim Likharev (#2)

"Maksim Likharev" <mlikharev@aurigin.com> writes:

! I would say a very interesting approach,
! why not just
xfrmsize = strxfrm(xfrmstr, NULL, 0);

strxfrm doesn't work that way (and if it did, it would give back a
malloc'd not a palloc'd string).

! if an error happened, xfrmlen will be (size_t)-1

No it won't; see the man page for strxfrm.

This does raise an interesting thought though: what platform are you on?
It seems to me that we've heard of buggy versions of strxfrm that write
more bytes than they're allowed to, thereby clobbering palloc's data
structures.

regards, tom lane
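
For reference, a minimal sketch of the strxfrm() contract referred to above, assuming a standard C library: the destination comes first, the caller owns the buffer, and the return value is the length the transformed string needs (excluding the terminator). The helper name xfrm_dup and the use of malloc in place of palloc are illustrative assumptions, not PostgreSQL code.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from PostgreSQL): return a malloc'd copy of the
 * strxfrm() transformation of src, or NULL on failure.  It mirrors the
 * guess-then-retry shape of convert_string_datum as quoted in this thread.
 */
static char *
xfrm_dup(const char *src)
{
    size_t size = strlen(src) + 32;     /* initial guess, as in the original */
    char  *buf = malloc(size);
    size_t need;

    if (buf == NULL)
        return NULL;

    /* strxfrm(dest, src, n) may write at most n bytes; it returns the length needed */
    need = strxfrm(buf, src, size);
    if (need >= size)
    {
        /* Guess was too small: buffer contents are indeterminate, so retry */
        free(buf);
        size = need + 1;
        buf = malloc(size);
        if (buf == NULL)
            return NULL;
        need = strxfrm(buf, src, size);
        if (need >= size)
        {
            /* Still does not fit (or size wrapped around to 0): give up */
            free(buf);
            return NULL;
        }
    }
    return buf;
}
```

Two strings transformed this way can be compared with strcmp() and should order the same way strcoll() orders the originals. The caller releases the result with free().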

#4Maksim Likharev
mlikharev@aurigin.com
In reply to: Tom Lane (#3)

! if an error happened, xfrmlen will be (size_t)-1

No it won't; see the man page for strxfrm.

RETURN VALUES
Upon successful completion, strxfrm() returns the length of
the transformed string (not including the terminating null
byte). If the value returned is n or more, the contents of
the array pointed to by s1 are indeterminate.

On failure, strxfrm() returns (size_t)-1.

But you are right, it is strxfrm() that returns more than allowed,
most likely under the following condition:
strxfrm(xfrmstr, val, 0)

an extra null terminator.

I am on SunOS 5.8,
BTW on Linux it works....
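
The disagreement about the failure value can be sidestepped in calling code. A minimal sketch, assuming a POSIX-style C library: POSIX reserves no return value for errors and suggests clearing errno before the call and checking it afterwards, while the man page quoted above documents (size_t)-1. The helper name xfrm_checked is hypothetical; it treats either signal as failure.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical wrapper: call strxfrm() and report the length it returns
 * through *len.  Returns 0 on success, -1 if the library signalled failure
 * in either of the two documented ways.  The caller must still compare
 * *len against size to see whether the result actually fit in dst.
 */
static int
xfrm_checked(char *dst, const char *src, size_t size, size_t *len)
{
    size_t n;

    errno = 0;                          /* POSIX: no return value is reserved for errors */
    n = strxfrm(dst, src, size);

    if (errno != 0 || n == (size_t) -1)
        return -1;                      /* covers the man-page (size_t)-1 case as well */

    *len = n;
    return 0;
}
```

With this in place, the `xfrmlen >= xfrmsize` test only ever sees a genuine "buffer too small" length, never an error value that would wrap to 0 when 1 is added.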

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Maksim Likharev (#4)

"Maksim Likharev" <mlikharev@aurigin.com> writes:

On failure, strxfrm() returns (size_t)-1.

Not according to the Single Unix Specification, Linux, or HP-UX;
I don't have any others to check. But anyway, that is not causing
your problem, since palloc(0) would complain not dump core.

I am on SunOS 5.8,

Solaris, eh? IIRC, it was Solaris that we last heard about broken
strxfrm on. Better check to see if Sun has a fix for this.

regards, tom lane
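
One way to check whether a given libc overruns its output buffer is a guard-byte probe. The sketch below is an assumption about how such a test could look, not a known reproduction of the Solaris problem: it passes strxfrm() a deliberately small length inside a larger buffer and checks that the bytes beyond that length are untouched. The function name and constants are hypothetical, and the probe is only meaningful after setlocale() has selected the affected non-C locale.

```c
#include <stddef.h>
#include <string.h>

#define PROBE_BUF   256     /* real buffer size, larger than any limit passed */
#define GUARD_BYTE  0x5A    /* fill pattern; a write of this same value goes unnoticed */

/*
 * Hypothetical probe: returns 1 if strxfrm() wrote past the first `limit`
 * bytes of the destination, 0 if it respected the limit, -1 on bad input.
 * Keep src short: a library that overruns by more than PROBE_BUF bytes
 * would still corrupt the stack.
 */
static int
strxfrm_overruns(const char *src, size_t limit)
{
    unsigned char buf[PROBE_BUF];
    size_t        i;

    if (limit > PROBE_BUF)
        return -1;

    memset(buf, GUARD_BYTE, sizeof(buf));
    (void) strxfrm((char *) buf, src, limit);

    /* Everything at or beyond `limit` must still hold the guard pattern */
    for (i = limit; i < PROBE_BUF; i++)
    {
        if (buf[i] != GUARD_BYTE)
            return 1;
    }
    return 0;
}
```

Calling it with a limit of 0 exercises the case suspected earlier in the thread, where an extra null terminator might be written even though no output space was offered.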

#6Maksim Likharev
mlikharev@aurigin.com
In reply to: Tom Lane (#5)

I would prefer to dump that gar.xxg and put PG on Linux,
but this is not up to me.
Thanks for the help.

#7Maksim Likharev
mlikharev@aurigin.com
In reply to: Maksim Likharev (#6)

So the following modification seems to have fixed all PG (7.3/7.3.3) crashes on
Solaris (non-C locale).

selfuncs.c line 2356:

I changed:
xfrmsize = strlen(val) + 32; /*arbitrary pad value here...*/
to
xfrmsize = strxfrm(NULL, val, 0) + 32;

So basically, instead of making a wild guess at the transformed string size, I am asking
strxfrm for it.

The +32 is out of desperation; strxfrm(NULL, val, 0) + 1 should be fine (I have
not tested that)...

Out of curiosity:
Interestingly, the following condition should no longer be possible
unless something went terribly wrong. In that case, what should happen:
die here, return the original string, or return an empty string?

if (xfrmlen >= xfrmsize) {
pfree(xfrmstr);
xfrmstr = (char *) palloc(xfrmlen + 1);
xfrmlen = strxfrm(xfrmstr, val, xfrmlen + 1);
}

Again, this fixed all crashes on SunOS 5.8 (PG 7.3.3, en_US locale, LATIN1
encoding). Generic patch...

P.S.
NO SUPPORT, NO WARRANTY, NO NOTHING, just for your information.

Regards.
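
The change described above amounts to a measure-then-allocate pattern. A minimal sketch outside PostgreSQL, assuming malloc in place of palloc and a hypothetical helper name xfrm_exact; it relies on the C standard allowing a null destination when the length argument is 0, and it answers the "impossible condition" question by returning NULL.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: ask strxfrm() for the exact length first, then
 * allocate and transform.  Returns a malloc'd string, or NULL on failure.
 */
static char *
xfrm_exact(const char *src)
{
    size_t need = strxfrm(NULL, src, 0);    /* size query: nothing is written */
    char  *buf;

    if (need == (size_t) -1)
        return NULL;                        /* error value on some platforms; would wrap below */

    buf = malloc(need + 1);                 /* +1 for the terminating null byte */
    if (buf == NULL)
        return NULL;

    /* The second call should report the same length; anything larger is a failure */
    if (strxfrm(buf, src, need + 1) > need)
    {
        free(buf);
        return NULL;
    }
    return buf;
}
```

The cost is that strxfrm() always runs twice, where the guess-and-retry version usually runs it once.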

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Maksim Likharev (#7)
Re: [HACKERS] PG crash on simple query, story continues

"Maksim Likharev" <mlikharev@aurigin.com> writes:

So the following modification seems to have fixed all PG (7.3/7.3.3) crashes on
Solaris (non-C locale).

Given that the problem is Solaris' tendency to write more data than
the specified output buffer length allows, I'd think this is still
risking a core dump (due to null pointer dereference).

regards, tom lane

#9Maksim Likharev
mlikharev@aurigin.com
In reply to: Tom Lane (#8)
Re: [HACKERS] PG crash on simple query, story continues

Possible, but before, almost every tenth query crashed the server;
now it stays up, and that is all I care about.
