7.2.1 backend crash (convert_string_datum, locale)

Started by Mats Lofkvist almost 24 years ago, 6 messages, bugs list
#1 Mats Lofkvist
mal@algonet.se

Hi,

When testing postgres 7.2.1 on a sparc/solaris8 box with
--enable-locale --enable-multibyte I get a crash in
convert_string_datum.

The backend just dies when doing a select. With casserts
and debug configured in, I got the following in the log:

NOTICE: AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7c18
NOTICE: AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7c18
NOTICE: AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7c18
NOTICE: AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7c18
NOTICE: AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7c18
NOTICE: AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7818

Gdb on the crashing backend says:

Program received signal SIGSEGV, Segmentation fault.
0x269bd0 in pfree (pointer=0x4b7878) at mcxt.c:446
446 AssertArg(MemoryContextIsValid(header->context));
(gdb) where
#0 0x269bd0 in pfree (pointer=0x4b7878) at mcxt.c:446
#1 0x21844c in convert_string_datum (value=5251848, typid=1043)
at selfuncs.c:2059
#2 0x217978 in convert_to_scalar (value=4947304, valuetypid=1043,
scaledvalue=0xffbee0b8, lobound=5251848, hibound=4946632,
boundstypid=1043, scaledlobound=0xffbee0a8, scaledhibound=0xffbee0b0)
at selfuncs.c:1763
#3 0x214f8c in scalarineqsel (root=0x4aebe8, operator=1066, isgt=0 '\000',
var=0x4b6218, other=0x4b76d8) at selfuncs.c:584
#4 0x21541c in scalarltsel (fcinfo=0xffbee258) at selfuncs.c:733
#5 0x25aa90 in DirectFunctionCall4 (func=0x215304 <scalarltsel>,
arg1=4910056, arg2=1066, arg3=4947368, arg4=0) at fmgr.c:725
#6 0x2199f0 in prefix_selectivity (root=0x4aebe8, var=0x4b6218,
prefix=0x4b7ce8 "SY") at selfuncs.c:2667
#7 0x215854 in patternsel (fcinfo=0xffbee518, ptype=Pattern_Type_Like)
at selfuncs.c:872
#8 0x215a18 in likesel (fcinfo=0xffbee518) at selfuncs.c:913
#9 0x25c5e4 in OidFunctionCall4 (functionId=1819, arg1=4910056, arg2=1213,
arg3=4941064, arg4=1) at fmgr.c:1218
#10 0x185128 in restriction_selectivity (root=0x4aebe8, operator=1213,
args=0x4b6508, varRelid=1) at plancat.c:232
#11 0x167530 in clauselist_selectivity (root=0x4aebe8, clauses=0x4b7678,
varRelid=1) at clausesel.c:156
#12 0x167394 in restrictlist_selectivity (root=0x4aebe8,
restrictinfo_list=0x4b6958, varRelid=1) at clausesel.c:74
#13 0x16a044 in set_baserel_size_estimates (root=0x4aebe8, rel=0x4b6af8)
at costsize.c:1146
#14 0x166ae0 in set_plain_rel_pathlist (root=0x4aebe8, rel=0x4b6af8,
rte=0x4aec78) at allpaths.c:132
#15 0x166aa4 in set_base_rel_pathlists (root=0x4aebe8) at allpaths.c:115
#16 0x1667ec in make_one_rel (root=0x4aebe8) at allpaths.c:62
#17 0x177708 in subplanner (root=0x4aebe8, flat_tlist=0x4b6a18,
tuple_fraction=0) at planmain.c:238
#18 0x177544 in query_planner (root=0x4aebe8, tlist=0x4b5ed8, tuple_fraction=0)
at planmain.c:126
#19 0x17939c in grouping_planner (parse=0x4aebe8, tuple_fraction=0)
at planner.c:1094
#20 0x177d70 in subquery_planner (parse=0x4aebe8, tuple_fraction=-1)
at planner.c:228
#21 0x177a2c in planner (parse=0x4aebe8) at planner.c:94
#22 0x1c821c in pg_plan_query (querytree=0x4aebe8) at postgres.c:513
#23 0x1c871c in pg_exec_query_string (
query_string=0x4ae278 "SELECT find0.userId AS userId, find0.longValue AS findLongValue0 FROM userData find0 WHERE find0.groupName='user' AND find0.attributeName LIKE 'login%' AND find0.value LIKE 'SY%'", dest=Remote,
parse_context=0x464598) at postgres.c:784
#24 0x1ca63c in PostgresMain (argc=4, argv=0xffbef018,
username=0x4607e1 "mats") at postgres.c:1926
#25 0x18bab0 in DoBackend (port=0x4606b0) at postmaster.c:2243
#26 0x18af48 in BackendStartup (port=0x4606b0) at postmaster.c:1874
#27 0x189548 in ServerLoop () at postmaster.c:995
#28 0x188d18 in PostmasterMain (argc=1, argv=0x447db0) at postmaster.c:771
#29 0x143ebc in main (argc=1, argv=0xffbefacc) at main.c:206
(gdb) up
#1 0x21844c in convert_string_datum (value=5251848, typid=1043)
at selfuncs.c:2059
2059 pfree(val);
(gdb) print val
$1 = 0x4b7878 "D1BFD67F71192ECE"
(gdb) print xfrmstr
$2 = 0x4b78d8 "\001R\0014\001P\001T\001R\0019\001:\001T\001:\0014\0014\001<\0015\001S\001Q\001S\001\001\001S\001Q\001S\0015\001<\0014\0014\001:\001T\001:\0019\001R\001T\001P\0014\001R\001\001\001R\0014\001P\001T\001R\0019\001:\001T\001:\0014\0014\001<\0015\001S\001Q\001S\001\001"
(gdb) print xfrmsize
$3 = 48
(gdb) print xfrmlen
$4 = 102
(gdb) print *(varattrib *)(value)
$5 = {va_header = 20, va_content = {va_compressed = {va_rawsize = 1144078918,
va_data = "D"}, va_external = {va_rawsize = 1144078918,
va_extsize = 1144403782, va_valueid = 925970745,
va_toastrelid = 843400005}, va_data = "D"}}
(gdb) print (char *)((varattrib *)(value))->va_content.va_data
$6 = 0x50230c "D1BFD67F71192ECE~", '\177' <repeats 183 times>...
(gdb) list
2054 /* Oops, didn't make it */
2055 pfree(xfrmstr);
2056 xfrmstr = (char *) palloc(xfrmlen + 1);
2057 xfrmlen = strxfrm(xfrmstr, val, xfrmlen + 1);
2058 }
2059 pfree(val);
2060 val = xfrmstr;
2061 #endif
2062
2063 return (unsigned char *) val;
(gdb) down
#0 0x269bd0 in pfree (pointer=0x4b7878) at mcxt.c:446
446 AssertArg(MemoryContextIsValid(header->context));
(gdb) print header
$7 = (StandardChunkHeader *) 0x4b7868
(gdb) print *header
$8 = {context = 0x15246b8, size = 32, requested_size = 17}
(gdb)

Please let me know if there is more info I can get out of
gdb to track this down.

_
Mats Lofkvist
mal@algonet.se

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mats Lofkvist (#1)
Re: 7.2.1 backend crash (convert_string_datum, locale)

Mats Lofkvist <mal@algonet.se> writes:

> When testing postgres 7.2.1 on a sparc/solaris8 box with
> --enable-locale --enable-multibyte I get a crash in
> convert_string_datum.

This smells like a problem that we chased down awhile back, that
snprintf on Solaris is broken (it will write past the end of the
specified buffer length, thus corrupting adjacent data).

Andrew, I think that was your test case we found it on. Do you
recall if a fix is available from Sun?

regards, tom lane

#3 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#2)
Re: 7.2.1 backend crash (convert_string_datum, locale)

Tom Lane wrote:

> Mats Lofkvist <mal@algonet.se> writes:
>
> > When testing postgres 7.2.1 on a sparc/solaris8 box with
> > --enable-locale --enable-multibyte I get a crash in
> > convert_string_datum.
>
> This smells like a problem that we chased down awhile back, that
> snprintf on Solaris is broken (it will write past the end of the
> specified buffer length, thus corrupting adjacent data).
>
> Andrew, I think that was your test case we found it on. Do you
> recall if a fix is available from Sun?

Yes, I remember this too. It was specifically multibyte-related.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

#4 Andrew Sullivan
andrew@libertyrms.info
In reply to: Tom Lane (#2)
Re: 7.2.1 backend crash (convert_string_datum, locale)

On Thu, Jul 11, 2002 at 11:15:42PM -0400, Tom Lane wrote:

> Mats Lofkvist <mal@algonet.se> writes:
>
> > When testing postgres 7.2.1 on a sparc/solaris8 box with
> > --enable-locale --enable-multibyte I get a crash in
> > convert_string_datum.
>
> This smells like a problem that we chased down awhile back, that
> snprintf on Solaris is broken (it will write past the end of the
> specified buffer length, thus corrupting adjacent data).

It does indeed. This was only the 64-bit library, though, or at
least as far as we were able to tell. And I wasn't able to turn up
any evidence that it happened on Solaris 8. But it might. We don't
use 8, at least not yet.

> Andrew, I think that was your test case we found it on. Do you
> recall if a fix is available from Sun?

Not as far as I know, at least for 7. Come to think of it, I now
_do_ recall seeing something in my various Google wanderings which
suggested that there is a fix in one of the patch packages for
Solaris 8 (which suggests the buggy library is in the basic Solaris 8
install). I dimly recall some mention of incompatibility between it
and some other patchlevel, as well, so it might require some digging.
(Given that it's really a bounds mistake in a system library, you'd
think that it'd be easier to find more information about it; I
actually learned almost everything I know about the problem from,
IIRC, the autoconf web pages, so I'd not expect a cursory search of
Sun's site to turn anything up.)

In the FAQ_Solaris, there is a suggestion to use the substitute
function included in the Postgres tree (which is what you suggested,
Tom, and what I did), as well as instructions on how to do it. It
definitely works for me on Solaris 7. Might be worth trying on 8 as
well. If so, the FAQ should be updated so as not to limit the
discussion to Solaris 7 and earlier.

Sorry I can't be more help than this.

A

-- 
----
Andrew Sullivan                               87 Mowat Avenue 
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110

#5 Mats Lofkvist
mal@algonet.se
In reply to: Andrew Sullivan (#4)
Re: 7.2.1 backend crash (convert_string_datum, locale)

andrew@libertyrms.info (Andrew Sullivan) writes:

> On Thu, Jul 11, 2002 at 11:15:42PM -0400, Tom Lane wrote:
>
> > Mats Lofkvist <mal@algonet.se> writes:
> >
> > > When testing postgres 7.2.1 on a sparc/solaris8 box with
> > > --enable-locale --enable-multibyte I get a crash in
> > > convert_string_datum.
> >
> > This smells like a problem that we chased down awhile back, that
> > snprintf on Solaris is broken (it will write past the end of the
> > specified buffer length, thus corrupting adjacent data).
>
> It does indeed. This was only the 64-bit library, though, or at
> least as far as we were able to tell. And I wasn't able to turn up
> any evidence that it happened on Solaris 8. But it might. We don't
> use 8, at least not yet.
>
> > Andrew, I think that was your test case we found it on. Do you
> > recall if a fix is available from Sun?
>
> Not as far as I know, at least for 7. Come to think of it, I now
> _do_ recall seeing something in my various Google wanderings which
> suggested that there is a fix in one of the patch packages for
> Solaris 8 (which suggests the buggy library is in the basic Solaris 8
> install). I dimly recall some mention of incompatibility between it
> and some other patchlevel, as well, so it might require some digging.
> (Given that it's really a bounds mistake in a system library, you'd
> think that it'd be easier to find more information about it; I
> actually learned almost everything I know about the problem from,
> IIRC, the autoconf web pages, so I'd not expect a cursory search of
> Sun's site to turn anything up.)
>
> In the FAQ_Solaris, there is a suggestion to use the substitute
> function included in the Postgres tree (which is what you suggested,
> Tom, and what I did), as well as instructions on how to do it. It
> definitely works for me on Solaris 7. Might be worth trying on 8 as
> well. If so, the FAQ should be updated so as not to limit the
> discussion to Solaris 7 and earlier.

I didn't get it to work with the stuff in FAQ_Solaris (can't
guarantee I really got snprintf substituted though, just
followed the instructions and recompiled).

Removing --enable-multibyte didn't help either.

Without either --enable-locale or --enable-multibyte it
seems to work, but as I had to create a new database when
removing locale, any problems local to the first database
are not seen anymore.

Is postgres 8-bit clean without locale support enabled?
(I don't care about sort orders and such, only need to
read/write 8-bit chars via jdbc).

_
Mats Lofkvist
mal@algonet.se

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mats Lofkvist (#5)
Re: 7.2.1 backend crash (convert_string_datum, locale)

Mats Lofkvist <mal@algonet.se> writes:

> Without either --enable-locale or --enable-multibyte it
> seems to work, but as I had to create a new database when
> removing locale, any problems local to the first database
> are not seen anymore.

Hm. If the database is already corrupt then simply recompiling
a corrected binary isn't going to magically make things perfect.
Maybe you should retry the snprintf patch and/or --enable-multibyte
using fresh databases.

> Is postgres 8-bit clean without locale support enabled?
> (I don't care about sort orders and such, only need to
> read/write 8-bit chars via jdbc).

In that case you don't really need locale, no. Not sure about
whether you need multibyte; does JDBC expect Unicode support?

regards, tom lane