cvs head initdb hangs on unixware
Hi all,
CVS HEAD configured without --enable-debug hangs in initdb while running
make check.
warthog doesn't exhibit it because it's configured with debug.
When it hangs, postmaster takes 100% CPU doing nothing. initdb waits for
it while creating the template db.
According to truss, the last useful thing postmaster does is write 8K of
zeroes to disk.
If someone needs access to a UnixWare machine, let me know.
regards,
--
Olivier PRENANT Tel: +33-5-61-50-97-00 (Work)
15, Chemin des Monges +33-5-61-50-97-01 (Fax)
31190 AUTERIVE +33-6-07-63-80-64 (GSM)
FRANCE Email: ohp@pyrenet.fr
------------------------------------------------------------------------------
Make your life a dream, make your dream a reality. (St Exupery)
Could you generate a core and send a stack trace?
kill -ABRT <pid> should do that.
Zdenek
On Tue, 2 Dec 2008, Zdenek Kotala wrote:
Date: Tue, 02 Dec 2008 17:22:25 +0100
From: Zdenek Kotala <Zdenek.Kotala@Sun.COM>
To: ohp@pyrenet.fr
Cc: pgsql-hackers list <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] cvs head initdb hangs on unixware
Hmm. No point doing that: the build is not debug-enabled, so I'm afraid a
stack trace won't show us anything useful.
Zdenek,
On second thought, I tried it and got this:
Stack trace for p1, program postmaster
*[0] fsm_rebuild_page( presumed: 0xbd9731a0, 0, 0xbd9731a0) [0x81e6a97]
[1] fsm_search_avail( presumed: 0x2, 0x6, 0x1) [0x81e68d9]
[2] fsm_set_and_search(0x84b2250, 0, 0, 0x2e, 0x5, 0x6, 0x2e, 0x8047416, 0xb4) [0x81e6385]
[3] RecordAndGetPageWithFreeSpace(0x84b2250, 0x2e, 0xa0, 0xb4) [0x81e5a00]
[4] RelationGetBufferForTuple( presumed: 0x84b2250, 0xb4, 0) [0x8099b59]
[5] heap_insert(0x84b2250, 0x853a338, 0, 0, 0) [0x8097042]
[6] simple_heap_insert( presumed: 0x84b2250, 0x853a338, 0x853a310) [0x8097297]
[7] InsertOneTuple( presumed: 0xb80, 0x84057b0, 0x8452fb8) [0x80cb210]
[8] boot_yyparse( presumed: 0xffffffff, 0x3, 0x8047ab8) [0x80c822b]
[9] BootstrapModeMain( presumed: 0x66, 0x8454600, 0x4) [0x80ca233]
[10] AuxiliaryProcessMain(0x4, 0x8047ab4) [0x80cab3b]
[11] main(0x4, 0x8047ab4, 0x8047ac8) [0x8177dce]
[12] _start() [0x807ff96]
Seems interesting!
We've already had problems with the UnixWare optimizer; I hope this one is
fixable!
regards
From: "Hitoshi Harada" <umi.tanuki@gmail.com>
Date: Wed, 3 Dec 2008 02:46:46 +0900
To: "Heikki Linnakangas" <heikki.linnakangas@enterprisedb.com>
Cc: "David Rowley" <dgrowley@gmail.com>, pgsql-hackers@postgresql.org
Subject: Re: Windowing Function Patch Review -> Standard Conformance
2008/11/26 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
Hitoshi Harada wrote:
>> I read more, and your spooling approach seems flexible for both now
>> and the future. Looking only at the current release, the frame with ORDER
>> BY is done by detecting peers in WinFrameGetArg() and adding the row number
>> of peers to winobj->currentpos. Actually, if we have the capability to
>> spool all the rows we need on demand, the frame would be only a boundary
>> problem.
> Yeah, we could do that. I'm afraid it would be pretty slow, though, if
> there's a lot of peers. That could probably be alleviated with some sort of
> caching, though.
I added code for this issue. See
http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=blobdiff;f=src/backend/executor/nodeWindow.c;h=f2144bf73a94829cd7a306c28064fa5454f8d369;hp=50a6d6ca4a26cd4854c445364395ed183b61f831;hb=895f1e615352dfc733643a701d1da3de7f91344b;hpb=843e34f341f0e824fd2cc0f909079ad943e3815b
This process is very similar to your aggregatedupto in the window
aggregate, so the two might be shared as a general way to detect the frame
boundary, mightn't they?
I am working on assorted issues rather than on the common aggregate code
(I now doubt whether sharing that code is worthwhile), so tell me when you
resume your hacking and I'll send the whole patch.
Regards,
--
Hitoshi Harada
Looking at fsm_rebuild_page, I wonder if the compiler is treating "int"
as an unsigned integer? That would cause an infinite loop.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
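The unsigned-int hypothesis above can be illustrated with a small sketch. It assumes a countdown loop of the form `for (i = n - 1; i >= 0; i--)`, which is a guess at the loop shape and not a quote from the FSM source. The function names and the iteration cap are hypothetical; the cap exists only so that the demonstration terminates.

```c
#include <assert.h>

/*
 * Count iterations of a countdown loop over a signed index.
 * The loop ends once i drops below zero, so it runs exactly n times
 * (as long as n does not exceed the cap).
 */
static long
countdown_signed(int n, long cap)
{
    long iterations = 0;
    int  i;

    assert(n >= 0 && cap >= 0);
    for (i = n - 1; i >= 0; i--)
    {
        if (iterations >= cap)
            break;              /* safety cap, not reached when n <= cap */
        iterations++;
    }
    return iterations;
}

/*
 * Same loop with an unsigned index.  "i >= 0" is always true for an
 * unsigned type: decrementing 0 wraps around to the largest value
 * instead of going negative, so only the cap stops the loop.
 */
static long
countdown_unsigned(unsigned int n, long cap)
{
    long         iterations = 0;
    unsigned int i;

    for (i = n - 1; i >= 0; i--)
    {
        if (iterations >= cap)
            break;              /* without this the loop never ends */
        iterations++;
    }
    return iterations;
}
```

Calling `countdown_signed(4096, 1000000)` stops after 4096 iterations, while `countdown_unsigned(4096u, 1000000)` runs until the cap. Many compilers warn that the unsigned comparison is always true, which is one way to spot this class of bug.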
On Tue, 2 Dec 2008, Heikki Linnakangas wrote:
Date: Tue, 02 Dec 2008 20:47:19 +0200
From: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
To: ohp@pyrenet.fr
Cc: Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
pgsql-hackers list <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] cvs head initdb hangs on unixware
No, a simple printf of nodeno shows it going from 4096 all the way down
to 0, then starting again at 4096...
I wonder if the leftchild/rightchild definitions have something to do with
it...
--
Olivier PRENANT
From: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
Date: Wed, 03 Dec 2008 15:23:23 +0200
To: PostgreSQL-development <pgsql-hackers@postgresql.org>
CC: Tom Lane <tgl@sss.pgh.pa.us>
Subject: Re: Visibility map, partial vacuums
Heikki Linnakangas wrote:
Here's an updated version, with a lot of smaller cleanups, and using
relcache invalidation to notify other backends when the visibility map
fork is extended. I already committed the change to FSM to do the same.
I'm feeling quite satisfied to commit this patch early next week.
Committed.
I haven't done any doc changes for this yet. I think a short section in
the "database internal storage" chapter is probably in order, and the
fact that plain VACUUM skips pages should be mentioned somewhere. I'll
skim through references to vacuum and see what needs to be changed.
Hmm. It just occurred to me that I think this circumvented the
anti-wraparound vacuuming: a normal vacuum doesn't advance relfrozenxid
anymore. We'll need to disable the skipping when autovacuum is triggered
to prevent wraparound. VACUUM FREEZE does that already, but it's
unnecessarily aggressive in freezing.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
ohp@pyrenet.fr wrote:
>> Looking at fsm_rebuild_page, I wonder if the compiler is treating
>> "int" as an unsigned integer? That would cause an infinite loop.
> No, a simple printf of nodeno shows it starting at 4096 all the way
> down to 0, starting back at 4096...
> I wonder if leftchild/rightchild definitions has something to do with
> it...
With probably no relevance at all, I notice that this routine is
declared extern, although it is only referenced in its own file
apparently. Don't we have a tool that checks that?
cheers
andrew
ohp@pyrenet.fr wrote:
> No, a simple printf of nodeno shows it starting at 4096 all the way
> down to 0, starting back at 4096...
Hmm, it's probably looping in fsm_search_avail then. In a fresh cluster,
there shouldn't be any broken FSM pages that need rebuilding.
I'd like to see what the FSM page in question looks like. Could you try
to run initdb with "-d -n" options? I bet you'll get an infinite number
of lines like:
DEBUG: fixing corrupt FSM block 1, relation 123/456/789
Could you zip up the FSM file of that relation (a file called e.g.
"789_fsm") and send it over? Or the whole data directory; it shouldn't
be that big.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Andrew Dunstan wrote:
> With probably no relevance at all, I notice that this routine is
> declared extern, although it is only referenced in its own file
> apparently. Don't we have a tool that checks that?
Sure, src/tools/find_static.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On Wed, 3 Dec 2008, Heikki Linnakangas wrote:
Date: Wed, 03 Dec 2008 20:29:01 +0200
From: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
To: ohp@pyrenet.fr
Cc: Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
pgsql-hackers list <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] cvs head initdb hangs on unixware
> Hmm, it's probably looping in fsm_search_avail then. In a fresh cluster,
> there shouldn't be any broken FSM pages that need rebuilding.
You're right!
> I'd like to see what the FSM page in question looks like. Could you try to
> run initdb with "-d -n" options? I bet you'll get an infinite number of lines
> like:
> DEBUG: fixing corrupt FSM block 1, relation 123/456/789
Right again!
DEBUG: fixing corrupt FSM block 2, relation 1663/1/1255
> Could you zip up the FSM file of that relation (a file called e.g.
> "789_fsm"), and send it over? Or the whole data directory, it shouldn't be
> that big.
You get both.
BTW, this is an optimizer problem, not anything wrong with the code, but
I'd hate to have a -g compiled postmaster in prod :)
best regards,
--
Olivier PRENANT
ohp@pyrenet.fr wrote:
> you get both.
Thanks. Hmm, the FSM pages are full of zeros, as I would expect for a
just-created relation. fsm_search_avail should've returned quickly at
the top of the function in that case. Can you put an extra printf or
something at the top of the function, to print all the arguments? And
the value of fsmpage->fp_nodes[0].
> BTW, this is an optimizer problem, not anything wrong with the code, but
> I'd hate to have a -g compiled postmaster in prod :)
Yes, so it seems, although I wouldn't be surprised if it turns out to be
a bug in the new FSM code either.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Thu, 4 Dec 2008, Heikki Linnakangas wrote:
Date: Thu, 04 Dec 2008 13:19:15 +0200
From: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
To: ohp@pyrenet.fr
Cc: Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
pgsql-hackers list <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] cvs head initdb hangs on unixware
As you can see in the attached initdb.log, it seems fsm_search_avail is
called repeatedly and the args are sort of looping...
--
Olivier PRENANT
Attachments: initdb.log (text/plain; charset=US-ASCII)
ohp@pyrenet.fr writes:
> As you can see in attached initdb.log, it seems fsm_search_avail is called
> repeatedly and args are sort of looping...
That's expected, since the system is inserting a lot of tuples
successively. What it looks like to me is that the failing call is the
first one where the initial test *doesn't* result in falling out
immediately. So the probability is that there's something wrong with
the code that descends the tree.
Note that the all-zeroes pages in your dump are uninformative because
none of the real FSM data has been written to disk yet. We can see
from this trace that the code is dealing with not-all-zero pages.
regards, tom lane
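The "initial test" mentioned above can be pictured with a toy version of an FSM page. This is a sketch under stated assumptions, not the PostgreSQL code: it assumes the page is a binary tree stored in an array, where each non-leaf node holds the maximum of its two children, so that the root holds the largest free-space value on the page. The sizes and the names `toy_rebuild` and `toy_quick_exit` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NONLEAF 7                       /* inner nodes, 3 levels */
#define TOY_LEAVES  8
#define TOY_NODES   (TOY_NONLEAF + TOY_LEAVES)

/* Implicit tree layout: the children of node x sit at 2x+1 and 2x+2. */
#define leftchild(x)  (2 * (x) + 1)
#define rightchild(x) (2 * (x) + 2)

/*
 * Recompute every inner node as the max of its two children, walking
 * from the last inner node back up to the root.
 */
static void
toy_rebuild(unsigned char nodes[TOY_NODES])
{
    int nodeno;

    for (nodeno = TOY_NONLEAF - 1; nodeno >= 0; nodeno--)
    {
        unsigned char l;
        unsigned char r;

        assert(rightchild(nodeno) < TOY_NODES);
        l = nodes[leftchild(nodeno)];
        r = nodes[rightchild(nodeno)];
        nodes[nodeno] = (l > r) ? l : r;
    }
}

/* True when no leaf can satisfy the request; only the root is read. */
static bool
toy_quick_exit(const unsigned char nodes[TOY_NODES], unsigned char minvalue)
{
    return nodes[0] < minvalue;
}
```

With an all-zero page the root is zero, so any request with a non-zero `minvalue` returns at once. Only after a leaf has been given a large enough value does a search have to walk the tree, which is where the failing call would first differ from the earlier ones.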
Tom Lane wrote:
> That's expected, since the system is inserting a lot of tuples
> successively.
Right. I suspect it wasn't in the infinite loop yet. Try to run it for
*much* longer (it'll probably take much longer than usual because it's
printing all the debug stuff), until it gets stuck looping over the same
pages in the same relation.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Dear all,
On Mon, 8 Dec 2008, Heikki Linnakangas wrote:
Date: Mon, 08 Dec 2008 09:17:52 +0200
From: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: ohp@pyrenet.fr, Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
pgsql-hackers list <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] cvs head initdb hangs on unixware
The infinite loop occurs in fsm_search_avail when it is called for the
32nd time.
It loops between the restart: label and the goto restart.
The long (95M) initdb.log can be found at
ftp://ftp.pyrenet.fr/private/initdb.log
regards,
--
Olivier PRENANT
ohp@pyrenet.fr writes:
> the infinite loop occurs in fsm_search_avail when called for the 32nd
> time.
... which is the first time that the initial test doesn't make it fall
out immediately.
Would you add a couple more printouts, along the line of
    nodeno = target;
    while (nodeno > 0)
    {
+       fprintf(stderr, "ascend at node %d value %d\n",
+               nodeno, fsmpage->fp_nodes[nodeno]);
        if (fsmpage->fp_nodes[nodeno] >= minvalue)
            break;
        /*
         * Move to the right, wrapping around on same level if necessary,
         * then climb up.
         */
        nodeno = parentof(rightneighbor(nodeno));
    }
    /*
     * We're now at a node with enough free space, somewhere in the middle of
     * the tree. Descend to the bottom, following a path with enough free
     * space, preferring to move left if there's a choice.
     */
    while (nodeno < NonLeafNodesPerPage)
    {
        int leftnodeno = leftchild(nodeno);
        int rightnodeno = leftnodeno + 1;
        bool leftok = (leftnodeno < NodesPerPage) &&
            (fsmpage->fp_nodes[leftnodeno] >= minvalue);
        bool rightok = (rightnodeno < NodesPerPage) &&
            (fsmpage->fp_nodes[rightnodeno] >= minvalue);
+       fprintf(stderr, "descend at node %d value %d, leftnode %d value %d, rightnode %d value %d\n",
+               nodeno, fsmpage->fp_nodes[nodeno],
+               leftnodeno, fsmpage->fp_nodes[leftnodeno],
+               rightnodeno, fsmpage->fp_nodes[rightnodeno]);
        if (leftok)
            nodeno = leftnodeno;
        else if (rightok)
            nodeno = rightnodeno;
        else
(I'm assuming we can print possibly-off-the-end array elements without dumping
core; which is bogus in general but I expect we can get away with it
for this purpose.)
Also, we don't really need 94MB of log to convince us it's an
infinite loop ;-)
regards, tom lane
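For readers who want to step through the two loops quoted above outside the server, here is a minimal stand-alone sketch on a 15-node toy tree. Several things are assumptions rather than facts from this thread: the definitions of `leftchild`, `rightchild`, `parentof` and `rightneighbor` follow the usual implicit-binary-tree indexing, the sizes are made up, and `toy_search` is a hypothetical name. Where the real function rebuilds the page and restarts, the sketch simply returns -1.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NONLEAF 7
#define TOY_LEAVES  8
#define TOY_NODES   (TOY_NONLEAF + TOY_LEAVES)

#define leftchild(x)  (2 * (x) + 1)
#define rightchild(x) (2 * (x) + 2)
#define parentof(x)   (((x) - 1) / 2)

/* Step right on the same level, wrapping to that level's leftmost node. */
static int
rightneighbor(int x)
{
    x++;
    /*
     * Leftmost nodes sit at 2^level - 1.  If x + 1 is a power of two we
     * ran off the end of our level, so go back to its leftmost node.
     */
    if (((x + 1) & x) == 0)
        x = parentof(x);
    return x;
}

/*
 * Return the 0-based leaf slot of a leaf with value >= minvalue, starting
 * at leaf slot "start", or -1 if none exists or the tree is inconsistent.
 */
static int
toy_search(const unsigned char nodes[TOY_NODES],
           unsigned char minvalue, int start)
{
    int nodeno;

    assert(start >= 0 && start < TOY_LEAVES);

    if (nodes[0] < minvalue)
        return -1;              /* quick exit: the root says nothing fits */

    /* Ascend: move right and up until a node promises enough space. */
    nodeno = TOY_NONLEAF + start;
    while (nodeno > 0)
    {
        if (nodes[nodeno] >= minvalue)
            break;
        nodeno = parentof(rightneighbor(nodeno));
    }

    /* Descend: follow a child with enough space, preferring the left. */
    while (nodeno < TOY_NONLEAF)
    {
        int  leftnodeno = leftchild(nodeno);
        int  rightnodeno = rightchild(nodeno);
        bool leftok = nodes[leftnodeno] >= minvalue;
        bool rightok = nodes[rightnodeno] >= minvalue;

        if (leftok)
            nodeno = leftnodeno;
        else if (rightok)
            nodeno = rightnodeno;
        else
            return -1;          /* parent promised space no child has */
    }
    return nodeno - TOY_NONLEAF;
}
```

On a consistent tree the descent always finds a child that is large enough, so the final `else` branch is reached only when a parent overstates what its children hold. A miscompiled `leftok`/`rightok` test would take that branch on a perfectly healthy page.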
Hi Tom,
On Mon, 8 Dec 2008, Tom Lane wrote:
Date: Mon, 08 Dec 2008 13:15:28 -0500
From: Tom Lane <tgl@sss.pgh.pa.us>
To: ohp@pyrenet.fr
Cc: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>,
Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
pgsql-hackers list <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] cvs head initdb hangs on unixware
> Also, we don't really need 94MB of log to convince us it's an
> infinite loop ;-)
Oops, sorry.
I first misread your mail and added only the first fprintf. While I was
uploading a 400M initdb.log, I went back to add the second one.
Guess what! With the "descend at node" fprintf in place, everything
goes well. The optimizer definitely does something weird around the
definition/assignment of leftok/rightok.
--
Olivier PRENANT
ohp@pyrenet.fr wrote:
> Guess what! with the fprintf .. descending node... in place, everything
> goes well. The optimizer definitely does something weird along the
> definition/assignment of leftok/rightok..
Could you generate the assembler code for the fsmSearch function with and
without optimization? Without the extra printf, of course :-). It should
show the difference.
Zdenek
ohp@pyrenet.fr writes:
> Guess what! with the fprintf .. descending node... in place, everything
> goes well. The optimizer definitely does something weird along the
> definition/assignment of leftok/rightok..
Hmm, so the problem is in that second loop. The trick is to pick some
reasonably non-ugly code change that makes the problem go away.
The first thing I'd try is to get rid of the overly cute optimization
    int rightnodeno = leftnodeno + 1;
and make it just read
    int rightnodeno = rightchild(nodeno);
If that doesn't work, we might try refactoring the code enough to get
rid of the goto, but that looks a little bit tedious.
regards, tom lane
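The two spellings of rightnodeno discussed above should name the same node. A quick check, assuming the usual implicit-tree definitions of `leftchild` and `rightchild` (an assumption; the thread does not quote the macros) and using a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed definitions, following standard implicit binary tree indexing. */
#define leftchild(x)  (2 * (x) + 1)
#define rightchild(x) (2 * (x) + 2)

/* Check that both spellings name the same node for indexes 0..limit-1. */
static bool
right_spellings_agree(int limit)
{
    int nodeno;

    assert(limit >= 0);
    for (nodeno = 0; nodeno < limit; nodeno++)
    {
        int leftnodeno = leftchild(nodeno);

        if (leftnodeno + 1 != rightchild(nodeno))
            return false;
    }
    return true;
}
```

If the definitions are as assumed, the two forms are arithmetically identical, so the suggested change can only help by giving the optimizer a different expression to work on, not by changing what the code computes.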
On Tue, 9 Dec 2008, Tom Lane wrote:
Date: Tue, 09 Dec 2008 09:23:06 -0500
From: Tom Lane <tgl@sss.pgh.pa.us>
To: ohp@pyrenet.fr
Cc: Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>,
Zdenek Kotala <Zdenek.Kotala@Sun.COM>,
pgsql-hackers list <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] cvs head initdb hangs on unixware
I tried that, as well as moving the leftok/rightok declarations outside
the loop and refactoring the code that assigns leftok and rightok.
Nothing worked!
Regards,
--
Olivier PRENANT