GiST concurrency commited

Started by Teodor Sigaev over 20 years ago, 26 messages
#1Teodor Sigaev
teodor@sigaev.ru

Do we have a list named something like 'test focusing for 8.1'? If it exists,
GiST concurrency and recovery testing should be added to it, especially
recovery after a crash. Of course, Oleg and I are now going to begin a large
test program.

While running a test with concurrent select/insert/update/delete/vacuum/vacuum
full, I found that postgres sometimes crashes in index_beginscan_internal on
FunctionCall3, because the structure 'procedure' becomes zeroed. As I understand
it, LockRelation can invalidate part of the Relation structure, so I moved
GET_REL_PROCEDURE after LockRelation. It seems to me that this patch should be
backpatched, or another fix is needed. The problem occurred 2-4 times per
million statements executed by 4 flows.
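
The ordering issue described above can be illustrated with a toy model. This is a minimal Python sketch, not PostgreSQL code: `ToyRelation`, `lock_relation`, and `get_rel_procedure` are hypothetical stand-ins for the Relation structure, LockRelation, and GET_REL_PROCEDURE, and the way invalidation zeroes cached data is an assumption based only on the description in this message.

```python
class ToyRelation:
    """Hypothetical stand-in for a cached relation entry."""

    def __init__(self):
        self.procedure_cache = {}  # cached function lookup info

    def invalidate(self):
        # Assumed behaviour: lookup info handed out earlier is zeroed
        # in place, and the cache starts over empty.
        for info in self.procedure_cache.values():
            info["fn"] = None
        self.procedure_cache = {}


def get_rel_procedure(rel, name):
    """Return cached lookup info, filling the cache on first use."""
    if name not in rel.procedure_cache:
        rel.procedure_cache[name] = {"fn": lambda *args: ("scan", args)}
    return rel.procedure_cache[name]


def lock_relation(rel, pending_invalidation):
    """Acquire the lock; pending invalidations are processed here."""
    if pending_invalidation:
        rel.invalidate()


def begin_scan_unsafe(rel, pending_invalidation):
    procedure = get_rel_procedure(rel, "beginscan")  # fetched too early
    lock_relation(rel, pending_invalidation)
    return procedure["fn"]  # zeroed if the lock processed an invalidation


def begin_scan_safe(rel, pending_invalidation):
    lock_relation(rel, pending_invalidation)
    procedure = get_rel_procedure(rel, "beginscan")  # fetched under the lock
    return procedure["fn"]
```

In this model the unsafe ordering only misbehaves when an invalidation happens to be pending at lock time, which fits a failure that shows up only a few times per million statements.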

And there is one more problem: approximately once per 2-4 million
statements, I got traps:
TRAP: FailedAssertion("!((*curpage)->offsets_used == num_tuples)", File:
"vacuum.c", Line: 2766)
LOG: server process (PID 15847) was terminated by signal 6
Sorry, but I couldn't debug this trap, and my knowledge of this piece of code
is very limited. Postgres didn't create a core file. I don't believe this
problem is related to my GiST framework, because it concerns heap pages. I
suspect the trap occurs during a concurrent vacuum, but I am not sure.

PS
My concurrency testing scripts:
http://www.sigaev.ru/gist/
concur.pl - generator of SQL statements
concur.sh - simple wrapper around concur.pl that reinitializes the db and creates the db and table.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#1)
Re: GiST concurrency commited

Teodor Sigaev <teodor@sigaev.ru> writes:

While running a test with concurrent
select/insert/update/delete/vacuum/vacuum full, I found that postgres
sometimes crashes in index_beginscan_internal on FunctionCall3, because
the structure 'procedure' becomes zeroed. As I understand it, LockRelation
can invalidate part of the Relation structure, so I moved
GET_REL_PROCEDURE after LockRelation.

Oooh, good catch.

It seems to me that this patch
should be backpatched, or another fix is needed.

No, it's not an issue in the back branches, because until recently
GET_REL_PROCEDURE only fetched the function OID.

And there is one more problem: approximately once per 2-4 million
statements, I got traps:
TRAP: FailedAssertion("!((*curpage)->offsets_used == num_tuples)", File:
"vacuum.c", Line: 2766)
LOG: server process (PID 15847) was terminated by signal 6

Odd. Will look at it later (after feature freeze), if you don't find
the cause beforehand.

regards, tom lane

#3Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Tom Lane (#2)
Re: GiST concurrency commited

I think the whole GiST limitations page can be removed now...

http://developer.postgresql.org/docs/postgres/limitations.html

Chris

#4Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#2)
Re: GiST concurrency commited

And there is one more problem: approximately once per 2-4 million
statements, I got traps:
TRAP: FailedAssertion("!((*curpage)->offsets_used == num_tuples)", File:
"vacuum.c", Line: 2766)
LOG: server process (PID 15847) was terminated by signal 6

Odd. Will look at it later (after feature freeze), if you don't find
the cause beforehand.

It's definitely a bug in the vacuum code: I got the same trap without any GiST
indexes (to reproduce, just comment out the 'create index' command in my script).

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#5Qingqing Zhou
zhouqq@cs.toronto.edu
In reply to: Teodor Sigaev (#1)
Re: GiST concurrency commited

"Teodor Sigaev" <teodor@sigaev.ru> writes

concur.pl - generator of SQL statements

retrieving it is forbidden ...

Regards,
Qingqing

#6Teodor Sigaev
teodor@sigaev.ru
In reply to: Qingqing Zhou (#5)
Re: GiST concurrency commited

Sorry, fixed.

Qingqing Zhou wrote:

"Teodor Sigaev" <teodor@sigaev.ru> writes

concur.pl - generator of SQL statements

retrieving it is forbidden ...

Regards,
Qingqing

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#7Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Christopher Kings-Lynne (#3)
Re: GiST concurrency commited

Christopher Kings-Lynne wrote:

I think the whole GiST limitations page can be removed now...

http://developer.postgresql.org/docs/postgres/limitations.html

Done.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#1)
VACUUM/t_ctid bug (was Re: GiST concurrency commited)

Awhile back, Teodor Sigaev <teodor@sigaev.ru> wrote:

And there is one more problem: approximately once per 2-4 million
statements, I got traps:
TRAP: FailedAssertion("!((*curpage)->offsets_used == num_tuples)", File:
"vacuum.c", Line: 2766)
LOG: server process (PID 15847) was terminated by signal 6
Sorry, but I couldn't debug this trap, and my knowledge of this piece of code
is very limited. Postgres didn't create a core file. I don't believe this
problem is related to my GiST framework, because it concerns heap pages. I
suspect the trap occurs during a concurrent vacuum, but I am not sure.

PS
My concurrency testing scripts:
http://www.sigaev.ru/gist/
concur.pl - generator of SQL statements
concur.sh - simple wrapper around concur.pl that reinitializes the db and creates the db and table.

I have committed changes that I believe fix this problem:
http://archives.postgresql.org/pgsql-committers/2005-08/msg00213.php
But it needs more testing. Would you update to CVS tip and see if you
still see the failure?

Also, if anyone else has some vacuum + concurrent update test cases,
any testing you can do in CVS tip would be useful. This patch is big
and ugly enough that back-patching it into all the supported back
branches is a pretty scary prospect. I don't think we have a lot of
choice --- it is a data-loss risk --- but we need to beat the heck
out of the CVS-tip version before we start pushing it into the release
branches.

My current intention is to leave it just in CVS tip for the next few
days, and not to start developing back-branch versions until after
we've made the first 8.1 beta release. The back-ports are going to
be painful (the code involved has changed often enough that I fear
each branch will need a custom tailored patch) ... so I really don't
want to start without some confidence that the CVS-tip patch is right.

In other words ... if you can test this ... HELP!!!

regards, tom lane

#9Gavin Sherry
swm@linuxworld.com.au
In reply to: Tom Lane (#8)
Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)

On Sat, 20 Aug 2005, Tom Lane wrote:

I have committed changes that I believe fix this problem:
http://archives.postgresql.org/pgsql-committers/2005-08/msg00213.php
But it needs more testing. Would you update to CVS tip and see if you
still see the failure?

I've written some quick scripts. One just vacuums constantly (999 vacuums
to 1 vacuum full) while three other scripts randomly insert
into, update and delete from 3 tables. There's a mix of small and large
transactions. The tables have a single int column. It is set up to run 3
million transactions across the 3 scripts.

I will try and jump onto one of the larger OSDL machines to test as well.

Gavin

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gavin Sherry (#9)
Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)

Gavin Sherry <swm@linuxworld.com.au> writes:

I've written some quick scripts. One just vacuums constantly (999 vacuums
to 1 vacuum full) while three other scripts randomly insert
into, update and delete from 3 tables. There's a mix of small and large
transactions. The tables have a single int column. It is set up to run 3
million transactions across the 3 scripts.

Note that since the issues have mainly to do with update chains, it'd be
good to stress cases where a row is updated multiple times before being
deleted. And use at least one long-running transaction, so that VACUUM
can't just throw away the update chain.
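
The kind of workload suggested here can be sketched as a small statement generator. This is only an illustration in Python: the function name, the table layout with `id` and `val` columns, and the use of a plain `BEGIN;` to hold a transaction open are assumptions, not part of any script mentioned in this thread.

```python
def update_chain_workload(table, row_id, updates):
    """Return (reader_statements, writer_statements) for two sessions."""
    # Session 1: a long-running transaction. It is meant to stay open
    # while the writer runs, so old row versions remain of interest and
    # VACUUM cannot simply discard the update chain.
    reader = [
        "BEGIN;",
        f"SELECT count(*) FROM {table};",
        "COMMIT;",  # issued only after the writer has finished
    ]
    # Session 2: insert one row, update it several times, then delete it.
    writer = [f"INSERT INTO {table} (id, val) VALUES ({row_id}, 0);"]
    for step in range(1, updates + 1):
        writer.append(f"UPDATE {table} SET val = {step} WHERE id = {row_id};")
    writer.append(f"DELETE FROM {table} WHERE id = {row_id};")
    return reader, writer


reader, writer = update_chain_workload("chain_test", 7, 3)
```

A third session running VACUUM in a loop while the reader's transaction is still open would complete the picture; how the sessions are driven (psql, DBI, or something else) is left open.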

regards, tom lane

#11Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#8)
Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)

In other words ... if you can test this ... HELP!!!

I'll run tests.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#12Gavin Sherry
swm@linuxworld.com.au
In reply to: Tom Lane (#10)
Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)

On Sat, 20 Aug 2005, Tom Lane wrote:

Gavin Sherry <swm@linuxworld.com.au> writes:

I've written some quick scripts. One just vacuums constantly (999 vacuums
to 1 vacuum full) while three other scripts randomly insert
into, update and delete from 3 tables. There's a mix of small and large
transactions. The tables have a single int column. It is set up to run 3
million transactions across the 3 scripts.

Note that since the issues have mainly to do with update chains, it'd be
good to stress cases where a row is updated multiple times before being
deleted. And use at least one long-running transaction, so that VACUUM
can't just throw away the update chain.

Right.

I modified the test to have multiple updates of a given row mixed with
concurrent long-running read transactions. Vacuum was running repeatedly
in a concurrent session. I did not encounter any problems.

However, the results are inconclusive since I ran the same test against
HEAD from 10 days ago and didn't manage to trigger the problem Teodor's
script did. I'll take a better look tomorrow.

Gavin

#13Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#8)
Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)

I have committed changes that I believe fix this problem:
http://archives.postgresql.org/pgsql-committers/2005-08/msg00213.php
But it needs more testing. Would you update to CVS tip and see if you
still see the failure?

It seems the patch works correctly. My tests passed with approximately 1e8 SQL
statements without any failure. The tests ran on two boxes, a PIII/1133MHz and
a Quad Xeon/500MHz, with four threads.

Next, I'm going to increase concurrency up to 12 parallel threads.

PS GiST passed these tests too.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#14Teodor Sigaev
teodor@sigaev.ru
In reply to: Teodor Sigaev (#13)
Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)

Next, I'm going to increase concurrency up to 12 parallel threads.

All is OK: the test passed with approximately 40 million statements.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#15Hannu Krosing
hannu@skype.net
In reply to: Teodor Sigaev (#14)
Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)

On Wed, 2005-08-24 at 15:52 +0400, Teodor Sigaev wrote:

Next, I'm going to increase concurrency up to 12 parallel threads.

All is OK: the test passed with approximately 40 million statements.

I have sent a patch to the patches list enabling concurrent vacuums to
actually reclaim space while another long vacuum is running, i.e.
vacuums will no longer block each other from removing deleted tuples.

see:

http://archives.postgresql.org/pgsql-patches/2005-08/msg00304.php

Could you perhaps test this patch as well, since you already have a
setup for testing parallel vacuums under big loads?

Or perhaps you can share the setup/scripts/data so that I could run your
test myself as well?

--
Hannu Krosing <hannu@skype.net>

#16Teodor Sigaev
teodor@sigaev.ru
In reply to: Hannu Krosing (#15)
Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)

http://archives.postgresql.org/pgsql-patches/2005-08/msg00304.php

Could you perhaps test this patch as well, since you already have a
setup for testing parallel vacuums under big loads?

Ok, I'll do it.

Or perhaps you can share the setup/scripts/data so that I could run your
test myself as well?

Statements generator:
http://www.sigaev.ru/gist/concur.pl
Usage:
./concur.pl -d DATABASE [-n N] [-c N] [-l]
-d DATABASE - DATABASE name
-n N - number of rows
-c N - number of flows
-l - linear mode

The script guarantees the same result (i.e. the same data in the table) for the
same -n and -c options; the result doesn't depend on the -l option. The script
also guarantees the absence of deadlocks.

% perl concur.pl -d qq -n 100000 -c 4
Start: parallel mode with 4 flows
3 flow finish. Stats: ni:25000 nu:555 nd:81 nv:4(nf:1) nt:232
0 flow finish. Stats: ni:25000 nu:554 nd:77 nv:2(nf:0) nt:247
1 flow finish. Stats: ni:25000 nu:548 nd:65 nv:4(nf:1) nt:249
2 flow finish. Stats: ni:25000 nu:552 nd:79 nv:7(nf:2) nt:263
All flow finish; status: 0; elapsed time: 159.25 sec

The script prints some statistics per flow:
ni - number of insert statements (the sum of these should equal -n N)
nu - number of update statements
nd - number of delete statements
nv - number of vacuums (including full vacuums)
nf - number of full vacuums
nt - number of transactions
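
The per-flow statistics can be checked mechanically. The following Python sketch parses the sample output shown above and verifies that the insert counts add up to the -n value; the regular expression and the function name `parse_flow_stats` are assumptions based only on the line format in this message, not part of concur.pl.

```python
import re

# Matches lines such as:
#   3 flow finish. Stats: ni:25000 nu:555 nd:81 nv:4(nf:1) nt:232
STATS_RE = re.compile(
    r"^(?P<flow>\d+) flow finish\. Stats: "
    r"ni:(?P<ni>\d+) nu:(?P<nu>\d+) nd:(?P<nd>\d+) "
    r"nv:(?P<nv>\d+)\(nf:(?P<nf>\d+)\) nt:(?P<nt>\d+)$"
)


def parse_flow_stats(output):
    """Return one dict of integer counters per 'flow finish' line."""
    flows = []
    for line in output.splitlines():
        match = STATS_RE.match(line.strip())
        if match:
            flows.append({k: int(v) for k, v in match.groupdict().items()})
    return flows


SAMPLE = """\
Start: parallel mode with 4 flows
3 flow finish. Stats: ni:25000 nu:555 nd:81 nv:4(nf:1) nt:232
0 flow finish. Stats: ni:25000 nu:554 nd:77 nv:2(nf:0) nt:247
1 flow finish. Stats: ni:25000 nu:548 nd:65 nv:4(nf:1) nt:249
2 flow finish. Stats: ni:25000 nu:552 nd:79 nv:7(nf:2) nt:263
All flow finish; status: 0; elapsed time: 159.25 sec
"""

flows = parse_flow_stats(SAMPLE)
total_inserts = sum(f["ni"] for f in flows)  # should equal the -n argument
```

For the run above, started with -n 100000, the four ni values sum to 100000 as expected.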

Simple wrapper for manipulating the test table and running the test in a loop
(you should edit it to set the right paths):
http://www.sigaev.ru/gist/concur.sh

Those scripts were written to test GiST concurrency.

I suspect some changes to the generator are needed to increase the number of
updates and vacuums for your purpose.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#16)
Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)

Teodor Sigaev <teodor@sigaev.ru> writes:

http://www.sigaev.ru/gist/concur.pl
http://www.sigaev.ru/gist/concur.sh

BTW, these scripts seem to indicate that there's a GIST or
contrib/intarray problem in the 8.0 branch. I was trying to use 'em
to test REL8_0_STABLE branch tip to verify my t_ctid chain backpatch,
and I pretty consistently see "Problem with update":

Start: parallel mode with 4 flows
Problem with update {77,77}:0 count:1 at concur.pl line 91.
Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
Problem with update {43,24}:3 count:1 at concur.pl line 91.
Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
Problem with update {43,43}:2 count:1 at concur.pl line 91.
Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
1 flow finish. Stats: ni:75000 nu:1661 nd:216 nv:13(nf:3) nt:780
All flow finish; status: 255; elapsed time: 265.48 sec

Is this something that can be fixed for 8.0.4?

regards, tom lane

#18Teodor Sigaev
teodor@sigaev.ru
In reply to: Hannu Krosing (#15)
Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)

http://archives.postgresql.org/pgsql-patches/2005-08/msg00304.php

Could you perhaps test this patch as well, since you already have a
setup for testing parallel vacuums under big loads?

I didn't find any problem with your patch during testing with 1e8 statements...

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#19Hannu Krosing
hannu@skype.net
In reply to: Teodor Sigaev (#18)
Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)

On Fri, 2005-08-26 at 16:47 +0400, Teodor Sigaev wrote:

http://archives.postgresql.org/pgsql-patches/2005-08/msg00304.php

Could you perhaps test this patch as well, since you already have a
setup for testing parallel vacuums under big loads?

I didn't find any problem with your patch during testing with 1e8 statements...

Thank you very much!

--
Hannu Krosing <hannu@skype.net>

#20Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#17)
Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)

The problem found in GiST isn't simple to resolve. I'm working on it. The
problem concerns the update query...

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#21Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#17)
Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)

Fixed in 8.0, 7.4 and 7.3 branches.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#22Mario Weilguni
mweilguni@sime.com
In reply to: Teodor Sigaev (#21)
Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)

On Tuesday, 30 August 2005 11:25, Teodor Sigaev wrote:

Fixed in 8.0, 7.4 and 7.3 branches.

Tom Lane wrote:

Teodor Sigaev <teodor@sigaev.ru> writes:

http://www.sigaev.ru/gist/concur.pl
http://www.sigaev.ru/gist/concur.sh

BTW, these scripts seem to indicate that there's a GIST or
contrib/intarray problem in the 8.0 branch. I was trying to use 'em
to test REL8_0_STABLE branch tip to verify my t_ctid chain backpatch,
and I pretty consistently see "Problem with update":

Start: parallel mode with 4 flows
Problem with update {77,77}:0 count:1 at concur.pl line 91.
Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
Problem with update {43,24}:3 count:1 at concur.pl line 91.
Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
Problem with update {43,43}:2 count:1 at concur.pl line 91.
Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
1 flow finish. Stats: ni:75000 nu:1661 nd:216 nv:13(nf:3) nt:780
All flow finish; status: 255; elapsed time: 265.48 sec

Is this something that can be fixed for 8.0.4?

regards, tom lane

Since 7.4 we have had trouble with ltree (occasional corruption of the buffer
cache, not on-disk); might this bug be somehow related to the ltree problem?
7.2 was rock-stable with ltree.

Best regards,
Mario Weilguni

#23Teodor Sigaev
teodor@sigaev.ru
In reply to: Mario Weilguni (#22)
Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)

Since 7.4 we have had trouble with ltree (occasional corruption of the buffer
cache, not on-disk); might this bug be somehow related to the ltree problem?
7.2 was rock-stable with ltree.

Not sure. The fixed bug was (@ is the contains operation):

update wow set a = a || '{101}'::int[] where a @ '{1,2,3}';
select a from wow where a @ '{1,2,3}' and not a @ '{101}';

After the update query, the select must not find any rows, but it did. The
problem was in the GiST code, so all GiST indexes were affected.
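
The invariant behind the two queries above can be modelled without a database. This Python sketch only mirrors the query semantics using lists and sets; it says nothing about how the GiST index itself works, and the function names and sample rows are made up for illustration.

```python
def contains(array, query):
    """Model of the contains operator: every query element is in array."""
    return set(query) <= set(array)


def run_update(rows):
    """Model of: update wow set a = a || '{101}' where a @ '{1,2,3}'."""
    return [a + [101] if contains(a, [1, 2, 3]) else a for a in rows]


def run_select(rows):
    """Model of: select a from wow where a @ '{1,2,3}' and not a @ '{101}'."""
    return [a for a in rows
            if contains(a, [1, 2, 3]) and not contains(a, [101])]


rows = [[1, 2, 3], [3, 2, 1, 9], [1, 2], [4, 5, 6]]
after = run_update(rows)
leftover = run_select(after)  # must be empty if the update reached every row
```

Any row left over after the update is one that the update's search missed while the select's search found it, which is the symptom the test script reports.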

Can you say more about your trouble?

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#24Mario Weilguni
mweilguni@sime.com
In reply to: Teodor Sigaev (#23)
Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)

On Tuesday, 30 August 2005 12:19, Teodor Sigaev wrote:

Since 7.4 we have had trouble with ltree (occasional corruption of the buffer
cache, not on-disk); might this bug be somehow related to the ltree problem? 7.2
was rock-stable with ltree.

Not sure. The fixed bug was (@ is the contains operation):

update wow set a = a || '{101}'::int[] where a @ '{1,2,3}';
select a from wow where a @ '{1,2,3}' and not a @ '{101}';

After the update query, the select must not find any rows, but it did. The
problem was in the GiST code, so all GiST indexes were affected.

Can you say more about your trouble?

We have queries that use ltree for sorting too; the sort looks like this:
order by subpath(ltreefield, 0, nlevel(ltreefield) - 1)

But concurrency leads to a bug that results in an SQL error:
ERROR: invalid positions

Now we use locking to prevent concurrent access in the most heavily used
concurrent part of the program, and the problem is extremely rare now: I had
only 4 occurrences in one year, but it still happens (there are other access
paths that do not use locking, but they are rarely used).

It seems the ltree length parameter is set to 0 in the tuples, while the content
itself is still there. Example:
Say the tuple previously had treefield='1.2.3.4.5'.
After the occurrence of the error, I get: treefield='' (empty, but not null)

Using a tool Tom Lane told me to use, I checked it, and on-disk I had still
"1.2.3.4.5", but the length parameter of the ltree column was 0 (sorry, I was
wrong in my first mail, on-disk was broken too.)

Might this be somehow related to the intarray bugs?

Best regards,
Mario Weilguni

p.s.: I tried hard to create a self-contained test for tracking this down, but
failed.

#25Teodor Sigaev
teodor@sigaev.ru
In reply to: Mario Weilguni (#24)
Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)

We have queries that use ltree for sorting too; the sort looks like this:
order by subpath(ltreefield, 0, nlevel(ltreefield) - 1)

But concurrency leads to a bug that results in an SQL error:
ERROR: invalid positions

contrib_regression=# select nlevel(t), subpath(t, 0, nlevel(t)-1) from
text2ltree('1.2.3.4.5') as t;
nlevel | subpath
--------+---------
5 | 1.2.3.4
(1 row)

contrib_regression=# select nlevel(t), subpath(t, 0, nlevel(t)-1) from
text2ltree('1') as t;
nlevel | subpath
--------+---------
1 |
(1 row)

contrib_regression=# select nlevel(t), subpath(t, 0, nlevel(t)-1) from
text2ltree('') as t;
ERROR: invalid positions
contrib_regression=#

Those are incorrect arguments for subpath().
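
The three cases above can be restated as a small model. This Python sketch reproduces only the behaviour visible in the psql session for the expression subpath(t, 0, nlevel(t) - 1); it is not ltree's implementation, and the function name `parent_path` is hypothetical.

```python
def parent_path(path):
    """Model of subpath(t, 0, nlevel(t) - 1) for a dotted label path.

    Mirrors only the three cases shown in the session above.
    """
    labels = path.split(".") if path else []
    nlevel = len(labels)
    length = nlevel - 1
    if length < 0:
        # An empty path has nlevel 0, so the expression asks for length -1.
        raise ValueError("invalid positions")
    return ".".join(labels[:length])
```

Under this model the error only appears once a value has become empty, which matches the report that the failing rows show treefield=''.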

It seems the ltree length parameter is set to 0 in the tuples, while the content
itself is still there. Example:
Say the tuple previously had treefield='1.2.3.4.5'.
After the occurrence of the error, I get: treefield='' (empty, but not null)

Using a tool Tom Lane told me to use, I checked it, and on-disk I had still
"1.2.3.4.5", but the length parameter of the ltree column was 0 (sorry, I was
wrong in my first mail, on-disk was broken too.)

Interesting... But without a test suite or more information I can't help.
How often do updates/inserts of the table and/or the ltree column occur? Vacuum?

Might this be somehow related to the intarray bugs?

No, except in the case where you update your table with something like:
update tbl set ltreefield=... where ltreefield ...;

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#21)
Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)

Teodor Sigaev <teodor@sigaev.ru> writes:

Is this something that can be fixed for 8.0.4?

Fixed in 8.0, 7.4 and 7.3 branches.

Excellent news. Thanks.

regards, tom lane