pg_autovacuum w/ dbt2

Started by Mark Wongabout 21 years ago22 messages
#1Mark Wong
markw@osdl.org

After all this time I finally got around to vacuuming the database
with dbt2 with pg_autovacuum. :)
http://www.osdl.org/projects/dbt2dev/results/dev4-010/215/

Doesn't look so good though, probably because I'm not using optimal
settings with pg_autovacuum. So far I have only tried the default
settings (running without any arguments, except -D).

The only thing that's peculiar is a number of unexpected rollbacks
across all of the transactions. I suspect it was something to do with
these messages coming from pg_autovacuum:

[2004-12-20 15:48:18 PST] ERROR: Can not refresh statistics information from the database dbt2.
[2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district"
]

This is with 8.0rc1. I can get rc2 installed since it just came out.
So let me know what I can try and what not.

Mark

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Wong (#1)
Re: pg_autovacuum w/ dbt2

Mark Wong <markw@osdl.org> writes:

[2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district"
]

Yikes. Is this reproducible?

regards, tom lane

#3Matthew T. O'Connor
matthew@zeut.net
In reply to: Mark Wong (#1)
Re: pg_autovacuum w/ dbt2

Mark Wong wrote:

After all this time I finally got around to vacuuming the database
with dbt2 with pg_autovacuum. :)
http://www.osdl.org/projects/dbt2dev/results/dev4-010/215/

Thanks!

Doesn't look so good though, probably because I'm not using optimal
settings with pg_autovacuum. So far I have only tried the default
settings (running without any arguments, except -D).

I don't know what you mean by "Not Good" since I don't have graphs from
a similar test without pg_autovacuum handy. Do you have a link to such
a test?

As for better pg_autovacuum settings, It appears that the little
performance dips are happening about once every 5 minutes, which if I
remember correctly is the default sleep time. You might try playing
with the lazy vacuum settings to see if that smooths out the curve.
Beyond that all you can do is play with the thresholds to see if there
is a better sweet spot than the defaults (which by the way I have no
confidence in, they were just conservative guesses)

The only thing that's peculiar is a number of unexpected rollbacks
across all of the transactions. I suspect it was something to do with
these messages coming from pg_autovacuum:

[2004-12-20 15:48:18 PST] ERROR: Can not refresh statistics information from the database dbt2.
[2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district"
]

Not sure what this is all about, but if you turn up the debug level to 4
or greater (pg_autovacuum -d4), pg_autovacuum will log the query that is
causing the problems, that would be helpful output to have.

This is with 8.0rc1. I can get rc2 installed since it just came out.
So let me know what I can try and what not.

I don't think anything has changed for pg_autovacuum between rc1 and rc2.

thanks again for the good work!!!

#4Mark Wong
markw@osdl.org
In reply to: Tom Lane (#2)
Re: pg_autovacuum w/ dbt2

On Tue, Dec 21, 2004 at 02:23:41PM -0500, Tom Lane wrote:

Mark Wong <markw@osdl.org> writes:

[2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district"
]

Yikes. Is this reproducible?

regards, tom lane

Yes, and I think there is one for each of the rollbacks that are
occuring in the workload. Except for the 1% that's supposed to happen
for the new-order transaction.

Mark

#5Mark Wong
markw@osdl.org
In reply to: Matthew T. O'Connor (#3)
Re: pg_autovacuum w/ dbt2

The overall throughput is better for a run like this:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/207/

A drop from 3865 to 2679 (31%) by just adding pg_autovacuum. That's
what I meant by "not good". :)

I'll start with the additional debug messages, with 8.0rc2, before
I start changing the other settings, if that sounds good.

Mark

Show quoted text

On Tue, Dec 21, 2004 at 02:33:57PM -0500, Matthew T. O'Connor wrote:

Mark Wong wrote:

After all this time I finally got around to vacuuming the database
with dbt2 with pg_autovacuum. :)
http://www.osdl.org/projects/dbt2dev/results/dev4-010/215/

Thanks!

Doesn't look so good though, probably because I'm not using optimal
settings with pg_autovacuum. So far I have only tried the default
settings (running without any arguments, except -D).

I don't know what you mean by "Not Good" since I don't have graphs from
a similar test without pg_autovacuum handy. Do you have a link to such
a test?

As for better pg_autovacuum settings, It appears that the little
performance dips are happening about once every 5 minutes, which if I
remember correctly is the default sleep time. You might try playing
with the lazy vacuum settings to see if that smooths out the curve.
Beyond that all you can do is play with the thresholds to see if there
is a better sweet spot than the defaults (which by the way I have no
confidence in, they were just conservative guesses)

The only thing that's peculiar is a number of unexpected rollbacks
across all of the transactions. I suspect it was something to do with
these messages coming from pg_autovacuum:

[2004-12-20 15:48:18 PST] ERROR: Can not refresh statistics information from the database dbt2.
[2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district"
]

Not sure what this is all about, but if you turn up the debug level to 4
or greater (pg_autovacuum -d4), pg_autovacuum will log the query that is
causing the problems, that would be helpful output to have.

This is with 8.0rc1. I can get rc2 installed since it just came out.
So let me know what I can try and what not.

I don't think anything has changed for pg_autovacuum between rc1 and rc2.

thanks again for the good work!!!

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Wong (#4)
Re: pg_autovacuum w/ dbt2

Mark Wong <markw@osdl.org> writes:

On Tue, Dec 21, 2004 at 02:23:41PM -0500, Tom Lane wrote:

Mark Wong <markw@osdl.org> writes:

[2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district"

Yikes. Is this reproducible?

Yes, and I think there is one for each of the rollbacks that are
occuring in the workload. Except for the 1% that's supposed to happen
for the new-order transaction.

Well, we need to find out what's causing that. There are two possible
sources of that error (one elog in src/backend/access/nbtree/nbtinsert.c,
and one in src/backend/access/nbtree/nbtpage.c) and neither of them
should ever fire.

If you want to track it yourself, please change those elog(ERROR)s to
elog(PANIC) so that they'll generate core dumps, then build with
--enable-debug if you didn't already (--enable-cassert would be good too)
and get a debugger stack trace from the core dump.

Otherwise, can you extract a test case that causes this without needing
vast resources to run?

regards, tom lane

#7Mark Wong
markw@osdl.org
In reply to: Tom Lane (#6)
Re: pg_autovacuum w/ dbt2

On Tue, Dec 21, 2004 at 05:56:47PM -0500, Tom Lane wrote:

Mark Wong <markw@osdl.org> writes:

On Tue, Dec 21, 2004 at 02:23:41PM -0500, Tom Lane wrote:

Mark Wong <markw@osdl.org> writes:

[2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district"

Yikes. Is this reproducible?

Yes, and I think there is one for each of the rollbacks that are
occuring in the workload. Except for the 1% that's supposed to happen
for the new-order transaction.

Well, we need to find out what's causing that. There are two possible
sources of that error (one elog in src/backend/access/nbtree/nbtinsert.c,
and one in src/backend/access/nbtree/nbtpage.c) and neither of them
should ever fire.

If you want to track it yourself, please change those elog(ERROR)s to
elog(PANIC) so that they'll generate core dumps, then build with
--enable-debug if you didn't already (--enable-cassert would be good too)
and get a debugger stack trace from the core dump.

Otherwise, can you extract a test case that causes this without needing
vast resources to run?

regards, tom lane

I was going to try Matthew's suggestion of turning up the debug on
pg_autovacuum, unless you don't that'll help find the cause. I'm not
sure if I can more easily reproduce the problem but i can try.

I'll go ahead and make the elog() changes you recommended and do a run
overnight either way.

Mark

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Wong (#7)
Re: pg_autovacuum w/ dbt2

Mark Wong <markw@osdl.org> writes:

I was going to try Matthew's suggestion of turning up the debug on
pg_autovacuum, unless you don't that'll help find the cause.

It won't help --- this is a backend-internal bug of some kind.

regards, tom lane

#9Matthew T. O'Connor
matthew@zeut.net
In reply to: Mark Wong (#5)
Re: pg_autovacuum w/ dbt2

Mark Wong wrote:

The overall throughput is better for a run like this:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/207/

A drop from 3865 to 2679 (31%) by just adding pg_autovacuum. That's
what I meant by "not good". :)

I would agree that is "not good" :-) It sounds like pg_autovacuum is
being to aggressive for this type of load, that is vacuuming more often
than needed, however the lazy vacuum options were added so as to reduce
the performance impact of running a vacuum while doing other things, so,
I would recommend both higher autovacuum thresholds and trying out some
of the lazy vacuum settings.

I'll start with the additional debug messages, with 8.0rc2, before
I start changing the other settings, if that sounds good.

Sounds fine. From Tom Lane's response, we have a backend bug that needs
to be resolved and I think that is the priority.

#10Mark Wong
markw@osdl.org
In reply to: Mark Wong (#1)
Re: [Testperf-general] pg_autovacuum w/ dbt2

I apologize for the significant delay, here's a link to results to a
test with 8.0rc3:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/236/

These are the same parameters with as run 215, listed below with the
but with --enable-debug --enable-cassert. I also ran pg_autovacuum
with -d4, where the output can be seen here:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/236/dbt2.out

I, uh, wasn't able to reproduce the previous errors after repairing my
filesystems after a power outage. So I figure that might be good news.
The performance is up from run 215 with the errors, so I'll continue
with trying to tune some of the pg_autovacuum values.

Mark

Show quoted text

On Tue, Dec 21, 2004 at 09:41:31AM -0800, Mark Wong wrote:

After all this time I finally got around to vacuuming the database
with dbt2 with pg_autovacuum. :)
http://www.osdl.org/projects/dbt2dev/results/dev4-010/215/

Doesn't look so good though, probably because I'm not using optimal
settings with pg_autovacuum. So far I have only tried the default
settings (running without any arguments, except -D).

The only thing that's peculiar is a number of unexpected rollbacks
across all of the transactions. I suspect it was something to do with
these messages coming from pg_autovacuum:

[2004-12-20 15:48:18 PST] ERROR: Can not refresh statistics information from the database dbt2.
[2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district"
]

This is with 8.0rc1. I can get rc2 installed since it just came out.
So let me know what I can try and what not.

Mark

#11Matthew T. O'Connor
matthew@zeut.net
In reply to: Mark Wong (#10)
Re: [Testperf-general] pg_autovacuum w/ dbt2

I'm curious, the original run you posted with 3825 NOTPM is still 17%
faster than the latest pg_autovacuum run which shows 3280 NOTPM. Is
this on the same hardware? Also, did the original non-pg_autovacuum run
any manual vacuum commands? Also, does the non-pg_autovacuum run start
slowing down after a while? The graphs look like there is a slight
decline in performance as time goes on, what happens if you double the
length of the test?

Thanks for doing the testing!

Matthew

Mark Wong wrote:

Show quoted text

I apologize for the significant delay, here's a link to results to a
test with 8.0rc3:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/236/

These are the same parameters with as run 215, listed below with the
but with --enable-debug --enable-cassert. I also ran pg_autovacuum
with -d4, where the output can be seen here:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/236/dbt2.out

I, uh, wasn't able to reproduce the previous errors after repairing my
filesystems after a power outage. So I figure that might be good news.
The performance is up from run 215 with the errors, so I'll continue
with trying to tune some of the pg_autovacuum values.

Mark

On Tue, Dec 21, 2004 at 09:41:31AM -0800, Mark Wong wrote:

After all this time I finally got around to vacuuming the database
with dbt2 with pg_autovacuum. :)
http://www.osdl.org/projects/dbt2dev/results/dev4-010/215/

Doesn't look so good though, probably because I'm not using optimal
settings with pg_autovacuum. So far I have only tried the default
settings (running without any arguments, except -D).

The only thing that's peculiar is a number of unexpected rollbacks
across all of the transactions. I suspect it was something to do with
these messages coming from pg_autovacuum:

[2004-12-20 15:48:18 PST] ERROR: Can not refresh statistics information from the database dbt2.
[2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district"
]

This is with 8.0rc1. I can get rc2 installed since it just came out.
So let me know what I can try and what not.

Mark

#12Mark Wong
markw@osdl.org
In reply to: Matthew T. O'Connor (#11)
Re: [Testperf-general] pg_autovacuum w/ dbt2

Yeah, same hardware and database configuration.

No manual vacuum commands before. The decline in performance has been
pretty consistent in all my previous tests and people have told me on
many occasions that the decline in performance was probably because I
was never using vacuum...

Mark

On Fri, Jan 07, 2005 at 08:48:52AM -0500, Matthew T. O'Connor wrote:

I'm curious, the original run you posted with 3825 NOTPM is still 17%
faster than the latest pg_autovacuum run which shows 3280 NOTPM. Is
this on the same hardware? Also, did the original non-pg_autovacuum run
any manual vacuum commands? Also, does the non-pg_autovacuum run start
slowing down after a while? The graphs look like there is a slight
decline in performance as time goes on, what happens if you double the
length of the test?

Thanks for doing the testing!

Matthew

Mark Wong wrote:

I apologize for the significant delay, here's a link to results to a
test with 8.0rc3:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/236/

These are the same parameters with as run 215, listed below with the
but with --enable-debug --enable-cassert. I also ran pg_autovacuum
with -d4, where the output can be seen here:
http://www.osdl.org/projects/dbt2dev/results/dev4-010/236/dbt2.out

I, uh, wasn't able to reproduce the previous errors after repairing my
filesystems after a power outage. So I figure that might be good news.
The performance is up from run 215 with the errors, so I'll continue
with trying to tune some of the pg_autovacuum values.

Mark

On Tue, Dec 21, 2004 at 09:41:31AM -0800, Mark Wong wrote:

After all this time I finally got around to vacuuming the database
with dbt2 with pg_autovacuum. :)
http://www.osdl.org/projects/dbt2dev/results/dev4-010/215/

Doesn't look so good though, probably because I'm not using optimal
settings with pg_autovacuum. So far I have only tried the default
settings (running without any arguments, except -D).

The only thing that's peculiar is a number of unexpected rollbacks
across all of the transactions. I suspect it was something to do with
these messages coming from pg_autovacuum:

[2004-12-20 15:48:18 PST] ERROR: Can not refresh statistics information from the database dbt2.
[2004-12-20 15:48:18 PST] The error is [ERROR: failed to re-find parent key in "pk_district"
]

This is with 8.0rc1. I can get rc2 installed since it just came out.
So let me know what I can try and what not.

Mark

--
Mark Wong - - markw@osdl.org
Open Source Development Lab Inc - A non-profit corporation
12725 SW Millikan Way - Suite 400 - Beaverton, OR 97005
(503) 906-1932 (office)
(503) 626-2436 (fax)
http://developer.osdl.org/markw/

#13Matthew T. O'Connor
matthew@zeut.net
In reply to: Mark Wong (#12)
Re: [Testperf-general] pg_autovacuum w/ dbt2

Ok, but what I'm curious to do is see if you run the non-pg_autovacuum
test for a "long time" (4 hours? more?) when does it get slower that
running with pg_autovacuum. And, can you demonstrate that running the
tests with pg_autovacuum for a long time (say 4 hours) that the
performance stays steady.

Also, I would very much like to see this test run with pg_autovacuum and
it's vacuum delay settings enabled.

Matthew

ps: I know time is limited and these tests take a lot of time to run, so
please take my requests with a grain of salt, all I'm saying is that I
think these would be interesting results to see.

Mark Wong wrote:

Show quoted text

Yeah, same hardware and database configuration.

No manual vacuum commands before. The decline in performance has been
pretty consistent in all my previous tests and people have told me on
many occasions that the decline in performance was probably because I
was never using vacuum...

Mark

On Fri, Jan 07, 2005 at 08:48:52AM -0500, Matthew T. O'Connor wrote:

I'm curious, the original run you posted with 3825 NOTPM is still 17%
faster than the latest pg_autovacuum run which shows 3280 NOTPM. Is
this on the same hardware? Also, did the original non-pg_autovacuum run
any manual vacuum commands? Also, does the non-pg_autovacuum run start
slowing down after a while? The graphs look like there is a slight
decline in performance as time goes on, what happens if you double the
length of the test?

Thanks for doing the testing!

Matthew

#14Josh Berkus
josh@agliodbs.com
In reply to: Mark Wong (#12)
Re: [Testperf-general] pg_autovacuum w/ dbt2

Mark,

No manual vacuum commands before. The decline in performance has been
pretty consistent in all my previous tests and people have told me on
many occasions that the decline in performance was probably because I
was never using vacuum...

Hmmm ... what autovacuum params are you using? From the look of the graphs,
VACUUM is running every 6 minutes or so, and having a huge impact on
performance when it does run. Give me a bit of time to crunch the database
log, and we'll see which table vacuums were taking the most time.

Can you re-run this test with a fairly generous vacuum_delay setting? Given
that the performance drops seem to correspond to wait times, I think this
could mean a big difference.

--
Josh Berkus
Aglio Database Solutions
San Francisco

#15Mark Wong
markw@osdl.org
In reply to: Josh Berkus (#14)
Re: [Testperf-general] pg_autovacuum w/ dbt2

On Fri, Jan 07, 2005 at 09:58:47AM -0800, Josh Berkus wrote:

Mark,

No manual vacuum commands before. The decline in performance has been
pretty consistent in all my previous tests and people have told me on
many occasions that the decline in performance was probably because I
was never using vacuum...

Hmmm ... what autovacuum params are you using? From the look of the graphs,
VACUUM is running every 6 minutes or so, and having a huge impact on
performance when it does run. Give me a bit of time to crunch the database
log, and we'll see which table vacuums were taking the most time.

Can you re-run this test with a fairly generous vacuum_delay setting? Given
that the performance drops seem to correspond to wait times, I think this
could mean a big difference.

All default parameters. Matthew also recommended using the
vacuum_delay setting so I was about to try that.

Mark

#16Josh Berkus
josh@agliodbs.com
In reply to: Mark Wong (#15)
Re: [Testperf-general] pg_autovacuum w/ dbt2

Mark,

All default parameters. Matthew also recommended using the
vacuum_delay setting so I was about to try that.

Interesting ... the default parameters are quite conservative, running only
when the table has doubled in new rows. So if those spikes are vacuums,
then the DBT2 test is updating or adding 100% of the rows in the orders table
every 6-8 minutes? Does that seem right?

--
Josh Berkus
Aglio Database Solutions
San Francisco

#17Mark Wong
markw@osdl.org
In reply to: Josh Berkus (#16)
Re: [Testperf-general] pg_autovacuum w/ dbt2

On Fri, Jan 07, 2005 at 10:13:52AM -0800, Josh Berkus wrote:

Mark,

All default parameters. Matthew also recommended using the
vacuum_delay setting so I was about to try that.

Interesting ... the default parameters are quite conservative, running only
when the table has doubled in new rows. So if those spikes are vacuums,
then the DBT2 test is updating or adding 100% of the rows in the orders table
every 6-8 minutes? Does that seem right?

Some hand-wavey math here:

We start with 18,000,000 rows in the orders table.

Based one 3281 transactions per minute for just the new-order transaction:
288 transactions per minute for delivery

New-order inserts 3281 rows per minute (not counting the 1% of rollbacks.)
Delivery updates 288 rows per minute in orders.

So no, we're not change that many rows per minute...

Mark

#18Matthew T. O'Connor
matthew@zeut.net
In reply to: Mark Wong (#17)
Re: [Testperf-general] pg_autovacuum w/ dbt2

Mark Wong wrote:

On Fri, Jan 07, 2005 at 10:13:52AM -0800, Josh Berkus wrote:

Mark,

All default parameters. Matthew also recommended using the
vacuum_delay setting so I was about to try that.

Interesting ... the default parameters are quite conservative, running only
when the table has doubled in new rows. So if those spikes are vacuums,
then the DBT2 test is updating or adding 100% of the rows in the orders table
every 6-8 minutes? Does that seem right?

Some hand-wavey math here:

We start with 18,000,000 rows in the orders table.

Based one 3281 transactions per minute for just the new-order transaction:
288 transactions per minute for delivery

New-order inserts 3281 rows per minute (not counting the 1% of rollbacks.)
Delivery updates 288 rows per minute in orders.

So no, we're not change that many rows per minute...

Except the thing pg_autovacuum is actually doing is Analyzing the
district table (4800 rows, takes ~15 sec) and vacuuming the warehouse
table (66,000, takes ~50sec) which takes about 50 seconds. Since the
default vacuum sleep time is 5 minutes, that would explain the dip every
6 minutes.

Matthew

#19Josh Berkus
josh@agliodbs.com
In reply to: Mark Wong (#1)
Re: [Testperf-general] pg_autovacuum w/ dbt2

Guys,

So, here's my suggested test series for autovacuum. After we have these
results, we can introduce vacuum delay to see if it has an effect:

1) Control Test. No autovacuum.
shared_buffers = 60000
maintainence_work_mem = 65536
default_statistics_target = 100
effective_cache_size = 100000
all stats_* on
rest of params as run 207

On second thought, this isn't going to do much good outside of the control
test. Other than Warehouse and District, even on the most aggresive
settings the other tables won't get vacuumed inside of even 2 hours. For
example, these very busy tables have the following vacuum skew at the end of
the run:

[2005-01-06 15:36:59 PST] INFO: table name: dbt2."public"."orders"
[2005-01-06 15:36:59 PST] INFO: relid: 17263; relisshared: 0
[2005-01-06 15:36:59 PST] INFO: reltuples: 18011200.000000; relpages:
162775
[2005-01-06 15:36:59 PST] INFO: curr_analyze_count: 1553269;
curr_vacuum_count: 730780
[2005-01-06 15:36:59 PST] INFO: last_analyze_count: 0;
last_vacuum_count: 0
[2005-01-06 15:36:59 PST] INFO: analyze_threshold: 18011700;
vacuum_threshold: 36023400

[2005-01-06 15:36:59 PST] INFO: table name: dbt2."public"."stock"
[2005-01-06 15:36:59 PST] INFO: relid: 17275; relisshared: 0
[2005-01-06 15:36:59 PST] INFO: reltuples: 59989800.000000; relpages:
2936743
[2005-01-06 15:36:59 PST] INFO: curr_analyze_count: 8629017;
curr_vacuum_count: 8629017
[2005-01-06 15:36:59 PST] INFO: last_analyze_count: 0;
last_vacuum_count: 0
[2005-01-06 15:36:59 PST] INFO: analyze_threshold: 59990300;
vacuum_threshold: 119980600

As you can see, they don't really even *need* to be vacuumed. Analyzed,
maybe. So I'm really wondering if we'd get any results worth testing in
less than a 6-hour run.

Ideas, anyone?

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

#20Mark Wong
markw@osdl.org
In reply to: Tom Lane (#6)
Re: pg_autovacuum w/ dbt2

On Tue, Dec 21, 2004 at 05:56:47PM -0500, Tom Lane wrote:

If you want to track it yourself, please change those elog(ERROR)s to
elog(PANIC) so that they'll generate core dumps, then build with
--enable-debug if you didn't already (--enable-cassert would be good too)
and get a debugger stack trace from the core dump.

Ok, well I got a core dump with 8.0rc4, but I'm not sure if it's
exactly the same problem. I have the postgres binary and the core
here:
http://developer.osdl.org/markw/pgsql/core/2files.tar.bz2

But it's for ia64, if you got one. Otherwise, this is what gdb is
telling me with a bt:

(gdb) bt
#0 FunctionCall2 (flinfo=0x6000000000187850, arg1=16, arg2=1043)
at fmgr.c:1141
#1 0x400000000007a320 in _bt_checkkeys (scan=0x60000000000c2d80,
tuple=0x200000000101f660, dir=ForwardScanDirection,
continuescan=0x60000fffffff8ae0 "\001") at nbtutils.c:542
#2 0x4000000000078eb0 in _bt_endpoint (scan=0x6000000000187690,
dir=ForwardScanDirection) at nbtsearch.c:1309
#3 0x40000000000771e0 in _bt_first (scan=0x6000000000187690,
dir=ForwardScanDirection) at nbtsearch.c:482
#4 0x4000000000074350 in btgettuple (fcinfo=0x1) at nbtree.c:265
#5 0x40000000003bd430 in FunctionCall2 (flinfo=0x6000000000187700,
arg1=6917529027642685072, arg2=1) at fmgr.c:1141
#6 0x400000000006b3a0 in index_getnext (scan=0x6000000000187690,
direction=ForwardScanDirection) at indexam.c:429
#7 0x400000000006a1e0 in systable_getnext (sysscan=0x6000000000187668)
at genam.c:253
#8 0x400000000039c970 in SearchCatCache (cache=0x200000001f1e0140, v1=0,
v2=6917529027641871376, v3=4294966252, v4=6917546619827097184)
at catcache.c:1217
#9 0x40000000003a9ee0 in SearchSysCache (cacheId=33, key1=1043, key2=0,
key3=0, key4=0) at syscache.c:524
#10 0x4000000000049110 in TupleDescInitEntry (desc=0x60000000001872c8,
attributeNumber=4, attributeName=0x6000000000187614 "\023\004",
oidtypeid=1043, typmod=28, attdim=0) at tupdesc.c:444
#11 0x40000000001b5fc0 in ExecTypeFromTLInternal (
targetList=0x6000000000135d40, hasoid=-64 '�', skipjunk=1 '\001')
at execTuples.c:570
#12 0x40000000001a4a20 in ExecInitJunkFilter (targetList=0x6000000000135b38,
hasoid=-64 '�', slot=0x60000000001258a0) at execJunk.c:76
#13 0x40000000001a6890 in InitPlan (queryDesc=0x6000000000177ed0,
explainOnly=0 '\0') at execMain.c:456
#14 0x40000000001a5800 in ExecutorStart (queryDesc=0x6000000000177ed0,
explainOnly=0 '\0') at execMain.c:160
#15 0x40000000001d6ab0 in _SPI_pquery (queryDesc=0x6000000000177ed0, tcount=0)
at spi.c:1521
#16 0x40000000001d6390 in _SPI_execute_plan (plan=0x60000fffffff9380,
Values=0x0, Nulls=0x0, snapshot=0x0, crosscheck_snapshot=0x0,
read_only=0 '\0', tcount=0) at spi.c:1452

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Wong (#20)
Re: pg_autovacuum w/ dbt2

Mark Wong <markw@osdl.org> writes:

Ok, well I got a core dump with 8.0rc4, but I'm not sure if it's
exactly the same problem. I have the postgres binary and the core
here:
http://developer.osdl.org/markw/pgsql/core/2files.tar.bz2
But it's for ia64, if you got one.

Poking around with gdb, it seems that the scankey structure being used
by SearchCatCache got clobbered; which is a bit surprising because
that's just a local variable in that function, and hence isn't really
very exposed. The contents of cache->cc_skey are okay, but cur_skey[0]
and cur_skey[1] don't match, which implies the clobber happened
somewhere between lines 1110 and 1217 of catcache.c.

(gdb) f 8
#8 0x400000000039c970 in SearchCatCache (cache=0x200000001f1e0140, v1=0,
v2=6917529027641871376, v3=4294966252, v4=6917546619827097184)
at catcache.c:1217
1217 in catcache.c
(gdb) p cache->cc_skey
$7 = {{sk_flags = 0, sk_attno = -2, sk_strategy = 3, sk_subtype = 0,
sk_func = {fn_addr = 0x200000000003a9c8, fn_oid = 184, fn_nargs = 2,
fn_strict = 1 '\001', fn_retset = 0 '\0', fn_extra = 0x0,
fn_mcxt = 0x600000000009e550, fn_expr = 0x0}, sk_argument = 0}, {
sk_flags = 0, sk_attno = 0, sk_strategy = 0, sk_subtype = 0, sk_func = {
fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0',
fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},
sk_argument = 0}, {sk_flags = 0, sk_attno = 0, sk_strategy = 0,
sk_subtype = 0, sk_func = {fn_addr = 0, fn_oid = 0, fn_nargs = 0,
fn_strict = 0 '\0', fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0,
fn_expr = 0x0}, sk_argument = 0}, {sk_flags = 0, sk_attno = 0,
sk_strategy = 0, sk_subtype = 0, sk_func = {fn_addr = 0, fn_oid = 0,
fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_extra = 0x0,
fn_mcxt = 0x0, fn_expr = 0x0}, sk_argument = 0}}
(gdb) p cur_skey
$8 = {{sk_flags = 0, sk_attno = 1, sk_strategy = 24932, sk_subtype = 24948,
sk_func = {fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0',
fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},
sk_argument = 1043}, {sk_flags = 0, sk_attno = 1043, sk_strategy = 0,
sk_subtype = 4294967295, sk_func = {fn_addr = 0, fn_oid = 0,
fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0', fn_extra = 0x0,
fn_mcxt = 0x0, fn_expr = 0x0}, sk_argument = 0}, {sk_flags = 0,
sk_attno = 0, sk_strategy = 0, sk_subtype = 0, sk_func = {fn_addr = 0,
fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0', fn_retset = 0 '\0',
fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0}, sk_argument = 0}, {
sk_flags = 0, sk_attno = 0, sk_strategy = 0, sk_subtype = 0, sk_func = {
fn_addr = 0, fn_oid = 0, fn_nargs = 0, fn_strict = 0 '\0',
fn_retset = 0 '\0', fn_extra = 0x0, fn_mcxt = 0x0, fn_expr = 0x0},
sk_argument = 0}}

The core dump happens because we eventually try to jump through the
zeroed-out fn_addr function pointer.

Not sure what to make of this. That's extremely heavily used,
well-debugged code; it's hard to believe that there are any intermittent
bugs in it.

I notice that the backend seems to have been using some nonstandard C
code:

Error while reading shared library symbols:
/home/markw/dbt2/storedproc/pgsql/c/../../../storedproc/pgsql/c/funcs.so: No such file or directory.

What is that, and how much confidence have you got in it?

regards, tom lane

#22Mark Wong
markw@osdl.org
In reply to: Tom Lane (#21)
1 attachment(s)
Re: pg_autovacuum w/ dbt2

On Wed, Jan 12, 2005 at 09:17:33PM -0500, Tom Lane wrote:

I notice that the backend seems to have been using some nonstandard C
code:

Error while reading shared library symbols:
/home/markw/dbt2/storedproc/pgsql/c/../../../storedproc/pgsql/c/funcs.so: No such file or directory.

What is that, and how much confidence have you got in it?

That's my C stored function library. I'll attached it if anyone wants
to take a persusal. Well, it was my first attempt with C stored
functions and SPI calls, so it wouldn't surprise me if it was flawed.
Would supplying the .so help the debugging?

Mark

Attachments:

funcs.ctext/plain; charset=us-asciiDownload