pgsql: Fix TransactionIdIsCurrentTransactionId() to use binary search
Log Message:
-----------
Fix TransactionIdIsCurrentTransactionId() to use binary search instead of
linear search when checking child-transaction XIDs. This makes for an
important speedup in transactions that have large numbers of children,
as in a recent example from Craig Ringer. We can also get rid of an
ugly kluge that represented lists of TransactionIds as lists of OIDs.
Heikki Linnakangas
Modified Files:
--------------
pgsql/src/backend/access/transam:
twophase.c (r1.39 -> r1.40)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/twophase.c?r1=1.39&r2=1.40)
xact.c (r1.258 -> r1.259)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/xact.c?r1=1.258&r2=1.259)
pgsql/src/include/nodes:
pg_list.h (r1.57 -> r1.58)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/include/nodes/pg_list.h?r1=1.57&r2=1.58)
On Sunday 16 March 2008 22:18, Tom Lane wrote:
Log Message:
-----------
Fix TransactionIdIsCurrentTransactionId() to use binary search instead of
linear search when checking child-transaction XIDs. This makes for an
important speedup in transactions that have large numbers of children,
as in a recent example from Craig Ringer. We can also get rid of an
ugly kluge that represented lists of TransactionIds as lists of OIDs.
Are there any plans to backpatch this into REL8_3_STABLE? It looks like I am
hitting a pretty serious performance regression on 8.3 with a stored
procedure that grabs a pretty big recordset, and loops through doing
insert....update on unique failures. The procedure get progressivly slower
the more records involved... and dbx shows me stuck in
TransactionIdIsCurrentTransactionId(). I can provide provide more details if
needed (lmk what your looking for) but it certainly looks like the issue
discussed here:
http://archives.postgresql.org/pgsql-performance/2008-03/msg00191.php
--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Robert Treat <xzilla@users.sourceforge.net> writes:
On Sunday 16 March 2008 22:18, Tom Lane wrote:
Fix TransactionIdIsCurrentTransactionId() to use binary search instead of
linear search when checking child-transaction XIDs.
Are there any plans to backpatch this into REL8_3_STABLE?
No.
It looks like I am
hitting a pretty serious performance regression on 8.3 with a stored
procedure that grabs a pretty big recordset, and loops through doing
insert....update on unique failures. The procedure get progressivly slower
the more records involved... and dbx shows me stuck in
TransactionIdIsCurrentTransactionId().
If you can convince me it's a regression I might reconsider, but I
rather doubt that 8.2 was better,
regards, tom lane
On Thursday 27 March 2008 17:11, Tom Lane wrote:
Robert Treat <xzilla@users.sourceforge.net> writes:
On Sunday 16 March 2008 22:18, Tom Lane wrote:
Fix TransactionIdIsCurrentTransactionId() to use binary search instead
of linear search when checking child-transaction XIDs.Are there any plans to backpatch this into REL8_3_STABLE?
No.
It looks like I am
hitting a pretty serious performance regression on 8.3 with a stored
procedure that grabs a pretty big recordset, and loops through doing
insert....update on unique failures. The procedure get progressivly
slower the more records involved... and dbx shows me stuck in
TransactionIdIsCurrentTransactionId().If you can convince me it's a regression I might reconsider, but I
rather doubt that 8.2 was better,
Well, I can't speak for 8.2, but I have a second system crunching the same
data using the same function on 8.1 (on lesser hardware in fact), and it
doesn't have these type of issues. Looking over the past week, the 8.3 box
averages about 19 minutes to complete each run, and the 8.1 box averages 15
minutes (sample size is over 100 iterations of both). Of course this is when
it completes, the 8.3 box often does not complete, as it falls farther behind
during the day and eventually cannot finish (it does periodic intra-day
summing, so there's a limited time frame it has to run, and it's jobs end up
taking hours to complete).
I am open to the idea that there is some other issue going on here, but
whenever I look at it, it seems stuck in
TransactionIdIsCurrentTransactionId(), progress for the function does get
increasingly slower as it progresses, and I can watch the process consuming
more and more memory as it goes on... I can probably get some dtrace output
tommorrow if you want.
--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Robert Treat <xzilla@users.sourceforge.net> writes:
If you can convince me it's a regression I might reconsider, but I
rather doubt that 8.2 was better,
Well, I can't speak for 8.2, but I have a second system crunching the same
data using the same function on 8.1 (on lesser hardware in fact), and it
doesn't have these type of issues.
If you can condense it to a test case that is worse on 8.3 than 8.1,
I'm willing to listen...
regards, tom lane
On Thursday 27 March 2008 17:11, Tom Lane wrote:
Robert Treat <xzilla@users.sourceforge.net> writes:
On Sunday 16 March 2008 22:18, Tom Lane wrote:
Fix TransactionIdIsCurrentTransactionId() to use binary
search instead
of linear search when checking child-transaction XIDs.Are there any plans to backpatch this into REL8_3_STABLE?
No.
It looks like I am
hitting a pretty serious performance regression on 8.3 with a stored
procedure that grabs a pretty big recordset, and loops through doing
insert....update on unique failures. The procedure get progressivly
slower the more records involved... and dbx shows me stuck in
TransactionIdIsCurrentTransactionId().If you can convince me it's a regression I might reconsider, but I
rather doubt that 8.2 was better,
Well, I can't speak for 8.2, but I have a second system crunching the
same data using the same function on 8.1 (on lesser hardware in fact),
and it doesn't have these type of issues.If you can condense it to a test case that is worse on 8.3 than 8.1,
I'm willing to listen...
I spent some time trying to come up with a test case, but had no luck. Dtrace
showed that the running process was calling this function rather excessively;
sample profiling for 30 seconds would look like this:
FUNCTION COUNT PCNT
<snip>
postgres`LockBuffer 10 0.0%
postgres`slot_deform_tuple 11 0.0%
postgres`ExecEvalScalarVar 11 0.0%
postgres`ExecMakeFunctionResultNoSets 13 0.0%
postgres`IndexNext 14 0.0%
postgres`slot_getattr 15 0.0%
postgres`LWLockRelease 20 0.0%
postgres`index_getnext 55 0.1%
postgres`TransactionIdIsCurrentTransactionId 40074 99.4%
But I saw similar percentages on the 8.1 machine, so I am not convinced this
is where the problem is. Unfortunatly (in some respects) the problem went
away up untill this morning, so I haven't been looking at it since the above
exchange. I'm still open to the idea that something inside
TransactionIdIsCurrentTransactionId could have changed to make things worse
(in addition to cpu, the process does consume a significant amount of
memory... prstat shows:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
3844 postgres 1118M 1094M cpu3 50 0 6:25:48 12% postgres/1
I do wonder if the number of rows being worked on is significant in some
way... by looking in the job log for the running procedure (we use
autonoumous logging in this function), I can see that it has a much larger
number of rows to be processed, so perhaps there is simply a tipping point
that is reached which causes it to stop performing... still it would be
curious that I never saw this behavior on 8.1
= current job
elapsed | status
-----------------+--------------------------------------------------------
00:00:00.042895 | OK/starting with 2008-04-21 03:20:03
00:00:00.892663 | OK/processing 487291 hits up until 2008-04-21 05:20:03
05:19:26.595508 | ??/Processed 70000 aggregated rows so far
(3 rows)
= yesterdays run
| elapsed | status
+-----------------+--------------------------------------------------------
| 00:00:00.680222 | OK/starting with 2008-04-20 04:20:02
| 00:00:00.409331 | OK/processing 242142 hits up until 2008-04-20 05:20:04
| 00:25:02.306736 | OK/Processed 35936 aggregated rows
| 00:00:00.141179 | OK/
(4 rows)
Unfortunatly I don't have the 8.1 system to bang on anymore for this, (though
anecdotaly speaking, I never saw this behavior in 8.1) however I do now have
a parallel 8.3 system crunching the data, and it is showing the same symptom
(yes, 2 8.3 servers, crunching the same data, both bogged down now), so I do
feel this is something specific to 8.3.
I am mostly wondering if anyone else has encountered behavior like this on 8.3
(large sets of insert....update exception block in plpgsql bogging down), or
if anyone has any thoughts on which direction I should poke at it from here.
TIA.
--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Robert Treat wrote:
Unfortunatly I don't have the 8.1 system to bang on anymore for this, (though
anecdotaly speaking, I never saw this behavior in 8.1) however I do now have
a parallel 8.3 system crunching the data, and it is showing the same symptom
(yes, 2 8.3 servers, crunching the same data, both bogged down now), so I do
feel this is something specific to 8.3.I am mostly wondering if anyone else has encountered behavior like this on 8.3
(large sets of insert....update exception block in plpgsql bogging down), or
if anyone has any thoughts on which direction I should poke at it from here.
TIA.
Perhaps what you could do is backpatch the change and see if the problem
goes away.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Monday 21 April 2008 12:54, Alvaro Herrera wrote:
Robert Treat wrote:
Unfortunatly I don't have the 8.1 system to bang on anymore for this,
(though anecdotaly speaking, I never saw this behavior in 8.1) however I
do now have a parallel 8.3 system crunching the data, and it is showing
the same symptom (yes, 2 8.3 servers, crunching the same data, both
bogged down now), so I do feel this is something specific to 8.3.I am mostly wondering if anyone else has encountered behavior like this
on 8.3 (large sets of insert....update exception block in plpgsql bogging
down), or if anyone has any thoughts on which direction I should poke at
it from here. TIA.Perhaps what you could do is backpatch the change and see if the problem
goes away.
So, after some more digging, we ended up backpatching the change. Results as
follows:
= hanging job before patch
elapsed | status
-----------------+--------------------------------------------------------
00:00:00.024075 | OK/starting with 2008-04-25 08:20:02
00:00:00.611411 | OK/processing 624529 hits up until 2008-04-25 10:20:02
03:48:02.748319 | ??/Processed 65000 aggregated rows so far
(3 rows)
= successful job after patch
elapsed | status
-----------------+---------------------------------------------------------
00:00:00.026809 | OK/starting with 2008-04-25 08:20:02
00:00:03.921532 | OK/processing 2150115 hits up until 2008-04-25 15:00:02
00:24:45.439081 | OK/Processed 334139 aggregated rows
00:00:00.019433 | OK/
(4 rows)
Note the second run had to do all the rows from the first run, plus additional
rows that accumulated while the first job was running.
Oddly some dtrace profiling gave me this, which is pretty different, but
certainly doesn't have concerns about TransactionIdIsCurrentTransactionId
<snip>
postgres`hash_search_with_hash_value 536 2.3%
postgres`SearchCatCache 538 2.3%
postgres`hash_seq_search 577 2.4%
postgres`MemoryContextAllocZeroAligned 610 2.6%
postgres`_bt_compare 671 2.8%
libc.so.1`memcpy 671 2.8%
postgres`XLogInsert 755 3.2%
postgres`LockReassignCurrentOwner 757 3.2%
postgres`base_yyparse 1174 5.0%
postgres`AllocSetAlloc 1244 5.3%
We still have one of our 8.3 servers running stock 8.3.1, so we'll see how
long before this bites us again. Would certainly be nice to get this fixed
in the mainline code.
--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Robert Treat <xzilla@users.sourceforge.net> writes:
Oddly some dtrace profiling gave me this, which is pretty different, but
certainly doesn't have concerns about TransactionIdIsCurrentTransactionId
... which seems to pretty much destroy your thesis, no?
regards, tom lane
On Friday 25 April 2008 17:32, Tom Lane wrote:
Robert Treat <xzilla@users.sourceforge.net> writes:
Oddly some dtrace profiling gave me this, which is pretty different, but
certainly doesn't have concerns about TransactionIdIsCurrentTransactionId.... which seems to pretty much destroy your thesis, no?
How so? Before the patch we bog down for hours, spending 99% of our time in
TransactionIdIsCurrentTransactionId, after the patch everything performs well
(really better than before) and we spend so little time in
TransactionIdIsCurrentTransactionId it barely shows up on the radar.
Note I'm open to the idea that TransactionIdIsCurrentTransactionId itself is
not the problem, but that something else changed between 8.1 and 8.3 that
exposes TransactionIdIsCurrentTransactionId as a problem. Changing to a
binary search for TransactionIdIsCurrentTransactionId makes that a non-issue
though.
--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Robert Treat <xzilla@users.sourceforge.net> writes:
On Friday 25 April 2008 17:32, Tom Lane wrote:
Robert Treat <xzilla@users.sourceforge.net> writes:
Oddly some dtrace profiling gave me this, which is pretty different, but
certainly doesn't have concerns about TransactionIdIsCurrentTransactionId.... which seems to pretty much destroy your thesis, no?
How so? Before the patch we bog down for hours, spending 99% of our time in
TransactionIdIsCurrentTransactionId, after the patch everything performs well
(really better than before) and we spend so little time in
TransactionIdIsCurrentTransactionId it barely shows up on the radar.
Oh, you failed to state that the dtrace output was post-patch. You need
to show *pre* patch dtrace output if you want us to think it relevant.
regards, tom lane
On Saturday 26 April 2008 13:26, Tom Lane wrote:
Robert Treat <xzilla@users.sourceforge.net> writes:
On Friday 25 April 2008 17:32, Tom Lane wrote:
Robert Treat <xzilla@users.sourceforge.net> writes:
Oddly some dtrace profiling gave me this, which is pretty different,
but certainly doesn't have concerns about
TransactionIdIsCurrentTransactionId.... which seems to pretty much destroy your thesis, no?
How so? Before the patch we bog down for hours, spending 99% of our time
in TransactionIdIsCurrentTransactionId, after the patch everything
performs well (really better than before) and we spend so little time in
TransactionIdIsCurrentTransactionId it barely shows up on the radar.Oh, you failed to state that the dtrace output was post-patch. You need
to show *pre* patch dtrace output if you want us to think it relevant.
Please read up-thread.
--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Robert Treat <xzilla@users.sourceforge.net> writes:
On Saturday 26 April 2008 13:26, Tom Lane wrote:
Oh, you failed to state that the dtrace output was post-patch. You need
to show *pre* patch dtrace output if you want us to think it relevant.
Please read up-thread.
Sorry, I'd forgotten your previous post.
I poked around for calls to TransactionIdIsCurrentTransactionId that
are in current code and weren't in 8.1. I found these:
src/backend/commands/analyze.c: 965: if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(targtuple.t_data)))
src/backend/commands/analyze.c: 984: if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(targtuple.t_data)))
src/backend/commands/cluster.c: 803: if (!TransactionIdIsCurrentTransactionId(
src/backend/commands/cluster.c: 816: if (!TransactionIdIsCurrentTransactionId(
src/backend/storage/ipc/procarray.c: 374: if (TransactionIdIsCurrentTransactionId(xid))
src/backend/utils/time/combocid.c: 108: Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tup)));
src/backend/utils/time/combocid.c: 123: Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tup)));
src/backend/utils/time/combocid.c: 156: TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tup)))
The ANALYZE and CLUSTER calls are not likely to be your issue, but the
one in HeapTupleHeaderAdjustCmax could get called a lot, and the one
in TransactionIdIsInProgress definitely will get called a lot.
Neither of those calls existed in 8.2.
So I think that explains why TransactionIdIsCurrentTransactionId has
become more performance-critical in 8.3 than it was before. Will
apply the back-patch.
regards, tom lane