HOT is applied

heikki.linnakangas@enterprisedb.com

over 18 years ago

In reply to: Tom Lane (#1)

Re: HOT is applied

Tom Lane wrote:

I've committed the HOT patch.

Thanks, much easier to work with it now that it's in.

I'd still like to think about whether we
can be smarter about when to invoke pruning, but that's a small enough
issue that the patch can go in without it.

Yeah. I'm doing some micro-benchmarking, and the attached test case is
much slower with HOT. It's spending a lot of time trying to prune, only
to find out that it can't.

Instead of/in addition to avoiding pruning when it doesn't help, maybe
we could make HeapTupleSatisfiesVacuum cheaper.

I'm going to continue testing, this is just a heads-up that HOT as
committed seriously hurts performance in some cases. (though one can
argue that this test case isn't a very realistic one.)

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Merlin Moncure

mmoncure@gmail.com

over 18 years ago

In reply to: Heikki Linnakangas (#2)

Re: HOT is applied

On 9/20/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:

Tom Lane wrote:

I've committed the HOT patch.

Thanks, much easier to work with it now that it's in.

I'd still like to think about whether we
can be smarter about when to invoke pruning, but that's a small enough
issue that the patch can go in without it.

Yeah. I'm doing some micro-benchmarking, and the attached test case is
much slower with HOT. It's spending a lot of time trying to prune, only
to find out that it can't.

Instead of/in addition to avoiding pruning when it doesn't help, maybe
we could make HeapTupleSatisfiesVacuum cheaper.

I'm going to continue testing, this is just a heads-up that HOT as
committed seriously hurts performance in some cases. (though one can
argue that this test case isn't a very realistic one.)

well, I ran your test on my box and here are the results:
pre hot:
run 1: 3617.641 ms
run 2: 5195.215 ms
run 3: 6760.449 ms
after vacuum:
run 1: 4171.362 ms
run 2: 5513.317 ms
run 3: 6884.125 ms
post hot:
run 1: Time: 7286.292 ms
run 2: Time: 7477.089 ms
run 3: Time: 7701.229 ms

those results aren't exactly terrible, and this case is highly artificial.

merlin

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Heikki Linnakangas (#2)

Re: HOT is applied

"Heikki Linnakangas" <heikki@enterprisedb.com> writes:

Tom Lane wrote:

I'd still like to think about whether we
can be smarter about when to invoke pruning, but that's a small enough
issue that the patch can go in without it.

Yeah. I'm doing some micro-benchmarking, and the attached test case is
much slower with HOT. It's spending a lot of time trying to prune, only
to find out that it can't.

Not sure if that's an appropriate description or not. oprofile
(on a dual Xeon running Fedora 6) shows me this:

CPU: P4 / Xeon with 2 hyper-threads, speed 2792.99 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped)
with a unit mask of 0x01 (mandatory) count 100000
samples % symbol name
1070003 29.8708 LWLockAcquire
1015097 28.3380 LWLockRelease
283514 7.9147 heap_page_prune
252270 7.0425 AllocSetAlloc
148981 4.1590 HeapTupleSatisfiesMVCC
146442 4.0882 TransactionIdIsInProgress
92673 2.5871 AllocSetFree
84966 2.3720 HeapTupleSatisfiesVacuum
83097 2.3198 TransactionIdGetStatus
80737 2.2539 SimpleLruReadPage_ReadOnly
52803 1.4741 TransactionLogFetch
43406 1.2117 heapgetpage
42536 1.1875 HeapTupleHeaderGetCmin
34842 0.9727 TransactionIdIsCurrentTransactionId
27761 0.7750 TransactionIdDidAbort
24833 0.6933 MemoryContextAlloc
16446 0.4591 TransactionIdPrecedes
16089 0.4491 HeapTupleHeaderGetCmax
12919 0.3607 hash_search_with_hash_value
11857 0.3310 pfree
4964 0.1386 heap_page_prune_opt
3683 0.1028 hash_any
3215 0.0898 LWLockConditionalAcquire
3086 0.0862 PinBuffer
2573 0.0718 UnpinBuffer
2009 0.0561 ConditionalLockBufferForCleanup
1854 0.0518 ReadBuffer_common
934 0.0261 XLogInsert
784 0.0219 heapgettup_pagemode

so basically it's all about the locking. Maybe the problem is that with
HOT we lock the buffer too often? heap_page_prune_opt is designed to
not take the buffer lock unless there's a good probability of needing
to prune, but maybe that's not working as intended.

[ pokes at it a bit more... ] Actually the disturbing part is that
pruning doesn't seem to be working at all: after the test finishes,
I see

regression=# vacuum verbose testtable;
INFO: vacuuming "public.testtable"
INFO: "testtable": removed 10000 row versions in 44 pages
INFO: "testtable": found 10000 removable, 1 nonremovable row versions in 45 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 9955 unused item pointers.
45 pages contain useful free space.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

Shouldn't we be able to prune rows that have been inserted and deleted
by the same transaction? I'd have hoped to see this example use only
one heap page ...

regards, tom lane

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Tom Lane (#4)

Re: HOT is applied

I wrote:

... so basically it's all about the locking. Maybe the problem is that with
HOT we lock the buffer too often? heap_page_prune_opt is designed to
not take the buffer lock unless there's a good probability of needing
to prune, but maybe that's not working as intended.

Indeed it seems it's not; gprof shows

0.00 0.00 111/1389276 index_getnext <cycle 7> [89]
0.05 49.52 1389165/1389276 heapgetpage [6]
[8]: 50.9 0.05 49.52 1389276 heap_page_prune_opt [8] 7.17 42.31 1366696/1366696 heap_page_prune [9] 0.01 0.03 1366696/1366696 ConditionalLockBufferForCleanup [106] 0.01 0.00 2755558/2780795 PageGetHeapFreeSpace [177]
7.17 42.31 1366696/1366696 heap_page_prune [9]
0.01 0.03 1366696/1366696 ConditionalLockBufferForCleanup [106]
0.01 0.00 2755558/2780795 PageGetHeapFreeSpace [177]

so this example is getting past the heuristic tests in
heap_page_prune_opt almost every time. Why is that? Too tired to poke
at it more tonight.

regards, tom lane

pavan.deolasee@gmail.com

over 18 years ago

In reply to: Tom Lane (#4)

Re: HOT is applied

On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Shouldn't we be able to prune rows that have been inserted and deleted
by the same transaction? I'd have hoped to see this example use only
one heap page ...

Not before the transaction commits ? In the test, we update a single tuple
10000 times in the same transaction. So there is no opportunity to prune.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Pavan Deolasee (#6)

Re: HOT is applied

"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:

On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Shouldn't we be able to prune rows that have been inserted and deleted
by the same transaction? I'd have hoped to see this example use only
one heap page ...

Not before the transaction commits ? In the test, we update a single tuple
10000 times in the same transaction. So there is no opportunity to prune.

[ looks a bit more ... ] Hm, the test I was thinking of was this one
at the end of HeapTupleSatisfiesVacuum:

if (TransactionIdEquals(HeapTupleHeaderGetXmin(tuple),
HeapTupleHeaderGetXmax(tuple)))
{
/*
* Inserter also deleted it, so it was never visible to anyone else.
* However, we can only remove it early if it's not an updated tuple;
* else its parent tuple is linking to it via t_ctid, and this tuple
* mustn't go away before the parent does.
*/
if (!(tuple->t_infomask & HEAP_UPDATED))
return HEAPTUPLE_DEAD;
}

but control never gets that far because neither xmin nor xmax is
committed yet. We could fix that, probably, by considering the
xmin=xmax case in the xmin-in-progress case further up; but the
HEAP_UPDATED exclusion is still a problem. Still, it seems like this
is leaving some money on the table when you think about pruning a HOT
chain. Can we improve on it easily?

regards, tom lane

pavan.deolasee@gmail.com

over 18 years ago

In reply to: Tom Lane (#7)

Re: HOT is applied

On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

but control never gets that far because neither xmin nor xmax is
committed yet. We could fix that, probably, by considering the
xmin=xmax case in the xmin-in-progress case further up; but the
HEAP_UPDATED exclusion is still a problem. Still, it seems like this
is leaving some money on the table when you think about pruning a HOT
chain. Can we improve on it easily?

May be we can, but it would get a bit tricky. There might be a transaction
looking at the first tuple in the chain and waiting for this
(inserting-deleting)
transaction to finish. If the waiting transaction is running in READ
COMMITTED
mode, it would then follow the update chain. Removing any intermediate
tuples without fixing the previous tuple's xmax/ctid (or redirected line
pointer)
would be tricky.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Bruce Momjian

bruce@momjian.us

over 18 years ago

In reply to: Heikki Linnakangas (#2)

Re: HOT is applied

Heikki Linnakangas wrote:

Tom Lane wrote:

I've committed the HOT patch.

Thanks, much easier to work with it now that it's in.

I'd still like to think about whether we
can be smarter about when to invoke pruning, but that's a small enough
issue that the patch can go in without it.

Yeah. I'm doing some micro-benchmarking, and the attached test case is
much slower with HOT. It's spending a lot of time trying to prune, only
to find out that it can't.

Instead of/in addition to avoiding pruning when it doesn't help, maybe
we could make HeapTupleSatisfiesVacuum cheaper.

I'm going to continue testing, this is just a heads-up that HOT as
committed seriously hurts performance in some cases. (though one can
argue that this test case isn't a very realistic one.)

This might be a simplistic question but if the page is +90% full and
there is a long-lived transaction, isn't Postgres going to try pruning
on each page read access?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#10

pavan.deolasee@gmail.com

over 18 years ago

In reply to: Bruce Momjian (#9)

Re: HOT is applied

On 9/21/07, Bruce Momjian <bruce@momjian.us> wrote:

This might be a simplistic question but if the page is +90% full and
there is a long-lived transaction, isn't Postgres going to try pruning
on each page read access?

The way it stands today, yes. Thats one reason why we are seeing
the performance drop in this particular case.

I liked the earlier proposed idea of storing the xid in the page header
to quickly tell us whether its worth pruning the page.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

#11

heikki.linnakangas@enterprisedb.com

over 18 years ago

In reply to: Bruce Momjian (#9)

Re: HOT is applied

Bruce Momjian wrote:

This might be a simplistic question but if the page is +90% full and
there is a long-lived transaction, isn't Postgres going to try pruning
on each page read access?

Yes :(. That's why we earlier talked about stored the xid of the oldest
deleted tuple on the page in the page header. That way we could skip the
fruitless pruning attempts until that xid < OldestXmin.

Another approach is to try to make HeapTupleSatisfiesVacuum cheaper, so
that the fruitless pruning attempts wouldn't hurt that much.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#12

heikki.linnakangas@enterprisedb.com

over 18 years ago

In reply to: Tom Lane (#4)

Re: HOT is applied

Tom Lane wrote:

"Heikki Linnakangas" <heikki@enterprisedb.com> writes:

Tom Lane wrote:

I'd still like to think about whether we
can be smarter about when to invoke pruning, but that's a small enough
issue that the patch can go in without it.

Yeah. I'm doing some micro-benchmarking, and the attached test case is
much slower with HOT. It's spending a lot of time trying to prune, only
to find out that it can't.

Not sure if that's an appropriate description or not. oprofile
(on a dual Xeon running Fedora 6) shows me this:

...
samples % symbol name
1070003 29.8708 LWLockAcquire
1015097 28.3380 LWLockRelease
283514 7.9147 heap_page_prune
...
so basically it's all about the locking. Maybe the problem is that with
HOT we lock the buffer too often? heap_page_prune_opt is designed to
not take the buffer lock unless there's a good probability of needing
to prune, but maybe that's not working as intended.

If you look at the callgraph, you'll see that those
LWLockAcquire/Release calls are coming from HeapTupleSatisfiesVacuum ->
TransactionIdIsInProgress, which keeps trashing the ProcArrayLock. A
"if(TransactionIdIsCurrentTransactionId(xid)) return true;" check in
TransactionIdIsInProgress would speed that up, but I wonder if there's a
more general solution to make HeapTupleSatisfiesVacuum cheaper. For
example, we could cache the in-progress status of tuples.

Shouldn't we be able to prune rows that have been inserted and deleted
by the same transaction? I'd have hoped to see this example use only
one heap page ...

Well maybe, but that's a separate issue. Wouldn't we need the "snapshot
bookkeeping" we've discussed in the past, to notice that there's no
snapshot in our own backend that needs to see the tuples?

Nevertheless, the fruitless pruning attempts would still hurt us in
slightly more complex scenarios, even if we fixed the above.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#13

heikki.linnakangas@enterprisedb.com

over 18 years ago

In reply to: Merlin Moncure (#3)

Re: HOT is applied

Merlin Moncure wrote:

On 9/20/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:

Yeah. I'm doing some micro-benchmarking, and the attached test case is
much slower with HOT. It's spending a lot of time trying to prune, only
to find out that it can't.

Instead of/in addition to avoiding pruning when it doesn't help, maybe
we could make HeapTupleSatisfiesVacuum cheaper.

I'm going to continue testing, this is just a heads-up that HOT as
committed seriously hurts performance in some cases. (though one can
argue that this test case isn't a very realistic one.)

well, I ran your test on my box and here are the results:
pre hot:
run 1: 3617.641 ms
run 2: 5195.215 ms
run 3: 6760.449 ms
after vacuum:
run 1: 4171.362 ms
run 2: 5513.317 ms
run 3: 6884.125 ms
post hot:
run 1: Time: 7286.292 ms
run 2: Time: 7477.089 ms
run 3: Time: 7701.229 ms

those results aren't exactly terrible, and this case is highly artificial.

Your runtimes seem to be increasing as you repeat the test. Did you
remove the "DROP TABLE" from the beginning? On my laptop, post hot takes
~2x as long as pre hot, even when repeated, which matches the results of
your first runs.

Yeah, it's highly artificial.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#14

heikki.linnakangas@enterprisedb.com

over 18 years ago

In reply to: Merlin Moncure (#3)

Re: HOT is applied

Merlin Moncure wrote:

On 9/20/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:

Yeah. I'm doing some micro-benchmarking, and the attached test case is
much slower with HOT. It's spending a lot of time trying to prune, only
to find out that it can't.

Instead of/in addition to avoiding pruning when it doesn't help, maybe
we could make HeapTupleSatisfiesVacuum cheaper.

I'm going to continue testing, this is just a heads-up that HOT as
committed seriously hurts performance in some cases. (though one can
argue that this test case isn't a very realistic one.)

well, I ran your test on my box and here are the results:
pre hot:
run 1: 3617.641 ms
run 2: 5195.215 ms
run 3: 6760.449 ms
after vacuum:
run 1: 4171.362 ms
run 2: 5513.317 ms
run 3: 6884.125 ms
post hot:
run 1: Time: 7286.292 ms
run 2: Time: 7477.089 ms
run 3: Time: 7701.229 ms

those results aren't exactly terrible,

and this case is highly artificial.

Yeah.

I'm going through a bunch of CPU test cases a colleague of mine wrote.
Some of the test cases shows much worse performance with HOT.
Unfortunately they're quite complex and overlapping, so instead of
posting them as is, I'm going through the ones that HOT performs badly
at, and try to reduce them to simpler test cases like the above.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#15

pavan.deolasee@gmail.com

over 18 years ago

In reply to: Tom Lane (#5)

Re: HOT is applied

On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

so this example is getting past the heuristic tests in
heap_page_prune_opt almost every time. Why is that? Too tired to poke
at it more tonight.

I guess you already know the answer now, but anyways: Since we are
updating a single tuple in a single transaction, each update is preceded
by a sequential scan. All but last pages are completely full and marked
prunable, so HOT would try to (unsuccessfully) prune every (except may
be last) page.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

#16

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Heikki Linnakangas (#11)

Re: HOT is applied

"Heikki Linnakangas" <heikki@enterprisedb.com> writes:

Bruce Momjian wrote:

This might be a simplistic question but if the page is +90% full and
there is a long-lived transaction, isn't Postgres going to try pruning
on each page read access?

Yes :(

It shouldn't, though --- the hint bit should get cleared on the first
try. I think I probably broke something in the last round of revisions
to heap_page_prune_opt, but haven't looked yet ...

regards, tom lane

#17

pavan.deolasee@gmail.com

over 18 years ago

In reply to: Tom Lane (#16)

Re: HOT is applied

On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It shouldn't, though --- the hint bit should get cleared on the first
try. I think I probably broke something in the last round of revisions
to heap_page_prune_opt, but haven't looked yet ...

We set the hint bit (prunable) again when we see a RECENTLY_DEAD
or DELETE_IN_PROGRESS tuple. This is correct in normal circumstances
because otherwise we would never be able to prune the page.

On a second thought, may be we can choose not to set the bit again. In
the worst case, the next update on the page would be a COLD update.
That would set the bit again and we shall retry prune in the next visit to
the page.

This scheme would work as long as we don't put in mechanism to
update FSM after pruning. But if we choose to update FSM (I am
inclined to do this to avoid repeated relation extension because
of COLD updates), we would need to improve this. Otherwise a
page full of only DEAD tuples may never be pruned and vacuum
would be required to reclaim that space.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

#18

heikki.linnakangas@enterprisedb.com

over 18 years ago

In reply to: Tom Lane (#16)

Re: HOT is applied

Tom Lane wrote:

"Heikki Linnakangas" <heikki@enterprisedb.com> writes:

Bruce Momjian wrote:

This might be a simplistic question but if the page is +90% full and
there is a long-lived transaction, isn't Postgres going to try pruning
on each page read access?

Yes :(

It shouldn't, though --- the hint bit should get cleared on the first
try.

Think so? That would mean that you only get a single chance to prune
after an update. Not sure how big a problem that is, but I do feel that
would make HOT a lot less effective in some usage patterns.

The idea of having a "prunable xmin" in the page header sounds more and
more attractive to me...

I think I probably broke something in the last round of revisions
to heap_page_prune_opt, but haven't looked yet ...

Pavan's patch was like that as well; you don't clear the flag (or
rather, you set it again while pruning) until there's nothing left to
prune on the page.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#19

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Pavan Deolasee (#15)

Re: HOT is applied

"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:

On 9/21/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

so this example is getting past the heuristic tests in
heap_page_prune_opt almost every time. Why is that? Too tired to poke
at it more tonight.

I guess you already know the answer now, but anyways: Since we are
updating a single tuple in a single transaction, each update is preceded
by a sequential scan. All but last pages are completely full and marked
prunable, so HOT would try to (unsuccessfully) prune every (except may
be last) page.

Hmm ... the problem really is that heap_page_prune turns the hint back
on when it sees a DELETE_IN_PROGRESS tuple. Maybe that's a bad idea.

I don't much like the idea of adding an xid to the page header --- for
one thing, *which* xid would you put there, and what would you test it
against?

regards, tom lane

#20

heikki.linnakangas@enterprisedb.com

over 18 years ago

In reply to: Tom Lane (#19)

Re: HOT is applied

Tom Lane wrote:

I don't much like the idea of adding an xid to the page header --- for
one thing, *which* xid would you put there, and what would you test it
against?

I was thinking that you would put the smallest in-progress xmax on the
page there, and you would test it against OldestXmin. If all
transactions commit, there surely isn't anything to prune until that xid
falls beyond the OldestXmin horizon. If an inserting transaction aborts,
we could prune the aborted tuple earlier, but optimizing for aborts
doesn't seem that important.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#21

tgl@sss.pgh.pa.us

over 18 years ago

In reply to: Heikki Linnakangas (#12)

#22