Do we need so many hint bits?
Related to discussion here:
http://archives.postgresql.org/message-id/CAHyXU0zn5emePLedoZUGrAQiF92F-YjvFr-P5vUh6n0WpKZ6PQ@mail.gmail.com
It occurred to me recently that many of the hint bits aren't terribly
important (at least it's not obvious to me). HEAP_XMIN_COMMITTED clearly
has a purpose, and we'd expect it to be used many times following the
initial CLOG lookup.
But the other tuple hint bits seem to be there just for symmetry,
because they shouldn't last long. If HEAP_XMIN_INVALID or
HEAP_XMAX_COMMITTED is set, then it's (hopefully) going to be vacuumed
soon, and gone completely. And if HEAP_XMAX_INVALID is set, then it
should just be changed to InvalidTransactionId.
Removing those 3 hints would give us 3 more flag bits (eventually, after
we are sure they aren't just leftover), and it would also reduce the
chance that a page is dirtied for no other reason than to set them. It
might even take a few cycles out of the tqual.c routines, or at least
reduce the code size. Not a huge win, but I don't see much downside
either.
Also, I am wondering about PD_ALL_VISIBLE. It was originally introduced
in the visibility map patch, apparently as a way to know when to clear
the VM bit when doing an update. It was then also used for scans, which
showed a significant speedup. But I wonder: why not just use the
visibilitymap directly from those places? It can be used for the scan
because it is crash safe now (not possible before). And since it's only
one lookup per scanned page, then I don't think it would be a measurable
performance loss there. Inserts/updates/deletes also do a significant
amount of work, so again, I doubt it's a big drop in performance there
-- maybe under a lot of concurrency or something.
The benefit of removing PD_ALL_VISIBLE would be significantly higher.
It's quite common to load a lot of data, and then do some reads for a
while (setting hint bits and flushing them to disk), and then do a
VACUUM a while later, setting PD_ALL_VISIBLE and writing all of the
pages again. Also, if I remember correctly, Robert went to significant
effort when making the VM crash-safe to keep the PD_ALL_VISIBLE and VM
bits consistent. Maybe this was all discussed before?
All of these hint bits will have a bit more of a performance impact
after checksums are introduced (for those that use them in conjunction
with large data loads), so I'm looking for some simple ways to mitigate
those effects. What kind of worst-case tests could I construct to see if
there are worrying performance effects to removing these hint bits?
Regards,
Jeff Davis
On 15 November 2012 19:42, Jeff Davis <pgsql@j-davis.com> wrote:
many of the hint bits aren't terribly important
The truth is that nobody knows because there is no way of knowing.
A good way forwards would be to collect detailed stats by table. We
can then run various loads and see if any of this is worthwhile.
I'm sure we'd learn enough to understand how to optimise things.
track_hints?
track_buffers?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes:
On 15 November 2012 19:42, Jeff Davis <pgsql@j-davis.com> wrote:
many of the hint bits aren't terribly important
The truth is that nobody knows because there is no way of knowing.
We had a discussion awhile back in which the idea of *no* hint bits
was advocated, and someone (I think Robert) did some preliminary
tests that pretty much shot it down. However, I don't recall
anyone suggesting before that the four existing bits might not all
be equally worthwhile. It's worth looking into. The hard part is
probably agreeing on the test case or cases to measure behavior for.
regards, tom lane
Jeff Davis <pgsql@j-davis.com> writes:
It occurred to me recently that many of the hint bits aren't terribly
important (at least it's not obvious to me). HEAP_XMIN_COMMITTED clearly
has a purpose, and we'd expect it to be used many times following the
initial CLOG lookup.
Right.
But the other tuple hint bits seem to be there just for symmetry,
because they shouldn't last long. If HEAP_XMIN_INVALID or
HEAP_XMAX_COMMITTED is set, then it's (hopefully) going to be vacuumed
soon, and gone completely. And if HEAP_XMAX_INVALID is set, then it
should just be changed to InvalidTransactionId.
Hm. It is not cheaper to change xmax to 0 than it is to set the hint
bit --- you still need a write, and there are also added locking and
atomicity worries --- so I'm not convinced by your argument there.
But you might be right that the expected number of wins from the other
two bits is a lot less.
Removing those 3 hints would give us 3 more flag bits (eventually, after
we are sure they aren't just leftover), and it would also reduce the
chance that a page is dirtied for no other reason than to set them.
We aren't pressed for flag bits particularly. I think the main
attraction of this idea is precisely to reduce unnecessary page dirties,
and so that leads me to suggest a variant: keep the four bits defined as
now, but do not attempt to set XMIN_INVALID or XMAX_COMMITTED unless the
page is already dirty. This would make it a straight-up trade of more
clog consultation for fewer page dirties.
Also, I am wondering about PD_ALL_VISIBLE. It was originally introduced
in the visibility map patch, apparently as a way to know when to clear
the VM bit when doing an update. It was then also used for scans, which
showed a significant speedup. But I wonder: why not just use the
visibilitymap directly from those places?
Added contention for access to the visibility map pages would be a
problem.
regards, tom lane
On 15 November 2012 22:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
On 15 November 2012 19:42, Jeff Davis <pgsql@j-davis.com> wrote:
many of the hint bits aren't terribly important
The truth is that nobody knows because there is no way of knowing.
We had a discussion awhile back in which the idea of *no* hint bits
was advocated, and someone (I think Robert) did some preliminary
tests that pretty much shot it down. However, I don't recall
anyone suggesting before that the four existing bits might not all
be equally worthwhile. It's worth looking into.
Itsn't that what I said? In case of doubt, Yes, its a great idea and
worth looking into.
The question is *how* we look into it.
The hard part is
probably agreeing on the test case or cases to measure behavior for.
I think thats impossible. There are just too many possible cases.
Measuring top-level performance without measuring low level stats just
means we can't tell the difference between a test that didn't exercise
the code and a test where there was no difference.
We need detailed stats that allow many people to make their own tests
and to report on what they find.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Nov 16, 2012 at 8:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeff Davis <pgsql@j-davis.com> writes:
Removing those 3 hints would give us 3 more flag bits (eventually, after
we are sure they aren't just leftover), and it would also reduce the
chance that a page is dirtied for no other reason than to set them.We aren't pressed for flag bits particularly. I think the main
attraction of this idea is precisely to reduce unnecessary page dirties,
and so that leads me to suggest a variant: keep the four bits defined as
now, but do not attempt to set XMIN_INVALID or XMAX_COMMITTED unless the
page is already dirty.
Another approach could be to set those additional bits, but don't dirty the
page. So if the page is already dirty or gets dirty later on before its
eviction, those hint bits will get recorded. We can also try some other
variants like: set the bits and dirty the page if the XID is beyond the
horizon that CLOG buffers can track. A very old XID will most likely cause
a CLOG page read later on.
I forgot if we have a way to differentiate between critical and
non-critical dirtiness of a page. If we don't, that might be another
worthwhile thing to try and test. Setting hint bits is a non-critical
operation. So such a dirty page may be discarded if the system is under
pressure.
Well, all of the above may have been discussed and rejected in the past.
What we need is a thorough benchmarking to know what is better in which
situation.
Thanks,
Pavan
On 15 November 2012 22:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Removing those 3 hints would give us 3 more flag bits (eventually, after
we are sure they aren't just leftover), and it would also reduce the
chance that a page is dirtied for no other reason than to set them.We aren't pressed for flag bits particularly. I think the main
attraction of this idea is precisely to reduce unnecessary page dirties,
and so that leads me to suggest a variant: keep the four bits defined as
now, but do not attempt to set XMIN_INVALID or XMAX_COMMITTED unless the
page is already dirty. This would make it a straight-up trade of more
clog consultation for fewer page dirties.
Hmm, I thought Alvaro wanted an extra flag bit for foreign key locks
but couldn't find it.
Come to think of it, why do we have XMIN_INVALID and XMAX_INVALID? We
never need both at the same time, so we can just use one...
XMIN_INVALID -> XACT_INVALID
XMAX_INVALID == XMIN_COMMITTED | XACT_INVALID
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, 2012-11-15 at 22:21 -0500, Tom Lane wrote:
We aren't pressed for flag bits particularly. I think the main
attraction of this idea is precisely to reduce unnecessary page dirties,
and so that leads me to suggest a variant: keep the four bits defined as
now, but do not attempt to set XMIN_INVALID or XMAX_COMMITTED unless the
page is already dirty. This would make it a straight-up trade of more
clog consultation for fewer page dirties.
Sounds reasonable.
Also, I am wondering about PD_ALL_VISIBLE. It was originally introduced
in the visibility map patch, apparently as a way to know when to clear
the VM bit when doing an update. It was then also used for scans, which
showed a significant speedup. But I wonder: why not just use the
visibilitymap directly from those places?Added contention for access to the visibility map pages would be a
problem.
Can you think of a reasonable (or worst-case) test scenario that would
illustrate this problem? Would I need a many-core box to actually test
this?
Also, what locks do you expect to be contended? The buffer header lock
while pinning the VM page?
Regards,
Jeff Davis
On 15 November 2012 23:12, Simon Riggs <simon@2ndquadrant.com> wrote:
We need detailed stats that allow many people to make their own tests
and to report on what they find.
To work out how to proceed we need
* wait event times on clog accesses at table-level
* numbers of blocks dirtied by hint bit settings at table-level
That way we can see which tables need to have hint bits set and which don't.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Nov 15, 2012 at 9:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeff Davis <pgsql@j-davis.com> writes:
It occurred to me recently that many of the hint bits aren't terribly
important (at least it's not obvious to me). HEAP_XMIN_COMMITTED clearly
has a purpose, and we'd expect it to be used many times following the
initial CLOG lookup.Right.
But the other tuple hint bits seem to be there just for symmetry,
because they shouldn't last long. If HEAP_XMIN_INVALID or
HEAP_XMAX_COMMITTED is set, then it's (hopefully) going to be vacuumed
soon, and gone completely. And if HEAP_XMAX_INVALID is set, then it
should just be changed to InvalidTransactionId.Hm. It is not cheaper to change xmax to 0 than it is to set the hint
bit --- you still need a write, and there are also added locking and
atomicity worries --- so I'm not convinced by your argument there.
But you might be right that the expected number of wins from the other
two bits is a lot less.
Is that true in a post checksum world though? Given that we are
logging changes can we relax atomicity expectations? IIRC xmin/xmax
are aligned, how come you can't just set InvalidTransactionId for
INVALID and 'FrozenTransactionId' for COMMITTED? Why can't you do
this now?
merlin
On 2012-11-16 08:43:12 -0600, Merlin Moncure wrote:
On Thu, Nov 15, 2012 at 9:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeff Davis <pgsql@j-davis.com> writes:
It occurred to me recently that many of the hint bits aren't terribly
important (at least it's not obvious to me). HEAP_XMIN_COMMITTED clearly
has a purpose, and we'd expect it to be used many times following the
initial CLOG lookup.Right.
But the other tuple hint bits seem to be there just for symmetry,
because they shouldn't last long. If HEAP_XMIN_INVALID or
HEAP_XMAX_COMMITTED is set, then it's (hopefully) going to be vacuumed
soon, and gone completely. And if HEAP_XMAX_INVALID is set, then it
should just be changed to InvalidTransactionId.Hm. It is not cheaper to change xmax to 0 than it is to set the hint
bit --- you still need a write, and there are also added locking and
atomicity worries --- so I'm not convinced by your argument there.
But you might be right that the expected number of wins from the other
two bits is a lot less.Is that true in a post checksum world though? Given that we are
logging changes can we relax atomicity expectations? IIRC xmin/xmax
are aligned, how come you can't just set InvalidTransactionId for
INVALID and 'FrozenTransactionId' for COMMITTED? Why can't you do
this now?
Uhm. The latter doesn't really work if you have any transactions that
might not see that row or am I missing something?
Greetings,
Andres Freund
On Fri, Nov 16, 2012 at 8:58 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2012-11-16 08:43:12 -0600, Merlin Moncure wrote:
On Thu, Nov 15, 2012 at 9:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeff Davis <pgsql@j-davis.com> writes:
It occurred to me recently that many of the hint bits aren't terribly
important (at least it's not obvious to me). HEAP_XMIN_COMMITTED clearly
has a purpose, and we'd expect it to be used many times following the
initial CLOG lookup.Right.
But the other tuple hint bits seem to be there just for symmetry,
because they shouldn't last long. If HEAP_XMIN_INVALID or
HEAP_XMAX_COMMITTED is set, then it's (hopefully) going to be vacuumed
soon, and gone completely. And if HEAP_XMAX_INVALID is set, then it
should just be changed to InvalidTransactionId.Hm. It is not cheaper to change xmax to 0 than it is to set the hint
bit --- you still need a write, and there are also added locking and
atomicity worries --- so I'm not convinced by your argument there.
But you might be right that the expected number of wins from the other
two bits is a lot less.Is that true in a post checksum world though? Given that we are
logging changes can we relax atomicity expectations? IIRC xmin/xmax
are aligned, how come you can't just set InvalidTransactionId for
INVALID and 'FrozenTransactionId' for COMMITTED? Why can't you do
this now?Uhm. The latter doesn't really work if you have any transactions that
might not see that row or am I missing something?
nope. you're right.
merlin
On Thu, Nov 15, 2012 at 10:14 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
Another approach could be to set those additional bits, but don't dirty the
page. So if the page is already dirty or gets dirty later on before its
eviction, those hint bits will get recorded. We can also try some other
variants like: set the bits and dirty the page if the XID is beyond the
horizon that CLOG buffers can track. A very old XID will most likely cause a
CLOG page read later on.I forgot if we have a way to differentiate between critical and non-critical
dirtiness of a page. If we don't, that might be another worthwhile thing to
try and test. Setting hint bits is a non-critical operation. So such a dirty
page may be discarded if the system is under pressure.Well, all of the above may have been discussed and rejected in the past.
What we need is a thorough benchmarking to know what is better in which
situation.
Yeah: all previous ideas attempts manage hint bits through trade-offs
have failed because of general nervousness about who loses. The
unlucky guy who has high transaction rates that suddenly has a big
increase in clog activity is in for a world of hurt.
Also, AIUI 'set but don't dirty' strategies are less effective if/when
checksums are implemented as currently proposed. Previously hint bit
mitigation was all about saving some i/o. Now, everything is getting
rammed through the WAL lock.
merlin
On 2012-11-15 16:42:57 -0800, Jeff Davis wrote:
Related to discussion here:
http://archives.postgresql.org/message-id/CAHyXU0zn5emePLedoZUGrAQiF92F-YjvFr-P5vUh6n0WpKZ6PQ@mail.gmail.comIt occurred to me recently that many of the hint bits aren't terribly
important (at least it's not obvious to me). HEAP_XMIN_COMMITTED clearly
has a purpose, and we'd expect it to be used many times following the
initial CLOG lookup.
But the other tuple hint bits seem to be there just for symmetry,
because they shouldn't last long. If HEAP_XMIN_INVALID or
HEAP_XMAX_COMMITTED is set, then it's (hopefully) going to be vacuumed
soon, and gone completely. And if HEAP_XMAX_INVALID is set, then it
should just be changed to InvalidTransactionId.
Wrt HEAP_XMAX_COMMITTED:
It can take an *awfully* long time till autovacuum crosses the
thresholds the next time for a big table.
I also think we cannot dismiss the case of longrunning transactions
because vacuum won't be able to cleanup those rows in that case.
Wrt HEAP_(XMIN|XMAX)_INVALID: yes, if we are in need of new flag bits
those sound like a good target to me.
Also, I am wondering about PD_ALL_VISIBLE. It was originally introduced
in the visibility map patch, apparently as a way to know when to clear
the VM bit when doing an update. It was then also used for scans, which
showed a significant speedup. But I wonder: why not just use the
visibilitymap directly from those places? It can be used for the scan
because it is crash safe now (not possible before). And since it's only
one lookup per scanned page, then I don't think it would be a measurable
performance loss there. Inserts/updates/deletes also do a significant
amount of work, so again, I doubt it's a big drop in performance there
-- maybe under a lot of concurrency or something.The benefit of removing PD_ALL_VISIBLE would be significantly higher.
It's quite common to load a lot of data, and then do some reads for a
while (setting hint bits and flushing them to disk), and then do a
VACUUM a while later, setting PD_ALL_VISIBLE and writing all of the
pages again. Also, if I remember correctly, Robert went to significant
effort when making the VM crash-safe to keep the PD_ALL_VISIBLE and VM
bits consistent. Maybe this was all discussed before?
As far as I understand the code the crash-safety aspects of the
visibilitymap currently rely on on having the knowledge that ALL_VISIBLE
has been cleared during a heap_(insert|update|delete). That allows
management of the visibilitymap without it being xlogged itself which
seems pretty important to me.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Nov 15, 2012 at 7:42 PM, Jeff Davis <pgsql@j-davis.com> wrote:
But the other tuple hint bits seem to be there just for symmetry,
because they shouldn't last long. If HEAP_XMIN_INVALID or
HEAP_XMAX_COMMITTED is set, then it's (hopefully) going to be vacuumed
soon, and gone completely. And if HEAP_XMAX_INVALID is set, then it
should just be changed to InvalidTransactionId.
"Soon" is a relative term. I doubt that we can really rely on vacuum
to be timely enough to avoid pain here - you can easily have tens of
thousands of hits on the same tuple before vacuum gets around to
dealing with it. Now, we might be able to rejigger things to avoid
that. For example, maybe it'd be possible to arrange things so that
when we see an invalid xmin, we set the flag that triggers a HOT prune
instead of setting the hint bit. That would probably be good enough
to dispense with the hint bit, and maybe better altogether better than
the current system, because now the next time someone (including us)
locks the buffer we'll nuke the entire tuple, which would not only
make it cheaper to scan but also frees up space in the buffer sooner.
However, that solution only works for invalid-xmin. For
committed-xmax, there could actually be quite a long time before the
page can be pruned, because there can be some other backend holding an
old snapshot open. A one-minute reporting query in another database,
which is hardly an unreasonable scenario, could result in many, many
additional CLOG lookups, which are already a major contention point at
high concurrencies. I think that bit is probably pretty important,
and I don't see a viable way to get rid of it, though maybe someone
can think of one. For invalid-xmax, I agree that we could probably
just change xmax to InvalidTransactionId, if we need to save
bit-space. In the past Tom and I think also Alvaro have been
skeptical about anything that would overwrite xmin/xmax values too
quickly for forensic reasons, but maybe it's worth considering.
Also, I am wondering about PD_ALL_VISIBLE. It was originally introduced
in the visibility map patch, apparently as a way to know when to clear
the VM bit when doing an update. It was then also used for scans, which
showed a significant speedup. But I wonder: why not just use the
visibilitymap directly from those places?
Well, you'd have to look up, lock and pin the page to do that. I
suspect that overhead is pretty significant. The benefit of noticing
that the flag is set is that you need not call HeapTupleSatisfiesMVCC
for each tuple on the page: checking one bit in the page header is a
lot cheaper than calling that function for every tuple. However, if
you had to lock and pin a second page in order to check whether the
page is all-visible, I suspect it wouldn't be a win; you'd probably be
better off just doing the HeapTupleSatisfiesMVCC checks for each
tuple.
One of the main advantages of PD_ALL_VISIBLE is that if you do an
insert, update, or delete on a page where that bit isn't set, you need
not lock and pin the visibility map page, because you already know
that the bit will be clear in the visibility map. If the data is
being rapidly modified, you'll get the benefit of this optimization
most of the time, only losing it when vacuum has visited recently. I
hope that's not premature optimization because I sure sweat a lot of
blood last release cycle to keep it working like that. I had a few
doubts at the time about how much we were winning there, but I don't
actually have any hard data either way, so I would be reluctant to
assume it doesn't matter.
Even if it doesn't, the sequential-scan optimization definitely
matters a LOT, as you can easily verify.
One approach that I've been hoping to pursue is to find a way to make
CLOG lookups cheaper and more concurrent. I started to work on some
concurrent hash table code, which you can find here:
http://git.postgresql.org/gitweb/?p=users/rhaas/postgres.git;a=shortlog;h=refs/heads/chash
The concurrency properties of this code are vastly better than what we
have now, but there are cases where it loses vs. dynahash when there's
no concurrency. That might be fixable or just not a big deal, though.
A bigger problem is that I got sucked off into other things before I
was able to get as far with it as I wanted to; in particular, I have
only unit test results for it, and haven't tried to integrate it into
the SLRU code yet.
But I'm not sure any of this is going to fundamentally chip away at
the need for hint bits all that much. Making CLOG lookups cheaper or
less frequent is all to the good, but the prognosis for improving
things enough that we can dump some or all of the hint bits completely
seems uncertain at best. Even if we COULD dump everything but
heap-xmin-committed, how much would that really help with the
disk-write problem? I bet heap-xmin-committed gets set far more
often than the other three put together.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, 2012-11-15 at 22:21 -0500, Tom Lane wrote:
We aren't pressed for flag bits particularly. I think the main
attraction of this idea is precisely to reduce unnecessary page dirties,
and so that leads me to suggest a variant: keep the four bits defined as
now, but do not attempt to set XMIN_INVALID or XMAX_COMMITTED unless the
page is already dirty. This would make it a straight-up trade of more
clog consultation for fewer page dirties.
When I tried a simple test on my laptop, where I removed the
XMIN_INVALID and XMAX_COMMITTED bits on a simple SELECT test (no aborts,
no deletes, fits in shared_buffers) I saw a pretty consistent 2%
speedup. Not earth-shattering, but it would be nice if we could figure
out a way to do that.
Not sure where that speedup came from, to be honest, it doesn't look
like much difference in the work being done in the visibility check.
Regards,
Jeff Davis
On Fri, 2012-11-16 at 11:58 -0500, Robert Haas wrote:
Also, I am wondering about PD_ALL_VISIBLE. It was originally introduced
in the visibility map patch, apparently as a way to know when to clear
the VM bit when doing an update. It was then also used for scans, which
showed a significant speedup. But I wonder: why not just use the
visibilitymap directly from those places?Well, you'd have to look up, lock and pin the page to do that. I
suspect that overhead is pretty significant. The benefit of noticing
that the flag is set is that you need not call HeapTupleSatisfiesMVCC
for each tuple on the page: checking one bit in the page header is a
lot cheaper than calling that function for every tuple. However, if
you had to lock and pin a second page in order to check whether the
page is all-visible, I suspect it wouldn't be a win; you'd probably be
better off just doing the HeapTupleSatisfiesMVCC checks for each
tuple.
That's pretty easy to test. Here's what I got on a 10M record table
(Some runs got some strangely high numbers around 1700ms, which I assume
is because it's difficult to keep the data in shared buffers, so I took
the lower numbers.):
PD_ALL_VISIBLE: 661ms
VM Lookup: 667ms
Neither: 740ms
Even if pinning the vm buffer were slow, we could keep the pin longer
during a scan (it seems like the VM API is designed for that kind of a
use case), so I don't think scans are a problem at all, even if there is
a lot of concurrency.
One of the main advantages of PD_ALL_VISIBLE is that if you do an
insert, update, or delete on a page where that bit isn't set, you need
not lock and pin the visibility map page, because you already know
that the bit will be clear in the visibility map. If the data is
being rapidly modified, you'll get the benefit of this optimization
most of the time, only losing it when vacuum has visited recently. I
hope that's not premature optimization because I sure sweat a lot of
blood last release cycle to keep it working like that. I had a few
doubts at the time about how much we were winning there, but I don't
actually have any hard data either way, so I would be reluctant to
assume it doesn't matter.
This is a more plausible concern, because it's per-tuple rather than
per-page.
However, I think we might be able to save the current VM buffer across
multiple calls of heap_insert|update|delete in nodeModifyTable. At first
glance, it doesn't look very difficult.
It's unfortunate that I didn't think much about this while you were
working on it before; but difficult-to-write code is also difficult to
maintain, so simplifying it would be nice.
Even if it doesn't, the sequential-scan optimization definitely
matters a LOT, as you can easily verify.
I am in no way suggesting that we remove that optimization; I am
suggesting that we can get the exact same benefit by looking at the VM
directly.
One approach that I've been hoping to pursue is to find a way to make
CLOG lookups cheaper and more concurrent. I started to work on some
concurrent hash table code, which you can find here:http://git.postgresql.org/gitweb/?p=users/rhaas/postgres.git;a=shortlog;h=refs/heads/chash
The concurrency properties of this code are vastly better than what we
have now, but there are cases where it loses vs. dynahash when there's
no concurrency. That might be fixable or just not a big deal, though.
A bigger problem is that I got sucked off into other things before I
was able to get as far with it as I wanted to; in particular, I have
only unit test results for it, and haven't tried to integrate it into
the SLRU code yet.
Great work!
But I'm not sure any of this is going to fundamentally chip away at
the need for hint bits all that much. Making CLOG lookups cheaper or
less frequent is all to the good, but the prognosis for improving
things enough that we can dump some or all of the hint bits completely
seems uncertain at best. Even if we COULD dump everything but
heap-xmin-committed, how much would that really help with the
disk-write problem? I bet heap-xmin-committed gets set far more
often than the other three put together.
Right now I'm setting my sights on PD_ALL_VISIBLE. I know that causes a
lot of extra page dirties, and it seems like it doesn't serve all that
much purpose after the VM became crash-safe. And it would simplify the
code.
If we can't get rid of the other hint bits, then so be it.
But the other tuple hint bits seem to be there just for symmetry,
because they shouldn't last long. If HEAP_XMIN_INVALID or
HEAP_XMAX_COMMITTED is set, then it's (hopefully) going to be vacuumed
soon, and gone completely. And if HEAP_XMAX_INVALID is set, then it
should just be changed to InvalidTransactionId.However, that solution only works for invalid-xmin. For
committed-xmax, there could actually be quite a long time before the
page can be pruned, because there can be some other backend holding an
old snapshot open.
I think I'll set aside this part of the idea for now, because it doesn't
seem like a huge win after I thought about it, anyway. If
HEAP_XMIN_INVALID or HEAP_XMAX_COMMITTED are set, it's likely to be
dirty anyway, unless it's a bulk delete or something.
Regards,
Jeff Davis
On Thu, Nov 15, 2012 at 4:42 PM, Jeff Davis <pgsql@j-davis.com> wrote:
Also, I am wondering about PD_ALL_VISIBLE. It was originally introduced
in the visibility map patch, apparently as a way to know when to clear
the VM bit when doing an update. It was then also used for scans, which
showed a significant speedup. But I wonder: why not just use the
visibilitymap directly from those places? It can be used for the scan
because it is crash safe now (not possible before). And since it's only
one lookup per scanned page,
Wouldn't you need to lock two pages simultaneously, for each table
page, in order to ensure that there are no races?
For a full table scan, that might not be so good.
then I don't think it would be a measurable
performance loss there. Inserts/updates/deletes also do a significant
amount of work, so again, I doubt it's a big drop in performance there
-- maybe under a lot of concurrency or something.
Your question prompts me to post something I had been wondering.
Might it be worthwhile to break the PD_ALL_VISIBLE / vm equivalence?
Should the vm bit get cleared by a HOT update?
Currently the vm is used only for non-freezing vacuuming and
index-only scans, as far as I know.
For the IOS, it doesn't care which tuple in the chain is visible, as
long as exactly one of them is. So everything it cares about is still
visible.
And anyone can vacuum a block that has only had HOT updates, you don't
need to be dedicated vacuum worker to do that.
I guess one limitation of this as that if the block was populated by
large tuples which then systematically got made smaller by HOT
updates, the free space map would never get updated until the next
vacuum freeze kicked in.
And obviously this would be incompatible with removing the
PD_ALL_VISIBLE, unless we also wanted to eliminate the ability to
short-cut hint bit checks.
Cheers,
Jeff
On Fri, 2012-11-16 at 16:09 +0100, Andres Freund wrote:
As far as I understand the code the crash-safety aspects of the
visibilitymap currently rely on on having the knowledge that ALL_VISIBLE
has been cleared during a heap_(insert|update|delete). That allows
management of the visibilitymap without it being xlogged itself which
seems pretty important to me.
It looks like the xlrec does contain a "cleared all visible" flag in it,
and it uses that to clear the VM as well as PD_ALL_VISIBLE.
So, it still seems achievable, and maybe not even a large patch, to do
away with PD_ALL_VISIBLE without a lot of performance risk.
Regards,
Jeff Davis
On Fri, 2012-11-16 at 17:04 -0800, Jeff Janes wrote:
On Thu, Nov 15, 2012 at 4:42 PM, Jeff Davis <pgsql@j-davis.com> wrote:
Also, I am wondering about PD_ALL_VISIBLE. It was originally introduced
in the visibility map patch, apparently as a way to know when to clear
the VM bit when doing an update. It was then also used for scans, which
showed a significant speedup. But I wonder: why not just use the
visibilitymap directly from those places? It can be used for the scan
because it is crash safe now (not possible before). And since it's only
one lookup per scanned page,Wouldn't you need to lock two pages simultaneously, for each table
page, in order to ensure that there are no races?For a full table scan, that might not be so good.
Can you explain? The table scan share locks the heap page during the
visibility check (which should suffice as a memory barrier) and then
could test the VM bit with only a pin on the VM page.
Your question prompts me to post something I had been wondering.
Might it be worthwhile to break the PD_ALL_VISIBLE / vm equivalence?
Should the vm bit get cleared by a HOT update?
To clarify: are you saying that a hot update should clear the
PD_ALL_VISIBLE bit, but not the VM bit?
And anyone can vacuum a block that has only had HOT updates, you don't
need to be dedicated vacuum worker to do that.
And obviously this would be incompatible with removing the
PD_ALL_VISIBLE, unless we also wanted to eliminate the ability to
short-cut hint bit checks.
I'm still a little unclear on what the benefit is.
It sounds like a slightly different kind of hint, so maybe we should
just treat it as a completely different thing after removing
PD_ALL_VISIBLE. If it's related to HOT updates, then the page will
probably be dirty anyway, so that removes my primary complaint about
PD_ALL_VISIBLE.
Regards,
Jeff Davis