limiting hint bit I/O
I whipped up the attached patch tonight. It's pretty quick and dirty,
so it's possible I've missed something, but the intent is to suppress
writing of hint bits by buffer-allocating backends, and by
checkpoints, and write them only from the background writer cleaning
scan. It therefore should (and does) avoid the problem that the first
scan of a relation after a bulk load is much slower than subsequent
scans. I used this test case:
create table s as select g,
random()::text||random()::text||random()::text||random()::text from
generate_series(1,1000000) g;
I didn't do any special configuration, so this was large enough to not
fit in shared_buffers, but small enough to fit in the OS cache. Then
I did this repeatedly:
select sum(1) from s;
Without the patch, the first run took 1602 ms, and subsequent runs
took 207-216 ms.
With the patch, the first run took 270 ms, and subsequent runs
declined very, very slowly. I got bored after getting down into the
240 ms range and ran VACUUM FREEZE, after which times dropped to about
197 ms. (This also happens without the patch - VACUUM FREEZE seems to
speed things up a bit more than just setting all the hint bits.)
I find these results pretty depressing. Obviously, the ~6x speedup on
the first run is great, but even after many subsequent runs it
was still 10-15% slower. Certainly, for some people this patch might
be an improvement, but on the whole I can't see applying it, unless
someone can spot something I've done wrong that casts a different
light on the situation. I am a little bit at a loss to explain how
I'm getting these results when others posted results that appeared to
show hint bits making very little difference.
Any insights appreciated.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
bm-hint-bits.patch (text/x-diff; charset=US-ASCII; +78 -55)
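The figures quoted in the message above can be checked with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, and 240 ms is the point at which the author stopped measuring, not a steady state.

```python
# Timings in milliseconds, as reported in the message above.
unpatched_first = 1602
unpatched_later = (207, 216)   # range of subsequent unpatched runs
patched_first = 270
patched_later = 240            # "the 240 ms range", where measurement stopped

# Speedup of the first scan after the bulk load.
first_run_speedup = unpatched_first / patched_first

# Slowdown of later patched runs relative to the unpatched range.
slowdown_low = patched_later / unpatched_later[1] - 1
slowdown_high = patched_later / unpatched_later[0] - 1

print(f"first-run speedup: {first_run_speedup:.2f}x")
print(f"later-run slowdown: {slowdown_low:.1%} to {slowdown_high:.1%}")
```

The first-run ratio comes out just under 6x, and the later-run penalty lands in roughly the 10-15% band described in the message.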
Robert Haas <robertmhaas@gmail.com> writes:
I whipped up the attached patch tonight.
This appears to remove the BM_JUST_DIRTIED logic. Please explain why
that's not completely broken. Even if it isn't completely broken,
it would seem better to do something like that as a separate patch.
regards, tom lane
On Thu, Jan 13, 2011 at 10:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I whipped up the attached patch tonight.
This appears to remove the BM_JUST_DIRTIED logic. Please explain why
that's not completely broken. Even if it isn't completely broken,
it would seem better to do something like that as a separate patch.
Well, the only point of BM_JUST_DIRTIED is to detect whether BM_DIRTY
has been set while a buffer write is in progress. With this patch,
only BM_HINT_BITS can be set while the buffer write is in progress;
BM_DIRTY cannot. Perhaps one could make the argument that this would
be a good cleanup anyway: in the unpatched code, BM_DIRTY can only be
set while a buffer I/O is in progress if it is set due to a hint-bit
update, and then we don't really care if the update gets lost.
Although that seems a bit confusing...
--
Robert Haas
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Jan 13, 2011 at 10:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
This appears to remove the BM_JUST_DIRTIED logic. Please explain why
that's not completely broken. Even if it isn't completely broken,
it would seem better to do something like that as a separate patch.
Well, the only point of BM_JUST_DIRTIED is to detect whether BM_DIRTY
has been set while a buffer write is in progress. With this patch,
only BM_HINT_BITS can be set while the buffer write is in progress;
BM_DIRTY cannot. Perhaps one could make the argument that this would
be a good cleanup anyway: in the unpatched code, BM_DIRTY can only be
set while a buffer I/O is in progress if it is set due to a hint-bit
update, and then we don't really care if the update gets lost.
Although that seems a bit confusing...
[ thinks some more... ] If memory serves, the BM_JUST_DIRTIED mechanism
dates from a time when checkpoints would write dirty buffers without
taking any lock on them; if somebody changed the page meanwhile, the
buffer was just considered to remain dirty. We later decided that was
a bad idea and set up the current arrangement whereby only hint-bit
changes are allowed while a write is in progress. So you're right that
it would be dead code if we don't consider that a hint-bit change is
really dirtying the page. I'm not for removing it altogether though,
because it seems like something we could possibly want again in the
future (for instance, we might decide to go back to write-without-lock
to reduce lock contention). It's not like we are short of buffer flag
bits. Moreover this whole business of not treating hint-bit setting as
a page-dirtying operation is completely experimental/unproven IMO, so it
would be better to keep the patch footprint as small as possible. I'd
suggest leaving BM_JUST_DIRTIED as-is and just adding BM_HINT_BITS_DIRTY
as a new flag.
regards, tom lane
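The flag behaviour discussed in the two messages above can be modelled roughly as follows. This is a toy sketch, not PostgreSQL's buffer manager: the bit values, function names, and the `writer` labels are assumptions made for illustration, and BM_HINT_BITS_DIRTY is only the flag name proposed in the thread. How that flag would be cleared after a write is left out because the thread does not describe it.

```python
# Bit values are arbitrary for this sketch; they are not PostgreSQL's.
BM_DIRTY = 1 << 0
BM_JUST_DIRTIED = 1 << 1
BM_HINT_BITS_DIRTY = 1 << 2   # flag name proposed in the thread


def mark_dirty(flags):
    """A real page modification: must be written out eventually."""
    return flags | BM_DIRTY | BM_JUST_DIRTIED


def mark_hint_bits_dirty(flags):
    """A hint-bit-only change: tracked separately from BM_DIRTY."""
    return flags | BM_HINT_BITS_DIRTY


def start_write(flags):
    """Clear BM_JUST_DIRTIED so a re-dirtying during the write is detectable."""
    return flags & ~BM_JUST_DIRTIED


def finish_write(flags):
    """Clear BM_DIRTY only if nobody re-dirtied the page during the write."""
    if flags & BM_JUST_DIRTIED:
        return flags
    return flags & ~BM_DIRTY


def should_write(flags, writer):
    """Policy described at the top of the thread: buffers that are only
    hint-bit dirty are written by the background writer's cleaning scan,
    not by allocating backends or checkpoints."""
    if flags & BM_DIRTY:
        return True
    return writer == "bgwriter" and bool(flags & BM_HINT_BITS_DIRTY)
```

For example, `should_write(mark_hint_bits_dirty(0), "checkpoint")` is false while the same flags with `"bgwriter"` give true, and a `mark_dirty` call between `start_write` and `finish_write` leaves BM_DIRTY set.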
On Fri, Jan 14, 2011 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Jan 13, 2011 at 10:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
This appears to remove the BM_JUST_DIRTIED logic. Please explain why
that's not completely broken. Even if it isn't completely broken,
it would seem better to do something like that as a separate patch.
Well, the only point of BM_JUST_DIRTIED is to detect whether BM_DIRTY
has been set while a buffer write is in progress. With this patch,
only BM_HINT_BITS can be set while the buffer write is in progress;
BM_DIRTY cannot. Perhaps one could make the argument that this would
be a good cleanup anyway: in the unpatched code, BM_DIRTY can only be
set while a buffer I/O is in progress if it is set due to a hint-bit
update, and then we don't really care if the update gets lost.
Although that seems a bit confusing...
[ thinks some more... ] If memory serves, the BM_JUST_DIRTIED mechanism
dates from a time when checkpoints would write dirty buffers without
taking any lock on them; if somebody changed the page meanwhile, the
buffer was just considered to remain dirty. We later decided that was
a bad idea and set up the current arrangement whereby only hint-bit
changes are allowed while a write is in progress. So you're right that
it would be dead code if we don't consider that a hint-bit change is
really dirtying the page. I'm not for removing it altogether though,
because it seems like something we could possibly want again in the
future (for instance, we might decide to go back to write-without-lock
to reduce lock contention). It's not like we are short of buffer flag
bits. Moreover this whole business of not treating hint-bit setting as
a page-dirtying operation is completely experimental/unproven IMO, so it
would be better to keep the patch footprint as small as possible. I'd
suggest leaving BM_JUST_DIRTIED as-is and just adding BM_HINT_BITS_DIRTY
as a new flag.
I have some concerns about that proposal, but it might be the right
way to go. Before we get too far off into the weeds, though, let's
back up and talk about something more fundamental: this seems to be
speeding up the first run by 6x at the expense of slowing down many
subsequent runs by 10-15%. Does that make this whole idea dead on
arrival?
--
Robert Haas
Robert Haas <robertmhaas@gmail.com> wrote:
this seems to be speeding up the first run by 6x at the expense of
slowing down many subsequent runs by 10-15%.
If the overall throughput when measured far enough out to have hit a
steady state again is anywhere in the neighborhood of the unpatched
throughput, the leveling of the response times has enough value to
merit the change. At least in my world.
-Kevin
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, Jan 14, 2011 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Moreover this whole business of not treating hint-bit setting as
a page-dirtying operation is completely experimental/unproven IMO, so it
would be better to keep the patch footprint as small as possible.
I have some concerns about that proposal, but it might be the right
way to go. Before we get too far off into the weeds, though, let's
back up and talk about something more fundamental: this seems to be
speeding up the first run by 6x at the expense of slowing down many
subsequent runs by 10-15%. Does that make this whole idea dead on
arrival?
Well, it reinforces my opinion that it's experimental ;-). But "first
run" of what, exactly? And are you sure you're taking a wholistic view
of the costs/benefits?
regards, tom lane
On Fri, Jan 14, 2011 at 1:02 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
Robert Haas <robertmhaas@gmail.com> wrote:
this seems to be speeding up the first run by 6x at the expense of
slowing down many subsequent runs by 10-15%.
If the overall throughput when measured far enough out to have hit a
steady state again is anywhere in the neighborhood of the unpatched
throughput, the leveling of the response times has enough value to
merit the change. At least in my world.
I think it would eventually settle down to the same speed, but it
might take a really long time. I got impatient before I got that far.
I'm hoping someone will pick it up and play with it some more (hint,
hint).
--
Robert Haas
On Fri, Jan 14, 2011 at 1:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, Jan 14, 2011 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Moreover this whole business of not treating hint-bit setting as
a page-dirtying operation is completely experimental/unproven IMO, so it
would be better to keep the patch footprint as small as possible.
I have some concerns about that proposal, but it might be the right
way to go. Before we get too far off into the weeds, though, let's
back up and talk about something more fundamental: this seems to be
speeding up the first run by 6x at the expense of slowing down many
subsequent runs by 10-15%. Does that make this whole idea dead on
arrival?
Well, it reinforces my opinion that it's experimental ;-). But "first
run" of what, exactly?
See the test case in my OP. The "runs" in question are "select sum(1) from s".
And are you sure you're taking a wholistic view
of the costs/benefits?
No.
--
Robert Haas
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, Jan 14, 2011 at 1:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Well, it reinforces my opinion that it's experimental ;-). But "first
run" of what, exactly?
See the test case in my OP. The "runs" in question are "select sum(1) from s".
And are you sure you're taking a wholistic view
of the costs/benefits?
No.
Well, IMO it would be a catastrophic mistake to evaluate a patch like
this on the basis of any single test case, let alone one as simplistic
as that. I would observe in particular that your test case creates a
table containing only one distinct value of xmin, which means that the
single-transaction cache in transam.c is 100% effective, which doesn't
seem to me to be a very realistic test condition. I think this is
vastly understating the cost of missing hint bits.
So what it needs now is a lot more testing. pgbench might be worth
trying if you want something with minimal development effort, though
I'm not sure if its clog access pattern is particularly realistic
either.
regards, tom lane
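The point about a single distinct xmin can be illustrated with a toy model of a one-entry cache that remembers only the last transaction ID checked. This is a sketch under stated assumptions: the function name, the XID values, and the row counts are invented for illustration, and it is not the code in transam.c.

```python
import random


def count_cache_misses(xmins):
    """Count lookups that miss a one-entry cache of the last XID checked.

    Each miss stands in for a commit-status lookup that a set hint bit
    would have avoided.
    """
    cached_xid = None
    misses = 0
    for xid in xmins:
        if xid != cached_xid:
            misses += 1          # would have to consult the commit log
            cached_xid = xid
    return misses


rows = 100_000

# Bulk load in one transaction: every row has the same xmin.
bulk_load = [1000] * rows

# Rows inserted by many different transactions, in no particular order.
rng = random.Random(0)
mixed = [rng.randrange(1000, 2000) for _ in range(rows)]

print("single xmin misses:", count_cache_misses(bulk_load))
print("mixed xmin misses: ", count_cache_misses(mixed))
```

With one xmin the cache misses exactly once, so the cost of missing hint bits is nearly hidden. With many interleaved xmins almost every row misses, which is the situation the single-table test case does not exercise.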
Robert Haas <robertmhaas@gmail.com> wrote:
I'm hoping someone will pick it up and play with it some more (hint,
hint).
That was a bit of a pun, eh?
Anyway, there are so many ideas in this area, it's hard to keep them
all straight. Personally, if I was going to start with something,
it would probably be to better establish what the impact is on
various workloads of *eliminating* hint bits. If the impact is
negative to a significant degree, my next step might be to try
background *freezing* of tuples (in a manner somewhat similar to
what you've done in this test) with the hint bits gone.
I know some people find them useful for forensics to a degree that
they would prefer not to see this, but I think it makes sense to
establish what cost people are paying every day to maintain forensic
information in this format. In previous discussions there has been
some talk about being able to get better forensics from WAL files if
certain barriers could be overcome -- having hard numbers on the
performance benefits which might also accrue might put that work in
a different perspective.
-Kevin
On Fri, Jan 14, 2011 at 1:34 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
Robert Haas <robertmhaas@gmail.com> wrote:
I'm hoping someone will pick it up and play with it some more (hint,
hint).
That was a bit of a pun, eh?
Unintentional...
Anyway, there are so many ideas in this area, it's hard to keep them
all straight. Personally, if I was going to start with something,
it would probably be to better establish what the impact is on
various workloads of *eliminating* hint bits. If the impact is
negative to a significant degree, my next step might be to try
background *freezing* of tuples (in a manner somewhat similar to
what you've done in this test) with the hint bits gone.
Background freezing plays havoc with Hot Standby, and this test is
sufficient to show that eliminating hint bits altogether would be a
significant regression on some workloads. I don't think either of
those ideas can get off the ground.
--
Robert Haas
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
Anyway, there are so many ideas in this area, it's hard to keep them
all straight. Personally, if I was going to start with something,
it would probably be to better establish what the impact is on
various workloads of *eliminating* hint bits.
I know some people find them useful for forensics to a degree that
they would prefer not to see this,
Um, yeah, I think you're having a problem keeping all the ideas straight
;-). The argument about forensics has to do with how soon we're willing
to freeze tuples, ie replace the XID with a constant. Not about hint
bits.
regards, tom lane
On Fri, Jan 14, 2011 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
Anyway, there are so many ideas in this area, it's hard to keep them
all straight. Personally, if I was going to start with something,
it would probably be to better establish what the impact is on
various workloads of *eliminating* hint bits.
I know some people find them useful for forensics to a degree that
they would prefer not to see this,
Um, yeah, I think you're having a problem keeping all the ideas straight
;-). The argument about forensics has to do with how soon we're willing
to freeze tuples, ie replace the XID with a constant. Not about hint
bits.
Those things are related, though. Freezing sooner could be viewed as
an alternative to hint bits. Trouble is, it breaks Hot Standby,
badly.
--
Robert Haas
Robert Haas <robertmhaas@gmail.com> wrote:
Background freezing plays havoc with Hot Standby
I must have missed or forgotten the issue of background vacuums
and hot standby. Can you summarize why that's worse than hitting
thresholds where autovacuum is freezing things?
this test is sufficient to show that eliminating hint bits
altogether would be a significant regression on some workloads.
That wasn't clear to me from what you posted -- I thought that the
reduced performance might be partly (largely? mostly?) due to
competition with the background writer's work pushing the hinted
pages out. Maybe I'm missing something or you didn't post
everything you observed in this regard....
-Kevin
Robert Haas <robertmhaas@gmail.com> wrote:
Freezing sooner could be viewed as an alternative to hint bits.
Exactly. And as your test showed, things run faster frozen than
unfrozen with hint bits set.
Trouble is, it breaks Hot Standby, badly.
You're really starting to worry me here. Both for performance and
to reduce the WAN bandwidth demands of our backup strategy we are
very aggressive with our freezing. Do off-hours VACUUM (FREEZE)
runs break hot standby? Autovacuum freezing? What are the
symptoms?
-Kevin
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, Jan 14, 2011 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Um, yeah, I think you're having a problem keeping all the ideas straight
;-). The argument about forensics has to do with how soon we're willing
to freeze tuples, ie replace the XID with a constant. Not about hint
bits.
Those things are related, though. Freezing sooner could be viewed as
an alternative to hint bits.
Freezing sooner isn't likely to reduce I/O compared to hint bits. What
that does is create I/O that you *have* to execute ... both in the pages
themselves, and in WAL.
regards, tom lane
On Fri, Jan 14, 2011 at 2:01 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
Trouble is, it breaks Hot Standby, badly.
You're really starting to worry me here. Both for performance and
to reduce the WAN bandwidth demands of our backup strategy we are
very aggressive with our freezing. Do off-hours VACUUM (FREEZE)
runs break hot standby? Autovacuum freezing? What are the
symptoms?
Freezing removes XIDs, so latestRemovedXid advances. VACUUM (FREEZE)
is fine if you do it when there are no queries running on your Hot
Standby server, but if there ARE queries running on the Hot Standby
server, they'll be cancelled once max_standby_delay expires.
--
Robert Haas
On Fri, Jan 14, 2011 at 1:52 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
Robert Haas <robertmhaas@gmail.com> wrote:
Background freezing plays havoc with Hot Standby
I must have missed or forgotten the issue of background vacuums
and hot standby. Can you summarize why that's worse than hitting
thresholds where autovacuum is freezing things?
The critical issue is whether the tuples get frozen while they're
still invisible to some transactions on the standby server. That's
when you get query cancellations.
this test is sufficient to show that eliminating hint bits
altogether would be a significant regression on some workloads.
That wasn't clear to me from what you posted -- I thought that the
reduced performance might be partly (largely? mostly?) due to
competition with the background writer's work pushing the hinted
pages out. Maybe I'm missing something or you didn't post
everything you observed in this regard....
Well, let me put together a quick patch that obliterates hint bits
entirely, and we can measure that. The background writer has always
pushed out hint bit pages; I think the reduced performance was
probably due to needing to reset hint bits on pages that we threw away
without pushing them out.
--
Robert Haas
On Fri, Jan 14, 2011 at 2:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, Jan 14, 2011 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Um, yeah, I think you're having a problem keeping all the ideas straight
;-). The argument about forensics has to do with how soon we're willing
to freeze tuples, ie replace the XID with a constant. Not about hint
bits.
Those things are related, though. Freezing sooner could be viewed as
an alternative to hint bits.
Freezing sooner isn't likely to reduce I/O compared to hint bits. What
that does is create I/O that you *have* to execute ... both in the pages
themselves, and in WAL.
It depends on which way you tilt your head - right now, we rewrite
each table 3x - once to populate, once to hint, and once to freeze.
If the table is doomed to survive long enough to go through all three
of those, then freezing is better than hinting. Of course, that's not
always the case, but people keep complaining about the way this shakes
out.
--
Robert Haas