How much do the hint bits help?

Started by Merlin Moncure, over 15 years ago, 44 messages, hackers
#1Merlin Moncure
mmoncure@gmail.com

I've been playing around with postgresql hint bits in order to teach
myself more about the internals of the MVCC system.  I noticed that
the hint bit system has been around forever (Vadim era) and predates
several backend improvements that might affect their usefulness.  So I
started playing around, trying to quantify the benefit they provide,
with an eye toward optimizing clog lookups if that turned out to be
necessary (say, by mmap-ing a big transaction status file, just to see
if that helped).

Attached is an incomplete patch disabling hint bits based on compile
switch.  It's not complete, for example it's not reconciling some
assumptions in heapam.c that hint bits have been set in various
routines.  However, it mostly passes regression and I deemed it good
enough to run some preliminary benchmarks and fool around.  Obviously,
hint bits are an annoying impediment to a couple of other cool pending
features, and it certainly would be nice to operate without them.
Also, for particular workloads, the extra i/o hint bits can cause a
fair amount of pain.

So far, at least doing pgbench runs and another test designed to
exercise clog lookups, the performance loss of always doing full
lookup hasn't materialized.  Note that in these cases the clog lru
cache is pretty effective, and it's pretty likely I may have blown it
in some other way, so take the results with a grain of salt.  But,
here are the following questions/points:

*) relative to when the hint bits were implemented, the amount of
transactions to map has shrunk, while hardware has improved by a
couple of orders of magnitude.  Also the postgres architecture has
changed considerably.  Are they still necessary?

*) what's a good way to stress the clog severely? I'd like to pick a
degenerate case to get a better idea of the way things stand without
them.

*) is there community interest in a full patch that fills in the
missing details not implemented here?

merlin

Attachments:

disble_hints.diff (text/x-patch; charset=US-ASCII), +214 -75
clog_stress.sql (text/x-sql; charset=US-ASCII)
#2Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Merlin Moncure (#1)
Re: How much do the hint bits help?

Merlin Moncure <mmoncure@gmail.com> wrote:

*) what's a good way to stress the clog severely? I'd like to pick
a degenerate case to get a better idea of the way things stand
without them.

The worst I can think of is a large database with a 90/10 mix of
reads to writes -- all short transactions. Maybe someone else can
do better. In particular, I'm not sure how savepoints might play
into a degenerate case.

Since we're always talking about how to do better with hint bits
during an unlogged bulk load, it would be interesting to benchmark
one of those followed by a `select count(*) from newtable;` with and
without the patch, on a data set too big to fit in RAM.

*) is there community interest in a full patch that fills in the
missing details not implemented here?

I'm certainly curious to see real numbers.

-Kevin

#3Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Merlin Moncure (#1)
Re: How much do the hint bits help?

On 22/12/10 11:42, Merlin Moncure wrote:

Attached is an incomplete patch disabling hint bits based on compile
switch. It's not complete, for example it's not reconciling some
assumptions in heapam.c that hint bits have been set in various
routines. However, it mostly passes regression and I deemed it good
enough to run some preliminary benchmarks and fool around. Obviously,
hint bits are an annoying impediment to a couple of other cool pending
features, and it certainly would be nice to operate without them.
Also, for particular workloads, the extra i/o hint bits can cause a
fair amount of pain.

Looks like a great idea to test, however I don't seem to be able to
compile with it applied: (set #define DISABLE_HINT_BITS 1 at the end of
src/include/pg_config_manual.h)

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g -I../../../../src/include -D_GNU_SOURCE -c -o heapam.o heapam.c
heapam.c: In function ‘HeapTupleHeaderAdvanceLatestRemovedXid’:
heapam.c:3867: error: ‘HEAP_XMIN_COMMITTED’ undeclared (first use in
this function)
heapam.c:3867: error: (Each undeclared identifier is reported only once
heapam.c:3867: error: for each function it appears in.)
heapam.c:3869: error: ‘HEAP_XMIN_INVALID’ undeclared (first use in this
function)
make[4]: *** [heapam.o] Error 1

#4Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Mark Kirkwood (#3)
Re: How much do the hint bits help?

On 22/12/10 13:05, Mark Kirkwood wrote:

On 22/12/10 11:42, Merlin Moncure wrote:

Attached is an incomplete patch disabling hint bits based on compile
switch. It's not complete, for example it's not reconciling some
assumptions in heapam.c that hint bits have been set in various
routines. However, it mostly passes regression and I deemed it good
enough to run some preliminary benchmarks and fool around. Obviously,
hint bits are an annoying impediment to a couple of other cool pending
features, and it certainly would be nice to operate without them.
Also, for particular workloads, the extra i/o hint bits can cause a
fair amount of pain.

Looks like a great idea to test, however I don't seem to be able to
compile with it applied: (set #define DISABLE_HINT_BITS 1 at the end of
src/include/pg_config_manual.h)

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g -I../../../../src/include -D_GNU_SOURCE -c -o heapam.o
heapam.c
heapam.c: In function ‘HeapTupleHeaderAdvanceLatestRemovedXid’:
heapam.c:3867: error: ‘HEAP_XMIN_COMMITTED’ undeclared (first use in
this function)
heapam.c:3867: error: (Each undeclared identifier is reported only once
heapam.c:3867: error: for each function it appears in.)
heapam.c:3869: error: ‘HEAP_XMIN_INVALID’ undeclared (first use in
this function)
make[4]: *** [heapam.o] Error 1

Arrg, sorry - against git head on Ubuntu 10.04 (gcc 4.4.3)

#5Merlin Moncure
mmoncure@gmail.com
In reply to: Mark Kirkwood (#4)
Re: How much do the hint bits help?

On Tue, Dec 21, 2010 at 7:06 PM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz> wrote:

On 22/12/10 13:05, Mark Kirkwood wrote:

On 22/12/10 11:42, Merlin Moncure wrote:

Attached is an incomplete patch disabling hint bits based on compile
switch.  It's not complete, for example it's not reconciling some
assumptions in heapam.c that hint bits have been set in various
routines.  However, it mostly passes regression and I deemed it good
enough to run some preliminary benchmarks and fool around.  Obviously,
hint bits are an annoying impediment to a couple of other cool pending
features, and it certainly would be nice to operate without them.
Also, for particular workloads, the extra i/o hint bits can cause a
fair amount of pain.

Looks like a great idea to test, however I don't seem to be able to
compile with it applied: (set #define DISABLE_HINT_BITS 1 at the end of
src/include/pg_config_manual.h)

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g
-I../../../../src/include -D_GNU_SOURCE -c -o heapam.o heapam.c
heapam.c: In function ‘HeapTupleHeaderAdvanceLatestRemovedXid’:
heapam.c:3867: error: ‘HEAP_XMIN_COMMITTED’ undeclared (first use in this
function)
heapam.c:3867: error: (Each undeclared identifier is reported only once
heapam.c:3867: error: for each function it appears in.)
heapam.c:3869: error: ‘HEAP_XMIN_INVALID’ undeclared (first use in this
function)
make[4]: *** [heapam.o] Error 1

Arrg, sorry - against git head on Ubuntu 10.04 (gcc 4.4.3)

did you check to see if the patch applied clean? btw I was working
against postgresql-9.0.1...

it looks like you are missing at least some of the changes to htup.h:

../postgresql-9.0.1_hb2/src/include/access/htup.h

#ifndef DISABLE_HINT_BITS
#define HEAP_XMIN_COMMITTED 0x0100 /* t_xmin committed */
#define HEAP_XMIN_INVALID 0x0200 /* t_xmin invalid/aborted */
#define HEAP_XMAX_COMMITTED 0x0400 /* t_xmax committed */
#define HEAP_XMAX_INVALID 0x0800 /* t_xmax invalid/aborted */
#endif

merlin

#6Merlin Moncure
mmoncure@gmail.com
In reply to: Merlin Moncure (#5)
Re: How much do the hint bits help?

On Tue, Dec 21, 2010 at 7:20 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Tue, Dec 21, 2010 at 7:06 PM, Mark Kirkwood
<mark.kirkwood@catalyst.net.nz> wrote:

On 22/12/10 13:05, Mark Kirkwood wrote:

On 22/12/10 11:42, Merlin Moncure wrote:

Attached is an incomplete patch disabling hint bits based on compile
switch.  It's not complete, for example it's not reconciling some
assumptions in heapam.c that hint bits have been set in various
routines.  However, it mostly passes regression and I deemed it good
enough to run some preliminary benchmarks and fool around.  Obviously,
hint bits are an annoying impediment to a couple of other cool pending
features, and it certainly would be nice to operate without them.
Also, for particular workloads, the extra i/o hint bits can cause a
fair amount of pain.

Looks like a great idea to test, however I don't seem to be able to
compile with it applied: (set #define DISABLE_HINT_BITS 1 at the end of
src/include/pg_config_manual.h)

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g
-I../../../../src/include -D_GNU_SOURCE -c -o heapam.o heapam.c
heapam.c: In function ‘HeapTupleHeaderAdvanceLatestRemovedXid’:
heapam.c:3867: error: ‘HEAP_XMIN_COMMITTED’ undeclared (first use in this
function)
heapam.c:3867: error: (Each undeclared identifier is reported only once
heapam.c:3867: error: for each function it appears in.)
heapam.c:3869: error: ‘HEAP_XMIN_INVALID’ undeclared (first use in this
function)
make[4]: *** [heapam.o] Error 1

Arrg, sorry - against git head on Ubuntu 10.04 (gcc 4.4.3)

did you check to see if the patch applied clean? btw I was working
against postgresql-9.0.1...

ah, this is the problem (9.0.1 vs head). to work vs head it prob
needs a few more tweaks. you can also try removing it yourself --
most of the changes follow a similar pattern.

merlin

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#1)
Re: How much do the hint bits help?

Merlin Moncure <mmoncure@gmail.com> writes:

Attached is an incomplete patch disabling hint bits based on compile
switch. ...
So far, at least doing pgbench runs and another test designed to
exercise clog lookups, the performance loss of always doing full
lookup hasn't materialized.

The standard pgbench test would be just about 100% useless for stressing
this, because its net database activity is only about one row
touched/updated per query. You need a test case that hits lots of rows
per query, else you're just measuring parse+plan+network overhead.

regards, tom lane

#8Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#7)
Re: How much do the hint bits help?

On Tue, Dec 21, 2010 at 7:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

Attached is an incomplete patch disabling hint bits based on compile
switch. ...
So far, at least doing pgbench runs and another test designed to
exercise clog lookups, the performance loss of always doing full
lookup hasn't materialized.

The standard pgbench test would be just about 100% useless for stressing
this, because its net database activity is only about one row
touched/updated per query.  You need a test case that hits lots of rows
per query, else you're just measuring parse+plan+network overhead.

right -- see the attached clog_stress.sql above. It creates a script
that inserts records in blocks of 10000, deletes half of them, and
vacuums. Neither the execution of the script nor a seq scan following
its execution showed an interesting performance difference (which I am
arbitrarily calling 5% in either direction). Like I said though, I
don't trust the patch or the results yet.

@Mark: apparently the cvs server is behind git and there are some
recent changes to heapam.c that need more attention. I need to get
git going on my box, but try changing this:

if ((tuple->t_infomask & HEAP_XMIN_COMMITTED) ||
(!(tuple->t_infomask & HEAP_XMIN_COMMITTED) &&
!(tuple->t_infomask & HEAP_XMIN_INVALID) &&
TransactionIdDidCommit(xmin)))

to this:

if (TransactionIdDidCommit(xmin))

also, isn't the extra check vs HEAP_XMIN_COMMITTED redundant, and if
you do have to look up clog, why not set the hint bit?
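
On the redundancy question: because `||` short-circuits, its right-hand side is only evaluated when HEAP_XMIN_COMMITTED is clear, so the repeated test cannot change the result. A minimal sketch that checks this exhaustively follows. The function names and the `did_commit` parameter (standing in for the result of TransactionIdDidCommit(xmin)) are invented for illustration, and the two bit values are copied from the htup.h excerpt quoted earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values as quoted from htup.h earlier in the thread. */
#define HEAP_XMIN_COMMITTED 0x0100
#define HEAP_XMIN_INVALID   0x0200

/* The condition as written in heapam.c; did_commit is a hypothetical
 * stand-in for the result of TransactionIdDidCommit(xmin). */
bool
xmin_committed_original(unsigned infomask, bool did_commit)
{
    return (infomask & HEAP_XMIN_COMMITTED) ||
           (!(infomask & HEAP_XMIN_COMMITTED) &&
            !(infomask & HEAP_XMIN_INVALID) &&
            did_commit);
}

/* The same condition with the second HEAP_XMIN_COMMITTED test dropped. */
bool
xmin_committed_simplified(unsigned infomask, bool did_commit)
{
    return (infomask & HEAP_XMIN_COMMITTED) ||
           (!(infomask & HEAP_XMIN_INVALID) && did_commit);
}

/* Compare both forms over every combination of the two bits and the
 * clog answer (4 masks x 2 outcomes = 8 cases). */
void
check_forms_agree(void)
{
    const unsigned masks[] = {
        0,
        HEAP_XMIN_COMMITTED,
        HEAP_XMIN_INVALID,
        HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID
    };

    for (int i = 0; i < 4; i++)
        for (int c = 0; c <= 1; c++)
            assert(xmin_committed_original(masks[i], c != 0) ==
                   xmin_committed_simplified(masks[i], c != 0));
}
```

Calling `check_forms_agree()` from a test program passes silently if the two forms are equivalent. This says nothing about whether the hint bit should also be set after the lookup, which is the second half of the question.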

merlin

#9Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Merlin Moncure (#8)
Re: How much do the hint bits help?

On 22/12/10 13:56, Merlin Moncure wrote:

On Tue, Dec 21, 2010 at 7:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

@Mark: apparently the cvs server is behind git and there are some
recent changes to heapam.c that need more attention. I need to get
git going on my box, but try changing this:

if ((tuple->t_infomask & HEAP_XMIN_COMMITTED) ||
(!(tuple->t_infomask & HEAP_XMIN_COMMITTED) &&
!(tuple->t_infomask & HEAP_XMIN_INVALID) &&
TransactionIdDidCommit(xmin)))

to this:

if (TransactionIdDidCommit(xmin))

also, isn't the extra check vs HEAP_XMIN_COMMITTED redundant, and if
you do have to look up clog, why not set the hint bit?

That gets it compiling.

#10Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Merlin Moncure (#8)
Re: How much do the hint bits help?

On 22.12.2010 02:56, Merlin Moncure wrote:

On Tue, Dec 21, 2010 at 7:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

Attached is an incomplete patch disabling hint bits based on compile
switch. ...
So far, at least doing pgbench runs and another test designed to
exercise clog lookups, the performance loss of always doing full
lookup hasn't materialized.

The standard pgbench test would be just about 100% useless for stressing
this, because its net database activity is only about one row
touched/updated per query. You need a test case that hits lots of rows
per query, else you're just measuring parse+plan+network overhead.

right -- see the attached clog_stress.sql above. It creates a script
that inserts records in blocks of 10000, deletes half of them, and
vacuums. Neither the execution of the script nor a seq scan following
its execution showed an interesting performance difference (which I am
arbitrarily calling 5% in either direction). Like I said though, I
don't trust the patch or the results yet.

Make sure you have a good mix of different xids in the table,
TransactionLogFetch has a one-item cache so repeatedly checking the same
xid is much faster than the general case.

Perhaps run pgbench for a while, and then do "SELECT COUNT(*)" on the
resulting tables.
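
To illustrate why the xid mix matters, here is a rough sketch of a one-item cache in front of an expensive status lookup. It is not the TransactionLogFetch code: every name below is hypothetical, and the "slow" lookup is a counter plus an arbitrary rule rather than a real clog read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;            /* hypothetical stand-in for TransactionId */

static ToyXid        cached_xid = 0;        /* 0 means "nothing cached" */
static bool          cached_committed = false;
static unsigned long slow_lookups = 0;      /* how often the slow path ran */

/* Stand-in for the expensive clog read; the even/odd rule is arbitrary. */
static bool
slow_status_lookup(ToyXid xid)
{
    slow_lookups++;
    return (xid % 2) == 0;
}

/* Answer from the single cached entry when possible, else take the slow
 * path and remember the result. */
bool
toy_xid_did_commit(ToyXid xid)
{
    if (xid != 0 && xid == cached_xid)
        return cached_committed;

    cached_committed = slow_status_lookup(xid);
    cached_xid = xid;
    return cached_committed;
}

/* "Scan" n tuples and report how many slow lookups were needed.  With
 * distinct = false every tuple carries the same xid. */
unsigned long
count_slow_lookups(unsigned n, bool distinct)
{
    slow_lookups = 0;
    cached_xid = 0;
    for (unsigned i = 0; i < n; i++)
        (void) toy_xid_did_commit(distinct ? 100 + i : 100);
    return slow_lookups;
}

/* A table written by one transaction hits the cache almost every time;
 * a table with a different xid per row never does. */
void
demo_one_item_cache(void)
{
    assert(count_slow_lookups(1000, false) == 1);
    assert(count_slow_lookups(1000, true) == 1000);
}
```

In this toy model a benchmark table loaded by a handful of large transactions would barely exercise the slow path, which is the concern with a test that inserts in blocks of 10000.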

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#11Simon Riggs
simon@2ndQuadrant.com
In reply to: Merlin Moncure (#1)
Re: How much do the hint bits help?

On Tue, 2010-12-21 at 17:42 -0500, Merlin Moncure wrote:

*) is there community interest in a full patch that fills in the
missing details not implemented here?

Your thinking seems sound to me. We now have all-visible flags, fewer
xids, much better clog concurrency. Avoiding hint bits would also
noticeably reduce the number of dirty writes, especially at checkpoint.

Hot Standby already ignores hint bits and I've not heard a single
complaint, so we are already doing this in the code.

I don't see any reason to believe that there is not an equally effective
optimisation that we can apply to bring performance back up, if it is
shown to drop in particular use cases.

I would vote to put this into 9.1 as a non-default option at restart,
opening the door to other features which hint bits are frustrating.
People can then choose between those features and the "power of hint
bits". I think many people would choose db block checksums.

If you need support, or direct help with the code, just ask. Am happy to
be your committer also.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#12Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#11)
Re: How much do the hint bits help?

On 22.12.2010 15:21, Simon Riggs wrote:

On Tue, 2010-12-21 at 17:42 -0500, Merlin Moncure wrote:

*) is there community interest in a full patch that fills in the
missing details not implemented here?

Your thinking seems sound to me. We now have all-visible flags, fewer
xids, much better clog concurrency. Avoiding hint bits would also
noticeably reduce the number of dirty writes, especially at checkpoint.

Yep.

Hot Standby already ignores hint bits and I've not heard a single
complaint, so we are already doing this in the code.

No, the XMIN/XMAX committed/invalid hint bits on each heap tuple are
used during hot standby just like during normal operation. We ignore the
index tuples marked as dead during hot standby, but that's a different
issue.

I would vote to put this into 9.1 as a non-default option at restart,
opening the door to other features which hint bits are frustrating.
People can then choose between those features and the "power of hint
bits". I think many people would choose db block checksums.

Making it optional would add some ifs in the critical paths, possibly
making it slower.

My gut feeling is that a reasonable compromise is to set hint bits like
we do today, but don't mark the page as dirty when only hint bits are
set. That way you get the benefit of hint bits for tuples that are
frequently accessed and stay in buffer cache. But you don't spend any
extra I/O to set them. I'd really like to see a worst-case scenario
benchmark of a patch that does that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#13Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#12)
Re: How much do the hint bits help?

On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:

I would vote to put this into 9.1 as a non-default option at restart,
opening the door to other features which hint bits are frustrating.
People can then choose between those features and the "power of hint
bits". I think many people would choose db block checksums.

Making it optional would add some ifs in the critical paths, possibly
making it slower.

Hardly. A server-start parameter is going to be constant during
execution and branch prediction will just snuff that away to nothing.

My gut feeling is that a reasonable compromise is to set hint bits like
we do today, but don't mark the page as dirty when only hint bits are
set. That way you get the benefit of hint bits for tuples that are
frequently accessed and stay in buffer cache. But you don't spend any
extra I/O to set them. I'd really like to see a worst-case scenario
benchmark of a patch that does that.

That sounds great, but still prevents block checksums and that is a very
valuable feature for robustness. This isn't a discussion about hint
bits, it's a discussion about opening the way for other features.

ISTM there are other ways of optimising any clog issues that may remain,
so clutching to this ancient optimisation has no further benefit for me.

Merlin's idea seems to me to be original, useful *and* reasonable.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#14Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#13)
Re: How much do the hint bits help?

On 22.12.2010 15:59, Simon Riggs wrote:

On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:

My gut feeling is that a reasonable compromise is to set hint bits like
we do today, but don't mark the page as dirty when only hint bits are
set. That way you get the benefit of hint bits for tuples that are
frequently accessed and stay in buffer cache. But you don't spend any
extra I/O to set them. I'd really like to see a worst-case scenario
benchmark of a patch that does that.

That sounds great, but still prevents block checksums and that is a very
valuable feature for robustness.

It does? The problem with block checksums is that if you modify a page
and don't have a corresponding WAL record for it, like a hint bit
update, you can have a torn page so that the checksum doesn't match.
Refraining from dirtying the page when a hint bit is updated avoids the
problem. With that change, we only ever write pages to disk that have a
WAL record associated with it, with full-page images as necessary to
avoid torn pages.
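
The torn-page failure described above can be modelled in a few lines. This is a toy sketch only: the page layout, the 512-byte sector size, the FNV-1a hash standing in for a page checksum, and all names are assumptions made for illustration, not PostgreSQL's page format or checksum algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE   8192
#define TOY_SECTOR_SIZE 512     /* assumed unit of atomic disk write */

typedef struct
{
    uint32_t checksum;                                   /* lives in sector 0 */
    uint8_t  data[TOY_PAGE_SIZE - sizeof(uint32_t)];
} ToyPage;

/* FNV-1a over the page body, used here only as a stand-in checksum. */
uint32_t
toy_checksum(const ToyPage *page)
{
    uint32_t hash = 2166136261u;

    for (size_t i = 0; i < sizeof(page->data); i++)
    {
        hash ^= page->data[i];
        hash *= 16777619u;
    }
    return hash;
}

bool
toy_page_verify(const ToyPage *page)
{
    return page->checksum == toy_checksum(page);
}

/* Write only the first sectors_written sectors, as if power failed
 * partway through the page write. */
void
toy_write(ToyPage *disk, const ToyPage *mem, size_t sectors_written)
{
    memcpy(disk, mem, sectors_written * TOY_SECTOR_SIZE);
}

/* Set a "hint bit" in the last sector without any WAL record, write the
 * page (possibly torn), and report whether the on-disk copy verifies. */
bool
hint_bit_write_survives(size_t sectors_written)
{
    static ToyPage disk, mem;

    memset(&mem, 0, sizeof(mem));
    mem.checksum = toy_checksum(&mem);
    disk = mem;                                  /* consistent copy on disk */

    mem.data[sizeof(mem.data) - 1] |= 0x01;      /* the hint bit update */
    mem.checksum = toy_checksum(&mem);

    toy_write(&disk, &mem, sectors_written);
    return toy_page_verify(&disk);
}

void
demo_torn_hint_bit_write(void)
{
    assert(hint_bit_write_survives(16));   /* complete write: checksum matches */
    assert(!hint_bit_write_survives(1));   /* torn: new checksum, old body */
}
```

In the torn case there is no WAL record, and so no full-page image, from which the page could be restored. Under the proposal above the page would simply not be written for a hint-bit-only change, so this case would not arise.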

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#15Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#14)
Re: How much do the hint bits help?

On Wed, 2010-12-22 at 16:22 +0200, Heikki Linnakangas wrote:

On 22.12.2010 15:59, Simon Riggs wrote:

On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:

My gut feeling is that a reasonable compromise is to set hint bits like
we do today, but don't mark the page as dirty when only hint bits are
set. That way you get the benefit of hint bits for tuples that are
frequently accessed and stay in buffer cache. But you don't spend any
extra I/O to set them. I'd really like to see a worst-case scenario
benchmark of a patch that does that.

That sounds great, but still prevents block checksums and that is a very
valuable feature for robustness.

It does? The problem with block checksums is that if you modify a page
and don't have a corresponding WAL record for it, like a hint bit
update, you can have a torn page so that the checksum doesn't match.
Refraining from dirtying the page when a hint bit is updated avoids the
problem. With that change, we only ever write pages to disk that have a
WAL record associated with it, with full-page images as necessary to
avoid torn pages.

Which then leads to a block CRC not matching the block in memory. Sure,
we can avoid CRC checking the hint bits, but that requires a much more
expensive and complex CRC check.

So what you suggest works only if we restrict CRC checking to blocks
incoming to the buffer cache, but leaves us unable to do CRC checks on
blocks once in the buffer cache. Since many blocks stay in cache almost
constantly, we're left with the situation that the most heavily used
parts of the database seldom get CRC checked.

Postgres needs CRC checking more than it needs hint bits.

I think we should allow this as an option, and if it proves to be an
issue during beta then we can remove it before we go live, assuming we
cannot get a reasonable alternate optimisation.

I think it's important for Postgres to implement this in the same release
as sync rep. They complement each other: confirmed robustness. Exactly
the features we need to prove to the rest of the world to trust us with
their data.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#16Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#15)
Re: How much do the hint bits help?

On Wed, Dec 22, 2010 at 9:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

I think it's important for Postgres to implement this in the same release
as sync rep.

i.e. never, at the rate sync rep has been progressing for the last few months?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#15)
Re: How much do the hint bits help?

On 22.12.2010 16:52, Simon Riggs wrote:

On Wed, 2010-12-22 at 16:22 +0200, Heikki Linnakangas wrote:

On 22.12.2010 15:59, Simon Riggs wrote:

On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:

My gut feeling is that a reasonable compromise is to set hint bits like
we do today, but don't mark the page as dirty when only hint bits are
set. That way you get the benefit of hint bits for tuples that are
frequently accessed and stay in buffer cache. But you don't spend any
extra I/O to set them. I'd really like to see a worst-case scenario
benchmark of a patch that does that.

That sounds great, but still prevents block checksums and that is a very
valuable feature for robustness.

It does? The problem with block checksums is that if you modify a page
and don't have a corresponding WAL record for it, like a hint bit
update, you can have a torn page so that the checksum doesn't match.
Refraining from dirtying the page when a hint bit is updated avoids the
problem. With that change, we only ever write pages to disk that have a
WAL record associated with it, with full-page images as necessary to
avoid torn pages.

Which then leads to a block CRC not matching the block in memory.

What do you mean?

Do you envision that the CRC is calculated at every update, or only when
a page is written out from the buffer cache? If the former, you could
recalculate the CRC at a hint bit update too. If the latter, the hint
bits are included in the page image that you checksum just like any
other data.

So what you suggest works only if we restrict CRC checking to blocks
incoming to the buffer cache, but leaves us unable to do CRC checks on
blocks once in the buffer cache. Since many blocks stay in cache almost
constantly, we're left with the situation that the most heavily used
parts of the database seldom get CRC checked.

There's plenty of stuff in memory that's not covered by an
application-level CRC. That's what ECC RAM is for. Updating the CRC at
every update to a page seems really expensive, but it's an orthogonal
issue to hint bits.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#18Aidan Van Dyk
aidan@highrise.ca
In reply to: Simon Riggs (#15)
Re: How much do the hint bits help?

On Wed, Dec 22, 2010 at 9:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

So what you suggest works only if we restrict CRC checking to blocks
incoming to the buffer cache, but leaves us unable to do CRC checks on
blocks once in the buffer cache. Since many blocks stay in cache almost
constantly, we're left with the situation that the most heavily used
parts of the database seldom get CRC checked.

With this statement, you just moved the goal posts on the checksumming
ideas. In fact, you didn't just move the goal posts, you picked the
ball up and teleported it to another stadium.

I believe that most of the people talking about and wanting checksums
so far have been wanting them to verify I/O, not to verify that PG has
no bugs, that RAM is staying charged correctly, and that no stray bits
have been flipped, and that nobody else happens to be scribbling over
our shared buffers.

Being able to arbitrarily (i.e. at any point in time) prove that the
shared buffers contents are exactly what they should be may be a
worthy goal, but that's many orders of magnitude more difficult than
verifying that the bytes we read from disk are the ones we wrote to
disk.

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

#19Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#17)
Re: How much do the hint bits help?

On Wed, 2010-12-22 at 17:01 +0200, Heikki Linnakangas wrote:

On 22.12.2010 16:52, Simon Riggs wrote:

On Wed, 2010-12-22 at 16:22 +0200, Heikki Linnakangas wrote:

On 22.12.2010 15:59, Simon Riggs wrote:

On Wed, 2010-12-22 at 15:30 +0200, Heikki Linnakangas wrote:

My gut feeling is that a reasonable compromise is to set hint bits like
we do today, but don't mark the page as dirty when only hint bits are
set. That way you get the benefit of hint bits for tuples that are
frequently accessed and stay in buffer cache. But you don't spend any
extra I/O to set them. I'd really like to see a worst-case scenario
benchmark of a patch that does that.

That sounds great, but still prevents block checksums and that is a very
valuable feature for robustness.

It does? The problem with block checksums is that if you modify a page
and don't have a corresponding WAL record for it, like a hint bit
update, you can have a torn page so that the checksum doesn't match.
Refraining from dirtying the page when a hint bit is updated avoids the
problem. With that change, we only ever write pages to disk that have a
WAL record associated with it, with full-page images as necessary to
avoid torn pages.

Which then leads to a block CRC not matching the block in memory.

Do you envision that the CRC is calculated at every update, or only when
a page is written out from the buffer cache?

At every update, so there is a clear assertion that the CRC matches the
block.

If the former, you could
recalculate the CRC at a hint bit update too. If the latter, the hint
bits are included in the page image that you checksum just like any
other data.

If we didn't have hint bits, we wouldn't need to recalculate the CRC
each time one was updated...

So what you suggest works only if we restrict CRC checking to blocks
incoming to the buffer cache, but leaves us unable to do CRC checks on
blocks once in the buffer cache. Since many blocks stay in cache almost
constantly, we're left with the situation that the most heavily used
parts of the database seldom get CRC checked.

There's plenty of stuff in memory that's not covered by an
application-level CRC. That's what ECC RAM is for.

http://www.google.com/research/pubs/archive/35162.pdf

Google research shows that each DIMM has an 8% chance per annum of
uncorrectable memory errors, even on ECC.

If you have large RAM, like everybody now does, your incidence of this
type of error will be much higher than it was in previous years, so our
perception of what is necessary now to protect databases is out of date.

We have data under our care, and will be much more likely to receive
this kind of error because of the amount of RAM we use.

Updating the CRC at
every update to a page seems really expensive, but it's an orthogonal
issue to hint bits.

Clearly, the frequency with which we set hint bits affects the frequency
we can sensibly update CRCs. It shouldn't be up to us to decide how much
protection a user wants to give their data.

There might be two or three settings that make sense, but clearly we
need to be able to limit hint-bit setting to allow us to have a usable
CRC check. So there is a very strong connection between turning this
optimisation off and gaining CRC checking as a feature.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

#20Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#19)
Re: How much do the hint bits help?

On 22.12.2010 17:31, Simon Riggs wrote:

On Wed, 2010-12-22 at 17:01 +0200, Heikki Linnakangas wrote:

Do you envision that the CRC is calculated at every update, or only when
a page is written out from the buffer cache?

At every update, so there is a clear assertion that the CRC matches the
block.

Umm, when do you check the CRC? Every time the page is locked? Every
time it's updated? If you don't verify the CRC, what is it good for?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#21Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#19)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Aidan Van Dyk (#18)
#23Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#22)
#24Aidan Van Dyk
aidan@highrise.ca
In reply to: Simon Riggs (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#12)
#26Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#21)
#27Merlin Moncure
mmoncure@gmail.com
In reply to: Aidan Van Dyk (#24)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#27)
#29Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#25)
#30Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#28)
#31Merlin Moncure
mmoncure@gmail.com
In reply to: Merlin Moncure (#30)
#32Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#25)
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#29)
#34David Fetter
david@fetter.org
In reply to: Simon Riggs (#26)
#35Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Merlin Moncure (#30)
#36Josh Berkus
josh@agliodbs.com
In reply to: Merlin Moncure (#8)
#37Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Merlin Moncure (#29)
#38Josh Berkus
josh@agliodbs.com
In reply to: Aidan Van Dyk (#18)
#39Josh Berkus
josh@agliodbs.com
In reply to: Mark Kirkwood (#37)
#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#39)
#41Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Josh Berkus (#36)
#42Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Tom Lane (#40)
#43Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#35)
#44Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#40)