Substituting Checksum Algorithm (was: Enabling Checksums)

Started by Jeff Davis over 12 years ago, 29 messages
#1 Jeff Davis
pgsql@j-davis.com
1 attachment(s)

Starting a new thread to more narrowly address the topic.

Attached is my reorganization of Ants's patch here:

/messages/by-id/CA+CSw_vinyf-w45i=M1m__MpJZY=e8S4Nt_KNnpEbtWjTOaSUA@mail.gmail.com

My changes:

* wrest the core FNV algorithm from the specific concerns of a data page
- PageCalcChecksum16 mixes the block number, reduces to 16 bits,
and avoids the pd_checksum field
- the checksum algorithm is just a pure block checksum with a 32-bit
result
* moved the FNV checksum into a separate file, checksum.c
* added Ants's suggested compilation flags for better optimization
* slight update to the paragraph in the README that discusses concerns
specific to a data page

I do have a couple questions/concerns about Ants's patch though:

* The README mentions a slight bias; does that come from the mod
(2^16-1)? That's how I updated the README, so I wanted to make sure.
* How was the FNV_PRIME chosen?
* I can't match the individual algorithm step as described in the README
to the actual code. And the comments in the README don't make it clear
enough which one is right (or maybe they both are, and I'm just tired).

The README says:

hash = (hash ^ value) * ((hash ^ value) >> 3)

(which is obviously missing the FNV_PRIME factor) and the code says:

   +#define CHECKSUM_COMP(checksum, value) do {\
   +       uint32 __tmp = (checksum) ^ (value);\
   +       (checksum) = __tmp * FNV_PRIME ^ (__tmp >> 3);\
   +} while (0)

I'm somewhat on the fence about the "shift right". It was discussed in
this part of the thread:

/messages/by-id/99343716-5F5A-45C8-B2F6-74B9BA357396@phlo.org

I think we should be able to state with a little more clarity in the
README why there is a problem with plain FNV-1a, and why this
modification is both effective and safe.

I'd lean toward simplicity and closer adherence to the published version
of the algorithm rather than detecting a few more obscure error
patterns. It looks like the modification slows down the algorithm, too.
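
To make the comparison concrete, the two round functions can be written side by side; a minimal C sketch (helper names are mine, not from the patch):

```c
#include <stdint.h>

#define FNV_PRIME 16777619u

/* one round of textbook FNV-1a: fold in one 32-bit value */
static uint32_t fnv1a_round(uint32_t hash, uint32_t value)
{
	return (hash ^ value) * FNV_PRIME;
}

/* one round of the patch's modified form (CHECKSUM_COMP):
 * additionally xor in the pre-multiplication value shifted right 3 bits */
static uint32_t fnv1a_shift_round(uint32_t hash, uint32_t value)
{
	uint32_t tmp = hash ^ value;
	return tmp * FNV_PRIME ^ (tmp >> 3);
}
```

Given the code, the README line is presumably just missing the `* FNV_PRIME` factor: the two rounds differ exactly by the `(tmp >> 3)` term.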

Regards,
Jeff Davis

Attachments:

fnv-jeff-20130422.patch (text/x-patch; charset=UTF-8)
*** a/src/backend/storage/page/Makefile
--- b/src/backend/storage/page/Makefile
***************
*** 12,17 **** subdir = src/backend/storage/page
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS =  bufpage.o itemptr.o
  
  include $(top_srcdir)/src/backend/common.mk
--- 12,22 ----
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS =  bufpage.o checksum.o itemptr.o
  
  include $(top_srcdir)/src/backend/common.mk
+ 
+ # important optimization flags for checksum.c
+ ifeq ($(GCC),yes)
+ checksum.o: CFLAGS += -msse4.1 -funroll-loops -ftree-vectorize
+ endif
*** a/src/backend/storage/page/README
--- b/src/backend/storage/page/README
***************
*** 61,63 **** checksums are enabled.  Systems in Hot-Standby mode may benefit from hint bits
--- 61,109 ----
  being set, but with checksums enabled, a page cannot be dirtied after setting a
  hint bit (due to the torn page risk). So, it must wait for full-page images
  containing the hint bit updates to arrive from the master.
+ 
+ Checksum algorithm
+ ------------------
+ 
+ The algorithm used to checksum pages is chosen for very fast calculation.
+ Workloads where the database working set fits into OS file cache but not into
+ shared buffers can read in pages at a very fast pace and the checksum
+ algorithm itself can become the largest bottleneck.
+ 
+ The checksum algorithm itself is based on the FNV-1a hash (FNV is shorthand for
+ Fowler/Noll/Vo) The primitive of a plain FNV-1a hash folds in data 4 bytes at
+ a time according to the formula:
+ 
+     hash = (hash ^ value) * FNV_PRIME(16777619)
+ 
+ PostgreSQL doesn't use FNV-1a hash directly because it has bad mixing of high
+ bits - high order bits in input data only affect high order bits in output
+ data. To resolve this we xor in the value prior to multiplication shifted
+ right by 3 bits. The number 3 was chosen as it is a small odd, prime, and
+ experimentally provides enough mixing for the high order bits to avalanche
+ into lower positions. The actual hash formula used as the basis is:
+ 
+     hash = (hash ^ value) * ((hash ^ value) >> 3)
+ 
+ The main bottleneck in this calculation is the multiplication latency. To hide
+ the latency and to make use of SIMD parallelism multiple hash values are
+ calculated in parallel. Each hash function uses a different initial value
+ (offset basis in FNV terminology). The initial values actually used were
+ chosen randomly, as the values themselves don't matter as much as that they
+ are different and don't match anything in real data. The page is then treated
+ as 32 wide array of 32bit values and each column is aggregated according to
+ the above formula. Finally one more iteration of the formula is performed with
+ value 0 to mix the bits of the last value added.
+ 
+ The partial checksums are then aggregated together using xor to form a
+ 32-bit checksum. The caller can safely reduce the value to 16 bits
+ using modulo 2^16-1. That will cause a very slight bias towards lower
+ values but this is not significant for the performance of the
+ checksum.
+ 
+ Vectorization of the algorithm requires 32bit x 32bit -> 32bit integer
+ multiplication instruction. As of 2013 the corresponding instruction is
+ available on x86 SSE4.1 extensions (pmulld) and ARM NEON (vmul.i32).
+ Vectorization requires a compiler to do the vectorization for us. For recent
+ GCC versions the flags -msse4.1 -funroll-loops -ftree-vectorize are enough
+ to achieve vectorization.
*** a/src/backend/storage/page/bufpage.c
--- b/src/backend/storage/page/bufpage.c
***************
*** 16,21 ****
--- 16,22 ----
  
  #include "access/htup_details.h"
  #include "access/xlog.h"
+ #include "storage/checksum.h"
  
  bool ignore_checksum_failure = false;
  
***************
*** 948,980 **** PageSetChecksumInplace(Page page, BlockNumber blkno)
  static uint16
  PageCalcChecksum16(Page page, BlockNumber blkno)
  {
! 	pg_crc32    		crc;
! 	PageHeader	p = (PageHeader) page;
  
  	/* only calculate the checksum for properly-initialized pages */
  	Assert(!PageIsNew(page));
  
- 	INIT_CRC32(crc);
- 
  	/*
! 	 * Initialize the checksum calculation with the block number. This helps
! 	 * catch corruption from whole blocks being transposed with other whole
! 	 * blocks.
  	 */
! 	COMP_CRC32(crc, &blkno, sizeof(blkno));
  
! 	/*
! 	 * Now add in the LSN, which is always the first field on the page.
! 	 */
! 	COMP_CRC32(crc, page, sizeof(p->pd_lsn));
  
  	/*
! 	 * Now add the rest of the page, skipping the pd_checksum field.
  	 */
! 	COMP_CRC32(crc, page + sizeof(p->pd_lsn) + sizeof(p->pd_checksum),
! 				  BLCKSZ - sizeof(p->pd_lsn) - sizeof(p->pd_checksum));
! 
! 	FIN_CRC32(crc);
! 
! 	return (uint16) crc;
  }
--- 949,976 ----
  static uint16
  PageCalcChecksum16(Page page, BlockNumber blkno)
  {
! 	PageHeader	phdr   = (PageHeader) page;
! 	uint16		save_checksum;
! 	uint32		fnv_checksum;
  
  	/* only calculate the checksum for properly-initialized pages */
  	Assert(!PageIsNew(page));
  
  	/*
! 	 * Save pd_checksum and set it to zero, so that the checksum calculation
! 	 * isn't affected by the checksum stored on the page.
  	 */
! 	save_checksum = phdr->pd_checksum;
! 	phdr->pd_checksum = 0;
! 	fnv_checksum = checksum_fnv(page, BLCKSZ);
! 	phdr->pd_checksum = save_checksum;
  
! 	/* mix in the block number to detect transposed pages */
! 	fnv_checksum ^= blkno;
  
  	/*
! 	 * Reduce to a uint16 (to fit in the pd_checksum field) with an offset of
! 	 * one. That avoids checksums of zero, which seems like a good idea.
  	 */
! 	return (fnv_checksum % 65535) + 1;
  }
*** /dev/null
--- b/src/backend/storage/page/checksum.c
***************
*** 0 ****
--- 1,80 ----
+ /*-------------------------------------------------------------------------
+  *
+  * checksum.c
+  *	  Checksum implementation for data pages.
+  *
+  * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  *
+  * IDENTIFICATION
+  *	  src/backend/storage/page/checksum.c
+  *
+  *-------------------------------------------------------------------------
+  */
+ #include "postgres.h"
+ 
+ #include "storage/checksum.h"
+ 
+ /*
+  * See src/backend/storage/page/README for specification of the
+  * checksum algorithm used here.
+  */
+ 
+ /* number of checksums to calculate in parallel */
+ #define N_SUMS 32
+ /* prime multiplier of FNV-1a hash */
+ #define FNV_PRIME 16777619
+ 
+ /*
+  * Base offsets to initialize each of the parallel FNV hashes into a
+  * different initial state.
+  */
+ static const uint32 checksumBaseOffsets[N_SUMS] = {
+ 	0x5B1F36E9, 0xB8525960, 0x02AB50AA, 0x1DE66D2A,
+ 	0x79FF467A, 0x9BB9F8A3, 0x217E7CD2, 0x83E13D2C,
+ 	0xF8D4474F, 0xE39EB970, 0x42C6AE16, 0x993216FA,
+ 	0x7B093B5D, 0x98DAFF3C, 0xF718902A, 0x0B1C9CDB,
+ 	0xE58F764B, 0x187636BC, 0x5D7B3BB1, 0xE73DE7DE,
+ 	0x92BEC979, 0xCCA6C0B2, 0x304A0979, 0x85AA43D4,
+ 	0x783125BB, 0x6CA8EAA2, 0xE407EAC6, 0x4B5CFC3E,
+ 	0x9FBF8C76, 0x15CA20BE, 0xF2CA9FD3, 0x959BD756
+ };
+ 
+ /*
+  * Calculate one round of the checksum.
+  */
+ #define CHECKSUM_COMP(checksum, value) do {\
+ 	uint32 __tmp = (checksum) ^ (value);\
+ 	(checksum) = __tmp * FNV_PRIME ^ (__tmp >> 3);\
+ } while (0)
+ 
+ uint32
+ checksum_fnv(char *data, uint32 size)
+ {
+ 	uint32 sums[N_SUMS];
+ 	uint32 (*dataArr)[N_SUMS] = (uint32 (*)[N_SUMS]) data;
+ 	uint32 result = 0;
+ 	int i, j;
+ 
+ 	/* ensure that the size is compatible with the algorithm */
+ 	Assert((size % (sizeof(uint32)*N_SUMS)) == 0);
+ 
+ 	/* initialize partial checksums to their corresponding offsets */
+ 	memcpy(sums, checksumBaseOffsets, sizeof(checksumBaseOffsets));
+ 
+ 	/* main checksum calculation */
+ 	for (i = 0; i < size/sizeof(uint32)/N_SUMS; i++)
+ 		for (j = 0; j < N_SUMS; j++)
+ 			CHECKSUM_COMP(sums[j], dataArr[i][j]);
+ 
+ 	/* finally add in one round of zeroes for one more layer of mixing */
+ 	for (j = 0; j < N_SUMS; j++)
+ 		CHECKSUM_COMP(sums[j], 0);
+ 
+ 	/* xor fold partial checksums together */
+ 	for (i = 0; i < N_SUMS; i++)
+ 		result ^= sums[i];
+ 
+ 	return result;
+ }
*** /dev/null
--- b/src/include/storage/checksum.h
***************
*** 0 ****
--- 1,20 ----
+ /*-------------------------------------------------------------------------
+  *
+  * checksum.h
+  *	  Checksum implementation for data pages.
+  *
+  *
+  * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  * src/include/storage/checksum.h
+  *
+  *-------------------------------------------------------------------------
+  */
+ #ifndef CHECKSUM_H
+ #define CHECKSUM_H
+ 
+ /* Fowler-Noll-Vo 1a block checksum algorithm */
+ extern uint32 checksum_fnv(char *data, uint32 size);
+ 
+ #endif   /* CHECKSUM_H */
#2 Florian Pflug
fgp@phlo.org
In reply to: Jeff Davis (#1)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On Apr 23, 2013, at 09:17, Jeff Davis <pgsql@j-davis.com> wrote:

I'd lean toward simplicity and closer adherence to the published version
of the algorithm rather than detecting a few more obscure error
patterns. It looks like the modification slows down the algorithm, too.

The patterns that plain FNV-1a misses are not that obscure, unfortunately.
With plain FNV-1a, the n-th bit of an input word (i.e. 32-bit block) only
affects bits n through 31 of the checksum. In particular, the most
significant bit of every 32-bit block only affects the MSB of the checksum,
making the algorithm miss any even number of flipped MSBs. More generally,
any form of data corruption that affects only the top N bits is missed
at least once out of 2^N times, since changing only those bits cannot
yield more than 2^N different checksum values.

Such corruption patterns may not be especially likely, given that we're
mainly protecting against disk corruption, not memory corruption. But
quantifying exactly how unlikely seems hard, so providing at least some
form of protection against such errors seems prudent.
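
That failure mode is easy to demonstrate directly; a minimal C sketch (sequential single-stream hashes for clarity - the patch actually runs 32 sums in parallel, and the helper names are mine):

```c
#include <stdint.h>
#include <stddef.h>

#define FNV_PRIME 16777619u
#define FNV_BASIS 2166136261u	/* standard 32-bit FNV offset basis */

/* textbook FNV-1a over an array of 32-bit words */
static uint32_t fnv1a(const uint32_t *words, size_t n)
{
	uint32_t hash = FNV_BASIS;
	for (size_t i = 0; i < n; i++)
		hash = (hash ^ words[i]) * FNV_PRIME;
	return hash;
}

/* the patch's shifted variant of the same loop */
static uint32_t fnv1a_shift(const uint32_t *words, size_t n)
{
	uint32_t hash = FNV_BASIS;
	for (size_t i = 0; i < n; i++)
	{
		uint32_t tmp = hash ^ words[i];
		hash = tmp * FNV_PRIME ^ (tmp >> 3);
	}
	return hash;
}
```

Flipping the MSB of any two input words leaves the plain FNV-1a result unchanged (the bit-31 difference only ever propagates through bit 31, so the second flip cancels the first), while the shifted variant mixes bit 31 down into bit 28 and detects it.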

In addition, even with the shift-induced slowdown, FNV1+SHIFT still
performs similarly to hardware-accelerated CRC, reaching about 6 bytes/cycle
on modern Intel CPUs. This really is plenty fast - if I'm not mistaken, it
translates to well over 10 GB/s.

So overall -1 for removing the shift.

best regards,
Florian Pflug

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Ants Aasma
ants.aasma@eesti.ee
In reply to: Jeff Davis (#1)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On Apr 23, 2013 10:17 AM, "Jeff Davis" <pgsql@j-davis.com> wrote:

Attached is my reorganization of Ants's patch here:

/messages/by-id/CA+CSw_vinyf-w45i=M1m__MpJZY=e8S4Nt_KNnpEbtWjTOaSUA@mail.gmail.com

Thanks for your help. Some notes below.

My changes:

* wrest the core FNV algorithm from the specific concerns of a data page
- PageCalcChecksum16 mixes the block number, reduces to 16 bits,
and avoids the pd_checksum field
- the checksum algorithm is just a pure block checksum with a 32-bit
result
* moved the FNV checksum into a separate file, checksum.c

I think the function should not be called checksum_fnv as it implies that
we use the well known straightforward implementation. Maybe checksum_block
or some other generic name.

* added Ants's suggested compilation flags for better optimization

-msse4.1 is not safe to use in default builds. On the other hand it doesn't
hurt to just specify it in CFLAGS for the whole compile (possibly as
-march=native). We should just omit it and mention somewhere that SSE4.1
enabled builds will have better checksum performance.

* slight update to the paragraph in the README that discusses concerns
specific to a data page

I do have a couple questions/concerns about Ants's patch though:

* The README mentions a slight bias; does that come from the mod
(2^16-1)? That's how I updated the README, so I wanted to make sure.

Yes.
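
For the record, the bias can be quantified exactly: 2^32 - 1 = 65535 * 65537, so over all 2^32 possible 32-bit checksums, residue 0 mod 65535 occurs 65538 times while every other residue occurs 65537 times. After the patch's `(checksum % 65535) + 1` reduction, the checksum value 1 is therefore very slightly favored. A small C sketch of that reduction (helper name mine, and the block-number mixing is omitted):

```c
#include <stdint.h>

/* reduce a 32-bit checksum to the 16-bit pd_checksum range 1..65535,
 * as PageCalcChecksum16 does in the patch; zero never occurs */
static uint16_t reduce_checksum(uint32_t fnv_checksum)
{
	return (uint16_t) ((fnv_checksum % 65535) + 1);
}
```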

* How was the FNV_PRIME chosen?

I still haven't found the actual source for this value. It's specified as
the value to use for 32-bit FNV-1a.

* I can't match the individual algorithm step as described in the README
to the actual code. And the comments in the README don't make it clear
enough which one is right (or maybe they both are, and I'm just tired).

I will try to reword.

The README says:

hash = (hash ^ value) * ((hash ^ value) >> 3)

(which is obviously missing the FNV_PRIME factor) and the code says:

+#define CHECKSUM_COMP(checksum, value) do {\
+       uint32 __tmp = (checksum) ^ (value);\
+       (checksum) = __tmp * FNV_PRIME ^ (__tmp >> 3);\
+} while (0)

I'm somewhat on the fence about the "shift right". It was discussed in
this part of the thread:

/messages/by-id/99343716-5F5A-45C8-B2F6-74B9BA357396@phlo.org

I think we should be able to state with a little more clarity in the
README why there is a problem with plain FNV-1a, and why this
modification is both effective and safe.

Florian already mentioned why it's effective. I have an intuition why it's
safe, will try to come up with a well reasoned argument.

Regards,
Ants Aasma

#4 Andres Freund
andres@2ndquadrant.com
In reply to: Jeff Davis (#1)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On 2013-04-23 00:17:28 -0700, Jeff Davis wrote:

+ # important optimization flags for checksum.c
+ ifeq ($(GCC),yes)
+ checksum.o: CFLAGS += -msse4.1 -funroll-loops -ftree-vectorize
+ endif

I am pretty sure we can't do those unconditionally:
- -funroll-loops and -ftree-vectorize weren't always part of gcc AFAIR,
so we would need a configure check for those
- SSE4.1 looks like a total no-go, it's not available everywhere. We
*can* add runtime detection of that with gcc fairly easily and
one-time if we want to go there (later?) using 'ifunc's, but that
needs a fair amount of infrastructure work.
- We can rely on SSE1/2 on amd64, but I think that's automatically
enabled there.
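
For what it's worth, short of ifuncs, the simplest shape of runtime detection is a one-time check feeding a function pointer. A hedged C sketch (in a real build the SSE4.1 variant would be a separately compiled object; here it simply aliases the scalar code, so this only shows the dispatch shape, not actual vectorization):

```c
#include <stdint.h>
#include <stddef.h>

/* scalar fallback, always available (trivial stand-in body below) */
static uint32_t checksum_scalar(const char *data, uint32_t size);

/* in a real build this symbol would live in an object compiled with
 * -msse4.1; here it is just an alias so the sketch is self-contained */
#define checksum_sse41 checksum_scalar

static int cpu_has_sse41(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	return __builtin_cpu_supports("sse4.1");
#else
	return 0;
#endif
}

static uint32_t (*checksum_impl) (const char *, uint32_t);

/* pick an implementation on first use, then go through the pointer */
static uint32_t checksum_block(const char *data, uint32_t size)
{
	if (checksum_impl == NULL)
		checksum_impl = cpu_has_sse41() ? checksum_sse41 : checksum_scalar;
	return checksum_impl(data, size);
}

static uint32_t checksum_scalar(const char *data, uint32_t size)
{
	uint32_t hash = 2166136261u;
	for (uint32_t i = 0; i < size; i++)
		hash = (hash ^ (unsigned char) data[i]) * 16777619u;
	return hash;
}
```

ifuncs move the same selection into the dynamic linker so there is no per-call indirection check, at the cost of the infrastructure work mentioned above.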

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#5 Ants Aasma
ants@cybertec.at
In reply to: Andres Freund (#4)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On Tue, Apr 23, 2013 at 11:47 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-04-23 00:17:28 -0700, Jeff Davis wrote:

+ # important optimization flags for checksum.c
+ ifeq ($(GCC),yes)
+ checksum.o: CFLAGS += -msse4.1 -funroll-loops -ftree-vectorize
+ endif

I am pretty sure we can't do those unconditionally:
- -funroll-loops and -ftree-vectorize weren't always part of gcc AFAIR,
so we would need a configure check for those

-funroll-loops is available from at least GCC 2.95. -ftree-vectorize
is GCC 4.0+. From what I read of the ICC documentation, -axSSE4.1
should generate both a plain and an accelerated version and do a runtime
check. I don't know if ICC vectorizes the specific loop in the patch,
but I would expect it to, given that Intel's vectorization has generally
been better than GCC's and the loop is about as simple as it gets. I

- SSE4.1 looks like a total no-go, it's not available everywhere. We
*can* add runtime detection of that with gcc fairly easily and
one-time if we want to go there (later?) using 'ifunc's, but that
needs a fair amount of infrastructure work.
- We can rely on SSE1/2 on amd64, but I think that's automatically
enabled there.

This is why I initially went for the lower-strength 16-bit checksum
calculation - requiring only SSE2 would have made supporting the
vectorized version on amd64 trivial. By now my feeling is that it's
not prudent to compromise on quality to save some infrastructure
complexity. If we set a hypothetical VECTORIZATION_FLAGS variable at
configure time, the performance is still there for those who need it
and can afford CPU-specific builds.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


#6 Jeff Davis
pgsql@j-davis.com
In reply to: Ants Aasma (#3)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On Tue, 2013-04-23 at 11:44 +0300, Ants Aasma wrote:

I will try to reword.

Did you have a chance to clarify this, as well as some of the other
documentation issues Simon mentioned here?

/messages/by-id/CA+U5nMKVEu8UDXQe+Nk=d7Nqm4ypFSzaEf0esaK4j31LyQCOBQ@mail.gmail.com

I'm not sure if others are waiting on me for a new patch or not. I can
give the documentation issues a try, but I was hesitant to do so because
you've done the research.

The problems that I can correct are fairly trivial.

Regards,
Jeff Davis


#7 Ants Aasma
ants.aasma@eesti.ee
In reply to: Jeff Davis (#6)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On Apr 25, 2013 10:38 PM, "Jeff Davis" <pgsql@j-davis.com> wrote:

On Tue, 2013-04-23 at 11:44 +0300, Ants Aasma wrote:

I will try to reword.

Did you have a chance to clarify this, as well as some of the other
documentation issues Simon mentioned here?

/messages/by-id/CA+U5nMKVEu8UDXQe+Nk=d7Nqm4ypFSzaEf0esaK4j31LyQCOBQ@mail.gmail.com

Not yet. I am busy at the moment due to events in my private life. I might be
able to find some time over the weekend to go over the documentation, but no
guarantees.

I'm not sure if others are waiting on me for a new patch or not. I can
give the documentation issues a try, but I was hesitant to do so because
you've done the research.

The problems that I can correct are fairly trivial.

The unresolved code issue that I know of is moving the compiler flags
behind a configure check. I would greatly appreciate it if you could take a
look at that. My config-fu is weak and it would take me some time to figure
out how to do that.

Regards,
Ants Aasma

#8 Florian Pflug
fgp@phlo.org
In reply to: Ants Aasma (#7)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On Apr 26, 2013, at 10:28, Ants Aasma <ants.aasma@eesti.ee> wrote:

On Apr 25, 2013 10:38 PM, "Jeff Davis" <pgsql@j-davis.com> wrote:

On Tue, 2013-04-23 at 11:44 +0300, Ants Aasma wrote:

I will try to reword.

Did you have a chance to clarify this, as well as some of the other
documentation issues Simon mentioned here?

/messages/by-id/CA+U5nMKVEu8UDXQe+Nk=d7Nqm4ypFSzaEf0esaK4j31LyQCOBQ@mail.gmail.com

Not yet. I am busy at the moment due to events in my private life. I might be able to find some time over the weekend to go over the documentation, but no guarantees.

I can try to write up the reasoning behind the choice of FNV1+SHIFT3 as a checksum function, but I'm quite busy too so I'm not 100% certain I'll get to it. If that's OK with you Ants, that is.

I'm not sure if others are waiting on me for a new patch or not. I can
give the documentation issues a try, but I was hesitant to do so because
you've done the research.

The problems that I can correct are fairly trivial.

The unresolved code issue that I know of is moving the compiler flags behind a configure check. I would greatly appreciate it if you could take a look at that. My config-fu is weak and it would take me some time to figure out how to do that.

Do we necessarily have to do that before beta? If not, let's concentrate on getting the basic path in, and let's add the gcc-specific compiler options later.

best regards,
Florian Pflug


#9 Andres Freund
andres@2ndquadrant.com
In reply to: Florian Pflug (#8)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On 2013-04-26 13:11:00 +0200, Florian Pflug wrote:

The unresolved code issue that I know of is moving the compiler flags behind a configure check. I would greatly appreciate it if you could take a look at that. My config-fu is weak and it would take me some time to figure out how to do that.

Do we necessarily have to do that before beta? If not, let's concentrate on getting the basic path in, and let's add the gcc-specific compiler options later.

If we want them we should do it before beta, otherwise we won't notice
problems that the flags cause (misoptimizations, problems on compiler
versions, ...). So either now or in 9.4.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#9)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-04-26 13:11:00 +0200, Florian Pflug wrote:

The unresolved code issue that I know of is moving the compiler flags behind a configure check. I would greatly appreciate it if you could take a look at that. My config-fu is weak and it would take me some time to figure out how to do that.

Do we necessarily have to do that before beta? If not, let's concentrate on getting the basic path in, and let's add the gcc-specific compiler options later.

If we want them we should do it before beta, otherwise we won't notice
problems that the flags cause (misoptimizations, problems on compiler
versions, ...). So either now or in 9.4.

Yeah, beta1 is the point in the cycle where we have the most leverage
for discovering portability problems. We should not be leaving anything
involving configure tests as "to fix later".

regards, tom lane


#11 Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#10)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On 26 April 2013 14:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-04-26 13:11:00 +0200, Florian Pflug wrote:

The unresolved code issue that I know of is moving the compiler flags behind a configure check. I would greatly appreciate it if you could take a look at that. My config-fu is weak and it would take me some time to figure out how to do that.

Do we necessarily have to do that before beta? If not, let's concentrate on getting the basic path in, and let's add the gcc-specific compiler options later.

If we want them we should do it before beta, otherwise we won't notice
problems that the flags cause (misoptimizations, problems on compiler
versions, ...). So either now or in 9.4.

Yeah, beta1 is the point in the cycle where we have the most leverage
for discovering portability problems. We should not be leaving anything
involving configure tests as "to fix later".

I'm expecting to spend some time on this over the weekend, once I've
re-read the thread and patches to see if there is something to commit.

That's my last time window, so this looks like the last chance to make
changes before beta.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#12 Jeff Davis
pgsql@j-davis.com
In reply to: Simon Riggs (#11)
2 attachment(s)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On Fri, Apr 26, 2013 at 7:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

I'm expecting to spend some time on this over the weekend, once I've
re-read the thread and patches to see if there is something to commit.

That's my last time window, so this looks like the last chance to make
changes before beta.

I updated the patch and split it into two parts (attached).

The first patch is the checksum algorithm itself. I have done
some documentation updates and moved the documentation into the C
file (rather than the README), but further explanation of the "shift
right 3" modification will need to come from Ants or Florian.

The second patch adds the configure-time check for the right
compilation flags, and uses them when compiling checksum.c. I
called the new variable CFLAGS_EXTRA, for lack of a better idea,
so feel free to come up with a new name. It doesn't check for, or
use, -msse4.1, but that can be specified by the user by
configuring with CFLAGS_EXTRA="-msse4.1".
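
Under that scheme, an opt-in vectorized build would look something like this (usage sketch only; CFLAGS_EXTRA is the variable proposed in the attached patch, and whether configure accepts it this way depends on how it is declared there):

```shell
# opt in to SSE4.1 vectorization at configure time (CPU-specific build)
./configure CFLAGS_EXTRA="-msse4.1 -funroll-loops -ftree-vectorize"

# default builds simply omit it and get the portable scalar code
./configure
```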

I don't know of any more required changes, aside from
documentation improvements.

Regards,
Jeff Davis

Attachments:

fnv-jeff-20130426.patch (application/octet-stream)
*** a/src/backend/storage/page/Makefile
--- b/src/backend/storage/page/Makefile
***************
*** 12,17 **** subdir = src/backend/storage/page
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS =  bufpage.o itemptr.o
  
  include $(top_srcdir)/src/backend/common.mk
--- 12,17 ----
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS =  bufpage.o checksum.o itemptr.o
  
  include $(top_srcdir)/src/backend/common.mk
*** a/src/backend/storage/page/bufpage.c
--- b/src/backend/storage/page/bufpage.c
***************
*** 16,21 ****
--- 16,22 ----
  
  #include "access/htup_details.h"
  #include "access/xlog.h"
+ #include "storage/checksum.h"
  
  bool ignore_checksum_failure = false;
  
***************
*** 948,980 **** PageSetChecksumInplace(Page page, BlockNumber blkno)
  static uint16
  PageCalcChecksum16(Page page, BlockNumber blkno)
  {
! 	pg_crc32    		crc;
! 	PageHeader	p = (PageHeader) page;
  
  	/* only calculate the checksum for properly-initialized pages */
  	Assert(!PageIsNew(page));
  
- 	INIT_CRC32(crc);
- 
  	/*
! 	 * Initialize the checksum calculation with the block number. This helps
! 	 * catch corruption from whole blocks being transposed with other whole
! 	 * blocks.
  	 */
! 	COMP_CRC32(crc, &blkno, sizeof(blkno));
  
! 	/*
! 	 * Now add in the LSN, which is always the first field on the page.
! 	 */
! 	COMP_CRC32(crc, page, sizeof(p->pd_lsn));
  
  	/*
! 	 * Now add the rest of the page, skipping the pd_checksum field.
  	 */
! 	COMP_CRC32(crc, page + sizeof(p->pd_lsn) + sizeof(p->pd_checksum),
! 				  BLCKSZ - sizeof(p->pd_lsn) - sizeof(p->pd_checksum));
! 
! 	FIN_CRC32(crc);
! 
! 	return (uint16) crc;
  }
--- 949,976 ----
  static uint16
  PageCalcChecksum16(Page page, BlockNumber blkno)
  {
! 	PageHeader	phdr   = (PageHeader) page;
! 	uint16		save_checksum;
! 	uint32		checksum;
  
  	/* only calculate the checksum for properly-initialized pages */
  	Assert(!PageIsNew(page));
  
  	/*
! 	 * Save pd_checksum and set it to zero, so that the checksum calculation
! 	 * isn't affected by the checksum stored on the page.
  	 */
! 	save_checksum = phdr->pd_checksum;
! 	phdr->pd_checksum = 0;
! 	checksum = checksum_block(page, BLCKSZ);
! 	phdr->pd_checksum = save_checksum;
  
! 	/* mix in the block number to detect transposed pages */
! 	checksum ^= blkno;
  
  	/*
! 	 * Reduce to a uint16 (to fit in the pd_checksum field) with an offset of
! 	 * one. That avoids checksums of zero, which seems like a good idea.
  	 */
! 	return (checksum % 65535) + 1;
  }
*** /dev/null
--- b/src/backend/storage/page/checksum.c
***************
*** 0 ****
--- 1,121 ----
+ /*-------------------------------------------------------------------------
+  *
+  * checksum.c
+  *	  Checksum implementation for data pages.
+  *
+  * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  *
+  * IDENTIFICATION
+  *	  src/backend/storage/page/checksum.c
+  *
+  *-------------------------------------------------------------------------
+  *
+  * Checksum algorithm
+  *
+  * The algorithm used to checksum pages is chosen for very fast calculation.
+  * Workloads where the database working set fits into OS file cache but not
+  * into shared buffers can read in pages at a very fast pace and the checksum
+  * algorithm itself can become the largest bottleneck.
+  *
+  * The checksum algorithm itself is based on the FNV-1a hash (FNV is shorthand
+  * for Fowler/Noll/Vo) The primitive of a plain FNV-1a hash folds in data 4
+  * bytes at a time according to the formula:
+  *
+  *     hash = (hash ^ value) * FNV_PRIME
+  *
+  * PostgreSQL doesn't use FNV-1a hash directly because it has bad mixing of
+  * high bits - high order bits in input data only affect high order bits in
+  * output data. To resolve this we xor in the value prior to multiplication
+  * shifted right by 3 bits. The number 3 was chosen as it is a small odd,
+  * prime, and experimentally provides enough mixing for the high order bits to
+  * avalanche into lower positions. The actual hash formula used as the basis
+  * is:
+  *
+  *     hash = (hash ^ value) * FNV_PRIME ^ ((hash ^ value) >> 3)
+  *
+  * The main bottleneck in this calculation is the multiplication latency. To
+  * hide the latency and to make use of SIMD parallelism multiple hash values
+  * are calculated in parallel. Each hash function uses a different initial
+  * value (offset basis in FNV terminology). The initial values actually used
+  * were chosen randomly, as the values themselves don't matter as much as that
+  * they are different and don't match anything in real data. The page is then
+  * treated as 32 wide array of 32bit values and each column is aggregated
+  * according to the above formula. Finally one more iteration of the formula is
+  * performed with value 0 to mix the bits of the last value added.
+  *
+  * The partial checksums are then aggregated together using xor to form a
+  * 32-bit checksum. The caller can safely reduce the value to 16 bits
+  * using modulo 2^16-1. That will cause a very slight bias towards lower
+  * values but this is not significant for the performance of the
+  * checksum.
+  *
+  * Vectorization of the algorithm requires 32bit x 32bit -> 32bit integer
+  * multiplication instruction. As of 2013 the corresponding instruction is
+  * available on x86 SSE4.1 extensions (pmulld) and ARM NEON (vmul.i32).
+  * Vectorization requires a compiler to do the vectorization for us. For recent
+  * GCC versions the flags -msse4.1 -funroll-loops -ftree-vectorize are enough
+  * to achieve vectorization.
+  */
+ #include "postgres.h"
+ 
+ #include "storage/checksum.h"
+ 
+ /* number of checksums to calculate in parallel */
+ #define N_SUMS 32
+ /* prime multiplier of FNV-1a hash */
+ #define FNV_PRIME 16777619
+ 
+ /*
+  * Base offsets to initialize each of the parallel FNV hashes into a
+  * different initial state.
+  */
+ static const uint32 checksumBaseOffsets[N_SUMS] = {
+ 	0x5B1F36E9, 0xB8525960, 0x02AB50AA, 0x1DE66D2A,
+ 	0x79FF467A, 0x9BB9F8A3, 0x217E7CD2, 0x83E13D2C,
+ 	0xF8D4474F, 0xE39EB970, 0x42C6AE16, 0x993216FA,
+ 	0x7B093B5D, 0x98DAFF3C, 0xF718902A, 0x0B1C9CDB,
+ 	0xE58F764B, 0x187636BC, 0x5D7B3BB1, 0xE73DE7DE,
+ 	0x92BEC979, 0xCCA6C0B2, 0x304A0979, 0x85AA43D4,
+ 	0x783125BB, 0x6CA8EAA2, 0xE407EAC6, 0x4B5CFC3E,
+ 	0x9FBF8C76, 0x15CA20BE, 0xF2CA9FD3, 0x959BD756
+ };
+ 
+ /*
+  * Calculate one round of the checksum.
+  */
+ #define CHECKSUM_COMP(checksum, value) do {\
+ 	uint32 __tmp = (checksum) ^ (value);\
+ 	(checksum) = __tmp * FNV_PRIME ^ (__tmp >> 3);\
+ } while (0)
+ 
+ uint32
+ checksum_block(char *data, uint32 size)
+ {
+ 	uint32 sums[N_SUMS];
+ 	uint32 (*dataArr)[N_SUMS] = (uint32 (*)[N_SUMS]) data;
+ 	uint32 result = 0;
+ 	int i, j;
+ 
+ 	/* ensure that the size is compatible with the algorithm */
+ 	Assert((size % (sizeof(uint32)*N_SUMS)) == 0);
+ 
+ 	/* initialize partial checksums to their corresponding offsets */
+ 	memcpy(sums, checksumBaseOffsets, sizeof(checksumBaseOffsets));
+ 
+ 	/* main checksum calculation */
+ 	for (i = 0; i < size/sizeof(uint32)/N_SUMS; i++)
+ 		for (j = 0; j < N_SUMS; j++)
+ 			CHECKSUM_COMP(sums[j], dataArr[i][j]);
+ 
+ 	/* finally add in one round of zeroes for one more layer of mixing */
+ 	for (j = 0; j < N_SUMS; j++)
+ 		CHECKSUM_COMP(sums[j], 0);
+ 
+ 	/* xor fold partial checksums together */
+ 	for (i = 0; i < N_SUMS; i++)
+ 		result ^= sums[i];
+ 
+ 	return result;
+ }
*** /dev/null
--- b/src/include/storage/checksum.h
***************
*** 0 ****
--- 1,23 ----
+ /*-------------------------------------------------------------------------
+  *
+  * checksum.h
+  *	  Checksum implementation for data pages.
+  *
+  *
+  * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  * src/include/storage/checksum.h
+  *
+  *-------------------------------------------------------------------------
+  */
+ #ifndef CHECKSUM_H
+ #define CHECKSUM_H
+ 
+ /*
+  * Fowler-Noll-Vo 1a block checksum algorithm. The data argument should be
+  * aligned on a 4-byte boundary.
+  */
+ extern uint32 checksum_block(char *data, uint32 size);
+ 
+ #endif   /* CHECKSUM_H */
Attachment: fnv-jeff-20130426-cflags-extra.patch (application/octet-stream)
*** a/config/c-compiler.m4
--- b/config/c-compiler.m4
***************
*** 242,247 **** undefine([Ac_cachevar])dnl
--- 242,271 ----
  
  
  
+ # PGAC_PROG_CC_CFLAGS_EXTRA_OPT
+ # -----------------------
+ # Given a string, check if the compiler supports the string as a
+ # command-line option. If it does, add the string to CFLAGS_EXTRA.
+ AC_DEFUN([PGAC_PROG_CC_CFLAGS_EXTRA_OPT],
+ [define([Ac_cachevar], [AS_TR_SH([pgac_cv_prog_cc_cflags_extra_$1])])dnl
+ AC_CACHE_CHECK([whether $CC supports $1], [Ac_cachevar],
+ [pgac_save_CFLAGS_EXTRA=$CFLAGS_EXTRA
+ CFLAGS_EXTRA="$pgac_save_CFLAGS_EXTRA $1"
+ ac_save_c_werror_flag=$ac_c_werror_flag
+ ac_c_werror_flag=yes
+ _AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],
+                    [Ac_cachevar=yes],
+                    [Ac_cachevar=no])
+ ac_c_werror_flag=$ac_save_c_werror_flag
+ CFLAGS_EXTRA="$pgac_save_CFLAGS_EXTRA"])
+ if test x"$Ac_cachevar" = x"yes"; then
+   CFLAGS_EXTRA="$CFLAGS_EXTRA $1"
+ fi
+ undefine([Ac_cachevar])dnl
+ ])# PGAC_PROG_CC_CFLAGS_EXTRA_OPT
+ 
+ 
+ 
  # PGAC_PROG_CC_LDFLAGS_OPT
  # ------------------------
  # Given a string, check if the compiler supports the string as a
*** a/configure
--- b/configure
***************
*** 731,736 **** autodepend
--- 731,737 ----
  TAS
  GCC
  CPP
+ CFLAGS_EXTRA
  SUN_STUDIO_CC
  OBJEXT
  EXEEXT
***************
*** 3944,3949 **** else
--- 3945,3955 ----
    fi
  fi
  
+ # set CFLAGS_EXTRA from the environment, if available
+ if test "$ac_env_CFLAGS_EXTRA_set" = set; then
+   CFLAGS_EXTRA=$ac_env_CFLAGS_EXTRA_value
+ fi
+ 
  # Some versions of GCC support some additional useful warning flags.
  # Check whether they are supported, and add them to CFLAGS if so.
  # ICC pretends to be GCC but it's lying; it doesn't support these flags,
***************
*** 4376,4381 **** if test x"$pgac_cv_prog_cc_cflags__fexcess_precision_standard" = x"yes"; then
--- 4382,4508 ----
    CFLAGS="$CFLAGS -fexcess-precision=standard"
  fi
  
+   # Optimization flags in $CFLAGS_EXTRA should only be applied to specific files
+   { $as_echo "$as_me:$LINENO: checking whether $CC supports -funroll-loops" >&5
+ $as_echo_n "checking whether $CC supports -funroll-loops... " >&6; }
+ if test "${pgac_cv_prog_cc_cflags_extra__funroll_loops+set}" = set; then
+   $as_echo_n "(cached) " >&6
+ else
+   pgac_save_CFLAGS_EXTRA=$CFLAGS_EXTRA
+ CFLAGS_EXTRA="$pgac_save_CFLAGS_EXTRA -funroll-loops"
+ ac_save_c_werror_flag=$ac_c_werror_flag
+ ac_c_werror_flag=yes
+ cat >conftest.$ac_ext <<_ACEOF
+ /* confdefs.h.  */
+ _ACEOF
+ cat confdefs.h >>conftest.$ac_ext
+ cat >>conftest.$ac_ext <<_ACEOF
+ /* end confdefs.h.  */
+ 
+ int
+ main ()
+ {
+ 
+   ;
+   return 0;
+ }
+ _ACEOF
+ rm -f conftest.$ac_objext
+ if { (ac_try="$ac_compile"
+ case "(($ac_try" in
+   *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
+   *) ac_try_echo=$ac_try;;
+ esac
+ eval ac_try_echo="\"\$as_me:$LINENO: $ac_try_echo\""
+ $as_echo "$ac_try_echo") >&5
+   (eval "$ac_compile") 2>conftest.er1
+   ac_status=$?
+   grep -v '^ *+' conftest.er1 >conftest.err
+   rm -f conftest.er1
+   cat conftest.err >&5
+   $as_echo "$as_me:$LINENO: \$? = $ac_status" >&5
+   (exit $ac_status); } && {
+ 	 test -z "$ac_c_werror_flag" ||
+ 	 test ! -s conftest.err
+        } && test -s conftest.$ac_objext; then
+   pgac_cv_prog_cc_cflags_extra__funroll_loops=yes
+ else
+   $as_echo "$as_me: failed program was:" >&5
+ sed 's/^/| /' conftest.$ac_ext >&5
+ 
+ 	pgac_cv_prog_cc_cflags_extra__funroll_loops=no
+ fi
+ 
+ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ ac_c_werror_flag=$ac_save_c_werror_flag
+ CFLAGS_EXTRA="$pgac_save_CFLAGS_EXTRA"
+ fi
+ { $as_echo "$as_me:$LINENO: result: $pgac_cv_prog_cc_cflags_extra__funroll_loops" >&5
+ $as_echo "$pgac_cv_prog_cc_cflags_extra__funroll_loops" >&6; }
+ if test x"$pgac_cv_prog_cc_cflags_extra__funroll_loops" = x"yes"; then
+   CFLAGS_EXTRA="$CFLAGS_EXTRA -funroll-loops"
+ fi
+ 
+   { $as_echo "$as_me:$LINENO: checking whether $CC supports -ftree-vectorize" >&5
+ $as_echo_n "checking whether $CC supports -ftree-vectorize... " >&6; }
+ if test "${pgac_cv_prog_cc_cflags_extra__ftree_vectorize+set}" = set; then
+   $as_echo_n "(cached) " >&6
+ else
+   pgac_save_CFLAGS_EXTRA=$CFLAGS_EXTRA
+ CFLAGS_EXTRA="$pgac_save_CFLAGS_EXTRA -ftree-vectorize"
+ ac_save_c_werror_flag=$ac_c_werror_flag
+ ac_c_werror_flag=yes
+ cat >conftest.$ac_ext <<_ACEOF
+ /* confdefs.h.  */
+ _ACEOF
+ cat confdefs.h >>conftest.$ac_ext
+ cat >>conftest.$ac_ext <<_ACEOF
+ /* end confdefs.h.  */
+ 
+ int
+ main ()
+ {
+ 
+   ;
+   return 0;
+ }
+ _ACEOF
+ rm -f conftest.$ac_objext
+ if { (ac_try="$ac_compile"
+ case "(($ac_try" in
+   *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
+   *) ac_try_echo=$ac_try;;
+ esac
+ eval ac_try_echo="\"\$as_me:$LINENO: $ac_try_echo\""
+ $as_echo "$ac_try_echo") >&5
+   (eval "$ac_compile") 2>conftest.er1
+   ac_status=$?
+   grep -v '^ *+' conftest.er1 >conftest.err
+   rm -f conftest.er1
+   cat conftest.err >&5
+   $as_echo "$as_me:$LINENO: \$? = $ac_status" >&5
+   (exit $ac_status); } && {
+ 	 test -z "$ac_c_werror_flag" ||
+ 	 test ! -s conftest.err
+        } && test -s conftest.$ac_objext; then
+   pgac_cv_prog_cc_cflags_extra__ftree_vectorize=yes
+ else
+   $as_echo "$as_me: failed program was:" >&5
+ sed 's/^/| /' conftest.$ac_ext >&5
+ 
+ 	pgac_cv_prog_cc_cflags_extra__ftree_vectorize=no
+ fi
+ 
+ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ ac_c_werror_flag=$ac_save_c_werror_flag
+ CFLAGS_EXTRA="$pgac_save_CFLAGS_EXTRA"
+ fi
+ { $as_echo "$as_me:$LINENO: result: $pgac_cv_prog_cc_cflags_extra__ftree_vectorize" >&5
+ $as_echo "$pgac_cv_prog_cc_cflags_extra__ftree_vectorize" >&6; }
+ if test x"$pgac_cv_prog_cc_cflags_extra__ftree_vectorize" = x"yes"; then
+   CFLAGS_EXTRA="$CFLAGS_EXTRA -ftree-vectorize"
+ fi
+ 
  elif test "$ICC" = yes; then
    # Intel's compiler has a bug/misoptimization in checking for
    # division by NAN (NaN == 0), -mp1 fixes it, so add it to the CFLAGS.
***************
*** 4627,4632 **** fi
--- 4754,4762 ----
  
  fi
  
+ CFLAGS_EXTRA=$CFLAGS_EXTRA
+ 
+ 
  # supply -g if --enable-debug
  if test "$enable_debug" = yes && test "$ac_cv_prog_cc_g" = yes; then
    CFLAGS="$CFLAGS -g"
*** a/configure.in
--- b/configure.in
***************
*** 400,405 **** else
--- 400,410 ----
    fi
  fi
  
+ # set CFLAGS_EXTRA from the environment, if available
+ if test "$ac_env_CFLAGS_EXTRA_set" = set; then
+   CFLAGS_EXTRA=$ac_env_CFLAGS_EXTRA_value
+ fi
+ 
  # Some versions of GCC support some additional useful warning flags.
  # Check whether they are supported, and add them to CFLAGS if so.
  # ICC pretends to be GCC but it's lying; it doesn't support these flags,
***************
*** 419,424 **** if test "$GCC" = yes -a "$ICC" = no; then
--- 424,432 ----
    PGAC_PROG_CC_CFLAGS_OPT([-fwrapv])
    # Disable FP optimizations that cause various errors on gcc 4.5+ or maybe 4.6+
    PGAC_PROG_CC_CFLAGS_OPT([-fexcess-precision=standard])
+   # Optimization flags in $CFLAGS_EXTRA should only be applied to specific files
+   PGAC_PROG_CC_CFLAGS_EXTRA_OPT([-funroll-loops])
+   PGAC_PROG_CC_CFLAGS_EXTRA_OPT([-ftree-vectorize])
  elif test "$ICC" = yes; then
    # Intel's compiler has a bug/misoptimization in checking for
    # division by NAN (NaN == 0), -mp1 fixes it, so add it to the CFLAGS.
***************
*** 434,439 **** elif test "$PORTNAME" = "hpux"; then
--- 442,449 ----
    PGAC_PROG_CC_CFLAGS_OPT([+Olibmerrno])
  fi
  
+ AC_SUBST(CFLAGS_EXTRA, $CFLAGS_EXTRA)
+ 
  # supply -g if --enable-debug
  if test "$enable_debug" = yes && test "$ac_cv_prog_cc_g" = yes; then
    CFLAGS="$CFLAGS -g"
*** a/src/Makefile.global.in
--- b/src/Makefile.global.in
***************
*** 219,224 **** CC = @CC@
--- 219,225 ----
  GCC = @GCC@
  SUN_STUDIO_CC = @SUN_STUDIO_CC@
  CFLAGS = @CFLAGS@
+ CFLAGS_EXTRA = @CFLAGS_EXTRA@
  
  # Kind-of compilers
  
*** a/src/backend/storage/page/Makefile
--- b/src/backend/storage/page/Makefile
***************
*** 15,17 **** include $(top_builddir)/src/Makefile.global
--- 15,20 ----
  OBJS =  bufpage.o checksum.o itemptr.o
  
  include $(top_srcdir)/src/backend/common.mk
+ 
+ # important optimizations flags for checksum.c
+ checksum.o: CFLAGS += ${CFLAGS_EXTRA}
#13 Greg Smith
greg@2ndQuadrant.com
In reply to: Jeff Davis (#12)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On 4/26/13 3:57 PM, Jeff Davis wrote:

The second patch adds the configure-time check for the right
compilation flags, and uses them when compiling checksum.c. I
called the new variable CFLAGS_EXTRA, for lack of a better idea,
so feel free to come up with a new name. It doesn't check for, or
use, -msse4.1, but that can be specified by the user by
configuring with CFLAGS_EXTRA="-msse4.1".

Thank you, that is the last piece I was looking at but couldn't nail
down on my own. With that I should be able to duplicate both the
slicing-by-8 CRC speedup Ants sent over (which also expected some
optimization changes) and try something FNV-based this weekend.

I think I need to do two baselines: master without checksums, and
master with extra optimizations but still without checksums. It may be
the case that using better compile time optimizations gives a general
speedup that's worth considering regardless. The optimizations seem to
have a very significant impact on the checksum feature, but I'd like to
quantify how they change the code a little bit before even getting into
that.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14 Jeff Davis
pgsql@j-davis.com
In reply to: Greg Smith (#13)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On Fri, 2013-04-26 at 16:40 -0400, Greg Smith wrote:

I think I need to do two baselines: master without checksums, and
master with extra optimizations but still without checksums. It may be
the case that using better compile time optimizations gives a general
speedup that's worth considering regardless. The optimizations seem to
have a very significant impact on the checksum feature, but I'd like to
quantify how they change the code a little bit before even getting into
that.

The patch only affects optimization flags used when compiling
checksum.c, so it should have no effect on other areas of the code.

If you want to compile the whole source with those flags, then just do:

CFLAGS="-msse4.1 -funroll-loops -ftree-vectorize" ./configure

Changing the optimization flags for existing code will have a larger
impact and should be considered separately from checksums.

Regards,
Jeff Davis


#15 Andres Freund
andres@2ndquadrant.com
In reply to: Jeff Davis (#12)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On 2013-04-26 12:57:09 -0700, Jeff Davis wrote:

I updated the patch and split it into two parts (attached).

The second patch adds the configure-time check for the right
compilation flags, and uses them when compiling checksum.c. I
called the new variable CFLAGS_EXTRA, for lack of a better idea,
so feel free to come up with a new name. It doesn't check for, or
use, -msse4.1, but that can be specified by the user by
configuring with CFLAGS_EXTRA="-msse4.1".

CFLAGS_VECTORIZATION? EXTRA sounds too generic to me.

--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -242,6 +242,30 @@ undefine([Ac_cachevar])dnl
+# PGAC_PROG_CC_CFLAGS_EXTRA_OPT
+# -----------------------
+# Given a string, check if the compiler supports the string as a
+# command-line option. If it does, add the string to CFLAGS_EXTRA.
+AC_DEFUN([PGAC_PROG_CC_CFLAGS_EXTRA_OPT],
+[define([Ac_cachevar], [AS_TR_SH([pgac_cv_prog_cc_cflags_extra_$1])])dnl
+AC_CACHE_CHECK([whether $CC supports $1], [Ac_cachevar],
+[pgac_save_CFLAGS_EXTRA=$CFLAGS_EXTRA
+CFLAGS_EXTRA="$pgac_save_CFLAGS_EXTRA $1"
+ac_save_c_werror_flag=$ac_c_werror_flag
+ac_c_werror_flag=yes
+_AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],
+                   [Ac_cachevar=yes],
+                   [Ac_cachevar=no])
+ac_c_werror_flag=$ac_save_c_werror_flag
+CFLAGS_EXTRA="$pgac_save_CFLAGS_EXTRA"])
+if test x"$Ac_cachevar" = x"yes"; then
+  CFLAGS_EXTRA="$CFLAGS_EXTRA $1"
+fi
+undefine([Ac_cachevar])dnl
+])# PGAC_PROG_CC_CFLAGS_EXTRA_OPT

I think it would be better to have a PGAC_PROG_CC_VAR_OPT or so which
assigns the flag to some passed variable name. Then we can reuse it for
different vars and I have the feeling those will come. And having a
CFLAGS_VECTOR_OPT would just be stupid ;)

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#16 Jeff Davis
pgsql@j-davis.com
In reply to: Andres Freund (#15)
1 attachment(s)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On Sat, 2013-04-27 at 00:20 +0200, Andres Freund wrote:

CFLAGS_VECTORIZATION? EXTRA sounds to generic to me.

I went with CFLAGS_VECTOR to be a little shorter while still keeping
some meaning.

I think it would be better to have a PGAC_PROG_CC_VAR_OPT or so which
assigns the flag to some passed variable name. Then we can reuse it for
different vars and I have the feeling those will come. And having a
CFLAGS_VECTOR_OPT would just be stupid ;)

Good suggestion; done.

Thank you for the review. New renamed patch attached for the build
options only (the other patch for the FNV checksum algorithm is
unchanged).

Regards,
Jeff Davis

Attachments:

fnv-jeff-20130426-cflags-vector.patch (text/x-patch; charset=ISO-8859-1)
*** a/config/c-compiler.m4
--- b/config/c-compiler.m4
***************
*** 242,247 **** undefine([Ac_cachevar])dnl
--- 242,272 ----
  
  
  
+ # PGAC_PROG_CC_VAR_OPT
+ # -----------------------
+ # Given a variable name and a string, check if the compiler supports
+ # the string as a command-line option. If it does, add the string to
+ # the given variable.
+ AC_DEFUN([PGAC_PROG_CC_VAR_OPT],
+ [define([Ac_cachevar], [AS_TR_SH([pgac_cv_prog_cc_cflags_$2])])dnl
+ AC_CACHE_CHECK([whether $CC supports $2], [Ac_cachevar],
+ [pgac_save_CFLAGS=$CFLAGS
+ CFLAGS="$pgac_save_CFLAGS $2"
+ ac_save_c_werror_flag=$ac_c_werror_flag
+ ac_c_werror_flag=yes
+ _AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],
+                    [Ac_cachevar=yes],
+                    [Ac_cachevar=no])
+ ac_c_werror_flag=$ac_save_c_werror_flag
+ CFLAGS="$pgac_save_CFLAGS"])
+ if test x"$Ac_cachevar" = x"yes"; then
+   $1="${$1} $2"
+ fi
+ undefine([Ac_cachevar])dnl
+ ])# PGAC_PROG_CC_CFLAGS_OPT
+ 
+ 
+ 
  # PGAC_PROG_CC_LDFLAGS_OPT
  # ------------------------
  # Given a string, check if the compiler supports the string as a
*** a/configure
--- b/configure
***************
*** 731,736 **** autodepend
--- 731,737 ----
  TAS
  GCC
  CPP
+ CFLAGS_VECTOR
  SUN_STUDIO_CC
  OBJEXT
  EXEEXT
***************
*** 3944,3949 **** else
--- 3945,3955 ----
    fi
  fi
  
+ # set CFLAGS_VECTOR from the environment, if available
+ if test "$ac_env_CFLAGS_VECTOR_set" = set; then
+   CFLAGS_VECTOR=$ac_env_CFLAGS_VECTOR_value
+ fi
+ 
  # Some versions of GCC support some additional useful warning flags.
  # Check whether they are supported, and add them to CFLAGS if so.
  # ICC pretends to be GCC but it's lying; it doesn't support these flags,
***************
*** 4376,4381 **** if test x"$pgac_cv_prog_cc_cflags__fexcess_precision_standard" = x"yes"; then
--- 4382,4508 ----
    CFLAGS="$CFLAGS -fexcess-precision=standard"
  fi
  
+   # Optimization flags for specific files that benefit from vectorization
+   { $as_echo "$as_me:$LINENO: checking whether $CC supports -funroll-loops" >&5
+ $as_echo_n "checking whether $CC supports -funroll-loops... " >&6; }
+ if test "${pgac_cv_prog_cc_cflags__funroll_loops+set}" = set; then
+   $as_echo_n "(cached) " >&6
+ else
+   pgac_save_CFLAGS=$CFLAGS
+ CFLAGS="$pgac_save_CFLAGS -funroll-loops"
+ ac_save_c_werror_flag=$ac_c_werror_flag
+ ac_c_werror_flag=yes
+ cat >conftest.$ac_ext <<_ACEOF
+ /* confdefs.h.  */
+ _ACEOF
+ cat confdefs.h >>conftest.$ac_ext
+ cat >>conftest.$ac_ext <<_ACEOF
+ /* end confdefs.h.  */
+ 
+ int
+ main ()
+ {
+ 
+   ;
+   return 0;
+ }
+ _ACEOF
+ rm -f conftest.$ac_objext
+ if { (ac_try="$ac_compile"
+ case "(($ac_try" in
+   *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
+   *) ac_try_echo=$ac_try;;
+ esac
+ eval ac_try_echo="\"\$as_me:$LINENO: $ac_try_echo\""
+ $as_echo "$ac_try_echo") >&5
+   (eval "$ac_compile") 2>conftest.er1
+   ac_status=$?
+   grep -v '^ *+' conftest.er1 >conftest.err
+   rm -f conftest.er1
+   cat conftest.err >&5
+   $as_echo "$as_me:$LINENO: \$? = $ac_status" >&5
+   (exit $ac_status); } && {
+ 	 test -z "$ac_c_werror_flag" ||
+ 	 test ! -s conftest.err
+        } && test -s conftest.$ac_objext; then
+   pgac_cv_prog_cc_cflags__funroll_loops=yes
+ else
+   $as_echo "$as_me: failed program was:" >&5
+ sed 's/^/| /' conftest.$ac_ext >&5
+ 
+ 	pgac_cv_prog_cc_cflags__funroll_loops=no
+ fi
+ 
+ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ ac_c_werror_flag=$ac_save_c_werror_flag
+ CFLAGS="$pgac_save_CFLAGS"
+ fi
+ { $as_echo "$as_me:$LINENO: result: $pgac_cv_prog_cc_cflags__funroll_loops" >&5
+ $as_echo "$pgac_cv_prog_cc_cflags__funroll_loops" >&6; }
+ if test x"$pgac_cv_prog_cc_cflags__funroll_loops" = x"yes"; then
+   CFLAGS_VECTOR="${CFLAGS_VECTOR} -funroll-loops"
+ fi
+ 
+   { $as_echo "$as_me:$LINENO: checking whether $CC supports -ftree-vectorize" >&5
+ $as_echo_n "checking whether $CC supports -ftree-vectorize... " >&6; }
+ if test "${pgac_cv_prog_cc_cflags__ftree_vectorize+set}" = set; then
+   $as_echo_n "(cached) " >&6
+ else
+   pgac_save_CFLAGS=$CFLAGS
+ CFLAGS="$pgac_save_CFLAGS -ftree-vectorize"
+ ac_save_c_werror_flag=$ac_c_werror_flag
+ ac_c_werror_flag=yes
+ cat >conftest.$ac_ext <<_ACEOF
+ /* confdefs.h.  */
+ _ACEOF
+ cat confdefs.h >>conftest.$ac_ext
+ cat >>conftest.$ac_ext <<_ACEOF
+ /* end confdefs.h.  */
+ 
+ int
+ main ()
+ {
+ 
+   ;
+   return 0;
+ }
+ _ACEOF
+ rm -f conftest.$ac_objext
+ if { (ac_try="$ac_compile"
+ case "(($ac_try" in
+   *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;;
+   *) ac_try_echo=$ac_try;;
+ esac
+ eval ac_try_echo="\"\$as_me:$LINENO: $ac_try_echo\""
+ $as_echo "$ac_try_echo") >&5
+   (eval "$ac_compile") 2>conftest.er1
+   ac_status=$?
+   grep -v '^ *+' conftest.er1 >conftest.err
+   rm -f conftest.er1
+   cat conftest.err >&5
+   $as_echo "$as_me:$LINENO: \$? = $ac_status" >&5
+   (exit $ac_status); } && {
+ 	 test -z "$ac_c_werror_flag" ||
+ 	 test ! -s conftest.err
+        } && test -s conftest.$ac_objext; then
+   pgac_cv_prog_cc_cflags__ftree_vectorize=yes
+ else
+   $as_echo "$as_me: failed program was:" >&5
+ sed 's/^/| /' conftest.$ac_ext >&5
+ 
+ 	pgac_cv_prog_cc_cflags__ftree_vectorize=no
+ fi
+ 
+ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ ac_c_werror_flag=$ac_save_c_werror_flag
+ CFLAGS="$pgac_save_CFLAGS"
+ fi
+ { $as_echo "$as_me:$LINENO: result: $pgac_cv_prog_cc_cflags__ftree_vectorize" >&5
+ $as_echo "$pgac_cv_prog_cc_cflags__ftree_vectorize" >&6; }
+ if test x"$pgac_cv_prog_cc_cflags__ftree_vectorize" = x"yes"; then
+   CFLAGS_VECTOR="${CFLAGS_VECTOR} -ftree-vectorize"
+ fi
+ 
  elif test "$ICC" = yes; then
    # Intel's compiler has a bug/misoptimization in checking for
    # division by NAN (NaN == 0), -mp1 fixes it, so add it to the CFLAGS.
***************
*** 4627,4632 **** fi
--- 4754,4762 ----
  
  fi
  
+ CFLAGS_VECTOR=$CFLAGS_VECTOR
+ 
+ 
  # supply -g if --enable-debug
  if test "$enable_debug" = yes && test "$ac_cv_prog_cc_g" = yes; then
    CFLAGS="$CFLAGS -g"
*** a/configure.in
--- b/configure.in
***************
*** 400,405 **** else
--- 400,410 ----
    fi
  fi
  
+ # set CFLAGS_VECTOR from the environment, if available
+ if test "$ac_env_CFLAGS_VECTOR_set" = set; then
+   CFLAGS_VECTOR=$ac_env_CFLAGS_VECTOR_value
+ fi
+ 
  # Some versions of GCC support some additional useful warning flags.
  # Check whether they are supported, and add them to CFLAGS if so.
  # ICC pretends to be GCC but it's lying; it doesn't support these flags,
***************
*** 419,424 **** if test "$GCC" = yes -a "$ICC" = no; then
--- 424,432 ----
    PGAC_PROG_CC_CFLAGS_OPT([-fwrapv])
    # Disable FP optimizations that cause various errors on gcc 4.5+ or maybe 4.6+
    PGAC_PROG_CC_CFLAGS_OPT([-fexcess-precision=standard])
+   # Optimization flags for specific files that benefit from vectorization
+   PGAC_PROG_CC_VAR_OPT(CFLAGS_VECTOR, [-funroll-loops])
+   PGAC_PROG_CC_VAR_OPT(CFLAGS_VECTOR, [-ftree-vectorize])
  elif test "$ICC" = yes; then
    # Intel's compiler has a bug/misoptimization in checking for
    # division by NAN (NaN == 0), -mp1 fixes it, so add it to the CFLAGS.
***************
*** 434,439 **** elif test "$PORTNAME" = "hpux"; then
--- 442,449 ----
    PGAC_PROG_CC_CFLAGS_OPT([+Olibmerrno])
  fi
  
+ AC_SUBST(CFLAGS_VECTOR, $CFLAGS_VECTOR)
+ 
  # supply -g if --enable-debug
  if test "$enable_debug" = yes && test "$ac_cv_prog_cc_g" = yes; then
    CFLAGS="$CFLAGS -g"
*** a/src/Makefile.global.in
--- b/src/Makefile.global.in
***************
*** 219,224 **** CC = @CC@
--- 219,225 ----
  GCC = @GCC@
  SUN_STUDIO_CC = @SUN_STUDIO_CC@
  CFLAGS = @CFLAGS@
+ CFLAGS_VECTOR = @CFLAGS_VECTOR@
  
  # Kind-of compilers
  
*** a/src/backend/storage/page/Makefile
--- b/src/backend/storage/page/Makefile
***************
*** 15,17 **** include $(top_builddir)/src/Makefile.global
--- 15,20 ----
  OBJS =  bufpage.o checksum.o itemptr.o
  
  include $(top_srcdir)/src/backend/common.mk
+ 
+ # important optimizations flags for checksum.c
+ checksum.o: CFLAGS += ${CFLAGS_VECTOR}
#17 Ants Aasma
ants@cybertec.at
In reply to: Jeff Davis (#12)
3 attachment(s)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On Fri, Apr 26, 2013 at 10:57 PM, Jeff Davis <pgsql@j-davis.com> wrote:

On Fri, Apr 26, 2013 at 7:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

I'm expecting to spend some time on this over the weekend, once I've
re-read the thread and patches to see if there is something to commit.

That's my last time window, so this looks like the last chance to make
changes before beta.

I updated the patch and split it into two parts (attached).

The first patch is the checksum algorithm itself. I have done
some documentation updates and moved it into the C file (rather
than the README), but further explanation of the "shift right 3"
modification will need to be by Ants or Florian.

The second patch adds the configure-time check for the right
compilation flags, and uses them when compiling checksum.c. I
called the new variable CFLAGS_EXTRA, for lack of a better idea,
so feel free to come up with a new name. It doesn't check for, or
use, -msse4.1, but that can be specified by the user by
configuring with CFLAGS_EXTRA="-msse4.1".

I don't know of any more required changes, aside from
documentation improvements.

I have updated the base patch. This is supposed to go under the
cflags-vector patch that Jeff posted yesterday.

I had the opportunity to run a few more tests of the hash. Based on
the tests I switched the shift-right operation from 3 to 17 bits (the
original value was chosen by gut feel). Avalanche tests showed that
this value removed bias the quickest. You can see the difference in
the attached images; the colors are still black 0% bias, blue 5%, green
33%, yellow 75%, red 100%. The final box in the diagram is covered by
the final mixing iteration. The takeaway from these diagrams is that 17
mixes better than 3. 17 still has some residual bias for the final
iteration on the page. The effective information content values in the
checksum for the 16 high-order bits of the final 32 32-bit words on the
page are: 16.0 15.1 14.1 13.3 12.6 12.1 11.8 11.7 11.5 11.5 11.1 11.0
10.9 10.6 10.6 10.4. Error detection capability for the highest bit is
therefore 1:1351. Based on this I also switched to using two
iterations of zeroes at the end; this way the lowest information
content is 15.976 bits, or 1:64473.

Documentation changes:
* reworded the algorithm description so the order of steps is more apparent.
* added a link to the FNV reference page.
* fixed note about FNV being 4 bytes at a time. Official variant is 1
byte at a time.
* added a segment of why the algorithm was chosen and its error
detection capabilities.
* added a segment about how the code affects vectorization.

Issue to decide before committing:
* Whether to use 32 or 64 parallel checksums. The tradeoff for 64 is a
small performance hit (10-20%) on today's CPUs in exchange for a small
performance gain on the Haswell processors coming to market this year,
and up to a theoretical 2x gain on future CPUs. Changing this is just
a matter of changing N_SUMS and updating the documentation to match.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

Attachments:

avalanche-fnv-slr3.png (image/png)
avalanche-fnv-slr17.png (image/png)
fnv-ants-20130428.patch (application/octet-stream)
*** a/src/backend/storage/page/Makefile
--- b/src/backend/storage/page/Makefile
***************
*** 12,17 **** subdir = src/backend/storage/page
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS =  bufpage.o itemptr.o
  
  include $(top_srcdir)/src/backend/common.mk
--- 12,17 ----
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS =  bufpage.o checksum.o itemptr.o
  
  include $(top_srcdir)/src/backend/common.mk
*** a/src/backend/storage/page/bufpage.c
--- b/src/backend/storage/page/bufpage.c
***************
*** 16,21 ****
--- 16,22 ----
  
  #include "access/htup_details.h"
  #include "access/xlog.h"
+ #include "storage/checksum.h"
  
  bool ignore_checksum_failure = false;
  
***************
*** 948,980 **** PageSetChecksumInplace(Page page, BlockNumber blkno)
  static uint16
  PageCalcChecksum16(Page page, BlockNumber blkno)
  {
! 	pg_crc32    		crc;
! 	PageHeader	p = (PageHeader) page;
  
  	/* only calculate the checksum for properly-initialized pages */
  	Assert(!PageIsNew(page));
  
- 	INIT_CRC32(crc);
- 
  	/*
! 	 * Initialize the checksum calculation with the block number. This helps
! 	 * catch corruption from whole blocks being transposed with other whole
! 	 * blocks.
  	 */
! 	COMP_CRC32(crc, &blkno, sizeof(blkno));
  
! 	/*
! 	 * Now add in the LSN, which is always the first field on the page.
! 	 */
! 	COMP_CRC32(crc, page, sizeof(p->pd_lsn));
  
  	/*
! 	 * Now add the rest of the page, skipping the pd_checksum field.
  	 */
! 	COMP_CRC32(crc, page + sizeof(p->pd_lsn) + sizeof(p->pd_checksum),
! 				  BLCKSZ - sizeof(p->pd_lsn) - sizeof(p->pd_checksum));
! 
! 	FIN_CRC32(crc);
! 
! 	return (uint16) crc;
  }
--- 949,976 ----
  static uint16
  PageCalcChecksum16(Page page, BlockNumber blkno)
  {
! 	PageHeader	phdr   = (PageHeader) page;
! 	uint16		save_checksum;
! 	uint32		checksum;
  
  	/* only calculate the checksum for properly-initialized pages */
  	Assert(!PageIsNew(page));
  
  	/*
! 	 * Save pd_checksum and set it to zero, so that the checksum calculation
! 	 * isn't affected by the checksum stored on the page.
  	 */
! 	save_checksum = phdr->pd_checksum;
! 	phdr->pd_checksum = 0;
! 	checksum = checksum_block(page, BLCKSZ);
! 	phdr->pd_checksum = save_checksum;
  
! 	/* mix in the block number to detect transposed pages */
! 	checksum ^= blkno;
  
  	/*
! 	 * Reduce to a uint16 (to fit in the pd_checksum field) with an offset of
! 	 * one. That avoids checksums of zero, which seems like a good idea.
  	 */
! 	return (checksum % 65535) + 1;
  }
*** /dev/null
--- b/src/backend/storage/page/checksum.c
***************
*** 0 ****
--- 1,160 ----
+ /*-------------------------------------------------------------------------
+  *
+  * checksum.c
+  *	  Checksum implementation for data pages.
+  *
+  * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+  * Portions Copyright (c) 1994, Regents of the University of California
+  *
+  *
+  * IDENTIFICATION
+  *	  src/backend/storage/page/checksum.c
+  *
+  *-------------------------------------------------------------------------
+  *
+  * Checksum algorithm
+  *
+  * The algorithm used to checksum pages is chosen for very fast calculation.
+  * Workloads where the database working set fits into OS file cache but not
+  * into shared buffers can read in pages at a very fast pace and the checksum
+  * algorithm itself can become the largest bottleneck.
+  *
+  * The checksum algorithm itself is based on the FNV-1a hash (FNV is
+  * shorthand for Fowler/Noll/Vo). The primitive of a plain FNV-1a hash folds
+  * in data 1 byte at a time according to the formula:
+  *
+  *     hash = (hash ^ value) * FNV_PRIME
+  *
+  * FNV-1a algorithm is described at http://www.isthe.com/chongo/tech/comp/fnv/
+  *
+  * PostgreSQL doesn't use the FNV-1a hash directly because it has bad mixing
+  * of high bits - high order bits in input data only affect high order bits
+  * in output data. To resolve this we additionally xor in the
+  * pre-multiplication value shifted right by 17 bits. The number 17 was
+  * chosen because it has no common factor with the set bit positions in
+  * FNV_PRIME, and empirically provides the fastest mixing: high order bits
+  * of the final iterations quickly avalanche into lower positions. For
+  * performance reasons we choose to combine 4 bytes at a time. The actual
+  * hash formula used as the basis is:
+  *
+  *     hash = (hash ^ value) * FNV_PRIME ^ ((hash ^ value) >> 17)
+  *
+  * The main bottleneck in this calculation is the multiplication latency. To
+  * hide the latency and to make use of SIMD parallelism multiple hash values
+  * are calculated in parallel. The page is treated as a 32 column two
+  * dimensional array of 32 bit values. Each column is aggregated separately
+  * into a partial checksum. Each partial checksum uses a different initial
+  * value (offset basis in FNV terminology). The initial values actually used
+  * were chosen randomly, as the values themselves don't matter as much as that
+  * they are different and don't match anything in real data. After initializing
+  * partial checksums each value in the column is aggregated according to the
+  * above formula. Finally two more iterations of the formula are performed with
+  * value 0 to mix the bits of the last value added.
+  *
+  * The partial checksums are then folded together using xor to form a single
+  * 32-bit checksum. The caller can safely reduce the value to 16 bits
+  * using modulo 2^16-1. That will cause a very slight bias towards lower
+  * values, but this is not significant for the error detection capability of
+  * the checksum.
+  *
+  * The algorithm choice was based on what instructions are available in SIMD
+  * instruction sets. This meant that a fast and good algorithm needed to use
+  * multiplication as the main mixing operator. The simplest multiplication
+  * based checksum primitive is the one used by FNV. The prime used is chosen
+  * for good dispersion of values. It has no known simple patterns that result
+  * in collisions. A test of 5-bit differentials of the primitive over 64-bit
+  * keys reveals no differentials where 3 or more values out of 100,000
+  * random keys collide. An avalanche test shows that only high order bits of
+  * the last word have a bias. Tests of 1-4 uncorrelated bit errors, stray
+  * 0x00 and 0xFF bytes, overwriting the page from a random position to the
+  * end with zero bytes, and overwriting random segments of the page with
+  * 0x00, 0xFF and random data all show the optimal 2^-16 false positive rate
+  * within margin of error.
+  *
+  * Vectorization of the algorithm requires 32bit x 32bit -> 32bit integer
+  * multiplication instruction. As of 2013 the corresponding instruction is
+  * available on x86 SSE4.1 extensions (pmulld) and ARM NEON (vmul.i32).
+  * We rely on the compiler to perform the vectorization for us. For recent
+  * GCC versions the flags -msse4.1 -funroll-loops -ftree-vectorize are enough
+  * to achieve vectorization.
+  *
+  * The optimal amount of parallelism to use depends on CPU specific instruction
+  * latency, SIMD instruction width, throughput and the amount of registers
+  * available to hold intermediate state. Generally, more parallelism is better
+  * up to the point that state doesn't fit in registers and extra load-store
+  * instructions are needed to swap values in/out. The number chosen is a fixed
+  * part of the algorithm because changing the parallelism changes the checksum
+  * result.
+  *
+  * The parallelism number 32 was chosen based on the fact that it is the
+  * largest state that fits into architecturally visible x86 SSE registers while
+  * leaving some free registers for intermediate values. For future processors
+  * with 256bit vector registers this will leave some performance on the table.
+  * When vectorization is not available it might be beneficial to restructure
+  * the computation to calculate a subset of the columns at a time and perform
+  * multiple passes to avoid register spilling. This optimization opportunity
+  * is not used. Current coding also assumes that the compiler has the ability
+  * to unroll the inner loop to avoid loop overhead and minimize register
+  * spilling. For less sophisticated compilers it might be beneficial to manually
+  * unroll the inner loop.
+  */
+ #include "postgres.h"
+ 
+ #include "storage/checksum.h"
+ 
+ /* number of checksums to calculate in parallel */
+ #define N_SUMS 32
+ /* prime multiplier of FNV-1a hash */
+ #define FNV_PRIME 16777619
+ 
+ /*
+  * Base offsets to initialize each of the parallel FNV hashes into a
+  * different initial state.
+  */
+ static const uint32 checksumBaseOffsets[N_SUMS] = {
+ 	0x5B1F36E9, 0xB8525960, 0x02AB50AA, 0x1DE66D2A,
+ 	0x79FF467A, 0x9BB9F8A3, 0x217E7CD2, 0x83E13D2C,
+ 	0xF8D4474F, 0xE39EB970, 0x42C6AE16, 0x993216FA,
+ 	0x7B093B5D, 0x98DAFF3C, 0xF718902A, 0x0B1C9CDB,
+ 	0xE58F764B, 0x187636BC, 0x5D7B3BB1, 0xE73DE7DE,
+ 	0x92BEC979, 0xCCA6C0B2, 0x304A0979, 0x85AA43D4,
+ 	0x783125BB, 0x6CA8EAA2, 0xE407EAC6, 0x4B5CFC3E,
+ 	0x9FBF8C76, 0x15CA20BE, 0xF2CA9FD3, 0x959BD756
+ };
+ 
+ /*
+  * Calculate one round of the checksum.
+  */
+ #define CHECKSUM_COMP(checksum, value) do {\
+ 	uint32 __tmp = (checksum) ^ (value);\
+ 	(checksum) = __tmp * FNV_PRIME ^ (__tmp >> 17);\
+ } while (0)
+ 
+ uint32
+ checksum_block(char *data, uint32 size)
+ {
+ 	uint32 sums[N_SUMS];
+ 	uint32 (*dataArr)[N_SUMS] = (uint32 (*)[N_SUMS]) data;
+ 	uint32 result = 0;
+ 	int i, j;
+ 
+ 	/* ensure that the size is compatible with the algorithm */
+ 	Assert((size % (sizeof(uint32)*N_SUMS)) == 0);
+ 
+ 	/* initialize partial checksums to their corresponding offsets */
+ 	memcpy(sums, checksumBaseOffsets, sizeof(checksumBaseOffsets));
+ 
+ 	/* main checksum calculation */
+ 	for (i = 0; i < size/sizeof(uint32)/N_SUMS; i++)
+ 		for (j = 0; j < N_SUMS; j++)
+ 			CHECKSUM_COMP(sums[j], dataArr[i][j]);
+ 
+ 	/* finally add in two rounds of zeroes for additional mixing */
+ 	for (i = 0; i < 2; i++)
+ 		for (j = 0; j < N_SUMS; j++)
+ 			CHECKSUM_COMP(sums[j], 0);
+ 
+ 	/* xor fold partial checksums together */
+ 	for (i = 0; i < N_SUMS; i++)
+ 		result ^= sums[i];
+ 
+ 	return result;
+ }
#18Simon Riggs
simon@2ndQuadrant.com
In reply to: Ants Aasma (#17)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On 28 April 2013 13:22, Ants Aasma <ants@cybertec.at> wrote:

I have updated the base patch. This is supposed to go under the
cflags-vector patch that Jeff posted yesterday.

I've committed your patch, with two changes
* one comment extended
* adding the .h file from Jeff's last main patch

Please can Ants and Jeff review the commit and confirm that is what we
wanted. Thanks.

I'm about to light up the build farm with a trial commit of the
compiler instructions stuff.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Simon Riggs
simon@2ndQuadrant.com
In reply to: Simon Riggs (#18)
1 attachment(s)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On 30 April 2013 06:57, Simon Riggs <simon@2ndquadrant.com> wrote:

I'm about to light up the build farm with a trial commit of the
compiler instructions stuff.

Amazingly that seemed to work.

ISTM that we also need this patch to put memory barriers in place
otherwise the code might be rearranged.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

checksum_memory_barrier.v1.patch (application/octet-stream)
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index f0e3653..2430d6f 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -960,11 +960,14 @@ PageCalcChecksum16(Page page, BlockNumber blkno)
 	 * Save pd_checksum and set it to zero, so that the checksum calculation
 	 * isn't affected by the checksum stored on the page. We do this to
 	 * allow optimization of the checksum calculation on the whole block
-	 * in one go.
+	 * in one go. Memory barriers are required to avoid rearrangement here.
 	 */
 	save_checksum = phdr->pd_checksum;
+	pg_memory_barrier();
 	phdr->pd_checksum = 0;
+	pg_memory_barrier();
 	checksum = checksum_block(page, BLCKSZ);
+	pg_memory_barrier();
 	phdr->pd_checksum = save_checksum;
 
 	/* mix in the block number to detect transposed pages */
#20Andres Freund
andres@2ndquadrant.com
In reply to: Simon Riggs (#19)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On 2013-04-30 11:55:29 +0100, Simon Riggs wrote:

ISTM that we also need this patch to put memory barriers in place
otherwise the code might be rearranged.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -960,11 +960,14 @@ PageCalcChecksum16(Page page, BlockNumber blkno)
* Save pd_checksum and set it to zero, so that the checksum calculation
* isn't affected by the checksum stored on the page. We do this to
* allow optimization of the checksum calculation on the whole block
-	 * in one go.
+	 * in one go. Memory barriers are required to avoid rearrangement here.
*/
save_checksum = phdr->pd_checksum;
+	pg_memory_barrier();
phdr->pd_checksum = 0;
+	pg_memory_barrier();
checksum = checksum_block(page, BLCKSZ);
+	pg_memory_barrier();
phdr->pd_checksum = save_checksum;

/* mix in the block number to detect transposed pages */

Why? I am not sure which rearrangement you're fearing? In all cases
where there is danger of concurrent write access to the page we should
already be working on a copy?
Also, if we need a memory barrier I can only see a point in the 2nd
one. The first and third shouldn't ever be able to change anything since
we are only writing to local memory?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#21Ants Aasma
ants@cybertec.at
In reply to: Simon Riggs (#19)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On Tue, Apr 30, 2013 at 1:55 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 30 April 2013 06:57, Simon Riggs <simon@2ndquadrant.com> wrote:

I'm about to light up the build farm with a trial commit of the
compiler instructions stuff.

Amazingly that seemed to work.

Thanks for committing. Sorry about missing the .h file from the patch.
The two commits look good to me.

I can confirm that compiling with CFLAGS="-O2 -march=native" will
vectorize the committed code on GCC 4.7.

I also checked the situation on clang. clang-3.2 isn't able to
vectorize the loop even with vectorization options. I will check what
is stopping it. If any volunteer has a working build setup with ICC or
MSVC and is willing to run a couple of test compiles, I think we can
achieve vectorization there too.

ISTM that we also need this patch to put memory barriers in place
otherwise the code might be rearranged.

The compiler and CPU both have to preserve correctness when
rearranging code, so I don't think we care about it here. It might
matter if these routine could be called concurrently by multiple
backends for a single buffer, but in that case memory barriers won't
be enough, we'd need full exclusion.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


#22Simon Riggs
simon@2ndQuadrant.com
In reply to: Ants Aasma (#21)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On 30 April 2013 12:23, Ants Aasma <ants@cybertec.at> wrote:

ISTM that we also need this patch to put memory barriers in place
otherwise the code might be rearranged.

The compiler and CPU both have to preserve correctness when
rearranging code, so I don't think we care about it here. It might
matter if these routine could be called concurrently by multiple
backends for a single buffer, but in that case memory barriers won't
be enough, we'd need full exclusion.

Certainly happy not to commit anything else...

Thanks to Ants and Andres.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#19)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

Simon Riggs <simon@2ndQuadrant.com> writes:

ISTM that we also need this patch to put memory barriers in place
otherwise the code might be rearranged.

This is simply silly.

regards, tom lane


#24Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#23)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On 30 April 2013 15:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

ISTM that we also need this patch to put memory barriers in place
otherwise the code might be rearranged.

This is simply silly.

You crack me up sometimes. Yes, it is; seem to be having a bad day for thinkos.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#25Greg Smith
greg@2ndQuadrant.com
In reply to: Simon Riggs (#18)
1 attachment(s)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

I re-ran the benchmark that's had me most worried against the committed
code and things look good so far. I've been keeping quiet because my
tests recently have all agreed with what Ants already described. This
is more a confirmation summary than new data.

The problem case has been Jeff's test 2 "worst-case overhead for
calculating checksum while reading data" from the OS cache. I wrapped
that into a test harness and gave results similar to Jeff's at
/messages/by-id/5133D732.4090801@2ndQuadrant.com
based on the originally proposed Fletcher-16 checksum.

I made some system improvements since then such that the absolute
runtime improved for most of the tests I'm running. But the percentage
changes didn't seem off enough to bother re-running the Fletcher tests
again. Details are in attached spreadsheet, to summarize:

-The original Fletcher-16 code slowed this test case down 24 to 32%,
depending on whether you look at the average of 3 runs or the median.

-The initial checksum commit with the truncated WAL CRC was almost an
order of magnitude worse: 146% to 224% slowdown. The test case that
took ~830ms was taking as much as 2652ms with that method. I'm still
not sure why the first run of this test is always so much faster than
the second and third. But since it happens so often I think it's fair
to consider that worst case really important.

-Committed FNV-1a implementation is now slightly better than Fletcher-16
speed wise: 19 to 27% slowdown.

-Slicing by 8 CRC I didn't test because once I'd fully come around to
agree with Ants's position it didn't seem likely to be useful. I don't
want to lose track of that idea though, it might be the right path for a
future implementation with 32 bit checksums.

Since the >=25% slowdown on this test with Fletcher-16 turned into more
like a 2% drop on more mixed workloads, I'd expect we're back to where
that's again the case with the new FNV-1a. I plan to step back to
looking at more of those cases, but it will take a few days at least to
start sorting that out.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Attachments:

ChecksumMethods.xls (application/vnd.ms-excel)
#26Martijn van Oosterhout
kleptog@svana.org
In reply to: Greg Smith (#25)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On Tue, Apr 30, 2013 at 01:05:30PM -0400, Greg Smith wrote:

I re-ran the benchmark that's had me most worried against the
committed code and things look good so far. I've been keeping quiet
because my tests recently have all agreed with what Ants already
described. This is more a confirmation summary than new data.

I came across this today: Data Integrity Extensions, basically a
standard for having an application calculate a checksum of a block and
submit it together with the block so that the disk can verify that
the block it is writing matches what the application sent.

It appears SCSI has standardised on a CRC-16 checksum with polynomial
0x18bb7 .

http://www.t10.org/ftp/t10/document.03/03-290r0.pdf
https://oss.oracle.com/~mkp/docs/dix-draft.pdf

Not directly relevant to PostgreSQL now, but possibly in the future.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

He who writes carelessly confesses thereby at the very outset that he does
not attach much importance to his own thoughts.

-- Arthur Schopenhauer

#27Greg Smith
greg@2ndQuadrant.com
In reply to: Martijn van Oosterhout (#26)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On 4/30/13 5:26 PM, Martijn van Oosterhout wrote:

I came across this today: Data Integrity Extensions, basically a
standard for having an application calculate a checksum of a block and
submit it together with the block so that the disk can verify that
the block it is writing matches what the application sent.

It appears SCSI has standardised on a CRC-16 checksum with polynomial
0x18bb7 .

To be pedantic for a minute (for the first time *ever* on pgsql-hackers)
it's not quite all of SCSI. iSCSI has joined btrfs by settling on
CRC-32C with the Castagnoli polynomial, as mentioned in that first
reference. CRC-32C is also the one with the SSE4.2 instructions to help
too. All the work around the T10/Data Integrity Field standard that's
going on is nice. I think it's going to leave a lot of PostgreSQL users
behind though. I'd bet a large sum of money that five years from now,
there will still be more than 10X as many PostgreSQL servers on EC2 as
on T10/DIF capable hardware.

I feel pretty good that this new FNV-1a implementation is a good
trade-off spot that balances error detection and performance impact. If
you want a 16 bit checksum that seems ready for beta today, we can't do
much better. Fletcher-16 had too many detection holes, the WAL checksum
was way too expensive. Optimized FNV-1a is even better than unoptimized
Fletcher-16 without as many detection issues. Can't even complain about
the code bloat for this part either--checksum.c is only 68 lines if you
take out its documentation.

The WAL logging of hint bits is where the scary stuff to me for this
feature has always been at. My gut feel is that doing that needed to
start being available as an option anyway. Just this month we've had
two customer issues pop up where we had to look for block differences
between a master and a standby. The security update forced some normal
update stragglers to where they now have the 9.1.6 index corruption fix,
and we're looking for cases where standby indexes might have been
corrupted by it. In this case the comparisons can just avoid anything
but indexes, so hint bits are thankfully not involved.

But having false positives pop out of comparing a master and standby due
to hint bits makes this sort of process much harder in general. Being
able to turn checksums on, and then compare more things between master
and standby without expecting any block differences, that will make both
routine quality auditing and forensics of broken clusters so much easier.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


#28Noah Misch
noah@leadboat.com
In reply to: Greg Smith (#27)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

Orthogonal to this thread, but:

On Tue, Apr 30, 2013 at 06:39:09PM -0400, Greg Smith wrote:

The WAL logging of hint bits is where the scary stuff to me for this
feature has always been at. My gut feel is that doing that needed to
start being available as an option anyway. Just this month we've had
two customer issues pop up where we had to look for block differences
between a master and a standby. The security update forced some normal
update stragglers to where they now have the 9.1.6 index corruption fix,
and we're looking for cases where standby indexes might have been
corrupted by it. In this case the comparisons can just avoid anything
but indexes, so hint bits are thankfully not involved.

B-tree indexes have hints; see callers of ItemIdMarkDead().

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


#29Andres Freund
andres@2ndquadrant.com
In reply to: Greg Smith (#27)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)

On 2013-04-30 18:39:09 -0400, Greg Smith wrote:

The WAL logging of hint bits is where the scary stuff to me for this feature
has always been at. My gut feel is that doing that needed to start being
available as an option anyway. Just this month we've had two customer
issues pop up where we had to look for block differences between a master
and a standby. The security update forced some normal update stragglers to
where they now have the 9.1.6 index corruption fix, and we're looking for
cases where standby indexes might have been corrupted by it. In this case
the comparisons can just avoid anything but indexes, so hint bits are
thankfully not involved.

But having false positives pop out of comparing a master and standby due to
hint bits makes this sort of process much harder in general. Being able to
turn checksums on, and then compare more things between master and standby
without expecting any block differences, that will make both routine quality
auditing and forensics of broken clusters so much easier.

I don't think the current implementation helps you with that. We only
log the first hint bit set after a checkpoint, you will still get
inconsistent bits set after that. So you might have somewhat fewer
inconsistencies, but not enough to weed them out manually or such.
c.f. MarkBufferDirtyHint() and XLogSaveBufferForHint().

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
