what checksum algo?
What checksum algorithm wound up in 9.3?
(I found Simon Riggs's 12/2011 submission using Fletcher's algorithm, Michael Paquier's 7/2013 post stating CRC32 truncated to 16 bits, and another post online claiming it was changed from CRC before release, but not saying what it was changed to.)
--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Thu, Nov 14, 2013 at 12:58 AM, Scott Ribe
<scott_ribe@elevated-dev.com> wrote:
> What checksum algorithm wound up in 9.3?
>
> (I found Simon Riggs's 12/2011 submission using Fletcher's algorithm, Michael Paquier's 7/2013 post stating CRC32 truncated to 16 bits, and another post online claiming it was changed from CRC before release, but not saying what it was changed to.)
CRC16 is used. It was introduced with this commit:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=43e7a668499b8a69a62cc539a0fbe6983384339c
and then moved entirely to src/include/storage/checksum_impl.h with
this commit, to make external use of the algorithm easier:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f04216341dd1cc235e975f93ac806d9d3729a344
Regards,
--
Michael
On Wed, Nov 13, 2013 at 4:39 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> CRC16 is used.
Actually, another algorithm was subsequently introduced - see commit
43e7a668499b8a69a62cc539a0fbe6983384339c.
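For reference, the per-word step in checksum_impl.h is, if memory serves, an FNV-1a-style multiply-xor with an extra shift-xor for diffusion. A hedged C sketch follows; the constants and the final 16-bit fold are stated from memory rather than copied from the source, so treat the commit above as authoritative:

```c
#include <stdint.h>
#include <stddef.h>

/* The FNV prime used by FNV-1a (assumed; check checksum_impl.h) */
#define FNV_PRIME 16777619u

/* One mixing step per 32-bit word: FNV-1a multiply-xor plus an
 * extra shift-xor. The real implementation processes the page in
 * parallel lanes so the compiler can vectorize the loop. */
static uint32_t mix(uint32_t checksum, uint32_t value)
{
    uint32_t tmp = checksum ^ value;
    return tmp * FNV_PRIME ^ (tmp >> 17);
}

/* Fold a buffer of 32-bit words down to a 16-bit page checksum;
 * the +1 keeps a valid checksum from ever being 0. */
static uint16_t page_checksum(const uint32_t *words, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = mix(sum, words[i]);
    return (uint16_t)((sum % 65535) + 1);
}
```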
--
Regards,
Peter Geoghegan
Hi,
It was good to see you in Japan.
The PostgreSQL Enterprise Consortium (a non-profit PostgreSQL-related
organization in Japan, http://www.pgecons.org) is about to investigate
the performance impact of checksums using a high-end PC server (80
physical cores with 2TB of memory). What I have in mind is running
pgbench with a custom query (pure SELECTs). Are there any
recommendations/suggestions for doing that?
(The results will be public, of course.)
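A pure-SELECT custom script along those lines might look like the following (a sketch; 9.3-era pgbench uses \setrandom, and the range should match the scale factor used at pgbench -i time):

```sql
\setrandom aid 1 100000
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
```

Saved as, say, select_only.sql, it could be run read-only at high concurrency with something like `pgbench -n -c 80 -j 80 -T 300 -f select_only.sql`; the built-in `pgbench -S` mode is roughly equivalent.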
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On Wed, Nov 13, 2013 at 5:53 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:
> It was good to see you in Japan.
Likewise.
> The PostgreSQL Enterprise Consortium (a non-profit PostgreSQL-related
> organization in Japan, http://www.pgecons.org) is about to investigate
> the performance impact of checksums using a high-end PC server (80
> physical cores with 2TB of memory). What I have in mind is running
> pgbench with a custom query (pure SELECTs). Are there any
> recommendations/suggestions for doing that?
>
> (The results will be public, of course.)
Well, off the top of my head I would of course be sure to build
Postgres to take advantage of this:
* Vectorization of the algorithm requires 32bit x 32bit -> 32bit integer
* multiplication instruction. As of 2013 the corresponding instruction is
* available on x86 SSE4.1 extensions (pmulld) and ARM NEON (vmul.i32).
* Vectorization requires a compiler to do the vectorization for us. For recent
* GCC versions the flags -msse4.1 -funroll-loops -ftree-vectorize are enough
* to achieve vectorization.
Unfortunately I have no idea what packagers are currently doing about
this. Could you please enlighten me, Devrim?
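Concretely, a source build using the flags from that comment might look like this (a sketch; the flags come from the checksum_impl.h comment quoted above, while the rest of the configure invocation is an assumption, and -msse4.1 requires SSE4.1-capable hardware):

```shell
./configure CFLAGS="-O2 -msse4.1 -funroll-loops -ftree-vectorize"
make
make check
```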
It also occurs to me that pgbench will be pretty unsympathetic to
checksums as compared to a non-checksummed baseline here, because of
course as always it uses a uniform distribution, and that's going to
literally maximize the amount of verification that must occur. Maybe
that's something you're interested in, because you want to
characterize the worst case. If the average case is more interesting,
you could try applying this patch:
https://commitfest.postgresql.org/action/patch_view?id=1240
I don't know if the patch is any good, having not looked at the code,
but surely as the original author of pgbench you are eminently
qualified to judge this. I think that in general I prefer a uniform
distribution, because most often I look to pgbench to satisfy myself
that certain types of regressions have not occurred. That's quite a
different thing to a representative workload, obviously.
--
Regards,
Peter Geoghegan
> Well, off the top of my head I would of course be sure to build
> Postgres to take advantage of this:
>
> * Vectorization of the algorithm requires 32bit x 32bit -> 32bit integer
> * multiplication instruction. As of 2013 the corresponding instruction is
> * available on x86 SSE4.1 extensions (pmulld) and ARM NEON (vmul.i32).
> * Vectorization requires a compiler to do the vectorization for us. For recent
> * GCC versions the flags -msse4.1 -funroll-loops -ftree-vectorize are enough
> * to achieve vectorization.
>
> Unfortunately I have no idea what packagers are currently doing about
> this. Could you please enlighten me, Devrim?
No problem. We will install PostgreSQL from source anyway. I tried it
in my local environment: PostgreSQL compiles fine with the additional
arguments you gave me and passes the regression tests (with
pg_regress.c modified to add the initdb -k flag, of course).
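For the benchmark itself, the checksummed and baseline clusters could be initialized like so (data directory paths are illustrative; -k must be chosen at initdb time and cannot be toggled later):

```shell
initdb -k -D /srv/pg/data_checksums   # -k enables data page checksums
initdb -D /srv/pg/data_plain          # baseline without checksums
```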
> It also occurs to me that pgbench will be pretty unsympathetic to
> checksums as compared to a non-checksummed baseline here, because of
> course as always it uses a uniform distribution, and that's going to
> literally maximize the amount of verification that must occur. Maybe
> that's something you're interested in, because you want to
> characterize the worst case. If the average case is more interesting,
> you could try applying this patch:
>
> https://commitfest.postgresql.org/action/patch_view?id=1240
> I don't know if the patch is any good, having not looked at the code,
> but surely as the original author of pgbench you are eminently
> qualified to judge this. I think that in general I prefer a uniform
> distribution, because most often I look to pgbench to satisfy myself
> that certain types of regressions have not occurred. That's quite a
> different thing to a representative workload, obviously.
Ok, I will look into this when I have enough time.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp