Re: New CRC algorithm: Slicing by 8
Resending... The previous three attempts haven't succeeded, probably because of
size restrictions; so sending in two parts.
On 10/18/06, Gurjeet Singh <singh.gurjeet@gmail.com> wrote:
Hi all,
Please refer to the following link for a new algorithm to calculate
CRC, developed by Intel:
http://www.intel.com/technology/magazine/communications/slicing-by-8-0306.htm
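For anyone who hasn't seen the technique: instead of one 256-entry lookup table consuming one byte per step, slicing-by-8 consumes eight input bytes per iteration through eight derived tables, with no serial dependency between the eight lookups. Below is a self-contained sketch of the reflected CRC-32 variant, reconstructed from the article's description rather than taken from the attached patch; the table name and lazy-init scheme are illustrative only:

```c
#include <stdint.h>
#include <stddef.h>

static uint32_t sb8_table[8][256];

/* Build table 0 the usual (Sarwate) way, then derive tables 1..7 so that
 * sb8_table[k][b] is the CRC contribution of byte b followed by k zero bytes. */
static void sb8_init(void)
{
    for (int b = 0; b < 256; b++)
    {
        uint32_t c = (uint32_t) b;
        for (int i = 0; i < 8; i++)
            c = (c >> 1) ^ ((c & 1) ? 0xEDB88320u : 0);
        sb8_table[0][b] = c;
    }
    for (int k = 1; k < 8; k++)
        for (int b = 0; b < 256; b++)
            sb8_table[k][b] = (sb8_table[k - 1][b] >> 8) ^
                              sb8_table[0][sb8_table[k - 1][b] & 0xff];
}

uint32_t crc32_sb8(const unsigned char *p, size_t len)
{
    static int ready = 0;
    if (!ready) { sb8_init(); ready = 1; }

    uint32_t crc = 0xFFFFFFFFu;
    while (len >= 8)
    {
        /* Fold the first four bytes into the running CRC, then do eight
         * independent table lookups -- one per input byte. */
        crc ^= (uint32_t) p[0] | (uint32_t) p[1] << 8 |
               (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
        crc = sb8_table[7][crc & 0xff] ^
              sb8_table[6][(crc >> 8) & 0xff] ^
              sb8_table[5][(crc >> 16) & 0xff] ^
              sb8_table[4][crc >> 24] ^
              sb8_table[3][p[4]] ^
              sb8_table[2][p[5]] ^
              sb8_table[1][p[6]] ^
              sb8_table[0][p[7]];
        p += 8;
        len -= 8;
    }
    while (len--)               /* conventional one-byte-at-a-time tail */
        crc = (crc >> 8) ^ sb8_table[0][(crc ^ *p++) & 0xff];
    return crc ^ 0xFFFFFFFFu;
}
```

The absence of a dependency chain between the eight lookups is where the claimed speedup comes from; the price is 8 KB of tables instead of 1 KB, so cache behavior matters.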
Please find attached the patch (pg_crc_sb8_4.diff.gz) that implements
this algorithm (originally obtained from
http://sourceforge.net/projects/slicing-by-8).

I tested it using pgbench, and found an average improvement of about
15 tps (291 vs. 307 tps) when running with a scaling factor of 10, 10
clients, and 1000 transactions per client (I don't remember these
pgbench params exactly; it's been quite a while now).

The author of the algo says that it works (theoretically) three times
faster than conventional CRC calculation methods, but the tps
improvements didn't reflect that; so, to prove the speed, I extracted the
source files (both PG and SB8) and made a separate project (test.tar.gz).
The resulting binary's command format is:
<snip>
<prog_name> <algo_name> [ <scaling_factor> [ <block_size> [ <list_size> ] ] ]

algo_name     : PG or SB8 or BOTH (just the first char will also do!)
scaling_factor: the multiple to use when creating the list of buffer blocks.
block_size    : buffer size (in KBs); default is 4.
list_size     : number of data blocks to create for the test; default is 100.

So the command: a.out p 3 1024 300
will test PG's algo over 900 blocks of 1 MB each.
</snip>

The test.sh script is a wrapper around this binary, to test each of PG
and SB8, and both algos simultaneously, three times each. I performed two
tests using this script to prove SB8's effectiveness on large and small
data blocks:

sh test.sh 1 1024 800  -- 800 blocks of 1 MB each; effective data size 800 MB
sh test.sh 800 1 1024  -- 800 * 1024 blocks of 1 KB each; effective data size 800 MB

The new algo shone in both test configurations. It performs 3 to 4
times better than the regular CRC algo in both types of tests. The results are
in the files results_1_1024_800.out and results_800_1_1024.out,
respectively.

To test it yourself, extract all the files from test.tar.gz, and issue
the following commands (assuming bash):

sh make.sh
sh test.sh 1 1024 800
sh test.sh 800 1 1024

(Note: you may want to reduce the effective data size on a Windows box;
I used 500 MB on Windows+MinGW. I tested with these params since I have only
1 GB of RAM, and increasing the data size beyond these numbers caused the
memory contents to be spilled over into swap-space/pagefile, and this disk
I/O causes a severe distortion in the results.)

The tests were performed on Fedora Core 5 and on MinGW on WinXP
Professional.

If possible, people should test it on different platforms, so as to
ensure that it doesn't perform any worse than the older implementation on any
supported platform (please post the results if you do test it). Also,
recommendations for a better benchmark (than pgbench) are welcome, so that we
can show the improvement when it is used as a part of PG.

Best regards,
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | yahoo }.com
--
gurjeet@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | yahoo }.com
Resending... The previous three attempts haven't succeeded, probably because
of size restrictions; so sending in two parts.
This is part 2; unfortunately, the size of the gzipped file is still a
bit big (26 KB), so I have uploaded it to
http://www.geocities.com/gurjeet79/crc_sb8_test.tar.gz
Please download and test it from there.
Can someone let us all know what is the limit on the attachments sent to
the -hackers list?
Best regards,
--
gurjeet@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | yahoo }.com
Gurjeet Singh wrote:
Can someone let us all know what is the limit on the attachments
sent to the -hackers list?
Generally it's probably best not to send attachments to -hackers at all.
If you have a patch, send it to -hackers. Or you can publish it
somewhere on the web and post a link to it (this is one of the things I
think a developers' wiki is actually useful for).
cheers
andrew
Andrew Dunstan wrote:
If you have a patch, send it to -hackers.
I mean -patches of course. Sorry.
cheers
andrew
"Gurjeet Singh" <singh.gurjeet@gmail.com> writes:
Please refer to the following link for a new algorithm to calculate
CRC, developed by Intel:
http://www.intel.com/technology/magazine/communications/slicing-by-8-0306.htm
I can't help wondering what the patent status of this algorithm is.
(This from a guy who still remembers very well how Unisys published the
LZW algorithm in a manner that made it look like it was free for anyone
to use...)
regards, tom lane
Tom Lane wrote:
"Gurjeet Singh" <singh.gurjeet@gmail.com> writes:
Please refer to the following link for a new algorithm to calculate
CRC, developed by Intel:
http://www.intel.com/technology/magazine/communications/slicing-by-8-0306.htm
I can't help wondering what the patent status of this algorithm is.
(This from a guy who still remembers very well how Unisys published the
LZW algorithm in a manner that made it look like it was free for anyone
to use...)
This article
http://www.intel.com/technology/magazine/communications/slicing-by-8-0306.htm
seems to imply it's provided "free". It doesn't say anything about
being free of patents though.
The Sourceforge project referred to in the article (but for which no
link is given) seems to be this one:
http://sourceforge.net/projects/slicing-by-8
The "files" section contains a zip archive, inside which there are three
source files all of which state that the package, which is (c) Intel
Corp. 2004-2006, is released under the BSD license, with this URL:
http://www.opensource.org/licenses/bsd-license.html
Again, no mention of patents anywhere.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On 10/20/06, Alvaro Herrera <alvherre@commandprompt.com> wrote:
It doesn't say anything about
being free of patents though.

The Sourceforge project referred to in the article (but for which no
link is given) seems to be this one:
Yes. I had already mentioned that link in my posts.
The "files" section contains a zip archive, inside which there are three
source files, all of which state that the package, which is (c) Intel
Corp. 2004-2006, is released under the BSD license, with this URL:
http://www.opensource.org/licenses/bsd-license.html
Again, no mention of patents anywhere.
I'll try to get in touch with the author and get the clarification.
Best regards,
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | yahoo }.com
Gurjeet Singh wrote:
On 10/20/06, Alvaro Herrera <alvherre@commandprompt.com> wrote:
It doesn't say anything about
being free of patents though.

The Sourceforge project referred to in the article (but for which no
link is given) seems to be this one:

Yes. I had already mentioned that link in my posts.

The "files" section contains a zip archive, inside which there are three
source files, all of which state that the package, which is (c) Intel
Corp. 2004-2006, is released under the BSD license, with this URL:
http://www.opensource.org/licenses/bsd-license.html

Again, no mention of patents anywhere.

I'll try to get in touch with the author and get the clarification.
Yep, the license and patent status seem to be separate issues.
However, I am not sure getting a clarification from the author even
helps us legally. Also, why are we more critical of an Intel-provided
idea than any other idea we get from the community? The scary answer
is that in many ways they are the same, and have the same risks.
So unless we hear about a problem, I think we should use the code.
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes:
However, I am not sure getting a clarification from the author even
helps us legally. Also, why are we more critical of an Intel-provided
idea than any other idea we get from the community?
Bitter experience with other companies.
So unless we hear about a problem, I think we should use the code.
It hasn't even been tested. One thing I'd want to know about is the
performance effect on non-Intel machines.
regards, tom lane
On 10/21/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
[snip]
It hasn't even been tested. One thing I'd want to know about is the
performance effect on non-Intel machines.
On Opteron 265 his test code shows SB8 (the Intel alg) is 2.48x faster
for checksum and 1.95x faster for verify for the 800 * 1024 blocks of
1 KB each workload. For 100000 blocks of 8k I got similar results as
well.
It looks like the new code may have a larger cache footprint, so
actual performance may differ from the microbenchmark.
Hi all,
Michael Kounavis has given a green signal (please refer to the forwarded
message).
Best regards,
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | yahoo }.com
---------- Forwarded message ----------
From: Kounavis, Michael E <michael.e.kounavis@intel.com>
Date: Oct 20, 2006 10:43 PM
Subject: RE: CRC algorithm, Slicing By 8; concerns about patents
infringements
To: Gurjeet Singh <singh.gurjeet@gmail.com>, Michael Kounavis <
mekounav@users.sourceforge.net>
Hi,
Thank you for your interest in Slicing-by-8. No, Intel decided not to file a
patent on this algorithm, but to make it freely available under the BSD
license. You should have no problem using it.
Mike
------------------------------
*From:* Gurjeet Singh [mailto:singh.gurjeet@gmail.com]
*Sent:* Friday, October 20, 2006 5:50 AM
*To:* Michael Kounavis
*Subject:* CRC algorithm, Slicing By 8; concerns about patents infringements
Hi Michael,
Please refer the following post, and the conversation that followed:
http://archives.postgresql.org/pgsql-hackers/2006-10/msg01015.php
As is evident from that post, I have used the source code provided by
you on sf.net, and modified it to suit PostgreSQL.

Now the community is concerned whether we will be infringing any patents. We
all understand that you (on behalf of Intel) have released the
algorithm, and the code, under the BSD license, which allows anybody to use it
commercially, but we need assurance that we will not be infringing any
patents owned by Intel by including your algorithm in PostgreSQL, because
PostgreSQL is used/sold by many companies for profit.
Best regards,
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | yahoo }.com
On 10/22/06, Gregory Maxwell <gmaxwell@gmail.com> wrote:
On Opteron 265 his test code shows SB8 (the intel alg) is 2.48x faster
for checksum and 1.95x faster for verify for the 800 * 1024 blocks of
1 KB each workload. For 100000 blocks of 8k I got simmlar results as
well.
I think you meant
800*1024 blocks of 1 >MB< each workload.
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | yahoo }.com
On Sun, Oct 22, 2006 at 08:10:56PM +0530, Gurjeet Singh wrote:
Hi all,
Michael Kounavis has given a green signal (please refer to the forwarded
message).
I don't think that helps. The publishing date of this article was March
2006. If this is really the first time this algorithm was published, that
means that anyone else (or even the author) has the option of patenting
this algorithm sometime before March 2007 and still claiming they
invented it first.
And realistically we would wait at least a year or three after that,
because you don't get to see patents as they're applied for.
Maybe March 2010 we can look into it...
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
On 10/22/06, Martijn van Oosterhout <kleptog@svana.org> wrote:
On Sun, Oct 22, 2006 at 08:10:56PM +0530, Gurjeet Singh wrote:
Hi all,
Michael Kounavis has given a green signal (please refer to the forwarded
message).

I don't think that helps. The publishing date of this article was March
2006. If this is really the first time this algorithm was published, that
means that anyone else (or even the author) has the option of patenting
this algorithm sometime before March 2007 and still claiming they
invented it first.

And realistically we would wait at least a year or three after that,
because you don't get to see patents as they're applied for.

Maybe March 2010 we can look into it...
Considering that the author has OK'd it, and given how easy
it is to replace the algorithm, I don't see the reason for
such caution.
--
marko
"Gurjeet Singh" <singh.gurjeet@gmail.com> writes:
If possible, people should test it on different platforms, so as to
ensure that it doesn't perform any worse than older implementation on any
supported platform (please post the results, if you do test it).
I didn't particularly trust the timing calculations in your benchmark
program, so I made up my own low-tech test version (attached).
I get the following timings for 1 million iterations of
INIT_CRC32/COMP_CRC32/FIN_CRC32 on different buffer sizes,
using "gcc -O2" on some machines laying about the house:
                         Std CRC         Slice8 CRC
HPPA (HPUX 10.20)
8192 bytes              44.897212       35.191499
1024 bytes               5.639081        4.669850
64 bytes                 0.377416        0.613195

PPC (Mac G4, Darwin 10.4.8)
8192 bytes              12.663135       12.158293
1024 bytes               1.579940        1.614967
64 bytes                 0.104310        0.229401

Intel Xeon EM64T (Fedora Core 5)
8192 bytes               4.420879        7.633120
1024 bytes               0.571794        0.819372
64 bytes                 0.047354        0.071906

Intel Pentium 4 (Fedora Core 5)
8192 bytes               6.942324       28.848572  (yes, really)
1024 bytes               0.905259        3.625360
64 bytes                 0.068065        0.260224
It's worth pointing out that this test program is biased in favor of the
slice8 code about as far as it could possibly be: after the first
iteration, the remaining 999999 will find the cache as hot as possible.
Furthermore, the test doesn't exercise misaligned or odd-length cases,
which would incur extra start/stop overhead for slice8.
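[A minimal harness in this spirit -- a reconstruction, not the actual attachment: a gettimeofday pair around a tight loop over one fixed buffer, with a plain byte-at-a-time CRC-32 standing in for whichever INIT_CRC32/COMP_CRC32/FIN_CRC32 implementation is under test; all names here are illustrative:]

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

/* Byte-at-a-time CRC-32 as a stand-in for the macro sequence being timed. */
static uint32_t crc32_byte(const unsigned char *p, size_t len)
{
    static uint32_t table[256];
    static int ready = 0;

    if (!ready)
    {
        for (int b = 0; b < 256; b++)
        {
            uint32_t c = (uint32_t) b;
            for (int i = 0; i < 8; i++)
                c = (c >> 1) ^ ((c & 1) ? 0xEDB88320u : 0);
            table[b] = c;
        }
        ready = 1;
    }

    uint32_t crc = 0xFFFFFFFFu;                        /* INIT_CRC32 */
    while (len--)                                      /* COMP_CRC32 */
        crc = (crc >> 8) ^ table[(crc ^ *p++) & 0xff];
    return crc ^ 0xFFFFFFFFu;                          /* FIN_CRC32 */
}

/* Time niters CRC computations over one buflen-byte buffer; returns seconds. */
static double time_crc(size_t buflen, int niters, uint32_t *result)
{
    unsigned char *buf = malloc(buflen);
    struct timeval start, stop;
    uint32_t crc = 0;

    memset(buf, 'x', buflen);
    gettimeofday(&start, NULL);
    for (int i = 0; i < niters; i++)
        crc = crc32_byte(buf, buflen);
    gettimeofday(&stop, NULL);
    free(buf);

    *result = crc;      /* keep the work from being optimized away */
    return (stop.tv_sec - start.tv_sec) +
           (stop.tv_usec - start.tv_usec) / 1e6;
}
```

Note that this pattern has exactly the bias described above: the buffer and the lookup tables are cache-hot from the second iteration onward.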
These numbers are um, not impressive. Considering that a large fraction
of our WAL records are pretty short, the fact that slice8 consistently
loses at short buffer lengths is especially discouraging. Much of that
ground could be made up perhaps with tenser coding of the initialization
and finalization code, but it'd still not be worth taking any legal risk
for AFAICS.
regards, tom lane
Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
However, I am not sure getting a clarification from the author even
helps us legally. Also, why are we more critical of an Intel-provided
idea than any other idea we get from the community?

Bitter experience with other companies.
The problem is we have lots of companies involved, and I bet some we
don't even know about (e.g. yahoo/gmail addresses), and with
contributors who don't know that their employment agreement says
anything they do while employed is company property. And we have Unisys
involved now too. How worrying is that? :-(
So unless we hear about a problem, I think we should use the code.
It hasn't even been tested. One thing I'd want to know about is the
performance effect on non-Intel machines.
Sure.
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes:
Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
Also, why are we more critical of an Intel-provided
idea than any other idea we get from the community?

Bitter experience with other companies.
The problem is we have lots of companies involved, and I bet some we
don't even know about (e.g. yahoo/gmail addresses),
It's not so much that I don't trust Intel as that a CRC algorithm is
exactly the sort of nice little self-contained thing that people love
to try to patent these days. What I am really afraid of is that someone
else has already invented this same method (or something close enough
to it) and filed for a patent that Intel doesn't know about either.
I'd be wondering about that no matter where the code had come from.
Given the numbers I posted earlier today, the proposal is dead in the
water anyway, quite aside from any legal considerations.
regards, tom lane
Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
Also, why are we more critical of an Intel-provided
idea than any other idea we get from the community?

Bitter experience with other companies.

The problem is we have lots of companies involved, and I bet some we
don't even know about (e.g. yahoo/gmail addresses),

It's not so much that I don't trust Intel as that a CRC algorithm is
exactly the sort of nice little self-contained thing that people love
to try to patent these days. What I am really afraid of is that someone
else has already invented this same method (or something close enough
to it) and filed for a patent that Intel doesn't know about either.
I'd be wondering about that no matter where the code had come from.

Given the numbers I posted earlier today, the proposal is dead in the
water anyway, quite aside from any legal considerations.
Agreed. I just wanted to point out we have other sharks in the water. :-(
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On Sun, Oct 22, 2006 at 06:06:13PM -0400, Tom Lane wrote:
Intel Xeon EM64T (Fedora Core 5)
8192 bytes               4.420879        7.633120
1024 bytes               0.571794        0.819372
64 bytes                 0.047354        0.071906

Intel Pentium 4 (Fedora Core 5)
8192 bytes               6.942324       28.848572  (yes, really)
1024 bytes               0.905259        3.625360
64 bytes                 0.068065        0.260224
AMD X2 3800+ (Fedora Core 5)
                         Std CRC         Slice8 CRC
8192 bytes               8.576635        7.170038
1024 bytes               1.504361        1.402446
64 bytes                 0.154459        0.144209
Odd that the AMD shows opposite of the two Intel numbers above, and
that it was an "Intel engineer" who wrote it. My first speculation is
that you did your Intel numbers backwards. My second speculation is
that you already thought of that and confirmed before posting. :-)
So yeah - not too impressive...
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
mark@mark.mielke.cc writes:
... My first speculation is
that you did your Intel numbers backwards. My second speculation is
that you already thought of that and confirmed before posting. :-)
Yah, I checked. Several times... but if anyone else wants to repeat
the experiment, please do. Or look for bugs in either my test case
or Gurjeet's.
regards, tom lane