CUDA Sorting

Started by Vitor Reus over 14 years ago, 41 messages, hackers
#1Vitor Reus
vitor.reus@gmail.com

Hello everyone,

I'm implementing CUDA-based sorting in PostgreSQL, and I believe it
can improve ORDER BY performance by 4 to 10 times. I
already have a generic CUDA sort that performs around 10 times faster
than std qsort. I also managed to load CUDA into pgsql.

Since I'm new to pgsql development, I replaced the code of pgsql's
qsort_arg to get used to the way postgres does the sort. The problem
is that I can't use the qsort_arg_comparator comparator function on
the GPU, so I need to implement my own. I haven't found out how to access the
sorting key value data of the tuples in the Tuplesortstate or
SortTuple structures. This part looks complicated because it seems the
state holds the pointer for the scanner(?), but I didn't manage to
access the values directly. Can anyone tell me how this works?

Cheers,
Vítor

#2Thom Brown
thom@linux.com
In reply to: Vitor Reus (#1)
Re: CUDA Sorting

On 19 September 2011 13:11, Vitor Reus <vitor.reus@gmail.com> wrote:

Hello everyone,

I'm implementing a CUDA based sorting on PostgreSQL, and I believe it
can improve the ORDER BY statement performance in 4 to 10 times. I
already have a generic CUDA sort that performs around 10 times faster
than std qsort. I also managed to load CUDA into pgsql.

Since I'm new to pgsql development, I replaced the code of pgsql
qsort_arg to get used with the way postgres does the sort. The problem
is that I can't use the qsort_arg_comparator comparator function on
GPU, I need to implement my own. I didn't find out how to access the
sorting key value data of the tuples on the Tuplesortstate or
SortTuple structures. This part looks complicated because it seems the
state holds the pointer for the scanner(?), but I didn't managed to
access the values directly. Can anyone tell me how this works?

I can't help with explaining the inner workings of sorting code, but
just a note that CUDA is a proprietary framework from nVidia and
confines its use to nVidia GPUs only. You'd probably be better off
investing in the OpenCL standard which is processor-agnostic. Work
has already been done in this area by Tim Child with pgOpenCL,
although it doesn't appear to be available yet. It might be worth
engaging with him to see if there are commonalities to what you're
both trying to achieve.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Thom Brown
thom@linux.com
In reply to: Vitor Reus (#1)
Re: CUDA Sorting

On 19 September 2011 14:32, Vitor Reus <vitor.reus@gmail.com> wrote:

Hi Thom Brown,

thank you very much for your reply.

I am aware that CUDA is a proprietary framework, but since the high
level CUDA API is easier than OpenCL, it will be faster to implement
and test. Also, CUDA can be translated to OpenCL in a straightforward
way, since the low level CUDA API generated code is really similar to
OpenCL.

I'll try engaging with Tim Child, but it seems that his work is to
create GPU support for specific SQL, like procedural SQL statements
with CUDA extensions. Did I understand that right? My focus is to
"unlock" the GPU power without the user being aware of it.

Please use Reply To All in your responses so the mailing list is included.

Is your aim to have this committed into core PostgreSQL, or just for
your own version? If it's the former, I don't anticipate any
enthusiasm from the hacker community.

But you're right, Tim Child's work is aimed at procedural acceleration
rather than speeding up core functionality (from what I gather
anyway).

--
Thom Brown

#4Bruce Momjian
bruce@momjian.us
In reply to: Vitor Reus (#1)
Re: CUDA Sorting

On Mon, Sep 19, 2011 at 1:11 PM, Vitor Reus <vitor.reus@gmail.com> wrote:

Since I'm new to pgsql development, I replaced the code of pgsql
qsort_arg to get used with the way postgres does the sort. The problem
is that I can't use the qsort_arg_comparator comparator function on
GPU, I need to implement my own. I didn't find out how to access the
sorting key value data of the tuples on the Tuplesortstate or
SortTuple structures. This part looks complicated because it seems the
state holds the pointer for the scanner(?), but I didn't managed to
access the values directly. Can anyone tell me how this works?

This is something I've been curious about for a while. The biggest
difficulty is that Postgres has a user-extensible type system and
calls user provided functions to do things like comparisons. Postgres
only supports comparison sorts and does so by calling the user
function for the data type being sorted.

These user-defined functions are looked up earlier, in the query parsing
and analysis phase, and stored in Tuplesortstate->scanKeys, which is an
array of structures that hold information about the ordering required.
In there there's a pointer to the function, a set of flags (such as
NULLS FIRST/LAST), the text collation needed and the collation.

I assume you're going to have to have tuplesort.c recognize if all the
comparators are one of a small set of standard comparators that you
can implement on the GPU such as integer and floating point
comparison. In which case you could call a specialized qsort which
implements that comparator inlined instead of calling the standard
function. That might actually be a useful optimization to do anyways
since it may well be much faster even without the GPU. So that would
probably be a good place to start.

But the barrier to get over here might be relatively high. In order to
tolerate that amount of duplicated code and special cases there would
have to be benchmarks showing it's significantly faster and helps
real-world user queries. It would also have to be pretty cleanly
implemented so that it doesn't impose a lot of extra overhead every
time this code needs to be changed -- for example when adding
collations it would have been unfortunate to have to add it to half a
dozen specializations of tuplesort (though frankly I don't think that
would have made that much of a dent in the happiness of the people who
worked on collations).

All that said my personal opinion is that this can be done cleanly and
would be more than worth the benefit even without the GPU -- sorting
integers and floating point numbers is a very common case and Peter
Geoghegan recently showed our qsort could be about twice as fast if it
could inline the comparisons. With the GPU I'm curious to see how well
it handles multiple processes contending for resources, it might be a
flashy feature that gets lots of attention but might not really be
very useful in practice. But it would be very interesting to see.

--
greg

#5Greg Smith
gsmith@gregsmith.com
In reply to: Bruce Momjian (#4)
Re: CUDA Sorting

On 09/19/2011 10:12 AM, Greg Stark wrote:

With the GPU I'm curious to see how well
it handles multiple processes contending for resources, it might be a
flashy feature that gets lots of attention but might not really be
very useful in practice. But it would be very interesting to see.

The main problem here is that the sort of hardware commonly used for
production database servers doesn't have any serious enough GPU to
support CUDA/OpenCL available. The very clear trend now is that all
systems other than gaming ones ship with motherboard graphics chipsets
more than powerful enough for any task but that. I just checked the 5
most popular configurations of server I see my customers deploy
PostgreSQL onto (a mix of Dell and HP units), and you don't get a
serious GPU from any of them.

Intel's next generation Ivy Bridge chipset, expected for the spring of
2012, is going to add support for OpenCL to the built-in motherboard
GPU. We may eventually see that trickle into the server hardware side
of things too.

I've never seen a PostgreSQL server capable of running CUDA, and I don't
expect that to change.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

#6Thom Brown
thom@linux.com
In reply to: Greg Smith (#5)
Re: CUDA Sorting

On 19 September 2011 15:36, Greg Smith <greg@2ndquadrant.com> wrote:

On 09/19/2011 10:12 AM, Greg Stark wrote:

With the GPU I'm curious to see how well
it handles multiple processes contending for resources, it might be a
flashy feature that gets lots of attention but might not really be
very useful in practice. But it would be very interesting to see.

The main problem here is that the sort of hardware commonly used for
production database servers doesn't have any serious enough GPU to support
CUDA/OpenCL available.  The very clear trend now is that all systems other
than gaming ones ship with motherboard graphics chipsets more than powerful
enough for any task but that.  I just checked the 5 most popular
configurations of server I see my customers deploy PostgreSQL onto (a mix of
Dell and HP units), and you don't get a serious GPU from any of them.

Intel's next generation Ivy Bridge chipset, expected for the spring of 2012,
is going to add support for OpenCL to the built-in motherboard GPU.  We may
eventually see that trickle into the server hardware side of things too.

I've never seen a PostgreSQL server capable of running CUDA, and I don't
expect that to change.

But couldn't that also be seen as a chicken/egg situation? No-one
buys GPUs for database servers because the database won't make use of
it, but databases don't implement GPU functionality since database
servers don't tend to have GPUs. It's more likely the latter of those
two reasonings would have to be the first to budge.

But nVidia does produce a non-graphics-oriented GPGPU line called
Tesla dedicated to such processing.

--
Thom Brown

#7Bruce Momjian
bruce@momjian.us
In reply to: Greg Smith (#5)
Re: CUDA Sorting

On Mon, Sep 19, 2011 at 3:36 PM, Greg Smith <greg@2ndquadrant.com> wrote:

The main problem here is that the sort of hardware commonly used for
production database servers doesn't have any serious enough GPU to support
CUDA/OpenCL available

Of course that could change if adding a GPU would help Postgres... I
would expect it to help mostly for data warehouse batch query type
systems, especially ones with very large i/o subsystems that can
saturate the memory bus with sequential i/o. "Run your large batch
queries twice as fast by adding a $400 part to your $40,000 server"
might be a pretty compelling sales pitch :)

That said, to help in the case I described you would have to implement
the tapesort algorithm on the GPU as well. I expect someone has
implemented heaps for CUDA/OpenCL already though.

--
greg

#8Thom Brown
thom@linux.com
In reply to: Bruce Momjian (#7)
Re: CUDA Sorting

On 19 September 2011 15:54, Greg Stark <stark@mit.edu> wrote:

On Mon, Sep 19, 2011 at 3:36 PM, Greg Smith <greg@2ndquadrant.com> wrote:

The main problem here is that the sort of hardware commonly used for
production database servers doesn't have any serious enough GPU to support
CUDA/OpenCL available

Of course that could change if adding a GPU would help Postgres... I
would expect it to help mostly for data warehouse batch query type
systems, especially ones with very large i/o subsystems that can
saturate the memory bus with sequential i/o. "Run your large batch
queries twice as fast by adding a $400 part to your $40,000 server"
might be a pretty compelling sales pitch :)

That said, to help in the case I described you would have to implement
the tapesort algorithm on the GPU as well. I expect someone has
implemented heaps for CUDA/OpenCL already though.

I seem to recall a paper on such a thing by Carnegie Mellon
University. Can't remember where I saw it though.

--
Thom Brown

#9Thom Brown
thom@linux.com
In reply to: Thom Brown (#8)
Re: CUDA Sorting

On 19 September 2011 16:10, Thom Brown <thom@linux.com> wrote:

I seem to recall a paper on such a thing by Carnegie Mellon
University.  Can't remember where I saw it though.

Found it! http://www.cs.cmu.edu/afs/cs.cmu.edu/Web/People/ngm/15-823/project/Final.pdf

--
Thom Brown

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#7)
Re: CUDA Sorting

Greg Stark <stark@mit.edu> writes:

That said, to help in the case I described you would have to implement
the tapesort algorithm on the GPU as well.

I think the real problem would be that we are seldom sorting just the
key values. If you have to push the tuples through the GPU too, your
savings are going to go up in smoke pretty quickly ...

FWIW, I tend to believe a variant of what Greg Stark said upthread:
there would surely be some win from reducing the impedance mismatch for
comparison functions. In concrete terms, there would be no reason to
have tuplesort.c's myFunctionCall2Coll, and maybe not
inlineApplySortFunction either, if the datatype-specific comparison
functions had APIs that were closer to what sorting wants rather than
following the general SQL-callable-function API. And those functions
cost a *lot* more than a one-instruction comparison does. But it's very
much more of a stretch to believe that inlining per se is going to do
much for us, and even more of a stretch to believe that getting a
separate processor involved is going to be a win.

regards, tom lane

#11Vitor Reus
vitor.reus@gmail.com
In reply to: Tom Lane (#10)
Re: CUDA Sorting

2011/9/19 Thom Brown <thom@linux.com>

Is your aim to have this committed into core PostgreSQL, or just for
your own version?  If it's the former, I don't anticipate any
enthusiasm from the hacker community.

This is a research thesis, and I'm not confident enough to get it
committed into core by myself. I will, however, release the source, and I
believe it will open the way for future work to be committed into core
PostgreSQL.

2011/9/19 Greg Stark <stark@mit.edu>

Of course that could change if adding a GPU would help Postgres... I
would expect it to help mostly for data warehouse batch query type
systems, especially ones with very large i/o subsystems that can
saturate the memory bus with sequential i/o. "Run your large batch
queries twice as fast by adding a $400 part to your $40,000 server"
might be a pretty compelling sales pitch :)

My focus is also energy proportionality. If you add a GPU, you will
roughly double the power consumption, but you could perhaps
increase the efficiency much more.

That said, to help in the case I described you would have to implement
the tapesort algorithm on the GPU as well. I expect someone has
implemented heaps for CUDA/OpenCL already though.

For now, I'm planning to implement just the in-memory sort, for
simplicity and to see if it would give a real performance gain.

2011/9/19 Greg Stark <stark@mit.edu>:

In which case you could call a specialized qsort which
implements that comparator inlined instead of calling the standard
function.

Actually, I'm now trying to make a custom comparator for integers, but
I haven't made much progress yet. If this works, I'll port it to the GPU and
start working on the next comparators, such as float, then strings,
in an incremental way.

2011/9/19 Thom Brown <thom@linux.com>:

Found it! http://www.cs.cmu.edu/afs/cs.cmu.edu/Web/People/ngm/15-823/project/Final.pdf

This is really great work, and I'm basing mine on it. But it's
implemented using OpenGL (yes, not OpenCL), and therefore has a lot of
limitations. I also tried to contact naju but didn't get any answer.

Vítor Uwe Reus

#12Nulik Nol
nuliknol@gmail.com
In reply to: Vitor Reus (#1)
Re: CUDA Sorting

On Mon, Sep 19, 2011 at 7:11 AM, Vitor Reus <vitor.reus@gmail.com> wrote:

Hello everyone,

I'm implementing a CUDA based sorting on PostgreSQL, and I believe it
can improve the ORDER BY statement performance in 4 to 10 times. I
already have a generic CUDA sort that performs around 10 times faster
than std qsort. I also managed to load CUDA into pgsql.

NVIDIA cards are not as good as ATI cards. ATI cards are much faster
with integer operations, and should be ideal for sorting transaction
ids or similar numbers (unless you are going to sort prices
stored as float, where ATI still beats NVIDIA, but not by as much).
Another problem you have to deal with is PCI Express speed. Transfer
is very slow compared to RAM. You will have to add more GPUs to match
the performance, and this will increase the solution cost. There was a
sorting algorithm for 4 CPU cores that was beating the sort on a GTX 285
(I don't have the link, sorry), but CPUs are not as bad at sorting
as you think.
AMD is already working on embedding GPUs into the motherboard; if I
am not mistaken, some of them are already on the market
for purchase.
Anyone who uses a tiny embedded ATI GPU for sorting problems with integers
will outperform your NVIDIA-based, PCI Express-connected GPU with CUDA,
because your algorithm will basically waste a lot of time transferring
data to the GPU and getting it back.
But if you use an embedded ATI GPU, you can also use the SSE registers on
each CPU core to add more performance to your algorithm. It is not
going to be a very hardware-compatible solution, but if you want good
speed/cost, this should be the best solution.
I recommend doing some bandwidth benchmark test before you start coding.

Regards
Nulik

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
==================================
The power of zero is infinite

#13Chris Browne
cbbrowne@acm.org
In reply to: Greg Smith (#5)
Re: CUDA Sorting

On Mon, Sep 19, 2011 at 10:36 AM, Greg Smith <greg@2ndquadrant.com> wrote:

Intel's next generation Ivy Bridge chipset, expected for the spring of 2012,
is going to add support for OpenCL to the built-in motherboard GPU.  We may
eventually see that trickle into the server hardware side of things too.

Note that Amazon's EC2 offerings include a configuration with a pair of GPUs.

Whether or not this continues has a certain "chicken and egg" aspect to it...

- I'm glad that Amazon is selling such a configuration, as it does
give folks the option of trying it out.

- Presumably, it will only continue on their product list if customers
do more than merely "trying it out."

I think I'd be shocked if PostgreSQL offered much support for such a
configuration in the next year; despite there being some work ongoing,
drawing the functionality into core would require Core decisions that
I'd be surprised to see so quickly.

Unfortunately, that may be slow enough progress that PostgreSQL won't
be contributing to the would-be success of the technology.

If this kind of GPU usage fails to attract much interest, then it's
probably a good thing that we're not committed to it. But if other
uses lead to it taking off, then we'll doubtless get a lot of noise on
lists about a year from now to the effect "Why don't you have this in
core yet? Not 3773t enough!?!?"

Having a bit of progress taking place now would probably be good
timing, in case it *does* take off...
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

#14Vitor Reus
vitor.reus@gmail.com
In reply to: Nulik Nol (#12)
Re: CUDA Sorting

2011/9/19 Nulik Nol <nuliknol@gmail.com>:

On Mon, Sep 19, 2011 at 7:11 AM, Vitor Reus <vitor.reus@gmail.com> wrote:
I recommend doing some bandwidth benchmark test before you start coding.

I already did some benchmarks with GPU sorting (not in pgsql), and
measured total sort times, copy bandwidth and energy usage, and got
some exciting results:

I got around 1GB/s bandwidth with a GeForce GT 430 on an MS-9803 motherboard.
The power increase ratio was 2.75 times on a Core 2 Duo T8300 after adding
the GT 430: http://tinyurl.com/6h7cgv2
The sorting speedup grows as you have more data, but on
average the GPU sort is 7.8 times faster than the CPU: http://tinyurl.com/6c95dc2
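The 1GB/s figure can be put in perspective with a back-of-envelope check. This estimate rests on three assumptions:

- $N = 10^8$ four-byte integer keys.
- The same bandwidth $B$ in both directions. Only one bandwidth figure was reported, so the symmetry is an assumption.
- 1 GB = $10^9$ bytes.

Under these assumptions, the time just to ship the bare keys ($s = 4$ bytes each) to the card and the sorted result back is:

```latex
t_{\text{transfer}} = \frac{2 \, N \, s}{B}
  = \frac{2 \times 10^{8} \times 4\ \text{bytes}}{10^{9}\ \text{bytes/s}}
  = 0.8\ \text{s}
```

On that hypothetical input, the CPU sort would need to take more than 0.8 s longer than the GPU kernel itself before offloading wins. If whole tuples rather than bare keys had to cross the bus, $s$ and therefore $t_{\text{transfer}}$ would grow accordingly.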

#15Stephen Frost
sfrost@snowman.net
In reply to: Thom Brown (#6)
Re: CUDA Sorting

* Thom Brown (thom@linux.com) wrote:

But nVidia does produce a non-graphics-oriented GPGPU line called
Tesla dedicated to such processing.

Just as a side-note, I've got a couple of Teslas that aren't doing
terribly much at the moment, and they're in a Linux 'server'-type box
from Penguin Computing. I could certainly install PG on it and run some
tests, if someone's written the code and provides the tests.

I agree that it'd be interesting to do, but I share Lord Stark's
feelings about the challenges and lack of potential gain: it's a very
small set of queries that would benefit from this. You need to be
working with enough data to make the cost of transferring it all over to
the GPU worthwhile, just for starters.

Thanks,

Stephen

#16Greg Smith
gsmith@gregsmith.com
In reply to: Thom Brown (#6)
Re: CUDA Sorting

On 09/19/2011 10:53 AM, Thom Brown wrote:

But couldn't that also be seen as a chicken/egg situation?

The chicken/egg problem here is a bit deeper than just "no one offers
GPUs because no one wants them" on server systems. One of the reasons
there aren't more GPUs in typical database server configurations is that
you're already filling up some number of the full size slots, and
correspondingly the bandwidth available to cards, with disk
controllers. It doesn't help that many server class motherboards don't
even have a x16 PCI-e slot on them, which is what most GPUs as delivered
on regular consumer video cards are optimized for.

But nVidia does produce a non-graphics-oriented GPGPU line called
Tesla dedicated to such processing.

Tesla units start at around $1500 USD, which is a nice budget to spend
on either more RAM (to allow higher work_mem), faster storage to store
temporary files onto, or a faster CPU to chew through all sorts of tasks
more quickly. The Tesla units are easy to justify if you have a serious
GPU-oriented application. The good bang-for-the-buck point with GPU
sorting for PostgreSQL is probably going to be a $50-$100 video card
instead. For example, the card Vitor is seeing good results on costs
around $60. (That's also a system with fairly slow RAM, though; it will
be interesting to see if the gain holds up on newer systems.)

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD

#17Hans-Jürgen Schönig
postgres@cybertec.at
In reply to: Tom Lane (#10)
Re: CUDA Sorting

On Sep 19, 2011, at 5:16 PM, Tom Lane wrote:

Greg Stark <stark@mit.edu> writes:

That said, to help in the case I described you would have to implement
the tapesort algorithm on the GPU as well.

I think the real problem would be that we are seldom sorting just the
key values. If you have to push the tuples through the GPU too, your
savings are going to go up in smoke pretty quickly …

I would argue along similar lines.
to make GPU code fast, it has to be tailored to do pretty much exactly one thing; otherwise you have no chance of getting anywhere close to the card's bandwidth.
if you look at two "similar" GPU codes which seem to do the same thing, you might easily see that one is 10 times faster than the other, for bloody reasons such as memory alignment, memory transaction size or whatever.
this opens up a bit of a problem: PostgreSQL sorting is so generic and so flexible that I would be really surprised if somebody could come up with a solution which really comes close to what the GPU can do.
it would definitely be interesting to see a prototype, however.

btw, there are a handful of interesting talks / lectures about GPU programming provided by the University of Chicago (just cannot find the link atm).

regards,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de

#18Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Greg Smith (#16)
Re: CUDA Sorting

2011/9/19 Greg Smith <greg@2ndquadrant.com>:

On 09/19/2011 10:53 AM, Thom Brown wrote:

But couldn't that also be seen as a chicken/egg situation?

The chicken/egg problem here is a bit deeper than just "no one offers GPUs
because no one wants them" on server systems.  One of the reasons there
aren't more GPUs in typical database server configurations is that you're
already filling up some number of the full size slots, and correspondingly
the bandwidth available to cards, with disk controllers.  It doesn't help
that many server class motherboards don't even have a x16 PCI-e slot on
them, which is what most GPUs as delivered on regular consumer video cards
are optimized for.

Intel's Sandy Bridge and Ivy Bridge series combine the CPU and GPU on one chip. I don't know how
using the GPU affects the CPU part, but it might be interesting to
explore...

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

#19Florian Pflug
fgp@phlo.org
In reply to: Stephen Frost (#15)
Re: CUDA Sorting

On Sep19, 2011, at 19:46 , Stephen Frost wrote:

I agree that it'd be interesting to do, but I share Lord Stark's
feelings about the challenges and lack of potential gain- it's a very
small set of queries that would benefit from this. You need to be
working with enough data to make the cost of tranferring it all over to
the GPU worthwhile, just for starters..

I wonder if anyone has ever tried to employ a GPU for more low-level
tasks. Things like sorting or hashing are hard to move to the
GPU in postgres because, in the general case, they involve essentially
arbitrary user-defined functions. But couldn't for example the WAL CRC
computation be moved to a GPU? Or, to get really crazy, even the search
for the optimal join order (only for a large number of joins though,
i.e. where we currently switch to a genetic algorithm)?

best regards,
Florian Pflug

#20Nulik Nol
nuliknol@gmail.com
In reply to: Vitor Reus (#14)
Re: CUDA Sorting

I already did some benchmarks with GPU sorting (not in pgsql), and
measured total sort times, copy bandwidth and energy usage, and got
some exciting results:

Was that qsort implementation on the CPU cache-friendly and optimized for SSE?
To make a fair comparison you have to take the best CPU implementation
and compare it to the best GPU implementation. Otherwise, you are
comparing a full-throttle GPU against a lazy CPU.
Check this paper on how hash join was optimized 17x when SSE
instructions were used.
www.vldb.org/pvldb/2/vldb09-257.pdf

Regards


#21Hannu Krosing
hannu@tm.ee
In reply to: Bruce Momjian (#4)
#22Hannu Krosing
hannu@tm.ee
In reply to: Greg Smith (#5)
#23Vitor Reus
vitor.reus@gmail.com
In reply to: Hannu Krosing (#22)
#24Gaetano Mendola
mendola@gmail.com
In reply to: Greg Smith (#5)
#25Gaetano Mendola
mendola@gmail.com
In reply to: Hans-Jürgen Schönig (#17)
#26Oleg Bartunov
oleg@sai.msu.su
In reply to: Gaetano Mendola (#24)
#27Gaetano Mendola
mendola@gmail.com
In reply to: Oleg Bartunov (#26)
#28Greg Smith
gsmith@gregsmith.com
In reply to: Gaetano Mendola (#24)
#29KaiGai Kohei
kaigai@ak.jp.nec.com
In reply to: Greg Smith (#28)
#30Gaetano Mendola
mendola@gmail.com
In reply to: KaiGai Kohei (#29)
#31Bruce Momjian
bruce@momjian.us
In reply to: Gaetano Mendola (#30)
#32Gaetano Mendola
mendola@gmail.com
In reply to: Bruce Momjian (#31)
#33Gaetano Mendola
mendola@gmail.com
In reply to: Greg Smith (#28)
#34Marti Raudsepp
marti@juffo.org
In reply to: Bruce Momjian (#31)
#35Gaetano Mendola
mendola@gmail.com
In reply to: Bruce Momjian (#31)
#36Gaetano Mendola
mendola@gmail.com
In reply to: Bruce Momjian (#31)
#37Peter Geoghegan
In reply to: Gaetano Mendola (#35)
#38Gaetano Mendola
mendola@gmail.com
In reply to: Peter Geoghegan (#37)
#39Gaetano Mendola
mendola@gmail.com
In reply to: Peter Geoghegan (#37)
In reply to: Gaetano Mendola (#38)
#41Dann Corbit
DCorbit@connx.com
In reply to: Gaetano Mendola (#38)