Faster methods for getting SPI results

Started by Jim Nasby · over 9 years ago · 48 messages · pgsql-hackers
#1 Jim Nasby
Jim.Nasby@BlueTreble.com

I've been looking at the performance of SPI calls within plpython.
There's a roughly 1.5x difference from equivalent python code just in
pulling data out of the SPI tuplestore. Some of that is due to an
inefficiency in how plpython is creating result dictionaries, but fixing
that is ultimately a dead-end: if you're dealing with a lot of results
in python, you want a tuple of arrays, not an array of tuples.
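The shape difference can be sketched in plain Python (toy data, nothing plpython-specific; this just shows the two layouts):

```python
# Toy illustration of the two result shapes. What plpython builds
# today is an array of tuples (a list of per-row dicts); analytics
# code generally wants a tuple of arrays (one list per column).
rows = [
    {"a": 1, "b": 10},
    {"a": 2, "b": 20},
    {"a": 3, "b": 30},
]

# One pass converts row-oriented output to column-oriented output.
columns = {name: [row[name] for row in rows] for name in rows[0]}

assert columns == {"a": [1, 2, 3], "b": [10, 20, 30]}
```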

While we could just brute-force a tuple of arrays by plowing through the
SPI tuplestore (this is what pl/r does), there's still a lot of extra
work involved in doing that. AFAICT there's at least 2 copies that
happen between the executor producing a tuple and it making it into the
tuplestore, plus the tuplestore is going to consume a potentially very
large amount of memory for a very short period of time, before all the
data gets duplicated (again) into python objects.

It would be a lot more efficient if we could just grab datums from the
executor and make a single copy into plpython (or R), letting the PL
deal with all the memory management overhead.

I briefly looked at using SPI cursors to do just that, but that looks
even worse: every fetch is executed in a subtransaction, and every fetch
creates an entire tuplestore even if it's just going to return a single
value. (But hey, we never claimed cursors were fast...)

Is there any way to avoid all of this? I'm guessing one issue might be
that we don't want to call an external interpreter while potentially
holding page pins, but even then couldn't we just copy a single tuple at
a time and save a huge amount of palloc overhead?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jim Nasby (#1)
Re: Faster methods for getting SPI results

On 12/20/16 10:14 PM, Jim Nasby wrote:

It would be a lot more efficient if we could just grab datums from the
executor and make a single copy into plpython (or R), letting the PL
deal with all the memory management overhead.

I briefly looked at using SPI cursors to do just that, but that looks
even worse: every fetch is executed in a subtransaction, and every fetch
creates an entire tuplestore even if it's just going to return a single
value. (But hey, we never claimed cursors were fast...)

Is there any way to avoid all of this? I'm guessing one issue might be
that we don't want to call an external interpreter while potentially
holding page pins, but even then couldn't we just copy a single tuple at
a time and save a huge amount of palloc overhead?

AFAICT that's exactly how DestRemote works: it grabs a raw slot from the
executor, makes sure it's fully expanded, and sends it on its way via
pq_send*(). So presumably the same could be done for SPI, by creating a
new CommandDest (ISTM none of the existing ones would do what we want).

I'm not sure what the API for this should look like. One possibility is
to have SPI_execute and friends accept a flag that indicates not to
build a tupletable. I don't think a query needs to be read-only to allow
for no tuplestore, so overloading read_only seems like a bad idea.

Another option is to treat this as a "lightweight cursor" that only
allows forward fetches. One nice thing about that option is it leaves
open the possibility of using a small tuplestore for each "fetch",
without all the overhead that a full blown cursor has. This assumes
there are some use cases where you want to operate on relatively small
sets of tuples at a time, but you don't need to materialize the whole
thing in one shot.
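The forward-only, batched behavior can be mimicked in a short Python sketch (the `iter_batches` helper and the list it walks are stand-ins for whatever SPI-level fetch call would exist; nothing here is an actual API):

```python
def iter_batches(source, batch_size):
    """Yield successive forward-only slices of source.

    source stands in for the executor's output; each yielded slice
    plays the role of one small per-"fetch" tuplestore.
    """
    for start in range(0, len(source), batch_size):
        yield source[start:start + batch_size]

# Operate on small sets of tuples at a time, never materializing
# the whole result set at once.
total = 0
for batch in iter_batches(list(range(10)), batch_size=3):
    total += sum(batch)

assert total == 45
```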

#3 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jim Nasby (#2)
Re: Faster methods for getting SPI results

On 12/21/16 8:21 AM, Jim Nasby wrote:

On 12/20/16 10:14 PM, Jim Nasby wrote:

It would be a lot more efficient if we could just grab datums from the
executor and make a single copy into plpython (or R), letting the PL
deal with all the memory management overhead.

I briefly looked at using SPI cursors to do just that, but that looks
even worse: every fetch is executed in a subtransaction, and every fetch
creates an entire tuplestore even if it's just going to return a single
value. (But hey, we never claimed cursors were fast...)

Is there any way to avoid all of this? I'm guessing one issue might be
that we don't want to call an external interpreter while potentially
holding page pins, but even then couldn't we just copy a single tuple at
a time and save a huge amount of palloc overhead?

AFAICT that's exactly how DestRemote works: it grabs a raw slot from the
executor, makes sure it's fully expanded, and sends it on its way via
pq_send*(). So presumably the same could be done for SPI, by creating a
new CommandDest (ISTM none of the existing ones would do what we want).

I'm not sure what the API for this should look like. One possibility is
to have SPI_execute and friends accept a flag that indicates not to
build a tupletable. I don't think a query needs to be read-only to allow
for no tuplestore, so overloading read_only seems like a bad idea.

Another option is to treat this as a "lightweight cursor" that only
allows forward fetches. One nice thing about that option is it leaves
open the possibility of using a small tuplestore for each "fetch",
without all the overhead that a full blown cursor has. This assumes
there are some use cases where you want to operate on relatively small
sets of tuples at a time, but you don't need to materialize the whole
thing in one shot.

I've looked at this some more, and ISTM that the only way to do this
without some major surgery is to create a new type of Destination
specifically for SPI that allows for the execution of an arbitrary C
function for each tuple to be sent. AFAICT this should be fairly safe,
since DestRemote can potentially block while sending a tuple and also
runs output functions (which presumably could themselves generate errors).

_SPI_execute_plan() would need to accept an arbitrary DestReceiver
struct, and use that (if specified) instead of creating its own.

Once that's done, my plan is to allow plpy to use this functionality
with a receiver function that adds tuple fields to corresponding python
lists. This should result in significantly less overhead than going
through a tuplestore when dealing with a large number of rows.

Before I go code this up, I'd like to know if there's some fatal flaw in
this, or if there's an easier way to hack this up just to test my
performance theory.

Suggestions?

#4 Craig Ringer
craig@2ndquadrant.com
In reply to: Jim Nasby (#3)
Re: Faster methods for getting SPI results

On 28 December 2016 at 09:58, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

I've looked at this some more, and ISTM that the only way to do this without
some major surgery is to create a new type of Destination specifically for
SPI that allows for the execution of an arbitrary C function for each tuple
to be sent.

That sounds a lot more sensible than the prior proposals. Callback driven.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#5 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Craig Ringer (#4)
Re: Faster methods for getting SPI results

On 12/27/16 9:10 PM, Craig Ringer wrote:

On 28 December 2016 at 09:58, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

I've looked at this some more, and ISTM that the only way to do this without
some major surgery is to create a new type of Destination specifically for
SPI that allows for the execution of an arbitrary C function for each tuple
to be sent.

That sounds a lot more sensible than the prior proposals. Callback driven.

Are there other places this would be useful? I'm reluctant to write all
of this just to discover it doesn't help performance at all, but if it's
useful on its own I can just submit it as a stand-alone patch.

#6 Craig Ringer
craig@2ndquadrant.com
In reply to: Jim Nasby (#5)
Re: Faster methods for getting SPI results

On 28 December 2016 at 12:32, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 12/27/16 9:10 PM, Craig Ringer wrote:

On 28 December 2016 at 09:58, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

I've looked at this some more, and ISTM that the only way to do this
without
some major surgery is to create a new type of Destination specifically
for
SPI that allows for the execution of an arbitrary C function for each
tuple
to be sent.

That sounds a lot more sensible than the prior proposals. Callback driven.

Are there other places this would be useful? I'm reluctant to write all of
this just to discover it doesn't help performance at all, but if it's useful
on its own I can just submit it as a stand-alone patch.

I don't have a use for it personally. In BDR and pglogical anything
that does work with nontrivial numbers of tuples uses lower level
scans anyway.

I expect anything that uses the SPI to run arbitrary user queries
could have a use for something like this though. Any PL, for one.


#7 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Craig Ringer (#4)
Re: Faster methods for getting SPI results

On 12/27/16 9:10 PM, Craig Ringer wrote:

On 28 December 2016 at 09:58, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

I've looked at this some more, and ISTM that the only way to do this without
some major surgery is to create a new type of Destination specifically for
SPI that allows for the execution of an arbitrary C function for each tuple
to be sent.

That sounds a lot more sensible than the prior proposals. Callback driven.

Here's what I've got right now. I haven't bothered with
SPI_execute_callback() yet, and there's some missing sanity checks.

I'm not sure if the changes to CreateDestReceiver() are warranted or
necessary, though it would at least give you sane defaults. My
incomplete code that would make use of this currently does

CallbackState callback;

memcpy(&callback.pub, CreateDestReceiver(DestSPICallback),
       sizeof(DestReceiver));
callback.pub.receiveSlot = PLy_CSreceive;
callback.pub.rStartup = PLy_CSStartup;

#8 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Craig Ringer (#6)
Re: Faster methods for getting SPI results

On 12/28/16 3:14 AM, Craig Ringer wrote:

On 28 December 2016 at 12:32, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 12/27/16 9:10 PM, Craig Ringer wrote:

On 28 December 2016 at 09:58, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

I've looked at this some more, and ISTM that the only way to do this
without
some major surgery is to create a new type of Destination specifically
for
SPI that allows for the execution of an arbitrary C function for each
tuple
to be sent.

That sounds a lot more sensible than the prior proposals. Callback driven.

Are there other places this would be useful? I'm reluctant to write all of
this just to discover it doesn't help performance at all, but if it's useful
on its own I can just submit it as a stand-alone patch.

I don't have a use for it personally. In BDR and pglogical anything
that does work with nontrivial numbers of tuples uses lower level
scans anyway.

I expect anything that uses the SPI to run arbitrary user queries
could have a use for something like this though. Any PL, for one.

Just a quick update on this: I've gotten this working well enough in
plpython to do some performance testing. This patch does change python
results from being a list of dicts to a dict of lists, but I suspect the
vast majority of the speed improvement is from not creating a tuplestore.

The attached sample (from OS X /usr/bin/sample) is interesting. The
highlight is:

! 3398 SPI_execute_callback (in postgres) + 163 [0x110125793]
! 3394 _SPI_execute_plan (in postgres) + 1262 [0x1101253fe]
! : 2043 standard_ExecutorRun (in postgres) + 288 [0x1100f9a40]
! : | 1990 ExecProcNode (in postgres) + 250 [0x1100fd62a]

The top line is the entry into SPI from plpython. The bottom line is
generate_series into a tuplestore and then reading from that tuplestore.
Almost all the time being spent in standard_ExecutorRun is in
PLy_CSreceive, which is appending values to a set of python lists as
it's getting tuples.

The other noteworthy item in the sample is this:

535 list_dealloc (in Python) + 116,103,... [0x11982b1b4,0x11982b1a7,...]

that's how long it's taking python to free the 3 lists (each with
9999999 python int objects).

In short (and at best*), this makes plpython just as fast at processing
results as SELECT count(SELECT s, s, s FROM generate_series()).

The * on that is there's something odd going on where plpython starts
out really fast at this, then gets 100% slower. I've reached out to some
python folks about that. Even so, the overall results from a quick test
on my laptop are (IMHO) impressive:

                 Old Code      New Code   Improvement
Pure SQL         2 sec         2 sec
plpython         12.7-14 sec   4-10 sec   ~1.3-3x
plpython - SQL   10.7-12 sec   2-8 sec    ~1.3-6x

Pure SQL is how long an equivalent query takes to run with just SQL.
plpython - SQL is simply the raw python times minus the pure SQL time.

I suspect other PL languages that have fairly fast object alloc and
dealloc would see a similar benefit.

BTW, the patch currently breaks on nested calls to plpython, but I don't
think that should change the performance much.

The test function:

CREATE OR REPLACE FUNCTION test_series(
iter int
) RETURNS int LANGUAGE plpythonu AS $body$
d = plpy.execute('SELECT s AS some_table_id, s AS some_field_name, s AS some_other_field_name FROM generate_series(1,{}) s'.format(iter) )
return len(d['some_table_id'])
$body$;


Attachments:

plpython_callbeck_v2.diff (text/plain, +339 -10)
sample.txt (text/plain)
#9 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jim Nasby (#8)
Re: Faster methods for getting SPI results (460% improvement)

On 1/5/17 9:50 PM, Jim Nasby wrote:

The * on that is there's something odd going on where plpython starts
out really fast at this, then gets 100% slower. I've reached out to some
python folks about that. Even so, the overall results from a quick test
on my laptop are (IMHO) impressive:

                 Old Code      New Code   Improvement
Pure SQL         2 sec         2 sec
plpython         12.7-14 sec   4-10 sec   ~1.3-3x
plpython - SQL   10.7-12 sec   2-8 sec    ~1.3-6x

Pure SQL is how long an equivalent query takes to run with just SQL.
plpython - SQL is simply the raw python times minus the pure SQL time.

I finally got all the kinks worked out and did some testing with python
3. Performance for my test [1] improved ~460% when returning a dict of
lists (as opposed to the current list of dicts). Based on previous
testing, I expect that using this method to return a list of dicts will
be about 8% slower. The inconsistency in results on 2.7 has to do with
how python 2 handles ints.

Someone who's familiar with pl/perl should take a look at this and see
if it would apply there. I've attached the SPI portion of this patch.

I think the last step here is to figure out how to support switching
between the current behavior and the "columnar" behavior of a dict of
lists. I believe the best way to do that is to add two optional
arguments to the execution functions: container=[] and members={}, and
then copy those to produce the output objects. That means you can get
the new behavior by doing something like:

plpy.execute('...', container={}, members=[])

Or, more interesting, you could do:

plpy.execute('...', container=Pandas.DataFrame, members=Pandas.Series)

since that's what a lot of people are going to want anyway.
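A sketch of how that could behave in pure Python (the copy-the-prototype semantics follow the proposal above; the function name and the `isinstance` dispatch are simplifications for this sketch only, not part of any existing plpy API):

```python
import copy

def plpy_execute_sketch(rows, fields, container=[], members={}):
    """Hypothetical model of plpy.execute(..., container=..., members=...).

    The prototype objects are copied to produce the output objects.
    The defaults ([] and {}) reproduce today's list-of-dicts result;
    passing container={}, members=[] yields the columnar dict of lists.
    The mutable defaults mirror the proposal; they are copied, never
    mutated.
    """
    result = copy.copy(container)
    if isinstance(result, dict):
        # Columnar: one members-copy (list-like) per field.
        for i, name in enumerate(fields):
            col = copy.copy(members)
            col.extend(row[i] for row in rows)
            result[name] = col
    else:
        # Row-oriented: one members-copy (dict-like) per row.
        for row in rows:
            d = copy.copy(members)
            d.update(zip(fields, row))
            result.append(d)
    return result

rows = [(1, 10), (2, 20)]
assert plpy_execute_sketch(rows, ["a", "b"]) == \
    [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
assert plpy_execute_sketch(rows, ["a", "b"], container={}, members=[]) == \
    {"a": [1, 2], "b": [10, 20]}
```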

In the future we could also add a GUC to change the default behavior.

Any concerns with that approach?

1:

d = plpy.execute('SELECT s AS some_table_id, s AS some_field_name, s AS some_other_field_name FROM generate_series(1,{}) s'.format(iter) )
return len(d['some_table_id'])


Attachments:

spi_callback.patch (text/plain, +83 -9)
#10 Craig Ringer
craig@2ndquadrant.com
In reply to: Jim Nasby (#9)
Re: Faster methods for getting SPI results (460% improvement)

On 24 January 2017 at 11:23, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

I finally got all the kinks worked out and did some testing with python 3.
Performance for my test [1] improved ~460% when returning a dict of lists
(as opposed to the current list of dicts). Based on previous testing, I
expect that using this method to return a list of dicts will be about 8%
slower. The inconsistency in results on 2.7 has to do with how python 2
handles ints.

Impressive results.

I think the last step here is to figure out how to support switching between
the current behavior and the "columnar" behavior of a dict of lists.

That sounds like it'd be much better approached as a separate, later patch.

If I understand you correctly, you propose to return the resultset

a b
1 10
2 20

which is currently returned as

[ {"a":1, "b":10}, {"a":2, "b":20} ]

instead as

{ "a": [1, 2], "b": [10, 20] }

?

If so I see that as a lot more of a niche thing. I can see why it'd be
useful and would help performance, but it seems much more disruptive.
It requires users to discover it exists, actively adopt a different
style of ingesting data, etc. For a 10%-ish gain in a PL.

I strongly suggest making this design effort a separate thread, and
focusing on the SPI improvements that give "free" no-user-action
performance boosts here.


#11 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Craig Ringer (#10)
Re: Faster methods for getting SPI results (460% improvement)

On 1/23/17 10:36 PM, Craig Ringer wrote:

which is currently returned as

[ {"a":1, "b":10}, {"a":2, "b":20} ]

instead as

{ "a": [1, 2], "b": [10, 20] }

Correct.

If so I see that as a lot more of a niche thing. I can see why it'd be
useful and would help performance, but it seems much more disruptive.
It requires users to discover it exists, actively adopt a different
style of ingesting data, etc. For a 10%-ish gain in a PL.

In data science, what we're doing now is actually the niche. All real
analytics happens with something like a Pandas DataFrame, which is
organized as a dict of lists.

This isn't just idle nomenclature either: organizing results in what
amounts to a column store provides a significant speed improvement for
most analytics, because you're working on an array of contiguous memory
(at least, when you're using more advanced types like DataFrames and
Series).
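As a rough stdlib-only illustration of the contiguity point (Python's `array` plays the part of a DataFrame/Series column here):

```python
from array import array

# Row-oriented: each value lives inside its own per-row dict.
rows = [{"a": i} for i in range(1000)]

# Column-oriented: the same values packed into one contiguous
# buffer of C longs, as a DataFrame/Series column would be.
col = array("l", (row["a"] for row in rows))

# Aggregations walk one contiguous buffer instead of dereferencing
# a dict per row; the results are identical.
assert sum(col) == sum(row["a"] for row in rows) == 499500
```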

I strongly suggest making this design effort a separate thread, and
focusing on the SPI improvements that give "free" no-user-action
performance boosts here.

Fair enough. I posted the SPI portion of that yesterday. That should be
useful for pl/R and possibly pl/perl. pl/tcl could make use of it, but
it would end up executing arbitrary tcl code in the middle of portal
execution, which doesn't strike me as a great idea. Unfortunately, I
don't think plpgsql could make much use of this for similar reasons.

I'll post a plpython patch that doesn't add the output format control.

#12 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jim Nasby (#9)
Re: Faster methods for getting SPI results (460% improvement)

On 1/23/17 9:23 PM, Jim Nasby wrote:

I think the last step here is to figure out how to support switching
between the current behavior and the "columnar" behavior of a dict of lists.

I've thought more about this... instead of trying to switch from the
current situation of one choice of how results are returned to two, I
think it'd be better to just expose the API that the new Destination
type provides to SPI. Specifically, execute a python function during
Portal startup, and a different function for receiving tuples. There'd
be an optional 3rd function for Portal shutdown.

The startup function would be handed details of the resultset it was
about to receive, as a list that contained python tuples with the
results of SPI_fname, _gettype, _gettypeid. This function would return a
callback version number and a python object that would be kept in the
DestReceiver.

The receiver function would get the object created by the startup
function, as well as a python tuple of the TupleTableSlot that had gone
through type conversion. It would need to add the value to the object
from the startup function. It would return true or false, just like a
Portal receiver function does.

The shutdown function would receive the object that's been passed
around. It would be able to do any post-processing. Whatever it returned
is what would be handed back to python as the results of the query.

The version number returned by the startup function allows for future
improvements to this facility. One idea there is allowing the startup
function to control how Datums get mapped into python objects.

In order to support all of this without breaking backwards compatibility
or forking a new API, plpy.execute would accept a kwdict, to avoid
conflicting with the arbitrary number of arguments that can currently be
accepted. We'd look in the kwdict for a key called "portal_functions"
pointing at a 2 or 3 element tuple of the startup, receive and shutdown
functions. plpy would pre-define a tuple that provides the current
behavior, and that's what would be used by default. In the future, we
might add a way to control the default.
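Sketched as plain Python, with an ordinary loop standing in for the Portal (the three function names and the `(version, state)` startup return value follow the description above; none of this is an existing plpy API):

```python
def startup(column_info):
    # column_info: list of (name, type_name, type_oid) tuples, i.e.
    # the results of SPI_fname, SPI_gettype, SPI_gettypeid.
    state = {name: [] for name, _, _ in column_info}
    return 1, state          # (callback version, object kept around)

def receive(state, row):
    # Append each field value to the corresponding column list.
    for name, value in zip(state, row):
        state[name].append(value)
    return True              # keep receiving, like a Portal receiver

def shutdown(state):
    return state             # post-process; becomes the query result

def run_query(rows, column_info):
    """Stand-in for the Portal driving the three callbacks."""
    _version, state = startup(column_info)
    for row in rows:
        if not receive(state, row):
            break
    return shutdown(state)

result = run_query([(1, 10), (2, 20)],
                   [("a", "int4", 23), ("b", "int4", 23)])
assert result == {"a": [1, 2], "b": [10, 20]}
```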

Comments?

#13 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jim Nasby (#11)
Re: Faster methods for getting SPI results (460% improvement)

On 1/24/17 10:43 PM, Jim Nasby wrote:

I strongly suggest making this design effort a separate thread, and
focusing on the SPI improvements that give "free" no-user-action
performance boosts here.

Fair enough. I posted the SPI portion of that yesterday. That should be
useful for pl/R and possibly pl/perl. pl/tcl could make use of it, but
it would end up executing arbitrary tcl code in the middle of portal
execution, which doesn't strike me as a great idea. Unfortunately, I
don't think plpgsql could make much use of this for similar reasons.

I'll post a plpython patch that doesn't add the output format control.

I've attached the results of that. Unfortunately the speed improvement
is only 27% at this point (with 9999999 tuples). Presumably that's
because it's constructing a brand new dictionary from scratch for each
tuple.

Attachments:

0002-Minimal-adoption-of-SPI-callbacks-in-plpython.patch (text/plain, +231 -80)
0001-Add-SPI_execute_callback-and-callback-based-DestRece.patch (text/plain, +83 -10)
#14 Peter Eisentraut
peter_e@gmx.net
In reply to: Jim Nasby (#1)
Re: Faster methods for getting SPI results

On 12/20/16 23:14, Jim Nasby wrote:

I've been looking at the performance of SPI calls within plpython.
There's a roughly 1.5x difference from equivalent python code just in
pulling data out of the SPI tuplestore. Some of that is due to an
inefficiency in how plpython is creating result dictionaries, but fixing
that is ultimately a dead-end: if you're dealing with a lot of results
in python, you want a tuple of arrays, not an array of tuples.

There is nothing that requires us to materialize the results into an
actual list of actual rows. We could wrap the SPI_tuptable into a
Python object and implement __getitem__ or __iter__ to emulate sequence
or mapping access.
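That idea can be sketched in pure Python, with a plain list of tuples standing in for the C-level SPI_tuptable (the class and field handling here are illustrative only):

```python
class ResultWrapper:
    """Lazy sequence view over a (simulated) SPI_tuptable.

    Rows are converted to dicts only when accessed, instead of
    materializing an actual list of actual rows up front.
    """
    def __init__(self, tuptable, fields):
        self._tuptable = tuptable      # raw tuples, converted on demand
        self._fields = fields

    def __len__(self):
        return len(self._tuptable)

    def __getitem__(self, i):
        # Build one row dict only when that row is requested.
        return dict(zip(self._fields, self._tuptable[i]))

    def __iter__(self):
        for i in range(len(self)):
            yield self[i]

r = ResultWrapper([(1, 10), (2, 20)], ["a", "b"])
assert r[0] == {"a": 1, "b": 10}
assert [row["b"] for row in r] == [10, 20]
```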

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#15 Jon Nelson
jnelson+pgsql@jamponi.net
In reply to: Peter Eisentraut (#14)
Re: Faster methods for getting SPI results

On Thu, Mar 2, 2017 at 10:03 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 12/20/16 23:14, Jim Nasby wrote:

I've been looking at the performance of SPI calls within plpython.
There's a roughly 1.5x difference from equivalent python code just in
pulling data out of the SPI tuplestore. Some of that is due to an
inefficiency in how plpython is creating result dictionaries, but fixing
that is ultimately a dead-end: if you're dealing with a lot of results
in python, you want a tuple of arrays, not an array of tuples.

There is nothing that requires us to materialize the results into an
actual list of actual rows. We could wrap the SPI_tuptable into a
Python object and implement __getitem__ or __iter__ to emulate sequence
or mapping access.

Python objects have a small (but non-zero) overhead in terms of both memory
and speed. A built-in dictionary is probably one of the least-expensive
(memory/cpu) choices, although how the dictionary is constructed also
impacts performance. Another choice is a tuple.

Avoiding Py_BuildValue(...) in exchange for a bit more complexity (via
PyTuple_New(..) and PyTuple_SetItem(...)) is also a nice performance win in
my experience.

--
Jon

#16 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Peter Eisentraut (#14)
Re: Faster methods for getting SPI results

On 3/2/17 8:03 AM, Peter Eisentraut wrote:

On 12/20/16 23:14, Jim Nasby wrote:

I've been looking at the performance of SPI calls within plpython.
There's a roughly 1.5x difference from equivalent python code just in
pulling data out of the SPI tuplestore. Some of that is due to an
inefficiency in how plpython is creating result dictionaries, but fixing
that is ultimately a dead-end: if you're dealing with a lot of results
in python, you want a tuple of arrays, not an array of tuples.

There is nothing that requires us to materialize the results into an
actual list of actual rows. We could wrap the SPI_tuptable into a
Python object and implement __getitem__ or __iter__ to emulate sequence
or mapping access.

Would it be possible to have that just pull tuples directly from the
executor? The overhead of populating the tuplestore just to drain it
again can become quite significant, and AFAICT it's completely unnecessary.

Unfortunately, I think adding support for that would be even more
invasive, which is why I haven't attempted it. On the flip side, I
believe that kind of an interface would be usable by plpgsql, whereas
the DestReceiver approach is not (AFAICT).
--
Jim Nasby, Chief Data Architect, OpenSCG


#17 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jim Nasby (#13)
Re: Faster methods for getting SPI results (460% improvement)

On 2/28/17 9:42 PM, Jim Nasby wrote:

I'll post a plpython patch that doesn't add the output format control.

I've attached the results of that. Unfortunately the speed improvement
is only 27% at this point (with 9999999 tuples). Presumably that's
because it's constructing a brand new dictionary from scratch for each
tuple.

I found a couple bugs. New patches attached.

Attachments:

0001-Add-SPI_execute_callback-and-callback-based-DestRece.patch (text/plain, +85 -11)
0002-Modify-plpython-to-use-SPI-callbacks.patch (text/plain, +234 -82)
#18 Peter Eisentraut
peter_e@gmx.net
In reply to: Jim Nasby (#16)
Re: Faster methods for getting SPI results

On 3/5/17 16:07, Jim Nasby wrote:

There is nothing that requires us to materialize the results into an
actual list of actual rows. We could wrap the SPI_tuptable into a
Python object and implement __getitem__ or __iter__ to emulate sequence
or mapping access.

Would it be possible to have that just pull tuples directly from the
executor? The overhead of populating the tuplestore just to drain it
again can become quite significant, and AFAICT it's completely unnecessary.

I think there are many options, depending on what you want. If you want
to materialize the result, then you have to materialize it somewhere,
and then make a Python object around that. Or you could make an
iterator interface that just reads a tuple at a time. Or maybe there
are other options.


#19 Craig Ringer
craig@2ndquadrant.com
In reply to: Jim Nasby (#17)
Re: Faster methods for getting SPI results (460% improvement)

On 6 March 2017 at 05:09, Jim Nasby <jim.nasby@openscg.com> wrote:

On 2/28/17 9:42 PM, Jim Nasby wrote:

I'll post a plpython patch that doesn't add the output format control.

I've attached the results of that. Unfortunately the speed improvement
is only 27% at this point (with 9999999 tuples). Presumably that's
because it's constructing a brand new dictionary from scratch for each
tuple.

Taking a look at this now.


#20 Craig Ringer
craig@2ndquadrant.com
In reply to: Craig Ringer (#19)
Re: Faster methods for getting SPI results (460% improvement)

On 5 April 2017 at 08:00, Craig Ringer <craig@2ndquadrant.com> wrote:

Taking a look at this now.

Rebased to current master with conflicts and whitespace errors fixed.
Review pending.


Attachments:

0001-Add-SPI_execute_callback-and-callback-based-DestRece.patch (text/x-patch, +85 -11)
0002-Modify-plpython-to-use-SPI-callbacks.patch (text/x-patch, +235 -85)
#21 Craig Ringer
craig@2ndquadrant.com
In reply to: Craig Ringer (#20)
#22 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Craig Ringer (#21)
#23 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Craig Ringer (#21)
#24 Craig Ringer
craig@2ndquadrant.com
In reply to: Jim Nasby (#22)
#25 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jim Nasby (#23)
#26 Craig Ringer
craig@2ndquadrant.com
In reply to: Craig Ringer (#24)
#27 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Craig Ringer (#26)
#28 Craig Ringer
craig@2ndquadrant.com
In reply to: Jim Nasby (#25)
#29 Craig Ringer
craig@2ndquadrant.com
In reply to: Craig Ringer (#28)
#30 Peter Eisentraut
peter_e@gmx.net
In reply to: Craig Ringer (#29)
#31 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Peter Eisentraut (#30)
#32 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#30)
#33 Craig Ringer
craig@2ndquadrant.com
In reply to: Tom Lane (#32)
#34 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Craig Ringer (#33)
#35 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#34)
#36 Andres Freund
andres@anarazel.de
In reply to: Jim Nasby (#31)
#37 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Andres Freund (#36)
#38 Andres Freund
andres@anarazel.de
In reply to: Jim Nasby (#37)
#39 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Andres Freund (#38)
#40 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#35)
#41 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#40)
#42 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Andres Freund (#41)
#43 Chapman Flack
chap@anastigmatix.net
In reply to: Craig Ringer (#28)
#44 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chapman Flack (#43)
#45 Chapman Flack
chap@anastigmatix.net
In reply to: Tom Lane (#44)
#46 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chapman Flack (#45)
#47 Chapman Flack
chap@anastigmatix.net
In reply to: Tom Lane (#46)
#48 Daniel Gustafsson
daniel@yesql.se
In reply to: Tom Lane (#46)