Re: asynchronous execution
[ Adjusting subject line to reflect the actual topic of discussion better. ]
On Fri, Sep 23, 2016 at 9:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Sep 23, 2016 at 8:45 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
For e.g., in the above plan which you specified, suppose:
1. Hash Join has called ExecProcNode() for the child foreign scan b, and so is waiting in ExecAsyncWaitForNode(foreign_scan_on_b).
2. The event wait list already has foreign scan a, which is on a different subtree.
3. This foreign scan a happens to be ready, so in ExecAsyncWaitForNode(), ExecDispatchNode(foreign_scan_a) is called, which returns with result_ready.
4. Since it returns result_ready, its parent node is now inserted in the callbacks array, and so its parent (Append) is executed.
5. But this Append planstate is already in the middle of executing Hash Join, and is waiting for HashJoin.

Ah, yeah, something like that could happen. I've spent much of this
week working on a new design for this feature which I think will avoid
this problem. It doesn't work yet - in fact I can't even really test
it yet. But I'll post what I've got by the end of the day today so
that anyone who is interested can look at it and critique it.
Well, I promised to post this, so here it is. It's not really working
all that well at this point, and it's definitely not doing anything
that interesting, but you can see the outline of what I have in mind.
Since Kyotaro Horiguchi found that my previous design had a
system-wide performance impact due to the ExecProcNode changes, I
decided to take a different approach here: I created an async
infrastructure where both the requestor and the requestee have to be
specifically modified to support parallelism, and then modified Append
and ForeignScan to cooperate using the new interface. Hopefully that
means that anything other than those two nodes will suffer no
performance impact. Of course, it might have other problems....
Some notes:
- EvalPlanQual rechecks are broken.
- EXPLAIN ANALYZE instrumentation is broken.
- ExecReScanAppend is broken, because the async stuff needs some way
of canceling an async request and I didn't invent anything like that
yet.
- The postgres_fdw changes pretend to be async but aren't actually.
It's just a demo of (part of) the interface at this point.
- The postgres_fdw changes also report all pg-fdw paths as
async-capable, but actually the direct-modify ones aren't, so the
regression tests fail.
- Errors in the executor can leak the WaitEventSet. Probably we need
to modify ResourceOwners to be able to own WaitEventSets.
- There are probably other bugs, too.
Whee!
Note that I've tried to solve the re-entrancy problems by (1) putting
all of the event loop's state inside the EState rather than in local
variables and (2) having the function that is called to report arrival
of a result be thoroughly different than the function that is used to
return a tuple to a synchronous caller.
Comments welcome, if you're feeling brave enough to look at anything
this half-baked.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
async-wip-2016-09-23.patch (+909, -26)
On 24 September 2016 at 06:39, Robert Haas <robertmhaas@gmail.com> wrote:
Since Kyotaro Horiguchi found that my previous design had a
system-wide performance impact due to the ExecProcNode changes, I
decided to take a different approach here: I created an async
infrastructure where both the requestor and the requestee have to be
specifically modified to support parallelism, and then modified Append
and ForeignScan to cooperate using the new interface. Hopefully that
means that anything other than those two nodes will suffer no
performance impact. Of course, it might have other problems....
I see that the reason you re-designed the asynchronous execution
implementation is that the earlier implementation showed
performance degradation in local sequential and local parallel scans.
But I checked that the ExecProcNode() changes were not significant
enough to cause the degradation. It will not call
ExecAsyncWaitForNode() unless that node supports asynchronism. Do you
feel there is anywhere else in the implementation that is really
causing this degradation? That previous implementation has some issues,
but they seemed solvable. We could resolve the plan state recursion
issue by explicitly making sure the same plan state does not get
called again while it is already executing.
Thanks
-Amit Khandekar
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Sorry for the delayed response; I'll have enough time from now on and
will address this.
At Fri, 23 Sep 2016 21:09:03 -0400, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoaXQEt4tZ03FtQhnzeDEMzBck+Lrni0UWHVVgOTnA6C1w@mail.gmail.com>
Well, I promised to post this, so here it is. It's not really working
all that well at this point, and it's definitely not doing anything
that interesting, but you can see the outline of what I have in mind.
Since Kyotaro Horiguchi found that my previous design had a
system-wide performance impact due to the ExecProcNode changes, I
decided to take a different approach here: I created an async
infrastructure where both the requestor and the requestee have to be
specifically modified to support parallelism, and then modified Append
and ForeignScan to cooperate using the new interface. Hopefully that
means that anything other than those two nodes will suffer no
performance impact. Of course, it might have other problems....

Some notes:
- EvalPlanQual rechecks are broken.
- EXPLAIN ANALYZE instrumentation is broken.
- ExecReScanAppend is broken, because the async stuff needs some way
of canceling an async request and I didn't invent anything like that
yet.
- The postgres_fdw changes pretend to be async but aren't actually.
It's just a demo of (part of) the interface at this point.
- The postgres_fdw changes also report all pg-fdw paths as
async-capable, but actually the direct-modify ones aren't, so the
regression tests fail.
- Errors in the executor can leak the WaitEventSet. Probably we need
to modify ResourceOwners to be able to own WaitEventSets.
- There are probably other bugs, too.

Whee!
Note that I've tried to solve the re-entrancy problems by (1) putting
all of the event loop's state inside the EState rather than in local
variables and (2) having the function that is called to report arrival
of a result be thoroughly different than the function that is used to
return a tuple to a synchronous caller.

Comments welcome, if you're feeling brave enough to look at anything
this half-baked.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello, thank you for the comment.
At Wed, 28 Sep 2016 10:00:08 +0530, Amit Khandekar <amitdkhan.pg@gmail.com> wrote in <CAJ3gD9fRmEhUoBMnNN8K_QrHZf7m4rmOHTFDj492oeLZff8o=w@mail.gmail.com>
On 24 September 2016 at 06:39, Robert Haas <robertmhaas@gmail.com> wrote:
Since Kyotaro Horiguchi found that my previous design had a
system-wide performance impact due to the ExecProcNode changes, I
decided to take a different approach here: I created an async
infrastructure where both the requestor and the requestee have to be
specifically modified to support parallelism, and then modified Append
and ForeignScan to cooperate using the new interface. Hopefully that
means that anything other than those two nodes will suffer no
performance impact. Of course, it might have other problems....

I see that the reason why you re-designed the asynchronous execution
implementation is because the earlier implementation showed
performance degradation in local sequential and local parallel scans.
But I checked that the ExecProcNode() changes were not that
significant as to cause the degradation.
The basic thought is that we don't accept a degradation of even as
little as around one percent for simple cases in exchange for this
feature (or similar ones).

A very simple SeqScan case runs through a very short path, where the
CPU's branch-misprediction penalty from even a few extra branches
results in a visible impact. I avoided that by using likely/unlikely,
but a more fundamental measure would be preferable.
It will not call ExecAsyncWaitForNode() unless that node
supports asynchronism.
That's true, but it takes a certain number of CPU cycles to decide
whether to call it or not. That small bit of time is the issue in focus now.
Do you feel there is anywhere else in
the implementation that is really causing this degrade ? That
previous implementation has some issues, but they seemed
solvable. We could resolve the plan state recursion issue by
explicitly making sure the same plan state does not get called
again while it is already executing.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Thank you for the thought.
At Fri, 23 Sep 2016 21:09:03 -0400, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoaXQEt4tZ03FtQhnzeDEMzBck+Lrni0UWHVVgOTnA6C1w@mail.gmail.com>
[ Adjusting subject line to reflect the actual topic of discussion better. ]
On Fri, Sep 23, 2016 at 9:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Sep 23, 2016 at 8:45 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
For e.g., in the above plan which you specified, suppose:
1. Hash Join has called ExecProcNode() for the child foreign scan b, and so is waiting in ExecAsyncWaitForNode(foreign_scan_on_b).
2. The event wait list already has foreign scan a, which is on a different subtree.
3. This foreign scan a happens to be ready, so in ExecAsyncWaitForNode(), ExecDispatchNode(foreign_scan_a) is called, which returns with result_ready.
4. Since it returns result_ready, its parent node is now inserted in the callbacks array, and so its parent (Append) is executed.
5. But this Append planstate is already in the middle of executing Hash Join, and is waiting for HashJoin.

Ah, yeah, something like that could happen. I've spent much of this
week working on a new design for this feature which I think will avoid
this problem. It doesn't work yet - in fact I can't even really test
it yet. But I'll post what I've got by the end of the day today so
that anyone who is interested can look at it and critique it.

Well, I promised to post this, so here it is. It's not really working
all that well at this point, and it's definitely not doing anything
that interesting, but you can see the outline of what I have in mind.
Since Kyotaro Horiguchi found that my previous design had a
system-wide performance impact due to the ExecProcNode changes, I
decided to take a different approach here: I created an async
infrastructure where both the requestor and the requestee have to be
specifically modified to support parallelism, and then modified Append
and ForeignScan to cooperate using the new interface. Hopefully that
means that anything other than those two nodes will suffer no
performance impact. Of course, it might have other problems....
The previous framework didn't need to distinguish async-capable from
non-capable nodes from the parent node's point of view; the changes in
ExecProcNode were required for that reason. Instead, this new one
removes the ExecProcNode changes by distinguishing the two kinds of
node in async-aware parents, that is, Append. This no longer involves
async-unaware nodes in the tuple bubbling-up mechanism, so the
reentrancy problem doesn't seem to occur.

On the other hand, consider for example the following plan, regardless
of its practicality (there should be a better example):

(async-unaware node)
- NestLoop
  - Append
    - n * ForeignScan
  - Append
    - n * ForeignScan

If the NestLoop and Appends are async-aware, all of the ForeignScans
can run asynchronously with the previous framework: the topmost
NestLoop will be awakened when the firing of any ForeignScan makes a
tuple bubble up to the NestLoop. This is because of the
no-need-to-distinguish-aware-or-not nature provided by the
ExecProcNode changes.

With the new one, in order to do the same thing, ExecAppend would in
turn have to behave differently depending on whether its parent is
async-aware or not. Doing this seems bothersome, but I am not
confident about that.

I will examine this further, especially for performance degradation
and obstacles to completing this.
Some notes:
- EvalPlanQual rechecks are broken.
- EXPLAIN ANALYZE instrumentation is broken.
- ExecReScanAppend is broken, because the async stuff needs some way
of canceling an async request and I didn't invent anything like that
yet.
- The postgres_fdw changes pretend to be async but aren't actually.
It's just a demo of (part of) the interface at this point.
- The postgres_fdw changes also report all pg-fdw paths as
async-capable, but actually the direct-modify ones aren't, so the
regression tests fail.
- Errors in the executor can leak the WaitEventSet. Probably we need
to modify ResourceOwners to be able to own WaitEventSets.
- There are probably other bugs, too.

Whee!
Note that I've tried to solve the re-entrancy problems by (1) putting
all of the event loop's state inside the EState rather than in local
variables and (2) having the function that is called to report arrival
of a result be thoroughly different than the function that is used to
return a tuple to a synchronous caller.

Comments welcome, if you're feeling brave enough to look at anything
this half-baked.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Wed, Sep 28, 2016 at 12:30 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On 24 September 2016 at 06:39, Robert Haas <robertmhaas@gmail.com> wrote:
Since Kyotaro Horiguchi found that my previous design had a
system-wide performance impact due to the ExecProcNode changes, I
decided to take a different approach here: I created an async
infrastructure where both the requestor and the requestee have to be
specifically modified to support parallelism, and then modified Append
and ForeignScan to cooperate using the new interface. Hopefully that
means that anything other than those two nodes will suffer no
performance impact. Of course, it might have other problems....

I see that the reason why you re-designed the asynchronous execution
implementation is because the earlier implementation showed
performance degradation in local sequential and local parallel scans.
But I checked that the ExecProcNode() changes were not that
significant as to cause the degradation.
I think we need some testing to prove that one way or the other. If
you can do some - say on a plan with multiple nested loop joins with
inner index-scans, which will call ExecProcNode() a lot - that would
be great. I don't think we can just rely on "it doesn't seem like it
should be slower", though - ExecProcNode() is too important a function
for us to guess at what the performance will be.
The thing I'm really worried about with either implementation is what
happens when we start to add asynchronous capability to multiple
nodes. For example, if you imagine a plan like this:
Append
-> Hash Join
   -> Foreign Scan
   -> Hash
      -> Seq Scan
-> Hash Join
   -> Foreign Scan
   -> Hash
      -> Seq Scan
In order for this to run asynchronously, you need not only Append and
Foreign Scan to be async-capable, but also Hash Join. That's true in
either approach. Things are slightly better with the original
approach, but the basic problem is there in both cases. So it seems
we need an approach that will make adding async capability to a node
really cheap, which seems like it might be a problem.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 4 October 2016 at 02:30, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Sep 28, 2016 at 12:30 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On 24 September 2016 at 06:39, Robert Haas <robertmhaas@gmail.com> wrote:
Since Kyotaro Horiguchi found that my previous design had a
system-wide performance impact due to the ExecProcNode changes, I
decided to take a different approach here: I created an async
infrastructure where both the requestor and the requestee have to be
specifically modified to support parallelism, and then modified Append
and ForeignScan to cooperate using the new interface. Hopefully that
means that anything other than those two nodes will suffer no
performance impact. Of course, it might have other problems....

I see that the reason why you re-designed the asynchronous execution
implementation is because the earlier implementation showed
performance degradation in local sequential and local parallel scans.
But I checked that the ExecProcNode() changes were not that
significant as to cause the degradation.

I think we need some testing to prove that one way or the other. If
you can do some - say on a plan with multiple nested loop joins with
inner index-scans, which will call ExecProcNode() a lot - that would
be great. I don't think we can just rely on "it doesn't seem like it
should be slower"
Agreed. I will come up with some tests.
, though - ExecProcNode() is too important a function
for us to guess at what the performance will be.
Also, parent pointers are not required in the new design. Thinking of
parent pointers, it now seems the event won't get bubbled up the tree
with the new design. But still, I think it's possible to switch over
to the other asynchronous tree when some node in the current subtree
is waiting. I am not sure, though; I will think more on that.
The thing I'm really worried about with either implementation is what
happens when we start to add asynchronous capability to multiple
nodes. For example, if you imagine a plan like this:

Append
-> Hash Join
   -> Foreign Scan
   -> Hash
      -> Seq Scan
-> Hash Join
   -> Foreign Scan
   -> Hash
      -> Seq Scan

In order for this to run asynchronously, you need not only Append and
Foreign Scan to be async-capable, but also Hash Join. That's true in
either approach. Things are slightly better with the original
approach, but the basic problem is there in both cases. So it seems
we need an approach that will make adding async capability to a node
really cheap, which seems like it might be a problem.
Yes, we might have to deal with this.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Oct 4, 2016 at 7:53 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
Also, parent pointers are not required in the new design. Thinking of
parent pointers, now it seems the event won't get bubbled up the tree
with the new design. But still, , I think it's possible to switch over
to the other asynchronous tree when some node in the current subtree
is waiting. But I am not sure, will think more on that.
The bubbling-up still happens, because each node that made an async
request gets a callback with the final response - and if it is itself
the recipient of an async request, it can use that callback to respond
to that request in turn. This version doesn't bubble up through
non-async-aware nodes, but that might be a good thing.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hello, this works, but ExecAppend gets a bit of a degradation.
At Mon, 03 Oct 2016 19:46:32 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20161003.194632.204401048.horiguchi.kyotaro@lab.ntt.co.jp>
Some notes:
- EvalPlanQual rechecks are broken.
This is fixed by adding (restoring) async-cancelation.
- EXPLAIN ANALYZE instrumentation is broken.
EXPLAIN ANALYZE seems to be working, but async-specific information is
not available yet.
- ExecReScanAppend is broken, because the async stuff needs some way
of canceling an async request and I didn't invent anything like that
yet.
Fixed in the same way as EvalPlanQual.
- The postgres_fdw changes pretend to be async but aren't actually.
It's just a demo of (part of) the interface at this point.
Applied my previous patch with some modification.
- The postgres_fdw changes also report all pg-fdw paths as
async-capable, but actually the direct-modify ones aren't, so the
regression tests fail.
All actions other than scans now do vacate_connection() before using a
connection.
- Errors in the executor can leak the WaitEventSet. Probably we need
to modify ResourceOwners to be able to own WaitEventSets.
The WaitEventSet itself is not leaked, but the epoll fd should be
closed on failure. This seems doable by TRY-CATCHing in
ExecAsyncEventLoop. (Not done yet.)
- There are probably other bugs, too.
Whee!
Note that I've tried to solve the re-entrancy problems by (1) putting
all of the event loop's state inside the EState rather than in local
variables and (2) having the function that is called to report arrival
of a result be thoroughly different than the function that is used to
return a tuple to a synchronous caller.Comments welcome, if you're feeling brave enough to look at anything
this half-baked.
This doesn't cause reentry, since it no longer bubbles up tuples
through async-unaware nodes. This framework passes tuples through
private channels between requestors and requestees.

Anyway, I amended this, made postgres_fdw async, and then finally all
regression tests passed with minor modifications. The attached patches
are the following.
0001-robert-s-2nd-framework.patch
The patch Robert shown upthread
0002-Fix-some-bugs.patch
A small patch to fix compilation errors in 0001.
0003-Modify-async-execution-infrastructure.patch
Several modifications on the infrastructure. The details are
shown after the measurement below.
0004-Make-postgres_fdw-async-capable.patch
True-async postgres-fdw.
gentblr.sql, testrun.sh, calc.pl
Performance test script suite:
gentblr.sql - creates test tables,
testrun.sh - does a single test run, and
calc.pl - drives testrun.sh and summarizes its results.
I measured performance and had the following result.
t0 - SELECT sum(a) FROM <local single table>;
pl - SELECT sum(a) FROM <4 local children>;
pf0 - SELECT sum(a) FROM <4 foreign children on single connection>;
pf1 - SELECT sum(a) FROM <4 foreign children on dedicated connections>;
The result is written as "time<ms> (std dev <ms>)"
sync
t0: 3820.33 ( 1.88)
pl: 1608.59 ( 12.06)
pf0: 7928.29 ( 46.58)
pf1: 8023.16 ( 26.43)
async
t0: 3806.31 ( 4.49) 0.4% faster (should be measurement error)
pl: 1629.17 ( 0.29) 1.3% slower
pf0: 6447.07 ( 25.19) 18.7% faster
pf1: 1876.80 ( 47.13) 76.6% faster

t0 is not affected, since the ExecProcNode stuff is gone.
pl gets a bit slower (almost the same as the simple seqscan case with
the previous patch). This should be a misprediction penalty.
pf0 and pf1 are faster, as expected.
========
The below is a summary of modifications made by 0002 and 0003 patch.
execAsync.c, execnodes.h:
- Added include "pgstat.h" to use WAIT_EVENT_ASYNC_WAIT.
- Changed the interface of ExecAsyncRequest to return whether a tuple
  is immediately available or not.
- Made ExecAsyncConfigureWait return whether it registered at least
  one wait event or not. This is used so that the caller
  (ExecAsyncEventWait) knows it has an event to wait on (for safety).
  If two or more postgres_fdw nodes share one connection, only one of
  them can be waited on at a time. It is the FDW driver's
  responsibility to ensure that at least one wait event is added; if
  that fails, WaitEventSetWait silently waits forever.
- There were separate areq->callback_pending and
  areq->request_complete flags, but they always change together, so
  they are replaced with one state variable, areq->state. A new enum
  AsyncRequestState for areq->state is added in execnodes.h.
nodeAppend.c:
- Return a tuple immediately if ExecAsyncRequest says that a
tuple is available.
- Reduced the nesting level of the for(;;) loop.
nodeForeignscan.[ch], fdwapi.h, execProcnode.c:
- Calling postgresIterateForeignScan can yield tuples in the wrong
  shape, so call ExecForeignScan instead.
- Changed the interface of AsyncConfigureWait as in execAsync.c.
- Added a ShutdownForeignScan interface.
createplan.c, ruleutils.c, plannodes.h:
- With Robert's change, EXPLAIN shows somewhat odd plans where the
  Output of Append is named after a non-parent child. This does no
  harm but is unsettling. Added the index of the parent in
  Append.referent to make it reasonable (but this looks ugly). The
  children in EXPLAIN are still in a different order from the
  definition. (expected/postgres_fdw.out is edited.)
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
0001-robert-s-2nd-framework.patch (+909, -26)
0002-Fix-some-bugs.patch (+81, -72)
0003-Modify-async-execution-infrastructure.patch (+167, -111)
0004-Make-postgres_fdw-async-capable.patch (+510, -133)
gentblr.sql
testrun.sh
calc.pl
This is the version rebased onto the current master (up to 0004), with
resource owner stuff (0005) and unlikely() (0006) added.
At Tue, 18 Oct 2016 10:30:51 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20161018.103051.30820907.horiguchi.kyotaro@lab.ntt.co.jp>
- Errors in the executor can leak the WaitEventSet. Probably we need
to modify ResourceOwners to be able to own WaitEventSets.

WaitEventSet itself is not leaked but epoll-fd should be closed
at failure. This seems doable with TRY-CATCHing in
ExecAsyncEventLoop. (not yet)
Haha, that was silly talk on my part. The wait event set can continue
to live past a timeout, and an error can happen anywhere on the way
after that. I added an entry for wait event sets to the resource owner
machinery and hung the ones created in ExecAsyncEventWait off
TopTransactionResourceOwner. Currently WaitLatchOrSocket doesn't do
so, so as not to change the current behavior. WaitEventSet doesn't
have a usable identifier for resowner.c, so currently I use its
address (pointer value) for the purpose. Patch 0005 does that.
I measured performance and had the following result.
t0 - SELECT sum(a) FROM <local single table>;
pl - SELECT sum(a) FROM <4 local children>;
pf0 - SELECT sum(a) FROM <4 foreign children on single connection>;
pf1 - SELECT sum(a) FROM <4 foreign children on dedicated connections>;

The result is written as "time<ms> (std dev <ms>)"
sync
t0: 3820.33 ( 1.88)
pl: 1608.59 ( 12.06)
pf0: 7928.29 ( 46.58)
pf1: 8023.16 ( 26.43)

async
t0: 3806.31 ( 4.49) 0.4% faster (should be error)
pl: 1629.17 ( 0.29) 1.3% slower
pf0: 6447.07 ( 25.19) 18.7% faster
pf1: 1876.80 ( 47.13) 76.6% faster

t0 is not affected since the ExecProcNode stuff has gone.
pl is getting a bit slower. (almost the same to simple seqscan of
the previous patch) This should be a misprediction penalty.
Using the likely() macro in ExecAppend seems to have shaken off the
degradation.
sync
t0: 3919.49 ( 5.95)
pl: 1637.95 ( 0.75)
pf0: 8304.20 ( 43.94)
pf1: 8222.09 ( 28.20)
async
t0: 3885.84 ( 40.20) 0.86% faster (should be measurement error, but stable in my environment)
pl: 1617.20 ( 3.51) 1.26% faster (ditto)
pf0: 6680.95 (478.72) 19.5% faster
pf1: 1886.87 ( 36.25) 77.1% faster
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
0001-robert-s-2nd-framework.patch (+909, -26)
0002-Fix-some-bugs.patch (+81, -72)
0003-Modify-async-execution-infrastructure.patch (+167, -111)
0004-Make-postgres_fdw-async-capable.patch (+510, -133)
0005-Use-resource-owner-to-prevent-wait-event-set-from-le.patch (+114, -5)
0006-Apply-unlikely-to-suggest-synchronous-route-of-ExecA.patch (+2, -3)
Hi, this is the 7th patch, to make instrumentation work.
EXPLAIN ANALYZE shows the following result with the previous patch set.
| Aggregate (cost=820.25..820.26 rows=1 width=8) (actual time=4324.676..4324.676 rows=1 loops=1)
|   -> Append (cost=0.00..791.00 rows=11701 width=4) (actual time=0.910..3663.882 rows=4000000 loops=1)
|     -> Foreign Scan on ft10 (cost=100.00..197.75 rows=2925 width=4) (never executed)
|     -> Foreign Scan on ft20 (cost=100.00..197.75 rows=2925 width=4) (never executed)
|     -> Foreign Scan on ft30 (cost=100.00..197.75 rows=2925 width=4) (never executed)
|     -> Foreign Scan on ft40 (cost=100.00..197.75 rows=2925 width=4) (never executed)
|     -> Seq Scan on pf0 (cost=0.00..0.00 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=1)
The current instrumentation code assumes that a request always either
returns a tuple or hits the end of tuples. This async framework has
two points at which the underlying nodes are executed: ExecAsyncRequest
and ExecAsyncEventLoop. So I'm not sure whether this is appropriate,
but anyway it seems to show sane numbers.
| Aggregate (cost=820.25..820.26 rows=1 width=8) (actual time=4571.205..4571.206 rows=1 loops=1)
|   -> Append (cost=0.00..791.00 rows=11701 width=4) (actual time=1.362..3893.114 rows=4000000 loops=1)
|     -> Foreign Scan on ft10 (cost=100.00..197.75 rows=2925 width=4) (actual time=1.056..770.863 rows=1000000 loops=1)
|     -> Foreign Scan on ft20 (cost=100.00..197.75 rows=2925 width=4) (actual time=0.461..767.840 rows=1000000 loops=1)
|     -> Foreign Scan on ft30 (cost=100.00..197.75 rows=2925 width=4) (actual time=0.474..782.547 rows=1000000 loops=1)
|     -> Foreign Scan on ft40 (cost=100.00..197.75 rows=2925 width=4) (actual time=0.156..765.920 rows=1000000 loops=1)
|     -> Seq Scan on pf0 (cost=0.00..0.00 rows=1 width=4) (never executed)
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
0007-Add-instrumentation-to-async-execution.patch (+20, -2)
Hello,
I'm not sure this is in a suitable shape for the commit fest, but I
decided to register it to ride on the bus for 10.0.
Hi, this is the 7th patch to make instrumentation work.
This is a PoC patch of the asynchronous execution feature, based on an
executor infrastructure Robert proposed. These patches are rebased on
the current master.
0001-robert-s-2nd-framework.patch
Robert's executor async infrastructure. Async-driver nodes register
their async-capable children, and synchronization and data transfer
are done out of band of the ordinary ExecProcNode channel. So async
execution no longer disturbs async-unaware nodes or slows them down.
0002-Fix-some-bugs.patch
Some fixes for 0001 to make it work. This is kept separate just to
preserve the shape of the 0001 patch.
0003-Modify-async-execution-infrastructure.patch
The original infrastructure doesn't work when multiple foreign tables
are on the same connection. This makes it work.
0004-Make-postgres_fdw-async-capable.patch
Makes postgres_fdw work asynchronously.
0005-Use-resource-owner-to-prevent-wait-event-set-from-le.patch
This addresses a problem Robert pointed out about the 0001 patch: the
WaitEventSet used for async execution can be leaked on error.
0006-Apply-unlikely-to-suggest-synchronous-route-of-ExecA.patch
ExecAppend gets a bit slower due to branch-misprediction penalties.
This fixes it by using the unlikely() macro.
0007-Add-instrumentation-to-async-execution.patch
As described above for 0001, the async infrastructure conveys tuples
outside the ExecProcNode channel, so EXPLAIN ANALYZE requires special
treatment to show sane results. This patch attempts that.
A result of a performance measurement is in this message.
/messages/by-id/20161025.182150.230901487.horiguchi.kyotaro@lab.ntt.co.jp
| t0 - SELECT sum(a) FROM <local single table>;
| pl - SELECT sum(a) FROM <4 local children>;
| pf0 - SELECT sum(a) FROM <4 foreign children on single connection>;
| pf1 - SELECT sum(a) FROM <4 foreign children on dedicated connections>;
...
| async
| t0: 3885.84 ( 40.20) 0.86% faster (should be within error, but stable on my env)
| pl: 1617.20 ( 3.51) 1.26% faster (ditto)
| pf0: 6680.95 (478.72) 19.5% faster
| pf1: 1886.87 ( 36.25) 77.1% faster
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
0001-robert-s-2nd-framework.patch (text/x-patch)
0002-Fix-some-bugs.patch (text/x-patch)
0003-Modify-async-execution-infrastructure.patch (text/x-patch)
0004-Make-postgres_fdw-async-capable.patch (text/x-patch)
0005-Use-resource-owner-to-prevent-wait-event-set-from-le.patch (text/x-patch)
0006-Apply-unlikely-to-suggest-synchronous-route-of-ExecA.patch (text/x-patch)
0007-Add-instrumentation-to-async-execution.patch (text/x-patch)
Hello, this is a maintenance post of rebased patches.
I added a change to ResourceOwnerData that was missing in 0005.
At Mon, 31 Oct 2016 10:39:12 +0900 (JST), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20161031.103912.217430542.horiguchi.kyotaro@lab.ntt.co.jp>
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello,
I cannot respond until next Monday, so I am moving this to the next
CF myself.
At Tue, 15 Nov 2016 20:25:13 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20161115.202513.268072050.horiguchi.kyotaro@lab.ntt.co.jp>
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
This patch conflicts with e13029a (es_query_dsa) so I rebased
this.
At Mon, 31 Oct 2016 10:39:12 +0900 (JST), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20161031.103912.217430542.horiguchi.kyotaro@lab.ntt.co.jp>
--
Kyotaro Horiguchi
NTT Open Source Software Center
I noticed that this patch is conflicting with 665d1fa (Logical
replication) so I rebased this. Only executor/Makefile
conflicted.
At Mon, 31 Oct 2016 10:39:12 +0900 (JST), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20161031.103912.217430542.horiguchi.kyotaro@lab.ntt.co.jp>
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Tue, Jan 31, 2017 at 12:45 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
I noticed that this patch is conflicting with 665d1fa (Logical
replication) so I rebased this. Only executor/Makefile
conflicted.
The patches still apply, moved to CF 2017-03. Be aware of that:
$ git diff HEAD~6 --check
contrib/postgres_fdw/postgres_fdw.c:388: indent with spaces.
+ PendingAsyncRequest *areq,
contrib/postgres_fdw/postgres_fdw.c:389: indent with spaces.
+ bool reinit);
src/backend/utils/resowner/resowner.c:1332: new blank line at EOF.
--
Michael
Thank you.
At Wed, 1 Feb 2017 14:11:58 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqS0MhZrzgMVQeFEnnKABcsMnNULd8=O0PG7_h-FUp5aEQ@mail.gmail.com>
Thank you for letting me know about that command. I changed my check
scripts to use it, and it seems to work fine on both commit and
rebase.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
I noticed that this patch is conflicting with 665d1fa (Logical
replication) so I rebased this. Only executor/Makefile
conflicted.
I was lucky enough to see an infinite loop when using this patch, which I
fixed by this change:
diff --git a/src/backend/executor/execAsync.c b/src/backend/executor/execAsync.c
new file mode 100644
index 588ba18..9b87fbd
*** a/src/backend/executor/execAsync.c
--- b/src/backend/executor/execAsync.c
*************** ExecAsyncEventWait(EState *estate, long
*** 364,369 ****
--- 364,370 ----
if ((w->events & WL_LATCH_SET) != 0)
{
+ ResetLatch(MyLatch);
process_latch_set = true;
continue;
}
Actually _almost_ fixed because at some point one of the following
Assert(areq->state == ASYNC_WAITING);
statements fired. I think it was the immediately following one, but I can
imagine the same to happen in the branch
if (process_latch_set)
...
I think the wants_process_latch field of PendingAsyncRequest is not useful
on its own, because the process latch can be set for reasons completely
unrelated to the asynchronous processing. If the asynchronous node is to use
the latch to signal its readiness, I think an additional flag is needed in
the request which tells ExecAsyncEventWait that the latch was set by the
asynchronous node.
BTW, do we really need the ASYNC_CALLBACK_PENDING state? I can imagine the
async node either changing ASYNC_WAITING directly to ASYNC_COMPLETE, or
leaving it ASYNC_WAITING if the data is not ready.
In addition, the following comments are based only on code review, I didn't
verify my understanding experimentally:
* Isn't it possible for AppendState.as_asyncresult to contain multiple
responses from the same async node? Since the array stores TupleTableSlot
pointers instead of the actual tuples (so multiple items of as_asyncresult
point to the same slot), I suspect the slot contents might no longer be
defined when the Append node eventually tries to return them to the upper
plan.
* For the WaitEvent subsystem to work, I think postgres_fdw should keep a
separate libpq connection per node, not per user mapping. Currently the
connections are cached by user mapping, but it's legal to locate multiple
child postgres_fdw nodes of Append plan on the same remote server. I expect
that these "co-located" nodes would currently use the same user mapping and
therefore the same connection.
--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
On Fri, Feb 3, 2017 at 5:04 AM, Antonin Houska <ah@cybertec.at> wrote:
Hi, I've been testing this patch because it seemed like it would help a use
case of mine, but I can't tell if it's currently working for cases other
than a local parent table that has many child partitions which happen to be
foreign tables. Is it? I was hoping to use it for a case like:
select x, sum(y) from one_remote_table
union all
select x, sum(y) from another_remote_table
union all
select x, sum(y) from a_third_remote_table
but while aggregates do appear to be pushed down, it seems that the remote
tables are being queried in sequence. Am I doing something wrong?