Do parallel queries work with only dblink not with fdw?

Started by Klaus P. Pieperalmost 10 years ago5 messagesgeneral
Jump to latest
#1Klaus P. Pieper
kpi6288@gmail.com

In order to improve cpu and disk utilization, I am testing parallel
queries.

The approach with dblink_send_query() and dblink_get_result() works in my
proof-of-concept. Runtime of my reference query was reduced from 55 seconds
to ~20seconds using 4 parallel connections. Not what I had hoped but
certainly a significant improvement.

My approach is fairly easy:

db_link_send_query('conn1', 'statement based on partitioning field');
db_link_send_query('conn2', 'statement based on partitioning field');

...

SELECT
dblink_get_result('conn1')
UNION ALL
dblink_get_result('conn2')

...

However, using fdw foreign data wrappers, I was not able to run any query
in parallel. I came across this presentation
http://de.slideshare.net/EnterpriseDB/foreign-data-wrappers-and-you-with-postgres
where it says on page 12 "parallel FDW access". Is there any example
available on how to run FDW queries in parallel?

My approach with FDW is similar to the dblink example above:

SELECT * FROM myFdwTable WHERE (clause based on partitioning field)
UNION ALL
SELECT * FROM myFdwTable WHERE (clause based on partitioning field)
...

My experience is however that the statements are carried out sequentially.

Thanks

Klaus Pieper

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Klaus P. Pieper (#1)
Re: Do parallel queries work with only dblink not with fdw?

On 05/04/2016 01:28 AM, Klaus P. wrote:

In order to improve cpu and disk utilization, I am testing parallel
queries.

The approach with dblink_send_query() and dblink_get_result() works in
my proof-of-concept. Runtime of my reference query was reduced from 55
seconds to ~20seconds using 4 parallel connections. Not what I had hoped
but certainly a significant improvement.

My approach is fairly easy:

db_link_send_query('conn1', 'statement based on partitioning field');
db_link_send_query('conn2', 'statement based on partitioning field');

...

SELECT
dblink_get_result('conn1')
UNION ALL
dblink_get_result('conn2')

...

However, using fdw foreign data wrappers, I was not able to run any
query in parallel. I came across this presentation
http://de.slideshare.net/EnterpriseDB/foreign-data-wrappers-and-you-with-postgres
where it says on page 12 "parallel FDW access". Is there any example
available on how to run FDW queries in parallel?

My approach with FDW is similar to the dblink example above:

SELECT * FROM myFdwTable WHERE (clause based on partitioning field)
UNION ALL
SELECT * FROM myFdwTable WHERE (clause based on partitioning field)
...

My experience is however that the statements are carried out sequentially.

I am not sure of the status of parallel query in FDW, but for those that
are some more information would be helpful:

1) You are using postgres_fdw, correct?

2) What version of the FDW are you using?

3) What version(s) of Postgres are you connecting from/to?

Thanks

Klaus Pieper

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Klaus P. Pieper
kpi6288@gmail.com
In reply to: Adrian Klaver (#2)
Re: Do parallel queries work with only dblink not with fdw?

-----Ursprüngliche Nachricht-----
Von: Adrian Klaver [mailto:adrian.klaver@aklaver.com]

I am not sure of the status of parallel query in FDW, but for those that are some
more information would be helpful:

1) You are using postgres_fdw, correct?

Yes.

2) What version of the FDW are you using?

Not sure if FDW could be different from the cluster version (see below). Postgres_fdw.dll shows file version 9.5.2.16088 - this is the one installed with the Windows PostgreSQL package.

3) What version(s) of Postgres are you connecting from/to?

For testing I have installed a fresh download 9.5.2 Windows x64 (on Windows Server 2008 R2, if this is of interest)

There is only one test database running in this cluster. I am connecting in a "loopback" mode to that database.

I could repeat my tests with two different databases or run it on a Linux / Debian machine if that makes a difference.

Thanks Klaus

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Marc Mamin
M.Mamin@intershop.de
In reply to: Klaus P. Pieper (#1)
Re: Do parallel queries work with only dblink not with fdw?

From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Klaus P.
Sent: Mittwoch, 4. Mai 2016 10:28
To: pgsql-general@postgresql.org
Subject: [GENERAL] Do parallel queries work with only dblink not with fdw?

In order to improve cpu and disk utilization, I am testing parallel queries.

The approach with dblink_send_query() and dblink_get_result() works in my proof-of-concept. Runtime of my reference query was reduced from 55 seconds to ~20seconds using 4 parallel connections. Not what I had hoped but certainly a significant improvement.
My approach is fairly easy:
db_link_send_query('conn1', 'statement based on partitioning field');
db_link_send_query('conn2', 'statement based on partitioning field');

...
SELECT
dblink_get_result('conn1')
UNION ALL
dblink_get_result('conn2')

...

Hello,

I'm using the db_link approach too, but rather than using dblink_get_result, I store the result in an additional table.
This was faster, at least for my use case and at the time when I implemented that solution...

something like

db_link_send_query('conn0', 'CREATE UNLOGGED TABLE my_result...
db_link_send_query('conn1', 'INSERT INTO my_result statement based on partitioning field');
db_link_send_query('conn2', 'INSERT INTO my_result statement based on partitioning field');
...
select * from my_result

regards,

Marc Mamin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Stefan Keller
sfkeller@gmail.com
In reply to: Marc Mamin (#4)
Re: Do parallel queries work with only dblink not with fdw?

Hi,

Sorry for appending to that thread, but I think this is related:
Does anyone have experience with parsel [1]http://geeohspatial.blogspot.ch/2013/12/a-simple-function-for-parallel-queries_18.html and/or it's extension
parallelsql [2]https://github.com/k1aus/parallelsql?

:Stefan

[1]: http://geeohspatial.blogspot.ch/2013/12/a-simple-function-for-parallel-queries_18.html
[2]: https://github.com/k1aus/parallelsql

2016-05-04 16:10 GMT+02:00 Marc Mamin <M.Mamin@intershop.de>:

From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Klaus P.
Sent: Mittwoch, 4. Mai 2016 10:28
To: pgsql-general@postgresql.org
Subject: [GENERAL] Do parallel queries work with only dblink not with fdw?

In order to improve cpu and disk utilization, I am testing parallel queries.

The approach with dblink_send_query() and dblink_get_result() works in my proof-of-concept. Runtime of my reference query was reduced from 55 seconds to ~20seconds using 4 parallel connections. Not what I had hoped but certainly a significant improvement.
My approach is fairly easy:
db_link_send_query('conn1', 'statement based on partitioning field');
db_link_send_query('conn2', 'statement based on partitioning field');

...
SELECT
dblink_get_result('conn1')
UNION ALL
dblink_get_result('conn2')

...

Hello,

I'm using the db_link approach too, but rather than using dblink_get_result, I store the result in an additional table.
This was faster, at least for my use case and at the time when I implemented that solution...

something like

db_link_send_query('conn0', 'CREATE UNLOGGED TABLE my_result...
db_link_send_query('conn1', 'INSERT INTO my_result statement based on partitioning field');
db_link_send_query('conn2', 'INSERT INTO my_result statement based on partitioning field');
...
select * from my_result

regards,

Marc Mamin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general