[POC] FETCH limited by bytes.
Hello, as the discuttion on async fetching on postgres_fdw, FETCH
with data-size limitation would be useful to get memory usage
stability of postgres_fdw.
Is such a feature and syntax could be allowed to be added?
==
Postgres_fdw fetches tuples from remote servers using cursor. The
transfer gets faster as the number of fetch decreases. On the
other hand buffer size for the fetched tuples widely varies
according to their average length. 100 tuples per fetch is quite
small for short tuples but larger fetch size will easily cause
memory exhaustion. However, there's no way to know it in advance.
One means to settle the contradiction would be a FETCH which
sends result limiting by size, not the number of tuples. So I'd
like to propose this.
This patch is a POC for the feature. For exapmle,
FETCH 10000 LIMIT 1000000 FROM c1;
This FETCH retrieves up to 10000 tuples but cut out just after
the total tuple length exceeds 1MB. (It does not literally
"LIMIT" in that sense)
The syntax added by this patch is described as following.
FETCH [FORWARD|BACKWARD] <ALL|SignedIconst> LIMIT Iconst [FROM|IN] curname
The "data size" to be compared with the LIMIT size is the
summation of the result of the following expression. The
appropriateness of it should be arguable.
[if tupleslot has tts_tuple]
HEAPTUPLESIZE + slot->tts_tuple->t_len
[else]
HEAPTUPLESIZE +
heap_compute_data_size(slot->tts_tupleDescriptor,
slot->tts_values,
slot->tts_isnull);
========================
This patch does following changes,
- This patch adds the parameter "size" to following functions
(standard_)ExecutorRun / ExecutePlan / RunFromStore
PortalRun / PortalRunSelect / PortalRunFetch / DoPortalRunFetch
- The core is in StandardExecutorRun and RunFromStore. Simplly
sum up the sent tuple length and compare against the given
limit.
- struct FetchStmt and EState has new member.
- The modifications in gram.y affects on ecpg parser. I think I
could fix them but with no confidence :(
- Modified the corespondence parts of the changes above in
auto_explain and pg_stat_statments only in parameter list.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
0001-Size-limitation-feature-of-FETCH-v0.patchtext/x-patch; charset=us-asciiDownload+248-60
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
Hello, as the discuttion on async fetching on postgres_fdw, FETCH
with data-size limitation would be useful to get memory usage
stability of postgres_fdw.
Is such a feature and syntax could be allowed to be added?
This seems like a lot of work, and frankly an incredibly ugly API,
for a benefit that is entirely hypothetical. Have you got numbers
showing any actual performance win for postgres_fdw?
Even if we wanted to do something like this, I strongly object to
measuring size by heap_compute_data_size. That's not a number that users
would normally have any direct knowledge of; nor does it have anything
at all to do with the claimed use-case, where what you'd really need to
measure is bytes transmitted down the wire. (The difference is not small:
for instance, toasted values would likely still be toasted at the point
where you're measuring.)
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thank you for the comment.
The automatic way to determin the fetch_size looks become too
much for the purpose. An example of non-automatic way is a new
foreign table option like 'fetch_size' but this exposes the
inside too much... Which do you think is preferable?
Thu, 22 Jan 2015 11:17:52 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <24503.1421943472@sss.pgh.pa.us>
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
Hello, as the discuttion on async fetching on postgres_fdw, FETCH
with data-size limitation would be useful to get memory usage
stability of postgres_fdw.Is such a feature and syntax could be allowed to be added?
This seems like a lot of work, and frankly an incredibly ugly API,
for a benefit that is entirely hypothetical. Have you got numbers
showing any actual performance win for postgres_fdw?
The API is a rush work to make the path for the new parameter
(but, yes, I did too much for the purpose that use from
postgres_fdw..) and it can be any saner syntax but it's not the
time to do so yet.
The data-size limitation, any size to limit, would give
significant gain especially for small sized rows.
This patch began from the fact that it runs about twice faster
when fetch size = 10000 than 100.
/messages/by-id/20150116.171849.109146500.horiguchi.kyotaro@lab.ntt.co.jp
I took exec times to get 1M rows from localhost via postgres_fdw
and it showed the following numbers.
=# SELECT a from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.75s
100 60 6.2s 6000 (0.006)
10000 60 2.7s 600000 (0.6 )
33333 60 2.2s 1999980 (2.0 )
66666 60 2.4s 3999960 (4.0 )
=# SELECT a, b, c from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.8s
100 204 12 s 20400 (0.02 )
1000 204 10 s 204000 (0.2 )
10000 204 5.8s 2040000 (2 )
20000 204 5.9s 4080000 (4 )
=# SELECT a, b, d from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.8s
100 1356 17 s 135600 (0.136)
1000 1356 15 s 1356000 (1.356)
1475 1356 13 s 2000100 (2.0 )
2950 1356 13 s 4000200 (4.0 )
The definitions of the environment are the following.
CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost', dbname 'postgres');
CREATE USER MAPPING FOR PUBLIC SERVER sv1;
CREATE TABLE lt1 (a int, b timestamp, c text, d text);
CREATE FOREIGN TABLE ft1 (a int, b timestamp, c text, d text) SERVER sv1 OPTIONS (table_name 'lt1');
INSERT INTO lt1 (SELECT a, now(), repeat('x', 128), repeat('x', 1280) FROM generate_series(0, 999999) a);
The "avg row size" is alloced_mem/fetch_size and the alloced_mem
is the sum of HeapTuple[fetch_size] and (HEAPTUPLESIZE +
tup->t_len) for all stored tuples in the receiver side,
fetch_more_data() in postgres_fdw.
They are about 50% gain for the smaller tuple size and 25% for
the larger. They looks to be optimal at where alloced_mem is
around 2MB by the reason unknown to me. Anyway the difference
seems to be significant.
Even if we wanted to do something like this, I strongly object to
measuring size by heap_compute_data_size. That's not a number that users
would normally have any direct knowledge of; nor does it have anything
at all to do with the claimed use-case, where what you'd really need to
measure is bytes transmitted down the wire. (The difference is not small:
for instance, toasted values would likely still be toasted at the point
where you're measuring.)
Sure. Finally, the attached patch #1 which does the following
things.
- Sender limits the number of tuples using the sum of the net
length of the column values to be sent, not including protocol
overhead. It is calculated in the added function
slot_compute_attr_size(), using raw length for compressed
values.
- postgres_fdw calculates fetch limit bytes by the following
formula,
MAX_FETCH_MEM - MAX_FETCH_SIZE * (estimated overhead per tuple);
The result of the patch is as follows. MAX_FETCH_MEM = 2MiB and
MAX_FETCH_SIZE = 30000.
fetch_size, avg row size(*1), time, max alloced_mem/fetch(Mbytes)
(auto) 60 2.4s 1080000 ( 1.08)
(auto) 204 7.3s 536400 ( 0.54)
(auto) 1356 15 s 430236 ( 0.43)
This is meaningfully fast but the patch looks too big and the
meaning of the new parameter is hard to understand..:(
On the other hand the cause of the displacements of alloced_mem
shown above is per-tuple overhead, the sum of which is unknown
before execution. The second patch makes FETCH accept the tuple
overhead bytes. The result seems pretty good, but I think this
might be too spcialized to this usage.
MAX_FETCH_SIZE = 30000 and MAX_FETCH_MEM = 2MiB,
max_fetch_size, avg row size(*1), time, max alloced_mem/fetch(MiBytes)
30000 60 2.3s 1080000 ( 1.0)
9932 204 5.7s 1787760 ( 1.7)
1376 1356 13 s 1847484 ( 1.8)
MAX_FETCH_SIZE = 25000 and MAX_FETCH_MEM = 1MiB,
max_fetch_size, avg row size(*1), time, max alloced_mem/fetch(MiBytes)
25000 60 2.4s 900000 ( 0.86)
4358 204 6.6s 816840 ( 0.78)
634 1356 16 s 844488 ( 0.81)
MAX_FETCH_SIZE = 10000 and MAX_FETCH_MEM = 0.5MiB,
max_fetch_size, avg row size(*1), time, max alloced_mem/fetch(MiBytes)
10000 60 2.8s 360000 ( 0.35)
2376 204 7.8s 427680 ( 0.41)
332 1356 17 s 442224 ( 0.42)
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Last year I was working on a patch to postgres_fdw where the fetch_size
could be set at the table level and the server level.
I was able to get the settings parsed and they would show up in
pg_foreign_table
and pg_foreign_servers. Unfortunately, I'm not very familiar with how
foreign data wrappers work, so I wasn't able to figure out how to get these
custom values passed from the PgFdwRelationInfo struct into the
query's PgFdwScanState
struct.
I bring this up only because it might be a simpler solution, in that the
table designer could set the fetch size very high for narrow tables, and
lower or default for wider tables. It's also a very clean syntax, just
another option on the table and/or server creation.
My incomplete patch is attached.
On Tue, Jan 27, 2015 at 4:24 AM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Show quoted text
Thank you for the comment.
The automatic way to determin the fetch_size looks become too
much for the purpose. An example of non-automatic way is a new
foreign table option like 'fetch_size' but this exposes the
inside too much... Which do you think is preferable?Thu, 22 Jan 2015 11:17:52 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <
24503.1421943472@sss.pgh.pa.us>Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
Hello, as the discuttion on async fetching on postgres_fdw, FETCH
with data-size limitation would be useful to get memory usage
stability of postgres_fdw.Is such a feature and syntax could be allowed to be added?
This seems like a lot of work, and frankly an incredibly ugly API,
for a benefit that is entirely hypothetical. Have you got numbers
showing any actual performance win for postgres_fdw?The API is a rush work to make the path for the new parameter
(but, yes, I did too much for the purpose that use from
postgres_fdw..) and it can be any saner syntax but it's not the
time to do so yet.The data-size limitation, any size to limit, would give
significant gain especially for small sized rows.This patch began from the fact that it runs about twice faster
when fetch size = 10000 than 100./messages/by-id/20150116.171849.109146500.horiguchi.kyotaro@lab.ntt.co.jp
I took exec times to get 1M rows from localhost via postgres_fdw
and it showed the following numbers.=# SELECT a from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.75s
100 60 6.2s 6000 (0.006)
10000 60 2.7s 600000 (0.6 )
33333 60 2.2s 1999980 (2.0 )
66666 60 2.4s 3999960 (4.0 )=# SELECT a, b, c from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.8s
100 204 12 s 20400 (0.02 )
1000 204 10 s 204000 (0.2 )
10000 204 5.8s 2040000 (2 )
20000 204 5.9s 4080000 (4 )=# SELECT a, b, d from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.8s
100 1356 17 s 135600 (0.136)
1000 1356 15 s 1356000 (1.356)
1475 1356 13 s 2000100 (2.0 )
2950 1356 13 s 4000200 (4.0 )The definitions of the environment are the following.
CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host
'localhost', dbname 'postgres');
CREATE USER MAPPING FOR PUBLIC SERVER sv1;
CREATE TABLE lt1 (a int, b timestamp, c text, d text);
CREATE FOREIGN TABLE ft1 (a int, b timestamp, c text, d text) SERVER sv1
OPTIONS (table_name 'lt1');
INSERT INTO lt1 (SELECT a, now(), repeat('x', 128), repeat('x', 1280) FROM
generate_series(0, 999999) a);The "avg row size" is alloced_mem/fetch_size and the alloced_mem
is the sum of HeapTuple[fetch_size] and (HEAPTUPLESIZE +
tup->t_len) for all stored tuples in the receiver side,
fetch_more_data() in postgres_fdw.They are about 50% gain for the smaller tuple size and 25% for
the larger. They looks to be optimal at where alloced_mem is
around 2MB by the reason unknown to me. Anyway the difference
seems to be significant.Even if we wanted to do something like this, I strongly object to
measuring size by heap_compute_data_size. That's not a number that users
would normally have any direct knowledge of; nor does it have anything
at all to do with the claimed use-case, where what you'd really need to
measure is bytes transmitted down the wire. (The difference is notsmall:
for instance, toasted values would likely still be toasted at the point
where you're measuring.)Sure. Finally, the attached patch #1 which does the following
things.- Sender limits the number of tuples using the sum of the net
length of the column values to be sent, not including protocol
overhead. It is calculated in the added function
slot_compute_attr_size(), using raw length for compressed
values.- postgres_fdw calculates fetch limit bytes by the following
formula,MAX_FETCH_MEM - MAX_FETCH_SIZE * (estimated overhead per tuple);
The result of the patch is as follows. MAX_FETCH_MEM = 2MiB and
MAX_FETCH_SIZE = 30000.fetch_size, avg row size(*1), time, max alloced_mem/fetch(Mbytes)
(auto) 60 2.4s 1080000 ( 1.08)
(auto) 204 7.3s 536400 ( 0.54)
(auto) 1356 15 s 430236 ( 0.43)This is meaningfully fast but the patch looks too big and the
meaning of the new parameter is hard to understand..:(On the other hand the cause of the displacements of alloced_mem
shown above is per-tuple overhead, the sum of which is unknown
before execution. The second patch makes FETCH accept the tuple
overhead bytes. The result seems pretty good, but I think this
might be too spcialized to this usage.MAX_FETCH_SIZE = 30000 and MAX_FETCH_MEM = 2MiB,
max_fetch_size, avg row size(*1), time, max
alloced_mem/fetch(MiBytes)
30000 60 2.3s 1080000 ( 1.0)
9932 204 5.7s 1787760 ( 1.7)
1376 1356 13 s 1847484 ( 1.8)MAX_FETCH_SIZE = 25000 and MAX_FETCH_MEM = 1MiB,
max_fetch_size, avg row size(*1), time, max
alloced_mem/fetch(MiBytes)
25000 60 2.4s 900000 ( 0.86)
4358 204 6.6s 816840 ( 0.78)
634 1356 16 s 844488 ( 0.81)MAX_FETCH_SIZE = 10000 and MAX_FETCH_MEM = 0.5MiB,
max_fetch_size, avg row size(*1), time, max
alloced_mem/fetch(MiBytes)
10000 60 2.8s 360000 ( 0.35)
2376 204 7.8s 427680 ( 0.41)
332 1356 17 s 442224 ( 0.42)regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Attachments:
diff_so_far.difftext/plain; charset=US-ASCII; name=diff_so_far.diffDownload+25-4
Hmm, somehow I removed some recipients, especially the
list. Sorry for the duplicate.
-----
Sorry, I've been back. Thank you for the comment.
Do you have any insight into where I would pass the custom row fetches from
the table struct to the scan struct?
Yeah it's one simple way to tune it, if the user knows the
appropreate value.
Last year I was working on a patch to postgres_fdw where the fetch_size
could be set at the table level and the server level.I was able to get the settings parsed and they would show up in
pg_foreign_table
and pg_foreign_servers. Unfortunately, I'm not very familiar with how
foreign data wrappers work, so I wasn't able to figure out how to get these
custom values passed from the PgFdwRelationInfo struct into the
query's PgFdwScanState
struct.
Directly answering, the states needed to be shared among several
stages are holded within fdw_private. Your new variable
fpinfo->fetch_size can be read in postgresGetForeignPlan. It
newly creates another fdw_private. You can pass your values to
ForeignPlan making it hold the value there. Finally, you will get
it in postgresBeginForeginScan and can set it into
PgFdwScanState.
I bring this up only because it might be a simpler solution, in that the
table designer could set the fetch size very high for narrow tables, and
lower or default for wider tables. It's also a very clean syntax, just
another option on the table and/or server creation.My incomplete patch is attached.
However, the fetch_size is not needed by planner (so far), so we
can simply read the options in postgresBeginForeignScan() and set
into PgFdwScanState. This runs once per exection.
Finally, the attached patch will work as you intended.
What do you think about this?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Show quoted text
On Tue, Jan 27, 2015 at 4:24 AM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:Thank you for the comment.
The automatic way to determin the fetch_size looks become too
much for the purpose. An example of non-automatic way is a new
foreign table option like 'fetch_size' but this exposes the
inside too much... Which do you think is preferable?Thu, 22 Jan 2015 11:17:52 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <
24503.1421943472@sss.pgh.pa.us>Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
Hello, as the discuttion on async fetching on postgres_fdw, FETCH
with data-size limitation would be useful to get memory usage
stability of postgres_fdw.Is such a feature and syntax could be allowed to be added?
This seems like a lot of work, and frankly an incredibly ugly API,
for a benefit that is entirely hypothetical. Have you got numbers
showing any actual performance win for postgres_fdw?The API is a rush work to make the path for the new parameter
(but, yes, I did too much for the purpose that use from
postgres_fdw..) and it can be any saner syntax but it's not the
time to do so yet.The data-size limitation, any size to limit, would give
significant gain especially for small sized rows.This patch began from the fact that it runs about twice faster
when fetch size = 10000 than 100./messages/by-id/20150116.171849.109146500.horiguchi.kyotaro@lab.ntt.co.jp
I took exec times to get 1M rows from localhost via postgres_fdw
and it showed the following numbers.=# SELECT a from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.75s
100 60 6.2s 6000 (0.006)
10000 60 2.7s 600000 (0.6 )
33333 60 2.2s 1999980 (2.0 )
66666 60 2.4s 3999960 (4.0 )=# SELECT a, b, c from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.8s
100 204 12 s 20400 (0.02 )
1000 204 10 s 204000 (0.2 )
10000 204 5.8s 2040000 (2 )
20000 204 5.9s 4080000 (4 )=# SELECT a, b, d from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.8s
100 1356 17 s 135600 (0.136)
1000 1356 15 s 1356000 (1.356)
1475 1356 13 s 2000100 (2.0 )
2950 1356 13 s 4000200 (4.0 )The definitions of the environment are the following.
CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host
'localhost', dbname 'postgres');
CREATE USER MAPPING FOR PUBLIC SERVER sv1;
CREATE TABLE lt1 (a int, b timestamp, c text, d text);
CREATE FOREIGN TABLE ft1 (a int, b timestamp, c text, d text) SERVER sv1
OPTIONS (table_name 'lt1');
INSERT INTO lt1 (SELECT a, now(), repeat('x', 128), repeat('x', 1280) FROM
generate_series(0, 999999) a);The "avg row size" is alloced_mem/fetch_size and the alloced_mem
is the sum of HeapTuple[fetch_size] and (HEAPTUPLESIZE +
tup->t_len) for all stored tuples in the receiver side,
fetch_more_data() in postgres_fdw.They are about 50% gain for the smaller tuple size and 25% for
the larger. They looks to be optimal at where alloced_mem is
around 2MB by the reason unknown to me. Anyway the difference
seems to be significant.Even if we wanted to do something like this, I strongly object to
measuring size by heap_compute_data_size. That's not a number that users
would normally have any direct knowledge of; nor does it have anything
at all to do with the claimed use-case, where what you'd really need to
measure is bytes transmitted down the wire. (The difference is notsmall:
for instance, toasted values would likely still be toasted at the point
where you're measuring.)Sure. Finally, the attached patch #1 which does the following
things.- Sender limits the number of tuples using the sum of the net
length of the column values to be sent, not including protocol
overhead. It is calculated in the added function
slot_compute_attr_size(), using raw length for compressed
values.- postgres_fdw calculates fetch limit bytes by the following
formula,MAX_FETCH_MEM - MAX_FETCH_SIZE * (estimated overhead per tuple);
The result of the patch is as follows. MAX_FETCH_MEM = 2MiB and
MAX_FETCH_SIZE = 30000.fetch_size, avg row size(*1), time, max alloced_mem/fetch(Mbytes)
(auto) 60 2.4s 1080000 ( 1.08)
(auto) 204 7.3s 536400 ( 0.54)
(auto) 1356 15 s 430236 ( 0.43)This is meaningfully fast but the patch looks too big and the
meaning of the new parameter is hard to understand..:(On the other hand the cause of the displacements of alloced_mem
shown above is per-tuple overhead, the sum of which is unknown
before execution. The second patch makes FETCH accept the tuple
overhead bytes. The result seems pretty good, but I think this
might be too spcialized to this usage.MAX_FETCH_SIZE = 30000 and MAX_FETCH_MEM = 2MiB,
max_fetch_size, avg row size(*1), time, max
alloced_mem/fetch(MiBytes)
30000 60 2.3s 1080000 ( 1.0)
9932 204 5.7s 1787760 ( 1.7)
1376 1356 13 s 1847484 ( 1.8)MAX_FETCH_SIZE = 25000 and MAX_FETCH_MEM = 1MiB,
max_fetch_size, avg row size(*1), time, max
alloced_mem/fetch(MiBytes)
25000 60 2.4s 900000 ( 0.86)
4358 204 6.6s 816840 ( 0.78)
634 1356 16 s 844488 ( 0.81)MAX_FETCH_SIZE = 10000 and MAX_FETCH_MEM = 0.5MiB,
max_fetch_size, avg row size(*1), time, max
alloced_mem/fetch(MiBytes)
10000 60 2.8s 360000 ( 0.35)
2376 204 7.8s 427680 ( 0.41)
332 1356 17 s 442224 ( 0.42)regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Attachments:
0001-Make-fetch_size-settable-per-foreign-server-and-fore.patchtext/x-patch; charset=us-asciiDownload+42-2
I applied this patch to REL9_4_STABLE, and I was able to connect to a
foreign database (redshift, actually).
the basic outline of the test is below, names changed to protect my
employment.
create extension if not exists postgres_fdw;
create server redshift_server foreign data wrapper postgres_fdw
options ( host 'some.hostname.ext', port '5439', dbname 'redacted',
fetch_size '150' );
create user mapping for public server redshift_server options ( user
'redacted_user', password 'comeonyouarekiddingright' );
create foreign table redshift_tab150 ( <colspecs> )
server redshift_server options (table_name 'redacted_table', schema_name
'redacted_schema' );
create foreign table redshift_tab151 ( <colspecs> )
server redshift_server options (table_name 'redacted_table', schema_name
'redacted_schema', fetch_size '151' );
-- i don't expect the fdw to push the aggregate, this is just a test to see
what query shows up in stv_inflight.
select count(*) from redshift_ccp150;
-- i don't expect the fdw to push the aggregate, this is just a test to see
what query shows up in stv_inflight.
select count(*) from redshift_ccp151;
For those of you that aren't familiar with Redshift, it's a columnar
database that seems to be a fork of postgres 8.cough. You can connect to it
with modern libpq programs (psql, psycopg2, etc).
Redshift has a table, stv_inflight, which serves about the same purpose as
pg_stat_activity. Redshift seems to perform better with very high fetch
sizes (10,000 is a good start), so the default foreign data wrapper didn't
perform so well.
I was able to confirm that the first query showed "FETCH 150 FROM c1" as
the query, which is normal highly unhelpful, but in this case it confirms
that tables created in redshift_server do by default use the fetch_size
option given during server creation.
I was also able to confirm that the second query showed "FETCH 151 FROM c1"
as the query, which shows that per-table overrides also work.
I'd be happy to stop here, but Kyotaro might feel differently. With this
limited patch, one could estimate the number of rows that would fit into
the desired memory block based on the row width and set fetch_size
accordingly.
But we could go further, and have a fetch_max_memory option only at the
table level, and the fdw could do that same memory / estimated_row_size
calculation and derive fetch_size that way *at table creation time*, not
query time.
Thanks to Kyotaro and Bruce Momjian for their help.
On Mon, Feb 2, 2015 at 2:27 AM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Show quoted text
Hmm, somehow I removed some recipients, especially the
list. Sorry for the duplicate.-----
Sorry, I've been back. Thank you for the comment.Do you have any insight into where I would pass the custom row fetches
from
the table struct to the scan struct?
Yeah it's one simple way to tune it, if the user knows the
appropreate value.Last year I was working on a patch to postgres_fdw where the fetch_size
could be set at the table level and the server level.I was able to get the settings parsed and they would show up in
pg_foreign_table
and pg_foreign_servers. Unfortunately, I'm not very familiar with how
foreign data wrappers work, so I wasn't able to figure out how to getthese
custom values passed from the PgFdwRelationInfo struct into the
query's PgFdwScanState
struct.Directly answering, the states needed to be shared among several
stages are holded within fdw_private. Your new variable
fpinfo->fetch_size can be read in postgresGetForeignPlan. It
newly creates another fdw_private. You can pass your values to
ForeignPlan making it hold the value there. Finally, you will get
it in postgresBeginForeginScan and can set it into
PgFdwScanState.I bring this up only because it might be a simpler solution, in that the
table designer could set the fetch size very high for narrow tables, and
lower or default for wider tables. It's also a very clean syntax, just
another option on the table and/or server creation.My incomplete patch is attached.
However, the fetch_size is not needed by planner (so far), so we
can simply read the options in postgresBeginForeignScan() and set
into PgFdwScanState. This runs once per exection.Finally, the attached patch will work as you intended.
What do you think about this?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software CenterOn Tue, Jan 27, 2015 at 4:24 AM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:Thank you for the comment.
The automatic way to determin the fetch_size looks become too
much for the purpose. An example of non-automatic way is a new
foreign table option like 'fetch_size' but this exposes the
inside too much... Which do you think is preferable?Thu, 22 Jan 2015 11:17:52 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote
in <
24503.1421943472@sss.pgh.pa.us>
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
Hello, as the discuttion on async fetching on postgres_fdw, FETCH
with data-size limitation would be useful to get memory usage
stability of postgres_fdw.Is such a feature and syntax could be allowed to be added?
This seems like a lot of work, and frankly an incredibly ugly API,
for a benefit that is entirely hypothetical. Have you got numbers
showing any actual performance win for postgres_fdw?The API is a rush work to make the path for the new parameter
(but, yes, I did too much for the purpose that use from
postgres_fdw..) and it can be any saner syntax but it's not the
time to do so yet.The data-size limitation, any size to limit, would give
significant gain especially for small sized rows.This patch began from the fact that it runs about twice faster
when fetch size = 10000 than 100./messages/by-id/20150116.171849.109146500.horiguchi.kyotaro@lab.ntt.co.jp
I took exec times to get 1M rows from localhost via postgres_fdw
and it showed the following numbers.=# SELECT a from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.75s
100 60 6.2s 6000 (0.006)
10000 60 2.7s 600000 (0.6 )
33333 60 2.2s 1999980 (2.0 )
66666 60 2.4s 3999960 (4.0 )=# SELECT a, b, c from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.8s
100 204 12 s 20400 (0.02 )
1000 204 10 s 204000 (0.2 )
10000 204 5.8s 2040000 (2 )
20000 204 5.9s 4080000 (4 )=# SELECT a, b, d from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.8s
100 1356 17 s 135600 (0.136)
1000 1356 15 s 1356000 (1.356)
1475 1356 13 s 2000100 (2.0 )
2950 1356 13 s 4000200 (4.0 )The definitions of the environment are the following.
CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host
'localhost', dbname 'postgres');
CREATE USER MAPPING FOR PUBLIC SERVER sv1;
CREATE TABLE lt1 (a int, b timestamp, c text, d text);
CREATE FOREIGN TABLE ft1 (a int, b timestamp, c text, d text) SERVERsv1
OPTIONS (table_name 'lt1');
INSERT INTO lt1 (SELECT a, now(), repeat('x', 128), repeat('x', 1280)FROM
generate_series(0, 999999) a);
The "avg row size" is alloced_mem/fetch_size and the alloced_mem
is the sum of HeapTuple[fetch_size] and (HEAPTUPLESIZE +
tup->t_len) for all stored tuples in the receiver side,
fetch_more_data() in postgres_fdw.They are about 50% gain for the smaller tuple size and 25% for
the larger. They looks to be optimal at where alloced_mem is
around 2MB by the reason unknown to me. Anyway the difference
seems to be significant.Even if we wanted to do something like this, I strongly object to
measuring size by heap_compute_data_size. That's not a number thatusers
would normally have any direct knowledge of; nor does it have
anything
at all to do with the claimed use-case, where what you'd really need
to
measure is bytes transmitted down the wire. (The difference is not
small:
for instance, toasted values would likely still be toasted at the
point
where you're measuring.)
Sure. Finally, the attached patch #1 which does the following
things.- Sender limits the number of tuples using the sum of the net
length of the column values to be sent, not including protocol
overhead. It is calculated in the added function
slot_compute_attr_size(), using raw length for compressed
values.- postgres_fdw calculates fetch limit bytes by the following
formula,MAX_FETCH_MEM - MAX_FETCH_SIZE * (estimated overhead per tuple);
The result of the patch is as follows. MAX_FETCH_MEM = 2MiB and
MAX_FETCH_SIZE = 30000.fetch_size, avg row size(*1), time, max alloced_mem/fetch(Mbytes)
(auto) 60 2.4s 1080000 ( 1.08)
(auto) 204 7.3s 536400 ( 0.54)
(auto) 1356 15 s 430236 ( 0.43)This is meaningfully fast but the patch looks too big and the
meaning of the new parameter is hard to understand..:(On the other hand the cause of the displacements of alloced_mem
shown above is per-tuple overhead, the sum of which is unknown
before execution. The second patch makes FETCH accept the tuple
overhead bytes. The result seems pretty good, but I think this
might be too spcialized to this usage.MAX_FETCH_SIZE = 30000 and MAX_FETCH_MEM = 2MiB,
max_fetch_size, avg row size(*1), time, max
alloced_mem/fetch(MiBytes)
30000 60 2.3s 1080000 ( 1.0)
9932 204 5.7s 1787760 ( 1.7)
1376 1356 13 s 1847484 ( 1.8)MAX_FETCH_SIZE = 25000 and MAX_FETCH_MEM = 1MiB,
max_fetch_size, avg row size(*1), time, max
alloced_mem/fetch(MiBytes)
25000 60 2.4s 900000 ( 0.86)
4358 204 6.6s 816840 ( 0.78)
634 1356 16 s 844488 ( 0.81)MAX_FETCH_SIZE = 10000 and MAX_FETCH_MEM = 0.5MiB,
max_fetch_size, avg row size(*1), time, max
alloced_mem/fetch(MiBytes)
10000 60 2.8s 360000 ( 0.35)
2376 204 7.8s 427680 ( 0.41)
332 1356 17 s 442224 ( 0.42)regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello,
Redshift has a table, stv_inflight, which serves about the same purpose as
pg_stat_activity. Redshift seems to perform better with very high fetch
sizes (10,000 is a good start), so the default foreign data wrapper didn't
perform so well.
I agree with you.
I was able to confirm that the first query showed "FETCH 150 FROM c1" as
the query, which is normal highly unhelpful, but in this case it confirms
that tables created in redshift_server do by default use the fetch_size
option given during server creation.I was also able to confirm that the second query showed "FETCH 151 FROM c1"
as the query, which shows that per-table overrides also work.I'd be happy to stop here, but Kyotaro might feel differently.
This is enough in its own way, of course.
With this
limited patch, one could estimate the number of rows that would fit into
the desired memory block based on the row width and set fetch_size
accordingly.
The users including me will be happy with it when the users know
how to determin the fetch size. Especially the remote tables are
very few or the configuration will be enough stable.
On widely distributed systems, it would be far difficult to tune
fetch size manually for every foreign tables, so finally it would
be left at the default and safe size, it's 100 or so.
This is the similar discussion about max_wal_size on another
thread. Calculating fetch size is far tougher for users than
setting expected memory usage, I think.
But we could go further, and have a fetch_max_memory option only at the
table level, and the fdw could do that same memory / estimated_row_size
calculation and derive fetch_size that way *at table creation time*, not
query time.
We cannot know the real length of the text type data in advance,
other than that, even defined as char(n), the n is the
theoretically(or in-design) maximum size for the field but in the
most cases the mean length of the real data would be far small
than that. For that reason, calculating the ratio at the table
creation time seems to be difficult.
However, I agree to the Tom's suggestion that the changes in
FETCH statement is defenitly ugly, especially the "overhead"
argument is prohibitive even for me:(
Thanks to Kyotaro and Bruce Momjian for their help.
Not at all.
regardes,
At Wed, 4 Feb 2015 18:06:02 -0500, Corey Huinker <corey.huinker@gmail.com> wrote in <CADkLM=eTpKYX5VOfjLr0VvfXhEZbC2UeakN=P6MXMg7S86Cdqw@mail.gmail.com>
I applied this patch to REL9_4_STABLE, and I was able to connect to a
foreign database (redshift, actually).the basic outline of the test is below, names changed to protect my
employment.create extension if not exists postgres_fdw;
create server redshift_server foreign data wrapper postgres_fdw
options ( host 'some.hostname.ext', port '5439', dbname 'redacted',
fetch_size '150' );create user mapping for public server redshift_server options ( user
'redacted_user', password 'comeonyouarekiddingright' );create foreign table redshift_tab150 ( <colspecs> )
server redshift_server options (table_name 'redacted_table', schema_name
'redacted_schema' );create foreign table redshift_tab151 ( <colspecs> )
server redshift_server options (table_name 'redacted_table', schema_name
'redacted_schema', fetch_size '151' );-- i don't expect the fdw to push the aggregate, this is just a test to see
what query shows up in stv_inflight.
select count(*) from redshift_ccp150;-- i don't expect the fdw to push the aggregate, this is just a test to see
what query shows up in stv_inflight.
select count(*) from redshift_ccp151;For those of you that aren't familiar with Redshift, it's a columnar
database that seems to be a fork of postgres 8.cough. You can connect to it
with modern libpq programs (psql, psycopg2, etc).
Redshift has a table, stv_inflight, which serves about the same purpose as
pg_stat_activity. Redshift seems to perform better with very high fetch
sizes (10,000 is a good start), so the default foreign data wrapper didn't
perform so well.I was able to confirm that the first query showed "FETCH 150 FROM c1" as
the query, which is normal highly unhelpful, but in this case it confirms
that tables created in redshift_server do by default use the fetch_size
option given during server creation.I was also able to confirm that the second query showed "FETCH 151 FROM c1"
as the query, which shows that per-table overrides also work.I'd be happy to stop here, but Kyotaro might feel differently. With this
limited patch, one could estimate the number of rows that would fit into
the desired memory block based on the row width and set fetch_size
accordingly.But we could go further, and have a fetch_max_memory option only at the
table level, and the fdw could do that same memory / estimated_row_size
calculation and derive fetch_size that way *at table creation time*, not
query time.Thanks to Kyotaro and Bruce Momjian for their help.
On Mon, Feb 2, 2015 at 2:27 AM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:Hmm, somehow I removed some recipients, especially the
list. Sorry for the duplicate.-----
Sorry, I've been back. Thank you for the comment.Do you have any insight into where I would pass the custom row fetches
from
the table struct to the scan struct?
Yeah it's one simple way to tune it, if the user knows the
appropreate value.Last year I was working on a patch to postgres_fdw where the fetch_size
could be set at the table level and the server level.I was able to get the settings parsed and they would show up in
pg_foreign_table
and pg_foreign_servers. Unfortunately, I'm not very familiar with how
foreign data wrappers work, so I wasn't able to figure out how to getthese
custom values passed from the PgFdwRelationInfo struct into the
query's PgFdwScanState
struct.Directly answering, the states needed to be shared among several
stages are holded within fdw_private. Your new variable
fpinfo->fetch_size can be read in postgresGetForeignPlan. It
newly creates another fdw_private. You can pass your values to
ForeignPlan making it hold the value there. Finally, you will get
it in postgresBeginForeginScan and can set it into
PgFdwScanState.I bring this up only because it might be a simpler solution, in that the
table designer could set the fetch size very high for narrow tables, and
lower or default for wider tables. It's also a very clean syntax, just
another option on the table and/or server creation.My incomplete patch is attached.
However, the fetch_size is not needed by planner (so far), so we
can simply read the options in postgresBeginForeignScan() and set
into PgFdwScanState. This runs once per exection.Finally, the attached patch will work as you intended.
What do you think about this?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Attached is a diff containing the original (working) patch from my
(incomplete) patch, plus regression test changes and documentation changes.
While it's easy to regression-test the persistence of the fetch_size
options, I am confounded as to how we would show that the fetch_size
setting was respected. I've seen it with my own eyes viewing the query log
on redshift, but I see no way to automate that. Suggestions welcome.
On Fri, Feb 6, 2015 at 3:11 AM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Show quoted text
Hello,
Redshift has a table, stv_inflight, which serves about the same purpose
as
pg_stat_activity. Redshift seems to perform better with very high fetch
sizes (10,000 is a good start), so the default foreign data wrapperdidn't
perform so well.
I agree with you.
I was able to confirm that the first query showed "FETCH 150 FROM c1" as
the query, which is normal highly unhelpful, but in this case it confirms
that tables created in redshift_server do by default use the fetch_size
option given during server creation.I was also able to confirm that the second query showed "FETCH 151 FROM
c1"
as the query, which shows that per-table overrides also work.
I'd be happy to stop here, but Kyotaro might feel differently.
This is enough in its own way, of course.
With this
limited patch, one could estimate the number of rows that would fit into
the desired memory block based on the row width and set fetch_size
accordingly.The users including me will be happy with it when the users know
how to determin the fetch size. Especially the remote tables are
very few or the configuration will be enough stable.On widely distributed systems, it would be far difficult to tune
fetch size manually for every foreign tables, so finally it would
be left at the default and safe size, it's 100 or so.This is the similar discussion about max_wal_size on another
thread. Calculating fetch size is far tougher for users than
setting expected memory usage, I think.But we could go further, and have a fetch_max_memory option only at the
table level, and the fdw could do that same memory / estimated_row_size
calculation and derive fetch_size that way *at table creation time*, not
query time.We cannot know the real length of the text type data in advance,
other than that, even defined as char(n), the n is the
theoretically(or in-design) maximum size for the field but in the
most cases the mean length of the real data would be far small
than that. For that reason, calculating the ratio at the table
creation time seems to be difficult.However, I agree to the Tom's suggestion that the changes in
FETCH statement is defenitly ugly, especially the "overhead"
argument is prohibitive even for me:(Thanks to Kyotaro and Bruce Momjian for their help.
Not at all.
regardes,
At Wed, 4 Feb 2015 18:06:02 -0500, Corey Huinker <corey.huinker@gmail.com>
wrote in <CADkLM=eTpKYX5VOfjLr0VvfXhEZbC2UeakN=
P6MXMg7S86Cdqw@mail.gmail.com>I applied this patch to REL9_4_STABLE, and I was able to connect to a
foreign database (redshift, actually).the basic outline of the test is below, names changed to protect my
employment.create extension if not exists postgres_fdw;
create server redshift_server foreign data wrapper postgres_fdw
options ( host 'some.hostname.ext', port '5439', dbname 'redacted',
fetch_size '150' );create user mapping for public server redshift_server options ( user
'redacted_user', password 'comeonyouarekiddingright' );create foreign table redshift_tab150 ( <colspecs> )
server redshift_server options (table_name 'redacted_table', schema_name
'redacted_schema' );create foreign table redshift_tab151 ( <colspecs> )
server redshift_server options (table_name 'redacted_table', schema_name
'redacted_schema', fetch_size '151' );-- i don't expect the fdw to push the aggregate, this is just a test to
see
what query shows up in stv_inflight.
select count(*) from redshift_ccp150;-- i don't expect the fdw to push the aggregate, this is just a test to
see
what query shows up in stv_inflight.
select count(*) from redshift_ccp151;For those of you that aren't familiar with Redshift, it's a columnar
database that seems to be a fork of postgres 8.cough. You can connect toit
with modern libpq programs (psql, psycopg2, etc).
Redshift has a table, stv_inflight, which serves about the same purposeas
pg_stat_activity. Redshift seems to perform better with very high fetch
sizes (10,000 is a good start), so the default foreign data wrapperdidn't
perform so well.
I was able to confirm that the first query showed "FETCH 150 FROM c1" as
the query, which is normal highly unhelpful, but in this case it confirms
that tables created in redshift_server do by default use the fetch_size
option given during server creation.I was also able to confirm that the second query showed "FETCH 151 FROM
c1"
as the query, which shows that per-table overrides also work.
I'd be happy to stop here, but Kyotaro might feel differently. With this
limited patch, one could estimate the number of rows that would fit into
the desired memory block based on the row width and set fetch_size
accordingly.But we could go further, and have a fetch_max_memory option only at the
table level, and the fdw could do that same memory / estimated_row_size
calculation and derive fetch_size that way *at table creation time*, not
query time.Thanks to Kyotaro and Bruce Momjian for their help.
On Mon, Feb 2, 2015 at 2:27 AM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:Hmm, somehow I removed some recipients, especially the
list. Sorry for the duplicate.-----
Sorry, I've been back. Thank you for the comment.Do you have any insight into where I would pass the custom row
fetches
from
the table struct to the scan struct?
Yeah it's one simple way to tune it, if the user knows the
appropreate value.Last year I was working on a patch to postgres_fdw where the
fetch_size
could be set at the table level and the server level.
I was able to get the settings parsed and they would show up in
pg_foreign_table
and pg_foreign_servers. Unfortunately, I'm not very familiar with how
foreign data wrappers work, so I wasn't able to figure out how to getthese
custom values passed from the PgFdwRelationInfo struct into the
query's PgFdwScanState
struct.Directly answering, the states needed to be shared among several
stages are holded within fdw_private. Your new variable
fpinfo->fetch_size can be read in postgresGetForeignPlan. It
newly creates another fdw_private. You can pass your values to
ForeignPlan making it hold the value there. Finally, you will get
it in postgresBeginForeginScan and can set it into
PgFdwScanState.I bring this up only because it might be a simpler solution, in that
the
table designer could set the fetch size very high for narrow tables,
and
lower or default for wider tables. It's also a very clean syntax,
just
another option on the table and/or server creation.
My incomplete patch is attached.
However, the fetch_size is not needed by planner (so far), so we
can simply read the options in postgresBeginForeignScan() and set
into PgFdwScanState. This runs once per exection.Finally, the attached patch will work as you intended.
What do you think about this?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
postgres_fdw_fetch_size_20150227.difftext/plain; charset=US-ASCII; name=postgres_fdw_fetch_size_20150227.diffDownload+105-9
On 2015-02-27 13:50:22 -0500, Corey Huinker wrote:
+static DefElem* +get_option(List *options, char *optname) +{ + ListCell *lc; + + foreach(lc, options) + { + DefElem *def = (DefElem *) lfirst(lc); + + if (strcmp(def->defname, optname) == 0) + return def; + } + return NULL; +}
/*
* Do nothing in EXPLAIN (no ANALYZE) case. node->fdw_state stays NULL.
@@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
server = GetForeignServer(table->serverid);
user = GetUserMapping(userid, server->serverid);+ /* Reading table options */ + fsstate->fetch_size = -1; + + def = get_option(table->options, "fetch_size"); + if (!def) + def = get_option(server->options, "fetch_size"); + + if (def) + { + fsstate->fetch_size = strtod(defGetString(def), NULL); + if (fsstate->fetch_size < 0) + elog(ERROR, "invalid fetch size for foreign table \"%s\"", + get_rel_name(table->relid)); + } + else + fsstate->fetch_size = 100;
I don't think it's a good idea to make such checks at runtime - and
either way it's somethign that should be reported back using an
ereport(), not an elog.
Also, it seems somewhat wrong to determine this at execution
time. Shouldn't this rather be done when creating the foreign scan node?
And be a part of the scan state?
Have you thought about how this option should cooperate with join
pushdown once implemented?
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Sep 2, 2015 at 9:08 AM, Andres Freund <andres@anarazel.de> wrote:
On 2015-02-27 13:50:22 -0500, Corey Huinker wrote:
+static DefElem* +get_option(List *options, char *optname) +{ + ListCell *lc; + + foreach(lc, options) + { + DefElem *def = (DefElem *) lfirst(lc); + + if (strcmp(def->defname, optname) == 0) + return def; + } + return NULL; +}/*
* Do nothing in EXPLAIN (no ANALYZE) case. node->fdw_state staysNULL.
@@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node,
int eflags)
server = GetForeignServer(table->serverid);
user = GetUserMapping(userid, server->serverid);+ /* Reading table options */ + fsstate->fetch_size = -1; + + def = get_option(table->options, "fetch_size"); + if (!def) + def = get_option(server->options, "fetch_size"); + + if (def) + { + fsstate->fetch_size = strtod(defGetString(def), NULL); + if (fsstate->fetch_size < 0) + elog(ERROR, "invalid fetch size for foreign table\"%s\"",
+ get_rel_name(table->relid)); + } + else + fsstate->fetch_size = 100;I don't think it's a good idea to make such checks at runtime - and
either way it's somethign that should be reported back using an
ereport(), not an elog.
Also, it seems somewhat wrong to determine this at execution
time. Shouldn't this rather be done when creating the foreign scan node?
And be a part of the scan state?
I agree, that was my original plan, but I wasn't familiar enough with the
FDW architecture to know where the table struct and the scan struct were
both exposed in the same function.
What I submitted incorporated some of Kyotaro's feedback (and working
patch) to my original incomplete patch.
Have you thought about how this option should cooperate with join
pushdown once implemented?
I hadn't until now, but I think the only sensible thing would be to
disregard table-specific settings once a second foreign table is detected,
and instead consider only the server-level setting.
I suppose one could argue that if ALL the tables in the join had the same
table-level setting, we should go with that, but I think that would be
complicated, expensive, and generally a good argument for changing the
server setting instead.
Hello,
@@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node,
int eflags)
...
+ def = get_option(table->options, "fetch_size");
I don't think it's a good idea to make such checks at runtime - and
either way it's somethign that should be reported back using an
ereport(), not an elog.Also, it seems somewhat wrong to determine this at execution
time. Shouldn't this rather be done when creating the foreign scan node?
And be a part of the scan state?I agree, that was my original plan, but I wasn't familiar enough with the
FDW architecture to know where the table struct and the scan struct were
both exposed in the same function.What I submitted incorporated some of Kyotaro's feedback (and working
patch) to my original incomplete patch.
Sorry, it certainly shouldn't be a good place to do such thing. I
easily selected the place in order to avoid adding new similar
member in multiple existing structs (PgFdwRelationInfo and
PgFdwScanState).
Having a new member fetch_size is added in PgFdwRelationInfo and
PgFdwScanState, I think postgresGetForeignRelSize is the best
place to do that, from the point that it collects basic
information needed to calculate scan costs.
# Fetch sizes of foreign join would be the future issue..
typedef struct PgFdwRelationInfo
{
...
+ int fetch_size; /* fetch size for this remote table */
====================
postgreGetForeignRelSize()
{
...
fpinfo->table = GetForeignTable(foreigntableid);
fpinfo->server = GetForeignServer(fpinfo->table->serverid);
+ def = get_option(table->options, "fetch_size");
+ ..
+ fpinfo->fetch_size = strtod(defGetString...
Also it is doable in postgresGetForeignPlan and doing there
removes redundant copy of fetch_size from PgFdwRelation to
PgFdwScanState but theoretical basis would be weak.
regards,
On 2015-02-27 13:50:22 -0500, Corey Huinker wrote:
+static DefElem* +get_option(List *options, char *optname) +{ + ListCell *lc; + + foreach(lc, options) + { + DefElem *def = (DefElem *) lfirst(lc); + + if (strcmp(def->defname, optname) == 0) + return def; + } + return NULL; +}/*
* Do nothing in EXPLAIN (no ANALYZE) case. node->fdw_state staysNULL.
@@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node,
int eflags)
server = GetForeignServer(table->serverid);
user = GetUserMapping(userid, server->serverid);+ /* Reading table options */ + fsstate->fetch_size = -1; + + def = get_option(table->options, "fetch_size"); + if (!def) + def = get_option(server->options, "fetch_size"); + + if (def) + { + fsstate->fetch_size = strtod(defGetString(def), NULL); + if (fsstate->fetch_size < 0) + elog(ERROR, "invalid fetch size for foreign table\"%s\"",
+ get_rel_name(table->relid)); + } + else + fsstate->fetch_size = 100;I don't think it's a good idea to make such checks at runtime - and
either way it's somethign that should be reported back using an
ereport(), not an elog.Also, it seems somewhat wrong to determine this at execution
time. Shouldn't this rather be done when creating the foreign scan node?
And be a part of the scan state?I agree, that was my original plan, but I wasn't familiar enough with the
FDW architecture to know where the table struct and the scan struct were
both exposed in the same function.What I submitted incorporated some of Kyotaro's feedback (and working
patch) to my original incomplete patch.Have you thought about how this option should cooperate with join
pushdown once implemented?I hadn't until now, but I think the only sensible thing would be to
disregard table-specific settings once a second foreign table is detected,
and instead consider only the server-level setting.I suppose one could argue that if ALL the tables in the join had the same
table-level setting, we should go with that, but I think that would be
complicated, expensive, and generally a good argument for changing the
server setting instead.
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Sep 4, 2015 at 2:45 AM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hello,
@@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node,
int eflags)
...
+ def = get_option(table->options, "fetch_size");
I don't think it's a good idea to make such checks at runtime - and
either way it's somethign that should be reported back using an
ereport(), not an elog.Also, it seems somewhat wrong to determine this at execution
time. Shouldn't this rather be done when creating the foreign scannode?
And be a part of the scan state?
I agree, that was my original plan, but I wasn't familiar enough with the
FDW architecture to know where the table struct and the scan struct were
both exposed in the same function.What I submitted incorporated some of Kyotaro's feedback (and working
patch) to my original incomplete patch.Sorry, it certainly shouldn't be a good place to do such thing. I
easily selected the place in order to avoid adding new similar
member in multiple existing structs (PgFdwRelationInfo and
PgFdwScanState).Having a new member fetch_size is added in PgFdwRelationInfo and
PgFdwScanState, I think postgresGetForeignRelSize is the best
place to do that, from the point that it collects basic
information needed to calculate scan costs.# Fetch sizes of foreign join would be the future issue..
typedef struct PgFdwRelationInfo
{...
+ int fetch_size; /* fetch size for this remote table */====================
postgreGetForeignRelSize()
{...
fpinfo->table = GetForeignTable(foreigntableid);
fpinfo->server = GetForeignServer(fpinfo->table->serverid);+ def = get_option(table->options, "fetch_size"); + .. + fpinfo->fetch_size = strtod(defGetString...Also it is doable in postgresGetForeignPlan and doing there
removes redundant copy of fetch_size from PgFdwRelation to
PgFdwScanState but theoretical basis would be weak.regards,
On 2015-02-27 13:50:22 -0500, Corey Huinker wrote:
+static DefElem* +get_option(List *options, char *optname) +{ + ListCell *lc; + + foreach(lc, options) + { + DefElem *def = (DefElem *) lfirst(lc); + + if (strcmp(def->defname, optname) == 0) + return def; + } + return NULL; +}/*
* Do nothing in EXPLAIN (no ANALYZE) case. node->fdw_statestays
NULL.
@@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node,
int eflags)
server = GetForeignServer(table->serverid);
user = GetUserMapping(userid, server->serverid);+ /* Reading table options */ + fsstate->fetch_size = -1; + + def = get_option(table->options, "fetch_size"); + if (!def) + def = get_option(server->options, "fetch_size"); + + if (def) + { + fsstate->fetch_size = strtod(defGetString(def), NULL); + if (fsstate->fetch_size < 0) + elog(ERROR, "invalid fetch size for foreigntable
\"%s\"",
+ get_rel_name(table->relid)); + } + else + fsstate->fetch_size = 100;I don't think it's a good idea to make such checks at runtime - and
either way it's somethign that should be reported back using an
ereport(), not an elog.Also, it seems somewhat wrong to determine this at execution
time. Shouldn't this rather be done when creating the foreign scannode?
And be a part of the scan state?
I agree, that was my original plan, but I wasn't familiar enough with the
FDW architecture to know where the table struct and the scan struct were
both exposed in the same function.What I submitted incorporated some of Kyotaro's feedback (and working
patch) to my original incomplete patch.Have you thought about how this option should cooperate with join
pushdown once implemented?I hadn't until now, but I think the only sensible thing would be to
disregard table-specific settings once a second foreign table isdetected,
and instead consider only the server-level setting.
I suppose one could argue that if ALL the tables in the join had the same
table-level setting, we should go with that, but I think that would be
complicated, expensive, and generally a good argument for changing the
server setting instead.--
Kyotaro Horiguchi
NTT Open Source Software Center
Ok, with some guidance from RhodiumToad (thanks!) I was able to get the
proper RelOptInfo->Plan->Scan handoff.
What I *don't* know how to do is show that the proper fetch sizes are being
used on the remote server with the resources available in the regression
test. *Suggestions welcome.*
This patch works for my original added test-cases, and works for me
connecting to a redshift cluster that we have, the queries show up in the
console like this:
FETCH 101 FROM c1
FETCH 30 FROM c1
FETCH 50 FROM c1
The (redacted) source of that test is as follows:
begin;
create extension if not exists postgres_fdw;
create server redshift foreign data wrapper postgres_fdw
options (host 'REDACTED', port '5439', dbname 'REDACTED', fetch_size '101');
select * from pg_foreign_server;
create user mapping for public server redshift
options ( user 'REDACTED', password 'REDACTED');
select * from pg_user_mappings;
create foreign table test_table ( date date, tval text )
server redshift
options (table_name 'REDACTED');
select count(*) from ( select tval from test_table where date = 'REDACTED'
) x;
alter server redshift options ( set fetch_size '30' );
select count(*) from ( select tval from test_table where date = 'REDACTED'
) x;
alter foreign table test_table options ( fetch_size '50' );
select count(*) from ( select tval from test_table where date = 'REDACTED'
) x;
rollback;
Attached is the patch / diff against current master.
Attachments:
0002-Make-fetch_size-settable-per-foreign-server-and-fore.patchtext/x-patch; charset=US-ASCII; name=0002-Make-fetch_size-settable-per-foreign-server-and-fore.patchDownload+144-20
On Mon, Sep 21, 2015 at 1:52 PM, Corey Huinker <corey.huinker@gmail.com> wrote:
Attached is the patch / diff against current master.
I have moved this patch to next CF, there is a new patch, but no reviews.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Dec 23, 2015 at 8:43 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:
On Mon, Sep 21, 2015 at 1:52 PM, Corey Huinker <corey.huinker@gmail.com>
wrote:Attached is the patch / diff against current master.
I have moved this patch to next CF, there is a new patch, but no reviews.
--
Michael
That's fair. I'm still at a loss for how to show that the fetch size was
respected by the remote server, suggestions welcome!
On 12/23/15 12:15 PM, Corey Huinker wrote:
That's fair. I'm still at a loss for how to show that the fetch size was
respected by the remote server, suggestions welcome!
A combination of repeat() and generate_series()?
I'm guessing it's not that obvious and that I'm missing something; can
you elaborate?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Dec 23, 2015 at 3:08 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 12/23/15 12:15 PM, Corey Huinker wrote:
That's fair. I'm still at a loss for how to show that the fetch size was
respected by the remote server, suggestions welcome!A combination of repeat() and generate_series()?
I'm guessing it's not that obvious and that I'm missing something; can you
elaborate?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
I'll try. So the basic test of whether the FDW respected the fetch limit is
this:
1. create foreign server using postgres_fdw, create foreign table.
2. run a query against that table. it works. great.
3. alter server set fetch size option to 101 (or any number different from
100)
4. run same query against the table. the server side should show that the
result set was fetched in 101 row chunks[1].
5. alter table set fetch size option to 102 (or any number different from
100 and the number you picked in #3)
6. run same query against the table. the server side should show that the
result set was fetched in 102 row chunks[1].
The parts marked [1] are the problem...because the way I know it works is
looking at the query console on the remote redshift cluster where the query
column reads "FETCH 101 in c1" or somesuch rather than the query text.
That's great, *I* know it works, but I don't know how capture that
information from a vanilla postgres server, and I don't know if we can do
the regression with a loopback connection, or if we'd need to set up a
second pg instance for the regression test scaffolding.
Hello,
At Thu, 24 Dec 2015 18:31:42 -0500, Corey Huinker <corey.huinker@gmail.com> wrote in <CADkLM=fh+ZUEykcCDu8P0PPrOyYwLEp5OBRjKCe5O7swqDF65w@mail.gmail.com>
On Wed, Dec 23, 2015 at 3:08 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 12/23/15 12:15 PM, Corey Huinker wrote:
That's fair. I'm still at a loss for how to show that the fetch size was
respected by the remote server, suggestions welcome!A combination of repeat() and generate_series()?
I'm guessing it's not that obvious and that I'm missing something; can you
elaborate?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.comI'll try. So the basic test of whether the FDW respected the fetch limit is
this:1. create foreign server using postgres_fdw, create foreign table.
2. run a query against that table. it works. great.
3. alter server set fetch size option to 101 (or any number different from
100)
4. run same query against the table. the server side should show that the
result set was fetched in 101 row chunks[1].
5. alter table set fetch size option to 102 (or any number different from
100 and the number you picked in #3)
6. run same query against the table. the server side should show that the
result set was fetched in 102 row chunks[1].The parts marked [1] are the problem...because the way I know it works is
looking at the query console on the remote redshift cluster where the query
column reads "FETCH 101 in c1" or somesuch rather than the query text.
That's great, *I* know it works, but I don't know how capture that
information from a vanilla postgres server, and I don't know if we can do
the regression with a loopback connection, or if we'd need to set up a
second pg instance for the regression test scaffolding.
I believe it won't be needed as a regression test but DEBUGn
messages could be usable to see "what was sent to the remote
side". It can be shown to the console so easily done in
regression test. See the deocumentation for client_min_messages
for details. (It could receive far many messages then expected,
though, and maybe fragile for changing in other part.)
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Dec 25, 2015 at 12:31 AM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hello,
At Thu, 24 Dec 2015 18:31:42 -0500, Corey Huinker <corey.huinker@gmail.com>
wrote in <CADkLM=
fh+ZUEykcCDu8P0PPrOyYwLEp5OBRjKCe5O7swqDF65w@mail.gmail.com>On Wed, Dec 23, 2015 at 3:08 PM, Jim Nasby <Jim.Nasby@bluetreble.com>
wrote:
On 12/23/15 12:15 PM, Corey Huinker wrote:
That's fair. I'm still at a loss for how to show that the fetch size
was
respected by the remote server, suggestions welcome!
A combination of repeat() and generate_series()?
I'm guessing it's not that obvious and that I'm missing something; can
you
elaborate?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.comI'll try. So the basic test of whether the FDW respected the fetch limit
is
this:
1. create foreign server using postgres_fdw, create foreign table.
2. run a query against that table. it works. great.
3. alter server set fetch size option to 101 (or any number differentfrom
100)
4. run same query against the table. the server side should show that the
result set was fetched in 101 row chunks[1].
5. alter table set fetch size option to 102 (or any number different from
100 and the number you picked in #3)
6. run same query against the table. the server side should show that the
result set was fetched in 102 row chunks[1].The parts marked [1] are the problem...because the way I know it works is
looking at the query console on the remote redshift cluster where thequery
column reads "FETCH 101 in c1" or somesuch rather than the query text.
That's great, *I* know it works, but I don't know how capture that
information from a vanilla postgres server, and I don't know if we can do
the regression with a loopback connection, or if we'd need to set up a
second pg instance for the regression test scaffolding.I believe it won't be needed as a regression test but DEBUGn
messages could be usable to see "what was sent to the remote
side". It can be shown to the console so easily done in
regression test. See the deocumentation for client_min_messages
for details. (It could receive far many messages then expected,
though, and maybe fragile for changing in other part.)regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Ok, I'll see what debug-level messages reveal.
Have you got numbers showing any actual performance win for postgres_fdw?
For JDBC purposes, it would be nice to have an ability of asking
backend "to stop fetching if produced more than X MiB of response
data".
For small table (4 int4 fields), having decent fetchSize (~1000) makes
result processing 7 times faster than with fetchSize of 50 rows (14 ms
-> 2 ms for 2000 rows).
Here are the measurements: [1]https://github.com/pgjdbc/pgjdbc/issues/292#issue-82595473 and [2]https://github.com/pgjdbc/pgjdbc/issues/292#issuecomment-107019387.
Note: it is not required to precisely follow given "max fetch bytes"
limit. It would be enough just to stop after certain amount of data
was sent.
The whole thing of using limited fetch size is to avoid running out of
memory at client side.
I do not think developers care how many rows is fetched at once. It
they do, they should rather use "limit X" SQL syntax.
Do you have a suggestion for such a use case?
For fixed-size data types, JDBC driver can estimate "max sane fetch
size", however:
1) In order to know data types, a roundtrip is required. This means
the first fetch must be conservative, thus small queries would be
penalized.
2) For variable length types there is no way to estimate "sane number
of rows", except of using "average row size of already received data".
This is not reliable, especially if the first rows have nulls, and
subsequent ones contain non-empty strings.
[1]: https://github.com/pgjdbc/pgjdbc/issues/292#issue-82595473
[2]: https://github.com/pgjdbc/pgjdbc/issues/292#issuecomment-107019387
Vladimir
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Dec 26, 2015 at 6:16 AM, Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> wrote:
Have you got numbers showing any actual performance win for postgres_fdw?
For JDBC purposes, it would be nice to have an ability of asking
backend "to stop fetching if produced more than X MiB of response
data".
For small table (4 int4 fields), having decent fetchSize (~1000) makes
result processing 7 times faster than with fetchSize of 50 rows (14 ms
-> 2 ms for 2000 rows).
Here are the measurements: [1] and [2].Note: it is not required to precisely follow given "max fetch bytes"
limit. It would be enough just to stop after certain amount of data
was sent.
The whole thing of using limited fetch size is to avoid running out of
memory at client side.
I do not think developers care how many rows is fetched at once. It
they do, they should rather use "limit X" SQL syntax.Do you have a suggestion for such a use case?
For fixed-size data types, JDBC driver can estimate "max sane fetch
size", however:
1) In order to know data types, a roundtrip is required. This means
the first fetch must be conservative, thus small queries would be
penalized.
2) For variable length types there is no way to estimate "sane number
of rows", except of using "average row size of already received data".
This is not reliable, especially if the first rows have nulls, and
subsequent ones contain non-empty strings.[1]: https://github.com/pgjdbc/pgjdbc/issues/292#issue-82595473
[2]: https://github.com/pgjdbc/pgjdbc/issues/292#issuecomment-107019387Vladimir
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I believe that Kyotaro proposed something like that, wherein the FDW would
be more adaptive based on the amount of memory available, and fetch a
number of rows that, by its estimation, would fit in the memory available.
I don't know the progress of that patch.
This patch is a far less complicated solution and puts the burden on the
DBA to figure out approximately how many rows would fit in memory based on
the average row size, and set the per-table option accordingly. If it is
later determined that the rows are now too heavy to fit into the space
allotted, the fetch size can be altered for that table as needed.