FDW handling count(*) through AnalyzeForeignTable or other constant time push-down
Hey all,
I'm building a FDW around a column-store backend (similar to CStore but for genomic data!).
I have tables in the billions of rows, and have a common query pattern of asking for the table size (i.e. SELECT COUNT(*) FROM big_fdw_table; ).
This is a read-optimized system in which I know in constant time the exact dimensions of the table.
Is there any way to convince Postgres FDW to leverage the analyze row counts or even the "double* totalRowCount" returned from the AcquireSampleRows callback from my AnalyzeForeignTable function so that it does not do a full-table scan for a COUNT(*) etc?
My current fallback is to export a specialized function that returns the table row count for a given FDW table, but that then leaks into the user-application driving these queries.
Thanks in advance!
Gabe
Gabe Rudy | VP Product & Engineering | Golden Helix, Inc.
"Gabe F. Rudy" <rudy@goldenhelix.com> writes:
Is there any way to convince Postgres FDW to leverage the analyze row counts or even the "double* totalRowCount" returned from the AcquireSampleRows callback from my AnalyzeForeignTable function so that it does not do a full-table scan for a COUNT(*) etc?
No. In PG's view, ANALYZE-based row counts are imprecise by definition.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 25 February 2016 at 09:48, Gabe F. Rudy <rudy@goldenhelix.com> wrote:
Hey all,
I’m building a FDW around a column-store backend (similar to CStore but
for genomic data!).I have tables in the billions of rows, and have a common query pattern of
asking for the table size (i.e. SELECT COUNT(*) FROM big_fdw_table; ).This is a read-optimized system in which I know in constant time the exact
dimensions of the table.Is there any way to convince Postgres FDW to leverage the analyze row
counts or even the “double* totalRowCount” returned from the
AcquireSampleRows callback from my AnalyzeForeignTable function so that it
does not do a full-table scan for a COUNT(*) etc?My current fallback is to export a specialized function that returns the
table row count for a given FDW table, but that then leaks into the
user-application driving these queries.
Look at TABLESAMPLE, which does mostly what you're asking.
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Ok, I get that.
Really what I am *rooting* for is Aggregate (and Sort By) Push-Down to FDW plugins.
I can already internalize conditional filters for most cases, and doing a count on the filtered results would be considerably faster in my FDW back-end before all the records and Datums have to be constructed for postgres to do the counting.
Similarly, I'm very excited about the potential for FDW to advertise a-priori sort states, so things like external merge-sorts can pass-through the request for sorted data for fields in which sorting is a no-op in my backend.
Importantly my IDs are sorted by definition since they are essentially array indexes into the column-store, so joining on them with merge-sort should be blazing fast, but currently time is wasted sorting these pre-sorted fields.
Just my 2c, and I'll be tracking the 9.6 progress that includes some of these proposals.
Gabe
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Thursday, February 25, 2016 11:21 PM
To: Gabe F. Rudy <rudy@goldenhelix.com>
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] FDW handling count(*) through AnalyzeForeignTable or other constant time push-down
"Gabe F. Rudy" <rudy@goldenhelix.com> writes:
Is there any way to convince Postgres FDW to leverage the analyze row counts or even the "double* totalRowCount" returned from the AcquireSampleRows callback from my AnalyzeForeignTable function so that it does not do a full-table scan for a COUNT(*) etc?
No. In PG's view, ANALYZE-based row counts are imprecise by definition.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi Gabe,
Did you get Aggregate Pushdown FDW plugin?
Would be really helpful if you can share some insight on your investigation.
Regards
Anant
--
View this message in context: http://postgresql.nabble.com/FDW-handling-count-through-AnalyzeForeignTable-or-other-constant-time-push-down-tp5889291p5912699.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers