FDW handling count(*) through AnalyzeForeignTable or other constant time push-down

Started by Gabe F. Rudyalmost 10 years ago5 messages
#1Gabe F. Rudy
rudy@goldenhelix.com

Hey all,

I'm building a FDW around a column-store backend (similar to CStore but for genomic data!).

I have tables in the billions of rows, and have a common query pattern of asking for the table size (i.e. SELECT COUNT(*) FROM big_fdw_table; ).

This is a read-optimized system in which I know in constant time the exact dimensions of the table.

Is there any way to convince Postgres FDW to leverage the analyze row counts or even the "double* totalRowCount" returned from the AcquireSampleRows callback from my AnalyzeForeignTable function so that it does not do a full-table scan for a COUNT(*) etc?

My current fallback is to export a specialized function that returns the table row count for a given FDW table, but that then leaks into the user-application driving these queries.

Thanks in advance!
Gabe

Gabe Rudy | VP Product & Engineering | Golden Helix, Inc.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gabe F. Rudy (#1)
Re: FDW handling count(*) through AnalyzeForeignTable or other constant time push-down

"Gabe F. Rudy" <rudy@goldenhelix.com> writes:

Is there any way to convince Postgres FDW to leverage the analyze row counts or even the "double* totalRowCount" returned from the AcquireSampleRows callback from my AnalyzeForeignTable function so that it does not do a full-table scan for a COUNT(*) etc?

No. In PG's view, ANALYZE-based row counts are imprecise by definition.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Simon Riggs
simon@2ndQuadrant.com
In reply to: Gabe F. Rudy (#1)
Re: FDW handling count(*) through AnalyzeForeignTable or other constant time push-down

On 25 February 2016 at 09:48, Gabe F. Rudy <rudy@goldenhelix.com> wrote:

Hey all,

I’m building a FDW around a column-store backend (similar to CStore but
for genomic data!).

I have tables in the billions of rows, and have a common query pattern of
asking for the table size (i.e. SELECT COUNT(*) FROM big_fdw_table; ).

This is a read-optimized system in which I know in constant time the exact
dimensions of the table.

Is there any way to convince Postgres FDW to leverage the analyze row
counts or even the “double* totalRowCount” returned from the
AcquireSampleRows callback from my AnalyzeForeignTable function so that it
does not do a full-table scan for a COUNT(*) etc?

My current fallback is to export a specialized function that returns the
table row count for a given FDW table, but that then leaks into the
user-application driving these queries.

Look at TABLESAMPLE, which does mostly what you're asking.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/&gt;
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Gabe F. Rudy
rudy@goldenhelix.com
In reply to: Tom Lane (#2)
Re: FDW handling count(*) through AnalyzeForeignTable or other constant time push-down

Ok, I get that.

Really what I am *rooting* for is Aggregate (and Sort By) Push-Down to FDW plugins.

I can already internalize conditional filters for most cases, and doing a count on the filtered results would be considerably faster in my FDW back-end before all the records and Datums have to be constructed for postgres to do the counting.

Similarly, I'm very excited about the potential for FDW to advertise a-priori sort states, so things like external merge-sorts can pass-through the request for sorted data for fields in which sorting is a no-op in my backend.

Importantly my IDs are sorted by definition since they are essentially array indexes into the column-store, so joining on them with merge-sort should be blazing fast, but currently time is wasted sorting these pre-sorted fields.

Just my 2c, and I'll be tracking the 9.6 progress that includes some of these proposals.

Gabe

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Thursday, February 25, 2016 11:21 PM
To: Gabe F. Rudy <rudy@goldenhelix.com>
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] FDW handling count(*) through AnalyzeForeignTable or other constant time push-down

"Gabe F. Rudy" <rudy@goldenhelix.com> writes:

Is there any way to convince Postgres FDW to leverage the analyze row counts or even the "double* totalRowCount" returned from the AcquireSampleRows callback from my AnalyzeForeignTable function so that it does not do a full-table scan for a COUNT(*) etc?

No. In PG's view, ANALYZE-based row counts are imprecise by definition.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5anantbhasu
anantbhasu1985@gmail.com
In reply to: Gabe F. Rudy (#4)
Re: FDW handling count(*) through AnalyzeForeignTable or other constant time push-down

Hi Gabe,
Did you get Aggregate Pushdown FDW plugin?
Would be really helpful if you can share some insight on your investigation.
Regards
Anant

--
View this message in context: http://postgresql.nabble.com/FDW-handling-count-through-AnalyzeForeignTable-or-other-constant-time-push-down-tp5889291p5912699.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers