Ideas about a better API for postgres_fdw remote estimates
In <1116564.1593813043@sss.pgh.pa.us> I wrote:
I wonder whether someday we ought to invent a new API that's more
suited to postgres_fdw's needs than EXPLAIN is. It's not like the
remote planner doesn't know the number we want; it just fails to
include it in EXPLAIN.
I've been thinking about this a little more, and I'd like to get some
ideas down on electrons before they vanish.
The current method for postgres_fdw to obtain remote estimates is to
issue an EXPLAIN command to the remote server and then decipher the
result. This has just one big advantage, which is that it works
against existing, even very old, remote PG versions. In every other
way it's pretty awful: it involves a lot of cycles on the far end
to create output details we don't really care about, it requires a
fair amount of logic to parse that output, and we can't get some
details that we *do* care about (such as the total size of the foreign
table, as per the other discussion).
We can do better. I don't propose removing the existing logic, because
being able to work against old remote PG versions seems pretty useful.
But we could probe at connection start for whether the remote server
has support for a better way, and then use that way if available.
What should the better way look like? I suggest the following:
* Rather than adding a core-server feature, the remote-end part of the new
API should be a function installed by an extension (either postgres_fdw
itself, or a new extension "postgres_fdw_remote" or the like). One
attraction of this approach is that it'd be conceivable to back-port the
new code into existing PG releases by updating the extension. Also
there'd be room for multiple versions of the support. The
connection-start probe could be of the form "does this function exist
in pg_proc?".
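Such a probe could be as cheap as a single catalog lookup at connection start. A sketch of the query it might issue (the function name is hypothetical, per the proposal above):

```python
# Hypothetical catalog probe postgres_fdw could run once per connection
# to decide whether the remote server offers the new API. The function
# name "postgres_fdw_support" is illustrative only.
probe_sql = (
    "SELECT 1 FROM pg_catalog.pg_proc p "
    "JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace "
    "WHERE n.nspname = 'pg_catalog' "
    "AND p.proname = 'postgres_fdw_support'"
)
# A non-empty result would mean "use the new API"; an empty result means
# "fall back to parsing EXPLAIN output".
```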
* I'm imagining the function being of the form
function pg_catalog.postgres_fdw_support(query text) returns something
where the input is still the text of a query we're considering issuing,
and the output is some structure that contains the items of EXPLAIN-like
data we need, but not the items we don't. The implementation of the
function would run the query through parse/plan, then pick out the
data we want and return that.
* We could do a lot worse than to have the "structure" be JSON.
This'd allow structured, labeled data to be returned; it would not be
too difficult to construct, even in PG server versions predating the
addition of JSON logic to the core; and the receiving postgres_fdw
extension could use the core's JSON logic to parse the data.
* The contents of the structure need to be designed with forethought
for extensibility, but this doesn't seem hard if it's all a collection
of labeled fields. We can just say that the recipient must ignore
fields it doesn't recognize. Once a given field has been defined, we
can't change its contents, but we can introduce new fields as needed.
Note that I would not be in favor of putting an overall version number
within the structure; that's way too coarse-grained.
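To make the ignore-unknown-fields rule concrete, here is a toy payload and recipient; every field name is invented for illustration, not a proposed spec:

```python
import json

# Hypothetical JSON a remote support function might return; the
# "fancy_new_field" stands in for something added by a later version.
payload = ('{"startup_cost": 0.0, "total_cost": 155.0, '
           '"rows": 5, "width": 244, "fancy_new_field": true}')

KNOWN_FIELDS = {"startup_cost", "total_cost", "rows", "width"}

data = json.loads(payload)
# The recipient keeps only the fields it recognizes, so newer servers
# can add fields without breaking older clients:
estimates = {k: v for k, v in data.items() if k in KNOWN_FIELDS}
```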
I'm not planning to do anything about these ideas myself, at least
not in the short term. But perhaps somebody else would like to
run with them.
regards, tom lane
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
In <1116564.1593813043@sss.pgh.pa.us> I wrote:
I wonder whether someday we ought to invent a new API that's more
suited to postgres_fdw's needs than EXPLAIN is. It's not like the
remote planner doesn't know the number we want; it just fails to
include it in EXPLAIN.
I've been thinking about this a little more, and I'd like to get some
ideas down on electrons before they vanish.
The current method for postgres_fdw to obtain remote estimates is to
issue an EXPLAIN command to the remote server and then decipher the
result. This has just one big advantage, which is that it works
against existing, even very old, remote PG versions. In every other
way it's pretty awful: it involves a lot of cycles on the far end
to create output details we don't really care about, it requires a
fair amount of logic to parse that output, and we can't get some
details that we *do* care about (such as the total size of the foreign
table, as per the other discussion).
We can do better. I don't propose removing the existing logic, because
being able to work against old remote PG versions seems pretty useful.
But we could probe at connection start for whether the remote server
has support for a better way, and then use that way if available.
I agree we can, and should, try to do better here.
What should the better way look like? I suggest the following:
* Rather than adding a core-server feature, the remote-end part of the new
API should be a function installed by an extension (either postgres_fdw
itself, or a new extension "postgres_fdw_remote" or the like). One
attraction of this approach is that it'd be conceivable to back-port the
new code into existing PG releases by updating the extension. Also
there'd be room for multiple versions of the support. The
connection-start probe could be of the form "does this function exist
in pg_proc?".
* I'm imagining the function being of the form
function pg_catalog.postgres_fdw_support(query text) returns something
where the input is still the text of a query we're considering issuing,
and the output is some structure that contains the items of EXPLAIN-like
data we need, but not the items we don't. The implementation of the
function would run the query through parse/plan, then pick out the
data we want and return that.
* We could do a lot worse than to have the "structure" be JSON.
This'd allow structured, labeled data to be returned; it would not be
too difficult to construct, even in PG server versions predating the
addition of JSON logic to the core; and the receiving postgres_fdw
extension could use the core's JSON logic to parse the data.
I also tend to agree with using JSON for this.
* The contents of the structure need to be designed with forethought
for extensibility, but this doesn't seem hard if it's all a collection
of labeled fields. We can just say that the recipient must ignore
fields it doesn't recognize. Once a given field has been defined, we
can't change its contents, but we can introduce new fields as needed.
Note that I would not be in favor of putting an overall version number
within the structure; that's way too coarse-grained.
This also makes sense to me.
I'm not planning to do anything about these ideas myself, at least
not in the short term. But perhaps somebody else would like to
run with them.
I'm trying to figure out why it makes more sense to use
'postgres_fdw_support(query text)', which would still do parse/plan and
return EXPLAIN-like data, rather than having:
EXPLAIN (FORMAT JSON, FDW true) query ...
(Or, perhaps better, individual boolean options for whatever stuff we
want to ask for, or to exclude if we don't want it, so that other tools
could use this...).
Thanks,
Stephen
Stephen Frost <sfrost@snowman.net> writes:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
* Rather than adding a core-server feature, the remote-end part of the new
API should be a function installed by an extension (either postgres_fdw
itself, or a new extension "postgres_fdw_remote" or the like).
I'm trying to figure out why it makes more sense to use
'postgres_fdw_support(query text)', which would still do parse/plan and
return EXPLAIN-like data, rather than having:
EXPLAIN (FORMAT JSON, FDW true) query ...
I see a couple of reasons not to do it like that:
1. This is specific to postgres_fdw. Some other extension might want some
other data, and different versions of postgres_fdw might want different
data. So putting it into core seems like the wrong thing.
2. Wedging this into EXPLAIN would be quite ugly, because (at least
as I envision it) the output would have just about nothing to do with
any existing EXPLAIN output.
3. We surely would not back-patch a core change like this. OTOH, if
the added infrastructure is in an extension, somebody might want to
back-patch that (even if unofficially). This argument falls to the
ground of course if we're forced to make any core changes to be able
to get at the data we need; but I'm not sure that will be needed.
regards, tom lane
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Stephen Frost <sfrost@snowman.net> writes:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
* Rather than adding a core-server feature, the remote-end part of the new
API should be a function installed by an extension (either postgres_fdw
itself, or a new extension "postgres_fdw_remote" or the like).
I'm trying to figure out why it makes more sense to use
'postgres_fdw_support(query text)', which would still do parse/plan and
return EXPLAIN-like data, rather than having:
EXPLAIN (FORMAT JSON, FDW true) query ...
I see a couple of reasons not to do it like that:
1. This is specific to postgres_fdw. Some other extension might want some
other data, and different versions of postgres_fdw might want different
data. So putting it into core seems like the wrong thing.
Another extension or use-case might want exactly the same information
too though. In a way, we'd be 'hiding' that information from other
potential users unless they want to install their own extension, which
is a pretty big leap. Are we sure this information wouldn't be at all
interesting to pgAdmin4 or explain.depesz.com?
2. Wedging this into EXPLAIN would be quite ugly, because (at least
as I envision it) the output would have just about nothing to do with
any existing EXPLAIN output.
This is a better argument for not making it part of EXPLAIN, though I
don't really feel like I've got a decent idea of what you are suggesting
the output *would* look like, so it's difficult for me to agree (or
disagree) about this particular point.
3. We surely would not back-patch a core change like this. OTOH, if
the added infrastructure is in an extension, somebody might want to
back-patch that (even if unofficially). This argument falls to the
ground of course if we're forced to make any core changes to be able
to get at the data we need; but I'm not sure that will be needed.
Since postgres_fdw is part of core and core's release cycle, and the
packagers manage the extensions from core in a way that they have to
match up, this argument doesn't hold any weight with me. For this to be
a viable argument, we would need to segregate extensions from core and
give them their own release cycle and clear indication of which
extension versions work with which PG major versions, etc. I'm actually
generally in support of *that* idea- and with that, would agree with
your point 3 above, but that's not the reality of today.
Thanks,
Stephen
Stephen Frost <sfrost@snowman.net> writes:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
2. Wedging this into EXPLAIN would be quite ugly, because (at least
as I envision it) the output would have just about nothing to do with
any existing EXPLAIN output.
This is a better argument for not making it part of EXPLAIN, though I
don't really feel like I've got a decent idea of what you are suggesting
the output *would* look like, so it's difficult for me to agree (or
disagree) about this particular point.
Per postgres_fdw's get_remote_estimate(), the only data we use right now
is the startup_cost, total_cost, rows and width estimates from the
top-level Plan node. That's available immediately from the Plan tree,
meaning that basically *nothing* of the substantial display effort
expended by explain.c and ruleutils.c is of any value. So the level-zero
implementation of this would be to run the parser and planner, format
those four numbers into a JSON object (which would require little more
infrastructure than sprintf), and return that. Sure, we could make that
into some kind of early-exit path in explain.c, but I think it'd be a
pretty substantial wart, especially since it'd mean that none of the
other EXPLAIN options are sensible in combination with this one.
Further down the road, we might want to rethink the whole idea of
completely constructing a concrete Plan. We could get the data we need
at the list-of-Paths stage. Even more interesting, we could (with very
little more work) return data about multiple Paths, so that the client
could find out, for example, the costs of sorted and unsorted output
without paying two network round trips to discover that. That'd
definitely require changes in the core planner, since it has no API to
stop at that point. And it's even less within the charter of EXPLAIN.
I grant your point that there might be other users for this besides
postgres_fdw, but that doesn't mean it must be a core feature.
3. We surely would not back-patch a core change like this. OTOH, if
the added infrastructure is in an extension, somebody might want to
back-patch that (even if unofficially).
Since postgres_fdw is part of core and core's release cycle, and the
packagers manage the extensions from core in a way that they have to
match up, this argument doesn't hold any weight with me.
Certainly only v14 (or whenever) and later postgres_fdw would be able
to *use* this data. The scenario I'm imagining is that somebody wants
to be able to use that client against an older remote server, and is
willing to install some simple extension on the remote server to do so.
Perhaps this scenario is not worth troubling over, but I don't think
it's entirely far-fetched.
regards, tom lane
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Stephen Frost <sfrost@snowman.net> writes:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
2. Wedging this into EXPLAIN would be quite ugly, because (at least
as I envision it) the output would have just about nothing to do with
any existing EXPLAIN output.
This is a better argument for not making it part of EXPLAIN, though I
don't really feel like I've got a decent idea of what you are suggesting
the output *would* look like, so it's difficult for me to agree (or
disagree) about this particular point.
Per postgres_fdw's get_remote_estimate(), the only data we use right now
is the startup_cost, total_cost, rows and width estimates from the
top-level Plan node. That's available immediately from the Plan tree,
meaning that basically *nothing* of the substantial display effort
expended by explain.c and ruleutils.c is of any value. So the level-zero
The 'display effort' you're referring to, when using JSON format with
explain, is basically to format the results into JSON and return them-
which is what you're suggesting this mode would do anyway, no..?
If the remote side 'table' is actually a view that's complicated then
having a way to get just the top-level information (and excluding the
rest) sounds like it'd be useful and perhaps excluding that other info
doesn't really fit into EXPLAIN's mandate, but that's also much less
common.
implementation of this would be to run the parser and planner, format
those four numbers into a JSON object (which would require little more
infrastructure than sprintf), and return that. Sure, we could make that
into some kind of early-exit path in explain.c, but I think it'd be a
pretty substantial wart, especially since it'd mean that none of the
other EXPLAIN options are sensible in combination with this one.
That EXPLAIN has options that only make sense in combination with
certain other options isn't anything new- BUFFERS makes no sense without
ANALYZE, etc.
Further down the road, we might want to rethink the whole idea of
completely constructing a concrete Plan. We could get the data we need
at the list-of-Paths stage. Even more interesting, we could (with very
little more work) return data about multiple Paths, so that the client
could find out, for example, the costs of sorted and unsorted output
without paying two network round trips to discover that. That'd
definitely require changes in the core planner, since it has no API to
stop at that point. And it's even less within the charter of EXPLAIN.
I have to admit that I'm not really sure how we could make it work, but
having a way to get multiple paths returned by EXPLAIN would certainly
be interesting to a lot of users. Certainly it's easier to see how we
could get at that info in a postgres_fdw-specific function, and be able
to understand how to deal with it there and what could be done, but once
it's there I wonder if other tools might see that and possibly even
build on it because it'd be the only way to get that kind of info, which
certainly wouldn't be ideal.
I grant your point that there might be other users for this besides
postgres_fdw, but that doesn't mean it must be a core feature.
That postgres_fdw is an extension is almost as much of a wart as
anything being discussed here and suggesting that things added to
postgres_fdw aren't 'core features' seems akin to ignoring the forest
for the trees- consider that, today, there isn't even an option to
install only the core server from the PGDG repos (at least for Debian /
Ubuntu, not sure if the RPMs have caught up to that yet, but they
probably should). The 'postgresql-12' .deb includes all the extensions
that are part of the core git repo, because they're released and
maintained just the same as the core server and, from a practical
perspective, to run a decent PG system you really should have them
installed, so why bother having a separate package?
3. We surely would not back-patch a core change like this. OTOH, if
the added infrastructure is in an extension, somebody might want to
back-patch that (even if unofficially).
Since postgres_fdw is part of core and core's release cycle, and the
packagers manage the extensions from core in a way that they have to
match up, this argument doesn't hold any weight with me.
Certainly only v14 (or whenever) and later postgres_fdw would be able
to *use* this data. The scenario I'm imagining is that somebody wants
to be able to use that client against an older remote server, and is
willing to install some simple extension on the remote server to do so.
Perhaps this scenario is not worth troubling over, but I don't think
it's entirely far-fetched.
I definitely don't think that such an extension should be maintained
outside of core, and I seriously doubt any of our packagers would be
anxious to build an independent package for this to be usable in older
servers. Sure, it's possible someone will care about this enough to
spend the effort to try and build it for an older version and use it but
I definitely don't think we should be considering that a serious design
goal or a reason to put this capability in a separate extension.
Thanks,
Stephen
Stephen Frost <sfrost@snowman.net> writes:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Per postgres_fdw's get_remote_estimate(), the only data we use right now
is the startup_cost, total_cost, rows and width estimates from the
top-level Plan node. That's available immediately from the Plan tree,
meaning that basically *nothing* of the substantial display effort
expended by explain.c and ruleutils.c is of any value. So the level-zero
The 'display effort' you're referring to, when using JSON format with
explain, is basically to format the results into JSON and return them-
which is what you're suggesting this mode would do anyway, no..?
Not hardly. Spend some time studying ruleutils.c sometime ---
reverse-compiling a plan is *expensive*. For instance, we have to
look up the names of all the operators used in the query quals,
decide what needs quoting, decide what needs parenthesization, etc.
There's also a fun little bit that assigns unique aliases to each
table appearing in the query, which from memory is at least O(N^2)
and maybe worse. (Admittedly, shipped queries are usually not so
complicated that N would be large.) And by the way, we're also
starting up the executor, even if you didn't say ANALYZE.
A little bit of fooling with "perf" suggests that when explaining
a pretty simple bitmapscan query --- I used
EXPLAIN SELECT * FROM tenk1 WHERE unique1 > 9995
which ought to be somewhat representative of what postgres_fdw needs
--- only about half of the runtime is spent within pg_plan_query, and
the other half is spent on explain.c + ruleutils.c formatting work.
So while getting rid of that overhead wouldn't be an earthshattering
improvement, I think it'd be worthwhile.
Further down the road, we might want to rethink the whole idea of
completely constructing a concrete Plan. We could get the data we need
at the list-of-Paths stage. Even more interesting, we could (with very
little more work) return data about multiple Paths, so that the client
could find out, for example, the costs of sorted and unsorted output
without paying two network round trips to discover that.
I have to admit that I'm not really sure how we could make it work, but
having a way to get multiple paths returned by EXPLAIN would certainly
be interesting to a lot of users. Certainly it's easier to see how we
could get at that info in a postgres_fdw-specific function, and be able
to understand how to deal with it there and what could be done, but once
it's there I wonder if other tools might see that and possibly even
build on it because it'd be the only way to get that kind of info, which
certainly wouldn't be ideal.
Yeah, thinking about it as a function that inspects partial planner
results, it might be useful for other purposes besides postgres_fdw.
As I said before, I don't think this necessarily has to be bundled as
part of postgres_fdw. That still doesn't make it part of EXPLAIN.
That postgres_fdw is an extension is almost as much of a wart as
anything being discussed here and suggesting that things added to
postgres_fdw aren't 'core features' seems akin to ignoring the forest
for the trees-
I think we just had this discussion in another thread. The fact that
postgres_fdw is an extension is a feature, not a bug, because (a) it
means that somebody could implement their own version if they wanted
it to act differently; and (b) it keeps us honest about whether the
APIs needed by an FDW are accessible from outside core. I think moving
postgres_fdw into core would be a large step backwards.
regards, tom lane
Greetings,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Stephen Frost <sfrost@snowman.net> writes:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Per postgres_fdw's get_remote_estimate(), the only data we use right now
is the startup_cost, total_cost, rows and width estimates from the
top-level Plan node. That's available immediately from the Plan tree,
meaning that basically *nothing* of the substantial display effort
expended by explain.c and ruleutils.c is of any value. So the level-zero
The 'display effort' you're referring to, when using JSON format with
explain, is basically to format the results into JSON and return them-
which is what you're suggesting this mode would do anyway, no..?
Not hardly. Spend some time studying ruleutils.c sometime ---
reverse-compiling a plan is *expensive*. For instance, we have to
look up the names of all the operators used in the query quals,
decide what needs quoting, decide what needs parenthesization, etc.
Ah, alright, that makes more sense then.
There's also a fun little bit that assigns unique aliases to each
table appearing in the query, which from memory is at least O(N^2)
and maybe worse. (Admittedly, shipped queries are usually not so
complicated that N would be large.) And by the way, we're also
starting up the executor, even if you didn't say ANALYZE.
A little bit of fooling with "perf" suggests that when explaining
a pretty simple bitmapscan query --- I used
EXPLAIN SELECT * FROM tenk1 WHERE unique1 > 9995
which ought to be somewhat representative of what postgres_fdw needs
--- only about half of the runtime is spent within pg_plan_query, and
the other half is spent on explain.c + ruleutils.c formatting work.
So while getting rid of that overhead wouldn't be an earthshattering
improvement, I think it'd be worthwhile.
Sure.
Further down the road, we might want to rethink the whole idea of
completely constructing a concrete Plan. We could get the data we need
at the list-of-Paths stage. Even more interesting, we could (with very
little more work) return data about multiple Paths, so that the client
could find out, for example, the costs of sorted and unsorted output
without paying two network round trips to discover that.
I have to admit that I'm not really sure how we could make it work, but
having a way to get multiple paths returned by EXPLAIN would certainly
be interesting to a lot of users. Certainly it's easier to see how we
could get at that info in a postgres_fdw-specific function, and be able
to understand how to deal with it there and what could be done, but once
it's there I wonder if other tools might see that and possibly even
build on it because it'd be the only way to get that kind of info, which
certainly wouldn't be ideal.
Yeah, thinking about it as a function that inspects partial planner
results, it might be useful for other purposes besides postgres_fdw.
As I said before, I don't think this necessarily has to be bundled as
part of postgres_fdw. That still doesn't make it part of EXPLAIN.
Providing it as a function rather than through EXPLAIN does make a bit
more sense if we're going to skip things like the lookups you mention
above. I'm still inclined to have it be a part of core rather than
having it as postgres_fdw though. I'm not completely against it being
part of postgres_fdw... but I would think that would really be
appropriate if it's actually using something in postgres_fdw, but if
everything that it's doing is part of core and nothing related
specifically to the postgres FDW, then having it as part of core makes
more sense to me. Also, having it as part of core would make it more
appropriate for other tools to look at and adding that kind of
inspection capability for partial planner results could be very
interesting for tools like pgAdmin and such.
That postgres_fdw is an extension is almost as much of a wart as
anything being discussed here and suggesting that things added to
postgres_fdw aren't 'core features' seems akin to ignoring the forest
for the trees-
I think we just had this discussion in another thread. The fact that
postgres_fdw is an extension is a feature, not a bug, because (a) it
means that somebody could implement their own version if they wanted
it to act differently; and (b) it keeps us honest about whether the
APIs needed by an FDW are accessible from outside core. I think moving
postgres_fdw into core would be a large step backwards.
I'm not looking to change it today, as that ship has sailed, but while
having FDWs as a general capability that can be implemented by
extensions is certainly great and I'd love to see more of that (even
better would be more of those that are well maintained and cared for by
this community of folks), requiring users to install an extension into
every database where they want to query another PG server from isn't a
feature.
Thanks,
Stephen
On Mon, Jul 6, 2020 at 11:28:28AM -0400, Stephen Frost wrote:
Yeah, thinking about it as a function that inspects partial planner
results, it might be useful for other purposes besides postgres_fdw.
As I said before, I don't think this necessarily has to be bundled as
part of postgres_fdw. That still doesn't make it part of EXPLAIN.
Providing it as a function rather than through EXPLAIN does make a bit
more sense if we're going to skip things like the lookups you mention
above. I'm still inclined to have it be a part of core rather than
having it as postgres_fdw though. I'm not completely against it being
part of postgres_fdw... but I would think that would really be
appropriate if it's actually using something in postgres_fdw, but if
everything that it's doing is part of core and nothing related
specifically to the postgres FDW, then having it as part of core makes
more sense to me. Also, having it as part of core would make it more
appropriate for other tools to look at and adding that kind of
inspection capability for partial planner results could be very
interesting for tools like pgAdmin and such.
I agree the statistics extraction should probably be part of core.
There is the goal of FDWs returning data, and returning the data
quickly. I think we can require all-new FDW servers to get improved
performance. I am not even clear if we have a full understanding of the
performance characteristics of FDWs yet. I know Tomas did some research
on its DML behavior, but other than that, I haven't seen much.
On a related note, I have wished to be able to see all the costs
associated with plans not chosen, and I think others would like that as
well. Getting multiple costs for a query goes in that direction.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
The usefulness of a cup is in its emptiness, Bruce Lee
On 7/14/20 6:32 AM, Bruce Momjian wrote:
On Mon, Jul 6, 2020 at 11:28:28AM -0400, Stephen Frost wrote:
Yeah, thinking about it as a function that inspects partial planner
results, it might be useful for other purposes besides postgres_fdw.
As I said before, I don't think this necessarily has to be bundled as
part of postgres_fdw. That still doesn't make it part of EXPLAIN.
Providing it as a function rather than through EXPLAIN does make a bit
more sense if we're going to skip things like the lookups you mention
above. I'm still inclined to have it be a part of core rather than
having it as postgres_fdw though. I'm not completely against it being
part of postgres_fdw... but I would think that would really be
appropriate if it's actually using something in postgres_fdw, but if
everything that it's doing is part of core and nothing related
specifically to the postgres FDW, then having it as part of core makes
more sense to me. Also, having it as part of core would make it more
appropriate for other tools to look at and adding that kind of
inspection capability for partial planner results could be very
interesting for tools like pgAdmin and such.
I agree the statistics extraction should probably be part of core.
There is the goal if FDWs returning data, and returning the data
quickly. I think we can require all-new FDW servers to get improved
performance. I am not even clear if we have a full understanding of the
performance characteristics of FDWs yet. I know Tomas did some research
on its DML behavior, but other than that, I haven't seen much.
On a related note, I have wished to be able to see all the costs
associated with plans not chosen, and I think others would like that as
well. Getting multiple costs for a query goes in that direction.
During the implementation of sharding-related improvements I noticed
that if we use a lot of foreign partitions, we get bad plans because
vacuum doesn't update the statistics of foreign tables. This is done by
the ANALYZE command, but it is a very expensive operation for a foreign
table. The problem with statistics is demonstrated by the TAP test in
the first patch attached.
I implemented some FDW + PG core machinery to reduce the weight of the
problem. The ANALYZE command on a foreign table executes a query on the
foreign server that extracts the statistics tuple, serializes it into a
JSON-formatted string, and returns it to the caller. The caller
deserializes this string, generates statistics for the foreign table,
and updates them. The second patch is a proof of concept.
This patch speeds up the ANALYZE command and keeps the statistics of a
foreign table relevant after an autovacuum operation. Its effectiveness
depends on the relevance of the statistics on the remote server, but
still.
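As a toy model of the mechanism being described (the real patch ships pg_statistic tuples; only the three column names below are real pg_statistic columns, the values and surrounding structure are invented):

```python
import json

# Remote side: pull a few per-column statistics and serialize them.
remote_stats = {
    "stanullfrac": 0.01,   # fraction of entries that are NULL
    "stawidth": 8,         # average stored width, in bytes
    "stadistinct": -1.0,   # negative means a ratio of distinct values
}
wire = json.dumps(remote_stats)

# Local side: deserialize and install for the foreign table, instead of
# streaming the whole table across to recompute the same numbers.
local_stats = json.loads(wire)
```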
--
regards,
Andrey Lepikhov
Postgres Professional
Attachments:
0001-TAP-Test-on-bad-statistics.patch (text/x-patch, +59/-4)
0002-Pull-statistic-for-a-foreign-table-from-remote-serve.patch (text/x-patch, +798/-22)
Greetings,
* Andrey Lepikhov (a.lepikhov@postgrespro.ru) wrote:
During the implementation of sharding-related improvements I noticed that if
we use a lot of foreign partitions, we get bad plans because vacuum
doesn't update the statistics of foreign tables. This is done by the ANALYZE
command, but it is a very expensive operation for a foreign table.
The problem with statistics is demonstrated by the TAP test in the first
patch attached.
Yes, the way we handle ANALYZE today for FDWs is pretty terrible, since
we stream the entire table across to do it.
I implemented some FDW + pg core machinery to reduce the weight of the
problem. The ANALYZE command on a foreign table executes a query on the
foreign server that extracts the statistics tuple, serializes it into a
JSON-formatted string and returns it to the caller. The caller
deserializes this string, generates statistics for the foreign table and
updates it. The second patch is a proof-of-concept.
Isn't this going to create a version dependency that we'll need to deal
with..? What if a newer major version has some kind of improved ANALYZE
command, in terms of what it looks at or stores, and it's talking to an
older server?
When I was considering the issue with ANALYZE and FDWs, I had been
thinking it'd make sense to just change the query that's built in
deparseAnalyzeSql() to have a TABLESAMPLE clause, but otherwise run in
more-or-less the same manner as today. If we don't like the available
TABLESAMPLE methods then we could add a new one that's explicitly the
'right' sample for an ANALYZE call and use that when it's available on
the remote side. Not sure if it'd make any sense for ANALYZE itself to
start using that same TABLESAMPLE code, but maybe? Not that I think
it'd be much of an issue if it's independent either, with appropriate
comments to note that we should probably try to make them match up for
the sake of FDWs.
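For what it's worth, the deparsed sampling query might look roughly like this (the table name and sampling fraction are purely illustrative; the TABLESAMPLE clause has been available on the remote side since 9.5):

```sql
-- Sketch of what deparseAnalyzeSql() could emit instead of a full scan;
-- SYSTEM sampling reads roughly 1% of the table's pages here.
SELECT id, payload
FROM public.remote_tbl
TABLESAMPLE SYSTEM (1)
REPEATABLE (42);  -- optional: keeps the sample stable across runs
```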
This patch speedup analyze command and provides statistics relevance on a
foreign table after autovacuum operation. Its effectiveness depends on
relevance of statistics on the remote server, but still.
If we do decide to go down this route, wouldn't it mean we'd have to
solve the problem of what to do when it's a 9.6 foreign server being
queried from a v12 server and dealing with any difference in the
statistics structures of the two?
Seems like we would... in which case I would say that we should pull
that bit out and make it general, and use it for pg_upgrade too, which
would benefit a great deal from having the ability to upgrade stats
between major versions also. That's a much bigger piece to take on, of
course, but seems to be what's implied with this approach for the FDW.
Thanks,
Stephen
Stephen Frost <sfrost@snowman.net> writes:
Isn't this going to create a version dependency that we'll need to deal
with..? What if a newer major version has some kind of improved ANALYZE
command, in terms of what it looks at or stores, and it's talking to an
older server?
Yeah, this proposal is a nonstarter unless it can deal with the remote
server being a different PG version with different stats.
Years ago (when I was still at Salesforce, IIRC, so ~5 years) we had
some discussions about making it possible for pg_dump and/or pg_upgrade
to propagate stats data forward to the new database. There is at least
one POC patch in the archives for doing that by dumping the stats data
wrapped in a function call, where the target database's version of the
function would be responsible for adapting the data if necessary, or
maybe just discarding it if it couldn't adapt. We seem to have lost
interest but it still seems like something worth pursuing. I'd guess
that if such infrastructure existed it could be helpful for this.
When I was considering the issue with ANALYZE and FDWs, I had been
thinking it'd make sense to just change the query that's built in
deparseAnalyzeSql() to have a TABLESAMPLE clause, but otherwise run in
more-or-less the same manner as today.
+1, that seems like something worth doing in any case, since even if
we do get somewhere with the present idea it would only work for new
remote servers. TABLESAMPLE would work pretty far back (9.5,
looks like).
regards, tom lane
On 8/29/20 9:22 PM, Stephen Frost wrote:
I implemented some FDW + pg core machinery to reduce the weight of the
problem. The ANALYZE command on a foreign table executes a query on the
foreign server that extracts the statistics tuple, serializes it into a
JSON-formatted string and returns it to the caller. The caller
deserializes this string, generates statistics for the foreign table and
updates it. The second patch is a proof-of-concept.
Isn't this going to create a version dependency that we'll need to deal
with..? What if a newer major version has some kind of improved ANALYZE
command, in terms of what it looks at or stores, and it's talking to an
older server?
When I was considering the issue with ANALYZE and FDWs, I had been
thinking it'd make sense to just change the query that's built in
deparseAnalyzeSql() to have a TABLESAMPLE clause, but otherwise run in
more-or-less the same manner as today. If we don't like the available
TABLESAMPLE methods then we could add a new one that's explicitly the
'right' sample for an ANALYZE call and use that when it's available on
the remote side. Not sure if it'd make any sense for ANALYZE itself to
start using that same TABLESAMPLE code, but maybe? Not that I think
it'd be much of an issue if it's independent either, with appropriate
comments to note that we should probably try to make them match up for
the sake of FDWs.
This approach does not contradict your idea. It is a lightweight
opportunity to reduce the cost of analysis if we have a set of servers
with current versions of the system catalog and FDW.
This patch speeds up the ANALYZE command and keeps statistics on a
foreign table relevant after an autovacuum operation. Its effectiveness
depends on the relevance of statistics on the remote server, but still.
If we do decide to go down this route, wouldn't it mean we'd have to
solve the problem of what to do when it's a 9.6 foreign server being
queried from a v12 server and dealing with any difference in the
statistics structures of the two?
Seems like we would... in which case I would say that we should pull
that bit out and make it general, and use it for pg_upgrade too, which
would benefit a great deal from having the ability to upgrade stats
between major versions also. That's a much bigger piece to take on, of
course, but seems to be what's implied with this approach for the FDW.
Thank you for this use case.
We can add a "version" field to the statistics string (btree uses
versioning too). As you can see, in this patch we are only trying to get
statistics. If for some reason this does not work out, then we go along
the difficult route.
Moreover, I believe this strategy should only work if we analyze a
relation implicitly. If the user executes the analysis explicitly with
the command "ANALYZE <relname>", we need to perform a fair analysis of
the table.
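As a purely illustrative sketch of how such a version-tagged statistics string could be probed (the keys here are invented, not the patch's actual format):

```sql
-- The receiving server could check "version" first and fall back to a
-- regular remote ANALYZE if it cannot deserialize that format.
SELECT ('{"version": 1, "reltuples": 100000, "columns": []}'::jsonb)
       ->> 'version' AS stats_version;
```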
--
regards,
Andrey Lepikhov
Postgres Professional
On 8/29/20 9:50 PM, Tom Lane wrote:
Years ago (when I was still at Salesforce, IIRC, so ~5 years) we had
some discussions about making it possible for pg_dump and/or pg_upgrade
to propagate stats data forward to the new database. There is at least
one POC patch in the archives for doing that by dumping the stats data
wrapped in a function call, where the target database's version of the
function would be responsible for adapting the data if necessary, or
maybe just discarding it if it couldn't adapt. We seem to have lost
interest but it still seems like something worth pursuing. I'd guess
that if such infrastructure existed it could be helpful for this.
Thanks for this helpful feedback.
I found several threads related to the problem [1-3].
I agree that this task needs an API for serialization/deserialization
of statistics:
pg_load_relation_statistics(json_string text);
pg_get_relation_statistics(relname text);
We can use a version number for resolving conflicts between different
statistics implementations.
The "load" function will validate the values[] anyarray while
deserializing the input JSON string to the datatype of the relation
column.
Maybe I haven't grasped all the problems of this task?
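A rough sketch of how the pair could be used (the signatures come from above; the behavior described in the comments is one reading of the proposal, not an implemented API):

```sql
-- On the source server: serialize a relation's statistics into a
-- version-tagged JSON string.
--   SELECT pg_get_relation_statistics('public.remote_tbl');
--
-- On the target server: validate the payload (including the values[]
-- anyarray against the current column datatypes) and install it,
-- adapting or discarding it if the version does not match.
--   SELECT pg_load_relation_statistics('{"version": 1, ...}');
```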
1. /messages/by-id/724322880.K8vzik8zPz@abook
2.
/messages/by-id/CAAZKuFaWdLkK8eozSAooZBets9y_mfo2HS6urPAKXEPbd-JLCA@mail.gmail.com
3.
/messages/by-id/GNELIHDDFBOCMGBFGEFOOEOPCBAA.chriskl@familyhealth.com.au
--
regards,
Andrey Lepikhov
Postgres Professional
On Mon, Aug 31, 2020 at 3:36 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
Thanks for this helpful feedback.
I found several threads related to the problem [1-3].
I agree that this task needs an API for serialization/deserialization
of statistics:
pg_load_relation_statistics(json_string text);
pg_get_relation_statistics(relname text);
We can use a version number for resolving conflicts between different
statistics implementations.
The "load" function will validate the values[] anyarray while
deserializing the input JSON string to the datatype of the relation
column.
This is a valuable feature. Analysing a foreign table by fetching rows
from the foreign server isn't very efficient. In fact the current FDW
API for doing that forces that inefficiency by requiring the FDW to
return a sample of rows that will be analysed by the core. That's why
I see that your patch introduces a new API to get foreign rel stats. I
don't think there's any point in maintaining these two APIs just for
analysing a table. Instead we should have only one FDW API which will
do whatever it wants and return statistics that can be understood by
the core, and let the core install them in the catalogs. I believe
that's doable. In the case of PostgreSQL it could get the stats
available as is from the foreign server, convert them into a form that
the core understands, and return them. The patch introduces a new
function postgres_fdw_stat() which will be available only from version
14 onwards. Can we use row_to_json(), which is available in all the
supported versions, instead?
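A sketch of that alternative (the table name is illustrative): since row_to_json() has been available since 9.2, the local server could fetch the raw statistics tuple from any supported remote version without a helper extension; array-typed columns such as stavalues come back in their text representation:

```sql
-- Fetch the raw pg_statistic rows for one remote table as JSON,
-- with no server-side helper function required.
SELECT row_to_json(s)
FROM pg_statistic s
WHERE s.starelid = 'public.remote_tbl'::regclass;
```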
In the case of some other foreign server, an FDW will be responsible
for returning statistics in a form that the core will understand. It
may fetch rows from the foreign server, or be a bit smart and fetch the
statistics and convert them.
This also means that FDWs will have to deal with the statistics format
that the core understands and thus will need changes in their code
with every version in the worst case. But AFAIR, PostgreSQL supports
different forms of statistics so the problem may not remain that
severe if FDWs and core agree on some bare minimum format that the
core supports for long.
I think the patch has some other problems, like that it works only for
regular tables on the foreign server, but a foreign table can point
to any relation, like a materialized view, partitioned table or a
foreign table on the foreign server, all of which have statistics
associated with them. I didn't look closely, but it does not consider
that the foreign table may not have all the columns from the relation
on the foreign server, or may have different names. But I think those
problems are kind of secondary. We have to agree on the design first.
--
Best Wishes,
Ashutosh Bapat
On Sat, Aug 29, 2020 at 12:50:59PM -0400, Tom Lane wrote:
Stephen Frost <sfrost@snowman.net> writes:
Isn't this going to create a version dependency that we'll need to deal
with..? What if a newer major version has some kind of improved ANALYZE
command, in terms of what it looks at or stores, and it's talking to an
older server?
Yeah, this proposal is a nonstarter unless it can deal with the remote
server being a different PG version with different stats.
Years ago (when I was still at Salesforce, IIRC, so ~5 years) we had
some discussions about making it possible for pg_dump and/or pg_upgrade
to propagate stats data forward to the new database. There is at least
one POC patch in the archives for doing that by dumping the stats data
wrapped in a function call, where the target database's version of the
function would be responsible for adapting the data if necessary, or
maybe just discarding it if it couldn't adapt. We seem to have lost
interest but it still seems like something worth pursuing. I'd guess
that if such infrastructure existed it could be helpful for this.
I don't think there was enough value to do statistics migration just for
pg_upgrade, but doing it for pg_upgrade and FDWs seems like it might
have enough demand to justify the required work and maintenance.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
The usefulness of a cup is in its emptiness, Bruce Lee
Greetings,
* Bruce Momjian (bruce@momjian.us) wrote:
On Sat, Aug 29, 2020 at 12:50:59PM -0400, Tom Lane wrote:
Stephen Frost <sfrost@snowman.net> writes:
Isn't this going to create a version dependency that we'll need to deal
with..? What if a newer major version has some kind of improved ANALYZE
command, in terms of what it looks at or stores, and it's talking to an
older server?
Yeah, this proposal is a nonstarter unless it can deal with the remote
server being a different PG version with different stats.
Years ago (when I was still at Salesforce, IIRC, so ~5 years) we had
some discussions about making it possible for pg_dump and/or pg_upgrade
to propagate stats data forward to the new database. There is at least
one POC patch in the archives for doing that by dumping the stats data
wrapped in a function call, where the target database's version of the
function would be responsible for adapting the data if necessary, or
maybe just discarding it if it couldn't adapt. We seem to have lost
interest but it still seems like something worth pursuing. I'd guess
that if such infrastructure existed it could be helpful for this.
I don't think there was enough value to do statistics migration just for
pg_upgrade, but doing it for pg_upgrade and FDWs seems like it might
have enough demand to justify the required work and maintenance.
Not sure that it really matters much, but I disagree with the assessment
that there wasn't enough value to do it for pg_upgrade; I feel that it
just hasn't been something that's had enough people interested in
working on it, which isn't the same thing.
If a good patch showed up tomorrow, with someone willing to spend time
on it, I definitely think it'd be something we should include even if
it's just for pg_upgrade. A solution that works for both pg_upgrade and
the postgres FDW would be even better, of course.
Thanks,
Stephen
On Mon, Aug 31, 2020 at 12:19:52PM -0400, Stephen Frost wrote:
* Bruce Momjian (bruce@momjian.us) wrote:
I don't think there was enough value to do statistics migration just for
pg_upgrade, but doing it for pg_upgrade and FDWs seems like it might
have enough demand to justify the required work and maintenance.
Not sure that it really matters much, but I disagree with the assessment
that there wasn't enough value to do it for pg_upgrade; I feel that it
just hasn't been something that's had enough people interested in
working on it, which isn't the same thing.
I am not sure what point you are trying to make, but if it had enough
value, wouldn't people work on it, or are you saying that it had enough
value, but people didn't realize it, so didn't work on it? I guess I
can see that. For me, it was the maintenance burden that always scared
me from getting involved since it would be the rare case where
pg_upgrade would have to be modified for perhaps every major release.
If a good patch showed up tomorrow, with someone willing to spend time
on it, I definitely think it'd be something we should include even if
it's just for pg_upgrade. A solution that works for both pg_upgrade and
the postgres FDW would be even better, of course.
Yep, see above. The problem isn't mostly the initial patch, but someone
who is going to work on it and test it for every major release,
potentially forever. Frankly, this is a pg_dump feature, rather than
something pg_upgrade should be doing, because not having to run ANALYZE
after restoring a dump is also a needed feature.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
The usefulness of a cup is in its emptiness, Bruce Lee
Greetings,
* Bruce Momjian (bruce@momjian.us) wrote:
On Mon, Aug 31, 2020 at 12:19:52PM -0400, Stephen Frost wrote:
* Bruce Momjian (bruce@momjian.us) wrote:
I don't think there was enough value to do statistics migration just for
pg_upgrade, but doing it for pg_upgrade and FDWs seems like it might
have enough demand to justify the required work and maintenance.
Not sure that it really matters much, but I disagree with the assessment
that there wasn't enough value to do it for pg_upgrade; I feel that it
just hasn't been something that's had enough people interested in
working on it, which isn't the same thing.
I am not sure what point you are trying to make, but if it had enough
value, wouldn't people work on it, or are you saying that it had enough
value, but people didn't realize it, so didn't work on it? I guess I
can see that. For me, it was the maintenance burden that always scared
me from getting involved since it would be the rare case where
pg_upgrade would have to be modified for perhaps every major release.
The point I was making was that it has value and people did realize it
but there's only so many resources to go around when it comes to hacking
on PG and therefore it simply hasn't been done yet.
There's a big difference between "yes, we all agree that would be good
to have, but no one has had time to work on it" and "we don't think this
is worth having because of the maintenance work it'd require." The
latter shuts down anyone thinking of working on it, which is why I said
anything.
If a good patch showed up tomorrow, with someone willing to spend time
on it, I definitely think it'd be something we should include even if
it's just for pg_upgrade. A solution that works for both pg_upgrade and
the postgres FDW would be even better, of course.
Yep, see above. The problem isn't mostly the initial patch, but someone
who is going to work on it and test it for every major release,
potentially forever. Frankly, this is a pg_dump feature, rather than
something pg_upgrade should be doing, because not having to run ANALYZE
after restoring a dump is also a needed feature.
I tend to agree with it being more of a pg_dump issue- but that also
shows that your assessment above doesn't actually fit, because we
definitely change pg_dump every release. Consider that if someone wants
to add some new option to CREATE TABLE, which gets remembered in the
catalog, they have to make sure that pg_dump support is added for that.
If we added the statistics export/import to pg_dump, someone changing
those parts of the system would also be expected to update pg_dump to
manage those changes, including working with older versions of PG.
Thanks,
Stephen
On Mon, Aug 31, 2020 at 12:56:21PM -0400, Stephen Frost wrote:
Greetings,
* Bruce Momjian (bruce@momjian.us) wrote:
The point I was making was that it has value and people did realize it
but there's only so many resources to go around when it comes to hacking
on PG and therefore it simply hasn't been done yet.
There's a big difference between "yes, we all agree that would be good
to have, but no one has had time to work on it" and "we don't think this
is worth having because of the maintenance work it'd require." The
latter shuts down anyone thinking of working on it, which is why I said
anything.
I actually don't know which statement above is correct, because of the
"forever" maintenance.
I tend to agree with it being more of a pg_dump issue- but that also
shows that your assessment above doesn't actually fit, because we
definitely change pg_dump every release. Consider that if someone wants
to add some new option to CREATE TABLE, which gets remembered in the
catalog, they have to make sure that pg_dump support is added for that.
If we added the statistics export/import to pg_dump, someone changing
those parts of the system would also be expected to update pg_dump to
manage those changes, including working with older versions of PG.
Yes, very true, but technically any change in any aspect of the
statistics system would require modification of the statistics dump,
which usually isn't required for most feature changes.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
The usefulness of a cup is in its emptiness, Bruce Lee