Ideas about a better API for postgres_fdw remote estimates

Started by Tom Lane over 5 years ago, 33 messages
#1 Tom Lane
tgl@sss.pgh.pa.us

In <1116564.1593813043@sss.pgh.pa.us> I wrote:

I wonder whether someday we ought to invent a new API that's more
suited to postgres_fdw's needs than EXPLAIN is. It's not like the
remote planner doesn't know the number we want; it just fails to
include it in EXPLAIN.

I've been thinking about this a little more, and I'd like to get some
ideas down on electrons before they vanish.

The current method for postgres_fdw to obtain remote estimates is to
issue an EXPLAIN command to the remote server and then decipher the
result. This has just one big advantage, which is that it works
against existing, even very old, remote PG versions. In every other
way it's pretty awful: it involves a lot of cycles on the far end
to create output details we don't really care about, it requires a
fair amount of logic to parse that output, and we can't get some
details that we *do* care about (such as the total size of the foreign
table, as per the other discussion).
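
For concreteness, the existing logic in postgres_fdw's get_remote_estimate()
boils down to something like this (the table and query are just examples):

    -- postgres_fdw ships the deparsed query to the remote server, prefixed
    -- with EXPLAIN ...
    EXPLAIN SELECT a, b FROM public.t WHERE a > 10;
    -- ... and then fishes the four numbers it cares about out of the first
    -- line of the text output:
    --   Seq Scan on t  (cost=0.00..445.00 rows=10000 width=8)
    -- i.e. startup cost, total cost, row estimate, and row width; everything
    -- else that EXPLAIN computed and formatted is thrown away.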

We can do better. I don't propose removing the existing logic, because
being able to work against old remote PG versions seems pretty useful.
But we could probe at connection start for whether the remote server
has support for a better way, and then use that way if available.

What should the better way look like? I suggest the following:

* Rather than adding a core-server feature, the remote-end part of the new
API should be a function installed by an extension (either postgres_fdw
itself, or a new extension "postgres_fdw_remote" or the like). One
attraction of this approach is that it'd be conceivable to back-port the
new code into existing PG releases by updating the extension. Also
there'd be room for multiple versions of the support. The
connection-start probe could be of the form "does this function exist
in pg_proc?".

* I'm imagining the function being of the form

function pg_catalog.postgres_fdw_support(query text) returns something

where the input is still the text of a query we're considering issuing,
and the output is some structure that contains the items of EXPLAIN-like
data we need, but not the items we don't. The implementation of the
function would run the query through parse/plan, then pick out the
data we want and return that.

* We could do a lot worse than to have the "structure" be JSON.
This'd allow structured, labeled data to be returned; it would not be
too difficult to construct, even in PG server versions predating the
addition of JSON logic to the core; and the receiving postgres_fdw
extension could use the core's JSON logic to parse the data.

* The contents of the structure need to be designed with forethought
for extensibility, but this doesn't seem hard if it's all a collection
of labeled fields. We can just say that the recipient must ignore
fields it doesn't recognize. Once a given field has been defined, we
can't change its contents, but we can introduce new fields as needed.
Note that I would not be in favor of putting an overall version number
within the structure; that's way too coarse-grained.
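
To make the shape of this concrete, here is a minimal sketch of how the
pieces might fit together; the function name, the probe query, and the JSON
field names below are purely illustrative, nothing is settled:

    -- Installed on the remote side by the extension:
    CREATE FUNCTION pg_catalog.postgres_fdw_support(query text)
    RETURNS json
    AS 'MODULE_PATHNAME' LANGUAGE C STRICT;

    -- Connection-start probe from the local side:
    SELECT 1 FROM pg_catalog.pg_proc WHERE proname = 'postgres_fdw_support';

    -- Then, instead of EXPLAIN, the local side would issue something like
    SELECT pg_catalog.postgres_fdw_support('SELECT a FROM public.t WHERE a > 10');
    -- and get back a labeled structure such as
    --   {"startup_cost": 0.00, "total_cost": 445.00, "rows": 10000, "width": 4}
    -- where the recipient simply ignores any fields it does not recognize.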

I'm not planning to do anything about these ideas myself, at least
not in the short term. But perhaps somebody else would like to
run with them.

regards, tom lane

#2 Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#1)
Re: Ideas about a better API for postgres_fdw remote estimates

Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

In <1116564.1593813043@sss.pgh.pa.us> I wrote:

I wonder whether someday we ought to invent a new API that's more
suited to postgres_fdw's needs than EXPLAIN is. It's not like the
remote planner doesn't know the number we want; it just fails to
include it in EXPLAIN.

I've been thinking about this a little more, and I'd like to get some
ideas down on electrons before they vanish.

The current method for postgres_fdw to obtain remote estimates is to
issue an EXPLAIN command to the remote server and then decipher the
result. This has just one big advantage, which is that it works
against existing, even very old, remote PG versions. In every other
way it's pretty awful: it involves a lot of cycles on the far end
to create output details we don't really care about, it requires a
fair amount of logic to parse that output, and we can't get some
details that we *do* care about (such as the total size of the foreign
table, as per the other discussion).

We can do better. I don't propose removing the existing logic, because
being able to work against old remote PG versions seems pretty useful.
But we could probe at connection start for whether the remote server
has support for a better way, and then use that way if available.

I agree we can, and should, try to do better here.

What should the better way look like? I suggest the following:

* Rather than adding a core-server feature, the remote-end part of the new
API should be a function installed by an extension (either postgres_fdw
itself, or a new extension "postgres_fdw_remote" or the like). One
attraction of this approach is that it'd be conceivable to back-port the
new code into existing PG releases by updating the extension. Also
there'd be room for multiple versions of the support. The
connection-start probe could be of the form "does this function exist
in pg_proc?".

* I'm imagining the function being of the form

function pg_catalog.postgres_fdw_support(query text) returns something

where the input is still the text of a query we're considering issuing,
and the output is some structure that contains the items of EXPLAIN-like
data we need, but not the items we don't. The implementation of the
function would run the query through parse/plan, then pick out the
data we want and return that.

* We could do a lot worse than to have the "structure" be JSON.
This'd allow structured, labeled data to be returned; it would not be
too difficult to construct, even in PG server versions predating the
addition of JSON logic to the core; and the receiving postgres_fdw
extension could use the core's JSON logic to parse the data.

I also tend to agree with using JSON for this.

* The contents of the structure need to be designed with forethought
for extensibility, but this doesn't seem hard if it's all a collection
of labeled fields. We can just say that the recipient must ignore
fields it doesn't recognize. Once a given field has been defined, we
can't change its contents, but we can introduce new fields as needed.
Note that I would not be in favor of putting an overall version number
within the structure; that's way too coarse-grained.

This also makes sense to me.

I'm not planning to do anything about these ideas myself, at least
not in the short term. But perhaps somebody else would like to
run with them.

I'm trying to figure out why it makes more sense to use
'postgres_fdw_support(query text)', which would still do parse/plan and
return EXPLAIN-like data, rather than having:

EXPLAIN (FORMAT JSON, FDW true) query ...

(Or, perhaps better, individual boolean options for whatever stuff we
want to ask for, or to exclude if we don't want it, so that other tools
could use this...).
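
As a purely hypothetical illustration of that direction (the option name and
the trimmed output are invented here), the invocation might look like:

    EXPLAIN (FORMAT JSON, FDW true) SELECT a FROM public.t WHERE a > 10;
    -- perhaps producing only the top-level numbers, under the usual JSON keys:
    --   [{"Plan": {"Startup Cost": 0.00, "Total Cost": 445.00,
    --              "Plan Rows": 10000, "Plan Width": 4}}]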

Thanks,

Stephen

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#2)
Re: Ideas about a better API for postgres_fdw remote estimates

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

* Rather than adding a core-server feature, the remote-end part of the new
API should be a function installed by an extension (either postgres_fdw
itself, or a new extension "postgres_fdw_remote" or the like).

I'm trying to figure out why it makes more sense to use
'postgres_fdw_support(query text)', which would still do parse/plan and
return EXPLAIN-like data, rather than having:

EXPLAIN (FORMAT JSON, FDW true) query ...

I see a couple of reasons not to do it like that:

1. This is specific to postgres_fdw. Some other extension might want some
other data, and different versions of postgres_fdw might want different
data. So putting it into core seems like the wrong thing.

2. Wedging this into EXPLAIN would be quite ugly, because (at least
as I envision it) the output would have just about nothing to do with
any existing EXPLAIN output.

3. We surely would not back-patch a core change like this. OTOH, if
the added infrastructure is in an extension, somebody might want to
back-patch that (even if unofficially). This argument falls to the
ground of course if we're forced to make any core changes to be able
to get at the data we need; but I'm not sure that will be needed.

regards, tom lane

#4 Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#3)
Re: Ideas about a better API for postgres_fdw remote estimates

Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

* Rather than adding a core-server feature, the remote-end part of the new
API should be a function installed by an extension (either postgres_fdw
itself, or a new extension "postgres_fdw_remote" or the like).

I'm trying to figure out why it makes more sense to use
'postgres_fdw_support(query text)', which would still do parse/plan and
return EXPLAIN-like data, rather than having:

EXPLAIN (FORMAT JSON, FDW true) query ...

I see a couple of reasons not to do it like that:

1. This is specific to postgres_fdw. Some other extension might want some
other data, and different versions of postgres_fdw might want different
data. So putting it into core seems like the wrong thing.

Another extension or use-case might want exactly the same information
too though. In a way, we'd be 'hiding' that information from other
potential users unless they want to install their own extension, which
is a pretty big leap. Are we sure this information wouldn't be at all
interesting to pgAdmin4 or explain.depesz.com?

2. Wedging this into EXPLAIN would be quite ugly, because (at least
as I envision it) the output would have just about nothing to do with
any existing EXPLAIN output.

This is a better argument for not making it part of EXPLAIN, though I
don't really feel like I've got a decent idea of what you are suggesting
the output *would* look like, so it's difficult for me to agree (or
disagree) about this particular point.

3. We surely would not back-patch a core change like this. OTOH, if
the added infrastructure is in an extension, somebody might want to
back-patch that (even if unofficially). This argument falls to the
ground of course if we're forced to make any core changes to be able
to get at the data we need; but I'm not sure that will be needed.

Since postgres_fdw is part of core and core's release cycle, and the
packagers manage the extensions from core in a way that they have to
match up, this argument doesn't hold any weight with me. For this to be
a viable argument, we would need to segregate extensions from core and
give them their own release cycle and clear indication of which
extension versions work with which PG major versions, etc. I'm actually
generally in support of *that* idea- and with that, would agree with
your point 3 above, but that's not the reality of today.

Thanks,

Stephen

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#4)
Re: Ideas about a better API for postgres_fdw remote estimates

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

2. Wedging this into EXPLAIN would be quite ugly, because (at least
as I envision it) the output would have just about nothing to do with
any existing EXPLAIN output.

This is a better argument for not making it part of EXPLAIN, though I
don't really feel like I've got a decent idea of what you are suggesting
the output *would* look like, so it's difficult for me to agree (or
disagree) about this particular point.

Per postgres_fdw's get_remote_estimate(), the only data we use right now
is the startup_cost, total_cost, rows and width estimates from the
top-level Plan node. That's available immediately from the Plan tree,
meaning that basically *nothing* of the substantial display effort
expended by explain.c and ruleutils.c is of any value. So the level-zero
implementation of this would be to run the parser and planner, format
those four numbers into a JSON object (which would require little more
infrastructure than sprintf), and return that. Sure, we could make that
into some kind of early-exit path in explain.c, but I think it'd be a
pretty substantial wart, especially since it'd mean that none of the
other EXPLAIN options are sensible in combination with this one.

Further down the road, we might want to rethink the whole idea of
completely constructing a concrete Plan. We could get the data we need
at the list-of-Paths stage. Even more interesting, we could (with very
little more work) return data about multiple Paths, so that the client
could find out, for example, the costs of sorted and unsorted output
without paying two network round trips to discover that. That'd
definitely require changes in the core planner, since it has no API to
stop at that point. And it's even less within the charter of EXPLAIN.
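
As an illustrative sketch only (no field names are settled), the output of
that multi-Path variant might look something like:

    {"paths": [
       {"pathkeys": [],    "startup_cost": 0.00,    "total_cost": 445.00,  "rows": 10000, "width": 4},
       {"pathkeys": ["a"], "startup_cost": 1109.39, "total_cost": 1134.39, "rows": 10000, "width": 4}
    ]}

so that the client could compare the sorted and unsorted alternatives from a
single round trip.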

I grant your point that there might be other users for this besides
postgres_fdw, but that doesn't mean it must be a core feature.

3. We surely would not back-patch a core change like this. OTOH, if
the added infrastructure is in an extension, somebody might want to
back-patch that (even if unofficially).

Since postgres_fdw is part of core and core's release cycle, and the
packagers manage the extensions from core in a way that they have to
match up, this argument doesn't hold any weight with me.

Certainly only v14 (or whenever) and later postgres_fdw would be able
to *use* this data. The scenario I'm imagining is that somebody wants
to be able to use that client against an older remote server, and is
willing to install some simple extension on the remote server to do so.
Perhaps this scenario is not worth troubling over, but I don't think
it's entirely far-fetched.

regards, tom lane

#6 Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#5)
Re: Ideas about a better API for postgres_fdw remote estimates

Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

2. Wedging this into EXPLAIN would be quite ugly, because (at least
as I envision it) the output would have just about nothing to do with
any existing EXPLAIN output.

This is a better argument for not making it part of EXPLAIN, though I
don't really feel like I've got a decent idea of what you are suggesting
the output *would* look like, so it's difficult for me to agree (or
disagree) about this particular point.

Per postgres_fdw's get_remote_estimate(), the only data we use right now
is the startup_cost, total_cost, rows and width estimates from the
top-level Plan node. That's available immediately from the Plan tree,
meaning that basically *nothing* of the substantial display effort
expended by explain.c and ruleutils.c is of any value. So the level-zero

The 'display effort' you're referring to, when using JSON format with
explain, is basically to format the results into JSON and return them-
which is what you're suggesting this mode would do anyway, no..?

If the remote side 'table' is actually a complicated view, then having a
way to get just the top-level information (and excluding the rest) sounds
like it'd be useful, and perhaps excluding that other info doesn't really
fit into EXPLAIN's mandate, but that's also a much less common case.

implementation of this would be to run the parser and planner, format
those four numbers into a JSON object (which would require little more
infrastructure than sprintf), and return that. Sure, we could make that
into some kind of early-exit path in explain.c, but I think it'd be a
pretty substantial wart, especially since it'd mean that none of the
other EXPLAIN options are sensible in combination with this one.

That EXPLAIN has options that only make sense in combination with
certain other options isn't anything new- BUFFERS makes no sense without
ANALYZE, etc.

Further down the road, we might want to rethink the whole idea of
completely constructing a concrete Plan. We could get the data we need
at the list-of-Paths stage. Even more interesting, we could (with very
little more work) return data about multiple Paths, so that the client
could find out, for example, the costs of sorted and unsorted output
without paying two network round trips to discover that. That'd
definitely require changes in the core planner, since it has no API to
stop at that point. And it's even less within the charter of EXPLAIN.

I have to admit that I'm not really sure how we could make it work, but
having a way to get multiple paths returned by EXPLAIN would certainly
be interesting to a lot of users. Certainly it's easier to see how we
could get at that info in a postgres_fdw-specific function, and be able
to understand how to deal with it there and what could be done, but once
it's there I wonder if other tools might see that and possibly even
build on it because it'd be the only way to get that kind of info, which
certainly wouldn't be ideal.

I grant your point that there might be other users for this besides
postgres_fdw, but that doesn't mean it must be a core feature.

That postgres_fdw is an extension is almost as much of a wart as
anything being discussed here and suggesting that things added to
postgres_fdw aren't 'core features' seems akin to ignoring the forest
for the trees- consider that, today, there isn't even an option to
install only the core server from the PGDG repos (at least for Debian /
Ubuntu, not sure if the RPMs have caught up to that yet, but they
probably should). The 'postgresql-12' .deb includes all the extensions
that are part of the core git repo, because they're released and
maintained just the same as the core server and, from a practical
perspective, to run a decent PG system you really should have them
installed, so why bother having a separate package?

3. We surely would not back-patch a core change like this. OTOH, if
the added infrastructure is in an extension, somebody might want to
back-patch that (even if unofficially).

Since postgres_fdw is part of core and core's release cycle, and the
packagers manage the extensions from core in a way that they have to
match up, this argument doesn't hold any weight with me.

Certainly only v14 (or whenever) and later postgres_fdw would be able
to *use* this data. The scenario I'm imagining is that somebody wants
to be able to use that client against an older remote server, and is
willing to install some simple extension on the remote server to do so.
Perhaps this scenario is not worth troubling over, but I don't think
it's entirely far-fetched.

I definitely don't think that such an extension should be maintained
outside of core, and I seriously doubt any of our packagers would be
anxious to build an independent package for this to be usable in older
servers. Sure, it's possible someone will care about this enough to
spend the effort to try and build it for an older version and use it but
I definitely don't think we should be considering that a serious design
goal or a reason to put this capability in a separate extension.

Thanks,

Stephen

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#6)
Re: Ideas about a better API for postgres_fdw remote estimates

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Per postgres_fdw's get_remote_estimate(), the only data we use right now
is the startup_cost, total_cost, rows and width estimates from the
top-level Plan node. That's available immediately from the Plan tree,
meaning that basically *nothing* of the substantial display effort
expended by explain.c and ruleutils.c is of any value. So the level-zero

The 'display effort' you're referring to, when using JSON format with
explain, is basically to format the results into JSON and return them-
which is what you're suggesting this mode would do anyway, no..?

Not hardly. Spend some time studying ruleutils.c sometime ---
reverse-compiling a plan is *expensive*. For instance, we have to
look up the names of all the operators used in the query quals,
decide what needs quoting, decide what needs parenthesization, etc.
There's also a fun little bit that assigns unique aliases to each
table appearing in the query, which from memory is at least O(N^2)
and maybe worse. (Admittedly, shipped queries are usually not so
complicated that N would be large.) And by the way, we're also
starting up the executor, even if you didn't say ANALYZE.

A little bit of fooling with "perf" suggests that when explaining
a pretty simple bitmapscan query --- I used
	EXPLAIN SELECT * FROM tenk1 WHERE unique1 > 9995
which ought to be somewhat representative of what postgres_fdw needs
--- only about half of the runtime is spent within pg_plan_query, and
the other half is spent on explain.c + ruleutils.c formatting work.
So while getting rid of that overhead wouldn't be an earthshattering
improvement, I think it'd be worthwhile.

Further down the road, we might want to rethink the whole idea of
completely constructing a concrete Plan. We could get the data we need
at the list-of-Paths stage. Even more interesting, we could (with very
little more work) return data about multiple Paths, so that the client
could find out, for example, the costs of sorted and unsorted output
without paying two network round trips to discover that.

I have to admit that I'm not really sure how we could make it work, but
having a way to get multiple paths returned by EXPLAIN would certainly
be interesting to a lot of users. Certainly it's easier to see how we
could get at that info in a postgres_fdw-specific function, and be able
to understand how to deal with it there and what could be done, but once
it's there I wonder if other tools might see that and possibly even
build on it because it'd be the only way to get that kind of info, which
certainly wouldn't be ideal.

Yeah, thinking about it as a function that inspects partial planner
results, it might be useful for other purposes besides postgres_fdw.
As I said before, I don't think this necessarily has to be bundled as
part of postgres_fdw. That still doesn't make it part of EXPLAIN.

That postgres_fdw is an extension is almost as much of a wart as
anything being discussed here and suggesting that things added to
postgres_fdw aren't 'core features' seems akin to ignoring the forest
for the trees-

I think we just had this discussion in another thread. The fact that
postgres_fdw is an extension is a feature, not a bug, because (a) it
means that somebody could implement their own version if they wanted
it to act differently; and (b) it keeps us honest about whether the
APIs needed by an FDW are accessible from outside core. I think moving
postgres_fdw into core would be a large step backwards.

regards, tom lane

#8 Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#7)
Re: Ideas about a better API for postgres_fdw remote estimates

Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Per postgres_fdw's get_remote_estimate(), the only data we use right now
is the startup_cost, total_cost, rows and width estimates from the
top-level Plan node. That's available immediately from the Plan tree,
meaning that basically *nothing* of the substantial display effort
expended by explain.c and ruleutils.c is of any value. So the level-zero

The 'display effort' you're referring to, when using JSON format with
explain, is basically to format the results into JSON and return them-
which is what you're suggesting this mode would do anyway, no..?

Not hardly. Spend some time studying ruleutils.c sometime ---
reverse-compiling a plan is *expensive*. For instance, we have to
look up the names of all the operators used in the query quals,
decide what needs quoting, decide what needs parenthesization, etc.

Ah, alright, that makes more sense then.

There's also a fun little bit that assigns unique aliases to each
table appearing in the query, which from memory is at least O(N^2)
and maybe worse. (Admittedly, shipped queries are usually not so
complicated that N would be large.) And by the way, we're also
starting up the executor, even if you didn't say ANALYZE.

A little bit of fooling with "perf" suggests that when explaining
a pretty simple bitmapscan query --- I used
EXPLAIN SELECT * FROM tenk1 WHERE unique1 > 9995
which ought to be somewhat representative of what postgres_fdw needs
--- only about half of the runtime is spent within pg_plan_query, and
the other half is spent on explain.c + ruleutils.c formatting work.
So while getting rid of that overhead wouldn't be an earthshattering
improvement, I think it'd be worthwhile.

Sure.

Further down the road, we might want to rethink the whole idea of
completely constructing a concrete Plan. We could get the data we need
at the list-of-Paths stage. Even more interesting, we could (with very
little more work) return data about multiple Paths, so that the client
could find out, for example, the costs of sorted and unsorted output
without paying two network round trips to discover that.

I have to admit that I'm not really sure how we could make it work, but
having a way to get multiple paths returned by EXPLAIN would certainly
be interesting to a lot of users. Certainly it's easier to see how we
could get at that info in a postgres_fdw-specific function, and be able
to understand how to deal with it there and what could be done, but once
it's there I wonder if other tools might see that and possibly even
build on it because it'd be the only way to get that kind of info, which
certainly wouldn't be ideal.

Yeah, thinking about it as a function that inspects partial planner
results, it might be useful for other purposes besides postgres_fdw.
As I said before, I don't think this necessarily has to be bundled as
part of postgres_fdw. That still doesn't make it part of EXPLAIN.

Providing it as a function rather than through EXPLAIN does make a bit
more sense if we're going to skip things like the lookups you mention
above. I'm still inclined to have it be a part of core rather than
having it as postgres_fdw though. I'm not completely against it being
part of postgres_fdw... but I would think that would really be
appropriate if it's actually using something in postgres_fdw, but if
everything that it's doing is part of core and nothing related
specifically to the postgres FDW, then having it as part of core makes
more sense to me. Also, having it as part of core would make it more
appropriate for other tools to look at and adding that kind of
inspection capability for partial planner results could be very
interesting for tools like pgAdmin and such.

That postgres_fdw is an extension is almost as much of a wart as
anything being discussed here and suggesting that things added to
postgres_fdw aren't 'core features' seems akin to ignoring the forest
for the trees-

I think we just had this discussion in another thread. The fact that
postgres_fdw is an extension is a feature, not a bug, because (a) it
means that somebody could implement their own version if they wanted
it to act differently; and (b) it keeps us honest about whether the
APIs needed by an FDW are accessible from outside core. I think moving
postgres_fdw into core would be a large step backwards.

I'm not looking to change it today, as that ship has sailed, but while
having FDWs as a general capability that can be implemented by
extensions is certainly great and I'd love to see more of that (even
better would be more of those that are well maintained and cared for by
this community of folks), requiring users to install an extension into
every database where they want to query another PG server from isn't a
feature.

Thanks,

Stephen

#9 Bruce Momjian
bruce@momjian.us
In reply to: Stephen Frost (#8)
Re: Ideas about a better API for postgres_fdw remote estimates

On Mon, Jul 6, 2020 at 11:28:28AM -0400, Stephen Frost wrote:

Yeah, thinking about it as a function that inspects partial planner
results, it might be useful for other purposes besides postgres_fdw.
As I said before, I don't think this necessarily has to be bundled as
part of postgres_fdw. That still doesn't make it part of EXPLAIN.

Providing it as a function rather than through EXPLAIN does make a bit
more sense if we're going to skip things like the lookups you mention
above. I'm still inclined to have it be a part of core rather than
having it as postgres_fdw though. I'm not completely against it being
part of postgres_fdw... but I would think that would really be
appropriate if it's actually using something in postgres_fdw, but if
everything that it's doing is part of core and nothing related
specifically to the postgres FDW, then having it as part of core makes
more sense to me. Also, having it as part of core would make it more
appropriate for other tools to look at and adding that kind of
inspection capability for partial planner results could be very
interesting for tools like pgAdmin and such.

I agree the statistics extraction should probably be part of core.
There is the goal of FDWs returning data, and returning the data
quickly. I think we can require all-new FDW servers in order to get improved
performance. I am not even clear if we have a full understanding of the
performance characteristics of FDWs yet. I know Tomas did some research
on its DML behavior, but other than that, I haven't seen much.

On a related note, I have wished to be able to see all the costs
associated with plans not chosen, and I think others would like that as
well. Getting multiple costs for a query goes in that direction.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

#10 Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Bruce Momjian (#9)
2 attachment(s)
Re: Ideas about a better API for postgres_fdw remote estimates

On 7/14/20 6:32 AM, Bruce Momjian wrote:

On Mon, Jul 6, 2020 at 11:28:28AM -0400, Stephen Frost wrote:

Yeah, thinking about it as a function that inspects partial planner
results, it might be useful for other purposes besides postgres_fdw.
As I said before, I don't think this necessarily has to be bundled as
part of postgres_fdw. That still doesn't make it part of EXPLAIN.

Providing it as a function rather than through EXPLAIN does make a bit
more sense if we're going to skip things like the lookups you mention
above. I'm still inclined to have it be a part of core rather than
having it as postgres_fdw though. I'm not completely against it being
part of postgres_fdw... but I would think that would really be
appropriate if it's actually using something in postgres_fdw, but if
everything that it's doing is part of core and nothing related
specifically to the postgres FDW, then having it as part of core makes
more sense to me. Also, having it as part of core would make it more
appropriate for other tools to look at and adding that kind of
inspection capability for partial planner results could be very
interesting for tools like pgAdmin and such.

I agree the statistics extraction should probably be part of core.
There is the goal if FDWs returning data, and returning the data
quickly. I think we can require all-new FDW servers to get improved
performance. I am not even clear if we have a full understanding of the
performance characteristics of FDWs yet. I know Tomas did some research
on its DML behavior, but other than that, I haven't seen much.

On a related note, I have wished to be able to see all the costs
associated with plans not chosen, and I think others would like that as
well. Getting multiple costs for a query goes in that direction.

During the implementation of sharding-related improvements I noticed
that if we use a lot of foreign partitions, we get bad plans because
vacuum doesn't update statistics for foreign tables. That is done by the
ANALYZE command, but it is a very expensive operation for a foreign table.
The statistics problem is demonstrated by the TAP test in the first patch
attached.

I implemented some FDW + PG core machinery to reduce the weight of the
problem. The ANALYZE command on a foreign table executes a query on the
foreign server that extracts the statistics tuple, serializes it into a
JSON-formatted string and returns it to the caller. The caller deserializes
this string, generates statistics for the foreign table and updates them.
The second patch is a proof of concept.

This patch speeds up the ANALYZE command and keeps the statistics on a
foreign table relevant after an autovacuum operation. Its effectiveness
depends on the relevance of the statistics on the remote server, but still.
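
In terms of the attached proof of concept, the flow during ANALYZE of a
foreign table is roughly the following (names as in the patches):

    -- The local server's ANALYZE calls the new GetForeignRelStat() callback,
    -- which runs this on the remote server:
    SELECT * FROM public.postgres_fdw_stat('public.ltable');
    -- The returned JSON ({"relpages": ..., "reltuples": ..., "attrs": [...]})
    -- is deserialized locally and used to update the foreign table's local
    -- statistics, instead of sampling remote rows.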

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

0001-TAP-Test-on-bad-statistics.patch (text/x-patch)
From 329954981959ee3fc97e52266c89a436d02ddf5e Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Tue, 4 Aug 2020 09:29:37 +0500
Subject: [PATCH 1/2] TAP-Test on bad statistics.

Add dummy postgres_fdw_stat() routine. It will return stat tuples
for each input attribute.
---
 contrib/postgres_fdw/Makefile                  |  4 ++--
 contrib/postgres_fdw/expected/foreign_stat.out | 18 ++++++++++++++++++
 .../postgres_fdw/postgres_fdw--1.0--1.1.sql    | 11 +++++++++++
 contrib/postgres_fdw/postgres_fdw.c            | 17 +++++++++++++++++
 contrib/postgres_fdw/postgres_fdw.control      |  2 +-
 contrib/postgres_fdw/sql/foreign_stat.sql      | 10 ++++++++++
 6 files changed, 59 insertions(+), 3 deletions(-)
 create mode 100644 contrib/postgres_fdw/expected/foreign_stat.out
 create mode 100644 contrib/postgres_fdw/postgres_fdw--1.0--1.1.sql
 create mode 100644 contrib/postgres_fdw/sql/foreign_stat.sql

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index ee8a80a392..15a6f6c353 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -14,9 +14,9 @@ PG_CPPFLAGS = -I$(libpq_srcdir)
 SHLIB_LINK_INTERNAL = $(libpq)
 
 EXTENSION = postgres_fdw
-DATA = postgres_fdw--1.0.sql
+DATA = postgres_fdw--1.0.sql postgres_fdw--1.0--1.1.sql
 
-REGRESS = postgres_fdw
+REGRESS = postgres_fdw foreign_stat
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/postgres_fdw/expected/foreign_stat.out b/contrib/postgres_fdw/expected/foreign_stat.out
new file mode 100644
index 0000000000..28a470bccc
--- /dev/null
+++ b/contrib/postgres_fdw/expected/foreign_stat.out
@@ -0,0 +1,18 @@
+CREATE TABLE ltable (a int, b real);
+CREATE FOREIGN TABLE ftable (a int) server loopback options (table_name 'ltable');
+INSERT INTO ltable (a, b) (SELECT *, 1.01 FROM generate_series(1, 1E4));
+VACUUM;
+-- Check statistic interface routine
+SELECT * FROM postgres_fdw_stat('public', 'test', 'a');
+ postgres_fdw_stat 
+-------------------
+ 
+(1 row)
+
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE)
+SELECT * FROM ftable;
+                                         QUERY PLAN                                          
+---------------------------------------------------------------------------------------------
+ Foreign Scan on ftable  (cost=100.00..445.00 rows=10000 width=4) (actual rows=10000 loops=1)
+(1 row)
+
diff --git a/contrib/postgres_fdw/postgres_fdw--1.0--1.1.sql b/contrib/postgres_fdw/postgres_fdw--1.0--1.1.sql
new file mode 100644
index 0000000000..40d36c95df
--- /dev/null
+++ b/contrib/postgres_fdw/postgres_fdw--1.0--1.1.sql
@@ -0,0 +1,11 @@
+/* contrib/postgres_fdw/postgres_fdw--1.0--1.1.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION postgres_fdw" to load this file. \quit
+
+-- Returns serialized stat tuples for the relation attributes
+CREATE FUNCTION postgres_fdw_stat(TEXT, TEXT, VARIADIC text[])
+RETURNS json
+AS 'MODULE_PATHNAME'
+LANGUAGE C STRICT;
+
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..e4c7bcec49 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -305,6 +305,7 @@ typedef struct
  * SQL functions
  */
 PG_FUNCTION_INFO_V1(postgres_fdw_handler);
+PG_FUNCTION_INFO_V1(postgres_fdw_stat);
 
 /*
  * FDW callback routines
@@ -561,6 +562,22 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	PG_RETURN_POINTER(routine);
 }
 
+/*
+ * Get stat tuples for the attributes of relation
+ * See row_to_json()
+ */
+Datum
+postgres_fdw_stat(PG_FUNCTION_ARGS)
+{
+	const char *nspname = text_to_cstring(PG_GETARG_TEXT_PP(0));
+	const char *relname = text_to_cstring(PG_GETARG_TEXT_PP(1));
+	StringInfo	result;
+
+	result = makeStringInfo();
+
+	PG_RETURN_TEXT_P(cstring_to_text_with_len(result->data, result->len));
+}
+
 /*
  * postgresGetForeignRelSize
  *		Estimate # of rows and width of the result of the scan
diff --git a/contrib/postgres_fdw/postgres_fdw.control b/contrib/postgres_fdw/postgres_fdw.control
index f9ed490752..d489382064 100644
--- a/contrib/postgres_fdw/postgres_fdw.control
+++ b/contrib/postgres_fdw/postgres_fdw.control
@@ -1,5 +1,5 @@
 # postgres_fdw extension
 comment = 'foreign-data wrapper for remote PostgreSQL servers'
-default_version = '1.0'
+default_version = '1.1'
 module_pathname = '$libdir/postgres_fdw'
 relocatable = true
diff --git a/contrib/postgres_fdw/sql/foreign_stat.sql b/contrib/postgres_fdw/sql/foreign_stat.sql
new file mode 100644
index 0000000000..18f649b477
--- /dev/null
+++ b/contrib/postgres_fdw/sql/foreign_stat.sql
@@ -0,0 +1,10 @@
+CREATE TABLE ltable (a int, b real);
+CREATE FOREIGN TABLE ftable (a int) server loopback options (table_name 'ltable');
+INSERT INTO ltable (a, b) (SELECT *, 1.01 FROM generate_series(1, 1E4));
+VACUUM;
+
+-- Check statistic interface routine
+SELECT * FROM postgres_fdw_stat('public', 'test', 'a');
+
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE)
+SELECT * FROM ftable;
-- 
2.17.1

0002-Pull-statistic-for-a-foreign-table-from-remote-serve.patch (text/x-patch)
From 637b83b5deaa2a336d54601dff9ed3fc09f45061 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Mon, 24 Aug 2020 13:45:50 +0500
Subject: [PATCH 2/2] Pull statistic for a foreign table from remote server.
 Add the postgres_fdw_stat() routine that convert statistics on the relation
 into json format. New fdw API routine GetForeignRelStat() implements access
 to this machinery and returns JSON string to the caller. This function is
 called by ANALYZE command as an attempt to reduce the cost of updating
 statistics. Add foreign relation analyze into the VACUUM ANALYZE and
 autovacuum.

---
 contrib/postgres_fdw/deparse.c                |   8 +
 .../postgres_fdw/expected/foreign_stat.out    | 115 +++-
 .../postgres_fdw/postgres_fdw--1.0--1.1.sql   |   2 +-
 contrib/postgres_fdw/postgres_fdw.c           |  53 +-
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/foreign_stat.sql     |  41 +-
 src/backend/commands/analyze.c                | 580 ++++++++++++++++++
 src/backend/commands/vacuum.c                 |  13 +-
 src/backend/utils/adt/json.c                  |   6 +
 src/include/commands/vacuum.h                 |   1 +
 src/include/foreign/fdwapi.h                  |   2 +
 src/include/utils/json.h                      |   1 +
 12 files changed, 798 insertions(+), 25 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ad37a74221..9514fe996a 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2053,6 +2053,14 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
 	appendStringInfo(buf, "::pg_catalog.regclass) / %d", BLCKSZ);
 }
 
+void
+deparseGetStatSql(StringInfo buf, Relation rel)
+{
+	appendStringInfo(buf, "SELECT * FROM public.postgres_fdw_stat('");
+	deparseRelation(buf, rel);
+	appendStringInfoString(buf, "');");
+}
+
 /*
  * Construct SELECT statement to acquire sample rows of given relation.
  *
diff --git a/contrib/postgres_fdw/expected/foreign_stat.out b/contrib/postgres_fdw/expected/foreign_stat.out
index 28a470bccc..a5d5ec8564 100644
--- a/contrib/postgres_fdw/expected/foreign_stat.out
+++ b/contrib/postgres_fdw/expected/foreign_stat.out
@@ -1,18 +1,107 @@
 CREATE TABLE ltable (a int, b real);
 CREATE FOREIGN TABLE ftable (a int) server loopback options (table_name 'ltable');
+VACUUM ANALYZE;
+-- Check statistic interface routine on an empty table.
+SELECT * FROM postgres_fdw_stat('ltable');
+                                                     postgres_fdw_stat                                                      
+----------------------------------------------------------------------------------------------------------------------------
+ {"namespace" : "public", "relname" : "ltable", "sta_num_slots" : 5, "relpages" : 0, "reltuples" : 0.000000, "attrs" : [ ]}
+(1 row)
+
+SELECT * FROM postgres_fdw_stat('ftable');
+ERROR:  Can be used for ordinary relation only. Reltype: f.
+-- Check statistic interface routine on non-empty tables.
 INSERT INTO ltable (a, b) (SELECT *, 1.01 FROM generate_series(1, 1E4));
-VACUUM;
--- Check statistic interface routine
-SELECT * FROM postgres_fdw_stat('public', 'test', 'a');
- postgres_fdw_stat 
--------------------
- 
-(1 row)
-
-EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE)
-SELECT * FROM ftable;
-                                         QUERY PLAN                                          
----------------------------------------------------------------------------------------------
- Foreign Scan on ftable  (cost=100.00..445.00 rows=10000 width=4) (actual rows=10000 loops=1)
+ANALYZE ltable, ftable;
+SELECT * FROM postgres_fdw_stat('ltable');
+                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  postgres_fdw_stat                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ {"namespace" : "public", "relname" : "ltable", "sta_num_slots" : 5, "relpages" : 45, "reltuples" : 10000.000000, "attrs" : [{"attname" : "a", "inh" : "false", "nullfrac" : 0.000000, "width" : 4, "distinct" : -1.000000, "stakind" : [2,3,0,0,0], "staop" : [97,97,0,0,0], "stacoll" : [0,0,0,0,0], "nn" : 101, "values1" : [1,100,200,300,400,500,600,700,800,900,1000,1100,1200,1300,1400,1500,1600,1700,1800,1900,2000,2100,2200,2300,2400,2500,2600,2700,2800,2900,3000,3100,3200,3300,3400,3500,3600,3700,3800,3900,4000,4100,4200,4300,4400,4500,4600,4700,4800,4900,5000,5100,5200,5300,5400,5500,5600,5700,5800,5900,6000,6100,6200,6300,6400,6500,6600,6700,6800,6900,7000,7100,7200,7300,7400,7500,7600,7700,7800,7900,8000,8100,8200,8300,8400,8500,8600,8700,8800,8900,9000,9100,9200,9300,9400,9500,9600,9700,9800,9900,10000], "values2" : [ ], "values3" : [ ], "values4" : [ ], "values5" : [ ], "numbers1" : [ ], "nn" : 1, "numbers2" : [1], "numbers3" : [ ], "numbers4" : [ ], "numbers5" : [ ]}, {"attname" : "b", "inh" : "false", "nullfrac" : 0.000000, "width" : 4, "distinct" : 1.000000, "stakind" : [1,3,0,0,0], "staop" : [620,622,0,0,0], "stacoll" : [0,0,0,0,0], "nn" : 1, "values1" : [1.01], "values2" : [ ], "values3" : [ ], "values4" : [ ], "values5" : [ ], "nn" : 1, "numbers1" : [1], "nn" : 1, "numbers2" : [1], "numbers3" : [ ], "numbers4" : [ ], "numbers5" : [ ]}]}
+(1 row)
+
+-- Check ANALYZE on foreign table
+INSERT INTO ltable (a, b) (SELECT *, 2.01 FROM generate_series(1E4, 2E4));
+ANALYZE ltable;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ltable;
+                                       QUERY PLAN                                       
+----------------------------------------------------------------------------------------
+ Seq Scan on ltable  (cost=0.00..289.01 rows=20001 width=8) (actual rows=20001 loops=1)
+(1 row)
+
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable;
+                                          QUERY PLAN                                          
+----------------------------------------------------------------------------------------------
+ Foreign Scan on ftable  (cost=100.00..445.00 rows=10000 width=4) (actual rows=20001 loops=1)
+(1 row)
+
+ANALYZE ftable;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable;
+                                          QUERY PLAN                                          
+----------------------------------------------------------------------------------------------
+ Foreign Scan on ftable  (cost=100.00..789.03 rows=20001 width=4) (actual rows=20001 loops=1)
+(1 row)
+
+-- Check selectivity
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ltable WHERE a > 10;
+                                       QUERY PLAN                                       
+----------------------------------------------------------------------------------------
+ Seq Scan on ltable  (cost=0.00..339.01 rows=19991 width=8) (actual rows=19991 loops=1)
+   Filter: (a > 10)
+   Rows Removed by Filter: 10
+(3 rows)
+
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable WHERE a > 10;
+                                          QUERY PLAN                                          
+----------------------------------------------------------------------------------------------
+ Foreign Scan on ftable  (cost=100.00..838.83 rows=19991 width=4) (actual rows=19991 loops=1)
+(1 row)
+
+-- Check new attribute
+ALTER TABLE ltable ADD COLUMN c int DEFAULT 42;
+ALTER TABLE ftable ADD COLUMN c int;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ltable WHERE a > 10 AND c < 15;
+                                     QUERY PLAN                                     
+------------------------------------------------------------------------------------
+ Seq Scan on ltable  (cost=0.00..389.01 rows=6664 width=12) (actual rows=0 loops=1)
+   Filter: ((a > 10) AND (c < 15))
+   Rows Removed by Filter: 20001
+(3 rows)
+
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable WHERE a > 10 AND c < 15;
+                                       QUERY PLAN                                        
+-----------------------------------------------------------------------------------------
+ Foreign Scan on ftable  (cost=100.00..622.29 rows=6664 width=8) (actual rows=0 loops=1)
+(1 row)
+
+ANALYZE ltable, ftable;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ltable WHERE a > 10 AND c < 15;
+                                   QUERY PLAN                                    
+---------------------------------------------------------------------------------
+ Seq Scan on ltable  (cost=0.00..389.01 rows=1 width=12) (actual rows=0 loops=1)
+   Filter: ((a > 10) AND (c < 15))
+   Rows Removed by Filter: 20001
+(3 rows)
+
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable WHERE a > 10 AND c < 15;
+                                      QUERY PLAN                                      
+--------------------------------------------------------------------------------------
+ Foreign Scan on ftable  (cost=100.00..489.03 rows=1 width=8) (actual rows=0 loops=1)
+(1 row)
+
+-- Test default vacuum analyzes foreign relation
+INSERT INTO ltable (a, b) (SELECT *, 2.01 FROM generate_series(2E4, 3E4));
+VACUUM ANALYZE ltable, ftable; -- Don't depend on default order of analysis here.
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ltable WHERE a > 100 AND c < 43;
+                                       QUERY PLAN                                        
+-----------------------------------------------------------------------------------------
+ Seq Scan on ltable  (cost=0.00..593.03 rows=29902 width=12) (actual rows=29902 loops=1)
+   Filter: ((a > 100) AND (c < 43))
+   Rows Removed by Filter: 100
+(3 rows)
+
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable WHERE a > 100 AND c < 43;
+                                          QUERY PLAN                                           
+-----------------------------------------------------------------------------------------------
+ Foreign Scan on ftable  (cost=100.00..1291.07 rows=29902 width=8) (actual rows=29902 loops=1)
 (1 row)
 
diff --git a/contrib/postgres_fdw/postgres_fdw--1.0--1.1.sql b/contrib/postgres_fdw/postgres_fdw--1.0--1.1.sql
index 40d36c95df..ff0dd6f83e 100644
--- a/contrib/postgres_fdw/postgres_fdw--1.0--1.1.sql
+++ b/contrib/postgres_fdw/postgres_fdw--1.0--1.1.sql
@@ -4,7 +4,7 @@
 \echo Use "CREATE EXTENSION postgres_fdw" to load this file. \quit
 
 -- Returns serialized stat tuples for the relation attributes
-CREATE FUNCTION postgres_fdw_stat(TEXT, TEXT, VARIADIC text[])
+CREATE FUNCTION postgres_fdw_stat(TEXT)
 RETURNS json
 AS 'MODULE_PATHNAME'
 LANGUAGE C STRICT;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index e4c7bcec49..f81be84c02 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -392,6 +392,7 @@ static void postgresGetForeignUpperPaths(PlannerInfo *root,
 										 RelOptInfo *input_rel,
 										 RelOptInfo *output_rel,
 										 void *extra);
+static char *postgresGetForeignRelStat(Relation rel);
 
 /*
  * Helper functions
@@ -559,6 +560,7 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	routine->GetForeignRelStat = postgresGetForeignRelStat;
 	PG_RETURN_POINTER(routine);
 }
 
@@ -569,13 +571,12 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 Datum
 postgres_fdw_stat(PG_FUNCTION_ARGS)
 {
-	const char *nspname = text_to_cstring(PG_GETARG_TEXT_PP(0));
-	const char *relname = text_to_cstring(PG_GETARG_TEXT_PP(1));
-	StringInfo	result;
+	const char *relname = text_to_cstring(PG_GETARG_TEXT_PP(0));
+	const char *json_stat;
 
-	result = makeStringInfo();
+	json_stat = relstat_to_json(relname);
 
-	PG_RETURN_TEXT_P(cstring_to_text_with_len(result->data, result->len));
+	PG_RETURN_TEXT_P(cstring_to_text(json_stat));
 }
 
 /*
@@ -6601,3 +6602,45 @@ find_em_expr_for_input_target(PlannerInfo *root,
 	elog(ERROR, "could not find pathkey item to sort");
 	return NULL;				/* keep compiler quiet */
 }
+
+static char *
+postgresGetForeignRelStat(Relation rel)
+{
+	ForeignTable *table;
+	UserMapping *user;
+	PGconn	   *conn;
+	unsigned int cursor_number;
+	StringInfoData sql;
+	PGresult   *volatile res = NULL;
+	char fetch_sql[64];
+	char *json;
+
+	table = GetForeignTable(RelationGetRelid(rel));
+	user = GetUserMapping(rel->rd_rel->relowner, table->serverid);
+	conn = GetConnection(user, false);
+
+	cursor_number = GetCursorNumber(conn);
+	initStringInfo(&sql);
+	appendStringInfo(&sql, "DECLARE c%u CURSOR FOR ", cursor_number);
+	deparseGetStatSql(&sql, rel);
+
+	res = pgfdw_exec_query(conn, sql.data);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		pgfdw_report_error(ERROR, res, conn, false, sql.data);
+	PQclear(res);
+
+	snprintf(fetch_sql, sizeof(fetch_sql), "FETCH FROM c%u", cursor_number);
+	res = pgfdw_exec_query(conn, fetch_sql);
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+		pgfdw_report_error(ERROR, res, conn, false, sql.data);
+
+	Assert(PQntuples(res) == 1);
+	Assert(PQnfields(res) == 1);
+
+	json = pstrdup(PQgetvalue(res, 0, 0));
+	PQclear(res);
+	close_cursor(conn, cursor_number);
+	ReleaseConnection(conn);
+
+	return json;
+}
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..bb590bfc21 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -190,6 +190,7 @@ extern void deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
 extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
 extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
 							  List **retrieved_attrs);
+extern void deparseGetStatSql(StringInfo buf, Relation rel);
 extern void deparseStringLiteral(StringInfo buf, const char *val);
 extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
 extern Expr *find_em_expr_for_input_target(PlannerInfo *root,
diff --git a/contrib/postgres_fdw/sql/foreign_stat.sql b/contrib/postgres_fdw/sql/foreign_stat.sql
index 18f649b477..0829ebc522 100644
--- a/contrib/postgres_fdw/sql/foreign_stat.sql
+++ b/contrib/postgres_fdw/sql/foreign_stat.sql
@@ -1,10 +1,41 @@
 CREATE TABLE ltable (a int, b real);
 CREATE FOREIGN TABLE ftable (a int) server loopback options (table_name 'ltable');
+VACUUM ANALYZE;
+
+-- Check statistic interface routine on an empty table.
+SELECT * FROM postgres_fdw_stat('ltable');
+SELECT * FROM postgres_fdw_stat('ftable');
+
+-- Check statistic interface routine on non-empty tables.
 INSERT INTO ltable (a, b) (SELECT *, 1.01 FROM generate_series(1, 1E4));
-VACUUM;
+ANALYZE ltable, ftable;
+SELECT * FROM postgres_fdw_stat('ltable');
+
+-- Check ANALYZE on foreign table
+INSERT INTO ltable (a, b) (SELECT *, 2.01 FROM generate_series(1E4, 2E4));
+ANALYZE ltable;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ltable;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable;
+ANALYZE ftable;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable;
+
+-- Check selectivity
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ltable WHERE a > 10;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable WHERE a > 10;
+
+-- Check new attribute
+ALTER TABLE ltable ADD COLUMN c int DEFAULT 42;
+ALTER TABLE ftable ADD COLUMN c int;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ltable WHERE a > 10 AND c < 15;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable WHERE a > 10 AND c < 15;
+
+ANALYZE ltable, ftable;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ltable WHERE a > 10 AND c < 15;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable WHERE a > 10 AND c < 15;
 
--- Check statistic interface routine
-SELECT * FROM postgres_fdw_stat('public', 'test', 'a');
+-- Test default vacuum analyzes foreign relation
+INSERT INTO ltable (a, b) (SELECT *, 2.01 FROM generate_series(2E4, 3E4));
+VACUUM ANALYZE ltable, ftable; -- Don't depend on default order of analysis here.
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ltable WHERE a > 100 AND c < 43;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable WHERE a > 100 AND c < 43;
 
-EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE)
-SELECT * FROM ftable;
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8af12b5c6b..ca2642360e 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -30,6 +30,7 @@
 #include "catalog/catalog.h"
 #include "catalog/index.h"
 #include "catalog/indexing.h"
+#include "catalog/namespace.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_namespace.h"
@@ -38,8 +39,10 @@
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
+#include "common/jsonapi.h"
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/parse_oper.h"
@@ -58,9 +61,11 @@
 #include "utils/datum.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
+#include "utils/json.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/pg_rusage.h"
+#include "utils/regproc.h"
 #include "utils/sampling.h"
 #include "utils/sortsupport.h"
 #include "utils/syscache.h"
@@ -107,6 +112,10 @@ static void update_attstats(Oid relid, bool inh,
 static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
 static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
 
+static void parse_statjson(Relation rel, char *buffer, size_t size,
+						   VacAttrStats **vacattrstats, int *natts,
+						   BlockNumber *relpages, double *reltuples);
+static bool update_foreign_relation_stat(Relation rel);
 
 /*
  *	analyze_rel() -- analyze one relation
@@ -193,6 +202,17 @@ analyze_rel(Oid relid, RangeVar *relation,
 		return;
 	}
 
+	if (onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE &&
+		update_foreign_relation_stat(onerel))
+	{
+		/*
+		 * The foreign relation's statistics were updated from an existing
+		 * stats tuple fetched from the remote server.
+		 */
+		relation_close(onerel, ShareUpdateExclusiveLock);
+		return;
+	}
+
 	/*
 	 * Check that it's of an analyzable relkind, and set up appropriately.
 	 */
@@ -1421,6 +1441,566 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 	return numrows;
 }
 
+typedef enum
+{
+	JS_EXPECT_NSPNAME_VALUE,
+	JS_EXPECT_RELNAME_VALUE,
+	JS_EXPECT_SLOTSNUM_VALUE,
+	JS_EXPECT_RELPAGES_VALUE,
+	JS_EXPECT_RELTUPLES_VALUE,
+	JS_EXPECT_ATTNAME_FIELD,
+	JS_EXPECT_ATTNAME_VALUE,
+	JS_EXPECT_INH_VALUE,
+	JS_EXPECT_NULLFRAC_VALUE,
+	JS_EXPECT_WIDTH_VALUE,
+	JS_EXPECT_DISTINCT_VALUE,
+	JS_EXPECT_KIND_VALUE,
+	JS_EXPECT_STAOP_VALUE,
+	JS_EXPECT_STACOLL_VALUE,
+	JS_EXPECT_ARRSIZE_VALUE,
+	JS_EXPECT_NUMBERS_VALUE,
+	JS_EXPECT_VALUES_VALUE,
+	JS_EXPECT_PARAM_FIELD
+} JsonStatSemanticState;
+
+typedef struct JsonStatParseState
+{
+	Relation rel;
+	int natts;
+	int state;
+	int arraysize; /* size of the array */
+	int arrayval; /* current position in the array */
+	int numbersN;
+	int valuesN;
+
+	char *nspname;
+	char *relname;
+
+	BlockNumber relpages;
+	double reltuples;
+	VacAttrStats **vas;
+} JsonStatParseState;
+
+static bool skip_attr = false;
+
+static void
+json_stat_object_field_start(void *state, char *fname, bool isnull)
+{
+	JsonStatParseState *parse = (JsonStatParseState *) state;
+	int attnum = parse->natts - 1;
+
+	if (skip_attr && strcmp(fname, "attname") != 0)
+		return;
+
+	Assert(parse->state == JS_EXPECT_PARAM_FIELD ||
+		   parse->state == JS_EXPECT_ATTNAME_FIELD);
+	Assert(parse->arrayval < 0);
+
+	if (strcmp(fname, "namespace") == 0)
+		parse->state = JS_EXPECT_NSPNAME_VALUE;
+	else if (strcmp(fname, "relname") == 0)
+		parse->state = JS_EXPECT_RELNAME_VALUE;
+	else if (strcmp(fname, "sta_num_slots") == 0)
+		parse->state = JS_EXPECT_SLOTSNUM_VALUE;
+	else if (strcmp(fname, "relpages") == 0)
+		parse->state = JS_EXPECT_RELPAGES_VALUE;
+	else if (strcmp(fname, "reltuples") == 0)
+		parse->state = JS_EXPECT_RELTUPLES_VALUE;
+	else if(strcmp(fname, "attrs") == 0)
+		parse->state = JS_EXPECT_ATTNAME_FIELD;
+	else if (strcmp(fname, "attname") == 0)
+	{
+		parse->state = JS_EXPECT_ATTNAME_VALUE;
+		skip_attr = false;
+	}
+	else if (strcmp(fname, "nullfrac") == 0)
+		parse->state = JS_EXPECT_NULLFRAC_VALUE;
+	else if (strcmp(fname, "inh") == 0)
+		parse->state = JS_EXPECT_INH_VALUE;
+	else if (strcmp(fname, "width") == 0)
+		parse->state = JS_EXPECT_WIDTH_VALUE;
+	else if (strcmp(fname, "distinct") == 0)
+		parse->state = JS_EXPECT_DISTINCT_VALUE;
+	else if (strcmp(fname, "nn") == 0)
+	{
+		Assert(parse->arraysize == 0);
+		parse->state = JS_EXPECT_ARRSIZE_VALUE;
+	}
+	else
+	{
+		parse->arrayval = 0;
+		if (strcmp(fname, "stakind") == 0)
+			parse->state = JS_EXPECT_KIND_VALUE;
+		else if (strcmp(fname, "staop") == 0)
+			parse->state = JS_EXPECT_STAOP_VALUE;
+		else if (strcmp(fname, "stacoll") == 0)
+			parse->state = JS_EXPECT_STACOLL_VALUE;
+		else if (strstr(fname, "numbers") != NULL)
+		{
+			parse->numbersN = atoi(&fname[7]);
+			Assert(parse->numbersN > 0 && parse->numbersN <= 5);
+			if (parse->arraysize > 0)
+				parse->vas[attnum]->stanumbers[parse->numbersN-1] =
+							(float4 *) palloc(parse->arraysize * sizeof(float4));
+			else
+				parse->vas[attnum]->stanumbers[parse->numbersN-1] = NULL;
+
+			parse->vas[attnum]->numnumbers[parse->numbersN-1] = parse->arraysize;
+			parse->state = JS_EXPECT_NUMBERS_VALUE;
+		}
+		else if (strstr(fname, "values") != NULL)
+		{
+			parse->valuesN = atoi(&fname[6]);
+			Assert(parse->valuesN > 0 && parse->valuesN <= 5);
+
+			if (parse->arraysize > 0)
+				parse->vas[parse->natts - 1]->stavalues[parse->valuesN-1] =
+							(Datum *) palloc(parse->arraysize * sizeof(Datum));
+			else
+				parse->vas[parse->natts - 1]->stavalues[parse->valuesN-1] = NULL;
+
+			parse->vas[attnum]->numvalues[parse->valuesN-1] = parse->arraysize;
+			parse->state = JS_EXPECT_VALUES_VALUE;
+		}
+		else
+			elog(ERROR, "Unknown stat parameter in JSON string: %s", fname);
+	}
+}
+
+static void
+json_stat_array_end(void *state)
+{
+	JsonStatParseState *parse = (JsonStatParseState *) state;
+	parse->arrayval = -1;
+	parse->arraysize = 0;
+	parse->numbersN = 0;
+	parse->valuesN = 0;
+}
+
+static void
+json_stat_array_element_end(void *state, bool isnull)
+{
+	JsonStatParseState *parse = (JsonStatParseState *) state;
+
+	Assert(!isnull);
+
+	if (parse->arrayval < 0) /* Debug */
+		return;
+
+	parse->arrayval++;
+}
+
+static void
+json_stat_field_end(void *state, char *fname, bool isnull)
+{
+	JsonStatParseState *parse = (JsonStatParseState *) state;
+
+	Assert(!isnull && parse->arrayval < 0);
+	parse->state = JS_EXPECT_PARAM_FIELD;
+}
+
+static void
+json_stat_scalar(void *state, char *token, JsonTokenType tokentype)
+{
+	JsonStatParseState *parse = (JsonStatParseState *) state;
+	Datum value;
+	int attnum = parse->natts - 1;
+
+	if (skip_attr)
+		return;
+
+	switch (parse->state)
+	{
+	case JS_EXPECT_NSPNAME_VALUE:
+		Assert(tokentype == JSON_TOKEN_STRING);
+		parse->nspname = token;
+		break;
+
+	case JS_EXPECT_RELNAME_VALUE:
+		Assert(tokentype == JSON_TOKEN_STRING);
+		parse->relname = token;
+		break;
+
+	case JS_EXPECT_SLOTSNUM_VALUE:
+		Assert(tokentype == JSON_TOKEN_NUMBER);
+		if (strcmp(token, "5") != 0)
+			elog(ERROR, "Incompatible PostgreSQL version");
+		break;
+
+	case JS_EXPECT_RELPAGES_VALUE:
+		Assert(tokentype == JSON_TOKEN_NUMBER);
+		value = DirectFunctionCall1(int4in, CStringGetDatum(token));
+		parse->relpages = DatumGetInt32(value);
+		break;
+
+	case JS_EXPECT_RELTUPLES_VALUE:
+		Assert(tokentype == JSON_TOKEN_NUMBER);
+		value = DirectFunctionCall1(float4in, CStringGetDatum(token));
+		parse->reltuples = DatumGetFloat4(value);
+		break;
+
+	case JS_EXPECT_ATTNAME_FIELD:
+		Assert(0);
+		break;
+
+	case JS_EXPECT_ATTNAME_VALUE:
+	{
+		int i;
+
+		Assert(tokentype == JSON_TOKEN_STRING);
+
+		for (i = 0; i < parse->rel->rd_att->natts; i++)
+		{
+			if (strcmp(NameStr(parse->rel->rd_att->attrs[i].attname), token) == 0)
+				break;
+		}
+
+		if (i == parse->rel->rd_att->natts)
+		{
+			/*
+			 * It is a valid case when a foreign table doesn't have all the
+			 * attributes from the base relation.
+			 */
+			skip_attr = true;
+			parse->state = JS_EXPECT_ATTNAME_FIELD;
+		}
+		else
+		{
+			/* Initialize new storage for the attribute. */
+			parse->natts++;
+			attnum++;
+			parse->vas[attnum] = examine_attribute(parse->rel, i+1, NULL);
+			parse->vas[attnum]->stats_valid = true;
+			Assert(parse->vas[attnum] != NULL);
+		}
+	}
+		break;
+
+	case JS_EXPECT_INH_VALUE:
+		Assert(tokentype == JSON_TOKEN_STRING);
+		/* XXX */
+		break;
+
+	case JS_EXPECT_WIDTH_VALUE:
+		Assert(tokentype == JSON_TOKEN_NUMBER);
+		value = DirectFunctionCall1(int4in, CStringGetDatum(token));
+		parse->vas[attnum]->stawidth = DatumGetInt32(value);
+		break;
+
+	case JS_EXPECT_DISTINCT_VALUE:
+		Assert(tokentype == JSON_TOKEN_NUMBER);
+		value = DirectFunctionCall1(float4in, CStringGetDatum(token));
+		parse->vas[attnum]->stadistinct = DatumGetFloat4(value);
+		break;
+
+	case JS_EXPECT_NULLFRAC_VALUE:
+		Assert(tokentype == JSON_TOKEN_NUMBER);
+		value = DirectFunctionCall1(float4in, CStringGetDatum(token));
+		parse->vas[attnum]->stanullfrac = DatumGetFloat4(value);
+		break;
+
+	case JS_EXPECT_KIND_VALUE:
+		Assert(tokentype == JSON_TOKEN_NUMBER && parse->arrayval >= 0);
+		value = DirectFunctionCall1(int2in, CStringGetDatum(token));
+		parse->vas[attnum]->stakind[parse->arrayval] = DatumGetInt16(value);
+		break;
+
+	case JS_EXPECT_STAOP_VALUE:
+		Assert(tokentype == JSON_TOKEN_NUMBER && parse->arrayval >= 0);
+		value = DirectFunctionCall1(oidin, CStringGetDatum(token));
+		parse->vas[attnum]->staop[parse->arrayval] = DatumGetObjectId(value);
+		break;
+
+	case JS_EXPECT_STACOLL_VALUE:
+		Assert(tokentype == JSON_TOKEN_NUMBER && parse->arrayval >= 0);
+		value = DirectFunctionCall1(oidin, CStringGetDatum(token));
+		parse->vas[attnum]->stacoll[parse->arrayval] = DatumGetObjectId(value);
+		break;
+
+	case JS_EXPECT_ARRSIZE_VALUE:
+		Assert(tokentype == JSON_TOKEN_NUMBER && parse->arrayval < 0);
+		value = DirectFunctionCall1(int4in, CStringGetDatum(token));
+		parse->arraysize = DatumGetInt32(value);
+		Assert(parse->arraysize > 0);
+		break;
+
+	case JS_EXPECT_NUMBERS_VALUE:
+	{
+		int n = parse->numbersN;
+		int m = parse->arrayval;
+
+		Assert(parse->valuesN == 0);
+		Assert(tokentype == JSON_TOKEN_NUMBER && n > 0 && n <= 5);
+		Assert(m >= 0 && m < parse->arraysize);
+
+		value = DirectFunctionCall1(float4in, CStringGetDatum(token));
+		parse->vas[attnum]->stanumbers[n-1][m] = DatumGetFloat4(value);
+	}
+		break;
+
+	case JS_EXPECT_VALUES_VALUE:
+	{
+		int n = parse->valuesN;
+		int m = parse->arrayval;
+		Form_pg_attribute att = parse->vas[attnum]->attr;
+		Oid			typinput;
+		Oid			typioparam;
+
+		Assert(parse->numbersN == 0);
+		Assert(tokentype == JSON_TOKEN_NUMBER && n > 0 && n <= 5);
+		Assert(m >= 0 && m < parse->arraysize);
+
+		getTypeInputInfo(att->atttypid, &typinput, &typioparam);
+		parse->vas[attnum]->stavalues[n-1][m] =
+			OidInputFunctionCall(typinput, token,
+								 typioparam, att->atttypmod);
+	}
+		break;
+
+	default:
+		elog(ERROR, "Unexpected token type: %d. Token: %s. State: %d.",
+			 tokentype, token, parse->state);
+	}
+}
+
+static void
+parse_statjson(Relation rel, char *buffer, size_t size,
+			   VacAttrStats **vacattrstats, int *natts,
+			   BlockNumber *relpages, double *reltuples)
+{
+	JsonLexContext *lex;
+	JsonSemAction sem;
+	JsonStatParseState parse;
+
+	/* Set up our private parsing context. */
+	parse.state = JS_EXPECT_PARAM_FIELD;
+	parse.arraysize = -1;
+	parse.arrayval = -1;
+	parse.numbersN = -1;
+	parse.vas = vacattrstats;
+	parse.natts = 0;
+	parse.rel = rel;
+	skip_attr = false;
+
+	/* Create a JSON lexing context. */
+	lex = makeJsonLexContextCstringLen(buffer, size, PG_UTF8, true);
+
+	/* Set up semantic actions. */
+	sem.semstate = &parse;
+	sem.object_start = NULL;
+	sem.object_end = NULL;
+	sem.array_start = NULL;
+	sem.array_end = json_stat_array_end;
+	sem.object_field_start = json_stat_object_field_start;
+	sem.object_field_end = json_stat_field_end;
+	sem.array_element_start = NULL;
+	sem.array_element_end = json_stat_array_element_end;
+	sem.scalar = json_stat_scalar;
+
+	/* Run the actual JSON parser. */
+	pg_parse_json(lex, &sem);
+	*natts = parse.natts;
+	*relpages = parse.relpages;
+	*reltuples = parse.reltuples;
+}
+
+static bool
+update_foreign_relation_stat(Relation rel)
+{
+	VacAttrStats **vacattrstats = (VacAttrStats **)
+					palloc0(rel->rd_att->natts * sizeof(VacAttrStats *));
+	char *statstr;
+	int natts = 0;
+	FdwRoutine *fdwroutine;
+	BlockNumber relallvisible;
+	BlockNumber numpages;
+	double numtuples;
+
+	fdwroutine = GetFdwRoutineForRelation(rel, false);
+	Assert(fdwroutine != NULL);
+
+	if (fdwroutine->GetForeignRelStat == NULL || strcmp(NameStr(rel->rd_rel->relname), "ftable") != 0)
+		return false;
+
+	statstr = fdwroutine->GetForeignRelStat(rel);
+
+	if (!statstr)
+		return false;
+	/*
+	 * Extract statistics from the JSON string.
+	 */
+	parse_statjson(rel, statstr, strlen(statstr), vacattrstats, &natts, &numpages, &numtuples);
+
+	update_attstats(RelationGetRelid(rel), false, natts, vacattrstats);
+
+	visibilitymap_count(rel, &relallvisible, NULL);
+	vac_update_relstats(rel, numpages, numtuples, relallvisible, false,
+						InvalidTransactionId, InvalidMultiXactId, false);
+
+	pfree(vacattrstats);
+	return true;
+}
+
+/*
+ * relstat_to_json
+ *		Serialize relation-level and per-column statistics into a JSON string.
+ */
+const char *
+relstat_to_json(const char *relname)
+{
+	RangeVar	*relvar;
+	Relation	rel;
+	Relation	sd;
+	List		*relname_list;
+	int			attno;
+	StringInfo	str = makeStringInfo();
+	int			attnum = 0;
+
+	relname_list = stringToQualifiedNameList(relname);
+	relvar = makeRangeVarFromNameList(relname_list);
+	rel = relation_openrv(relvar, AccessShareLock);
+
+	if (rel->rd_rel->relkind != RELKIND_RELATION)
+	{
+		relation_close(rel, AccessShareLock);
+		elog(ERROR,
+			 "can only be used for ordinary relations (relkind: %c)",
+			 rel->rd_rel->relkind);
+	}
+
+	/* JSON header of overall relation statistics. */
+	appendStringInfoString(str, "{");
+	appendStringInfo(str, "\"namespace\" : \"%s\", \"relname\" : \"%s\", \"sta_num_slots\" : %d, ",
+					 get_namespace_name(rel->rd_rel->relnamespace),
+					 NameStr(rel->rd_rel->relname), STATISTIC_NUM_SLOTS);
+
+	appendStringInfo(str, "\"relpages\" : %u, \"reltuples\" : %f, ",
+					 rel->rd_rel->relpages, rel->rd_rel->reltuples);
+
+	appendStringInfo(str, "\"attrs\" : [");
+	sd = table_open(StatisticRelationId, RowExclusiveLock);
+
+	for (attno = 0; attno < rel->rd_att->natts; attno++)
+	{
+		HeapTuple	stup;
+		Datum		values[Natts_pg_statistic];
+		bool		nulls[Natts_pg_statistic];
+		int			i, k;
+
+		stup = SearchSysCache3(STATRELATTINH,
+							   ObjectIdGetDatum(RelationGetRelid(rel)),
+							   Int16GetDatum(rel->rd_att->attrs[attno].attnum),
+							   BoolGetDatum(false));
+
+		if (!HeapTupleIsValid(stup))
+			/* Skip this attribute if we have no statistics for it. */
+			continue;
+
+		if (attnum++ > 0)
+			appendStringInfoString(str, ", ");
+
+		heap_deform_tuple(stup, RelationGetDescr(sd), values, nulls);
+
+		/* JSON header of attribute statistics. */
+		appendStringInfo(str, "{\"attname\" : \"%s\", \"inh\" : \"%s\", \"nullfrac\" : %f, \"width\" : %d, \"distinct\" : %f, ",
+			NameStr(*attnumAttName(rel, rel->rd_att->attrs[attno].attnum)),
+			DatumGetBool(values[Anum_pg_statistic_stainherit - 1]) ? "true" : "false",
+			DatumGetFloat4(values[Anum_pg_statistic_stanullfrac - 1]),
+			DatumGetInt32(values[Anum_pg_statistic_stawidth - 1]),
+			DatumGetFloat4(values[Anum_pg_statistic_stadistinct - 1]));
+
+		appendStringInfoString(str, "\"stakind\" : [");
+		i = Anum_pg_statistic_stakind1 - 1;
+		for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
+		{
+			appendStringInfo(str, "%d", DatumGetInt16(values[i++]));
+			if (k < STATISTIC_NUM_SLOTS - 1)
+				appendStringInfo(str, ",");
+		}
+		appendStringInfoString(str, "], \"staop\" : [");
+
+		i = Anum_pg_statistic_staop1 - 1;
+		for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
+		{
+			appendStringInfo(str, "%u", DatumGetObjectId(values[i++]));
+			if (k < STATISTIC_NUM_SLOTS - 1)
+				appendStringInfoChar(str, ',');
+		}
+		appendStringInfoString(str, "], \"stacoll\" : [");
+
+		i = Anum_pg_statistic_stacoll1 - 1;
+		for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
+		{
+			appendStringInfo(str, "%u", DatumGetObjectId(values[i++]));
+			if (k < STATISTIC_NUM_SLOTS - 1)
+				appendStringInfoChar(str, ',');
+		}
+
+		appendStringInfoString(str, "], ");
+
+		for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
+		{
+			bool		isnull;
+			Datum		val;
+
+			val = SysCacheGetAttr(STATRELATTINH, stup,
+							  Anum_pg_statistic_stavalues1 + k,
+							  &isnull);
+
+			if (isnull)
+				appendStringInfo(str, "\"values%d\" : [ ]", k+1);
+			else
+			{
+				ArrayType *v = DatumGetArrayTypeP(val);
+
+				appendStringInfo(str, "\"nn\" : %d, ",
+								 ArrayGetNItems(ARR_NDIM(v), ARR_DIMS(v)));
+				appendStringInfo(str, "\"values%d\" : ", k+1);
+				arr_to_json(val, str);
+			}
+			appendStringInfoString(str, ", ");
+		}
+
+		/* --- Extract numbers --- */
+		for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
+		{
+			bool		isnull;
+			Datum		val;
+
+			if (k > 0)
+				appendStringInfoString(str, ", ");
+
+			val = SysCacheGetAttr(STATRELATTINH, stup,
+							  Anum_pg_statistic_stanumbers1 + k,
+							  &isnull);
+			if (isnull)
+				appendStringInfo(str, "\"numbers%d\" : [ ]", k+1);
+			else
+			{
+				ArrayType *v = DatumGetArrayTypeP(val);
+
+				appendStringInfo(str, "\"nn\" : %d, ",
+								 ArrayGetNItems(ARR_NDIM(v), ARR_DIMS(v)));
+				appendStringInfo(str, "\"numbers%d\" : ", k+1);
+				arr_to_json(val, str);
+			}
+		}
+
+		appendStringInfoString(str, "}");
+		ReleaseSysCache(stup);
+	}
+
+	if (attnum == 0)
+		appendStringInfoString(str, " ]");
+	else
+		appendStringInfoChar(str, ']');
+	appendStringInfoString(str, "}");
+
+	table_close(sd, RowExclusiveLock);
+	relation_close(rel, AccessShareLock);
+
+	return str->data;
+}
 
 /*
  *	update_attstats() -- update attribute statistics for one relation
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 23eb605d4c..963ee51523 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -882,6 +882,7 @@ get_all_vacuum_rels(int options)
 		 * them.
 		 */
 		if (classForm->relkind != RELKIND_RELATION &&
+			classForm->relkind != RELKIND_FOREIGN_TABLE &&
 			classForm->relkind != RELKIND_MATVIEW &&
 			classForm->relkind != RELKIND_PARTITIONED_TABLE)
 			continue;
@@ -1795,7 +1796,17 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 	/*
 	 * Check that it's of a vacuumable relkind.
 	 */
-	if (onerel->rd_rel->relkind != RELKIND_RELATION &&
+	if (onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		relation_close(onerel, lmode);
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+		if (params->options & VACOPT_ANALYZE)
+			return true;
+		else
+			return false;
+	}
+	else if (onerel->rd_rel->relkind != RELKIND_RELATION &&
 		onerel->rd_rel->relkind != RELKIND_MATVIEW &&
 		onerel->rd_rel->relkind != RELKIND_TOASTVALUE &&
 		onerel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index a7a91b72f6..adea67a18d 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -656,6 +656,12 @@ add_json(Datum val, bool is_null, StringInfo result,
 	datum_to_json(val, is_null, result, tcategory, outfuncoid, key_scalar);
 }
 
+void
+arr_to_json(Datum array, StringInfo result)
+{
+	array_to_json_internal(array, result, false);
+}
+
 /*
  * SQL function array_to_json(row)
  */
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index a4cd721400..9c1b89d5b6 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -283,6 +283,7 @@ extern Relation vacuum_open_relation(Oid relid, RangeVar *relation,
 extern void analyze_rel(Oid relid, RangeVar *relation,
 						VacuumParams *params, List *va_cols, bool in_outer_xact,
 						BufferAccessStrategy bstrategy);
+extern const char *relstat_to_json(const char *relname);
 extern bool std_typanalyze(VacAttrStats *stats);
 
 /* in utils/misc/sampling.c --- duplicate of declarations in utils/sampling.h */
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..51979ab3bf 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -169,6 +169,7 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef char *(*GetForeignRelStat_function) (Relation rel);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -246,6 +247,7 @@ typedef struct FdwRoutine
 
 	/* Support functions for path reparameterization. */
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
+	GetForeignRelStat_function GetForeignRelStat;
 } FdwRoutine;
 
 
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 4345fbdc31..a6b51bf191 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -20,5 +20,6 @@
 extern void escape_json(StringInfo buf, const char *str);
 extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
 								const int *tzp);
+extern void arr_to_json(Datum array, StringInfo result);
 
 #endif							/* JSON_H */
-- 
2.17.1

#11Stephen Frost
sfrost@snowman.net
In reply to: Andrey Lepikhov (#10)
Re: Ideas about a better API for postgres_fdw remote estimates

Greetings,

* Andrey Lepikhov (a.lepikhov@postgrespro.ru) wrote:

During the implementation of sharding-related improvements I noticed that if
we use a lot of foreign partitions, we get bad plans because vacuum doesn't
update the statistics of foreign tables. That is done by the ANALYZE
command, but it is a very expensive operation for a foreign table.
The problem with statistics is demonstrated by the TAP test in the first patch
attached.

Yes, the way we handle ANALYZE today for FDWs is pretty terrible, since
we stream the entire table across to do it.

I implemented some FDW + pg core machinery to reduce the weight of the problem.
The ANALYZE command on a foreign table executes a query on the foreign server
that extracts the statistics tuple, serializes it into a JSON-formatted string
and returns it to the caller. The caller deserializes this string, generates
statistics for the foreign table and updates it. The second patch is a
proof-of-concept.

Isn't this going to create a version dependency that we'll need to deal
with..? What if a newer major version has some kind of improved ANALYZE
command, in terms of what it looks at or stores, and it's talking to an
older server?

When I was considering the issue with ANALYZE and FDWs, I had been
thinking it'd make sense to just change the query that's built in
deparseAnalyzeSql() to have a TABLESAMPLE clause, but otherwise run in
more-or-less the same manner as today. If we don't like the available
TABLESAMPLE methods then we could add a new one that's explicitly the
'right' sample for an ANALYZE call and use that when it's available on
the remote side. Not sure if it'd make any sense for ANALYZE itself to
start using that same TABLESAMPLE code, but maybe? Not that I think
it'd be much of an issue if it's independent either, with appropriate
comments to note that we should probably try to make them match up for
the sake of FDWs.
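
For illustration only, a remote sample query built that way might look roughly
like this (the table, columns and sampling fraction are invented here, and the
exact shape of what deparseAnalyzeSql() would emit is an assumption):

    SELECT a, b, c
      FROM public.remote_table
      TABLESAMPLE SYSTEM (2.0);  -- sample roughly 2% of the table's pages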

This patch speeds up the ANALYZE command and keeps the statistics of a
foreign table relevant after an autovacuum operation. Its effectiveness
depends on the relevance of the statistics on the remote server, but still.

If we do decide to go down this route, wouldn't it mean we'd have to
solve the problem of what to do when it's a 9.6 foreign server being
queried from a v12 server and dealing with any difference in the
statistics structures of the two?

Seems like we would... in which case I would say that we should pull
that bit out and make it general, and use it for pg_upgrade too, which
would benefit a great deal from having the ability to upgrade stats
between major versions also. That's a much bigger piece to take on, of
course, but seems to be what's implied with this approach for the FDW.

Thanks,

Stephen

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#11)
Re: Ideas about a better API for postgres_fdw remote estimates

Stephen Frost <sfrost@snowman.net> writes:

Isn't this going to create a version dependency that we'll need to deal
with..? What if a newer major version has some kind of improved ANALYZE
command, in terms of what it looks at or stores, and it's talking to an
older server?

Yeah, this proposal is a nonstarter unless it can deal with the remote
server being a different PG version with different stats.

Years ago (when I was still at Salesforce, IIRC, so ~5 years) we had
some discussions about making it possible for pg_dump and/or pg_upgrade
to propagate stats data forward to the new database. There is at least
one POC patch in the archives for doing that by dumping the stats data
wrapped in a function call, where the target database's version of the
function would be responsible for adapting the data if necessary, or
maybe just discarding it if it couldn't adapt. We seem to have lost
interest but it still seems like something worth pursuing. I'd guess
that if such infrastructure existed it could be helpful for this.
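
(Purely as a sketch of that idea, with a made-up function name and payload,
such a dumped entry might look something like:

    SELECT pg_import_relation_stats('public.ltable',
        '{"version" : 1, "relpages" : 55, "reltuples" : 10000, "attrs" : [ ... ]}');

where the target server's own version of pg_import_relation_stats() would
adapt the payload, or discard it if it couldn't.)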

When I was considering the issue with ANALYZE and FDWs, I had been
thinking it'd make sense to just change the query that's built in
deparseAnalyzeSql() to have a TABLESAMPLE clause, but otherwise run in
more-or-less the same manner as today.

+1, that seems like something worth doing in any case, since even if
we do get somewhere with the present idea it would only work for new
remote servers. TABLESAMPLE would work pretty far back (9.5,
looks like).

regards, tom lane

#13Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Stephen Frost (#11)
Re: Ideas about a better API for postgres_fdw remote estimates

On 8/29/20 9:22 PM, Stephen Frost wrote:

I implemented some FDW + pg core machinery to reduce the weight of the problem.
The ANALYZE command on a foreign table executes a query on the foreign server
that extracts the statistics tuple, serializes it into a JSON-formatted string
and returns it to the caller. The caller deserializes this string, generates
statistics for the foreign table and updates it. The second patch is a
proof-of-concept.

Isn't this going to create a version dependency that we'll need to deal
with..? What if a newer major version has some kind of improved ANALYZE
command, in terms of what it looks at or stores, and it's talking to an
older server?

When I was considering the issue with ANALYZE and FDWs, I had been
thinking it'd make sense to just change the query that's built in
deparseAnalyzeSql() to have a TABLESAMPLE clause, but otherwise run in
more-or-less the same manner as today. If we don't like the available
TABLESAMPLE methods then we could add a new one that's explicitly the
'right' sample for an ANALYZE call and use that when it's available on
the remote side. Not sure if it'd make any sense for ANALYZE itself to
start using that same TABLESAMPLE code, but maybe? Not that I think
it'd be much of an issue if it's independent either, with appropriate
comments to note that we should probably try to make them match up for
the sake of FDWs.

This approach does not contradict your idea. It is a lightweight way to
reduce the cost of analysis when we have a set of servers running
up-to-date versions of the system catalog and the FDW.

This patch speeds up the ANALYZE command and keeps the statistics of a
foreign table relevant after an autovacuum operation. Its effectiveness
depends on the relevance of the statistics on the remote server, but still.

If we do decide to go down this route, wouldn't it mean we'd have to
solve the problem of what to do when it's a 9.6 foreign server being
queried from a v12 server and dealing with any difference in the
statistics structures of the two?

Seems like we would... in which case I would say that we should pull
that bit out and make it general, and use it for pg_upgrade too, which
would benefit a great deal from having the ability to upgrade stats
between major versions also. That's a much bigger piece to take on, of
course, but seems to be what's implied with this approach for the FDW.

Thank you for this use case.

We can add a "version" field to the statistics string (btree uses versioning
too). As you can see, in this patch we only try to fetch the ready-made
statistics. If for some reason this does not work out, then we take the
harder route.
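
For instance, borrowing the field names emitted by relstat_to_json() in the
POC patch (the numbers below are invented), the serialized string could simply
start with the version:

    SELECT * FROM postgres_fdw_stat('ltable');
    -- {"version" : 1, "namespace" : "public", "relname" : "ltable",
    --  "sta_num_slots" : 5, "relpages" : 55, "reltuples" : 10000,
    --  "attrs" : [ ... ]}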

Moreover, I believe this strategy should only apply when we analyze a
relation implicitly. If the user requests analysis explicitly with the
command "ANALYZE <relname>", we need to perform a proper analysis of the
table.

--
regards,
Andrey Lepikhov
Postgres Professional

#14Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Tom Lane (#12)
Re: Ideas about a better API for postgres_fdw remote estimates

On 8/29/20 9:50 PM, Tom Lane wrote:

Years ago (when I was still at Salesforce, IIRC, so ~5 years) we had
some discussions about making it possible for pg_dump and/or pg_upgrade
to propagate stats data forward to the new database. There is at least
one POC patch in the archives for doing that by dumping the stats data
wrapped in a function call, where the target database's version of the
function would be responsible for adapting the data if necessary, or
maybe just discarding it if it couldn't adapt. We seem to have lost
interest but it still seems like something worth pursuing. I'd guess
that if such infrastructure existed it could be helpful for this.

Thanks for this helpful feedback.

I found several threads related to the problem [1-3].
I agreed that this task requires implementing an API for the
serialization/deserialization of statistics:
pg_load_relation_statistics(json_string text);
pg_get_relation_statistics(relname text);
We can use a version number to resolve conflicts between different
statistics implementations.
The "load" function will validate the values[] anyarray while deserializing
the input JSON string into the datatype of the relation column.
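
A minimal usage sketch of that pair, with the signatures proposed above (the
JSON payload shown is only a placeholder):

    -- on the source server
    SELECT pg_get_relation_statistics('public.ltable');
    -- on the target server, feeding in whatever the call above returned
    SELECT pg_load_relation_statistics('{"version" : 1, "relname" : "ltable", "attrs" : [ ... ]}');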

Perhaps I haven't grasped all the problems of this task?

1. /messages/by-id/724322880.K8vzik8zPz@abook
2.
/messages/by-id/CAAZKuFaWdLkK8eozSAooZBets9y_mfo2HS6urPAKXEPbd-JLCA@mail.gmail.com
3.
/messages/by-id/GNELIHDDFBOCMGBFGEFOOEOPCBAA.chriskl@familyhealth.com.au

--
regards,
Andrey Lepikhov
Postgres Professional

#15Ashutosh Bapat
ashutosh.bapat.oss@gmail.com
In reply to: Andrey V. Lepikhov (#14)
Re: Ideas about a better API for postgres_fdw remote estimates

On Mon, Aug 31, 2020 at 3:36 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

Thanks for this helpful feedback.

I found several threads related to the problem [1-3].
I agreed that this task requires implementing an API for the
serialization/deserialization of statistics:
pg_load_relation_statistics(json_string text);
pg_get_relation_statistics(relname text);
We can use a version number to resolve conflicts between different
statistics implementations.
The "load" function will validate the values[] anyarray while deserializing
the input JSON string into the datatype of the relation column.

This is a valuable feature. Analysing a foreign table by fetching rows
from the foreign server isn't very efficient. In fact the current FDW
API for doing that forces that inefficiency by requiring the FDW to
return a sample of rows that will be analysed by the core. That's why
I see that your patch introduces a new API to get foreign rel stats. I
don't think there's any point in maintaining these two APIs just for
analysing a table. Instead we should have only one FDW API which will do
whatever it wants and return statistics that can be understood by the
core, and let the core install them in the catalogs. I believe that's doable.

In the case of PostgreSQL it could get the stats as they are from the
foreign server, convert them into a form that the core understands, and
return them. The patch introduces a new function postgres_fdw_stat() which
will be available only from version 14 onwards. Can we use
row_to_json(), which is available in all the supported versions,
instead?
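
Presumably something along these lines (a sketch only; note that OID-valued
columns such as staop1 and the anyarray stavalues would still come out in a
form that needs extra conversion work, which is the concern raised in the
reply below):

    SELECT row_to_json(s)
      FROM pg_statistic s
     WHERE s.starelid = 'public.ltable'::regclass;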

In the case of some other foreign server, an FDW will be responsible for
returning statistics in a form that the core will understand. It may
fetch rows from the foreign server, or be a bit smarter and fetch the
statistics and convert them.

This also means that FDWs will have to deal with the statistics format
that the core understands and thus will need changes in their code
with every version in the worst case. But AFAIR, PostgreSQL supports
different forms of statistics so the problem may not remain that
severe if FDWs and core agree on some bare minimum format that the
core supports for long.

I think the patch has some other problems like it works only for
regular tables on foreign server but a foreign table can be pointing
to any relation like a materialized view, partitioned table or a
foreign table on the foreign server all of which have statistics
associated with them. I didn't look closely but it does not consider
that the foreign table may not have all the columns from the relation
on the foreign server or may have different names. But I think those
problems are kind of secondary. We have to agree on the design first.

--
Best Wishes,
Ashutosh Bapat

#16Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#12)
Re: Ideas about a better API for postgres_fdw remote estimates

On Sat, Aug 29, 2020 at 12:50:59PM -0400, Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

Isn't this going to create a version dependency that we'll need to deal
with..? What if a newer major version has some kind of improved ANALYZE
command, in terms of what it looks at or stores, and it's talking to an
older server?

Yeah, this proposal is a nonstarter unless it can deal with the remote
server being a different PG version with different stats.

Years ago (when I was still at Salesforce, IIRC, so ~5 years) we had
some discussions about making it possible for pg_dump and/or pg_upgrade
to propagate stats data forward to the new database. There is at least
one POC patch in the archives for doing that by dumping the stats data
wrapped in a function call, where the target database's version of the
function would be responsible for adapting the data if necessary, or
maybe just discarding it if it couldn't adapt. We seem to have lost
interest but it still seems like something worth pursuing. I'd guess
that if such infrastructure existed it could be helpful for this.

I don't think there was enough value to do statistics migration just for
pg_upgrade, but doing it for pg_upgrade and FDWs seems like it might
have enough demand to justify the required work and maintenance.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

#17Stephen Frost
sfrost@snowman.net
In reply to: Bruce Momjian (#16)
Re: Ideas about a better API for postgres_fdw remote estimates

Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:

On Sat, Aug 29, 2020 at 12:50:59PM -0400, Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

Isn't this going to create a version dependency that we'll need to deal
with..? What if a newer major version has some kind of improved ANALYZE
command, in terms of what it looks at or stores, and it's talking to an
older server?

Yeah, this proposal is a nonstarter unless it can deal with the remote
server being a different PG version with different stats.

Years ago (when I was still at Salesforce, IIRC, so ~5 years) we had
some discussions about making it possible for pg_dump and/or pg_upgrade
to propagate stats data forward to the new database. There is at least
one POC patch in the archives for doing that by dumping the stats data
wrapped in a function call, where the target database's version of the
function would be responsible for adapting the data if necessary, or
maybe just discarding it if it couldn't adapt. We seem to have lost
interest but it still seems like something worth pursuing. I'd guess
that if such infrastructure existed it could be helpful for this.

I don't think there was enough value to do statistics migration just for
pg_upgrade, but doing it for pg_upgrade and FDWs seems like it might
have enough demand to justify the required work and maintenance.

Not sure that it really matters much, but I disagree with the assessment
that there wasn't enough value to do it for pg_upgrade; I feel that it
just hasn't been something that's had enough people interested in
working on it, which isn't the same thing.

If a good patch showed up tomorrow, with someone willing to spend time
on it, I definitely think it'd be something we should include even if
it's just for pg_upgrade. A solution that works for both pg_upgrade and
the postgres FDW would be even better, of course.

Thanks,

Stephen

#18Bruce Momjian
bruce@momjian.us
In reply to: Stephen Frost (#17)
Re: Ideas about a better API for postgres_fdw remote estimates

On Mon, Aug 31, 2020 at 12:19:52PM -0400, Stephen Frost wrote:

* Bruce Momjian (bruce@momjian.us) wrote:

I don't think there was enough value to do statistics migration just for
pg_upgrade, but doing it for pg_upgrade and FDWs seems like it might
have enough demand to justify the required work and maintenance.

Not sure that it really matters much, but I disagree with the assessment
that there wasn't enough value to do it for pg_upgrade; I feel that it
just hasn't been something that's had enough people interested in
working on it, which isn't the same thing.

I am not sure what point you are trying to make, but if it had enough
value, wouldn't people work on it, or are you saying that it had enough
value, but people didn't realize it, so didn't work on it? I guess I
can see that. For me, it was the maintenance burden that always scared
me from getting involved since it would be the rare case where
pg_upgrade would have to be modified for perhaps every major release.

If a good patch showed up tomorrow, with someone willing to spend time
on it, I definitely think it'd be something we should include even if
it's just for pg_upgrade. A solution that works for both pg_upgrade and
the postgres FDW would be even better, of course.

Yep, see above. The problem isn't mostly the initial patch, but someone
who is going to work on it and test it for every major release,
potentially forever. Frankly, this is a pg_dump feature, rather than
something pg_upgrade should be doing, because not having to run ANALYZE
after restoring a dump is also a needed feature.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

#19Stephen Frost
sfrost@snowman.net
In reply to: Bruce Momjian (#18)
Re: Ideas about a better API for postgres_fdw remote estimates

Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:

On Mon, Aug 31, 2020 at 12:19:52PM -0400, Stephen Frost wrote:

* Bruce Momjian (bruce@momjian.us) wrote:

I don't think there was enough value to do statistics migration just for
pg_upgrade, but doing it for pg_upgrade and FDWs seems like it might
have enough demand to justify the required work and maintenance.

Not sure that it really matters much, but I disagree with the assessment
that there wasn't enough value to do it for pg_upgrade; I feel that it
just hasn't been something that's had enough people interested in
working on it, which isn't the same thing.

I am not sure what point you are trying to make, but if it had enough
value, wouldn't people work on it, or are you saying that it had enough
value, but people didn't realize it, so didn't work on it? I guess I
can see that. For me, it was the maintenance burden that always scared
me from getting involved since it would be the rare case where
pg_upgrade would have to be modified for perhaps every major release.

The point I was making was that it has value and people did realize it
but there's only so many resources to go around when it comes to hacking
on PG and therefore it simply hasn't been done yet.

There's a big difference between "yes, we all agree that would be good
to have, but no one has had time to work on it" and "we don't think this
is worth having because of the maintenance work it'd require." The
latter shuts down anyone thinking of working on it, which is why I said
anything.

If a good patch showed up tomorrow, with someone willing to spend time
on it, I definitely think it'd be something we should include even if
it's just for pg_upgrade. A solution that works for both pg_upgrade and
the postgres FDW would be even better, of course.

Yep, see above. The problem isn't mostly the initial patch, but someone
who is going to work on it and test it for every major release,
potentially forever. Frankly, this is a pg_dump feature, rather than
something pg_upgrade should be doing, because not having to run ANALYZE
after restoring a dump is also a needed feature.

I tend to agree with it being more of a pg_dump issue- but that also
shows that your assessment above doesn't actually fit, because we
definitely change pg_dump every release. Consider that if someone wants
to add some new option to CREATE TABLE, which gets remembered in the
catalog, they have to make sure that pg_dump support is added for that.
If we added the statistics export/import to pg_dump, someone changing
those parts of the system would also be expected to update pg_dump to
manage those changes, including working with older versions of PG.

Thanks,

Stephen

#20Bruce Momjian
bruce@momjian.us
In reply to: Stephen Frost (#19)
Re: Ideas about a better API for postgres_fdw remote estimates

On Mon, Aug 31, 2020 at 12:56:21PM -0400, Stephen Frost wrote:

Greetings,
* Bruce Momjian (bruce@momjian.us) wrote:
The point I was making was that it has value and people did realize it
but there's only so many resources to go around when it comes to hacking
on PG and therefore it simply hasn't been done yet.

There's a big difference between "yes, we all agree that would be good
to have, but no one has had time to work on it" and "we don't think this
is worth having because of the maintenance work it'd require." The
latter shuts down anyone thinking of working on it, which is why I said
anything.

I actually don't know which statement above is correct, because of the
"forever" maintenance.

I tend to agree with it being more of a pg_dump issue- but that also
shows that your assessment above doesn't actually fit, because we
definitely change pg_dump every release. Consider that if someone wants
to add some new option to CREATE TABLE, which gets remembered in the
catalog, they have to make sure that pg_dump support is added for that.
If we added the statistics export/import to pg_dump, someone changing
those parts of the system would also be expected to update pg_dump to
manage those changes, including working with older versions of PG.

Yes, very true, but technically any change in any aspect of the
statistics system would require modification of the statistics dump,
which usually isn't required for most feature changes.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

#21Stephen Frost
sfrost@snowman.net
In reply to: Bruce Momjian (#20)
Re: Ideas about a better API for postgres_fdw remote estimates

Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:

On Mon, Aug 31, 2020 at 12:56:21PM -0400, Stephen Frost wrote:

The point I was making was that it has value and people did realize it
but there's only so many resources to go around when it comes to hacking
on PG and therefore it simply hasn't been done yet.

There's a big difference between "yes, we all agree that would be good
to have, but no one has had time to work on it" and "we don't think this
is worth having because of the maintenance work it'd require." The
latter shuts down anyone thinking of working on it, which is why I said
anything.

I actually don't know which statement above is correct, because of the
"forever" maintenance.

I can understand not being sure which is correct, and we can all have
different points of view on it too, but that's a much softer stance than
what I, at least, understood from your up-thread comment which was-

I don't think there was enough value to do statistics migration just
for pg_upgrade [...]

That statement came across to me as saying the latter statement above.
Perhaps that wasn't what you intended it to, in which case it's good to
have the discussion and clarify it, for others who might be following
this thread and wondering if they should consider working on this area
of the code or not.

Yes, very true, but technically any change in any aspect of the
statistics system would require modification of the statistics dump,
which usually isn't required for most feature changes.

Feature work either requires changes to pg_dump, or not. I agree that
features which don't require pg_dump changes are definitionally less
work than features which do (presuming the rest of the feature is the
same in both cases) but that isn't a justification to not have pg_dump
support in cases where it's expected- we just don't currently expect it
for statistics (which is a rather odd exception when you consider that
nearly everything else that ends up in the catalog tables is included).

For my part, at least, I'd like to see us change that expectation, for a
number of reasons:

- pg_upgrade could leverage it and reduce downtime and/or confusion for
users who are upgrading and dealing with poor statistics or no
statistics for however long after the upgrade

- Tables restored wouldn't require an ANALYZE to get reasonable queries
against them

- Debugging query plans would be a lot less guess-work or having to ask
the user to export the statistics by hand from the catalog and then
having to hand-hack them in to try and reproduce what's happening,
particularly when re-running an analyze ends up giving different
results, which isn't uncommon for edge cases

- The postgres_fdw would be able to leverage this, as discussed earlier
on in this thread

- Logical replication could potentially leverage the existing stats and
not require ANALYZE to be done after an import, leading to more
predictable query plans on the replica

I suspect there's probably other benefits than the ones above, but these
all seem pretty valuable to me.

Thanks,

Stephen

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#21)
Re: Ideas about a better API for postgres_fdw remote estimates

Stephen Frost <sfrost@snowman.net> writes:

Feature work either requires changes to pg_dump, or not. I agree that
features which don't require pg_dump changes are definitionally less
work than features which do (presuming the rest of the feature is the
same in both cases) but that isn't a justification to not have pg_dump
support in cases where it's expected- we just don't currently expect it
for statistics (which is a rather odd exception when you consider that
nearly everything else that ends up in the catalog tables is included).

For my part, at least, I'd like to see us change that expectation, for a
number of reasons:

Yeah. I think that originally we expected that the definition of the
stats might change fast enough that porting them cross-version would be
problematic. Subsequent experience has shown that they don't actually
change any faster than any other aspect of the catalogs. So, while
I do think we must have a plan for how to cope when/if the definition
changes, I don't buy Bruce's argument that it's going to require more
maintenance effort than any other part of the system does.

regards, tom lane

#23Bruce Momjian
bruce@momjian.us
In reply to: Stephen Frost (#21)
Re: Ideas about a better API for postgres_fdw remote estimates

On Mon, Aug 31, 2020 at 01:26:59PM -0400, Stephen Frost wrote:

* Bruce Momjian (bruce@momjian.us) wrote:

I actually don't know which statement above is correct, because of the
"forever" maintenance.

I can understand not being sure which is correct, and we can all have
different points of view on it too, but that's a much softer stance than
what I, at least, understood from your up-thread comment which was-

I don't think there was enough value to do statistics migration just
for pg_upgrade [...]

That statement came across to me as saying the latter statement above.
Perhaps that wasn't what you intended it to, in which case it's good to
have the discussion and clarify it, for others who might be following
this thread and wondering if they should consider working on this area
of the code or not.

I concluded that based on the fact that pg_upgrade has been used for
years and there has been almost no work on statistics upgrades.

Yes, very true, but technically any change in any aspect of the
statistics system would require modification of the statistics dump,
which usually isn't required for most feature changes.

Feature work either requires changes to pg_dump, or not. I agree that
features which don't require pg_dump changes are definitionally less
work than features which do (presuming the rest of the feature is the
same in both cases) but that isn't a justification to not have pg_dump
support in cases where it's expected- we just don't currently expect it
for statistics (which is a rather odd exception when you consider that
nearly everything else that ends up in the catalog tables is included).

Agreed, but the big difference is that you can change most SQL commands
easily, e.g. system catalog changes, without any pg_dump changes, unless
you change the SQL API, while any statistics storage change would
potentially need pg_dump adjustments. And once you start doing it, you
had better keep doing it for every major release or there will be major
complaints.

For my part, at least, I'd like to see us change that expectation, for a
number of reasons:

Yes, there are certainly more uses now than we used to have.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

#24Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#22)
Re: Ideas about a better API for postgres_fdw remote estimates

On Mon, Aug 31, 2020 at 01:53:01PM -0400, Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

Feature work either requires changes to pg_dump, or not. I agree that
features which don't require pg_dump changes are definitionally less
work than features which do (presuming the rest of the feature is the
same in both cases) but that isn't a justification to not have pg_dump
support in cases where it's expected- we just don't currently expect it
for statistics (which is a rather odd exception when you consider that
nearly everything else that ends up in the catalog tables is included).

For my part, at least, I'd like to see us change that expectation, for a
number of reasons:

Yeah. I think that originally we expected that the definition of the
stats might change fast enough that porting them cross-version would be
problematic. Subsequent experience has shown that they don't actually
change any faster than any other aspect of the catalogs. So, while
I do think we must have a plan for how to cope when/if the definition
changes, I don't buy Bruce's argument that it's going to require more
maintenance effort than any other part of the system does.

Well, my point is that even bucket/calculation/data text representation
changes could affect dumping statistics, and that is kind of rare for
other changes we make.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

#25Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#24)
Re: Ideas about a better API for postgres_fdw remote estimates

On Mon, Aug 31, 2020 at 05:46:22PM -0400, Bruce Momjian wrote:

On Mon, Aug 31, 2020 at 01:53:01PM -0400, Tom Lane wrote:

Stephen Frost <sfrost@snowman.net> writes:

Feature work either requires changes to pg_dump, or not. I agree that
features which don't require pg_dump changes are definitionally less
work than features which do (presuming the rest of the feature is the
same in both cases) but that isn't a justification to not have pg_dump
support in cases where it's expected- we just don't currently expect it
for statistics (which is a rather odd exception when you consider that
nearly everything else that ends up in the catalog tables is included).

For my part, at least, I'd like to see us change that expectation, for a
number of reasons:

Yeah. I think that originally we expected that the definition of the
stats might change fast enough that porting them cross-version would be
problematic. Subsequent experience has shown that they don't actually
change any faster than any other aspect of the catalogs. So, while
I do think we must have a plan for how to cope when/if the definition
changes, I don't buy Bruce's argument that it's going to require more
maintenance effort than any other part of the system does.

Well, my point is that even bucket/calculation/data text representation
changes could affect dumping statistics, and that is kind of rare for
other changes we make.

And I have been hoping someone would prove me wrong all these years, but
it hasn't happened yet. It is possible we have hit a tipping point
where the work is worth it, and I hope that is the case. I am just
explaining why I think it has not happened yet.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

#26Andrey Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Ashutosh Bapat (#15)
Re: Ideas about a better API for postgres_fdw remote estimates

On 8/31/20 6:19 PM, Ashutosh Bapat wrote:

On Mon, Aug 31, 2020 at 3:36 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

I agreed that this task requires implementing an API for the
serialization/deserialization of statistics:
pg_load_relation_statistics(json_string text);
pg_get_relation_statistics(relname text);
We can use a version number to resolve conflicts between different
statistics implementations.
The "load" function will validate the values[] anyarray while deserializing
the input JSON string into the datatype of the relation column.

This is a valuable feature. Analysing a foreign table by fetching rows
from the foreign server isn't very efficient. In fact the current FDW
API for doing that forces that inefficiency by requiring the FDW to
return a sample of rows that will be analysed by the core. That's why
I see that your patch introduces a new API to get foreign rel stats. I
don't think there's any point in maintaining these two APIs just for
analysing a table. Instead we should have only one FDW API which will do
whatever it wants and return statistics that can be understood by the
core, and let the core install them in the catalogs. I believe that's doable.

I think the same.

In the case of PostgreSQL it could get the stats as they are from the
foreign server, convert them into a form that the core understands, and
return them. The patch introduces a new function postgres_fdw_stat() which
will be available only from version 14 onwards. Can we use
row_to_json(), which is available in all the supported versions,
instead?

I started from that. But we need to convert the starelid, staop[] and
stacoll[] OIDs into a portable format. We also need to explicitly specify the
type of each values[] array. And nothing guarantees that the anyarray
values[] doesn't contain an array of complex-type values, themselves
containing OIDs, that can't be correctly converted to database objects on
another server... These considerations required me to add the new
postgres_fdw_stat() routine, which can be moved into the core.
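
(For what it's worth, one conceivable way to make the operator and collation
OIDs portable, shown here purely as an illustration, is to send them as
object names via the reg* casts and resolve the names back into OIDs on the
receiving side:

    SELECT staattnum, stakind1, staop1::regoperator, stacoll1::regcollation
      FROM pg_statistic
     WHERE starelid = 'public.ltable'::regclass;

though regcollation only exists from v13 onwards, so it would not help with
older remote servers.)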

In the case of some other foreign server, an FDW will be responsible for
returning statistics in a form that the core will understand. It may
fetch rows from the foreign server, or be a bit smarter and fetch the
statistics and convert them.

I don't think I fully understood your idea. Please explain in more
detail if possible.

This also means that FDWs will have to deal with the statistics format
that the core understands and thus will need changes in their code
with every version in the worst case. But AFAIR, PostgreSQL supports
different forms of statistics so the problem may not remain that
severe if FDWs and core agree on some bare minimum format that the
core supports for long.

I don't think an FDW needs to know anything about the internals of the
statistics. It only needs to execute a query like
"SELECT extract_statistics(namespace.relation);"
and then apply the returned text representation with a call like this:
store_statistics(const char *stat);
All validation and pg_statistic update operations would be performed in
the core.
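
As a sketch of that division of labour (the names follow the ones above;
the exact signatures are only illustrative, nothing is settled yet):

    -- issued by the FDW over its remote connection:
    SELECT extract_statistics('public.remote_table');

    -- the returned text is then handed to the core unchanged, which
    -- validates it and updates pg_statistic, e.g. via
    --   store_statistics(stat_text);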

I think the patch has some other problems like it works only for
regular tables on foreign server but a foreign table can be pointing
to any relation like a materialized view, partitioned table or a
foreign table on the foreign server all of which have statistics
associated with them.

OK. It was implemented as a basis for discussion, testing and further development.

I didn't look closely but it does not consider
that the foreign table may not have all the columns from the relation
on the foreign server or may have different names.

Here we fetch the full statistics from the remote server and extract
statistics only for the columns included in the foreign table's tuple
descriptor.

But I think those
problems are kind of secondary. We have to agree on the design first.

+1.
I only want to point out the following. In previous threads the statistics
were converted row by row. I suggest serializing all statistics tuples for
the relation into a single JSON string. At apply time we can filter out
unneeded attributes.

--
regards,
Andrey Lepikhov
Postgres Professional

#27Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Ashutosh Bapat (#15)
1 attachment(s)
Re: Ideas about a better API for postgres_fdw remote estimates

On 8/31/20 6:19 PM, Ashutosh Bapat wrote:

On Mon, Aug 31, 2020 at 3:36 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

Thanks for this helpful feedback.

I think the patch has some other problems like it works only for
regular tables on foreign server but a foreign table can be pointing
to any relation like a materialized view, partitioned table or a
foreign table on the foreign server all of which have statistics
associated with them. I didn't look closely but it does not consider
that the foreign table may not have all the columns from the relation
on the foreign server or may have different names. But I think those
problems are kind of secondary. We have to agree on the design first.

In accordance with the discussion, I made some changes to the patch:
1. The statistics-extraction routine has been moved into the core.
2. The serialized statistics contain a 'version' field to indicate the
format of the statistics received (see the example header after this list).
3. ANALYZE and VACUUM ANALYZE use this approach only when the relation is
analyzed implicitly.
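
For reference, the serialized header produced by the attached patch looks
like this for an empty table (taken from the regression test output below):

    {"version" : 1, "namespace" : "public", "relname" : "ltable",
     "sta_num_slots" : 5, "relpages" : 0, "reltuples" : 0.000000, "attrs" : [ ]}

For a populated table, "attrs" carries one object per analyzed column with
the pg_statistic slots (stakind, staop, stacoll, values, numbers) spelled
out in a portable, name-based form.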

I am currently keeping the limitation that the approach is used for regular
relations only, because I haven't studied the specifics of other relation
types.
But I don't see any reason to keep this limit in the future.

The attached patch is still very raw. I am publishing it for further
substantive discussion.

--
regards,
Andrey Lepikhov
Postgres Professional

Attachments:

v2-0001-Pull-statistic-for-a-foreign-table-from-remote-serve.patch (text/x-patch)
From 9cfd9b8a43691f1dacd0967dcc32fdf8414ddb56 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>
Date: Tue, 4 Aug 2020 09:29:37 +0500
Subject: [PATCH] Pull statistic for a foreign table from remote server.

Add an extract_relation_statistics() routine that converts the statistics
for a relation into JSON format.
All OIDs - starelid, staop[], stacoll[] - are converted to a portable
representation. An operator is uniquely identified by its namespace,
operator name, and the namespaces and names of its left and right
operand types. A collation is uniquely identified by its namespace,
collation name and encoding.
A new FDW API routine, GetForeignRelStat(), provides access to this
machinery and returns the JSON string to the caller.
This function is called by the ANALYZE command (without an explicit
relation name) as an attempt to reduce the cost of updating statistics.
If the attempt fails, ANALYZE falls back to the expensive way. The
feature is also wired into VACUUM ANALYZE and autovacuum.
In accordance with the discussion [1], the extract_relation_statistics()
routine is moved into the core.

TODO: tests on custom operators and collations.

1. https://www.postgresql.org/message-id/flat/1155731.1593832096%40sss.pgh.pa.us
---
 contrib/postgres_fdw/Makefile                 |   2 +-
 contrib/postgres_fdw/deparse.c                |   8 +
 .../postgres_fdw/expected/foreign_stat.out    | 112 +++
 contrib/postgres_fdw/postgres_fdw.c           |  49 ++
 contrib/postgres_fdw/postgres_fdw.h           |   1 +
 contrib/postgres_fdw/sql/foreign_stat.sql     |  46 +
 src/backend/commands/analyze.c                | 794 ++++++++++++++++++
 src/backend/commands/vacuum.c                 |  13 +-
 src/backend/utils/adt/json.c                  |   6 +
 src/backend/utils/cache/lsyscache.c           | 167 ++++
 src/include/catalog/pg_proc.dat               |   3 +
 src/include/catalog/pg_statistic.h            |   1 +
 src/include/foreign/fdwapi.h                  |   2 +
 src/include/utils/json.h                      |   1 +
 src/include/utils/lsyscache.h                 |   8 +
 15 files changed, 1211 insertions(+), 2 deletions(-)
 create mode 100644 contrib/postgres_fdw/expected/foreign_stat.out
 create mode 100644 contrib/postgres_fdw/sql/foreign_stat.sql

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index ee8a80a392..a5a838b8fc 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -16,7 +16,7 @@ SHLIB_LINK_INTERNAL = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql
 
-REGRESS = postgres_fdw
+REGRESS = postgres_fdw foreign_stat
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index ad37a74221..e63cb4982f 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -2053,6 +2053,14 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel)
 	appendStringInfo(buf, "::pg_catalog.regclass) / %d", BLCKSZ);
 }
 
+void
+deparseGetStatSql(StringInfo buf, Relation rel)
+{
+	appendStringInfo(buf, "SELECT * FROM extract_relation_statistics('");
+	deparseRelation(buf, rel);
+	appendStringInfoString(buf, "');");
+}
+
 /*
  * Construct SELECT statement to acquire sample rows of given relation.
  *
diff --git a/contrib/postgres_fdw/expected/foreign_stat.out b/contrib/postgres_fdw/expected/foreign_stat.out
new file mode 100644
index 0000000000..46fd2f8427
--- /dev/null
+++ b/contrib/postgres_fdw/expected/foreign_stat.out
@@ -0,0 +1,112 @@
+CREATE TABLE ltable (a int, b real);
+CREATE FOREIGN TABLE ftable (a int) server loopback options (table_name 'ltable');
+VACUUM ANALYZE;
+-- Check statistic interface routine on an empty table.
+SELECT * FROM extract_relation_statistics('ltable');
+                                                        extract_relation_statistics                                                        
+-------------------------------------------------------------------------------------------------------------------------------------------
+ {"version" : 1, "namespace" : "public", "relname" : "ltable", "sta_num_slots" : 5, "relpages" : 0, "reltuples" : 0.000000, "attrs" : [ ]}
+(1 row)
+
+SELECT * FROM extract_relation_statistics('ftable');
+ERROR:  Can be used for ordinary relation only. Reltype: f.
+-- Check statistic interface routine on non-empty tables.
+INSERT INTO ltable (a, b) (SELECT *, 1.01 FROM generate_series(1, 1E4));
+-- Hereinafter we don't want to depend on the analyze order: if the foreign
+-- table is analyzed before ltable, we would get out-of-date statistics.
+ANALYZE ltable;
+ANALYZE;
+SELECT * FROM extract_relation_statistics('ltable');
+                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          extract_relation_statistics                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ {"version" : 1, "namespace" : "public", "relname" : "ltable", "sta_num_slots" : 5, "relpages" : 45, "reltuples" : 10000.000000, "attrs" : [{"attname" : "a", "inh" : "false", "nullfrac" : 0.000000, "width" : 4, "distinct" : -1.000000, "stakind" : [2,3,0,0,0], "staop" : [" .<. .int4. .int4"," .<. .int4. .int4"," "," "," "], "stacoll" : [" "," "," "," "," "], "nn" : 101, "values1" : [1,100,200,300,400,500,600,700,800,900,1000,1100,1200,1300,1400,1500,1600,1700,1800,1900,2000,2100,2200,2300,2400,2500,2600,2700,2800,2900,3000,3100,3200,3300,3400,3500,3600,3700,3800,3900,4000,4100,4200,4300,4400,4500,4600,4700,4800,4900,5000,5100,5200,5300,5400,5500,5600,5700,5800,5900,6000,6100,6200,6300,6400,6500,6600,6700,6800,6900,7000,7100,7200,7300,7400,7500,7600,7700,7800,7900,8000,8100,8200,8300,8400,8500,8600,8700,8800,8900,9000,9100,9200,9300,9400,9500,9600,9700,9800,9900,10000], "values2" : [ ], "values3" : [ ], "values4" : [ ], "values5" : [ ], "numbers1" : [ ], "nn" : 1, "numbers2" : [1], "numbers3" : [ ], "numbers4" : [ ], "numbers5" : [ ]}, {"attname" : "b", "inh" : "false", "nullfrac" : 0.000000, "width" : 4, "distinct" : 1.000000, "stakind" : [1,3,0,0,0], "staop" : [" .=. .float4. .float4"," .<. .float4. .float4"," "," "," "], "stacoll" : [" "," "," "," "," "], "nn" : 1, "values1" : [1.01], "values2" : [ ], "values3" : [ ], "values4" : [ ], "values5" : [ ], "nn" : 1, "numbers1" : [1], "nn" : 1, "numbers2" : [1], "numbers3" : [ ], "numbers4" : [ ], "numbers5" : [ ]}]}
+(1 row)
+
+-- Check ANALYZE on foreign table
+INSERT INTO ltable (a, b) (SELECT *, 2.01 FROM generate_series(1E4, 2E4));
+ANALYZE ltable;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ltable;
+                                       QUERY PLAN                                       
+----------------------------------------------------------------------------------------
+ Seq Scan on ltable  (cost=0.00..289.01 rows=20001 width=8) (actual rows=20001 loops=1)
+(1 row)
+
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable;
+                                          QUERY PLAN                                          
+----------------------------------------------------------------------------------------------
+ Foreign Scan on ftable  (cost=100.00..445.00 rows=10000 width=4) (actual rows=20001 loops=1)
+(1 row)
+
+ANALYZE;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable;
+                                          QUERY PLAN                                          
+----------------------------------------------------------------------------------------------
+ Foreign Scan on ftable  (cost=100.00..789.03 rows=20001 width=4) (actual rows=20001 loops=1)
+(1 row)
+
+-- Check selectivity
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ltable WHERE a > 10;
+                                       QUERY PLAN                                       
+----------------------------------------------------------------------------------------
+ Seq Scan on ltable  (cost=0.00..339.01 rows=19991 width=8) (actual rows=19991 loops=1)
+   Filter: (a > 10)
+   Rows Removed by Filter: 10
+(3 rows)
+
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable WHERE a > 10;
+                                          QUERY PLAN                                          
+----------------------------------------------------------------------------------------------
+ Foreign Scan on ftable  (cost=100.00..838.83 rows=19991 width=4) (actual rows=19991 loops=1)
+(1 row)
+
+-- Check new attribute
+ALTER TABLE ltable ADD COLUMN c int DEFAULT 42;
+ALTER TABLE ftable ADD COLUMN c int;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ltable WHERE a > 10 AND c < 15;
+                                     QUERY PLAN                                     
+------------------------------------------------------------------------------------
+ Seq Scan on ltable  (cost=0.00..389.01 rows=6664 width=12) (actual rows=0 loops=1)
+   Filter: ((a > 10) AND (c < 15))
+   Rows Removed by Filter: 20001
+(3 rows)
+
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable WHERE a > 10 AND c < 15;
+                                       QUERY PLAN                                        
+-----------------------------------------------------------------------------------------
+ Foreign Scan on ftable  (cost=100.00..622.29 rows=6664 width=8) (actual rows=0 loops=1)
+(1 row)
+
+ANALYZE ltable;
+ANALYZE;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ltable WHERE a > 10 AND c < 15;
+                                   QUERY PLAN                                    
+---------------------------------------------------------------------------------
+ Seq Scan on ltable  (cost=0.00..389.01 rows=1 width=12) (actual rows=0 loops=1)
+   Filter: ((a > 10) AND (c < 15))
+   Rows Removed by Filter: 20001
+(3 rows)
+
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable WHERE a > 10 AND c < 15;
+                                      QUERY PLAN                                      
+--------------------------------------------------------------------------------------
+ Foreign Scan on ftable  (cost=100.00..489.03 rows=1 width=8) (actual rows=0 loops=1)
+(1 row)
+
+-- Test default vacuum analyzes foreign relation
+INSERT INTO ltable (a, b) (SELECT *, 2.01 FROM generate_series(2E4, 3E4));
+ANALYZE ltable;
+VACUUM ANALYZE;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ltable WHERE a > 100 AND c < 43;
+                                       QUERY PLAN                                        
+-----------------------------------------------------------------------------------------
+ Seq Scan on ltable  (cost=0.00..593.03 rows=29902 width=12) (actual rows=29902 loops=1)
+   Filter: ((a > 100) AND (c < 43))
+   Rows Removed by Filter: 100
+(3 rows)
+
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable WHERE a > 100 AND c < 43;
+                                          QUERY PLAN                                           
+-----------------------------------------------------------------------------------------------
+ Foreign Scan on ftable  (cost=100.00..1291.07 rows=29902 width=8) (actual rows=29902 loops=1)
+(1 row)
+
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a31abce7c9..092da7fd96 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -391,6 +391,7 @@ static void postgresGetForeignUpperPaths(PlannerInfo *root,
 										 RelOptInfo *input_rel,
 										 RelOptInfo *output_rel,
 										 void *extra);
+static char *postgresGetForeignRelStat(Relation rel);
 
 /*
  * Helper functions
@@ -558,6 +559,7 @@ postgres_fdw_handler(PG_FUNCTION_ARGS)
 	/* Support functions for upper relation push-down */
 	routine->GetForeignUpperPaths = postgresGetForeignUpperPaths;
 
+	routine->GetForeignRelStat = postgresGetForeignRelStat;
 	PG_RETURN_POINTER(routine);
 }
 
@@ -6583,3 +6585,50 @@ find_em_expr_for_input_target(PlannerInfo *root,
 	elog(ERROR, "could not find pathkey item to sort");
 	return NULL;				/* keep compiler quiet */
 }
+
+static char *
+postgresGetForeignRelStat(Relation rel)
+{
+	ForeignTable *table;
+	UserMapping *user;
+	PGconn	   *conn;
+	unsigned int cursor_number;
+	StringInfoData sql;
+	PGresult   *volatile res = NULL;
+	char fetch_sql[64];
+	char *json;
+
+	table = GetForeignTable(RelationGetRelid(rel));
+	user = GetUserMapping(rel->rd_rel->relowner, table->serverid);
+	conn = GetConnection(user, false);
+
+	cursor_number = GetCursorNumber(conn);
+	initStringInfo(&sql);
+	appendStringInfo(&sql, "DECLARE c%u CURSOR FOR ", cursor_number);
+	deparseGetStatSql(&sql, rel);
+
+	res = pgfdw_exec_query(conn, sql.data);
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+		/*
+		 * TODO: handle the case where the statistics serializer is missing at
+		 * the remote end.
+		 */
+		pgfdw_report_error(ERROR, res, conn, false, sql.data);
+
+	PQclear(res);
+
+	snprintf(fetch_sql, sizeof(fetch_sql), "FETCH FROM c%u", cursor_number);
+	res = pgfdw_exec_query(conn, fetch_sql);
+	if (PQresultStatus(res) != PGRES_TUPLES_OK)
+		pgfdw_report_error(ERROR, res, conn, false, sql.data);
+
+	Assert(PQntuples(res) == 1);
+	Assert(PQnfields(res) == 1);
+
+	json = pstrdup(PQgetvalue(res, 0, 0));
+	PQclear(res);
+	close_cursor(conn, cursor_number);
+	ReleaseConnection(conn);
+
+	return json;
+}
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index eef410db39..bb590bfc21 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -190,6 +190,7 @@ extern void deparseDirectDeleteSql(StringInfo buf, PlannerInfo *root,
 extern void deparseAnalyzeSizeSql(StringInfo buf, Relation rel);
 extern void deparseAnalyzeSql(StringInfo buf, Relation rel,
 							  List **retrieved_attrs);
+extern void deparseGetStatSql(StringInfo buf, Relation rel);
 extern void deparseStringLiteral(StringInfo buf, const char *val);
 extern Expr *find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel);
 extern Expr *find_em_expr_for_input_target(PlannerInfo *root,
diff --git a/contrib/postgres_fdw/sql/foreign_stat.sql b/contrib/postgres_fdw/sql/foreign_stat.sql
new file mode 100644
index 0000000000..1fe1f319e3
--- /dev/null
+++ b/contrib/postgres_fdw/sql/foreign_stat.sql
@@ -0,0 +1,46 @@
+CREATE TABLE ltable (a int, b real);
+CREATE FOREIGN TABLE ftable (a int) server loopback options (table_name 'ltable');
+VACUUM ANALYZE;
+
+-- Check statistic interface routine on an empty table.
+SELECT * FROM extract_relation_statistics('ltable');
+SELECT * FROM extract_relation_statistics('ftable');
+
+-- Check statistic interface routine on non-empty tables.
+INSERT INTO ltable (a, b) (SELECT *, 1.01 FROM generate_series(1, 1E4));
+-- Hereinafter we don't want to depend on the analyze order: if the foreign
+-- table is analyzed before ltable, we would get out-of-date statistics.
+ANALYZE ltable;
+ANALYZE;
+SELECT * FROM extract_relation_statistics('ltable');
+
+-- Check ANALYZE on foreign table
+INSERT INTO ltable (a, b) (SELECT *, 2.01 FROM generate_series(1E4, 2E4));
+ANALYZE ltable;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ltable;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable;
+ANALYZE;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable;
+
+-- Check selectivity
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ltable WHERE a > 10;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable WHERE a > 10;
+
+-- Check new attribute
+ALTER TABLE ltable ADD COLUMN c int DEFAULT 42;
+ALTER TABLE ftable ADD COLUMN c int;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ltable WHERE a > 10 AND c < 15;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable WHERE a > 10 AND c < 15;
+
+ANALYZE ltable;
+ANALYZE;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ltable WHERE a > 10 AND c < 15;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable WHERE a > 10 AND c < 15;
+
+-- Test default vacuum analyzes foreign relation
+INSERT INTO ltable (a, b) (SELECT *, 2.01 FROM generate_series(2E4, 3E4));
+ANALYZE ltable;
+VACUUM ANALYZE;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ltable WHERE a > 100 AND c < 43;
+EXPLAIN (TIMING OFF, SUMMARY OFF, COSTS ON, ANALYZE) SELECT * FROM ftable WHERE a > 100 AND c < 43;
+
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8af12b5c6b..6177c21681 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -30,6 +30,7 @@
 #include "catalog/catalog.h"
 #include "catalog/index.h"
 #include "catalog/indexing.h"
+#include "catalog/namespace.h"
 #include "catalog/pg_collation.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_namespace.h"
@@ -38,8 +39,10 @@
 #include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
+#include "common/jsonapi.h"
 #include "executor/executor.h"
 #include "foreign/fdwapi.h"
+#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/parse_oper.h"
@@ -58,9 +61,11 @@
 #include "utils/datum.h"
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
+#include "utils/json.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/pg_rusage.h"
+#include "utils/regproc.h"
 #include "utils/sampling.h"
 #include "utils/sortsupport.h"
 #include "utils/syscache.h"
@@ -107,6 +112,11 @@ static void update_attstats(Oid relid, bool inh,
 static Datum std_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
 static Datum ind_fetch_func(VacAttrStatsP stats, int rownum, bool *isNull);
 
+static const char *relstat_to_json(const char *relname);
+static bool parse_statjson(Relation rel, char *buffer, size_t size,
+						   VacAttrStats **vacattrstats, int *natts,
+						   BlockNumber *relpages, double *reltuples);
+static bool update_foreign_relation_stat(Relation rel);
 
 /*
  *	analyze_rel() -- analyze one relation
@@ -193,6 +203,23 @@ analyze_rel(Oid relid, RangeVar *relation,
 		return;
 	}
 
+	/*
+	 * Use the cheap approach to update a foreign relation's statistics if the
+	 * user did not explicitly name this relation.
+	 * If the user ANALYZEs this particular relation, they probably know that
+	 * the remote statistics are also out of date.
+	 */
+	if (onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE && relation == NULL &&
+		update_foreign_relation_stat(onerel))
+	{
+		/*
+		 * The foreign relation's statistics were updated from the existing
+		 * stat tuples on the remote server.
+		 */
+		relation_close(onerel, ShareUpdateExclusiveLock);
+		return;
+	}
+
 	/*
 	 * Check that it's of an analyzable relkind, and set up appropriately.
 	 */
@@ -1421,6 +1448,773 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 	return numrows;
 }
 
+typedef enum
+{
+	JS_EXPECT_STAVER_VALUE,
+	JS_EXPECT_NSPNAME_VALUE,
+	JS_EXPECT_RELNAME_VALUE,
+	JS_EXPECT_SLOTSNUM_VALUE,
+	JS_EXPECT_RELPAGES_VALUE,
+	JS_EXPECT_RELTUPLES_VALUE,
+	JS_EXPECT_ATTNAME_FIELD,
+	JS_EXPECT_ATTNAME_VALUE,
+	JS_EXPECT_INH_VALUE,
+	JS_EXPECT_NULLFRAC_VALUE,
+	JS_EXPECT_WIDTH_VALUE,
+	JS_EXPECT_DISTINCT_VALUE,
+	JS_EXPECT_KIND_VALUE,
+	JS_EXPECT_STAOP_VALUE,
+	JS_EXPECT_STACOLL_VALUE,
+	JS_EXPECT_ARRSIZE_VALUE,
+	JS_EXPECT_NUMBERS_VALUE,
+	JS_EXPECT_VALUES_VALUE,
+	JS_EXPECT_PARAM_FIELD,
+	JS_INVALID_STAT_VERSION
+} JsonStatSemanticState;
+
+typedef struct JsonStatParseState
+{
+	Relation rel;
+	int natts;
+	int state;
+	int arraysize; /* size of the array */
+	int arrayval; /* current position in the array */
+	int numbersN;
+	int valuesN;
+
+	char *nspname;
+	char *relname;
+
+	BlockNumber relpages;
+	double reltuples;
+	VacAttrStats **vas;
+} JsonStatParseState;
+
+static bool skip_attr = false;
+
+static void
+json_stat_object_field_start(void *state, char *fname, bool isnull)
+{
+	JsonStatParseState *parse = (JsonStatParseState *) state;
+	int attnum = parse->natts - 1;
+
+	/* skip processing if incompatible statistic version was detected */
+	if (parse->state == JS_INVALID_STAT_VERSION)
+		return;
+
+	if (skip_attr && strcmp(fname, "attname") != 0)
+		return;
+
+	Assert(parse->state == JS_EXPECT_PARAM_FIELD ||
+		   parse->state == JS_EXPECT_ATTNAME_FIELD);
+	Assert(parse->arrayval < 0);
+	if (strcmp(fname, "version") == 0)
+		parse->state = JS_EXPECT_STAVER_VALUE;
+	else if (strcmp(fname, "namespace") == 0)
+		parse->state = JS_EXPECT_NSPNAME_VALUE;
+	else if (strcmp(fname, "relname") == 0)
+		parse->state = JS_EXPECT_RELNAME_VALUE;
+	else if (strcmp(fname, "sta_num_slots") == 0)
+		parse->state = JS_EXPECT_SLOTSNUM_VALUE;
+	else if (strcmp(fname, "relpages") == 0)
+		parse->state = JS_EXPECT_RELPAGES_VALUE;
+	else if (strcmp(fname, "reltuples") == 0)
+		parse->state = JS_EXPECT_RELTUPLES_VALUE;
+	else if(strcmp(fname, "attrs") == 0)
+		parse->state = JS_EXPECT_ATTNAME_FIELD;
+	else if (strcmp(fname, "attname") == 0)
+	{
+		parse->state = JS_EXPECT_ATTNAME_VALUE;
+		skip_attr = false;
+	}
+	else if (strcmp(fname, "nullfrac") == 0)
+		parse->state = JS_EXPECT_NULLFRAC_VALUE;
+	else if (strcmp(fname, "inh") == 0)
+		parse->state = JS_EXPECT_INH_VALUE;
+		else if (strcmp(fname, "width") == 0)
+		parse->state = JS_EXPECT_WIDTH_VALUE;
+	else if (strcmp(fname, "distinct") == 0)
+		parse->state = JS_EXPECT_DISTINCT_VALUE;
+	else if (strcmp(fname, "nn") == 0)
+	{
+		Assert(parse->arraysize == 0);
+		parse->state = JS_EXPECT_ARRSIZE_VALUE;
+	}
+	else
+	{
+		parse->arrayval = 0;
+		if (strcmp(fname, "stakind") == 0)
+			parse->state = JS_EXPECT_KIND_VALUE;
+		else if (strcmp(fname, "staop") == 0)
+			parse->state = JS_EXPECT_STAOP_VALUE;
+		else if (strcmp(fname, "stacoll") == 0)
+			parse->state = JS_EXPECT_STACOLL_VALUE;
+		else if (strstr(fname, "numbers") != NULL)
+		{
+			parse->numbersN = atoi(&fname[7]);
+			Assert(parse->numbersN > 0 && parse->numbersN <= 5);
+			if (parse->arraysize > 0)
+				parse->vas[attnum]->stanumbers[parse->numbersN-1] =
+							(float4 *) palloc(parse->arraysize * sizeof(float4));
+			else
+				parse->vas[attnum]->stanumbers[parse->numbersN-1] = NULL;
+
+			parse->vas[attnum]->numnumbers[parse->numbersN-1] = parse->arraysize;
+			parse->state = JS_EXPECT_NUMBERS_VALUE;
+		}
+		else if (strstr(fname, "values") != NULL)
+		{
+			parse->valuesN = atoi(&fname[6]);
+			Assert(parse->valuesN > 0 && parse->valuesN <= 5);
+
+			if (parse->arraysize > 0)
+				parse->vas[parse->natts - 1]->stavalues[parse->valuesN-1] =
+							(Datum *) palloc(parse->arraysize * sizeof(Datum));
+			else
+				parse->vas[parse->natts - 1]->stavalues[parse->valuesN-1] = NULL;
+
+			parse->vas[attnum]->numvalues[parse->valuesN-1] = parse->arraysize;
+			parse->state = JS_EXPECT_VALUES_VALUE;
+		}
+		else
+			elog(ERROR, "Unknown stat parameter in JSON string: %s", fname);
+	}
+}
+
+static void
+json_stat_array_end(void *state)
+{
+	JsonStatParseState *parse = (JsonStatParseState *) state;
+
+	/* skip processing if incompatible statistic version was detected */
+	if (parse->state == JS_INVALID_STAT_VERSION)
+		return;
+
+	parse->arrayval = -1;
+	parse->arraysize = 0;
+	parse->numbersN = 0;
+	parse->valuesN = 0;
+}
+
+static void
+json_stat_array_element_end(void *state, bool isnull)
+{
+	JsonStatParseState *parse = (JsonStatParseState *) state;
+
+	/* skip processing if incompatible statistic version was detected */
+	if (parse->state == JS_INVALID_STAT_VERSION)
+		return;
+
+	Assert(!isnull);
+
+	if (parse->arrayval < 0) // Debug
+		return;
+
+	parse->arrayval++;
+}
+
+static void
+json_stat_field_end(void *state, char *fname, bool isnull)
+{
+	JsonStatParseState *parse = (JsonStatParseState *) state;
+
+	/* skip processing if incompatible statistic version was detected */
+	if (parse->state == JS_INVALID_STAT_VERSION)
+		return;
+
+	Assert(!isnull && parse->arrayval < 0);
+	parse->state = JS_EXPECT_PARAM_FIELD;
+}
+
+#include "utils/varlena.h"
+
+#define NSP_OID(name) \
+	(strcmp(name, " ") == 0) ? LookupExplicitNamespace("pg_catalog", false) : \
+		LookupExplicitNamespace(name, false)
+
+static void
+get_string_values(char *str, char **values, int num)
+{
+	int i;
+	int cnt = 0;
+
+	Assert(num > 0);
+
+	values[0] = &str[0];
+	for (i = 1; str[i] != '\0'; i++)
+	{
+		if (str[i] == '.')
+		{
+			Assert(cnt < num - 1);
+		values[++cnt] = &str[i+1];
+			str[i] = '\0';
+		}
+	}
+
+	Assert(cnt == num - 1);
+}
+
+static void
+json_stat_scalar(void *state, char *token, JsonTokenType tokentype)
+{
+	JsonStatParseState *parse = (JsonStatParseState *) state;
+	Datum value;
+	int attnum = parse->natts - 1;
+
+	/* skip processing if incompatible statistic version was detected */
+	if (parse->state == JS_INVALID_STAT_VERSION)
+		return;
+
+	if (skip_attr)
+		return;
+
+	switch (parse->state)
+	{
+	case JS_EXPECT_STAVER_VALUE:
+	{
+		int version;
+		Assert(tokentype == JSON_TOKEN_NUMBER);
+		value = DirectFunctionCall1(int4in, CStringGetDatum(token));
+		version = DatumGetInt32(value);
+		if (version != STATISTIC_VERSION)
+		{
+			parse->state = JS_INVALID_STAT_VERSION;
+			return;
+		}
+	}
+		break;
+
+	case JS_EXPECT_NSPNAME_VALUE:
+		Assert(tokentype == JSON_TOKEN_STRING);
+		parse->nspname = token;
+		break;
+
+	case JS_EXPECT_RELNAME_VALUE:
+		Assert(tokentype == JSON_TOKEN_STRING);
+		parse->relname = token;
+		break;
+
+	case JS_EXPECT_SLOTSNUM_VALUE:
+		Assert(tokentype == JSON_TOKEN_NUMBER);
+		if (strcmp(token, "5") != 0)
+			elog(ERROR, "Incompatible PostgreSQL version");
+		break;
+
+	case JS_EXPECT_RELPAGES_VALUE:
+		Assert(tokentype == JSON_TOKEN_NUMBER);
+		value = DirectFunctionCall1(int4in, CStringGetDatum(token));
+		parse->relpages = DatumGetInt32(value);
+		break;
+
+	case JS_EXPECT_RELTUPLES_VALUE:
+		Assert(tokentype == JSON_TOKEN_NUMBER);
+		value = DirectFunctionCall1(float4in, CStringGetDatum(token));
+		parse->reltuples = DatumGetFloat4(value);
+		break;
+
+	case JS_EXPECT_ATTNAME_FIELD:
+		Assert(0);
+		break;
+
+	case JS_EXPECT_ATTNAME_VALUE:
+	{
+		int i;
+
+		Assert(tokentype == JSON_TOKEN_STRING);
+
+		for (i = 0; i < parse->rel->rd_att->natts; i++)
+		{
+			if (strcmp(NameStr(parse->rel->rd_att->attrs[i].attname), token) == 0)
+				break;
+		}
+
+		if (i == parse->rel->rd_att->natts)
+		{
+			/*
+			 * It is a valid case when a foreign table doesn't have all the
+			 * attributes from the base relation.
+			 */
+			skip_attr = true;
+			parse->state = JS_EXPECT_ATTNAME_FIELD;
+		}
+		else
+		{
+			/* Initialize new storage for the attribute. */
+			parse->natts++;
+			attnum++;
+			parse->vas[attnum] = examine_attribute(parse->rel, i+1, NULL);
+			parse->vas[attnum]->stats_valid = true;
+			Assert(parse->vas[attnum] != NULL);
+		}
+	}
+		break;
+
+	case JS_EXPECT_INH_VALUE:
+		Assert(tokentype == JSON_TOKEN_STRING);
+		/* XXX */
+		break;
+
+	case JS_EXPECT_WIDTH_VALUE:
+		Assert(tokentype == JSON_TOKEN_NUMBER);
+		value = DirectFunctionCall1(int4in, CStringGetDatum(token));
+		parse->vas[attnum]->stawidth = DatumGetInt32(value);
+		break;
+
+	case JS_EXPECT_DISTINCT_VALUE:
+		Assert(tokentype == JSON_TOKEN_NUMBER);
+		value = DirectFunctionCall1(float4in, CStringGetDatum(token));
+		parse->vas[attnum]->stadistinct = DatumGetFloat4(value);
+		break;
+
+	case JS_EXPECT_NULLFRAC_VALUE:
+		Assert(tokentype == JSON_TOKEN_NUMBER);
+		value = DirectFunctionCall1(float4in, CStringGetDatum(token));
+		parse->vas[attnum]->stanullfrac = DatumGetFloat4(value);
+		break;
+
+	case JS_EXPECT_KIND_VALUE:
+		Assert(tokentype == JSON_TOKEN_NUMBER && parse->arrayval >= 0);
+		value = DirectFunctionCall1(int2in, CStringGetDatum(token));
+		parse->vas[attnum]->stakind[parse->arrayval] = DatumGetInt16(value);
+		break;
+
+	case JS_EXPECT_STAOP_VALUE:
+	{
+		char *str;
+		char *values[6];
+		Oid oprleft;
+		Oid oprright;
+
+		Assert(tokentype == JSON_TOKEN_STRING && parse->arrayval >= 0);
+		Assert(strlen(token) > 0);
+
+		if (strcmp(token, " ") == 0)
+		{
+			/* Quick exit on empty field */
+			parse->vas[attnum]->staop[parse->arrayval] = InvalidOid;
+			break;
+		}
+
+		str = pstrdup(token);
+		get_string_values(str, values, 6);
+
+		if (strcmp(values[2], "NDEFINED") == 0)
+			oprleft = InvalidOid;
+		else
+			oprleft = get_typname_typid(values[3], NSP_OID(values[2]));
+
+		if (strcmp(values[4], "NDEFINED") == 0)
+			oprright = InvalidOid;
+		else
+			oprright = get_typname_typid(values[5], NSP_OID(values[4]));
+
+		parse->vas[attnum]->staop[parse->arrayval] = get_operid(values[1],
+																oprleft,
+																oprright,
+																NSP_OID(values[0]));
+		pfree(str);
+	}
+		break;
+
+	case JS_EXPECT_STACOLL_VALUE:
+	{
+		char *str;
+		char *values[3];
+		int32 collencoding;
+		Oid collnsp;
+
+		Assert(tokentype == JSON_TOKEN_STRING && parse->arrayval >= 0);
+		Assert(strlen(token) > 0);
+
+		if (strcmp(token, " ") == 0)
+		{
+			/* Quick exit on empty field */
+			parse->vas[attnum]->stacoll[parse->arrayval] = InvalidOid;
+			break;
+		}
+
+		str = pstrdup(token);
+		get_string_values(str, values, 3);
+
+		collnsp = NSP_OID(values[0]);
+		sscanf(values[2], "%d", &collencoding);
+		parse->vas[attnum]->stacoll[parse->arrayval] =
+								get_colloid(values[1], collencoding, collnsp);
+		pfree(str);
+	}
+		break;
+
+	case JS_EXPECT_ARRSIZE_VALUE:
+		Assert(tokentype == JSON_TOKEN_NUMBER && parse->arrayval < 0);
+		value = DirectFunctionCall1(int4in, CStringGetDatum(token));
+		parse->arraysize = DatumGetInt32(value);
+		Assert(parse->arraysize > 0);
+		break;
+
+	case JS_EXPECT_NUMBERS_VALUE:
+	{
+		int n = parse->numbersN;
+		int m = parse->arrayval;
+
+		Assert(parse->valuesN == 0);
+		Assert(tokentype == JSON_TOKEN_NUMBER && n > 0 && n <= 5);
+		Assert(m >= 0 && m < parse->arraysize);
+
+		value = DirectFunctionCall1(float4in, CStringGetDatum(token));
+		parse->vas[attnum]->stanumbers[n-1][m] = DatumGetFloat4(value);
+	}
+		break;
+
+	case JS_EXPECT_VALUES_VALUE:
+	{
+		int n = parse->valuesN;
+		int m = parse->arrayval;
+		Form_pg_attribute att = parse->vas[attnum]->attr;
+		Oid			typinput;
+		Oid			typioparam;
+
+		Assert(parse->numbersN == 0);
+		Assert(tokentype == JSON_TOKEN_NUMBER && n > 0 && n <= 5);
+		Assert(m >= 0 && m < parse->arraysize);
+
+		getTypeInputInfo(att->atttypid, &typinput, &typioparam);
+		parse->vas[attnum]->stavalues[n-1][m] =
+			OidInputFunctionCall(typinput, token,
+								 typioparam, att->atttypmod);
+	}
+		break;
+
+	default:
+		elog(ERROR, "Unexpected token type: %d. Token: %s. State: %d.",
+			 tokentype, token, parse->state);
+	}
+}
+
+/*
+ * Extract statistics from the JSON string.
+ */
+static bool
+parse_statjson(Relation rel, char *buffer, size_t size,
+			   VacAttrStats **vacattrstats, int *natts,
+			   BlockNumber *relpages, double *reltuples)
+{
+	JsonLexContext *lex;
+	JsonSemAction sem;
+	JsonStatParseState parse;
+
+	/* Set up our private parsing context. */
+	parse.state = JS_EXPECT_PARAM_FIELD;
+	parse.arraysize = -1;
+	parse.arrayval = -1;
+	parse.numbersN = -1;
+	parse.vas = vacattrstats;
+	parse.natts = 0;
+	parse.rel = rel;
+	skip_attr = false;
+
+	/* Create a JSON lexing context. */
+	lex = makeJsonLexContextCstringLen(buffer, size, PG_UTF8, true);
+
+	/* Set up semantic actions. */
+	sem.semstate = &parse;
+	sem.object_start = NULL;
+	sem.object_end = NULL;
+	sem.array_start = NULL;
+	sem.array_end = json_stat_array_end;
+	sem.object_field_start = json_stat_object_field_start;
+	sem.object_field_end = json_stat_field_end;
+	sem.array_element_start = NULL;
+	sem.array_element_end = json_stat_array_element_end;
+	sem.scalar = json_stat_scalar;
+
+	/* Run the actual JSON parser. */
+	pg_parse_json(lex, &sem);
+
+	/* Special case if we can't parse this serialized statistic */
+	if (parse.state == JS_INVALID_STAT_VERSION)
+		return false;
+
+	*natts = parse.natts;
+	*relpages = parse.relpages;
+	*reltuples = parse.reltuples;
+	return true;
+}
+
+static bool
+update_foreign_relation_stat(Relation rel)
+{
+	VacAttrStats **vacattrstats = (VacAttrStats **)
+					palloc0(rel->rd_att->natts * sizeof(VacAttrStats *));
+	char *statstr;
+	int natts = 0;
+	FdwRoutine *fdwroutine;
+	BlockNumber relallvisible;
+	BlockNumber numpages;
+	double numtuples;
+
+	fdwroutine = GetFdwRoutineForRelation(rel, false);
+	Assert(fdwroutine != NULL);
+
+	if (fdwroutine->GetForeignRelStat == NULL || strcmp(NameStr(rel->rd_rel->relname), "ftable") != 0)
+		return false;
+
+	statstr = fdwroutine->GetForeignRelStat(rel);
+
+	if (!statstr || !parse_statjson(rel, statstr, strlen(statstr),
+									vacattrstats, &natts, &numpages,
+									&numtuples))
+		/*
+		 * Return false if we can't get statistics for the relation or can't
+		 * parse them for any reason.
+		 */
+		return false;
+
+	update_attstats(RelationGetRelid(rel), false, natts, vacattrstats);
+
+	visibilitymap_count(rel, &relallvisible, NULL);
+	vac_update_relstats(rel, numpages, numtuples, relallvisible, false,
+						InvalidTransactionId, InvalidMultiXactId, false);
+
+	pfree(vacattrstats);
+	return true;
+}
+
+/*
+ * Get stat tuples for the attributes of relation
+ * See row_to_json()
+ */
+Datum
+extract_relation_statistics(PG_FUNCTION_ARGS)
+{
+	const char *relname = text_to_cstring(PG_GETARG_TEXT_PP(0));
+	const char *json_stat;
+
+	json_stat = relstat_to_json(relname);
+
+	PG_RETURN_TEXT_P(cstring_to_text(json_stat));
+}
+
+static const char *
+nsp_name(Oid oid)
+{
+	const char *nspname;
+
+	if (!OidIsValid(oid))
+		return " ";
+
+	if (isTempNamespace(oid))
+		return "pg_temp";
+
+	nspname = get_namespace_name(oid);
+	if (strcmp(nspname, "pg_catalog") == 0)
+		return " ";
+
+	return nspname;
+}
+/*
+ * relstat_to_json
+ *
+ */
+static const char *
+relstat_to_json(const char *relname)
+{
+	RangeVar	*relvar;
+	Relation	rel;
+	Relation	sd;
+	List		*relname_list;
+	int			attno;
+	StringInfo	str = makeStringInfo();
+	int			attnum = 0;
+
+	relname_list = stringToQualifiedNameList(relname);
+	relvar = makeRangeVarFromNameList(relname_list);
+	rel = relation_openrv(relvar, AccessShareLock);
+
+	if (rel->rd_rel->relkind != RELKIND_RELATION)
+	{
+		relation_close(rel, AccessShareLock);
+		elog(ERROR,
+			 "Can be used for ordinary relation only. Reltype: %c.",
+			 rel->rd_rel->relkind);
+	}
+
+	/* JSON header of overall relation statistics. */
+	appendStringInfoString(str, "{");
+	appendStringInfo(str, "\"version\" : %u, ", STATISTIC_VERSION);
+	appendStringInfo(str, "\"namespace\" : \"%s\", \"relname\" : \"%s\", \"sta_num_slots\" : %d, ",
+					 get_namespace_name(rel->rd_rel->relnamespace),
+					 NameStr(rel->rd_rel->relname), STATISTIC_NUM_SLOTS);
+
+	appendStringInfo(str, "\"relpages\" : %u, \"reltuples\" : %f, ",
+					 rel->rd_rel->relpages,
+					 rel->rd_rel->reltuples);
+
+	appendStringInfo(str, "\"attrs\" : [");
+	sd = table_open(StatisticRelationId, RowExclusiveLock);
+
+	for (attno = 0; attno < rel->rd_att->natts; attno++)
+	{
+		HeapTuple	stup;
+		Datum		values[Natts_pg_statistic];
+		bool		nulls[Natts_pg_statistic];
+		int			i, k;
+
+		stup = SearchSysCache3(STATRELATTINH,
+							   ObjectIdGetDatum(RelationGetRelid(rel)),
+							   Int16GetDatum(rel->rd_att->attrs[attno].attnum),
+							   BoolGetDatum(false));
+
+		if (!HeapTupleIsValid(stup))
+			/* Go to the next attribute if we have no statistics for this one. */
+			continue;
+
+		if (attnum++ > 0)
+			appendStringInfoString(str, ", ");
+
+		heap_deform_tuple(stup, RelationGetDescr(sd), values, nulls);
+
+		/* JSON header of attribute statistics. */
+		appendStringInfo(str, "{\"attname\" : \"%s\", \"inh\" : \"%s\", \"nullfrac\" : %f, \"width\" : %d, \"distinct\" : %f, ",
+			NameStr(*attnumAttName(rel, rel->rd_att->attrs[attno].attnum)),
+			DatumGetBool(values[Anum_pg_statistic_stainherit - 1]) ? "true" : "false",
+			DatumGetFloat4(values[Anum_pg_statistic_stanullfrac - 1]),
+			DatumGetInt32(values[Anum_pg_statistic_stawidth - 1]),
+			DatumGetFloat4(values[Anum_pg_statistic_stadistinct - 1]));
+
+		appendStringInfoString(str, "\"stakind\" : [");
+		i = Anum_pg_statistic_stakind1 - 1;
+		for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
+		{
+			appendStringInfo(str, "%d", DatumGetInt16(values[i++]));
+			if (k < STATISTIC_NUM_SLOTS - 1)
+				appendStringInfo(str, ",");
+		}
+		appendStringInfoString(str, "], \"staop\" : [");
+
+		i = Anum_pg_statistic_staop1 - 1;
+		for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
+		{
+			Oid opOid = DatumGetObjectId(values[i++]);
+
+			/* Prepare portable representation of the oid */
+			if (OidIsValid(opOid))
+			{
+				Oid lefttype;
+				Oid righttype;
+				const char *ltypname;
+				const char *rtypname;
+
+				op_input_types(opOid, &lefttype, &righttype);
+
+				if (!OidIsValid(lefttype))
+					ltypname = "NDEFINED";
+				else if ((ltypname = get_typ_name(lefttype)) == NULL)
+					elog(ERROR, "Type name search failed for oid %d.", lefttype);
+
+				if (!OidIsValid(righttype))
+					rtypname = "NDEFINED";
+				else if ((rtypname = get_typ_name(righttype)) == NULL)
+					elog(ERROR, "Type name search failed for oid %d.", righttype);
+
+				appendStringInfo(str, "\"%s.%s.%s.%s.%s.%s\"",
+								 nsp_name(get_opnamespace(opOid)),
+								 get_opname(opOid),
+								 nsp_name(get_typ_namespace(lefttype)),
+								 ltypname,
+								 nsp_name(get_typ_namespace(righttype)),
+								 rtypname);
+			}
+			else
+				appendStringInfoString(str, "\" \"");
+
+			if (k < STATISTIC_NUM_SLOTS - 1)
+				appendStringInfoChar(str, ',');
+		}
+		appendStringInfoString(str, "], \"stacoll\" : [");
+
+		i = Anum_pg_statistic_stacoll1 - 1;
+		for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
+		{
+			Oid collOid = DatumGetObjectId(values[i++]);
+
+			/* Prepare portable representation of the collation */
+			if (OidIsValid(collOid))
+				appendStringInfo(str, "\"%s.%s.%d\"",
+								 nsp_name(get_collation_namespace(collOid)),
+								 get_collation_name(collOid),
+								  get_collation_encoding(collOid));
+			else
+				appendStringInfoString(str, "\" \"");
+
+			if (k < STATISTIC_NUM_SLOTS - 1)
+				appendStringInfoChar(str, ',');
+		}
+
+		appendStringInfoString(str, "], ");
+
+		for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
+		{
+			bool		isnull;
+			Datum		val;
+
+			val = SysCacheGetAttr(STATRELATTINH, stup,
+							  Anum_pg_statistic_stavalues1 + k,
+							  &isnull);
+
+			if (isnull)
+				appendStringInfo(str, "\"values%d\" : [ ]", k+1);
+			else
+			{
+				ArrayType *v = DatumGetArrayTypeP(val);
+
+				appendStringInfo(str, "\"nn\" : %d, ",
+								 ArrayGetNItems(ARR_NDIM(v), ARR_DIMS(v)));
+				appendStringInfo(str, "\"values%d\" : ", k+1);
+				arr_to_json(val, str);
+			}
+			appendStringInfoString(str, ", ");
+		}
+
+		/* --- Extract numbers --- */
+		for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
+		{
+			bool		isnull;
+			Datum		val;
+
+			if (k > 0)
+				appendStringInfoString(str, ", ");
+
+			val = SysCacheGetAttr(STATRELATTINH, stup,
+							  Anum_pg_statistic_stanumbers1 + k,
+							  &isnull);
+			if (isnull)
+				appendStringInfo(str, "\"numbers%d\" : [ ]", k+1);
+			else
+			{
+				ArrayType *v = DatumGetArrayTypeP(val);
+
+				appendStringInfo(str, "\"nn\" : %d, ",
+								 ArrayGetNItems(ARR_NDIM(v), ARR_DIMS(v)));
+				appendStringInfo(str, "\"numbers%d\" : ", k+1);
+				arr_to_json(val, str);
+			}
+		}
+
+		appendStringInfoString(str, "}");
+		ReleaseSysCache(stup);
+	}
+
+	if (attnum == 0)
+		appendStringInfoString(str, " ]");
+	else
+		appendStringInfoChar(str, ']');
+	appendStringInfoString(str, "}");
+
+	table_close(sd, RowExclusiveLock);
+	relation_close(rel, AccessShareLock);
+
+	return str->data;
+}
 
 /*
  *	update_attstats() -- update attribute statistics for one relation
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index ddeec870d8..d54a2d51ee 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -882,6 +882,7 @@ get_all_vacuum_rels(int options)
 		 * them.
 		 */
 		if (classForm->relkind != RELKIND_RELATION &&
+			classForm->relkind != RELKIND_FOREIGN_TABLE &&
 			classForm->relkind != RELKIND_MATVIEW &&
 			classForm->relkind != RELKIND_PARTITIONED_TABLE)
 			continue;
@@ -1818,7 +1819,17 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 	/*
 	 * Check that it's of a vacuumable relkind.
 	 */
-	if (onerel->rd_rel->relkind != RELKIND_RELATION &&
+	if (onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	{
+		relation_close(onerel, lmode);
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+		if (params->options & VACOPT_ANALYZE)
+			return true;
+		else
+			return false;
+	}
+	else if (onerel->rd_rel->relkind != RELKIND_RELATION &&
 		onerel->rd_rel->relkind != RELKIND_MATVIEW &&
 		onerel->rd_rel->relkind != RELKIND_TOASTVALUE &&
 		onerel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index a7a91b72f6..adea67a18d 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -656,6 +656,12 @@ add_json(Datum val, bool is_null, StringInfo result,
 	datum_to_json(val, is_null, result, tcategory, outfuncoid, key_scalar);
 }
 
+void
+arr_to_json(Datum array, StringInfo result)
+{
+	array_to_json_internal(array, result, false);
+}
+
 /*
  * SQL function array_to_json(row)
  */
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index f3bf413829..4b49a0b915 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -1018,6 +1018,31 @@ get_cast_oid(Oid sourcetypeid, Oid targettypeid, bool missing_ok)
 	return oid;
 }
 
+/*
+ * get_collation_namespace
+ *		Returns the namespace id of a given pg_collation entry.
+ *
+ * Returns an Oid of the collation's namespace.
+ */
+Oid
+get_collation_namespace(Oid colloid)
+{
+	HeapTuple	tp;
+
+	tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(colloid));
+	if (HeapTupleIsValid(tp))
+	{
+		Form_pg_collation colltup = (Form_pg_collation) GETSTRUCT(tp);
+		Oid 		result;
+
+		result = colltup->collnamespace;
+		ReleaseSysCache(tp);
+		return result;
+	}
+	else
+		return InvalidOid;
+}
+
 /*				---------- COLLATION CACHE ----------					 */
 
 /*
@@ -1048,6 +1073,47 @@ get_collation_name(Oid colloid)
 		return NULL;
 }
 
+/*
+ * get_collation_encoding
+ *		Returns the encoding of a given pg_collation entry.
+ *
+ * Returns the collation's encoding, or -1 if entry does not exist.
+ */
+int32
+get_collation_encoding(Oid colloid)
+{
+	HeapTuple	tp;
+
+	tp = SearchSysCache1(COLLOID, ObjectIdGetDatum(colloid));
+	if (HeapTupleIsValid(tp))
+	{
+		Form_pg_collation colltup = (Form_pg_collation) GETSTRUCT(tp);
+		int32 		result;
+
+		result = colltup->collencoding;
+		ReleaseSysCache(tp);
+		return result;
+	}
+	else
+		return -1;
+}
+
+/*
+ * get_colloid
+ *	  Given a collation name, encoding and namespace OID, look up
+ * the collation OID.
+ *
+ * Returns InvalidOid if there is no such collation
+ */
+Oid
+get_colloid(const char *collname, int32 collencoding, Oid collnsp)
+{
+	return GetSysCacheOid3(COLLNAMEENCNSP, Anum_pg_collation_oid,
+						  CStringGetDatum(collname),
+						  Int32GetDatum(collencoding),
+						  ObjectIdGetDatum(collnsp));
+}
+
 bool
 get_collation_isdeterministic(Oid colloid)
 {
@@ -3393,3 +3459,104 @@ get_index_isclustered(Oid index_oid)
 
 	return isclustered;
 }
+
+/*
+ * get_typ_name
+ *
+ *		Given the type OID, find the type name
+ *		It returns palloc'd copy of the name or NULL if the cache lookup fails.
+ */
+char *
+get_typ_name(Oid typid)
+{
+	HeapTuple	tp;
+
+	tp = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid));
+	if (HeapTupleIsValid(tp))
+	{
+		Form_pg_type typtup = (Form_pg_type) GETSTRUCT(tp);
+		char	   *result;
+
+		result = pstrdup(NameStr(typtup->typname));
+		ReleaseSysCache(tp);
+		return result;
+	}
+	else
+		return NULL;
+}
+
+/*
+ * get_typ_namespace
+ *
+ *		Given the type OID, find the namespace
+ *		It returns InvalidOid if the cache lookup fails.
+ */
+Oid
+get_typ_namespace(Oid typid)
+{
+	HeapTuple	tp;
+
+	tp = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid));
+	if (HeapTupleIsValid(tp))
+	{
+		Form_pg_type typtup = (Form_pg_type) GETSTRUCT(tp);
+		Oid			result;
+
+		result = typtup->typnamespace;
+		ReleaseSysCache(tp);
+		return result;
+	}
+	else
+		return InvalidOid;
+}
+
+/*
+ * get_opnamespace
+ *	  Given an opno, find the namespace
+ *
+ * Returns InvalidOid if there is no such operator
+ */
+Oid
+get_opnamespace(Oid opno)
+{
+	HeapTuple	tp;
+
+	tp = SearchSysCache1(OPEROID, ObjectIdGetDatum(opno));
+	if (HeapTupleIsValid(tp))
+	{
+		Form_pg_operator optup = (Form_pg_operator) GETSTRUCT(tp);
+		Oid				result;
+
+		result = optup->oprnamespace;
+		ReleaseSysCache(tp);
+		return result;
+	}
+	else
+		return InvalidOid;
+}
+
+/*
+ * get_operid
+ *	  Given an operator name, argument types and namespace OID, look up
+ * the operator OID.
+ *
+ * Returns InvalidOid if there is no such operator
+ */
+Oid
+get_operid(const char *oprname, Oid oprleft, Oid oprright, Oid oprnsp)
+{
+	return GetSysCacheOid4(OPERNAMENSP,
+						  Anum_pg_operator_oid,
+						  CStringGetDatum(oprname),
+						  ObjectIdGetDatum(oprleft),
+						  ObjectIdGetDatum(oprright),
+						  ObjectIdGetDatum(oprnsp));
+}
+
+Oid
+get_typname_typid(const char *typname, Oid typnamespace)
+{
+	return GetSysCacheOid2(TYPENAMENSP, Anum_pg_type_oid,
+						  CStringGetDatum(typname),
+						  ObjectIdGetDatum(typnamespace));
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 687509ba92..ce4d94d7d4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8383,6 +8383,9 @@
 { oid => '3156', descr => 'map row to json with optional pretty printing',
   proname => 'row_to_json', provolatile => 's', prorettype => 'json',
   proargtypes => 'record bool', prosrc => 'row_to_json_pretty' },
+{ oid => '10001', descr => 'map relation statistics to json',
+  proname => 'extract_relation_statistics', provolatile => 's', prorettype => 'json',
+  proargtypes => 'text', prosrc => 'extract_relation_statistics' },
 { oid => '3173', descr => 'json aggregate transition function',
   proname => 'json_agg_transfn', proisstrict => 'f', provolatile => 's',
   prorettype => 'internal', proargtypes => 'internal anyelement',
diff --git a/src/include/catalog/pg_statistic.h b/src/include/catalog/pg_statistic.h
index 40a7260165..6e48a27d6c 100644
--- a/src/include/catalog/pg_statistic.h
+++ b/src/include/catalog/pg_statistic.h
@@ -124,6 +124,7 @@ CATALOG(pg_statistic,2619,StatisticRelationId)
 } FormData_pg_statistic;
 
 #define STATISTIC_NUM_SLOTS  5
+#define STATISTIC_VERSION	 1
 
 
 /* ----------------
diff --git a/src/include/foreign/fdwapi.h b/src/include/foreign/fdwapi.h
index 95556dfb15..51979ab3bf 100644
--- a/src/include/foreign/fdwapi.h
+++ b/src/include/foreign/fdwapi.h
@@ -169,6 +169,7 @@ typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root,
 typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root,
 															List *fdw_private,
 															RelOptInfo *child_rel);
+typedef char *(*GetForeignRelStat_function) (Relation rel);
 
 /*
  * FdwRoutine is the struct returned by a foreign-data wrapper's handler
@@ -246,6 +247,7 @@ typedef struct FdwRoutine
 
 	/* Support functions for path reparameterization. */
 	ReparameterizeForeignPathByChild_function ReparameterizeForeignPathByChild;
+	GetForeignRelStat_function GetForeignRelStat;
 } FdwRoutine;
 
 
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
index 4345fbdc31..a6b51bf191 100644
--- a/src/include/utils/json.h
+++ b/src/include/utils/json.h
@@ -20,5 +20,6 @@
 extern void escape_json(StringInfo buf, const char *str);
 extern char *JsonEncodeDateTime(char *buf, Datum value, Oid typid,
 								const int *tzp);
+extern void arr_to_json(Datum array, StringInfo result);
 
 #endif							/* JSON_H */
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index fecfe1f4f6..b8120a754a 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -93,7 +93,10 @@ extern void get_atttypetypmodcoll(Oid relid, AttrNumber attnum,
 								  Oid *typid, int32 *typmod, Oid *collid);
 extern Datum get_attoptions(Oid relid, int16 attnum);
 extern Oid	get_cast_oid(Oid sourcetypeid, Oid targettypeid, bool missing_ok);
+extern Oid get_collation_namespace(Oid colloid);
 extern char *get_collation_name(Oid colloid);
+extern int32 get_collation_encoding(Oid colloid);
+extern Oid get_colloid(const char *collname, int32 collencoding, Oid collnsp);
 extern bool get_collation_isdeterministic(Oid colloid);
 extern char *get_constraint_name(Oid conoid);
 extern char *get_language_name(Oid langoid, bool missing_ok);
@@ -187,6 +190,11 @@ extern Oid	get_index_column_opclass(Oid index_oid, int attno);
 extern bool get_index_isreplident(Oid index_oid);
 extern bool get_index_isvalid(Oid index_oid);
 extern bool get_index_isclustered(Oid index_oid);
+extern char *get_typ_name(Oid typid);
+extern Oid	get_typ_namespace(Oid typid);
+extern Oid get_opnamespace(Oid opno);
+extern Oid get_operid(const char *oprname, Oid oprleft, Oid oprright, Oid oprnsp);
+extern Oid get_typname_typid(const char *typname, Oid typnamespace);
 
 #define type_is_array(typid)  (get_element_type(typid) != InvalidOid)
 /* type_is_array_domain accepts both plain arrays and domains over arrays */
-- 
2.25.1

#28Ashutosh Bapat
ashutosh.bapat@2ndquadrant.com
In reply to: Andrey V. Lepikhov (#27)
Re: Ideas about a better API for postgres_fdw remote estimates

On Thu, 3 Sep 2020 at 10:44, Andrey V. Lepikhov <a.lepikhov@postgrespro.ru>
wrote:

On 8/31/20 6:19 PM, Ashutosh Bapat wrote:

On Mon, Aug 31, 2020 at 3:36 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

Thanks for this helpful feedback.

I think the patch has some other problems like it works only for
regular tables on foreign server but a foreign table can be pointing
to any relation like a materialized view, partitioned table or a
foreign table on the foreign server all of which have statistics
associated with them. I didn't look closely but it does not consider
that the foreign table may not have all the columns from the relation
on the foreign server or may have different names. But I think those
problems are kind of secondary. We have to agree on the design first.

In accordance with discussion i made some changes in the patch:
1. The extract statistic routine moved into the core.

The bulk of the patch implements the statistics conversion to and from JSON
format. I am still not sure whether we need all of that code here. Can we
re-use the pg_stats view? It already converts some of the OIDs to names. I
agree about anyarray, but if that's a problem here, it's also a problem for
the pg_stats view, isn't it? If we can reduce the stats-handling code to a
minimum, or use it for some other purpose as well, e.g. a pg_stats
enhancement, the code changes required will be far smaller compared to the
value that this patch provides.
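
For reference, a minimal sketch of the kind of lookup I have in mind against
pg_stats (the table name is made up and the column list is trimmed):

    SELECT attname, null_frac, avg_width, n_distinct,
           most_common_vals, most_common_freqs,
           histogram_bounds, correlation
    FROM pg_stats
    WHERE schemaname = 'public' AND tablename = 'big_table';

The anyarray columns (most_common_vals and friends) are the part that is hard
to map back into pg_statistic, but that difficulty exists either way.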

--
Best Wishes,
Ashutosh

#29Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andrey V. Lepikhov (#27)
Re: Ideas about a better API for postgres_fdw remote estimates

On Thu, Sep 03, 2020 at 10:14:41AM +0500, Andrey V. Lepikhov wrote:

On 8/31/20 6:19 PM, Ashutosh Bapat wrote:

On Mon, Aug 31, 2020 at 3:36 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru> wrote:

Thanks for this helpful feedback.

I think the patch has some other problems like it works only for
regular tables on foreign server but a foreign table can be pointing
to any relation like a materialized view, partitioned table or a
foreign table on the foreign server all of which have statistics
associated with them. I didn't look closely but it does not consider
that the foreign table may not have all the columns from the relation
on the foreign server or may have different names. But I think those
problems are kind of secondary. We have to agree on the design first.

In accordance with the discussion, I made some changes to the patch:
1. The statistics extraction routine has been moved into the core.
2. The serialized statistics contain a 'version' field to indicate the format
of the statistics received.
3. ANALYZE and VACUUM ANALYZE use this approach only in the case of implicit
analysis of the relation.

I am currently keeping the limitation of using the approach for regular
relations only, because I haven't studied the specifics of other types of
relations. But I don't see any reason to keep this limit in the future.

The attached patch is very raw. I am publishing it for further substantive
discussion.

Thanks for working on this. I briefly looked at the patch today, and I
have some comments/feedback:

1) I wonder why deparseGetStatSql looks so different from e.g.
deparseAnalyzeSizeSql - no deparseStringLiteral on relname, no cast to
pg_catalog.regclass, function name not qualified with pg_catalog, ...

2) I'm a bit annoyed by the amount of code added to analyze.c only to
support output/input in JSON format. I'm no expert, but I don't recall
EXPLAIN needing this much new stuff (OTOH it just produces JSON, it does not
need to read it). Maybe we also need to process a wider range of data types
here. But the code is almost perfectly undocumented :-(

3) Why do we need to change vacuum_rel this way?

4) I wonder if we actually want/need to simply output pg_statistic data
verbatim like this. Is postgres_fdw actually going to benefit from it? I
kinda doubt that, and my assumption was that we'd return only a small
subset of the data, needed by get_remote_estimate.

This has a couple of issues. Firstly, it requires the knowledge of what
the stakind constants in pg_statistic mean and how to interpret them - but
OK, it's true that this doesn't change very often (or at all). Secondly, it
entirely ignores extended statistics - OK, we might extract those too,
but it's going to be much more complex. And finally it entirely ignores
costing on the remote node. Surely we can't just apply local
random_page_cost or whatever, because those may be entirely different.
And we don't know if the remote is going to use index etc.

So is extracting data from pg_statistic the right approach?

5) I doubt it's enough to support relnames - we also need to estimate
joins, so this needs to support plain queries I think. At least that's
what Tom envisioned in his postgres_fdw_support(query text) proposal.

6) I see you've included a version number in the data - why not just check

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#30Andrey V. Lepikhov
a.lepikhov@postgrespro.ru
In reply to: Ashutosh Bapat (#28)
Re: Ideas about a better API for postgres_fdw remote estimates

On 9/4/20 6:23 PM, Ashutosh Bapat wrote:

On Thu, 3 Sep 2020 at 10:44, Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:

On 8/31/20 6:19 PM, Ashutosh Bapat wrote:

On Mon, Aug 31, 2020 at 3:36 PM Andrey V. Lepikhov
<a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:

Thanks for this helpful feedback.

I think the patch has some other problems like it works only for
regular tables on foreign server but a foreign table can be pointing
to any relation like a materialized view, partitioned table or a
foreign table on the foreign server all of which have statistics
associated with them. I didn't look closely but it does not consider
that the foreign table may not have all the columns from the relation
on the foreign server or may have different names. But I think those
problems are kind of secondary. We have to agree on the design first.

In accordance with the discussion, I made some changes to the patch:
1. The statistics extraction routine has been moved into the core.

The bulk of the patch implements the statistics conversion to and from JSON
format. I am still not sure whether we need all of that code here.

Yes, I'm sure we'll replace it with something.

Right now, I want to discuss the format of the statistics dump. Keep in mind
that a statistics dump is needed not only for the FDW, but also for pg_dump.
The dump would contain something like this:
'SELECT store_relation_statistics(rel, serialized_stat)'

My reasons for using JSON:
* it has conversion infrastructure like json_build_object()
* it is a flexible, readable format that can be useful in text dumps of
relations (see the sketch below).
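
Purely as an illustration of what such a dump entry could look like (the
function name is the one mentioned above, the JSON keys simply mirror
pg_statistic columns, and the exact layout is of course still open):

    SELECT store_relation_statistics('public.big_table'::regclass,
    '{
       "version": 1,
       "attrs": [
         { "staattnum": 1,
           "stanullfrac": 0.0,
           "stawidth": 4,
           "stadistinct": -1.0,
           "stakind1": 2,
           "stavalues1": [1, 10, 100, 1000] }
       ]
     }');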

Can we re-use the pg_stats view? It already converts some of the OIDs to
names. I agree about anyarray, but if that's a problem here, it's also a
problem for the pg_stats view, isn't it?

Right now, I don't know if it is possible to unambiguously convert the
pg_stats information to a pg_statistic tuple.

If we can reduce the stats-handling code to a minimum, or use it for some
other purpose as well, e.g. a pg_stats enhancement, the code changes required
will be far smaller compared to the value that this patch provides.

+1

--
regards,
Andrey Lepikhov
Postgres Professional

#31Ashutosh Bapat
ashutosh.bapat@2ndquadrant.com
In reply to: Tomas Vondra (#29)
Re: Ideas about a better API for postgres_fdw remote estimates

On Fri, 4 Sep 2020 at 20:27, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote

4) I wonder if we actually want/need to simply output pg_statistic data
verbatim like this. Is postgres_fdw actually going to benefit from it? I
kinda doubt that, and my assumption was that we'd return only a small
subset of the data, needed by get_remote_estimate.

This has a couple of issues. Firstly, it requires the knowledge of what
the stakind constants in pg_statistic mean and how to interpret them - but
OK, it's true that this doesn't change very often (or at all). Secondly, it
entirely ignores extended statistics - OK, we might extract those too,
but it's going to be much more complex. And finally it entirely ignores
costing on the remote node. Surely we can't just apply local
random_page_cost or whatever, because those may be entirely different.
And we don't know if the remote is going to use index etc.

So is extracting data from pg_statistic the right approach?

There are two different problems, which ultimately might converge.
1. If use_remote_estimates = false, or more generally if querying costs from
the foreign server for costing paths is impractical, we want to use local
estimates and try to come up with costs. For that purpose we keep some
statistics locally, and the user is expected to refresh them periodically by
running ANALYZE on the foreign table. This patch is about (a) doing this
efficiently, without requiring every row to be fetched from the foreign
server, and (b) doing it automatically through autovacuum, without the user
firing ANALYZE. I think this also answers your question about vacuum_rel()
above.
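
To make (a) concrete, a rough sketch of the difference (the remote table, its
columns, and the name of the extraction function are illustrative only):

    -- today, ANALYZE on a foreign table makes postgres_fdw fetch the rows
    -- from the remote side and sample them locally, roughly:
    SELECT id, payload FROM public.big_table;

    -- with this patch, the idea is a single call that returns the
    -- already-computed statistics in serialized form instead:
    SELECT extract_relation_statistics('public.big_table'::regclass);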

2. How to efficiently extract costs from an EXPLAIN plan when
use_remote_estimates is true. That's the subject of some nearby thread. I
think you are referring to that problem here. Hence your next point.

Using EXPLAIN to get costs from the foreign server isn't efficient. It
increases planning time a lot; sometimes planning time exceeds execution
time. If usage of foreign tables becomes more and more common, this isn't
ideal. I think we should move towards a model in which the optimizer can
decide whether a subtree involving a foreign server should be evaluated
locally or on the foreign server without the help of the foreign server. One
way to do it (I am not saying that this is the only or the best way) is to
estimate the cost of the foreign query locally, based on the information
available locally about the foreign server and the foreign table. This might
mean that we have to get that information from the foreign server, cache it
locally, and use it several times: the indexes on the foreign table, the
values of various costs, etc. Though this approach doesn't solve all of those
problems, it's one step forward, and it also makes the current scenario
efficient.
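
As a sketch of the kind of information that could be fetched from the foreign
server once and cached (the catalog and GUC names are real, the table name is
again made up; whether and how to cache this is the open part):

    -- cost-related settings on the remote side
    SELECT name, setting
    FROM pg_settings
    WHERE name IN ('seq_page_cost', 'random_page_cost', 'cpu_tuple_cost',
                   'cpu_index_tuple_cost', 'cpu_operator_cost');

    -- indexes defined on the remote relation
    SELECT indexrelid::regclass AS indexname, indkey, indisunique
    FROM pg_index
    WHERE indrelid = 'public.big_table'::regclass;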

I agree that the patch needs some work though, esp the code dealing with
serialization and deserialization of statistics.
--
Best Wishes,
Ashutosh

#32Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Ashutosh Bapat (#31)
Re: Ideas about a better API for postgres_fdw remote estimates

On Tue, Sep 08, 2020 at 05:55:09PM +0530, Ashutosh Bapat wrote:

On Fri, 4 Sep 2020 at 20:27, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote

4) I wonder if we actually want/need to simply output pg_statistic data
verbatim like this. Is postgres_fdw actually going to benefit from it? I
kinda doubt that, and my assumption was that we'd return only a small
subset of the data, needed by get_remote_estimate.

This has a couple of issues. Firstly, it requires the knowledge of what
the stakind constants in pg_statistic mean and how to interpret them - but
OK, it's true that this doesn't change very often (or at all). Secondly, it
entirely ignores extended statistics - OK, we might extract those too,
but it's going to be much more complex. And finally it entirely ignores
costing on the remote node. Surely we can't just apply local
random_page_cost or whatever, because those may be entirely different.
And we don't know if the remote is going to use index etc.

So is extracting data from pg_statistic the right approach?

There are two different problems, which ultimately might converge.
1. If use_remote_estimates = false, more generally if querying costs from
foreign server for costing paths is impractical, we want to use local
estimates and try to come up with costs. For that purpose we keep some
statistics locally and user is expected to refresh it periodically by
running ANALYZE on the foreign table. This patch is about a. doing this
efficiently without requiring to fetch every row from the foreign server b.
through autovacuum automatically without user firing ANALYZE. I think this
also answers your question about vacuum_rel() above.

2. How to efficiently extract costs from an EXPLAIN plan when
use_remote_estimates is true. That's the subject of some nearby thread. I
think you are referring to that problem here. Hence your next point.

I think that was the topic of *this* thread as started by Tom, but I now
realize Andrey steered it in the direction of allowing re-use of remote
stats, which seems useful too, but it confused me a bit.

Using EXPLAIN to get costs from the foreign server isn't efficient. It
increases planning time a lot; sometimes planning time exceeds execution
time. If usage of foreign tables becomes more and more common, this isn't
ideal. I think we should move towards a model in which the optimizer can
decide whether a subtree involving a foreign server should be evaluated
locally or on the foreign server without the help of foreign server. One
way to do it (I am not saying that this is the only or the best way) is to
estimate the cost of foreign query locally based on the information
available locally about the foreign server and foreign table. This might
mean that we have to get that information from the foreign server and cache
it locally and use it several times, including the indexes on foreign
table, values of various costs etc. Though this approach doesn't solve all
of those problems it's one step forward + it makes the current scenario
also efficient.

True, but that project is way more ambitious than providing a simple API
for postgres_fdw to obtain the estimates more efficiently.

I agree that the patch needs some work though, esp the code dealing with
serialization and deserialization of statistics.

I think there's a bunch of open questions, e.g. what to do with extended
statistics - for example, what should happen when the extended statistics
object is defined only on the local or only on the remote server, or when
the definitions don't match? This probably is not an issue for "regular"
stats, because that format seems pretty stable, but for extended stats there
are differences between versions.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#33Ashutosh Bapat
ashutosh.bapat@2ndquadrant.com
In reply to: Tomas Vondra (#32)
Re: Ideas about a better API for postgres_fdw remote estimates

On Wed, 9 Sep 2020 at 02:35, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote

I think that was the topic of *this* thread as started by Tom, but I now
realize Andrey steered it in the direction to allow re-using remote
stats. Which seems useful too, but it confused me a bit.

I didn't realize that the nearby thread I am mentioning is actually this
thread :). Sorry.

Using EXPLAIN to get costs from the foreign server isn't efficient. It
increases planning time a lot; sometimes planning time exceeds execution
time. If usage of foreign tables becomes more and more common, this isn't
ideal. I think we should move towards a model in which the optimizer can
decide whether a subtree involving a foreign server should be evaluated
locally or on the foreign server without the help of foreign server. One
way to do it (I am not saying that this is the only or the best way) is to
estimate the cost of foreign query locally based on the information
available locally about the foreign server and foreign table. This might
mean that we have to get that information from the foreign server and cache
it locally and use it several times, including the indexes on foreign
table, values of various costs etc. Though this approach doesn't solve all
of those problems it's one step forward + it makes the current scenario
also efficient.

True, but that project is way more ambitious than providing a simple API
for postgres_fdw to obtain the estimates more efficiently.

Doing all of that is a big project. But what this patch aims at is a small
subset which makes statistics collection efficient and automatic. So, just
for that, we should consider it.

I agree that the patch needs some work though, esp the code dealing with
serialization and deserialization of statistics.

I think there's a bunch of open questions, e.g. what to do with extended
statistics - for example, what should happen when the extended statistics
object is defined only on the local or only on the remote server, or when
the definitions don't match? This probably is not an issue for "regular"
stats, because that format seems pretty stable, but for extended stats there
are differences between versions.

If it is defined on the foreign server but not on the local server, there is
no need to fetch it from the foreign server. The other way round is the
tricky case. We could mark the extended statistics object invalid if it's not
defined on the foreign server or its definition is different. We would have
to document it that way. I think that should serve most of the cases.
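
As a sketch, the definitions could be compared by running something like this
on both servers (the catalog is real, the table name is made up; the matching
rule itself is just the proposal above):

    SELECT stxname, stxkind, stxkeys
    FROM pg_statistic_ext
    WHERE stxrelid = 'public.big_table'::regclass;

    -- a missing row on the foreign side, or different stxkind/stxkeys,
    -- would mark the local extended statistics object invalid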

--
Best Wishes,
Ashutosh