The plan for FDW-based sharding

Started by Bruce Momjian, about 10 years ago · 85 messages · pgsql-hackers
#1 Bruce Momjian
bruce@momjian.us

There was discussion at the FOSDEM/PGDay Developer Meeting
(https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting)
about sharding so I wanted to outline where I think we are going with
sharding and FDWs.

First, let me point out that, unlike pg_upgrade and the Windows port,
which either worked or didn't work, sharding is going to be implemented and
useful in stages. It will take several years to complete, similar to
parallelism, streaming replication, and logical replication.

Second, as part of this staged implementation, there are several use
cases that will be shardable at first, and then only later, more complex
ones. For example, here are some use cases and the technology they
require:

1. Cross-node read-only queries on read-only shards using aggregate
queries, e.g. data warehouse:

This is the simplest to implement as it doesn't require a global
transaction manager, global snapshot manager, and the number of rows
returned from the shards is minimal because of the aggregates.

2. Cross-node read-only queries on read-only shards using non-aggregate
queries:

This will stress the coordinator to collect and process many returned
rows, and will show how well the FDW transfer mechanism scales.

3. Cross-node read-only queries on read/write shards:

This will require a global snapshot manager to make sure the shards
return consistent data.

4. Cross-node read-write queries:

This will require a global snapshot manager and global snapshot manager.
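
Use case #1 can already be sketched with today's postgres_fdw. The server
names, hosts, and table layout below are invented for illustration; only
postgres_fdw itself is real:

```sql
-- Hypothetical two-shard setup: each shard is a plain Postgres server
-- holding one slice of the data, attached via postgres_fdw.
CREATE EXTENSION postgres_fdw;

CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'node1', dbname 'sales');
CREATE SERVER shard2 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'node2', dbname 'sales');

CREATE USER MAPPING FOR CURRENT_USER SERVER shard1;
CREATE USER MAPPING FOR CURRENT_USER SERVER shard2;

-- One foreign table per shard, unified on the coordinator by a view.
CREATE FOREIGN TABLE orders_s1 (id int, total numeric)
    SERVER shard1 OPTIONS (table_name 'orders');
CREATE FOREIGN TABLE orders_s2 (id int, total numeric)
    SERVER shard2 OPTIONS (table_name 'orders');

CREATE VIEW orders AS
    SELECT * FROM orders_s1 UNION ALL SELECT * FROM orders_s2;

-- Use case #1: a cross-node aggregate.  Without aggregate pushdown every
-- row crosses the wire; with it, each shard would return a single row.
SELECT count(*), sum(total) FROM orders;
```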

In 9.6, we will have FDW join and sort pushdown
(http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-scalability.html).
Unfortunately I don't think we will have aggregate
pushdown, so we can't test #1, but we might be able to test #2, even in
9.5. Also, we might have better partitioning syntax in 9.6.
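
Whether a given join was actually pushed down can be checked with EXPLAIN.
The foreign tables below are hypothetical; the "Remote SQL" line is roughly
what 9.6-era postgres_fdw emits when both sides of a join live on the same
remote server:

```sql
-- ft_customers and ft_orders are foreign tables on the same server.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT c.name, o.total
FROM   ft_customers c
JOIN   ft_orders o ON o.customer_id = c.id
ORDER  BY c.name;

-- If the join and sort were pushed down, the plan collapses to a single
-- Foreign Scan whose Remote SQL contains the whole joined, ordered query:
--   Foreign Scan
--     Remote SQL: SELECT ... FROM customers r1
--                 INNER JOIN orders r2 ON r2.customer_id = r1.id
--                 ORDER BY r1.name
```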

We need things like parallel partition access and replicated lookup
tables for more join pushdown.

In a way, because these enhancements are useful independent of sharding,
we have not tested to see how well an FDW sharding setup will work and
for which workloads.

We know Postgres XC/XL works, and scales, but we also know they require
too many code changes to be merged into Postgres (at least based on
previous discussions). The FDW sharding approach is to enhance the
existing features of Postgres to allow as much sharding as possible.

Once that is done, we can see what workloads it covers and
decide if we are willing to copy the volume of code necessary
to implement all supported Postgres XC or XL workloads.
(The Postgres XL license now matches the Postgres license,
http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
Postgres XC has always used the Postgres license.)

If we are not willing to add code for the missing Postgres XC/XL
features, Postgres XC/XL will probably remain a separate fork of
Postgres. I don't think anyone knows the answer to this question, and I
don't know how to find the answer except to keep going with our current
FDW sharding approach.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 David G. Johnston
david.g.johnston@gmail.com
In reply to: Bruce Momjian (#1)
Re: The plan for FDW-based sharding

On Tue, Feb 23, 2016 at 9:43 AM, Bruce Momjian <bruce@momjian.us> wrote:

4. Cross-node read-write queries:

This will require a global snapshot manager and global snapshot manager.

Probably meant "global transaction manager"

David J.

#3 Bruce Momjian
bruce@momjian.us
In reply to: David G. Johnston (#2)
Re: The plan for FDW-based sharding

On Tue, Feb 23, 2016 at 09:54:46AM -0700, David G. Johnston wrote:

On Tue, Feb 23, 2016 at 9:43 AM, Bruce Momjian <bruce@momjian.us> wrote:

4. Cross-node read-write queries:

This will require a global snapshot manager and global snapshot manager.

Probably meant "global transaction manager"

Oops, yes, it should be:

4. Cross-node read-write queries:

This will require a global snapshot manager and global transaction
manager.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +


#4 Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#1)
Re: The plan for FDW-based sharding

On 23 February 2016 at 16:43, Bruce Momjian <bruce@momjian.us> wrote:

There was discussion at the FOSDEM/PGDay Developer Meeting
(https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting)
about sharding so I wanted to outline where I think we are going with
sharding and FDWs.

I think we need to be very careful to understand that "FDWs and Sharding"
is one tentative proposal amongst others, not a statement of direction for
the PostgreSQL project since there is not yet any universal agreement.

We know Postgres XC/XL works, and scales

Agreed.

In contrast, the FDW/sharding approach is as-yet unproven, and
significantly without any detailed technical discussion of the exact
approach and how it would work, even after more than 6 months since we
first heard of it openly. Since we don't know how it will work, we have no
idea how long it will take either, or even if it ever will.

I'd like to see discussion of the details in presentation/wiki form and an
initial prototype, with measurements. Without these things we are still
just at the speculation stage. Some alternate proposals are also at that
stage.

, but we also know they require
too many code changes to be merged into Postgres (at least based on
previous discussions). The FDW sharding approach is to enhance the
existing features of Postgres to allow as much sharding as possible.

Once that is done, we can see what workloads it covers and
decide if we are willing to copy the volume of code necessary
to implement all supported Postgres XC or XL workloads.
(The Postgres XL license now matches the Postgres license,
http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
Postgres XC has always used the Postgres license.)

It's never been our policy to try to include major projects in single code
drops. Any move of XL/XC code into PostgreSQL core would need to be done
piece by piece across many releases. XL is definitely too big for the
elephant to eat in one mouthful.

If we are not willing to add code for the missing Postgres XC/XL
features, Postgres XC/XL will probably remain a separate fork of
Postgres.

And if the FDW approach doesn't work, that won't be part of PostgreSQL core
either...

I don't think anyone knows the answer to this question, and I
don't know how to find the answer except to keep going with our current
FDW sharding approach.

This is exactly the wrong time to discuss this, since we are days away from
the final deadline for PostgreSQL 9.6 and the community should be focusing
on that for the next few months, not on futures.

What I notice is that when Greenplum announced it would publish as open
source its modified version of Postgres, there was some scary noise made
immediately about that concerning patents etc..

Now Postgres-XL 9.5 has recently been announced, and we see another
scary-sounding pronouncement that *maybe* it won't be included in core.
While the comments made are true, they do not solely apply to XC/XL, in
fact the uncertainty applies to all approaches equally since notably we
have approximately five proposals for future designs.

These comments, given their timing and nature, could easily cause "Fear,
Uncertainty and Doubt" in people seeing this. FUD is also the name of a
sales technique designed to undermine proposals. I hope and presume it was
not the intention and reason for discussing uncertainty now and earlier.

I'm glad to see that the viability of the XC/XL approach is recognized. The
fact we have a working solution now is important for users, who don't want
to wait the 3-5 years while we work out and implement a longer term
strategy. Future upgrade support is certain, however.

What eventually gets into PostgreSQL core is as yet uncertain, as is the
timescale, but my hope is that we recognize that multiple use cases can be
supported rather than a single fixed architecture. It seems likely to me
that the PostgreSQL project will do what it does best - take multiple
comments and merge those into a combined system that is better than any of
the individual single proposals.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5 Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#4)
Re: The plan for FDW-based sharding

On Wed, Feb 24, 2016 at 01:08:29AM +0000, Simon Riggs wrote:

On 23 February 2016 at 16:43, Bruce Momjian <bruce@momjian.us> wrote:

There was discussion at the FOSDEM/PGDay Developer Meeting
(https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting)
about sharding so I wanted to outline where I think we are going with
sharding and FDWs.

I think we need to be very careful to understand that "FDWs and Sharding" is
one tentative proposal amongst others, not a statement of direction for the

--------------

What other directions are proposed to add sharding to the existing
Postgres code? If there are, I have not heard of them. Or are they
only (regularly updated?) forks of Postgres?

PostgreSQL project since there is not yet any universal agreement.

As I stated clearly, we are going in the FDW direction because improving
FDWs has uses beyond sharding, and once that is done we can see how well
it works for sharding.

We know Postgres XC/XL works, and scales

Agreed.

In contrast, the FDW/sharding approach is as-yet unproven, and significantly
without any detailed technical discussion of the exact approach and how it
would work, even after more than 6 months since we first heard of it openly.
Since we don't know how it will work, we have no idea how long it will take
either, or even if it ever will.

Yep.

I'd like to see discussion of the details in presentation/wiki form and an
initial prototype, with measurements. Without these things we are still just at
the speculation stage. Some alternate proposals are also at that stage.

Uh, what "alternate proposals"?

My point was that we know XC/XL works, but there is too much code change
for us, so maybe FDWs will make built-in sharding possible/easier.

, but we also know they require
too many code changes to be merged into Postgres (at least based on
previous discussions). The FDW sharding approach is to enhance the
existing features of Postgres to allow as much sharding as possible.

Once that is done, we can see what workloads it covers and
decide if we are willing to copy the volume of code necessary
to implement all supported Postgres XC or XL workloads.
(The Postgres XL license now matches the Postgres license,
http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
Postgres XC has always used the Postgres license.)

It's never been our policy to try to include major projects in single code
drops. Any move of XL/XC code into PostgreSQL core would need to be done piece
by piece across many releases. XL is definitely too big for the elephant to eat
in one mouthful.

Is there any plan to move the XL/XC code into Postgres? If so, I have
not heard of it. I thought everyone agreed it was too much code change,
which is why it is a separate code tree. Is that incorrect?

If we are not willing to add code for the missing Postgres XC/XL
features, Postgres XC/XL will probably remain a separate fork of
Postgres.

And if the FDW approach doesn't work, that won't be part of PostgreSQL core
either...

Uh, duh. Yeah, that's what I said. What is your point? I said we
don't know if it will work, as you quoted below:

I don't think anyone knows the answer to this question, and I
don't know how to find the answer except to keep going with our current
FDW sharding approach.

This is exactly the wrong time to discuss this, since we are days away from the
final deadline for PostgreSQL 9.6 and the community should be focusing on that
for next few months, not futures.

I posted this because of the discussion at the FOSDEM meeting, and to
address the questions you asked in that meeting. I even told you last
week on IM that I was going to post this for that stated purpose. I
didn't pick the time at random.

What I notice is that when Greenplum announced it would publish as open source
its modified version of Postgres, there was some scary noise made immediately
about that concerning patents etc..

Now, Postgres-XL 9.5 is recently announced and we see another scary sounding
pronouncement about that *maybe* it won't be included in core. While the
comments made are true, they do not solely apply to XC/XL, in fact the
uncertainty applies to all approaches equally since notably we have
approximately five proposals for future designs.

These comments, given their timing and nature could easily cause "Fear,
Uncertainty and Doubt" in people seeing this. FUD is also the name of a sales
technique designed to undermine proposals. I hope and presume it was not the
intention and reason for discussing uncertainty now and earlier.

Oh, I absolutely did this as a way to undermine what _everyone_ else is
doing? Is there another way to behave?

I find this insulting. Others made the same remarks when I questioned
the patents, and earlier when I questioned if we would integrate the
Greenplum code after their press release. And you know what, we didn't
want the Greenplum code (yet), and I explained how open source code with
patents is riskier than closed-source code with patents, and I think
people finally understood that, including you.

When people don't like what I have to say, they figure there must be
some other motive, because I certainly couldn't think this on my own?
Really? Have I not been around long enough for people to realize that
is not the case!

If you _presume_ I did not have some undermining motivation for posting
this, why did you mention it? You obviously _do_ think I have some
external motivation for talking about FDWs now or you wouldn't have
mentioned it. (I can't even think of what the motivation would be.)

Let me come out and say what people might be thinking: I realize it is
unfortunate that _if_ FDWs succeed in sharding, the value of the work
done on Postgres XC/XL will be diminished. I personally think that
Postgres needs a built-in sharding solution, just like I thought we
needed a native Windows port, in-place upgrade, and parallelism. I was
hopeful XC/XL could be integrated into Postgres, but based on
discussions, it seems that is not acceptable, so the FDW/sharding
approach is the only built-in one I can think of. Are there other
possibilities?

I talk about it and try to get people excited about it. I make no
apologies for that. I will talk about this forever, or as long as
people will listen, so you can expect to hear about it. I am sure I
will think of other "crazy" things to talk about too because the other
items I mentioned above were also considered odd/crazy at the time I
proposed them.

I'm glad to see that the viability of the XC/XL approach is recognized. The
fact we have a working solution now is important for users, who don't want to
wait the 3-5 years while we work out and implement a longer term strategy.
Future upgrade support is certain, however.

Yes, no question. The benchmarks of XC/XL looked amazing. Can you
remind me of the URLs for that? Do you have any new ones?

In a way, I don't see any need for an FDW sharding prototype because, as
I said, we already know XC/XL work, so copying what they do doesn't
help. What we need to know is if we can get near the XC/XL benchmarks
with an acceptable addition of code, which is what I thought I already
said. Perhaps this can be done with FDWs, or some other approach I have
not heard of yet.

What eventually gets into PostgreSQL core is as yet uncertain, as is the
timescale, but my hope is that we recognize that multiple use cases can be
supported rather than a single fixed architecture. It seems likely to me that
the PostgreSQL project will do what it does best - take multiple comments and
merge those into a combined system that is better than any of the individual
single proposals.

Agreed.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +


#6 Alexander Korotkov
aekorotkov@gmail.com
In reply to: Bruce Momjian (#1)
Re: The plan for FDW-based sharding

Hi, Bruce!

The important point for me is to distinguish different kinds of plans:
an implementation plan and a research plan.
If we're talking about an implementation plan, then it should be proven
that the proposed approach works, i.e. the research should already be done.
If we're talking about a research plan, then we should realize that the
result is unpredictable, and we might need to change course dramatically.

These two things would work with FDW:
1) Pull data from data nodes to coordinator.
2) Pushdown computations from coordinator to data nodes: joins, aggregates
etc.
It's proven and clear. This is good.
Another point is that these FDW advances are useful by themselves. This is
good too.
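
The pull/pushdown distinction above can be made concrete with two queries
against hypothetical foreign tables (names invented):

```sql
-- (1) Pushdown: the filter (and, once aggregate pushdown exists, the
--     count) runs on the data node; the coordinator receives few rows.
SELECT count(*) FROM ft_events WHERE kind = 'click';

-- (2) Pull: a join between foreign tables on *different* servers cannot
--     be pushed down, so both inputs are pulled to the coordinator and
--     joined there, exactly the case that stresses the FDW transfer
--     mechanism.  Nothing in this model lets shards talk to each other.
SELECT a.id
FROM   ft_events a    -- lives on server shard1
JOIN   ft_clicks b    -- lives on server shard2
       ON a.id = b.event_id;
```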

However, the FDW model assumes that communication happens only between
the coordinator and a data node. But a full-weight distributed optimizer
can't work under this restriction, because it requires every node to
communicate with every other node whenever that makes a distributed query
faster. And as I understand it, the FDW approach currently has no research
and no particular plan for that.

As I understand from Robert Haas's talk
(https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxyb2JlcnRtaGFhc3xneDo1ZmFhYzBhNjNhNzVhMDM0)

Before we consider repartitioning joins, we should probably get everything
previously discussed working first.
– Join Pushdown For Parallelism, FDWs
– PartialAggregate/FinalizeAggregate
– Aggregate Pushdown For Parallelism, FDWs
– Declarative Partitioning
– Parallel-Aware Append

So, as I understand it, we never even considered the possibility of data
redistribution using FDWs. Possibly something has changed since then, but
I haven't heard about it.

On Tue, Feb 23, 2016 at 7:43 PM, Bruce Momjian <bruce@momjian.us> wrote:

Second, as part of this staged implementation, there are several use
cases that will be shardable at first, and then only later, more complex
ones. For example, here are some use cases and the technology they
require:

1. Cross-node read-only queries on read-only shards using aggregate
queries, e.g. data warehouse:

This is the simplest to implement as it doesn't require a global
transaction manager, global snapshot manager, and the number of rows
returned from the shards is minimal because of the aggregates.

2. Cross-node read-only queries on read-only shards using non-aggregate
queries:

This will stress the coordinator to collect and process many returned
rows, and will show how well the FDW transfer mechanism scales.

FDWs would work for queries which fit the pull/pushdown model. I see no
plan to make other queries work.

3. Cross-node read-only queries on read/write shards:

This will require a global snapshot manager to make sure the shards
return consistent data.

4. Cross-node read-write queries:

This will require a global snapshot manager and global snapshot manager.

At this point, it's unclear why you don't refer to the work done in the
direction of a distributed transaction manager (which is also a
distributed snapshot manager in your terminology):
/messages/by-id/56BB7880.4020604@postgrespro.ru

In 9.6, we will have FDW join and sort pushdown
(http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-scalability.html).
Unfortunately I don't think we will have aggregate
pushdown, so we can't test #1, but we might be able to test #2, even in
9.5. Also, we might have better partitioning syntax in 9.6.

We need things like parallel partition access and replicated lookup
tables for more join pushdown.

In a way, because these enhancements are useful independent of sharding,
we have not tested to see how well an FDW sharding setup will work and
for which workloads.

This is a point where I agree. I'm not objecting to any single FDW
advance, because each is useful by itself.

We know Postgres XC/XL works, and scales, but we also know they require

too many code changes to be merged into Postgres (at least based on
previous discussions). The FDW sharding approach is to enhance the
existing features of Postgres to allow as much sharding as possible.

This comparison doesn't seem correct to me. Postgres XC/XL supports data
redistribution between nodes, and I haven't heard a single idea for
supporting this with FDWs. You are comparing unequal things.

Once that is done, we can see what workloads it covers and
decide if we are willing to copy the volume of code necessary
to implement all supported Postgres XC or XL workloads.
(The Postgres XL license now matches the Postgres license,
http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
Postgres XC has always used the Postgres license.)

If we are not willing to add code for the missing Postgres XC/XL
features, Postgres XC/XL will probably remain a separate fork of
Postgres. I don't think anyone knows the answer to this question, and I
don't know how to find the answer except to keep going with our current
FDW sharding approach.

I have nothing against particular FDW advances. However, it's unclear to
me that FDW should be the only sharding approach.
It's unproven that FDWs can do the work that Postgres XC/XL does. With
FDWs we can pick some low-hanging fruit. That's good.
But it's unclear that we can reach the high-hanging fruit (like data
redistribution) with the FDW approach. And even if we can, it's unclear
that it would be easier than with other approaches.
Just let's not call this the community's chosen plan for implementing
sharding. Until we have the full picture, we can't select one way and
reject the others.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#7 Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Bruce Momjian (#1)
Re: The plan for FDW-based sharding

Sorry, but from this plan one might conclude that there are only two
possible cluster solutions for Postgres: XC/XL and FDW-based. From my
point of view there are many more possible alternatives.
Our main idea with XTM (the eXtensible Transaction Manager API) was to
make it possible to develop cluster solutions for Postgres as extensions,
without patching the Postgres core. And FDW is one of the mechanisms
which makes it possible to reach this goal.

IMHO it will be hard to implement efficient execution of complex OLAP
queries (including cross-node joins and aggregation) within the FDW
paradigm. It will be necessary to build a distributed query execution
plan and coordinate its execution across the cluster nodes. And we
definitely need a specialized optimizer for distributed queries. Right
now solutions to this problem are provided by XL and Greenplum, but both
are forks of Postgres with a lot of changes in the Postgres core. The
challenge is to provide similar functionality, but at the extension level
(using custom nodes, a pluggable transaction manager, ...).

But, as you noticed, complex OLAP is just one scenario and not the only
possible way of using clusters. In some cases FDW-based sharding can be
quite efficient. Or take the pg_shard approach, which also adds sharding
at the extension level and in some respects is more flexible than an
FDW-based solution. Not all scenarios require a global transaction
manager. But if one needs global consistency, the XTM API makes it
possible to provide ACID for both approaches (and not only for them).

We have now added our XTM patch to the commitfest, together with a
postgres_fdw patch integrating a timestamp-based DTM implementation in
postgres_fdw. It illustrates how global consistency can be reached for
FDW-based sharding.
If this XTM patch is committed, then in 9.6 we will have wide flexibility
to play with different distributed transaction managers, and it can be
used for many cluster solutions.

IMHO it would be very useful to extend your classification of cluster use
cases, formulate the requirements of each case more precisely, and
investigate how they can be covered by existing cluster solutions for
Postgres and which niches are still vacant. We are currently continuing
work on "multimaster" - a more convenient alternative to hot-standby
replication. It looks like PostgreSQL is missing a product providing
functionality similar to Oracle RAC or MySQL Galera. It is yet another
direction of cluster development for PostgreSQL. Let's be more open and
flexible.

On 23.02.2016 19:43, Bruce Momjian wrote:

There was discussion at the FOSDEM/PGDay Developer Meeting
(https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting)
about sharding so I wanted to outline where I think we are going with
sharding and FDWs.

First, let me point out that, unlike pg_upgrade and the Windows port,
which either worked or didn't work, sharding is going to be implemented and
useful in stages. It will take several years to complete, similar to
parallelism, streaming replication, and logical replication.

Second, as part of this staged implementation, there are several use
cases that will be shardable at first, and then only later, more complex
ones. For example, here are some use cases and the technology they
require:

1. Cross-node read-only queries on read-only shards using aggregate
queries, e.g. data warehouse:

This is the simplest to implement as it doesn't require a global
transaction manager, global snapshot manager, and the number of rows
returned from the shards is minimal because of the aggregates.

2. Cross-node read-only queries on read-only shards using non-aggregate
queries:

This will stress the coordinator to collect and process many returned
rows, and will show how well the FDW transfer mechanism scales.

3. Cross-node read-only queries on read/write shards:

This will require a global snapshot manager to make sure the shards
return consistent data.

4. Cross-node read-write queries:

This will require a global snapshot manager and global snapshot manager.

In 9.6, we will have FDW join and sort pushdown
(http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-scalability.html).
Unfortunately I don't think we will have aggregate
pushdown, so we can't test #1, but we might be able to test #2, even in
9.5. Also, we might have better partitioning syntax in 9.6.

We need things like parallel partition access and replicated lookup
tables for more join pushdown.

In a way, because these enhancements are useful independent of sharding,
we have not tested to see how well an FDW sharding setup will work and
for which workloads.

We know Postgres XC/XL works, and scales, but we also know they require
too many code changes to be merged into Postgres (at least based on
previous discussions). The FDW sharding approach is to enhance the
existing features of Postgres to allow as much sharding as possible.

Once that is done, we can see what workloads it covers and
decide if we are willing to copy the volume of code necessary
to implement all supported Postgres XC or XL workloads.
(The Postgres XL license now matches the Postgres license,
http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
Postgres XC has always used the Postgres license.)

If we are not willing to add code for the missing Postgres XC/XL
features, Postgres XC/XL will probably remain a separate fork of
Postgres. I don't think anyone knows the answer to this question, and I
don't know how to find the answer except to keep going with our current
FDW sharding approach.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


#8 Oleg Bartunov
oleg@sai.msu.su
In reply to: Alexander Korotkov (#6)
Re: The plan for FDW-based sharding

On Wed, Feb 24, 2016 at 12:17 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:

Hi, Bruce!

The important point for me is to distinguish different kind of plans:
implementation plan and research plan.
If we're talking about implementation plan then it should be proven that
proposed approach works in this case. I.e research should be already done.
If we're talking about research plan then we should realize that result is
unpredictable. And we would probably need to dramatically change our way.

This two things would work with FDW:
1) Pull data from data nodes to coordinator.
2) Pushdown computations from coordinator to data nodes: joins, aggregates
etc.
It's proven and clear. This is good.
Another point is that these FDW advances are useful by themselves. This is
good too.

However, the model of FDW assumes that communication happen only between
coordinator and data node. But full-weight distributed optimized can't be
done under this restriction, because it requires every node to communicate
every other node if it makes distributed query faster. And as I get, FDW
approach currently have no research and no particular plan for that.

As I get from Robert Haas's talk (
https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxyb2JlcnRtaGFhc3xneDo1ZmFhYzBhNjNhNzVhMDM0
)

Before we consider repartitioning joins, we should probably get
everything previously discussed working first.
– Join Pushdown For Parallelism, FDWs
– PartialAggregate/FinalizeAggregate
– Aggregate Pushdown For Parallelism, FDWs
– Declarative Partitioning
– Parallel-Aware Append

So, as I get we didn't ever think about possibility of data redistribution
using FDW. Probably, something changed since that time. But I haven't heard
about it.

On Tue, Feb 23, 2016 at 7:43 PM, Bruce Momjian <bruce@momjian.us> wrote:

Second, as part of this staged implementation, there are several use
cases that will be shardable at first, and then only later, more complex
ones. For example, here are some use cases and the technology they
require:

1. Cross-node read-only queries on read-only shards using aggregate
queries, e.g. data warehouse:

This is the simplest to implement as it doesn't require a global
transaction manager, global snapshot manager, and the number of rows
returned from the shards is minimal because of the aggregates.

2. Cross-node read-only queries on read-only shards using non-aggregate
queries:

This will stress the coordinator to collect and process many returned
rows, and will show how well the FDW transfer mechanism scales.

FDW would work for queries which fits pull-pushdown model. I see no plan
to make other queries work.

3. Cross-node read-only queries on read/write shards:

This will require a global snapshot manager to make sure the shards
return consistent data.

4. Cross-node read-write queries:

This will require a global snapshot manager and global snapshot manager.

At this point, it is unclear why you don't refer to the work done in the
direction of a distributed transaction manager (which is also a distributed
snapshot manager in your terminology):
/messages/by-id/56BB7880.4020604@postgrespro.ru

In 9.6, we will have FDW join and sort pushdown
(http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-scalability.html).
Unfortunately I don't think we will have aggregate
pushdown, so we can't test #1, but we might be able to test #2, even in
9.5. Also, we might have better partitioning syntax in 9.6.

We need things like parallel partition access and replicated lookup
tables for more join pushdown.

In a way, because these enhancements are useful independent of sharding,
we have not tested to see how well an FDW sharding setup will work and
for which workloads.

This is the point where I agree. I'm not objecting to any single FDW
advance, because each is useful by itself.

We know Postgres XC/XL works, and scales, but we also know they require

too many code changes to be merged into Postgres (at least based on
previous discussions). The FDW sharding approach is to enhance the
existing features of Postgres to allow as much sharding as possible.

This comparison doesn't seem correct to me. Postgres XC/XL support data
redistribution between nodes, and I haven't heard a single idea for
supporting this with FDWs. You are comparing unequal things.

Once that is done, we can see what workloads it covers and
decide if we are willing to copy the volume of code necessary
to implement all supported Postgres XC or XL workloads.
(The Postgres XL license now matches the Postgres license,
http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
Postgres XC has always used the Postgres license.)

If we are not willing to add code for the missing Postgres XC/XL
features, Postgres XC/XL will probably remain a separate fork of
Postgres. I don't think anyone knows the answer to this question, and I
don't know how to find the answer except to keep going with our current
FDW sharding approach.

I have nothing against particular FDW advances. However, it's unclear to
me that FDW should be the only sharding approach.
It's unproven that FDWs can do the work that Postgres XC/XL does. With FDWs
we can pick some low-hanging fruit. That's good.
But it's unclear whether we can reach the high-hanging fruit (like data
redistribution) with the FDW approach. And if we can, it's unclear that it
would be easier than with other approaches.
Just let's not call this the community-chosen plan for implementing sharding.
Until we have the full picture, we can't select one way and reject the others.

I have already pointed out several times that we need XTM to be able to
continue development in different directions, since there is no clear winner.
Moreover, I think there is no one-size-fits-all solution, and while I agree
we need one built into core, other approaches should have the ability to
exist without patching.


------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#9Bruce Momjian
bruce@momjian.us
In reply to: Alexander Korotkov (#6)
Re: The plan for FDW-based sharding

On Wed, Feb 24, 2016 at 12:17:28PM +0300, Alexander Korotkov wrote:

Hi, Bruce!

The important point for me is to distinguish different kinds of plans:
an implementation plan and a research plan.
If we're talking about an implementation plan, then it should be proven that
the proposed approach works in this case, i.e., the research should already be done.
If we're talking about a research plan, then we should realize that the result
is unpredictable, and we would probably need to dramatically change course.

Yes, good point. I would say FDW-based sharding is certainly still a
research approach, but an odd one because we are adding code even while
in research mode. I think that is possible because the FDW improvements
have other uses beyond sharding.

I think another aspect is that we already know that modifying the
Postgres source code can produce a useful sharding solution --- XC, XL,
Greenplum, and CitusDB all prove that, and pg_shard does it as a plugin.
So, we know that with unlimited code changes, it is possible. What we
don't know is whether it is possible with acceptable code changes, and
how much of the feature-set can be supported this way.

We had a similar case with the Windows port, where SRA (my employer at
the time) and Nusphere both had native Windows ports of Postgres, and
they supplied source code to help with the port. So, in that case also,
we knew a native Windows port was possible, and we (or at least I) could
see the code that was required to do it. The big question was whether a
native Windows port could be added in a community-acceptable way, and
the community agreed we could try if we didn't make the code messier ---
that was a success.

For pg_upgrade, I had code from EDB (my employer at the time) that kind
of worked, but needed lots of polish, and again, I could do it in
contrib as long as I didn't mess up the backend code --- that worked
well too.

So, I guess I am saying, the FDW/sharding thing is a research project,
but one that is implementing code because of existing proven solutions
and because the improvements are benefiting other use-cases beyond
sharding.

Also, in the big picture, the existence of many Postgres forks, all
doing sharding, indicates that there is demand for this capability, and
if we can get some of this capability into Postgres we will increase the
number of people using native Postgres. We might also be able to reduce
the amount of duplicate work being done in all these forks and allow
them to more easily focus on more advanced use-cases.

These two things would work with FDWs:
1) Pulling data from the data nodes to the coordinator.
2) Pushing down computations from the coordinator to the data nodes: joins, aggregates, etc.
It's proven and clear. This is good.
Another point is that these FDW advances are useful by themselves. This is good
too.

However, the FDW model assumes that communication happens only between the
coordinator and the data nodes. But a full-weight distributed optimizer can't
be built under this restriction, because it requires every node to communicate
with every other node if that makes the distributed query faster. And as far
as I can tell, the FDW approach currently has no research and no particular
plan for that.

This is very true. I imagine cross-node connections will certainly
complicate the implementation and lead to significant code changes,
which might be unacceptable. I think we need to go with a
non-cross-node implementation first, then if that is accepted, we can
start to think what cross-node code changes would look like. It
certainly would require FDW knowledge to exist on every shard. Some
have suggested that FDWs wouldn't work well for cross-node connections
or wouldn't scale and we shouldn't be using them --- I am not sure what
to think of that.
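To make the coordinator/data-node model concrete, here is a minimal sketch of how a shard could be wired up with postgres_fdw on a 9.5/9.6-era coordinator, using table inheritance plus declared CHECK constraints in place of the not-yet-available declarative partitioning. All hostnames, credentials, and table definitions are illustrative, not part of any proposal:

```sql
-- On the coordinator: one foreign server per shard (names hypothetical).
CREATE EXTENSION postgres_fdw;

CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.example.com', dbname 'app');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
    OPTIONS (user 'app', password 'secret');

-- Parent table on the coordinator; each shard is a foreign child table.
CREATE TABLE measurements (id bigint, region text, val numeric);

CREATE FOREIGN TABLE measurements_shard1 ()
    INHERITS (measurements)
    SERVER shard1 OPTIONS (table_name 'measurements');

-- A declared (unenforced) constraint lets the planner exclude this
-- shard for queries whose quals contradict it.
ALTER TABLE measurements_shard1
    ADD CONSTRAINT shard1_region CHECK (region = 'eu');

-- Restriction quals are pushed down to each remote server; matching
-- rows are pulled back to the coordinator and combined via Append.
SELECT * FROM measurements WHERE val > 100;
```

Note that in this model all traffic flows through the coordinator, which is exactly the restriction discussed above: no shard-to-shard communication, and hence no data redistribution.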

As I understand from Robert Haas's talk (https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxyb2JlcnRtaGFhc3xneDo1ZmFhYzBhNjNhNzVhMDM0)

Before we consider repartitioning joins, we should probably get everything
previously discussed working first.
– Join Pushdown For Parallelism, FDWs
– PartialAggregate/FinalizeAggregate
– Aggregate Pushdown For Parallelism, FDWs
– Declarative Partitioning
– Parallel-Aware Append

So, as I understand it, we never really thought about the possibility of data
redistribution using FDWs. Perhaps something has changed since then, but I
haven't heard about it.

No, you didn't miss it. :-( We just haven't gotten to studying that
yet. One possible outcome is that built-in Postgres has non-cross-node
sharding, and forks of Postgres have cross-node sharding, again assuming
cross-node sharding requires an unacceptable amount of code change. I
don't think anyone knows the answer yet.

On Tue, Feb 23, 2016 at 7:43 PM, Bruce Momjian <bruce@momjian.us> wrote:

Second, as part of this staged implementation, there are several use
cases that will be shardable at first, and then only later, more complex
ones.  For example, here are some use cases and the technology they
require:

1. Cross-node read-only queries on read-only shards using aggregate
queries, e.g. data warehouse:

This is the simplest to implement, as it doesn't require a global
transaction manager or global snapshot manager, and the number of rows
returned from the shards is minimal because of the aggregates.

2. Cross-node read-only queries on read-only shards using non-aggregate
queries:

This will stress the coordinator to collect and process many returned
rows, and will show how well the FDW transfer mechanism scales.

FDWs would work for queries which fit the pull/pushdown model. I see no plan to
make other queries work.

Yep, see above.
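The distinction between use cases #1 and #2 above comes down to how many rows must cross the wire to the coordinator. A hedged illustration (the table is hypothetical, and the plan shapes are simplified):

```sql
-- Use case 2: non-aggregate query. Every matching row is shipped from
-- every shard to the coordinator, stressing the FDW transfer mechanism.
SELECT id, val FROM measurements WHERE val > 100;

-- Use case 1: aggregate query. Without aggregate pushdown the raw rows
-- are still pulled and aggregated on the coordinator; with pushdown,
-- each shard could instead return a single partial row, e.g.
--   SELECT count(*), sum(val) FROM measurements;   -- run per shard
-- and the coordinator would only combine the partials
-- (total count = sum of counts, total sum = sum of sums).
SELECT count(*), sum(val) FROM measurements;
```

This is why the aggregate case needs neither a scalable transfer mechanism nor much coordinator memory, while the non-aggregate case exercises both.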

3. Cross-node read-only queries on read/write shards:

This will require a global snapshot manager to make sure the shards
return consistent data.

4. Cross-node read-write queries:

This will require a global transaction manager and a global snapshot manager.

At this point, it is unclear why you don't refer to the work done in the
direction of a distributed transaction manager (which is also a distributed
snapshot manager in your terminology):
/messages/by-id/56BB7880.4020604@postgrespro.ru

Yes, there is certainly great work being done on that. I should have
included a URL for that --- glad you did. I wasn't aware it also was a
distributed snapshot manager. :-) And again, as you said earlier, it
is useful for more things than just FDW sharding.

In 9.6, we will have FDW join and sort pushdown
(http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-scalability.html). Unfortunately I don't think we will have aggregate
pushdown, so we can't test #1, but we might be able to test #2, even in
9.5.  Also, we might have better partitioning syntax in 9.6.

We need things like parallel partition access and replicated lookup
tables for more join pushdown.

In a way, because these enhancements are useful independent of sharding,
we have not tested to see how well an FDW sharding setup will work and
for which workloads.

 
This is the point where I agree. I'm not objecting to any single FDW advance,
because each is useful by itself.

We know Postgres XC/XL works, and scales, but we also know they require
too many code changes to be merged into Postgres (at least based on
previous discussions).  The FDW sharding approach is to enhance the
existing features of Postgres to allow as much sharding as possible.

This comparison doesn't seem correct to me. Postgres XC/XL support data
redistribution between nodes, and I haven't heard a single idea for supporting
this with FDWs. You are comparing unequal things.

Well, as far as I know XC does support data redistribution between
nodes, and I saw good benchmarks of that, as well as of XL. We didn't merge
in the XC code, so I assume the XL implementation of non-cross-node
sharding also would be too much code to digest, which is why we are
trying FDW sharding. As I said, we will see how much of the Postgres
XC/XL workload can be accomplished with FDWs.

Once that is done, we can see what workloads it covers and
decide if we are willing to copy the volume of code necessary
to implement all supported Postgres XC or XL workloads.
(The Postgres XL license now matches the Postgres license,
http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
Postgres XC has always used the Postgres license.)

If we are not willing to add code for the missing Postgres XC/XL
features, Postgres XC/XL will probably remain a separate fork of
Postgres.  I don't think anyone knows the answer to this question, and I
don't know how to find the answer except to keep going with our current
FDW sharding approach.

I have nothing against particular FDW advances. However, it's unclear to me
that FDW should be the only sharding approach.
It's unproven that FDWs can do the work that Postgres XC/XL does. With FDWs we
can pick some low-hanging fruit. That's good.
But it's unclear whether we can reach the high-hanging fruit (like data
redistribution) with the FDW approach. And if we can, it's unclear that it
would be easier than with other approaches.
Just let's not call this the community-chosen plan for implementing sharding.
Until we have the full picture, we can't select one way and reject the others.

I agree. I think the FDW approach is the only existing approach for
built-in sharding, though. The forks of Postgres doing sharding are
just that: forks, and Postgres community ecosystem projects. (Yes,
they are open source.) If the forks were community-chosen plans we
hopefully would not have 5+ of them. If FDW works, it has the potential
to be the community-chosen plan, at least for the workloads it supports,
because it is built into community Postgres in a way the others cannot.

That doesn't mean the forks go away, but rather their value is in doing
things the FDW approach can't, but there are a lot of "if's" in there.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Bruce Momjian
bruce@momjian.us
In reply to: Oleg Bartunov (#8)
Re: The plan for FDW-based sharding

On Wed, Feb 24, 2016 at 12:35:15PM +0300, Oleg Bartunov wrote:

I have nothing against particular FDW advances. However, it's unclear to
me that FDW should be the only sharding approach.
It's unproven that FDWs can do the work that Postgres XC/XL does. With FDWs we
can pick some low-hanging fruit. That's good.
But it's unclear whether we can reach the high-hanging fruit (like data
redistribution) with the FDW approach. And if we can, it's unclear that it
would be easier than with other approaches.
Just let's not call this the community-chosen plan for implementing sharding.
Until we have the full picture, we can't select one way and reject the others.

I have already pointed out several times that we need XTM to be able to continue
development in different directions, since there is no clear winner. Moreover,
I think there is no one-size-fits-all solution, and while I agree we need one
built into core, other approaches should have the ability to exist without patching.

Yep. I think much of what we eventually add to core will either be
copied from an existing solution, which then doesn't need to be
maintained anymore, or used by existing solutions.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +


#11Bruce Momjian
bruce@momjian.us
In reply to: Konstantin Knizhnik (#7)
Re: The plan for FDW-based sharding

On Wed, Feb 24, 2016 at 12:22:20PM +0300, Konstantin Knizhnik wrote:

Sorry, but based on this plan it is possible to conclude
that there are only two possible cluster solutions for Postgres:
XC/XL and FDW-based. From my point of view there are many more
possible alternatives.
Our main idea with XTM (eXtensible Transaction Manager API) was to
make it possible to develop cluster solutions for Postgres as
extensions, without patching the Postgres core code. And FDW is one of
the mechanisms which make it possible to reach this goal.

Yes, this is a good example of code reuse.

IMHO it will be hard to implement efficient execution of complex
OLAP queries (including cross-node joins and aggregation) within
the FDW paradigm. It will be necessary to build a distributed query
execution plan and coordinate its execution across the cluster nodes.
And we definitely need a specialized optimizer for distributed queries.
Right now solutions to this problem are provided by XL and Greenplum,
but both are forks of Postgres with a lot of changes in the Postgres
core. The challenge is to provide similar functionality, but at the
extension level (using custom nodes, a pluggable transaction manager,
...).

Agreed.

But, as you noticed, complex OLAP is just one of the scenarios, and
this is not the only possible way of using clusters. In some cases
FDW-based sharding can be quite efficient. Or the pg_shard approach,
which also adds sharding at the extension level and in some aspects is
more flexible than an FDW-based solution. Not all scenarios require a
global transaction manager. But if one needs global consistency, then
the XTM API allows providing ACID for both approaches (and not only for
them).

Yep.

We have currently added to the commitfest our XTM patch together with
a postgres_fdw patch integrating a timestamp-based DTM implementation into
postgres_fdw. It illustrates how global consistency can be reached
for FDW-based sharding.
If this XTM patch is committed, then in 9.6 we will have wide
flexibility to play with different distributed transaction managers.
And it can be used for many cluster solutions.

IMHO it would be very useful to extend your classification of cluster
use cases, more precisely formulate the demands in each case,
investigate how they can be covered by existing cluster solutions
for Postgres, and see which niches are still vacant. We are currently
continuing work on "multimaster" --- a more convenient alternative to
hot-standby replication. It looks like PostgreSQL is missing a
product providing functionality similar to Oracle RAC or MySQL
Galera. It is yet another direction of cluster development for
PostgreSQL. Let's be more open and flexible.

Yes, I listed only the workloads I could think of. It would be helpful
to list more workloads and start to decide what can be accomplished with
each approach. I don't even know all the workloads supported by the
sharding forks of Postgres.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +


#12Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#9)
Re: The plan for FDW-based sharding

On Wed, Feb 24, 2016 at 09:34:37AM -0500, Bruce Momjian wrote:

I have nothing against particular FDW advances. However, it's unclear to me
that FDW should be the only sharding approach.
It's unproven that FDWs can do the work that Postgres XC/XL does. With FDWs we
can pick some low-hanging fruit. That's good.
But it's unclear whether we can reach the high-hanging fruit (like data
redistribution) with the FDW approach. And if we can, it's unclear that it
would be easier than with other approaches.
Just let's not call this the community-chosen plan for implementing sharding.
Until we have the full picture, we can't select one way and reject the others.

I agree. I think the FDW approach is the only existing approach for
built-in sharding, though. The forks of Postgres doing sharding are
just that: forks, and Postgres community ecosystem projects. (Yes,
they are open source.) If the forks were community-chosen plans we
hopefully would not have 5+ of them. If FDW works, it has the potential
to be the community-chosen plan, at least for the workloads it supports,
because it is built into community Postgres in a way the others cannot.

That doesn't mean the forks go away, but rather their value is in doing
things the FDW approach can't, but there are a lot of "if's" in there.

Actually, this seems similar to how we handled replication. For years
we had multiple external replication solutions. When we implemented
streaming replication, we knew it would become the default for workloads
it supports. The external solutions didn't go away, but their value was
in handling workloads that streaming replication didn't support.

I think the only difference is that we knew streaming replication would
have this effect before we implemented it, while with FDW-based
sharding, we don't know.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +


#13Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#5)
Re: The plan for FDW-based sharding

Bruce Momjian wrote:

On Wed, Feb 24, 2016 at 01:08:29AM +0000, Simon Riggs wrote:

It's never been our policy to try to include major projects in single code
drops. Any move of XL/XC code into PostgreSQL core would need to be done piece
by piece across many releases. XL is definitely too big for the elephant to eat
in one mouthful.

Is there any plan to move the XL/XC code into Postgres? If so, I have
not heard of it. I thought everyone agreed it was too much code change,
which is why it is a separate code tree. Is that incorrect?

Yes, I think that's incorrect.

What was said, as I understood it, is that Postgres-XL is too big to
merge in a single commit -- just like merging BDR would have been.
Indulge me while I make a parallel with BDR for a bit.
2ndQuadrant never pushed for merging BDR in a single commit; what was
done was to split it, and propose individual pieces for commit. Many of
these pieces are now already committed (event triggers, background
workers, logical decoding, replication slots, and many others). The
"BDR patch" is now much smaller, and it's quite possible that we will
see it merged someday. Will it be different from what it was when the
BDR project started, all those years ago? You bet. Having the
prototype BDR initially was what allowed the whole plan to make sense,
because it showed that the pieces interacted in the right ways to make
it work as a whole.

(I'm not saying 2ndQuadrant is so wise to do things this way. I'm
pretty sure you can see the same thing in parallel query development,
for instance.)

In the same way, Postgres-XL is far too big to merge in a single commit.
But that doesn't mean it will never be merged. What is more likely to
happen instead is that some pieces of it are going to be submitted
separately for consideration. It is a slow process, but progress is
real and tangible. We know this process will yield a useful outcome,
because the architecture has already been proven by the existence of
Postgres-XL itself. It's the prototype that proves the overall design,
even if the pieces change shape during the process. (Really, it's way
more than merely a prototype at this point because of how long it has
matured.)

In contrast, we don't have a prototype for FDW-based sharding; as you
admitted, there is no actual plan, other than "let's push FDWs in this
direction and hope that sharding will emerge". We don't really know
what pieces we need or how they will interact with each other; we have a
vague idea of a direction but there's no clear path forward. As the
saying goes, if you don't know where you're going, you will probably end
up somewhere else.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#14Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#13)
Re: The plan for FDW-based sharding

On Wed, Feb 24, 2016 at 01:02:21PM -0300, Alvaro Herrera wrote:

Bruce Momjian wrote:

On Wed, Feb 24, 2016 at 01:08:29AM +0000, Simon Riggs wrote:

It's never been our policy to try to include major projects in single code
drops. Any move of XL/XC code into PostgreSQL core would need to be done piece
by piece across many releases. XL is definitely too big for the elephant to eat
in one mouthful.

Is there any plan to move the XL/XC code into Postgres? If so, I have
not heard of it. I thought everyone agreed it was too much code change,
which is why it is a separate code tree. Is that incorrect?

Yes, I think that's incorrect.

What was said, as I understood it, is that Postgres-XL is too big to
merge in a single commit -- just like merging BDR would have been.
Indulge me while I make a parallel with BDR for a bit.
2ndQuadrant never pushed for merging BDR in a single commit; what was
done was to split it, and propose individual pieces for commit. Many of
these pieces are now already committed (event triggers, background
workers, logical decoding, replication slots, and many others). The
"BDR patch" is now much smaller, and it's quite possible that we will
see it merged someday. Will it be different from what it was when the
BDR project started, all those years ago? You bet. Having the
prototype BDR initially was what allowed the whole plan to make sense,
because it showed that the pieces interacted in the right ways to make
it work as a whole.

Yes, that is my understanding too.

(I'm not saying 2ndQuadrant is so wise to do things this way. I'm
pretty sure you can see the same thing in parallel query development,
for instance.)

In the same way, Postgres-XL is far too big to merge in a single commit.
But that doesn't mean it will never be merged. What is more likely to
happen instead is that some pieces of it are going to be submitted
separately for consideration. It is a slow process, but progress is
real and tangible. We know this process will yield a useful outcome,

I was not aware there was any process to merge XC/XL into Postgres, at
least from the XC/XL side. I know there is desire to take code from
XC/XL on the FDW-sharding side.

I think the most conservative merge approach is to try to enhance
existing Postgres features first (FDWs, partitioning, parallelism),
perhaps features that didn't exist at the time XC/XL were designed. If
they work, keep them and add the XC/XL-specific parts. If the
enhance-features approach doesn't work, we then have to consider how
much additional code will be needed. We have to evaluate this for the
FDW-based approach too, but it is likely to be smaller, which is its
attraction.

because the architecture has already been proven by the existence of
Postgres-XL itself. It's the prototype that proves the overall design,
even if the pieces change shape during the process. (Really, it's way
more than merely a prototype at this point because of how long it has
matured.)

True, it is beyond a prototype.

In contrast, we don't have a prototype for FDW-based sharding; as you
admitted, there is no actual plan, other than "let's push FDWs in this
direction and hope that sharding will emerge". We don't really know
what pieces we need or how they will interact with each other; we have a
vague idea of a direction but there's no clear path forward. As the
saying goes, if you don't know where you're going, you will probably end
up somewhere else.

I think I have covered that already.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +


#15Michael Paquier
michael@paquier.xyz
In reply to: Bruce Momjian (#9)
Re: The plan for FDW-based sharding

On Wed, Feb 24, 2016 at 11:34 PM, Bruce Momjian <bruce@momjian.us> wrote:

On Wed, Feb 24, 2016 at 12:17:28PM +0300, Alexander Korotkov wrote:

Hi, Bruce!

The important point for me is to distinguish different kinds of plans:
an implementation plan and a research plan.
If we're talking about an implementation plan, then it should be proven that
the proposed approach works in this case, i.e., the research should already be done.
If we're talking about a research plan, then we should realize that the result
is unpredictable, and we would probably need to dramatically change course.

Yes, good point. I would say FDW-based sharding is certainly still a
research approach, but an odd one because we are adding code even while
in research mode. I think that is possible because the FDW improvements
have other uses beyond sharding.

I think another aspect is that we already know that modifying the
Postgres source code can produce a useful sharding solution --- XC, XL,
Greenplum, and CitusDB all prove that, and pg_shard does it as a plugin.
So, we know that with unlimited code changes, it is possible. What we
don't know is whether it is possible with acceptable code changes, and
how much of the feature-set can be supported this way.

We had a similar case with the Windows port, where SRA (my employer at
the time) and Nusphere both had native Windows ports of Postgres, and
they supplied source code to help with the port. So, in that case also,
we knew a native Windows port was possible, and we (or at least I) could
see the code that was required to do it. The big question was whether a
native Windows port could be added in a community-acceptable way, and
the community agreed we could try if we didn't make the code messier ---
that was a success.

For pg_upgrade, I had code from EDB (my employer at the time) that kind
of worked, but needed lots of polish, and again, I could do it in
contrib as long as I didn't mess up the backend code --- that worked
well too.

So, I guess I am saying, the FDW/sharding thing is a research project,
but one that is implementing code because of existing proven solutions
and because the improvements are benefiting other use-cases beyond
sharding.

Also, in the big picture, the existence of many Postgres forks, all
doing sharding, indicates that there is demand for this capability, and
if we can get some of this capability into Postgres we will increase the
number of people using native Postgres. We might also be able to reduce
the amount of duplicate work being done in all these forks and allow
them to more easily focus on more advanced use-cases.

These two things would work with FDWs:
1) Pulling data from the data nodes to the coordinator.
2) Pushing down computations from the coordinator to the data nodes: joins, aggregates, etc.
It's proven and clear. This is good.
Another point is that these FDW advances are useful by themselves. This is good
too.

However, the FDW model assumes that communication happens only between the
coordinator and the data nodes. But a full-weight distributed optimizer can't
be built under this restriction, because it requires every node to communicate
with every other node if that makes the distributed query faster. And as far
as I can tell, the FDW approach currently has no research and no particular
plan for that.

This is very true. I imagine cross-node connections will certainly
complicate the implementation and lead to significant code changes,
which might be unacceptable. I think we need to go with a
non-cross-node implementation first, then if that is accepted, we can
start to think what cross-node code changes would look like. It
certainly would require FDW knowledge to exist on every shard. Some
have suggested that FDWs wouldn't work well for cross-node connections
or wouldn't scale and we shouldn't be using them --- I am not sure what
to think of that.

As I understand from Robert Haas's talk (https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxyb2JlcnRtaGFhc3xneDo1ZmFhYzBhNjNhNzVhMDM0)

Before we consider repartitioning joins, we should probably get everything
previously discussed working first.
– Join Pushdown For Parallelism, FDWs
– PartialAggregate/FinalizeAggregate
– Aggregate Pushdown For Parallelism, FDWs
– Declarative Partitioning
– Parallel-Aware Append

So, as I understand it, we never really thought about the possibility of data
redistribution using FDWs. Perhaps something has changed since then, but I
haven't heard about it.

No, you didn't miss it. :-( We just haven't gotten to studying that
yet. One possible outcome is that built-in Postgres has non-cross-node
sharding, and forks of Postgres have cross-node sharding, again assuming
cross-node sharding requires an unacceptable amount of code change. I
don't think anyone knows the answer yet.

On Tue, Feb 23, 2016 at 7:43 PM, Bruce Momjian <bruce@momjian.us> wrote:

Second, as part of this staged implementation, there are several use
cases that will be shardable at first, and then only later, more complex
ones. For example, here are some use cases and the technology they
require:

1. Cross-node read-only queries on read-only shards using aggregate
queries, e.g. data warehouse:

This is the simplest to implement as it doesn't require a global
transaction manager, global snapshot manager, and the number of rows
returned from the shards is minimal because of the aggregates.

2. Cross-node read-only queries on read-only shards using non-aggregate
queries:

This will stress the coordinator to collect and process many returned
rows, and will show how well the FDW transfer mechanism scales.

FDWs would work for queries that fit the pull/pushdown model. I see no
plan to make other queries work.

Yep, see above.

3. Cross-node read-only queries on read/write shards:

This will require a global snapshot manager to make sure the shards
return consistent data.

4. Cross-node read-write queries:

This will require a global transaction manager and a global snapshot manager.

At this point, it is unclear why you don't refer to the work done in the
direction of a distributed transaction manager (which is also a
distributed snapshot manager in your terminology):
/messages/by-id/56BB7880.4020604@postgrespro.ru

Yes, there is certainly great work being done on that. I should have
included a URL for that --- glad you did. I wasn't aware it also was a
distributed snapshot manager. :-) And again, as you said earlier, it
is useful for more things than just FDW sharding.
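The consistency problem that a global snapshot manager solves can be
sketched with a toy versioned-row model (the mechanism and names here
are invented for illustration; this is not the Postgres Pro DTM API):

```python
# Why cross-node reads on read/write shards need a global snapshot.
# Each shard keeps versioned rows as (commit_id, value); a transfer
# of 50 from shard A to shard B commits at global id 2.
shard_a = [(1, 100), (2, 50)]    # balance history on shard A
shard_b = [(1, 100), (2, 150)]   # balance history on shard B

def read(shard, snapshot_id):
    """Return the newest value visible at snapshot_id."""
    return max(v for v in shard if v[0] <= snapshot_id)[1]

# Without coordination, each node picks its own local snapshot, so
# the coordinator can see A after the transfer but B before it:
# the 50 in flight simply vanishes from the total.
inconsistent_total = read(shard_a, 2) + read(shard_b, 1)   # 50 + 100

# With a global snapshot manager, one snapshot id is used on every
# node, so the books always balance.
consistent_total = read(shard_a, 2) + read(shard_b, 2)     # 50 + 150

print(inconsistent_total, consistent_total)  # prints "150 200"
```

This is the anomaly use cases #3 and #4 above have to rule out before
cross-node read/write queries can be trusted.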

In 9.6, we will have FDW join and sort pushdown
(http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-scalability.html).
Unfortunately I don't think we will have aggregate
pushdown, so we can't test #1, but we might be able to test #2, even in
9.5. Also, we might have better partitioning syntax in 9.6.

We need things like parallel partition access and replicated lookup
tables for more join pushdown.

In a way, because these enhancements are useful independent of sharding,
we have not tested to see how well an FDW sharding setup will work and
for which workloads.

This is the point I agree with. I'm not objecting to any single FDW
advance, because each is useful by itself.

We know Postgres XC/XL works, and scales, but we also know they require
too many code changes to be merged into Postgres (at least based on
previous discussions). The FDW sharding approach is to enhance the
existing features of Postgres to allow as much sharding as possible.

This comparison doesn't seem correct to me. Postgres XC/XL supports data
redistribution between nodes, and I haven't heard a single idea about
supporting this with FDWs. You are comparing things that are not equal.

Well, as far as I know XC doesn't support data redistribution between
nodes and I saw good benchmarks of that, as well as XL.

XC does support that in 1.2 with a very basic approach (I coded that
years ago), though it takes an exclusive lock on the table involved.
And actually I think what I did in this case really sucked: the effort
was centralized on the Coordinator to gather and then redistribute the
tuples, though at least tuples that did not need to move were not moved
at all.
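The cost of that coordinator-centralized approach can be sketched with
a toy rehashing model (the node counts and hash are invented for
illustration; this is not XC 1.2 code):

```python
# Redistributing a table from 2 nodes to 3: rows whose placement is
# unchanged stay put; the rest must move. In the centralized scheme
# Michael describes, every moving row crosses the wire twice
# (old shard -> coordinator, then coordinator -> new shard).
rows = list(range(20))                 # 20 rows identified by key

def node_for(key, n_nodes):
    return key % n_nodes               # trivial hash distribution

old_nodes, new_nodes = 2, 3

stays = [k for k in rows if node_for(k, old_nodes) == node_for(k, new_nodes)]
moves = [k for k in rows if node_for(k, old_nodes) != node_for(k, new_nodes)]

centralized_transfers = 2 * len(moves)  # via the Coordinator
direct_transfers = len(moves)           # node-to-node, which plain
                                        # FDW connections can't express

print(len(stays), len(moves))                       # prints "8 12"
print(centralized_transfers, direct_transfers)      # prints "24 12"
```

Funneling through the coordinator doubles the traffic for every moved
tuple, which is why direct node-to-node transfer keeps coming up in this
thread.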

Once that is done, we can see what workloads it covers and
decide if we are willing to copy the volume of code necessary
to implement all supported Postgres XC or XL workloads.
(The Postgres XL license now matches the Postgres license,
http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
Postgres XC has always used the Postgres license.)

Postgres-XC used the GPL license first, and moved to the PostgreSQL
license precisely so that Postgres core could reuse it later if needed.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Bruce Momjian
bruce@momjian.us
In reply to: Michael Paquier (#15)
Re: The plan for FDW-based sharding

On Thu, Feb 25, 2016 at 01:53:12PM +0900, Michael Paquier wrote:

Well, as far as I know XC doesn't support data redistribution between
nodes and I saw good benchmarks of that, as well as XL.

XC does support that in 1.2 with a very basic approach (I coded that
years ago), though it takes an exclusive lock on the table involved.
And actually I think what I did in this case really sucked: the effort
was centralized on the Coordinator to gather and then redistribute the
tuples, though at least tuples that did not need to move were not moved
at all.

Yes, there is a lot of complexity involved in sending results between
nodes.

Once that is done, we can see what workloads it covers and
decide if we are willing to copy the volume of code necessary
to implement all supported Postgres XC or XL workloads.
(The Postgres XL license now matches the Postgres license,
http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
Postgres XC has always used the Postgres license.)

Postgres-XC used the GPL license first, and moved to the PostgreSQL
license precisely so that Postgres core could reuse it later if needed.

Ah, yes, I remember that now. Thanks.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +


#17Robert Haas
robertmhaas@gmail.com
In reply to: Oleg Bartunov (#8)
Re: The plan for FDW-based sharding

On Wed, Feb 24, 2016 at 3:05 PM, Oleg Bartunov <obartunov@gmail.com> wrote:

I have already pointed out several times that we need XTM to be able to
continue development in different directions, since there is no clear
winner. Moreover, I think there is no one-size-fits-all solution, and
while I agree we need one built into core, other approaches should have
the ability to exist without patching.

I don't think I necessarily agree with that. Transaction management
is such a fundamental part of the system that I think making it
pluggable is going to be really hard. I understand that you've done
several implementations based on your proposed API, and that's good as
far as it goes, but how do we know that's really going to be general
enough for what other people might need? And what makes us think we
really need multiple transaction managers, anyway? Even writing one
good distributed transaction manager seems like a really hard project
- why would we want to write two or three or five?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#18Oleg Bartunov
oleg@sai.msu.su
In reply to: Robert Haas (#17)
Re: The plan for FDW-based sharding

On Fri, Feb 26, 2016 at 3:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Feb 24, 2016 at 3:05 PM, Oleg Bartunov <obartunov@gmail.com>
wrote:

I have already pointed out several times that we need XTM to be able to
continue development in different directions, since there is no clear
winner. Moreover, I think there is no one-size-fits-all solution, and
while I agree we need one built into core, other approaches should have
the ability to exist without patching.

I don't think I necessarily agree with that. Transaction management
is such a fundamental part of the system that I think making it
pluggable is going to be really hard. I understand that you've done
several implementations based on your proposed API, and that's good as
far as it goes, but how do we know that's really going to be general
enough for what other people might need?

Right now the TM is hardcoded, so "what other people might need" doesn't
matter at all. We at least give developers ("other people") the ability
to work on their implementations, and the patch is safe and doesn't
sacrifice anything in core.

And what makes us think we
really need multiple transaction managers, anyway?

If you are brave enough to say that one TM fits all, and you are able to
teach the existing TM to play well in various clustering environments
during the development period, which is short, then probably we don't
need multiple TMs. But that is too perfect to believe, and the practical
solution is to let multiple groups work on their solutions.

Even writing one
good distributed transaction manager seems like a really hard project
- why would we want to write two or three or five?

Again, right now it's simply impossible for any bright person to work on
DTMs. It's time to start working on a DTM, I believe. The fact that you
don't think about distributed transaction support doesn't mean there are
no "other people" who have different ideas about Postgres's future.
That's why we propose this patch; let's play the game!
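The kind of pluggability being argued about can be sketched as a hook
that core calls through, with implementations swapped in behind it. The
interface and names below are invented for illustration; the real XTM
proposal is a C-level function table, not this:

```python
# Hypothetical sketch of a pluggable transaction-manager hook.
# Core code calls through one interface; installing a different TM
# changes behavior without patching the call sites.
class LocalTM:
    """Default single-node transaction manager."""
    def begin(self):
        return "local-xid"
    def commit(self, xid):
        return f"committed {xid} locally"

class DistributedTM(LocalTM):
    """A drop-in replacement coordinating across nodes."""
    def begin(self):
        return "global-xid"
    def commit(self, xid):
        return f"2PC commit of {xid} across nodes"

def exec_transaction(tm):
    # This is the "core" path: it only knows the hook interface.
    xid = tm.begin()
    return tm.commit(xid)

print(exec_transaction(LocalTM()))        # committed local-xid locally
print(exec_transaction(DistributedTM()))  # 2PC commit of global-xid across nodes
```

Robert's objection below is precisely about freezing such an interface
too early: once the hook is in core, its shape is a long-term
commitment.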


--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19Robert Haas
robertmhaas@gmail.com
In reply to: Oleg Bartunov (#18)
Re: The plan for FDW-based sharding

On Fri, Feb 26, 2016 at 7:21 PM, Oleg Bartunov <obartunov@gmail.com> wrote:

Right now the TM is hardcoded, so "what other people might need" doesn't
matter at all. We at least give developers ("other people") the ability
to work on their implementations, and the patch is safe and doesn't
sacrifice anything in core.

I don't believe that. When we install APIs into core, we're
committing to keep those APIs around. And I think that we're far too
early in the development of transaction managers for PostgreSQL to
think that we know what APIs we want to commit to over the long term.

And what makes us think we
really need multiple transaction managers, anyway?

If you are brave enough to say that one TM fits all, and you are able to
teach the existing TM to play well in various clustering environments
during the development period, which is short, then probably we don't
need multiple TMs. But that is too perfect to believe, and the practical
solution is to let multiple groups work on their solutions.

Nobody's preventing multiple groups from working on their solutions.
That's not the question. The question is why we should install hooks
in core at this early stage without waiting to see which
implementations prove to be best and whether those hooks are actually
general enough to cater to everything people want to do. There is
talk of integrating XC/XL work into PostgreSQL; it has a GTM.
Postgres Pro has several GTMs. Maybe there will be others.

Frankly, I'd like to see a GTM in core at some point because I'd like
everybody who uses PostgreSQL to have access to a GTM. What I don't
want is for every PostgreSQL company to develop its own GTM and
distribute it separately from everybody else's. IIUC, MySQL kinda did
that with storage engines and it resulted in the fragmentation of the
community. We've had the same thing happen with replication tools -
every PostgreSQL company develops their own set. It would have been
better to have ONE set that was distributed by the core project so
that we didn't all do the same work over again.

I don't understand the argument that without these hooks in core,
people can't continue to work on this. It isn't hard to work on GTM
without any core changes at all. You just patch your copy of
PostgreSQL. We do this all the time, for every patch. We don't add
hooks for every patch.

DTMs. It's time to start working on a DTM, I believe. The fact that you
don't think about distributed transaction support doesn't mean there are
no "other people" who have different ideas about Postgres's future.
That's why we propose this patch; let's play the game!

I don't like to play games with the architecture of PostgreSQL.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#20Joshua D. Drake
jd@commandprompt.com
In reply to: Robert Haas (#19)
Re: The plan for FDW-based sharding

On 02/26/2016 08:06 AM, Robert Haas wrote:

On Fri, Feb 26, 2016 at 7:21 PM, Oleg Bartunov <obartunov@gmail.com> wrote:

Right now the TM is hardcoded, so "what other people might need" doesn't
matter at all. We at least give developers ("other people") the ability
to work on their implementations, and the patch is safe and doesn't
sacrifice anything in core.

I don't believe that. When we install APIs into core, we're
committing to keep those APIs around. And I think that we're far too
early in the development of transaction managers for PostgreSQL to
think that we know what APIs we want to commit to over the long term.

Correct.

[snip]

Frankly, I'd like to see a GTM in core at some point because I'd like
everybody who uses PostgreSQL to have access to a GTM. What I don't
want is for every PostgreSQL company to develop its own GTM and
distribute it separately from everybody else's. IIUC, MySQL kinda did
that with storage engines and it resulted in the fragmentation of the
community.

No it didn't. It allowed MySQL people to use the tool that best fit
their needs.

We've had the same thing happen with replication tools -
every PostgreSQL company develops their own set. It would have been
better to have ONE set that was distributed by the core project so
that we didn't all do the same work over again.

The reason people developed a bunch of external replication tools (and
continue to) is that .Org has shown a unique lack of leadership in
providing solutions for the problem. Historically speaking, .Org was
against replication in core. It wasn't about which tool was going to be
best; it was about which was going to be best for which problem. The
inclusion of the replication tools we have now speaks very loudly to
that lack of leadership.

The moment .Org showed leadership and developed a reasonable solution to
80% of the problem, a great majority of people moved to hot standby and
streaming replication. It is easy. It does not answer all the questions,
but it is the default, in core, and that gives people peace of mind.
This is also why, once PgLogical is up to -core quality and in -core,
the great majority of people will work to dump Slony/Londiste/Insertproghere
and use PgLogical.

If .Org were interested in showing leadership in this area, a few
hackers would get together with a few hackers from XL and XC (although,
as I understand it, XL is further along), have a few heart-to-heart,
mind-to-mind meetings, and determine:

* Is either of these two solutions worth it?
Yes? Then let's start working on an integration plan and get it done.
No? Then let's start working on a .Org plan to solve that problem.

But that likely won't happen because NIH.

I don't understand the argument that without these hooks in core,
people can't continue to work on this. It isn't hard to work on GTM
without any core changes at all. You just patch your copy of
PostgreSQL. We do this all the time, for every patch. We don't add
hooks for every patch.

dtms. It's time to start working on dtm, I believe. The fact you don't
think about distributed transactions support doesn't mean there no "other
people", who has different ideas on postgres future. That's why we propose
this patch, let's play the game !

I don't like to play games with the architecture of PostgreSQL.

Robert, this is all a game. It is a game of who wins the intellectual
prize for whatever problem, who gets the market or mind share, and who
gets to pretend they win the Oscar for the coolest design.

Sincerely,

jD

--
Command Prompt, Inc. http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.

