Set new system identifier using pg_resetxlog

Started by Petr Jelinek, almost 12 years ago, 49 messages, hackers

petr@2ndquadrant.com

almost 12 years ago

Hello,

attached is a simple patch which makes it possible to change the system
identifier of the cluster in pg_control. This is useful for
individualization of the instance that is started on top of data
directory produced by pg_basebackup - something that's helpful for
logical replication setup where you need to easily identify each node
(it's used by Bidirectional Replication for example).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Robert Haas

robertmhaas@gmail.com

almost 12 years ago

In reply to: Petr Jelinek (#1)

Re: Set new system identifier using pg_resetxlog

On Fri, Jun 13, 2014 at 8:31 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:

attached is a simple patch which makes it possible to change the system
identifier of the cluster in pg_control. This is useful for
individualization of the instance that is started on top of data directory
produced by pg_basebackup - something that's helpful for logical replication
setup where you need to easily identify each node (it's used by
Bidirectional Replication for example).

I can clearly understand the utility of being able to reset the system
ID to a new, randomly-generated system ID - but giving the user the
ability to set a particular value of their own choosing seems like a
pretty sharp tool. What is the use case for that?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Petr Jelinek

petr@2ndquadrant.com

almost 12 years ago

In reply to: Robert Haas (#2)

Re: Set new system identifier using pg_resetxlog

On 17/06/14 16:18, Robert Haas wrote:

On Fri, Jun 13, 2014 at 8:31 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:

attached is a simple patch which makes it possible to change the system
identifier of the cluster in pg_control. This is useful for
individualization of the instance that is started on top of data directory
produced by pg_basebackup - something that's helpful for logical replication
setup where you need to easily identify each node (it's used by
Bidirectional Replication for example).

I can clearly understand the utility of being able to reset the system
ID to a new, randomly-generated system ID - but giving the user the
ability to set a particular value of their own choosing seems like a
pretty sharp tool. What is the use case for that?

Let's say you want to initialize new logical replication node via
pg_basebackup and you want your replication slots to be easily
identifiable so you use your local system id as part of the slot name.

In that case you need to know the future system id of the node because
you need to register the slot before consistent point to which you
replay via streaming replication (and you can't replay anymore once you
changed the system id). Which means you need to generate your system id
in advance and be able to change it in pg_control later.
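The "generate your system id in advance" step can be sketched as follows. This is only an illustration, not part of the patch: it assumes the identifier is built the way a freshly initialized cluster builds it (seconds in the high 32 bits, then microseconds, then the low 12 bits of the process ID), and the `node_` slot-name prefix and both helper names are hypothetical.

```python
import os
import re
import time


def make_system_identifier(sec, usec, pid):
    """Pack a timestamp and PID into a 64-bit identifier.

    Assumed layout: seconds << 32 | microseconds << 12 | low 12 bits of pid.
    """
    sysid = (sec << 32) | (usec << 12) | (pid & 0xFFF)
    return sysid & 0xFFFFFFFFFFFFFFFF  # keep it within 64 bits


def generate_now():
    """Generate an identifier from the current time and this process's PID."""
    now_ns = time.time_ns()
    sec, rem_ns = divmod(now_ns, 1_000_000_000)
    return make_system_identifier(sec, rem_ns // 1000, os.getpid())


def slot_name_for(sysid, prefix="node_"):
    """Derive a slot name from the identifier (the naming scheme is hypothetical).

    Slot names may contain only lowercase letters, digits and underscores,
    and must fit in 63 characters.
    """
    name = f"{prefix}{sysid}"
    if not re.fullmatch(r"[a-z0-9_]{1,63}", name):
        raise ValueError(f"not a valid slot name: {name!r}")
    return name


if __name__ == "__main__":
    future_sysid = generate_now()
    print(future_sysid, slot_name_for(future_sysid))
```

The point is only that the value is known before the new node exists, so the slot can be registered under that name first and the same value written into pg_control afterwards.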

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Robert Haas

robertmhaas@gmail.com

almost 12 years ago

In reply to: Petr Jelinek (#3)

Re: Set new system identifier using pg_resetxlog

On Tue, Jun 17, 2014 at 10:33 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:

On 17/06/14 16:18, Robert Haas wrote:

On Fri, Jun 13, 2014 at 8:31 PM, Petr Jelinek <petr@2ndquadrant.com>
wrote:

attached is a simple patch which makes it possible to change the system
identifier of the cluster in pg_control. This is useful for
individualization of the instance that is started on top of data
directory
produced by pg_basebackup - something that's helpful for logical
replication
setup where you need to easily identify each node (it's used by
Bidirectional Replication for example).

I can clearly understand the utility of being able to reset the system
ID to a new, randomly-generated system ID - but giving the user the
ability to set a particular value of their own choosing seems like a
pretty sharp tool. What is the use case for that?

Let's say you want to initialize new logical replication node via
pg_basebackup and you want your replication slots to be easily identifiable
so you use your local system id as part of the slot name.

In that case you need to know the future system id of the node because you
need to register the slot before consistent point to which you replay via
streaming replication (and you can't replay anymore once you changed the
system id). Which means you need to generate your system id in advance and
be able to change it in pg_control later.

Hmm. I guess that makes sense.

But it seems to me that we might need to have a process discussion
here, because, while I'm all in favor of incremental feature proposals
that build towards a larger goal, it currently appears that the larger
goal toward which you are building is not something that's been
publicly discussed and debated on this list. And I really think we
need to have that conversation. Obviously, individual patches will
still need to be debated, but I feel like 2ndQuadrant is trying to
construct a castle without showing the community the floor plan. I
believe that there is relatively broad agreement that we would all
like a castle, but different people may have legitimately different
ideas about how it should be constructed. If the work arrives as a
series of disconnected pieces (user-specified system ID, event
triggers for CREATE, etc.), then everyone outside of 2ndQuadrant has
to take it on faith that those pieces are going to eventually fit
together in a way that we'll all be happy with. In some cases, that's
fine, because the feature is useful on its own merits whether it ends
up being part of the castle or not.

But in other cases, like this one, if the premise that the slot name
should match the system identifier isn't something the community wants
to accept, then taking a patch that lets people do that is probably a
bad idea, because at least one person will use it to set the system
identifier of a system to a value that enables physical replication to
take place when that is actually totally unsafe, and we don't want to
enable that for no reason. Maybe the slot name should match the
replication identifier rather than the standby system ID, for example.
There are conflicting proposals for how replication identifiers should
work, but one of those proposals limits it to 16 bits. If we're going
to have multiple identifiers floating around anyway, I'd rather have a
slot called 7 than one called 6024402925054484590. On the other hand,
maybe there's going to be a new proposal to use the database system
identifier as a replication identifier, which might be a fine idea and
which would demolish that argument.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Andres Freund

andres@anarazel.de

almost 12 years ago

In reply to: Robert Haas (#4)

Re: Set new system identifier using pg_resetxlog

On 2014-06-17 12:07:04 -0400, Robert Haas wrote:

On Tue, Jun 17, 2014 at 10:33 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:

On 17/06/14 16:18, Robert Haas wrote:

On Fri, Jun 13, 2014 at 8:31 PM, Petr Jelinek <petr@2ndquadrant.com>
wrote:

attached is a simple patch which makes it possible to change the system
identifier of the cluster in pg_control. This is useful for
individualization of the instance that is started on top of data
directory
produced by pg_basebackup - something that's helpful for logical
replication
setup where you need to easily identify each node (it's used by
Bidirectional Replication for example).

I can clearly understand the utility of being able to reset the system
ID to a new, randomly-generated system ID - but giving the user the
ability to set a particular value of their own choosing seems like a
pretty sharp tool. What is the use case for that?

I've previously hacked this up adhoc during data recovery when I needed
to make another cluster similar enough that I could replay WAL.

Another usecase is to mark a database as independent from its
origin. Imagine a database that gets sharded across several
servers. It's not uncommon to do that by initially basebackup'ing the
database to several nodes and then use them separately from
thereon. It's quite useful to actually mark them as being
distinct. Especially as several of them right now would end up with the
same timeline id...

But it seems to me that we might need to have a process discussion
here, because, while I'm all in favor of incremental feature proposals
that build towards a larger goal, it currently appears that the larger
goal toward which you are building is not something that's been
publicly discussed and debated on this list. And I really think we
need to have that conversation. Obviously, individual patches will
still need to be debated, but I feel like 2ndQuadrant is trying to
construct a castle without showing the community the floor plan. I
believe that there is relatively broad agreement that we would all
like a castle, but different people may have legitimately different
ideas about how it should be constructed. If the work arrives as a
series of disconnected pieces (user-specified system ID, event
triggers for CREATE, etc.), then everyone outside of 2ndQuadrant has
to take it on faith that those pieces are going to eventually fit
together in a way that we'll all be happy with. In some cases, that's
fine, because the feature is useful on its own merits whether it ends
up being part of the castle or not.

Uh. Right now this patch has been written because it's needed for an
out-of-core replication solution. That's what BDR is at this point. The
patch is unobtrusive, has other use cases than just our internal one and
doesn't make pg_resetxlog even more dangerous than it already is. I
don't see much problem with considering it on its own cost/benefit?

So this seems to be a concern that's relatively independent of this
patch. Am I seeing that right?

I think one very important point here is that BDR is *not* the proposed
in-core solution. I think a reasonable community perspective - besides
it also being useful on its own - is to view it as a *prototype* for an
in-core solution. And e.g. logical decoding would have looked much worse -
and likely not have been integrated - without already being used
externally for BDR.

I'm not sure how we can ease or even resolve your concerns when talking
about pretty independent and general pieces of functionality like the
DDL event trigger stuff. We needed to actually *write* those to see what
BDR will look like. And the community's feedback heavily influenced what
BDR looks like by accepting some pieces, demanding others, and outright
rejecting the remainder.

I think there are some pieces that need to be considered on their own
merit. Logical decoding is useful on its own. The ability for out-of-core
systems to do DDL replication is another piece (that you referred
to above).
I think the likelihood of success if we were to try to design an in-core
system from the ground up first and then follow through pretty exactly along
those lines is minimal.

So, what I think we can do is to continue trying to build independent,
generally useful bits. Which imo all the stuff that's been integrated
is. Then, somewhat soon I think, we'll have to come up with a proposal
for how the parts that are *not* necessarily useful outside of in-core
logical rep. might look. Which will likely trigger some long long
discussions that turn that design around a couple of times. Which is
fine. I *don't* think that's going to be a trimmed down version of
today's BDR.

But in other cases, like this one, if the premise that the slot name
should match the system identifier isn't something the community wants
to accept, then taking a patch that lets people do that is probably a
bad idea, because at least one person will use it to set the system
identifier of a system to a value that enables physical replication to
take place when that is actually totally unsafe, and we don't want to
enable that for no reason.

It also allows many other dangerous things. Many of which are much more
dangerous than changing the system identifier. Resetting an independent
cluster is also not very likely to work - the LSNs would still not
match. But it wouldn't corrupt the copy of the database that's been
changed...

Maybe the slot name should match the
replication identifier rather than the standby system ID, for example.
There are conflicting proposals for how replication identifiers should
work, but one of those proposals limits it to 16 bits.

I actually don't think any of the discussions I was involved in had the
externally visible version of replication identifiers limited to 16bits?
If you are referring to my patch, 16bits was just the width of the
*internal* name that should basically never be looked at. User visible
replication identifiers are always identified by an arbitrary string -
whose format is determined by the user of the replication identifier
facility. *BDR* currently stores the system identifier, the database id
and a name in there - but that's nothing core needs to concern itself
with.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Robert Haas

robertmhaas@gmail.com

almost 12 years ago

In reply to: Andres Freund (#5)

Re: Set new system identifier using pg_resetxlog

On Tue, Jun 17, 2014 at 12:50 PM, Andres Freund <andres@2ndquadrant.com> wrote:

I can clearly understand the utility of being able to reset the system
ID to a new, randomly-generated system ID - but giving the user the
ability to set a particular value of their own choosing seems like a
pretty sharp tool. What is the use case for that?

I've previously hacked this up adhoc during data recovery when I needed
to make another cluster similar enough that I could replay WAL.

Another usecase is to mark a database as independent from its
origin. Imagine a database that gets sharded across several
servers. It's not uncommon to do that by initially basebackup'ing the
database to several nodes and then use them separately from
thereon. It's quite useful to actually mark them as being
distinct. Especially as several of them right now would end up with the
same timeline id...

Sure, but that only requires being able to reset the ID randomly, not
to a particular value.

But it seems to me that we might need to have a process discussion
here, because, while I'm all in favor of incremental feature proposals
that build towards a larger goal, it currently appears that the larger
goal toward which you are building is not something that's been
publicly discussed and debated on this list. And I really think we
need to have that conversation. Obviously, individual patches will
still need to be debated, but I feel like 2ndQuadrant is trying to
construct a castle without showing the community the floor plan. I
believe that there is relatively broad agreement that we would all
like a castle, but different people may have legitimately different
ideas about how it should be constructed. If the work arrives as a
series of disconnected pieces (user-specified system ID, event
triggers for CREATE, etc.), then everyone outside of 2ndQuadrant has
to take it on faith that those pieces are going to eventually fit
together in a way that we'll all be happy with. In some cases, that's
fine, because the feature is useful on its own merits whether it ends
up being part of the castle or not.

Uh. Right now this patch has been written because it's needed for an
out-of-core replication solution. That's what BDR is at this point. The
patch is unobtrusive, has other use cases than just our internal one and
doesn't make pg_resetxlog even more dangerous than it already is. I
don't see much problem with considering it on its own cost/benefit?

Well, I think it *does* make pg_resetxlog more dangerous; see previous
discussion of pg_computemaxlsn.

So this seems to be a concern that's relatively independent of this
patch. Am I seeing that right?

Partially; not completely.

I actually don't think any of the discussions I was involved in had the
externally visible version of replication identifiers limited to 16bits?
If you are referring to my patch, 16bits was just the width of the
*internal* name that should basically never be looked at. User visible
replication identifiers are always identified by an arbitrary string -
whose format is determined by the user of the replication identifier
facility. *BDR* currently stores the system identifier, the database id
and a name in there - but that's nothing core needs to concern itself
with.

I don't think you're going to be able to avoid users needing to know
about those IDs. The configuration table is going to have to be the
same on all nodes, and how are you going to get that set up without
those IDs being user-visible?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Andres Freund

andres@anarazel.de

almost 12 years ago

In reply to: Robert Haas (#6)

Re: replication identifier format

On 2014-06-18 12:36:13 -0400, Robert Haas wrote:

I actually don't think any of the discussions I was involved in had the
externally visible version of replication identifiers limited to 16bits?
If you are referring to my patch, 16bits was just the width of the
*internal* name that should basically never be looked at. User visible
replication identifiers are always identified by an arbitrary string -
whose format is determined by the user of the replication identifier
facility. *BDR* currently stores the system identifier, the database id
and a name in there - but that's nothing core needs to concern itself
with.

I don't think you're going to be able to avoid users needing to know
about those IDs. The configuration table is going to have to be the
same on all nodes, and how are you going to get that set up without
those IDs being user-visible?

Why? Users and other systems only ever see the external ID. Everything
leaving the system is converted to the external form. The short id
basically is only used in shared memory and in wal records. For both
using longer strings would be problematic.

In the patch I have the user can actually see them as they're stored in
pg_replication_identifier, but there should never be a need for that.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Andres Freund

andres@anarazel.de

almost 12 years ago

In reply to: Robert Haas (#6)

Re: Set new system identifier using pg_resetxlog

On 2014-06-18 12:36:13 -0400, Robert Haas wrote:

On Tue, Jun 17, 2014 at 12:50 PM, Andres Freund <andres@2ndquadrant.com> wrote:

I can clearly understand the utility of being able to reset the system
ID to a new, randomly-generated system ID - but giving the user the
ability to set a particular value of their own choosing seems like a
pretty sharp tool. What is the use case for that?

I've previously hacked this up adhoc during data recovery when I needed
to make another cluster similar enough that I could replay WAL.

Another usecase is to mark a database as independent from its
origin. Imagine a database that gets sharded across several
servers. It's not uncommon to do that by initially basebackup'ing the
database to several nodes and then use them separately from
thereon. It's quite useful to actually mark them as being
distinct. Especially as several of them right now would end up with the
same timeline id...

Sure, but that only requires being able to reset the ID randomly, not
to a particular value.

I can definitely see a point in a version of the option that generates
the id randomly. But the use case one message up actually does require
setting it to a specific value...

Uh. Right now this patch has been written because it's needed for an
out-of-core replication solution. That's what BDR is at this point. The
patch is unobtrusive, has other use cases than just our internal one and
doesn't make pg_resetxlog even more dangerous than it already is. I
don't see much problem with considering it on its own cost/benefit?

Well, I think it *does* make pg_resetxlog more dangerous; see previous
discussion of pg_computemaxlsn.

Wasn't the thing around pg_computemaxlsn that there's actually no case
where it could be used safely? And that experienced people suggested
using it in an unsafe fashion?
I don't see how the proposed ability makes it more dangerous. It
*already* has the ability to reset the timelineid. That's the case where
users are much more likely to think about resetting it because that's
much more plausible than taking an unrelated cluster and resetting its
sysid, timeline and LSN.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Andres Freund

andres@anarazel.de

almost 12 years ago

In reply to: Andres Freund (#8)

Re: Set new system identifier using pg_resetxlog

On 2014-06-18 18:54:05 +0200, Andres Freund wrote:

On 2014-06-18 12:36:13 -0400, Robert Haas wrote:

Sure, but that only requires being able to reset the ID randomly, not
to a particular value.

I can definitely see a point in a version of the option that generates
the id randomly.

That's actually included in the patch btw (thanks Abhijit)...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#10

Robert Haas

robertmhaas@gmail.com

almost 12 years ago

In reply to: Andres Freund (#8)

Re: Set new system identifier using pg_resetxlog

On Wed, Jun 18, 2014 at 12:54 PM, Andres Freund <andres@2ndquadrant.com> wrote:

Well, I think it *does* make pg_resetxlog more dangerous; see previous
discussion of pg_computemaxlsn.

Wasn't the thing around pg_computemaxlsn that there's actually no case
where it could be used safely? And that experienced people suggested
using it in an unsafe fashion?

There is a use case - to determine whether WAL has been replayed "from
the future" relative to the WAL stream and control file you have on
disk. But the rest is true enough.

I don't see how the proposed ability makes it more dangerous. It
*already* has the ability to reset the timelineid. That's the case where
users are much more likely to think about resetting it because that's
much more plausible than taking an unrelated cluster and resetting its
sysid, timeline and LSN.

All right, well, I've said my piece. I don't have anything to add to
that that isn't sheer repetition. My vote is to hold off on this
until we've talked about replication identifiers and other related
topics in more depth. But if that position doesn't garner majority
support ... so be it!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#11

Petr Jelinek

petr@2ndquadrant.com

almost 12 years ago

In reply to: Robert Haas (#10)

Re: Set new system identifier using pg_resetxlog

On 18/06/14 19:26, Robert Haas wrote:

On Wed, Jun 18, 2014 at 12:54 PM, Andres Freund <andres@2ndquadrant.com> wrote:

I don't see how the proposed ability makes it more dangerous. It
*already* has the ability to reset the timelineid. That's the case where
users are much more likely to think about resetting it because that's
much more plausible than taking an unrelated cluster and resetting its
sysid, timeline and LSN.

All right, well, I've said my piece. I don't have anything to add to
that that isn't sheer repetition. My vote is to hold off on this
until we've talked about replication identifiers and other related
topics in more depth. But if that position doesn't garner majority
support ... so be it!

I am not sure I get what this has to do with replication
identifiers. The patch has several use cases; one of them is that
you can know the future system id before you set it, which is useful for
automating some things...

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#12

Andres Freund

andres@anarazel.de

almost 12 years ago

In reply to: Robert Haas (#10)

Re: Set new system identifier using pg_resetxlog

On 2014-06-18 13:26:37 -0400, Robert Haas wrote:

My vote is to hold off on this until we've talked about replication
identifiers and other related topics in more depth.

I really don't understand this. We're *NOT* proposing this patch as an
underhanded way of preempting the discussion of whether/how replication
identifiers are going to be used. We're proposing it because we
currently have a need for the facility and this will reduce the number
of patches we have to keep around after 9.5. And more importantly
because there's several other use cases than our internal one for it.

To settle one more point: I am not planning to propose BDR's generation
of replication identifier names for integration. It works well enough
for BDR but I think we can come up with something better for core. If I
had my current knowledge two years back I'd not have chosen the current
scheme.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#13

Josh Berkus

josh@agliodbs.com

almost 12 years ago

In reply to: Petr Jelinek (#1)

Re: Set new system identifier using pg_resetxlog

On 06/13/2014 05:31 PM, Petr Jelinek wrote:

Hello,

attached is a simple patch which makes it possible to change the system
identifier of the cluster in pg_control. This is useful for
individualization of the instance that is started on top of data
directory produced by pg_basebackup - something that's helpful for
logical replication setup where you need to easily identify each node
(it's used by Bidirectional Replication for example).

I'm unclear on why we would overload pg_resetxlog for this. Wouldn't it
be better design to have an independent function,
"pg_set_system_identifier"?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


#14

Alvaro Herrera

alvherre@2ndquadrant.com

almost 12 years ago

In reply to: Josh Berkus (#13)

Re: Set new system identifier using pg_resetxlog

Josh Berkus wrote:

On 06/13/2014 05:31 PM, Petr Jelinek wrote:

Hello,

attached is a simple patch which makes it possible to change the system
identifier of the cluster in pg_control. This is useful for
individualization of the instance that is started on top of data
directory produced by pg_basebackup - something that's helpful for
logical replication setup where you need to easily identify each node
(it's used by Bidirectional Replication for example).

I'm unclear on why we would overload pg_resetxlog for this. Wouldn't it
be better design to have an independent function,
"pg_set_system_identifier"?

We have overloaded pg_resetxlog for all pg_control-tweaking needs. I
don't see any reason to do differently here.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#15

Andres Freund

andres@anarazel.de

almost 12 years ago

In reply to: Josh Berkus (#13)

Re: Set new system identifier using pg_resetxlog

On 2014-06-18 10:44:56 -0700, Josh Berkus wrote:

On 06/13/2014 05:31 PM, Petr Jelinek wrote:

Hello,

attached is a simple patch which makes it possible to change the system
identifier of the cluster in pg_control. This is useful for
individualization of the instance that is started on top of data
directory produced by pg_basebackup - something that's helpful for
logical replication setup where you need to easily identify each node
(it's used by Bidirectional Replication for example).

I'm unclear on why we would overload pg_resetxlog for this. Wouldn't it
be better design to have an independent function,
"pg_set_system_identifier"?

You mean an independent binary? Because it's not possible to change this
at runtime.

If so, it's because pg_resetxlog already has the option to change many
related things (e.g. the timeline id). And it'd require another copy of
several hundred lines of code. It's all stored in the control file/checkpoints.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#16

Abhijit Menon-Sen

ams@2ndQuadrant.com

almost 12 years ago

In reply to: Josh Berkus (#13)

Re: Set new system identifier using pg_resetxlog

At 2014-06-18 10:44:56 -0700, josh@agliodbs.com wrote:

I'm unclear on why we would overload pg_resetxlog for this.

Because pg_resetxlog already does something very similar, so the patch
is small. If it were independent, it would have to copy quite some code
from pg_resetxlog.

Wouldn't it be better design to have an independent function,
"pg_set_system_identifier"?

A *function*? Why?

-- Abhijit


#17

Josh Berkus

josh@agliodbs.com

almost 12 years ago

In reply to: Petr Jelinek (#1)

Re: Set new system identifier using pg_resetxlog

On 06/18/2014 10:48 AM, Abhijit Menon-Sen wrote:

At 2014-06-18 10:44:56 -0700, josh@agliodbs.com wrote:

I'm unclear on why we would overload pg_resetxlog for this.

Because pg_resetxlog already does something very similar, so the patch
is small. If it were independent, it would have to copy quite some code
from pg_resetxlog.

Aha. In that case, it seems like it's time to rename pg_resetxlog, if
it does a bunch of things that aren't resetting the xlog.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


#18

Andres Freund

andres@anarazel.de

almost 12 years ago

In reply to: Josh Berkus (#17)

Re: Set new system identifier using pg_resetxlog

On 2014-06-18 10:59:59 -0700, Josh Berkus wrote:

On 06/18/2014 10:48 AM, Abhijit Menon-Sen wrote:

At 2014-06-18 10:44:56 -0700, josh@agliodbs.com wrote:

I'm unclear on why we would overload pg_resetxlog for this.

Because pg_resetxlog already does something very similar, so the patch
is small. If it were independent, it would have to copy quite some code
from pg_resetxlog.

Aha. In that case, it seems like it's time to rename pg_resetxlog, if
it does a bunch of things that aren't resetting the xlog.

Well, all those actually do write to the xlog (to write a new
checkpoint, containing the updated control file). Since pg_resetxlog has
done all this pretty much since forever, renaming it now seems to be a
big hassle for users for pretty much no benefit. This isn't a tool the
average user should ever touch.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#19

Josh Berkus

josh@agliodbs.com

almost 12 years ago

In reply to: Petr Jelinek (#1)

Re: Set new system identifier using pg_resetxlog

On 06/18/2014 11:03 AM, Andres Freund wrote:

Well, all those actually do write to the xlog (to write a new
checkpoint, containing the updated control file). Since pg_resetxlog has
done all this pretty much since forever, renaming it now seems to be a
big hassle for users for pretty much no benefit. This isn't a tool the
average user should ever touch.

If we're using it to create a unique system ID which can be used to
orchestrate replication and clustering systems, a lot more people are
going to be touching it than ever did before -- and not just for BDR.

Or are you saying that we have to destroy the data by resetting the xlog
before we can change the system identifier? If so, this feature is less
than completely useful ...

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


#20

Robert Haas

robertmhaas@gmail.com

almost 12 years ago

In reply to: Andres Freund (#7)

Re: replication identifier format

On Wed, Jun 18, 2014 at 12:46 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-06-18 12:36:13 -0400, Robert Haas wrote:

I actually don't think any of the discussions I was involved in had the
externally visible version of replication identifiers limited to 16 bits?
If you are referring to my patch, 16 bits was just the width of the
*internal* name that should basically never be looked at. User visible
replication identifiers are always identified by an arbitrary string -
whose format is determined by the user of the replication identifier
facility. *BDR* currently stores the system identifier, the database id
and a name in there - but that's nothing core needs to concern itself
with.

I don't think you're going to be able to avoid users needing to know
about those IDs. The configuration table is going to have to be the
same on all nodes, and how are you going to get that set up without
those IDs being user-visible?

Why? Users and other systems only ever see the external ID. Everything
leaving the system is converted to the external form. The short id
basically is only used in shared memory and in WAL records. For both,
using longer strings would be problematic.

In the patch I have the user can actually see them as they're stored in
pg_replication_identifier, but there should never be a need for that.

Hmm, so there's no requirement that the short IDs are consistent
across different clusters that are replicating to each other? If
that's the case, that might address my concern, but I'd probably want
to go back through the latest patch and think about it a bit more.
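
For illustration only, the scheme described above (an arbitrary external string per replication identifier, plus a short internal id that is purely local to each node) can be sketched roughly as below. This is not code from the patch: the `Node` class, its method names, and the sample name "node-y" are all hypothetical, and the only detail taken from the thread is that the internal id is 16 bits wide.

```python
from typing import Dict

MAX_INTERNAL_ID = 0xFFFF  # assumption: internal ids fit in 16 bits, as mentioned in the thread


class Node:
    """Hypothetical node that maps external identifier strings to local short ids."""

    def __init__(self) -> None:
        self._by_name: Dict[str, int] = {}
        self._by_id: Dict[int, str] = {}
        self._next_id = 1

    def internal_id(self, external: str) -> int:
        """Return the local short id for an external name, allocating one if needed."""
        if external not in self._by_name:
            if self._next_id > MAX_INTERNAL_ID:
                raise OverflowError("out of 16-bit internal ids")
            self._by_name[external] = self._next_id
            self._by_id[self._next_id] = external
            self._next_id += 1
        return self._by_name[external]

    def external_name(self, internal: int) -> str:
        """Convert a local short id back to the external form before it leaves the node."""
        return self._by_id[internal]


# Two nodes see the same external names in a different order,
# so they hand out different short ids for "node-y".
a, b = Node(), Node()
a.internal_id("node-x")
short_a = a.internal_id("node-y")
short_b = b.internal_id("node-y")
```

In this sketch the short ids differ between the two nodes, yet both convert back to the same external string, which is why they would not need to be consistent across clusters as long as only the external form ever leaves a node.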

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#21

Andres Freund

andres@anarazel.de

almost 12 years ago

In reply to: Robert Haas (#20)

#22

Robert Haas

robertmhaas@gmail.com

almost 12 years ago

In reply to: Andres Freund (#21)

#23

Andres Freund

andres@anarazel.de

almost 12 years ago

In reply to: Robert Haas (#22)

#24

Robert Haas

robertmhaas@gmail.com

almost 12 years ago

In reply to: Andres Freund (#23)

#25

Masahiko Sawada

sawada.mshk@gmail.com

almost 12 years ago

In reply to: Petr Jelinek (#11)

#26

Michael Paquier

michael@paquier.xyz

almost 12 years ago

In reply to: Masahiko Sawada (#25)

#27

Petr Jelinek

petr@2ndquadrant.com

almost 12 years ago

In reply to: Masahiko Sawada (#25)

#28

Masahiko Sawada

sawada.mshk@gmail.com

almost 12 years ago

In reply to: Petr Jelinek (#27)

#29

Petr Jelinek

petr@2ndquadrant.com

almost 12 years ago

In reply to: Masahiko Sawada (#28)

#30

Abhijit Menon-Sen

ams@2ndQuadrant.com

almost 12 years ago

In reply to: Petr Jelinek (#29)

#31

Andres Freund

andres@anarazel.de

almost 12 years ago

In reply to: Abhijit Menon-Sen (#30)

#32

Fujii Masao

masao.fujii@gmail.com

almost 12 years ago

In reply to: Andres Freund (#31)

#33

Andres Freund

andres@anarazel.de

almost 12 years ago

In reply to: Fujii Masao (#32)

#34

Alvaro Herrera

alvherre@2ndquadrant.com

almost 12 years ago

In reply to: Andres Freund (#33)

#35

Robert Haas

robertmhaas@gmail.com

almost 12 years ago

In reply to: Alvaro Herrera (#34)

#36

Alvaro Herrera

alvherre@2ndquadrant.com

almost 12 years ago

In reply to: Robert Haas (#35)

#37

Andres Freund

andres@anarazel.de

almost 12 years ago

In reply to: Robert Haas (#35)

#38

Robert Haas

robertmhaas@gmail.com

almost 12 years ago

In reply to: Andres Freund (#37)

#39

Andres Freund

andres@anarazel.de

almost 12 years ago

In reply to: Petr Jelinek (#29)

#40

Andres Freund

andres@anarazel.de

almost 12 years ago

In reply to: Andres Freund (#39)

#41

Josh Berkus

josh@agliodbs.com

almost 12 years ago

In reply to: Robert Haas (#10)

#42

Petr Jelinek

petr@2ndquadrant.com

almost 12 years ago

In reply to: Andres Freund (#39)

#43

Andres Freund

andres@anarazel.de

almost 12 years ago

In reply to: Petr Jelinek (#42)

#44

Petr Jelinek

petr@2ndquadrant.com

almost 12 years ago

In reply to: Andres Freund (#43)

#45

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 12 years ago

In reply to: Josh Berkus (#19)

#46

Tom Lane

tgl@sss.pgh.pa.us

almost 12 years ago

In reply to: Heikki Linnakangas (#45)

#47

Andres Freund

andres@anarazel.de

almost 12 years ago

In reply to: Tom Lane (#46)

#48

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 12 years ago

In reply to: Tom Lane (#46)

#49

Robert Haas

robertmhaas@gmail.com

almost 12 years ago

In reply to: Heikki Linnakangas (#48)

Set new system identifier using pg_resetxlog

Attachments: