pg_system_identifier()

Started by Vik Fearingalmost 13 years ago43 messageshackers

vik@postgresfriends.org

almost 13 years ago

After someone in IRC asked if there was an equivalent to MySQL's
server_id, it was noted that we do have a system identifier but it's not
very accessible.

The attached patch implements a pg_system_identifier() function that
exposes it.

Shall I add this to the next commitfest?

--
Vik

Fujii Masao

masao.fujii@gmail.com

almost 13 years ago

In reply to: Vik Fearing (#1)

Re: pg_system_identifier()

On Thu, Aug 22, 2013 at 6:43 AM, Vik Fearing <vik.fearing@dalibo.com> wrote:

After someone in IRC asked if there was an equivalent to MySQL's
server_id, it was noted that we do have a system identifier but it's not
very accessible.

The attached patch implements a pg_system_identifier() function that
exposes it.

What's the use case?

IIUC, PostgreSQL's system identifier is not equal to MySQL server-id.
In PostgreSQL, in replication, the master and all the standbys must
have the same system identifier. OTOH, in MySQL, they have the different
server-ids. No?

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Vik Fearing

vik@postgresfriends.org

almost 13 years ago

In reply to: Fujii Masao (#2)

Re: pg_system_identifier()

On 08/22/2013 02:51 PM, Fujii Masao wrote:

On Thu, Aug 22, 2013 at 6:43 AM, Vik Fearing <vik.fearing@dalibo.com> wrote:

After someone in IRC asked if there was an equivalent to MySQL's
server_id, it was noted that we do have a system identifier but it's not
very accessible.

The attached patch implements a pg_system_identifier() function that
exposes it.

What's the use case?

It's information about the server that's only accessible through
pg_controldata. I don't know if that's justification enough, which is
why I didn't add it to the commitfest yet.

IIUC, PostgreSQL's system identifier is not equal to MySQL server-id.
In PostgreSQL, in replication, the master and all the standbys must
have the same system identifier. OTOH, in MySQL, they have the different
server-ids. No?

I have zero experience with MySQL.

--
Vik

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Fujii Masao

masao.fujii@gmail.com

almost 13 years ago

In reply to: Vik Fearing (#3)

Re: pg_system_identifier()

On Thu, Aug 22, 2013 at 9:53 PM, Vik Fearing <vik.fearing@dalibo.com> wrote:

On 08/22/2013 02:51 PM, Fujii Masao wrote:

On Thu, Aug 22, 2013 at 6:43 AM, Vik Fearing <vik.fearing@dalibo.com> wrote:

After someone in IRC asked if there was an equivalent to MySQL's
server_id, it was noted that we do have a system identifier but it's not
very accessible.

The attached patch implements a pg_system_identifier() function that
exposes it.

What's the use case?

It's information about the server that's only accessible through
pg_controldata.

BTW, you can see the system identifier by executing IDENTIFY_SYSTEM
command in replication connection as follows:

1. Change the server settings so that the server can accept the
replication connection
2. Connect to the server in replication mode
3. Execute IDENTIFY_SYSTEM command in replication connection

$ psql "replication=1"
=# IDENTIFY_SYSTEM;
systemid | timeline | xlogpos
---------------------+----------+-----------
5914930202950905854 | 1 | 0/183F720
(1 row)

This is not good way for a user, though ;P

I don't know if that's justification enough, which is
why I didn't add it to the commitfest yet.

You can add the patch to CF, and then hear the opinions from other people
during CF.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Vik Fearing

vik@postgresfriends.org

almost 13 years ago

In reply to: Fujii Masao (#4)

Re: pg_system_identifier()

On 08/22/2013 03:21 PM, Fujii Masao wrote:

I don't know if that's justification enough, which is
why I didn't add it to the commitfest yet.

You can add the patch to CF, and then hear the opinions from other people
during CF.

Added.

--
Vik

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andres Freund

andres@anarazel.de

almost 13 years ago

In reply to: Fujii Masao (#2)

Re: pg_system_identifier()

On 2013-08-22 21:51:22 +0900, Fujii Masao wrote:

On Thu, Aug 22, 2013 at 6:43 AM, Vik Fearing <vik.fearing@dalibo.com> wrote:

After someone in IRC asked if there was an equivalent to MySQL's
server_id, it was noted that we do have a system identifier but it's not
very accessible.

The attached patch implements a pg_system_identifier() function that
exposes it.

What's the use case?

IIUC, PostgreSQL's system identifier is not equal to MySQL server-id.
In PostgreSQL, in replication, the master and all the standbys must
have the same system identifier. OTOH, in MySQL, they have the different
server-ids. No?

FWIW I've wished for that function repeatedly. Mostly just to make sure
I am actually connected to the same "network" of replicas and not some
other.
It's also useful if you're providing support for a limited number of
machines and you want some form of identifying a node.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

bricklen

bricklen@gmail.com

almost 13 years ago

In reply to: Andres Freund (#6)

Re: pg_system_identifier()

On Thu, Aug 22, 2013 at 6:42 AM, Andres Freund <andres@2ndquadrant.com>wrote:

FWIW I've wished for that function repeatedly. Mostly just to make sure
I am actually connected to the same "network" of replicas and not some
other.
It's also useful if you're providing support for a limited number of
machines and you want some form of identifying a node.

There's a "hostname" function at PGXN which serves some use-cases:
http://pgxn.org/dist/hostname/

Josh Berkus

josh@agliodbs.com

almost 13 years ago

In reply to: Vik Fearing (#1)

Re: pg_system_identifier()

All,

Vik's feature would be useful for detecting an accidental split in a
replication cluster. That is, it would be another tool for detecting if
you've made a mistake and created two masters. So +1 from me.

It will also be useful for me for sharding. Right now, I'm doing a
hackish version of Vik's function, so I'd be glad to have it in core.

However, given that the value is the same for all servers in a
replication set, are we sure we want to call it system_identifier? Is
there a better name?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reply to msg id not found: WMf248d514cddac81a49ad91107b7891d203688a9bc9b2b185a57c1442043f35ed99f6fc76707f6f71a7a8c9d2e425e9ba@asav-3.01.com

Tom Lane

tgl@sss.pgh.pa.us

almost 13 years ago

In reply to: bricklen (#7)

Re: pg_system_identifier()

bricklen <bricklen@gmail.com> writes:

On Thu, Aug 22, 2013 at 6:42 AM, Andres Freund <andres@2ndquadrant.com>wrote:

FWIW I've wished for that function repeatedly. Mostly just to make sure
I am actually connected to the same "network" of replicas and not some
other.
It's also useful if you're providing support for a limited number of
machines and you want some form of identifying a node.

There's a "hostname" function at PGXN which serves some use-cases:
http://pgxn.org/dist/hostname/

I have a very vague recollection that we might've intentionally decided
not to expose the system identifier at the SQL level. This could be all
wet, but it'd be worth trolling the archives to see if there was such a
conversation and if so whether the arguments still have merit.

See also recent discussion about changing how the identifier is computed
--- it'd be a good idea to fix that before we expose the identifier to
users, if we decide to do so.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Tom Lane

tgl@sss.pgh.pa.us

almost 13 years ago

In reply to: Josh Berkus (#8)

Re: pg_system_identifier()

Josh Berkus <josh@agliodbs.com> writes:

Vik's feature would be useful for detecting an accidental split in a
replication cluster. That is, it would be another tool for detecting if
you've made a mistake and created two masters. So +1 from me.

We don't change the ID when promoting a slave to master, do we?
So how would this help for that?

However, given that the value is the same for all servers in a
replication set, are we sure we want to call it system_identifier? Is
there a better name?

I think there are definitely use cases for a system identifier of some
kind, I'm just not sure that what we have got right now is especially
useful to users. I'd rather see some thought go into what's needed
before we expose the existing definition (and consume the valuable
namespace of "pg_system_identifier").

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Andres Freund

andres@anarazel.de

almost 13 years ago

In reply to: Josh Berkus (#8)

Re: pg_system_identifier()

On 2013-08-22 08:45:38 -0700, Josh Berkus wrote:

All,

Vik's feature would be useful for detecting an accidental split in a
replication cluster. That is, it would be another tool for detecting if
you've made a mistake and created two masters. So +1 from me.

It will also be useful for me for sharding. Right now, I'm doing a
hackish version of Vik's function, so I'd be glad to have it in core.

However, given that the value is the same for all servers in a
replication set, are we sure we want to call it system_identifier? Is
there a better name?

Given it's been named that and visible via pg_controldata for years I am
against introducing confusion by renaming it.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Tom Lane

tgl@sss.pgh.pa.us

almost 13 years ago

In reply to: Andres Freund (#11)

Re: pg_system_identifier()

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-08-22 08:45:38 -0700, Josh Berkus wrote:

However, given that the value is the same for all servers in a
replication set, are we sure we want to call it system_identifier? Is
there a better name?

Given it's been named that and visible via pg_controldata for years I am
against introducing confusion by renaming it.

I agree that if we have a function named pg_system_identifier(), it ought
to return the same value that pg_controldata prints under that name.
But that doesn't really answer any questions about how that value is
obtained. I think the question on the table right now is whether we like
the way that value behaves, in the context of a user-visible system ID.
In particular, do we want to think about changing things so that (1) a
slave has a different ID than the master, and/or (2) a slave's ID changes
on promotion to master. I don't know the answers to these things ---
but once we make it user visible it's going to be too late to change
its behavior, so now's the time to consider.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Andres Freund

andres@anarazel.de

almost 13 years ago

In reply to: Tom Lane (#12)

Re: pg_system_identifier()

On 2013-08-22 12:06:03 -0400, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-08-22 08:45:38 -0700, Josh Berkus wrote:

However, given that the value is the same for all servers in a
replication set, are we sure we want to call it system_identifier? Is
there a better name?

Given it's been named that and visible via pg_controldata for years I am
against introducing confusion by renaming it.

I agree that if we have a function named pg_system_identifier(), it ought
to return the same value that pg_controldata prints under that name.
But that doesn't really answer any questions about how that value is
obtained. I think the question on the table right now is whether we like
the way that value behaves, in the context of a user-visible system ID.
In particular, do we want to think about changing things so tha
(1) a slave has a different ID than the master, and/or

We currently use the system identifier to know that we're replicating
between the same/compatible systems (c.f. libpqwalreceiver.c:
libpqrcv_identify_system()), so I don't think a change of definition
like that is realistic.

(2) a slave's ID changes on promotion to master.

We also cannot change the identifier here, because then other standbys
won't be able to follow the promotion because the identifier doesn't
match anymore

But essentially we already have something like that by the combination
of system identifier and timeline id. Admittedly there's the weakness
that the timelineid can increase the same on several machines in the
cluster but that's a weakness we ought to fix sometime independent of
this.

So maybe the answer is to also expose the current timeline?

An alternative would be to have a pg_controldata_values() SRF...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

Bruce Momjian

bruce@momjian.us

almost 13 years ago

In reply to: Andres Freund (#13)

Re: pg_system_identifier()

On Thu, Aug 22, 2013 at 06:18:39PM +0200, Andres Freund wrote:

But essentially we already have something like that by the combination
of system identifier and timeline id. Admittedly there's the weakness
that the timelineid can increase the same on several machines in the
cluster but that's a weakness we ought to fix sometime independent of
this.

So maybe the answer is to also expose the current timeline?

An alternative would be to have a pg_controldata_values() SRF...

It seems the value is more of a _cluster_ identifier than a system
identifier. We don't allow cross-major-version replication, so I am
confused why we can't rename it in 9.4.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Alvaro Herrera

alvherre@2ndquadrant.com

almost 13 years ago

In reply to: Andres Freund (#13)

Re: pg_system_identifier()

Andres Freund wrote:

On 2013-08-22 12:06:03 -0400, Tom Lane wrote:

I agree that if we have a function named pg_system_identifier(), it ought
to return the same value that pg_controldata prints under that name.
But that doesn't really answer any questions about how that value is
obtained. I think the question on the table right now is whether we like
the way that value behaves, in the context of a user-visible system ID.
In particular, do we want to think about changing things so tha
(1) a slave has a different ID than the master, and/or

We currently use the system identifier to know that we're replicating
between the same/compatible systems (c.f. libpqwalreceiver.c:
libpqrcv_identify_system()), so I don't think a change of definition
like that is realistic.

We could split the value; make sure that the first, way, 96 bits are
identical in master and slaves (and change the code to only compare
those bits); but the last 32 bits are system specific and cannot appear
twice in the same replica network. Also, perhaps we should reserve the
last (say) 4 bits, so that 0000 means master and 0001 means standby (it
changes on promotion), and the rest of the values are reserved for
future use.

Not necessarily that exact encoding, but hopefully you get my point.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Andres Freund

andres@anarazel.de

almost 13 years ago

In reply to: Bruce Momjian (#14)

Re: pg_system_identifier()

On 2013-08-22 12:18:41 -0400, Bruce Momjian wrote:

On Thu, Aug 22, 2013 at 06:18:39PM +0200, Andres Freund wrote:

But essentially we already have something like that by the combination
of system identifier and timeline id. Admittedly there's the weakness
that the timelineid can increase the same on several machines in the
cluster but that's a weakness we ought to fix sometime independent of
this.

So maybe the answer is to also expose the current timeline?

An alternative would be to have a pg_controldata_values() SRF...

It seems the value is more of a _cluster_ identifier than a system
identifier. We don't allow cross-major-version replication, so I am
confused why we can't rename it in 9.4.

For one, it would introduce confusion for the not inconsiderable number
of people already knowing the variable. For another, it's exposed via
the replication protocol's IDENTIFY SYSTEM.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17

Andres Freund

andres@anarazel.de

almost 13 years ago

In reply to: Alvaro Herrera (#15)

Re: pg_system_identifier()

On 2013-08-22 12:20:19 -0400, Alvaro Herrera wrote:

Andres Freund wrote:

On 2013-08-22 12:06:03 -0400, Tom Lane wrote:

I agree that if we have a function named pg_system_identifier(), it ought
to return the same value that pg_controldata prints under that name.
But that doesn't really answer any questions about how that value is
obtained. I think the question on the table right now is whether we like
the way that value behaves, in the context of a user-visible system ID.
In particular, do we want to think about changing things so tha
(1) a slave has a different ID than the master, and/or

We currently use the system identifier to know that we're replicating
between the same/compatible systems (c.f. libpqwalreceiver.c:
libpqrcv_identify_system()), so I don't think a change of definition
like that is realistic.

We could split the value; make sure that the first, way, 96 bits are
identical in master and slaves (and change the code to only compare
those bits); but the last 32 bits are system specific and cannot appear
twice in the same replica network. Also, perhaps we should reserve the
last (say) 4 bits, so that 0000 means master and 0001 means standby (it
changes on promotion), and the rest of the values are reserved for
future use.

Why? This seems to be making a simple thing into something way much more
complex? Imo this proposal is about further exposing an already
existing, already exposed (via pg_controldata, via replication protocol)
variable, not more.

It seems better to make sure the other datapoints are *also* exposed if
they aren't yet.
Some are:
* port (SHOW port;)
* standby/primary (SELECT pg_is_in_recovery();)

Some are not easily:
* system identifier (pg_controldata, replication protocol)
* current timeline identifier (pg_controldata, replication protocol)
* host identifier/hostname (which actually is hard)

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Tom Lane

tgl@sss.pgh.pa.us

almost 13 years ago

In reply to: Andres Freund (#17)

Re: pg_system_identifier()

Andres Freund <andres@2ndquadrant.com> writes:

On 2013-08-22 12:20:19 -0400, Alvaro Herrera wrote:

We could split the value; make sure that the first, way, 96 bits are
identical in master and slaves (and change the code to only compare
those bits); but the last 32 bits are system specific and cannot appear
twice in the same replica network. Also, perhaps we should reserve the
last (say) 4 bits, so that 0000 means master and 0001 means standby (it
changes on promotion), and the rest of the values are reserved for
future use.

Why? This seems to be making a simple thing into something way much more
complex? Imo this proposal is about further exposing an already
existing, already exposed (via pg_controldata, via replication protocol)
variable, not more.

It seems better to make sure the other datapoints are *also* exposed if
they aren't yet.

Agreed, this seems like overloading the identifier too much. Currently we
consider it an 8-byte value with unspecified internal structure, and I
think we should probably maintain that approach rather than defining
APIs that assume it can be subdivided. For one thing, reducing the width
of the "unique" part increases our risk of chance collisions.

Do we have a reliable way of generating a unique identifier for each slave
(independently of how that might be exposed)?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Andres Freund

andres@anarazel.de

almost 13 years ago

In reply to: Tom Lane (#18)

Re: pg_system_identifier()

On 2013-08-22 12:37:36 -0400, Tom Lane wrote:

Do we have a reliable way of generating a unique identifier for each slave
(independently of how that might be exposed)?

I'd like one, but it's not easy. The best I can think of is to mash
together:
* system_identifier
* mac address of primary interface we're listening to
* port
* primary/standby
in some pseudo-cryptographic manner.

But that's less than convincing because it can change after simple
configuration or hardware changes or even reboot :(.

Really identifying a particular host seems hard in anything resembling a
portable solution.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

Fabien COELHO

coelho@cri.ensmp.fr

almost 13 years ago

In reply to: Vik Fearing (#1)

Re: pg_system_identifier()

After someone in IRC asked if there was an equivalent to MySQL's
server_id, it was noted that we do have a system identifier but it's not
very accessible.

The attached patch implements a pg_system_identifier() function that
exposes it.

Would it make sense for such identifiers be standard UUID
(http://en.wikipedia.org/wiki/UUID)?

Should there be a UUID per cluster? and/or per database, possibly deduce
from the cluster one? Should it be configurable, say from
"postgresql.conf"?

get_pg_uuid()
get_pg_uuid('template0')

Note that there is a set of uuid functions provided as a module that may
help.

--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21