collations in shared catalogs?

Started by Andrew Gierthover 11 years ago30 messageshackers

andrew@tao11.riddles.org.uk

over 11 years ago

So while helping someone with an unrelated issue, I did a quick query to
look for collation-dependent indexes, and was rather shocked to find
that not only are there two such in the system catalogs, both set to
"default" collation, but that one of them is in a _shared_ catalog
(pg_shseclabel).

How did that happen? And how could it possibly work?

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

tgl@sss.pgh.pa.us

over 11 years ago

In reply to: Andrew Gierth (#1)

Re: collations in shared catalogs?

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

So while helping someone with an unrelated issue, I did a quick query to
look for collation-dependent indexes, and was rather shocked to find
that not only are there two such in the system catalogs, both set to
"default" collation, but that one of them is in a _shared_ catalog
(pg_shseclabel).

How did that happen? And how could it possibly work?

It probably doesn't, and the reason nobody has noticed is that the
security label stuff has fewer users than I have fingers (and those
people aren't using provider names that would cause anything interesting
to happen).

The most obvious fix is to change "provider" to a NAME column.

What was the other case? We might want to add a regression test to
check for collation-dependent system indexes ...

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

andrew@tao11.riddles.org.uk

over 11 years ago

In reply to: Tom Lane (#2)

Re: collations in shared catalogs?

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

So while helping someone with an unrelated issue, I did a quick
query to look for collation-dependent indexes, and was rather
shocked to find that not only are there two such in the system
catalogs, both set to "default" collation, but that one of them is
in a _shared_ catalog (pg_shseclabel).

How did that happen? And how could it possibly work?

Tom> It probably doesn't, and the reason nobody has noticed is that the
Tom> security label stuff has fewer users than I have fingers (and
Tom> those people aren't using provider names that would cause anything
Tom> interesting to happen).

Or possibly not mixing locales between databases.

Tom> The most obvious fix is to change "provider" to a NAME column.

Tom> What was the other case? We might want to add a regression test
Tom> to check for collation-dependent system indexes ...

pg_seclabel (also "provider").

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

alvherre@2ndquadrant.com

over 11 years ago

In reply to: Tom Lane (#2)

Re: collations in shared catalogs?

Tom Lane wrote:

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

So while helping someone with an unrelated issue, I did a quick query to
look for collation-dependent indexes, and was rather shocked to find
that not only are there two such in the system catalogs, both set to
"default" collation, but that one of them is in a _shared_ catalog
(pg_shseclabel).

How did that happen? And how could it possibly work?

It probably doesn't, and the reason nobody has noticed is that the
security label stuff has fewer users than I have fingers (and those
people aren't using provider names that would cause anything interesting
to happen).

The BDR code has recently started using security labels as a place to
store table-specific data. That widens its use a fair bit ... and most
likely, other extensions will also start using them as soon as they
realize that it can be used for stuff other than actual security labels.

(FWIW we shouldn't have called these "security labels" but just
generically "labels" or something like that.)

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

tgl@sss.pgh.pa.us

over 11 years ago

In reply to: Alvaro Herrera (#4)

Re: collations in shared catalogs?

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Tom Lane wrote:

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

How did that happen? And how could it possibly work?

It probably doesn't, and the reason nobody has noticed is that the
security label stuff has fewer users than I have fingers (and those
people aren't using provider names that would cause anything interesting
to happen).

The BDR code has recently started using security labels as a place to
store table-specific data. That widens its use a fair bit ... and most
likely, other extensions will also start using them as soon as they
realize that it can be used for stuff other than actual security labels.

Yeah? Would they be OK with redefining the provider field as "name",
or would the length limit be an issue?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

alvherre@2ndquadrant.com

over 11 years ago

In reply to: Tom Lane (#5)

Re: collations in shared catalogs?

Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Tom Lane wrote:

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

How did that happen? And how could it possibly work?

It probably doesn't, and the reason nobody has noticed is that the
security label stuff has fewer users than I have fingers (and those
people aren't using provider names that would cause anything interesting
to happen).

The BDR code has recently started using security labels as a place to
store table-specific data. That widens its use a fair bit ... and most
likely, other extensions will also start using them as soon as they
realize that it can be used for stuff other than actual security labels.

Yeah? Would they be OK with redefining the provider field as "name",
or would the length limit be an issue?

Nah, it's fine. The provider name used there is "bdr".

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

andres@anarazel.de

over 11 years ago

In reply to: Tom Lane (#2)

Re: collations in shared catalogs?

On 2015-02-25 12:08:32 -0500, Tom Lane wrote:

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

So while helping someone with an unrelated issue, I did a quick query to
look for collation-dependent indexes, and was rather shocked to find
that not only are there two such in the system catalogs, both set to
"default" collation, but that one of them is in a _shared_ catalog
(pg_shseclabel).

How did that happen? And how could it possibly work?

It probably doesn't, and the reason nobody has noticed is that the
security label stuff has fewer users than I have fingers (and those
people aren't using provider names that would cause anything interesting
to happen).

The most obvious fix is to change "provider" to a NAME column.

Yea. I'm not sure why that wasn't done initially. I can't really see the
length be an issue. How about we add an error check enforcing ascii,
that'll work in the back branches?

Generally it's not the greatest idea to have non-ascii stuff in shared
catalogs...

What was the other case? We might want to add a regression test to
check for collation-dependent system indexes ...

+1

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

tgl@sss.pgh.pa.us

over 11 years ago

In reply to: Andres Freund (#7)

Re: collations in shared catalogs?

Andres Freund <andres@2ndquadrant.com> writes:

On 2015-02-25 12:08:32 -0500, Tom Lane wrote:

The most obvious fix is to change "provider" to a NAME column.

Yea. I'm not sure why that wasn't done initially. I can't really see the
length be an issue. How about we add an error check enforcing ascii,
that'll work in the back branches?

Nope, that won't help much at all. C vs en_US for instance is different
sort orders even with all-ASCII data.

Basically you're screwed if you've got different collations in different
databases and you put anything into pg_shseclabel ...

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

andres@anarazel.de

over 11 years ago

In reply to: Tom Lane (#8)

Re: collations in shared catalogs?

On 2015-02-25 15:59:55 -0500, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2015-02-25 12:08:32 -0500, Tom Lane wrote:

The most obvious fix is to change "provider" to a NAME column.

Yea. I'm not sure why that wasn't done initially. I can't really see the
length be an issue. How about we add an error check enforcing ascii,
that'll work in the back branches?

Nope, that won't help much at all. C vs en_US for instance is different
sort orders even with all-ASCII data.

Ick, yes. The restriction to a charset that's encodable in all server
encodings should be there additionally, but it's not sufficient :(

Basically you're screwed if you've got different collations in different
databases and you put anything into pg_shseclabel ...

Hrmpf.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

sfrost@snowman.net

over 11 years ago

In reply to: Alvaro Herrera (#6)

Re: collations in shared catalogs?

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:

Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Tom Lane wrote:

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

How did that happen? And how could it possibly work?

It probably doesn't, and the reason nobody has noticed is that the
security label stuff has fewer users than I have fingers (and those
people aren't using provider names that would cause anything interesting
to happen).

The BDR code has recently started using security labels as a place to
store table-specific data. That widens its use a fair bit ... and most
likely, other extensions will also start using them as soon as they
realize that it can be used for stuff other than actual security labels.

Yeah? Would they be OK with redefining the provider field as "name",
or would the length limit be an issue?

Nah, it's fine. The provider name used there is "bdr".

Agreed, the provider field should be fine as a name field. Not that I
expect it to be an issue, but I'd definitely like to keep the label
field as text as those can definitely be longer (the very simply example
included in the security label docs is over half the length of a name
field already..). Now if we increased name to 128 characters...

/me runs and hides.

Thanks!

Stephen

david@pgmasters.net

over 11 years ago

In reply to: Stephen Frost (#10)

Re: collations in shared catalogs?

On 2/25/15 5:47 PM, Stephen Frost wrote:

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:

Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Tom Lane wrote:

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

How did that happen? And how could it possibly work?

It probably doesn't, and the reason nobody has noticed is that the
security label stuff has fewer users than I have fingers (and those
people aren't using provider names that would cause anything interesting
to happen).

The BDR code has recently started using security labels as a place to
store table-specific data. That widens its use a fair bit ... and most
likely, other extensions will also start using them as soon as they
realize that it can be used for stuff other than actual security labels.

Yeah? Would they be OK with redefining the provider field as "name",
or would the length limit be an issue?

Nah, it's fine. The provider name used there is "bdr".

Agreed, the provider field should be fine as a name field. Not that I
expect it to be an issue, but I'd definitely like to keep the label
field as text as those can definitely be longer (the very simply example
included in the security label docs is over half the length of a name
field already..). Now if we increased name to 128 characters...

+1 on 128/256 character names.

/me runs and hides.

/stands brazenly in the open and volunteers to try it if I don't get
clobbered within seconds.

--
- David Steele
david@pgmasters.net

robertmhaas@gmail.com

about 11 years ago

In reply to: David Steele (#11)

Re: collations in shared catalogs?

On Wed, Feb 25, 2015 at 7:54 PM, David Steele <david@pgmasters.net> wrote:

+1 on 128/256 character names.

/me runs and hides.

/stands brazenly in the open and volunteers to try it if I don't get
clobbered within seconds.

I think the question is whether making lots of rows in system catalogs
better is going to have undesirable effects on (a) the size of our
initial on-disk format (i.e. how big an empty database is), (b) the
amount of memory consumed by the syscache and relcaches on workloads
that touch lots of tables/functions/whatever, or (c) CPU consumption
mostly as a result of more cache line accesses for the same operation.
If you can prove those effects are minimal, that'd be a good place to
start.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

david@pgmasters.net

about 11 years ago

In reply to: Robert Haas (#12)

Re: collations in shared catalogs?

Hi Robert,

On 3/4/15 10:14 AM, Robert Haas wrote:

On Wed, Feb 25, 2015 at 7:54 PM, David Steele <david@pgmasters.net> wrote:

+1 on 128/256 character names.

/me runs and hides.

/stands brazenly in the open and volunteers to try it if I don't get
clobbered within seconds.

I think the question is whether making lots of rows in system catalogs
better is going to have undesirable effects on (a) the size of our
initial on-disk format (i.e. how big an empty database is), (b) the
amount of memory consumed by the syscache and relcaches on workloads
that touch lots of tables/functions/whatever, or (c) CPU consumption
mostly as a result of more cache line accesses for the same operation.
If you can prove those effects are minimal, that'd be a good place to
start.

Thanks, that's encouraging. I've already compiled with NAMEDATALEN=256
and verified that the only failing tests are the ones making sure that
identifier lengths are truncated or fail appropriately when they are >
63. I'm sure lots of people have done that before and gotten the same
result.

I'm currently investigating the issues that you've identified above
since they constitute the real problem with increasing NAMEDATALEN.
Once I have some answers I'll send a proposal to hackers.

--
- David Steele
david@pgmasters.net

bruce@momjian.us

about 11 years ago

In reply to: Andres Freund (#9)

Re: collations in shared catalogs?

On Wed, Feb 25, 2015 at 10:19:45PM +0100, Andres Freund wrote:

On 2015-02-25 15:59:55 -0500, Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

On 2015-02-25 12:08:32 -0500, Tom Lane wrote:

The most obvious fix is to change "provider" to a NAME column.

Yea. I'm not sure why that wasn't done initially. I can't really see the
length be an issue. How about we add an error check enforcing ascii,
that'll work in the back branches?

Nope, that won't help much at all. C vs en_US for instance is different
sort orders even with all-ASCII data.

Ick, yes. The restriction to a charset that's encodable in all server
encodings should be there additionally, but it's not sufficient :(

Basically you're screwed if you've got different collations in different
databases and you put anything into pg_shseclabel ...

Hrmpf.

Where are we on this?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Bruce Momjian (#14)

Re: collations in shared catalogs?

Bruce Momjian <bruce@momjian.us> writes:

On 2015-02-25 12:08:32 -0500, Tom Lane wrote:

The most obvious fix is to change "provider" to a NAME column.

Where are we on this?

Not done yet, but we should make a point of making that fix before 9.5.
Please add it to the open items page for 9.5.

I am not sure there's anything useful to be done about this in the back
branches.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

bruce@momjian.us

about 11 years ago

In reply to: Tom Lane (#15)

Re: collations in shared catalogs?

On Thu, Apr 30, 2015 at 08:16:09AM -0700, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On 2015-02-25 12:08:32 -0500, Tom Lane wrote:

The most obvious fix is to change "provider" to a NAME column.

Where are we on this?

Not done yet, but we should make a point of making that fix before 9.5.
Please add it to the open items page for 9.5.

I am not sure there's anything useful to be done about this in the back
branches.

Done.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Andres Freund (#7)

Re: collations in shared catalogs?

Andres Freund <andres@2ndquadrant.com> writes:

On 2015-02-25 12:08:32 -0500, Tom Lane wrote:

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

So while helping someone with an unrelated issue, I did a quick query to
look for collation-dependent indexes, and was rather shocked to find
that not only are there two such in the system catalogs, both set to
"default" collation, but that one of them is in a _shared_ catalog
(pg_shseclabel).
How did that happen? And how could it possibly work?

It probably doesn't, and the reason nobody has noticed is that the
security label stuff has fewer users than I have fingers (and those
people aren't using provider names that would cause anything interesting
to happen).

The most obvious fix is to change "provider" to a NAME column.

Yea. I'm not sure why that wasn't done initially.

OK, now I'm on the warpath, because I went to fix this and discovered
that since that discussion, somebody named Freund committed yet another
shared catalog with a collation-dependent index. This time, at least,
we can fix it *before* it gets into the wild.

Is it okay to change pg_replication_origin.roname to type "name",
and if not what do you want to do instead?

While I'm looking at it, why in the world have roident and not just a
standard system OID column? This catalog seems willfully ignorant of
Postgres conventions.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

andres@anarazel.de

about 11 years ago

In reply to: Tom Lane (#17)

Re: collations in shared catalogs?

On 2015-05-18 19:23:59 -0400, Tom Lane wrote:

OK, now I'm on the warpath, because I went to fix this and discovered
that since that discussion, somebody named Freund committed yet another
shared catalog with a collation-dependent index. This time, at least,
we can fix it *before* it gets into the wild.

Hrmpf, good point.

Is it okay to change pg_replication_origin.roname to type "name",
and if not what do you want to do instead?

It was turned into text after it initially was name, because of length
concerns.

Hm, just forcing a collation and restricting the input to ascii should
work, right? What I'm wondering is how we easily can do the collation
forcing part. The best seems to be to force the collation on the column
itself. We could add BKI_COLLATION(). Or we could invent a alias
'systext' or something that's intended to be used in catalogs?

While I'm looking at it, why in the world have roident and not just a
standard system OID column? This catalog seems willfully ignorant of
Postgres conventions.

There's a comment:
* Needs to fit into an uint16, so we don't waste too much space in WAL
* records. For this reason we don't use a normal Oid column here, since
* we need to handle allocation of new values manually.

I mean it could use the standard oid, but given it's allocated
differently...

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Andres Freund (#18)

Re: collations in shared catalogs?

Andres Freund <andres@anarazel.de> writes:

On 2015-05-18 19:23:59 -0400, Tom Lane wrote:

Is it okay to change pg_replication_origin.roname to type "name",
and if not what do you want to do instead?

It was turned into text after it initially was name, because of length
concerns.

Hm, just forcing a collation and restricting the input to ascii should
work, right?

I think that's fragile as can be. Is there a *really really* good
argument why these things shouldn't be subject to identifier length
restrictions?

While I'm looking at it, why in the world have roident and not just a
standard system OID column? This catalog seems willfully ignorant of
Postgres conventions.

There's a comment:
* Needs to fit into an uint16, so we don't waste too much space in WAL
* records. For this reason we don't use a normal Oid column here, since
* we need to handle allocation of new values manually.

If it needs to fit into uint16, why not make it smallint? The declaration
seems 100% misleading if it's not an OID. Moreover, the catalog
infrastructure is failing to help you make sure the values are unique.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

andres@anarazel.de

about 11 years ago

In reply to: Tom Lane (#19)

Re: collations in shared catalogs?

On 2015-05-18 19:59:29 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2015-05-18 19:23:59 -0400, Tom Lane wrote:
Hm, just forcing a collation and restricting the input to ascii should
work, right?

I think that's fragile as can be. Is there a *really really* good
argument why these things shouldn't be subject to identifier length
restrictions?

It's maybe not absolutely strictly necessary. In fact in earlier
versions of the patch it was name. But replication solutions like bdr,
slony, whatever will have to store a bunch of values identifying a node
in there. And that's much easier if you're not constrained by 63 chars.

While I'm looking at it, why in the world have roident and not just a
standard system OID column? This catalog seems willfully ignorant of
Postgres conventions.

There's a comment:
* Needs to fit into an uint16, so we don't waste too much space in WAL
* records. For this reason we don't use a normal Oid column here, since
* we need to handle allocation of new values manually.

If it needs to fit into uint16, why not make it smallint? The declaration
seems 100% misleading if it's not an OID.

smallint has a smaller range. I mean, we could use a smallint and just
store unsigned values nonetheless. But that'd be somewhat ugly, although
not without precedent (pg_class.relpages). There'll only ever be very
few rows in pg_replication_origin, so the wideness itself doesn't
matter.

One reason for leaving it a oid instead of a more fitting type is that
it'll make it much smother to increase the limit to full 32bit -
there'll be no user level change.

Moreover, the catalog infrastructure is failing to help you make sure
the values are unique.

Not sure what you mean? There's both a unique key and locking in place
to make sure that's not violated.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Andres Freund (#20)

andres@anarazel.de

about 11 years ago

In reply to: Tom Lane (#21)

andres@anarazel.de

about 11 years ago

In reply to: Tom Lane (#19)

sfrost@snowman.net

about 11 years ago

In reply to: Andres Freund (#23)

andres@anarazel.de

about 11 years ago

In reply to: Stephen Frost (#24)

sfrost@snowman.net

about 11 years ago

In reply to: Andres Freund (#25)

Greg Sabino Mullane

greg@turnstep.com

about 11 years ago

In reply to: Andres Freund (#20)

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Andres Freund (#23)

andres@anarazel.de

about 11 years ago

In reply to: Tom Lane (#28)

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Andres Freund (#29)