Replacing pg_depend PIN entries with a fixed range check
In [1] Andres and I speculated about whether we really need all
those PIN entries in pg_depend. Here is a draft patch that gets
rid of them.
It turns out to be no big problem to replace the PIN entries
with an OID range check, because there's a well-defined point
in initdb where it wants to pin (almost) all existing objects,
and then no objects created after that are pinned. In the earlier
thread I'd imagined having initdb record the OID counter at that
point in pg_control, and then we could look at the recorded counter
value to make is-it-pinned decisions. However, that idea has a
fatal problem: what shall pg_resetwal fill into that field when
it has to gin up a pg_control file from scratch? There's no
good way to reconstruct the value.
Hence, what this patch does is to establish a manually-managed cutoff
point akin to FirstBootstrapObjectId, and make initdb push the OID
counter up to that once it's made the small number of pinned objects
it's responsible for. With the value I used here, a couple hundred
OIDs are wasted, but there seems to be a reasonable amount of headroom
still beyond that. On my build, the OID counter at the end of initdb
is 15485 (with a reasonable number of glibc and ICU locales loaded).
So we still have about 900 free OIDs there; and there are 500 or so
free just below FirstBootstrapObjectId, too. So this approach does
hasten the day when we're going to run out of free OIDs below 16384,
but not by all that much.
There are a couple of objects, namely template1 and the public
schema, that are in the catalog .dat files but are not supposed
to be pinned. The existing code accomplishes that by excluding them
(in two different ways :-() while filling pg_depend. This patch
just hard-wires exceptions for them in IsPinnedObject(), which seems
to me not much uglier than what we had before. The existing code
also handles pinning of the standard tablespaces in an idiosyncratic
way; I just dropped that and made them be treated as pinned.
One interesting point about doing things this way is that
IsPinnedObject() will give correct answers throughout initdb, whereas
before the backend couldn't tell what was supposed to be pinned until
after initdb loaded pg_depend. This means we don't need the hacky
truncation of pg_depend and pg_shdepend that initdb used to do,
because now the backend will correctly not make entries relating to
objects it now knows are pinned. Aside from saving a few cycles,
this is more correct. For example, if some object that initdb made
after bootstrap but before truncating pg_depend had a dependency on
the public schema, the existing coding would lose track of that fact.
(There's no live issue of that sort, I hasten to say, and really it
would be a bug to set things up that way because then you couldn't
drop the public schema. But the existing coding would make things
worse by not detecting the mistake.)
Anyway, as to concrete results:
* pg_depend's total relation size, in a freshly made database,
drops from 1269760 bytes to 368640 bytes.
* There seems to be a small but noticeable reduction in the time
to run check-world. I compared runtimes on a not-particularly-modern
machine with spinning-rust storage, using -j4 parallelism:
HEAD
real 5m4.248s
user 2m59.390s
sys 1m21.473s
+ patch
real 5m2.924s
user 2m36.196s
sys 1m19.724s
These top-line numbers don't look too impressive, but the CPU-time
reduction seems quite significant. Probably on a different hardware
platform that would translate more directly to runtime savings.
I didn't try to reproduce the original performance bottleneck
that was complained of in [1], but that might be fun to check.
Anyway, I'll stick this in the next CF so we don't lose track
of it.
[1]: /messages/by-id/947172.1617684433@sss.pgh.pa.us
regards, tom lane
Attachments:
remove-pg_depend-PIN-entries-1.patch (text/x-diff, +229 −363)
[ connecting up two threads here ]
I wrote:
Hence, what this patch does is to establish a manually-managed cutoff
point akin to FirstBootstrapObjectId, and make initdb push the OID
counter up to that once it's made the small number of pinned objects
it's responsible for. With the value I used here, a couple hundred
OIDs are wasted, but there seems to be a reasonable amount of headroom
still beyond that. On my build, the OID counter at the end of initdb
is 15485 (with a reasonable number of glibc and ICU locales loaded).
So we still have about 900 free OIDs there; and there are 500 or so
free just below FirstBootstrapObjectId, too. So this approach does
hasten the day when we're going to run out of free OIDs below 16384,
but not by all that much.
In view of the discussion at [1], there's more pressure on the OID supply
above 10K than I'd realized. While I don't have any good ideas about
eliminating the problem altogether, I did have a thought that would remove
the extra buffer zone created by my first-draft patch in this thread.
Namely, let's have genbki.pl write out its final OID assignment counter
value in a command in the postgres.bki file, say "set_next_oid 12036".
This would cause the bootstrap backend to set the server's OID counter to
that value. Then the initial part of initdb's post-bootstrap processing
could assign pinned OIDs working forward from there, with no gap. We'd
still need a gap before FirstBootstrapObjectId (which we might as well
rename to FirstUnpinnedObjectId), but we don't need two gaps, and so this
patch wouldn't make things any worse than they are today.
I'm not planning to put more effort into this patch right now, but
I'll revise it along these lines once v15 development opens.
regards, tom lane
[1]: /messages/by-id/CAGPqQf3JYTrTB1E1fu_zOGj+rG_kwTfa3UcUYPfNZL9o1bcYNw@mail.gmail.com
Hi,
On 2021-04-14 21:43:28 -0400, Tom Lane wrote:
In [1] Andres and I speculated about whether we really need all
those PIN entries in pg_depend. Here is a draft patch that gets
rid of them.
Yay.
There are a couple of objects, namely template1 and the public
schema, that are in the catalog .dat files but are not supposed
to be pinned. The existing code accomplishes that by excluding them
(in two different ways :-() while filling pg_depend. This patch
just hard-wires exceptions for them in IsPinnedObject(), which seems
to me not much uglier than what we had before. The existing code
also handles pinning of the standard tablespaces in an idiosyncratic
way; I just dropped that and made them be treated as pinned.
Hm, maybe we ought to swap template0 and template1 instead? I.e. have
template0 be in pg_database.dat and thus get a pinned oid, and then
create template1, postgres etc from that?
I guess we could also just create public in initdb.
Not that it matters much, having those exceptions doesn't seem too bad.
Anyway, as to concrete results:
* pg_depend's total relation size, in a freshly made database,
drops from 1269760 bytes to 368640 bytes.
Nice!
I didn't try to reproduce the original performance bottleneck
that was complained of in [1], but that might be fun to check.
I hope it's not reproducible as is, because I hopefully did fix the bug
leading to it ;)
+bool
+IsPinnedObject(Oid classId, Oid objectId)
+{
+    /*
+     * Objects with OIDs above FirstUnpinnedObjectId are never pinned.  Since
+     * the OID generator skips this range when wrapping around, this check
+     * guarantees that user-defined objects are never considered pinned.
+     */
+    if (objectId >= FirstUnpinnedObjectId)
+        return false;
+
+    /*
+     * Large objects are never pinned.  We need this special case because
+     * their OIDs can be user-assigned.
+     */
+    if (classId == LargeObjectRelationId)
+        return false;
+
Huh, shouldn't we reject that when creating them? IIRC we already use
oid range checks in a bunch of places? I guess you didn't because of
dump/restore concerns?
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
Hm, maybe we ought to swap template0 and template1 instead? I.e. have
template0 be in pg_database.dat and thus get a pinned oid, and then
create template1, postgres etc from that?
No, *neither* of them are pinned, and we don't want them to be.
It's something of a historical artifact that template1 has a low OID.
+    /*
+     * Large objects are never pinned.  We need this special case because
+     * their OIDs can be user-assigned.
+     */
+    if (classId == LargeObjectRelationId)
+        return false;
Huh, shouldn't we reject that when creating them?
We've got regression tests that create blobs with small OIDs :-(.
We could change those tests of course, but they're pretty ancient
and I'm hesitant to move those goal posts.
I guess you didn't because of dump/restore concerns?
That too.
In short, I'm really skeptical of changing any of these pin-or-not
decisions to save one or two comparisons in IsPinnedObject. That
function is already orders of magnitude faster than what it replaces;
we don't need to sweat over making it faster yet.
regards, tom lane
Hi,
On 2021-04-15 19:59:24 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
Hm, maybe we ought to swap template0 and template1 instead? I.e. have
template0 be in pg_database.dat and thus get a pinned oid, and then
create template1, postgres etc from that?
No, *neither* of them are pinned, and we don't want them to be.
It's something of a historical artifact that template1 has a low OID.
Hm, it makes sense for template1 not to be pinned, but it doesn't seem
as obvious why that should be the case for template0.
In short, I'm really skeptical of changing any of these pin-or-not
decisions to save one or two comparisons in IsPinnedObject. That
function is already orders of magnitude faster than what it replaces;
we don't need to sweat over making it faster yet.
I'm not at all concerned about the speed after the change - it just
seems cleaner and easier to understand not to have exceptions.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2021-04-15 19:59:24 -0400, Tom Lane wrote:
No, *neither* of them are pinned, and we don't want them to be.
It's something of a historical artifact that template1 has a low OID.
Hm, it makes sense for template1 not to be pinned, but it doesn't seem
as obvious why that should be the case for template0.
IIRC, the docs suggest that in an emergency you could recreate either
of them from the other. Admittedly, if you've put stuff in template1
then this might cause problems later, but I think relatively few
people do that.
I'm not at all concerned about the speed after the change - it just
seems cleaner and easier to understand not to have exceptions.
We had these exceptions already, they were just implemented in initdb
rather than the backend.
regards, tom lane
I wrote:
In view of the discussion at [1], there's more pressure on the OID supply
above 10K than I'd realized. While I don't have any good ideas about
eliminating the problem altogether, I did have a thought that would remove
the extra buffer zone created by my first-draft patch in this thread.
Namely, let's have genbki.pl write out its final OID assignment counter
value in a command in the postgres.bki file, say "set_next_oid 12036".
This would cause the bootstrap backend to set the server's OID counter to
that value. Then the initial part of initdb's post-bootstrap processing
could assign pinned OIDs working forward from there, with no gap. We'd
still need a gap before FirstBootstrapObjectId (which we might as well
rename to FirstUnpinnedObjectId), but we don't need two gaps, and so this
patch wouldn't make things any worse than they are today.
Here's a v2 that does things that way (and is rebased up to HEAD).
I did some more documentation cleanup, too.
regards, tom lane
Attachments:
remove-pg_depend-PIN-entries-2.patch (text/x-diff, +331 −392)
On Wed, May 12, 2021 at 6:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Here's a v2 that does things that way (and is rebased up to HEAD).
I did some more documentation cleanup, too.
The first hunk of the patch seems to back away from the idea that the
cutoff is 13000, but the second half of the patch says 13000 still
matters. Not sure I understand what's going on there exactly.
I suggest deleting the words "An additional thing that is useful to
know is that" because the rest of the sentence is fine without it.
I'm sort of wondering what we think the long term plan ought to be.
Are there some categories of things we should be looking to move out
of the reserved OID space to keep it from filling up? Can we
realistically think of moving the 16384 boundary?
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
The first hunk of the patch seems to back away from the idea that the
cutoff is 13000, but the second half of the patch says 13000 still
matters. Not sure I understand what's going on there exactly.
Not sure exactly what you're looking at, but IIRC there is a place
where the patch is cleaning up after ab596105b's failure to adjust
bki.sgml to match its change of FirstBootstrapObjectId from 12000
to 13000. I hadn't bothered to fix that separately, but I guess
we should do so, else v14 is going to ship with incorrect docs.
I'm sort of wondering what we think the long term plan ought to be.
Are there some categories of things we should be looking to move out
of the reserved OID space to keep it from filling up? Can we
realistically think of moving the 16384 boundary?
I haven't got any wonderful ideas there. I do not see how we can
move the 16384 boundary without breaking pg_upgrade'ing, because
pg_upgrade relies on preserving user object OIDs that are likely
to be not much above that value. Probably, upping
FirstNormalObjectId ought to be high on our list of things to do
if we ever do force an on-disk compatibility break. In the
meantime, we could decrease the 10000 boundary if things get
tight above that, but I fear that would annoy some extension
maintainers.
Another idea is to give up the principle that initdb-time OIDs
need to be globally unique. They only really need to be
unique within their own catalogs, so we could buy a lot of space
by exploiting that. The original reason for that policy was to
reduce the risk of mistakes in handwritten OID references in
the initial catalog data --- but now that numeric references
there are Not Done, it seems like we don't really need that.
An intermediate step, perhaps, could be to give up that
uniqueness only for OIDs assigned by genbki.pl itself, while
keeping it for OIDs below 10000. This'd be appealing if we
find that we're getting tight between 10K and 13K.
In any case it doesn't seem like the issue is entirely pressing
yet. Although ... maybe we should do that last bit now, so
that we can revert FirstBootstrapObjectId to 12K before v14
ships? I've felt a little bit of worry that that change might
cause problems on machines with a boatload of locales.
regards, tom lane
On Wed, May 26, 2021 at 11:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
In any case it doesn't seem like the issue is entirely pressing
yet. Although ... maybe we should do that last bit now, so
that we can revert FirstBootstrapObjectId to 12K before v14
ships? I've felt a little bit of worry that that change might
cause problems on machines with a boatload of locales.
I think that particular case is definitely worth worrying about. Most
of what we put into the system catalogs is our own hand-crafted
entries, but that's coming from the operating system and we have no
control over it whatever. It wouldn't be very nice to have to suggest
to users who can't initdb that perhaps they should delete some
locales...
Honestly, it seems odd to me that these entries use reserved OIDs
rather than regular ones at all. Why does the first run of
pg_import_system_collations use special magic OIDs, and later runs use
regular OIDs? pg_type OIDs need to remain stable from release to
release since it's part of the on disk format for arrays, and pg_proc
OIDs have to be the same at compile time and initdb time because of
the fmgr hash table, and any other thing that has a constant that
might be used in the source code also has that issue. But none of this
applies to collations: they can't expected to have the same OID from
release to release, or even from one installation to another; the
source code can't be relying on the specific values; and we have no
idea how many there might be.
So I think your proposal of allowing genbki-assigned OIDs to be reused
in different catalogs is probably a pretty good one, but I wonder if
we could just rejigger things so that collations just get normal
OIDs >= 16384.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
So I think your proposal of allowing genbki-assigned OIDs to be reused
in different catalogs is probably a pretty good one, but I wonder if
we could just rejigger things so that collations just get normal OIDs >= 16384.
Hm. I can't readily think of a non-hack way of making that happen.
It's also unclear to me how it'd interact with assignment of OIDs
to regular user objects. Maybe we'll have to go there eventually,
but I'm not in a hurry to.
Meanwhile, I'll draft a patch for the other thing.
regards, tom lane
I wrote:
Robert Haas <robertmhaas@gmail.com> writes:
The first hunk of the patch seems to back away from the idea that the
cutoff is 13000, but the second half of the patch says 13000 still
matters. Not sure I understand what's going on there exactly.
Not sure exactly what you're looking at, but IIRC there is a place
where the patch is cleaning up after ab596105b's failure to adjust
bki.sgml to match its change of FirstBootstrapObjectId from 12000
to 13000. I hadn't bothered to fix that separately, but I guess
we should do so, else v14 is going to ship with incorrect docs.
I take that back: I had committed that doc fix, in 1f9b0e693, so
I'm still unsure what was confusing you. (But a4390abec just
reverted it, anyway.)
Attached is a rebase over a4390abec. The decision in that commit
to not expect global uniqueness of OIDs above 10K frees us to use
a much simpler solution than before: we can just go ahead and start
the backend's OID counter at 10000, and not worry about conflicts,
because the OID generation logic can deal with any conflicts just
fine as long as you're okay with only having per-catalog uniqueness.
So this gets rid of the set_next_oid mechanism that I'd invented in
v2, and yet there's still no notable risk of running out of OIDs in
the 10K-12K range.
While testing this, I discovered something that I either never knew
or had forgotten: the bootstrap backend is itself assigning some
OIDs, specifically OIDs for the composite types associated with most
of the system catalogs (plus their array types). I find this scary,
because it is happening before we've built the catalog indexes, so
it's impossible to ensure uniqueness. (Of course, when we do build
the indexes, we'd notice any conflicts; but that's not a solution.)
I think it accidentally works because we don't ask genbki.pl to
assign any pg_type OIDs, but that seems fragile. Seems like maybe
we should fix genbki.pl to assign those OIDs, and then change
GetNewOidWithIndex to error out in bootstrap mode. However that's a
pre-existing issue, so I don't feel that this patch needs to be
the one to fix it.
regards, tom lane
Attachments:
remove-pg_depend-PIN-entries-3.patch (text/x-diff, +288 −390)
On Thu, May 27, 2021 at 6:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Attached is a rebase over a4390abec.
Looks good to me overall, I just had a couple questions/comments:
isObjectPinned and isSharedObjectPinned are now thin wrappers around
IsPinnedObject. Is keeping those functions a matter of future-proofing in
case something needs to be handled differently someday, or reducing
unnecessary code churn?
setup_depend now doesn't really need to execute any SQL (unless third-party
forks have extra steps here?), and could be replaced with a direct call
to StopGeneratingPinnedObjectIds. That's a bit more self-documenting, and
that would allow shortening this comment:
/*
* Note that no objects created after setup_depend() will be "pinned".
* They are all droppable at the whim of the DBA.
*/
--
John Naylor
EDB: http://www.enterprisedb.com
John Naylor <john.naylor@enterprisedb.com> writes:
On Thu, May 27, 2021 at 6:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Attached is a rebase over a4390abec.
Looks good to me overall, I just had a couple questions/comments:
Thanks for looking!
isObjectPinned and isSharedObjectPinned are now thin wrappers around
IsPinnedObject. Is keeping those functions a matter of future-proofing in
case something needs to be handled differently someday, or reducing
unnecessary code churn?
Yeah, it was mostly a matter of reducing code churn. We could probably
drop isSharedObjectPinned altogether, but isObjectPinned seems to have
some notational value in providing an API that takes an ObjectAddress.
setup_depend now doesn't really need to execute any SQL (unless third-party
forks have extra steps here?), and could be replaced with a direct call
to StopGeneratingPinnedObjectIds. That's a bit more self-documenting, and
that would allow shortening this comment:
Hm, I'm not following? setup_depend runs in initdb, that is on the
client side. It can't invoke backend-internal functions any other
way than via SQL, AFAICS.
regards, tom lane
On Wed, Jul 14, 2021 at 3:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hm, I'm not following? setup_depend runs in initdb, that is on the
client side. It can't invoke backend-internal functions any other
way than via SQL, AFAICS.
Ah, brainfade on my part.
I was also curious about the test case where Andres fixed a regression in
the parent thread [1]/messages/by-id/20210406043521.lopeo7bbigad3n6t@alap3.anarazel.de, and there is a noticeable improvement (lowest of 10
measurements):
HEAD: 623ms
patch: 567ms
If no one else has anything, I think this is ready for commit.
[1]: /messages/by-id/20210406043521.lopeo7bbigad3n6t@alap3.anarazel.de
--
John Naylor
EDB: http://www.enterprisedb.com