post-freeze damage control
On Mon, Apr 8, 2024 at 10:42 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Can you elaborate, which patches you think were not ready? Let's make
sure to capture any concrete concerns in the Open Items list.
Hi,
I'm moving this topic to a new thread for better visibility and less
admixture of concerns. I'd like to invite everyone here present to
opine on which patches we ought to be worried about. Here are a few
picks from me to start things off. My intention here is to convey "I
find these scary" rather than "these commits were irresponsible," so I
respectfully ask that you don't take the choice to list your patch
here as an attack, or the choice not to list your patch here as an
endorsement. I'm very happy if I'm wrong and these patches are not
actually scary. And likewise let's view any allegations of scariness
by others as a genuine attempt to figure out what might be broken,
rather than as an attempt to get anyone into trouble or whatever.
- As I wrote about separately, I'm concerned that the failover slots
stuff may not be in as good a shape as it needs to be for people to
get good use out of it.
/messages/by-id/CA+TgmoaA4oufUBR5B-4o83rnwGZ3zAA5UvwxDX=NjCm1TVgRsQ@mail.gmail.com
- I also wrote separately about the flurry of recent table AM changes.
/messages/by-id/CA+TgmoZpWB50GnJZYF1x8PenTqXDTFBH_Euu6ybGfzEy34o+5Q@mail.gmail.com
- The streaming read API stuff was all committed very last minute. I
think this should have been committed much sooner. It's probably not
going to break the world; it's more likely to have performance
consequences. But if it had gone in sooner, we'd have had more time to
figure that out.
- The heap pruning refactoring work, on the other hand, seems to be at
very significant risk of breaking the world. I don't doubt the
authors, but this is some very critical stuff and a lot of it happened
very quickly right at the end.
- Incremental backup could have all sorts of terrible data-corrupting
bugs. 55a5ee30cd65886ff0a2e7ffef4ec2816fbec273 was a full brown-paper
bag level fix, so the chances of there being further problems are
probably high.
- I'm slightly worried about the TID store work (ee1b30f12, 30e144287,
667e65aac35), perhaps for no reason. Actually, the results seem really
impressive, and a very quick look at some of the commits seemed kind
of reassuring. But, like the changes to pruning and freezing, this is
making some really fundamental changes to critical code. In this case,
it's code that has evolved very little over the years and seems to
have now evolved quite a lot.
- I couldn't understand why the "Operate
XLogCtl->log{Write,Flush}Result with atomics" code was correct when I
read it. That's not to say I know it to be incorrect. But, a spinlock
protecting two variables together guarantees more than atomic access
to each of those variables separately.
There's probably a lot more to worry about; there have been so
incredibly many changes recently. But this is as far as my speculation
extends, as of now. Comments welcome. Additional concerns also
welcome, as noted above. And again, no offense is intended to anyone.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Apr 9, 2024 at 7:47 AM Robert Haas <robertmhaas@gmail.com> wrote:
- The streaming read API stuff was all committed very last minute. I
think this should have been committed much sooner. It's probably not
going to break the world; it's more likely to have performance
consequences. But if it had gone in sooner, we'd have had more time to
figure that out.
OK, let me give an update on this work stream (pun intended).
One reason for the delay in committing was precisely that we were
fretting about regressions risks. We tried pretty hard to identify
and grind down every regression we could find, and cases with
outstanding not-fully-understood or examined problems in that area
have been booted into the next cycle for more work: streaming bitmap
heapscan, several streaming vacuum patches, and more, basically things
that seem to have more complex interactions with other machinery. The
only three places using streaming I/O that went in were:
041b9680: Use streaming I/O in ANALYZE.
b7b0f3f2: Use streaming I/O in sequential scans.
3a352df0: Use streaming I/O in pg_prewarm.
The first is a good first exercise in streaming random blocks;
hopefully no one would be too upset about an unexpected small
regression in ANALYZE, but as it happens it goes faster hot and cold
according to all reports. The second is a good first exercise in
streaming sequential blocks, and it ranges from faster to no
regression, according to testing and reports. The third is less
important, but it also goes faster.
Of those, streaming seq scan is clearly the most relevant to real
workloads that someone might be upset about, and I made a couple of
choices that you might say had damage control in mind:
* A conservative choice not to get into the business of issuing
new hints to the kernel for random jumps in cold scans, even though we
think we probably should for better performance: more research needed
precisely to avoid unexpected interactions (cf the booted bitmap
heapscan where that sort of thing seems to be afoot).
* A GUC to turn off I/O combining if it somehow upsets your storage in
ways we didn't foresee (io_combine_limit=1).
For fully cached hot scans, it does seem to be quite sensitive to tiny
changes in a hot code path that I and others spent a lot of time
optimising and testing during the CF. It is possible that
someone else's microarchitecture or compiler could show a regression
that I don't see, and I will certainly look into it with vim and
vigour if so. In that case we could consider a tiny
micro-optimisation that I've shared already (it seemed a little novel
so I'd rather propose it in the new cycle if I can), or, if it comes
to it based on evidence and an inability to address a problem quickly,
reverting just b7b0f3f2, which is itself a very small patch.
An aspect you didn't mention is correctness. I don't actually know
how to prove that buffer manager protocols are correct beyond thinking
and torture testing; what kind of new test harness machinery could be
used to cross-check more things about buffer pool state explicitly is
a weakness I'm planning to look into.
I realise that "these are the good ones, you should see all the stuff
we decided not to commit!" is not an argument, I'm just laying out how
I see the patches that went in and why I thought they were good. It's
almost an architectural change, but done in tiny footsteps. I
appreciate that people would have liked to see those particular tiny
footsteps in some of the other fine months available for patching the
tree, and some of the earlier underpinning patches that were part of
the same patch series did go in around New Year, but clearly my
"commit spreading" didn't go as well as planned after that (not helped
by Jan/Feb summer vacation season down here).
Mr Paquier this year announced his personal code freeze a few weeks
back on social media, which seemed like an interesting idea I might
adopt. Perhaps that is what some other people are doing without
saying so, and perhaps the time they are using for that is the end of
the calendar year. I might still be naturally inclined to crunch-like
behaviour, but it wouldn't be at the same time as everyone else,
except all the people who follow the same advice.
On Tue, Apr 09, 2024 at 09:35:00AM +1200, Thomas Munro wrote:
Mr Paquier this year announced his personal code freeze a few weeks
back on social media, which seemed like an interesting idea I might
adopt. Perhaps that is what some other people are doing without
saying so, and perhaps the time they are using for that is the end of
the calendar year. I might still be naturally inclined to crunch-like
behaviour, but it wouldn't be at the same time as everyone else,
except all the people who follow the same advice.
That's more linked to the fact that I was going silent without a
laptop for a few weeks before the end of the release cycle, and a way
to say not to count on me, while I was trying to keep my room clean to
avoid noise for others who would rush patches. It is a vacation
period for schools in Japan as the fiscal year finishes at the end of
March, while the rest of the world still studies/works, so that makes
trips much easier with areas being less busy when going abroad. If
you want to limit commit activity during this period, the answer is
simple then: require that all the committers live in Japan.
Jokes apart, I really try to split commit effort across the year and
not rush things at the last minute. If something's not agreed upon
and commit-ready by the 15th of March, the chances that I would apply
it within the release cycle are really slim. That's a kind of
personal policy I have had in place for a few years now.
--
Michael
On 4/8/24 21:47, Robert Haas wrote:
On Mon, Apr 8, 2024 at 10:42 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Can you elaborate, which patches you think were not ready? Let's make
sure to capture any concrete concerns in the Open Items list....
- Incremental backup could have all sorts of terrible data-corrupting
bugs. 55a5ee30cd65886ff0a2e7ffef4ec2816fbec273 was a full brown-paper
bag level fix, so the chances of there being further problems are
probably high.
I don't feel too particularly worried about this. Yes, backups are super
important because it's often the only thing you have left when things go
wrong, and the incremental aspect is all new. The code I've seen while
doing the CoW-related patches seemed very precise and careful, and the
one bug we found & fixed does not make it bad.
Sure, I can't rule out there being more bugs, but I've been doing some
pretty extensive stress testing of this (doing incremental backups +
combinebackup, and comparing the results against the source, and that
sort of stuff). And so far I've found only that single bug this way. I'm
still doing this randomized stress testing, with more and more complex
workloads etc., and I'll keep doing that for a while.
Maybe I'm a bit too happy-go-lucky, but IMO the risk here is limited.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Apr 09, 2024 at 01:16:02AM +0200, Tomas Vondra wrote:
I don't feel too particularly worried about this. Yes, backups are super
important because it's often the only thing you have left when things go
wrong, and the incremental aspect is all new. The code I've seen while
doing the CoW-related patches seemed very precise and careful, and the
one bug we found & fixed does not make it bad.
Sure, I can't rule out there being more bugs, but I've been doing some
pretty extensive stress testing of this (doing incremental backups +
combinebackup, and comparing the results against the source, and that
sort of stuff). And so far I've found only that single bug this way. I'm
still doing this randomized stress testing, with more and more complex
workloads etc., and I'll keep doing that for a while.
Maybe I'm a bit too happy-go-lucky, but IMO the risk here is limited.
Even if there's a critical bug, there are still other ways to take
backups, so there is an exit route even if a problem is found and even
if this problem requires a complex solution to be able to work
correctly.
This worries me less than other patches like the ones around heap
changes or the radix tree stuff for TID collection plugged into
vacuum, which don't have explicit on/off switches AFAIK.
--
Michael
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Apr 8, 2024 at 10:42 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Can you elaborate, which patches you think were not ready? Let's make
sure to capture any concrete concerns in the Open Items list.
Hi,
I'm moving this topic to a new thread for better visibility and less
admixture of concerns. I'd like to invite everyone here present to
opine on which patches we ought to be worried about. Here are a few
picks from me to start things off. My intention here is to convey "I
find these scary" rather than "these commits were irresponsible," so I
respectfully ask that you don't take the choice to list your patch
here as an attack, or the choice not to list your patch here as an
endorsement.
I have another one that I'm not terribly happy about:
Author: Alexander Korotkov <akorotkov@postgresql.org>
Branch: master [72bd38cc9] 2024-04-08 01:27:52 +0300
Transform OR clauses to ANY expression
I don't know that I'd call it scary exactly, but I do think it
was premature. A week ago there was no consensus that it was
ready to commit, but Alexander pushed it (or half of it, anyway)
despite that. A few concrete concerns:
* Yet another planner GUC. Do we really need or want that?
* What the medical community would call off-label usage of
query jumbling. I'm not sure this is even correct as-used,
and for sure it's using that code for something never intended.
Nor is the added code adequately (as in, at all) documented.
* Patch refuses to group anything but Consts into the SAOP
transformation. I realize that if you want to produce an
array Const you need Const inputs, but I wonder why it
wasn't considered to produce an ARRAY[] construct if there
are available clauses with pseudo-constant (eg Param)
comparison values.
* I really, really dislike jamming this logic into prepqual.c,
where it has no business being. I note that it was shoved
into process_duplicate_ors without even the courtesy of
expanding the header comment:
* process_duplicate_ors
* Given a list of exprs which are ORed together, try to apply
* the inverse OR distributive law.
Another reason to think this wasn't a very well chosen place is
that the file's list of #include's went from 4 entries to 11.
Somebody should have twigged to the idea that this was off-topic
for prepqual.c.
* OrClauseGroupKey is not a Node type, so why does it have
a NodeTag? I wonder what value will appear in that field,
and what will happen if the struct is passed to any code
that expects real Nodes.
I could probably find some other nits if I spent more time
on it, but I think these are sufficient to show that this
was not commit-ready.
regards, tom lane
On 9/4/2024 09:12, Tom Lane wrote:
I have another one that I'm not terribly happy about:
Author: Alexander Korotkov <akorotkov@postgresql.org>
Branch: master [72bd38cc9] 2024-04-08 01:27:52 +0300
Transform OR clauses to ANY expression
Because I'm the primary author of the idea, let me answer.
I don't know that I'd call it scary exactly, but I do think it
was premature. A week ago there was no consensus that it was
ready to commit, but Alexander pushed it (or half of it, anyway)
despite that. A few concrete concerns:
* Yet another planner GUC. Do we really need or want that?
It is the most interesting question here. Looking around at planner
features that were designed but never applied for the same reason,
namely that they can produce suboptimal plans in corner cases, I have
been thinking about introducing flag-type parameters and hiding
features that only work better for certain load types behind such
flags.
* What the medical community would call off-label usage of
query jumbling. I'm not sure this is even correct as-used,
and for sure it's using that code for something never intended.
Nor is the added code adequately (as in, at all) documented.
I agree about the documentation, and disagree with the criticism of
the expression jumbling. It was introduced into core; why not allow it
to be used to speed up machinery with some hashing?
* Patch refuses to group anything but Consts into the SAOP
transformation. I realize that if you want to produce an
array Const you need Const inputs, but I wonder why it
wasn't considered to produce an ARRAY[] construct if there
are available clauses with pseudo-constant (eg Param)
comparison values.
Good point. I think we can consider that in the future.
* I really, really dislike jamming this logic into prepqual.c,
where it has no business being. I note that it was shoved
into process_duplicate_ors without even the courtesy of
expanding the header comment:
Yeah, I preferred to do it in parse_expr.c with the assumption of some
'minimal' or 'canonical' tree form. You can see this code in the
previous version. I think we don't have any bugs here, but we have
different opinions on how it should work.
* process_duplicate_ors
* Given a list of exprs which are ORed together, try to apply
* the inverse OR distributive law.
Another reason to think this wasn't a very well chosen place is
that the file's list of #include's went from 4 entries to 11.
Somebody should have twigged to the idea that this was off-topic
for prepqual.c.
* OrClauseGroupKey is not a Node type, so why does it have
a NodeTag? I wonder what value will appear in that field,
and what will happen if the struct is passed to any code
that expects real Nodes.
It's a hack authored by Alexander. I guess he can provide additional
reasons in support of it.
I could probably find some other nits if I spent more time
on it, but I think these are sufficient to show that this
was not commit-ready.
It's up to you. On the one hand, I don't see any bugs or strong
performance issues, and all the issues can be resolved further; on the
other hand, I've got your good review and some ideas to work out.
--
regards,
Andrei Lepikhov
Postgres Professional
Andrei Lepikhov <a.lepikhov@postgrespro.ru> writes:
On 9/4/2024 09:12, Tom Lane wrote:
I have another one that I'm not terribly happy about:
Author: Alexander Korotkov <akorotkov@postgresql.org>
Branch: master [72bd38cc9] 2024-04-08 01:27:52 +0300
Transform OR clauses to ANY expression
* What the medical community would call off-label usage of
query jumbling. I'm not sure this is even correct as-used,
and for sure it's using that code for something never intended.
Nor is the added code adequately (as in, at all) documented.
I agree with documentation and disagree with critics on the expression
jumbling. It was introduced in the core. Why don't we allow it to be
used to speed up machinery with some hashing?
I would back up from that a good deal: why do we need to hash here in
the first place? There's no evidence I'm aware of that it's needful
from a performance standpoint.
* I really, really dislike jamming this logic into prepqual.c,
where it has no business being. I note that it was shoved
into process_duplicate_ors without even the courtesy of
expanding the header comment:
Yeah, I preferred to do it in parse_expr.c with the assumption of some
'minimal' or 'canonical' tree form.
That seems quite the wrong direction to me. AFAICS, the argument
for making this transformation depends on being able to convert
to an indexscan condition, so I would try to apply it even later,
when we have a set of restriction conditions to apply to a particular
baserel. (This would weaken the argument that we need hashing
rather than naive equal() tests even further, I think.) Applying
the transform to join quals seems unlikely to be a win.
regards, tom lane
On 09.04.24 00:58, Michael Paquier wrote:
That's more linked to the fact that I was going silent without a
laptop for a few weeks before the end of the release cycle, and a way
to say not to count on me, while I was trying to keep my room clean to
avoid noise for others who would rush patches. It is a vacation
period for schools in Japan as the fiscal year finishes at the end of
March, while the rest of the world still studies/works, so that makes
trips much easier with areas being less busy when going abroad. If
you want to limit commit activity during this period, the answer is
simple then: require that all the committers live in Japan.
Well, due to the Easter holiday being earlier this year, I adopted a
similar approach: Go on vacation the last week of March and watch the
rest from the pool. :) So far I feel this was better for my well-being.
On 4/9/24 01:33, Michael Paquier wrote:
On Tue, Apr 09, 2024 at 01:16:02AM +0200, Tomas Vondra wrote:
I don't feel too particularly worried about this. Yes, backups are super
important because it's often the only thing you have left when things go
wrong, and the incremental aspect is all new. The code I've seen while
doing the CoW-related patches seemed very precise and careful, and the
one bug we found & fixed does not make it bad.
Sure, I can't rule out there being more bugs, but I've been doing some
pretty extensive stress testing of this (doing incremental backups +
combinebackup, and comparing the results against the source, and that
sort of stuff). And so far I've found only that single bug this way. I'm
still doing this randomized stress testing, with more and more complex
workloads etc., and I'll keep doing that for a while.
Maybe I'm a bit too happy-go-lucky, but IMO the risk here is limited.
Even if there's a critical bug, there are still other ways to take
backups, so there is an exit route even if a problem is found and even
if this problem requires a complex solution to be able to work
correctly.
I think it's a bit more nuanced, because it's about backups/restore. The
bug might be subtle, and you won't learn about it until the moment when
you need to restore (or perhaps even long after that). At which point
"You might have taken the backup in some other way." is not really a
viable exit route.
Anyway, I'm still not worried about this particular feature, and I'll
keep doing the stress testing.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Apr 9, 2024 at 7:24 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
I think it's a bit more nuanced, because it's about backups/restore. The
bug might be subtle, and you won't learn about it until the moment when
you need to restore (or perhaps even long after that). At which point
"You might have taken the backup in some other way." is not really a
viable exit route.
Anyway, I'm still not worried about this particular feature, and I'll
keep doing the stress testing.
In all sincerity, I appreciate the endorsement. Basically what's been
scaring me about this feature is the possibility that there's some
incurable design flaw that I've managed to completely miss. If it has
some more garden-variety bugs, that's still pretty bad: people will
potentially lose data and be unable to get it back. But, as long as
we're able to find the bugs and fix them, the situation should improve
over time until, hopefully, everybody trusts it roughly as much as we
trust, say, crash recovery. Perhaps even a bit more: I think this code
is much better-written than our crash recovery code, which has grown
into a giant snarl that nobody seems able to untangle, despite
multiple refactoring attempts. However, if there's some reason why the
approach is fundamentally unsound which I and others have failed to
detect, then we're at risk of shipping a feature that is irretrievably
broken. That would really suck.
I'm fairly hopeful that there is no such design defect: I certainly
can't think of one. But, it's much easier to imagine an incurable
problem here than with, say, the recent pruning+freezing changes.
Those changes might have bugs, and those bugs might be hard to find,
but if they do exist and are found, they can be fixed. Here, it's a
little less obvious that that's true. We're relying on our ability, at
incremental backup time, to sort out from the manifest and the WAL
summaries, what needs to be included in the backup in order for a
subsequent pg_combinebackup operation to produce correct results. The
basic idea is simple enough, but the details are complicated, and it
feels like a subtle defect in the algorithm could potentially scuttle
the whole thing. I'd certainly appreciate having more smart people try
to think of things that I might have overlooked.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 9/4/2024 12:55, Tom Lane wrote:
Andrei Lepikhov <a.lepikhov@postgrespro.ru> writes:
* I really, really dislike jamming this logic into prepqual.c,
where it has no business being. I note that it was shoved
into process_duplicate_ors without even the courtesy of
expanding the header comment:
Yeah, I preferred to do it in parse_expr.c with the assumption of some
'minimal' or 'canonical' tree form.
That seems quite the wrong direction to me. AFAICS, the argument
for making this transformation depends on being able to convert
to an indexscan condition, so I would try to apply it even later,
when we have a set of restriction conditions to apply to a particular
baserel. (This would weaken the argument that we need hashing
rather than naive equal() tests even further, I think.) Applying
the transform to join quals seems unlikely to be a win.
Our first prototype did this job right at the stage of index path
creation. Unfortunately, this approach was too narrow and expensive.
The most problematic cases we encountered came from BitmapOr paths: if
an incoming query has a significant number of OR clauses, the optimizer
spends a lot of time generating these, in most cases, senseless paths
(not to mention the memory allocated for that purpose). Imagine how much
worse the situation becomes when we scale it with partitions.
Another issue we resolved with this transformation: a shorter list of
clauses speeds up planning and sometimes makes cardinality estimation
more accurate.
Moreover, it helps even SeqScan: attempting to find a value in the
hashed array is much faster than cycling through a long OR expression
for each incoming tuple.
One more idea that I have set aside here is that the planner could
utilize quick clause hashing:
From time to time on the mailing list I see disputes over different
approaches to expression transformation/simplification/grouping, and
most of the time they end up at the problem of search complexity.
A clause hash could be a way to solve this, couldn't it?
--
regards,
Andrei Lepikhov
Postgres Professional
Hi,
On Tuesday, April 9th, 2024 at 2:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
In all sincerity, I appreciate the endorsement. Basically what's been
scaring me about this feature is the possibility that there's some
incurable design flaw that I've managed to completely miss. If it has
some more garden-variety bugs, that's still pretty bad: people will
potentially lose data and be unable to get it back. But, as long as
we're able to find the bugs and fix them, the situation should improve
over time until, hopefully, everybody trusts it roughly as much as we
trust, say, crash recovery. Perhaps even a bit more: I think this code
is much better-written than our crash recovery code, which has grown
into a giant snarl that nobody seems able to untangle, despite
multiple refactoring attempts. However, if there's some reason why the
approach is fundamentally unsound which I and others have failed to
detect, then we're at risk of shipping a feature that is irretrievably
broken. That would really suck.
IMHO it's totally worth shipping such a long-awaited feature sooner rather than later.
Yes, it is a complex one, but you have been advertising it since last January already, so people should be able to play with it in Beta.
And as you mentioned in your blog about the evergreen backup:
But if you're anything like me, you'll already see that this arrangement
has two serious weaknesses. First, if there are any data-corrupting bugs
in pg_combinebackup or any of the server-side code that supports
incremental backup, this approach could get you into big trouble.
At some point, the only way to really validate a backup is to actually try to restore it.
And if people get encouraged to do that faster thanks to incremental backups, they could detect potential issues sooner.
Ultimately, users will still need their full backups and WAL archives.
If pg_combinebackup fails for any reason, the fix will be to perform the recovery from the full backup directly.
They should still be able to recover, just more slowly.
--
Stefan FERCOT
Data Egret (https://dataegret.com)
On 2024-Apr-09, Stefan Fercot wrote:
At some point, the only way to really validate a backup is to actually try to restore it.
And if people get encouraged to do that faster thanks to incremental backups, they could detect potential issues sooner.
Ultimately, users will still need their full backups and WAL archives.
If pg_combinebackup fails for any reason, the fix will be to perform the recovery from the full backup directly.
They still should be able to recover, just slower.
I completely agree that people should be testing the feature so that we
can fix any bugs as soon as possible. However, if my understanding is
correct, restoring a full backup plus an incremental no longer needs the
intervening WAL up to the incremental. Users wishing to save some disk
space might be tempted to delete that WAL. If they do, and later it
turns out that the full+incremental cannot be restored for whatever
reason, they are in danger.
But you're right that if they don't delete that WAL, then the full is
restorable on its own.
Maybe we should explicitly advise users to not delete that WAL from
their archives, until pg_combinebackup is hammered a bit more.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"I must say, I am absolutely impressed with what pgsql's implementation of
VALUES allows me to do. It's kind of ridiculous how much "work" goes away in
my code. Too bad I can't do this at work (Oracle 8/9)." (Tom Allison)
http://archives.postgresql.org/pgsql-general/2007-06/msg00016.php
On 9 Apr 2024, at 18:45, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Maybe we should explicitly advise users to not delete that WAL from
their archives, until pg_combinebackup is hammered a bit more.
As a backup tool maintainer, I always refer to the out-of-the-box Postgres tools as a bulletproof alternative.
I would really like to keep that reputation intact and not discredit these tools.
Best regards, Andrey Borodin.
On Mon, Apr 8, 2024 at 10:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I don't know that I'd call it scary exactly, but I do think it
was premature. A week ago there was no consensus that it was
ready to commit, but Alexander pushed it (or half of it, anyway)
despite that.
Some of the most compelling cases for the transformation will involve
path keys. If the transformation enables the optimizer to build a
plain index scan (or index-only scan) with useful path keys, then that
might well lead to a far superior plan compared to what's possible
with BitmapOrs.
I understand that it'll still be possible to use OR expression
evaluation in such cases, without applying the transformation (via
filter quals), so in principle you don't need the transformation to
get an index scan that can (say) terminate the scan early due to the
presence of an "ORDER BY ... LIMIT n". But I suspect that that won't
work out much of the time, because the planner will believe (rightly
or wrongly) that the filter quals will incur too many heap page
accesses.
Another problem (at least as I see it) with the new
or_to_any_transform_limit GUC is that its design seems to have nothing
to say about the importance of these sorts of cases. Most of these
cases will only have 2 or 3 constants, just because that's what's most
common in general.
--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
On Mon, Apr 8, 2024 at 10:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I don't know that I'd call it scary exactly, but I do think it
was premature. A week ago there was no consensus that it was
ready to commit, but Alexander pushed it (or half of it, anyway)
despite that.
Some of the most compelling cases for the transformation will involve
path keys. If the transformation enables the optimizer to build a
plain index scan (or index-only scan) with useful path keys, then that
might well lead to a far superior plan compared to what's possible
with BitmapOrs.
I did not say it isn't a useful thing to have. I said the patch
did not appear ready to go in.
I understand that it'll still be possible to use OR expression
evaluation in such cases, without applying the transformation (via
filter quals), so in principle you don't need the transformation to
get an index scan that can (say) terminate the scan early due to the
presence of an "ORDER BY ... LIMIT n". But I suspect that that won't
work out much of the time, because the planner will believe (rightly
or wrongly) that the filter quals will incur too many heap page
accesses.
That's probably related to the fact that we don't have a mechanism
for evaluating non-indexed quals against columns that are retrievable
from the index. We really oughta work on getting that done. But
I've been keeping a side eye on the work to unify plain and index-only
scans, because that's going to touch a lot of the same code so it
doesn't seem profitable to pursue those goals in parallel.
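A sketch of the missing mechanism being described (invented schema; the filter references a column the index stores but cannot use as an index qual):

```sql
-- With an index on (a, b), "(b % 2) = 0" cannot be an index qual, yet
-- b's value is retrievable from the index tuple. Lacking the mechanism
-- above, a plain index scan evaluates such filters against heap tuples:
EXPLAIN SELECT a, b FROM t WHERE a = 1 AND (b % 2) = 0;
```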
Another problem (at least as I see it) with the new
or_to_any_transform_limit GUC is that its design seems to have nothing
to say about the importance of these sorts of cases. Most of these
cases will only have 2 or 3 constants, just because that's what's most
common in general.
Yeah, that's one of the reasons I'm dubious that the committed
patch was ready.
regards, tom lane
On Tue, Apr 9, 2024 at 12:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Some of the most compelling cases for the transformation will involve
path keys. If the transformation enables the optimizer to build a
plain index scan (or index-only scan) with useful path keys, then that
might well lead to a far superior plan compared to what's possible
with BitmapOrs.

I did not say it isn't a useful thing to have. I said the patch

did not appear ready to go in.
Didn't mean to suggest otherwise. I just wanted to hear your thoughts
on this aspect of these sorts of transformations.
I understand that it'll still be possible to use OR expression
evaluation in such cases, without applying the transformation (via
filter quals), so in principle you don't need the transformation to
get an index scan that can (say) terminate the scan early due to the
presence of an "ORDER BY ... LIMIT n". But I suspect that that won't
work out much of the time, because the planner will believe (rightly
or wrongly) that the filter quals will incur too many heap page
accesses.

That's probably related to the fact that we don't have a mechanism
for evaluating non-indexed quals against columns that are retrievable
from the index. We really oughta work on getting that done.
I agree that that is very important work, but I'm not sure that it
makes all that much difference here. Even if we had that improved
mechanism already, today, using index quals would still be strictly
better than expression evaluation. Index quals can allow nbtree to
skip over irrelevant parts of the index as the need arises, which is a
significant advantage in its own right.
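Concretely (invented schema again), the skipping advantage looks like this:

```sql
-- As an index qual, nbtree can descend once per array element and skip
-- the irrelevant index ranges in between, rather than scanning them
-- and filtering; expression evaluation gets no such opportunity:
EXPLAIN SELECT * FROM t WHERE a = ANY (ARRAY[1, 500000, 1000000]);
```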
ISTM that the planner should always prefer index quals over expression
evaluation, on general principle, even when there's no reason to think
it'll work out. At worst the executor has essentially the same
physical access patterns as the expression evaluation case. On the
other hand, providing nbtree with that context might end up being a
great deal faster.
--
Peter Geoghegan
On Tue, Apr 9, 2024 at 8:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrei Lepikhov <a.lepikhov@postgrespro.ru> writes:
On 9/4/2024 09:12, Tom Lane wrote:
I have another one that I'm not terribly happy about:
Author: Alexander Korotkov <akorotkov@postgresql.org>
Branch: master [72bd38cc9] 2024-04-08 01:27:52 +0300
Transform OR clauses to ANY expression

* What the medical community would call off-label usage of
query jumbling. I'm not sure this is even correct as-used,
and for sure it's using that code for something never intended.
Nor is the added code adequately (as in, at all) documented.

I agree about the documentation, but disagree with the criticism of the
expression jumbling. It was introduced in the core. Why don't we allow
it to be used to speed up machinery with some hashing?

I would back up from that a good deal: why do we need to hash here in
the first place? There's no evidence I'm aware of that it's needful
from a performance standpoint.
I think the feature is aimed at dealing with large OR lists. I've seen
significant degradation with 10000 or-clause groups. That might seem
like an awful lot, but it's not unachievable in generated queries.
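Such queries are easy to produce; for example, a quick psql sketch (table and column names invented) that builds and runs a 10000-branch OR:

```sql
-- Build a query with 10000 OR-ed equality clauses, then execute the
-- generated statement with psql's \gexec:
SELECT 'SELECT count(*) FROM t WHERE '
       || string_agg(format('a = %s', g), ' OR ')
FROM generate_series(1, 10000) AS g \gexec
```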
Links.
1. /messages/by-id/CAPpHfduJtO0s9E=SHUTzrCD88BH0eik0UNog1_q3XBF2wLmH6g@mail.gmail.com
------
Regards,
Alexander Korotkov
On Tue, Apr 9, 2024 at 5:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
* OrClauseGroupKey is not a Node type, so why does it have
a NodeTag? I wonder what value will appear in that field,
and what will happen if the struct is passed to any code
that expects real Nodes.
I used that to put the not-subject-of-transform nodes together with the
hash entries into the same list. This is used to preserve the order of
clauses, which I think is an important property, and one I have already
explained in [1]. That could be achieved without adding a NodeTag to
the hash entries, but it would require a level of indirection. The
struct isn't passed to code that expects real Nodes; it doesn't go into
anything except lists.
Links.
1. /messages/by-id/CAPpHfdutHt31sdt2rfU=4fsDMWxf6tvtnHARgCzLY2Tf21+fgw@mail.gmail.com
------
Regards,
Alexander Korotkov