why there is not VACUUM FULL CONCURRENTLY?
Hi
I have one question, what is a block of implementation of some variant of
VACUUM FULL like REINDEX CONCURRENTLY? Why similar mechanism of REINDEX
CONCURRENTLY cannot be used for VACUUM FULL?
Regards
Pavel
On Tue, Jan 30, 2024 at 09:01:57AM +0100, Pavel Stehule wrote:
I have one question, what is a block of implementation of some variant of
VACUUM FULL like REINDEX CONCURRENTLY? Why similar mechanism of REINDEX
CONCURRENTLY cannot be used for VACUUM FULL?
You may be interested in these threads:
/messages/by-id/CAB7nPqTGmNUFi+W6F1iwmf7J-o6sY+xxo6Yb=mkUVYT-CG-B5A@mail.gmail.com
/messages/by-id/CAB7nPqTys6JUQDxUczbJb0BNW0kPrW8WdZuk11KaxQq6o98PJg@mail.gmail.com
VACUUM FULL is CLUSTER under the hoods. One may question whether it
is still a relevant discussion these days if we assume that autovacuum
is able to keep up, because it always keeps up with the house cleanup,
right? ;)
More seriously, we have a lot more options these days with VACUUM like
PARALLEL, so CONCURRENTLY may still have some uses, but the new toys
available may have changed things. So, would it be worth the
complexities around heap manipulations that lower locks would require?
--
Michael
út 30. 1. 2024 v 9:14 odesílatel Michael Paquier <michael@paquier.xyz>
napsal:
On Tue, Jan 30, 2024 at 09:01:57AM +0100, Pavel Stehule wrote:
I have one question, what is a block of implementation of some variant of
VACUUM FULL like REINDEX CONCURRENTLY? Why similar mechanism of REINDEX
CONCURRENTLY cannot be used for VACUUM FULL?You may be interested in these threads:
/messages/by-id/CAB7nPqTGmNUFi+W6F1iwmf7J-o6sY+xxo6Yb=mkUVYT-CG-B5A@mail.gmail.com
/messages/by-id/CAB7nPqTys6JUQDxUczbJb0BNW0kPrW8WdZuk11KaxQq6o98PJg@mail.gmail.com
VACUUM FULL is CLUSTER under the hoods. One may question whether it
is still a relevant discussion these days if we assume that autovacuum
is able to keep up, because it always keeps up with the house cleanup,
right? ;)More seriously, we have a lot more options these days with VACUUM like
PARALLEL, so CONCURRENTLY may still have some uses, but the new toys
available may have changed things. So, would it be worth the
complexities around heap manipulations that lower locks would require?
One of my customer today is reducing one table from 140GB to 20GB. Now he
is able to run archiving. He should play with pg_repack, and it is working
well today, but I ask myself, what pg_repack does not be hard to do
internally because it should be done for REINDEX CONCURRENTLY. This is not
a common task, and not will be, but on the other hand, it can be nice to
have feature, and maybe not too hard to implement today. But I didn't try it
I'll read the threads
Pavel
--
Show quoted text
Michael
On 2024-Jan-30, Pavel Stehule wrote:
One of my customer today is reducing one table from 140GB to 20GB. Now he
is able to run archiving. He should play with pg_repack, and it is working
well today, but I ask myself, what pg_repack does not be hard to do
internally because it should be done for REINDEX CONCURRENTLY. This is not
a common task, and not will be, but on the other hand, it can be nice to
have feature, and maybe not too hard to implement today. But I didn't try it
FWIW a newer, more modern and more trustworthy alternative to pg_repack
is pg_squeeze, which I discovered almost by random chance, and soon
discovered I liked it much more.
So thinking about your question, I think it might be possible to
integrate a tool that works like pg_squeeze, such that it runs when
VACUUM is invoked -- either under some new option, or just replace the
code under FULL, not sure. If the Cybertec people allows it, we could
just grab the pg_squeeze code and add it to the things that VACUUM can
run.
Now, pg_squeeze has some additional features, such as periodic
"squeezing" of tables. In a first attempt, for simplicity, I would
leave that stuff out and just allow it to run from the user invoking it,
and then have the command to do a single run. (The scheduling features
could be added later, or somehow integrated into autovacuum, or maybe
something else.)
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"We're here to devour each other alive" (Hobbes)
út 30. 1. 2024 v 11:31 odesílatel Alvaro Herrera <alvherre@alvh.no-ip.org>
napsal:
On 2024-Jan-30, Pavel Stehule wrote:
One of my customer today is reducing one table from 140GB to 20GB. Now
he
is able to run archiving. He should play with pg_repack, and it is
working
well today, but I ask myself, what pg_repack does not be hard to do
internally because it should be done for REINDEX CONCURRENTLY. This isnot
a common task, and not will be, but on the other hand, it can be nice to
have feature, and maybe not too hard to implement today. But I didn'ttry it
FWIW a newer, more modern and more trustworthy alternative to pg_repack
is pg_squeeze, which I discovered almost by random chance, and soon
discovered I liked it much more.So thinking about your question, I think it might be possible to
integrate a tool that works like pg_squeeze, such that it runs when
VACUUM is invoked -- either under some new option, or just replace the
code under FULL, not sure. If the Cybertec people allows it, we could
just grab the pg_squeeze code and add it to the things that VACUUM can
run.Now, pg_squeeze has some additional features, such as periodic
"squeezing" of tables. In a first attempt, for simplicity, I would
leave that stuff out and just allow it to run from the user invoking it,
and then have the command to do a single run. (The scheduling features
could be added later, or somehow integrated into autovacuum, or maybe
something else.)
some basic variant (without autovacuum support) can be good enough. We have
no autovacuum support for REINDEX CONCURRENTLY and I don't see a necessity
for it (sure, it can be limited by my perspective) . The necessity of
reducing table size is not too common (a lot of use cases are better
covered by using partitioning), but sometimes it is, and then buildin
simple available solution can be helpful.
Show quoted text
--
Álvaro Herrera PostgreSQL Developer —
https://www.EnterpriseDB.com/
"We're here to devour each other alive" (Hobbes)
On 2024-Jan-30, Pavel Stehule wrote:
some basic variant (without autovacuum support) can be good enough. We have
no autovacuum support for REINDEX CONCURRENTLY and I don't see a necessity
for it (sure, it can be limited by my perspective) . The necessity of
reducing table size is not too common (a lot of use cases are better
covered by using partitioning), but sometimes it is, and then buildin
simple available solution can be helpful.
That's my thinking as well.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On Tue, Jan 30, 2024 at 12:37:12PM +0100, Alvaro Herrera wrote:
On 2024-Jan-30, Pavel Stehule wrote:
some basic variant (without autovacuum support) can be good enough. We have
no autovacuum support for REINDEX CONCURRENTLY and I don't see a necessity
for it (sure, it can be limited by my perspective) . The necessity of
reducing table size is not too common (a lot of use cases are better
covered by using partitioning), but sometimes it is, and then buildin
simple available solution can be helpful.That's my thinking as well.
Or, yes, I'd agree about that. This can make for a much better user
experience. I'm just not sure how that stuff would be shaped and how
much ground it would need to cover.
--
Michael
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2024-Jan-30, Pavel Stehule wrote:
One of my customer today is reducing one table from 140GB to 20GB. Now he
is able to run archiving. He should play with pg_repack, and it is working
well today, but I ask myself, what pg_repack does not be hard to do
internally because it should be done for REINDEX CONCURRENTLY. This is not
a common task, and not will be, but on the other hand, it can be nice to
have feature, and maybe not too hard to implement today. But I didn't try itFWIW a newer, more modern and more trustworthy alternative to pg_repack
is pg_squeeze, which I discovered almost by random chance, and soon
discovered I liked it much more.So thinking about your question, I think it might be possible to
integrate a tool that works like pg_squeeze, such that it runs when
VACUUM is invoked -- either under some new option, or just replace the
code under FULL, not sure. If the Cybertec people allows it, we could
just grab the pg_squeeze code and add it to the things that VACUUM can
run.
There are no objections from Cybertec. Nevertheless, I don't expect much code
to be just copy & pasted. If I started to implement the extension today, I'd
do some things in a different way. (Some things might actually be simpler in
the core, i.e. a few small changes in PG core are easier than the related
workarounds in the extension.)
The core idea is that: 1) a "historic snapshot" is used to get the current
contents of the table, 2) logical decoding is used to capture the changes done
while the data is being copied to new storage, 3) the exclusive lock on the
table is only taken for very short time, to swap the storage (relfilenode) of
the table.
I think it should be coded in a way that allows use by VACUUM FULL, CLUSTER,
and possibly some subcommands of ALTER TABLE. For example, some users of
pg_squeeze requested an enhancement that allows the user to change column data
type w/o service disruption (typically when it appears that integer type is
going to overflow and change bigint is needed).
Online (re)partitioning could be another use case, although I admit that
commands that change the system catalog are a bit harder to implement than
VACUUM FULL / CLUSTER.
One thing that pg_squeeze does not handle is visibility: it uses heap_insert()
to insert the tuples into the new storage, so the problems described in [1]https://www.postgresql.org/docs/current/mvcc-caveats.html
can appear. The in-core implementation should rather do something like tuple
rewriting (rewriteheap.c).
Is your plan to work on it soon or should I try to write a draft patch? (I
assume this is for PG >= 18.)
[1]: https://www.postgresql.org/docs/current/mvcc-caveats.html
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
This is great to hear.
On 2024-Jan-31, Antonin Houska wrote:
Is your plan to work on it soon or should I try to write a draft patch? (I
assume this is for PG >= 18.)
I don't have plans for it, so if you have resources, please go for it.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
This is great to hear.
On 2024-Jan-31, Antonin Houska wrote:
Is your plan to work on it soon or should I try to write a draft patch? (I
assume this is for PG >= 18.)I don't have plans for it, so if you have resources, please go for it.
ok, I'm thinking how can the feature be integrated into the core.
BTW, I'm failing to understand why cluster_rel() has no argument of the
BufferAccessStrategy type. According to buffer/README, the criterion for using
specific strategy is that page "is unlikely to be needed again
soon". Specifically for cluster_rel(), the page will *definitely* not be used
again (unless the VACCUM FULL/CLUSTER command fails): BufferTag contains the
relatin file number and the old relation file is eventually dropped.
Am I missing anything?
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
On 2024-Feb-16, Antonin Houska wrote:
BTW, I'm failing to understand why cluster_rel() has no argument of the
BufferAccessStrategy type. According to buffer/README, the criterion for using
specific strategy is that page "is unlikely to be needed again
soon". Specifically for cluster_rel(), the page will *definitely* not be used
again (unless the VACCUM FULL/CLUSTER command fails): BufferTag contains the
relatin file number and the old relation file is eventually dropped.Am I missing anything?
No, that's just an oversight. Access strategies are newer than that
cluster code.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Most hackers will be perfectly comfortable conceptualizing users as entropy
sources, so let's move on." (Nathaniel Smith)
https://mail.gnu.org/archive/html/monotone-devel/2007-01/msg00080.html
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Is your plan to work on it soon or should I try to write a draft patch? (I
assume this is for PG >= 18.)I don't have plans for it, so if you have resources, please go for it.
The first version is attached. The actual feature is in 0003. 0004 is probably
not necessary now, but I haven't realized until I coded it.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Attachments:
v01-0001-Adjust-signature-of-cluster_rel-and-its-subroutines.patchtext/x-diffDownload+99-69
v01-0002-Move-progress-related-fields-from-PgBackendStatus-to.patchtext/x-diffDownload+30-27
v01-0003-Add-CONCURRENTLY-option-to-both-VACUUM-FULL-and-CLUS.patchtext/plainDownload+3876-260
v01-0004-Call-logical_rewrite_heap_tuple-when-applying-concur.patchtext/x-diffDownload+187-60
Antonin Houska <ah@cybertec.at> wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Is your plan to work on it soon or should I try to write a draft patch? (I
assume this is for PG >= 18.)I don't have plans for it, so if you have resources, please go for it.
The first version is attached. The actual feature is in 0003. 0004 is probably
not necessary now, but I haven't realized until I coded it.
The mailing list archive indicates something is wrong with the 0003
attachment. Sending it all again, as *.tar.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Attachments:
vacuum_full_concurrently_v01.tarapplication/x-tarDownload+4192-416
On 2024-Jul-09, Antonin Houska wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Is your plan to work on it soon or should I try to write a draft patch? (I
assume this is for PG >= 18.)I don't have plans for it, so if you have resources, please go for it.
The first version is attached. The actual feature is in 0003. 0004 is probably
not necessary now, but I haven't realized until I coded it.
Thank you, this is great. I'll be studying this during the next
commitfest.
BTW I can apply 0003 from this email perfectly fine, but you're right
that the archives don't show the file name. I suspect the
"Content-Disposition: inline" PLUS the Content-Type text/plain are what
cause the problem -- for instance, [1]/messages/by-id/32781.1714378236@antos doesn't have a problem and they
do have inline content disposition, but the content-type is not
text/plain. In any case, I encourage you not to send patches as
tarballs :-)
[1]: /messages/by-id/32781.1714378236@antos
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"La primera ley de las demostraciones en vivo es: no trate de usar el sistema.
Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2024-Jul-09, Antonin Houska wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Is your plan to work on it soon or should I try to write a draft patch? (I
assume this is for PG >= 18.)I don't have plans for it, so if you have resources, please go for it.
The first version is attached. The actual feature is in 0003. 0004 is probably
not necessary now, but I haven't realized until I coded it.Thank you, this is great. I'll be studying this during the next
commitfest.
Thanks. I'll register it in the CF application.
BTW I can apply 0003 from this email perfectly fine, but you're right
that the archives don't show the file name. I suspect the
"Content-Disposition: inline" PLUS the Content-Type text/plain are what
cause the problem -- for instance, [1] doesn't have a problem and they
do have inline content disposition, but the content-type is not
text/plain. In any case, I encourage you not to send patches as
tarballs :-)
You're right, "Content-Disposition" is the problem. I forgot that "attachment"
is better for patches and my email client (emacs+nmh) defaults to
"inline". I'll pay attention next time.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Hi!
I'm interested in the vacuum concurrently feature being inside the
core, so will try to review patch set and give valuable feedback. For
now, just a few little thoughts..
The first version is attached. The actual feature is in 0003. 0004 is probably
not necessary now, but I haven't realized until I coded it.
The logical replication vacuum approach is a really smart idea, I like
it. As far as I understand, pg_squeeze works well in real production
databases, which
gives us hope that the vacuum concurrent feature in core will be good
too... What is the size of the biggest relation successfully vacuumed
via pg_squeeze?
Looks like in case of big relartion or high insertion load,
replication may lag and never catch up...
However, in general, the 3rd patch is really big, very hard to
comprehend. Please consider splitting this into smaller (and
reviewable) pieces.
Also, we obviously need more tests on this. Both tap-test and
regression tests I suppose.
One more thing is about pg_squeeze background workers. They act in an
autovacuum-like fashion, aren't they? Maybe we can support this kind
of relation processing in core too?
Hi
ne 21. 7. 2024 v 17:13 odesílatel Kirill Reshke <reshkekirill@gmail.com>
napsal:
Hi!
I'm interested in the vacuum concurrently feature being inside the
core, so will try to review patch set and give valuable feedback. For
now, just a few little thoughts..One more thing is about pg_squeeze background workers. They act in an
autovacuum-like fashion, aren't they? Maybe we can support this kind
of relation processing in core too?
I don't think it is necessary when this feature will be an internal
feature.
I agree so this feature is very important, I proposed it (and I very happy
so Tonda implemented it), but I am not sure, if usage of this should be
automatized, and if it should be, then
a) probably autovacuum should do,
b) we can move a discussion after vacuum full concurrently will be merged
to upstream, please. Isn't very practical to have too many open targets.
Regards
Pavel
Also, we obviously need more tests on this. Both tap-test and
regression tests I suppose.
The one simple test to this patch can be done this way:
1) create test relation (call it vac_conc_r1 for example) and fill it
with dead tuples (insert + update or insert + delete)
2) create injection point preventing concurrent vacuum from compiling.
3) run concurrent vacuum (VACUUM FULL CONCURRENTLY) in separate thread
or in some other async way.
4) Insert new data in relation to vac_conc_r1.
5) Release injection point, assert that vacuum completed successfully.
6) check that all data is present in vac_conc_r1 (data from step 1 and
from step 4).
This way we can catch some basic buggs, if some paths of VACUUM
CONCURRENTLY will be touched in the future.
The problem with this test is: i don't know how to do anything async
in current TAP tests (needed in step 3). Also, maybe test with async
interaction
may be too flappy (producing false negative flaps) to support.
Sequential test for this feature would be much better, but I can't
think of one.
Also, should we create a cf entry for this thread already?
On Mon, Jul 22, 2024 at 01:23:03PM +0500, Kirill Reshke wrote:
Also, should we create a cf entry for this thread already?
I was wondering about this as well, but there is one for the upcoming
commitfest already:
https://commitfest.postgresql.org/49/5117/
Michael
Hi!
On Tue, 30 Jan 2024 at 15:31, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
FWIW a newer, more modern and more trustworthy alternative to pg_repack
is pg_squeeze, which I discovered almost by random chance, and soon
discovered I liked it much more.
Can you please clarify this a bit more? What is the exact reason for
pg_squeeze being more trustworthy than pg_repack?
Is there something about the logical replication approach that makes
it more bulletproof than the trigger-based repack approach?
Also, I was thinking about pg_repack vs pg_squeeze being used for the
VACUUM FULL CONCURRENTLY feature, and I'm a bit suspicious about the
latter.
If I understand correctly, we essentially parse the whole WAL to
obtain info about one particular relation changes. That may be a big
overhead, whereas the trigger approach does
not suffer from this. So, there is the chance that VACUUM FULL
CONCURRENTLY will never keep up with vacuumed relation changes. Am I
right?