BRIN autosummarization lacking a snapshot

Started by Alvaro Herrera, 6 months ago, 8 messages, pgsql-hackers
#1 Alvaro Herrera <alvherre@2ndquadrant.com>

If you have a BRIN index with autosummarize=on, and it's based on a
function that requires a snapshot, autovacuum will simply error out when
trying to summarize a range. Here's a reproducer:

create table journal (d timestamp);
create function packdate(d timestamp) returns text as $$
begin return to_char(d, 'yyyymm'); end; $$
language plpgsql returns null on null input immutable;
create index on journal using brin (packdate(d))
with (autosummarize = on, pages_per_range = 1);

Now insert some data,
insert into journal select generate_series(timestamp '2025-01-01', '2025-12-31', '1 day');

and wait for autovacuum to fire. You'll get an error like

2025-11-03 12:17:42.263 CET [536755] ERROR: cannot execute SQL without an outer snapshot or portal

This patch fixes it. I haven't given much thought to adding a good way
to test this yet ...

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Someone said that it is at least an order of magnitude more work to do
production software than a prototype. I think he is wrong by at least
an order of magnitude." (Brian Kernighan)

Attachments:

0001-Brin-autosummarization-may-need-a-snapshot.patch (text/x-diff, +2 -1)
#2 Michael Paquier <michael@paquier.xyz>
In reply to: Alvaro Herrera (#1)
Re: BRIN autosummarization lacking a snapshot

On Mon, Nov 03, 2025 at 12:21:50PM +0100, Alvaro Herrera wrote:

This patch fixes it. I haven't given much thought to adding a good way
to test this yet ...

Spawning an autovacuum worker can feel artistic as we try to make the
tests run fast, but it's not that bad. The trick is to use an
"autovacuum_naptime = 1". Then you could either scan the server logs
for some 'autovacuum: processing database "blah"', or just a polling
query based on pg_stat_all_tables.autovacuum_count. See for example
006_signal_autovacuum.pl.
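The polling side could be a query along these lines (a sketch; 'journal' is the table from the reproducer upthread, and a TAP test would run this under poll_query_until with autovacuum_naptime = 1):

```sql
-- Returns true once autovacuum has processed the table at least once.
SELECT autovacuum_count > 0
  FROM pg_stat_all_tables
 WHERE relname = 'journal';
```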
--
Michael

#3 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Michael Paquier (#2)
Re: BRIN autosummarization lacking a snapshot

On 2025-Nov-04, Michael Paquier wrote:

Spawning an autovacuum worker can feel artistic as we try to make the
tests run fast, but it's not that bad. The trick is to use an
"autovacuum_naptime = 1". Then you could either scan the server logs
for some 'autovacuum: processing database "blah"', or just a polling
query based on pg_stat_all_tables.autovacuum_count. See for example
006_signal_autovacuum.pl.

Ah yes ... and, actually, we already have a file doing a closely related
thing, so I added to it. Here's the patch for master. Backbranches are
essentially identical, modulo these changes for 13 and 14:

-use Test::More tests => 2;
+use Test::More tests => 4;

I'm glad we got rid of that :-)

With my initial try of this test, just counting the number of BRIN
tuples, I was _really_ surprised that the index did indeed contain the
expected number of tuples, even when the error was being thrown. This
turned out to be expected, because the way BRIN summarization works is
that we insert a placeholder tuple first, then update it to the correct
value, and the error only aborts the second part. That's why I needed
to add a WHERE clause to only count non-placeholder tuples.
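The counting query ends up looking roughly like this (assuming the pageinspect extension; the index name and page number here are the test's own, as seen later in the server log):

```sql
-- Count only fully-summarized BRIN tuples; placeholder tuples left
-- behind by an aborted summarization are excluded.
SELECT count(*)
  FROM brin_page_items(get_raw_page('brin_wi_idx', 2),
                       'brin_wi_idx'::regclass)
 WHERE NOT placeholder;
```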

I also added a 'sleep(1)', to avoid looping on the query when we know
autovacuum can't possibly have had a chance to run yet.

I unleashed CI on branches 15 and master, and will push soon if they
both turn green.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"La virtud es el justo medio entre dos defectos" (Aristóteles)

Attachments:

v2-0001-BRIN-autosummarization-may-need-a-snapshot.patch (text/x-diff, +44 -5)
#4 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Alvaro Herrera (#3)
Re: BRIN autosummarization lacking a snapshot

With my initial try of this test, just counting the number of BRIN
tuples, I was _really_ surprised that the index did indeed contain the
expected number of tuples, even when the error was being thrown. This
turned out to be expected, because the way BRIN summarization works is
that we insert a placeholder tuple first, then update it to the correct
value, and the error only aborts the second part.

One thing that's not fully clear to me, but which I'll test later: if
this has happened to you, the placeholder tuple remains in place and
never becomes non-placeholder. If vacuum (incl. autovacuum)
sees such a tuple, it will gladly ignore the page range, as if it were
already summarized. This makes your index scans potentially
inefficient, because for placeholder tuples bringetbitmap will always
include the affected page range.

I think we should make vacuum clean it up somehow, but I'm not yet sure
how safe it is.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"El Maquinismo fue proscrito so pena de cosquilleo hasta la muerte"
(Ijon Tichy en Viajes, Stanislaw Lem)

#5 Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
In reply to: Alvaro Herrera (#4)
Re: BRIN autosummarization lacking a snapshot

Hi,

On Tue, Nov 4, 2025 at 5:22 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:

With my initial try of this test, just counting the number of BRIN
tuples, I was _really_ surprised that the index did indeed contain the
expected number of tuples, even when the error was being thrown. This
turned out to be expected, because the way BRIN summarization works is
that we insert a placeholder tuple first, then update it to the correct
value, and the error only aborts the second part.

One thing that's not fully clear to me, but will test later, is that if
this has happened to you, then the placeholder tuple remains in place
and doesn't ever become non-placeholder. If vacuum (incl. autovacuum)
sees such a tuple, it will gladly ignore the page range, as if it were
already summarized. This makes your index scans potentially
inefficient, because for placeholder tuples bringetbitmap will always
include the affected page range.

I think we should make vacuum clean it up somehow, but I'm not yet sure
how safe it is.

+1. I noticed the same behavior while working on amcheck support for BRIN.

I think it should be safe, because we hold ShareUpdateExclusiveLock
during any summarization, desummarization, or vacuum, so there can be
no summarization concurrent with the vacuum. Therefore, if we
encounter a placeholder during vacuum, it is always the result of an
"aborted" summarization; at least brinRevmapDesummarizeRange always
treats placeholders this way. Maybe we can use
brinRevmapDesummarizeRange for this? FWIW there is a deadlock issue
with brinRevmapDesummarizeRange [0], which may be worth addressing if
we want to start using it during vacuum.

[0]: /messages/by-id/261e68bc-f5f5-5234-fb2c-af4f583513c0@enterprisedb.com

Best regards,
Arseniy Mukhin

#6 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Alvaro Herrera (#3)
Re: BRIN autosummarization lacking a snapshot

On 2025-Nov-04, Álvaro Herrera wrote:

With my initial try of this test, just counting the number of BRIN
tuples, I was _really_ surprised that the index did indeed contain the
expected number of tuples, even when the error was being thrown. This
turned out to be expected, because the way BRIN summarization works is
that we insert a placeholder tuple first, then update it to the correct
value, and the error only aborts the second part. That's why I needed
to add a WHERE clause to only count non-placeholder tuples.

I see that skink (buildfarm animal that runs under valgrind) has failed.
Gotta vacate the premises, will study later.

2025-11-04 16:44:46.271 CET [2118443][autovacuum worker][108/6:0] LOG: process 2118443 still waiting for ShareUpdateExclusiveLock on relation 16436 of database 5 after 1016.528 ms
2025-11-04 16:44:46.271 CET [2118443][autovacuum worker][108/6:0] DETAIL: Process holding the lock: 2118078. Wait queue: 2118443.
2025-11-04 16:44:46.271 CET [2118443][autovacuum worker][108/6:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:46.298 CET [2118078][autovacuum worker][103/9:766] ERROR: canceling autovacuum task
2025-11-04 16:44:46.298 CET [2118078][autovacuum worker][103/9:766] CONTEXT: automatic analyze of table "postgres.public.journal"
2025-11-04 16:44:46.382 CET [2118443][autovacuum worker][108/6:0] LOG: process 2118443 acquired ShareUpdateExclusiveLock on relation 16436 of database 5 after 1188.860 ms
2025-11-04 16:44:46.382 CET [2118443][autovacuum worker][108/6:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:46.975 CET [2118946][autovacuum worker][110/7:0] LOG: skipping analyze of "journal" --- lock not available
2025-11-04 16:44:47.490 CET [2118078][autovacuum worker][103/10:0] LOG: process 2118078 still waiting for ShareUpdateExclusiveLock on relation 16436 of database 5 after 1017.402 ms
2025-11-04 16:44:47.490 CET [2118078][autovacuum worker][103/10:0] DETAIL: Process holding the lock: 2118443. Wait queue: 2118078, 2118946.
2025-11-04 16:44:47.490 CET [2118078][autovacuum worker][103/10:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:47.792 CET [2118443][autovacuum worker][108/6:0] ERROR: canceling autovacuum task
2025-11-04 16:44:47.792 CET [2118443][autovacuum worker][108/6:0] CONTEXT: processing work entry for relation "postgres.public.brin_packdate_idx"
2025-11-04 16:44:47.810 CET [2118078][autovacuum worker][103/10:0] LOG: process 2118078 acquired ShareUpdateExclusiveLock on relation 16436 of database 5 after 1414.103 ms
2025-11-04 16:44:47.810 CET [2118078][autovacuum worker][103/10:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
==2118443== VALGRINDERROR-BEGIN
==2118443== Invalid read of size 8
==2118443== at 0x4634F39: PopActiveSnapshot (snapmgr.c:777)
==2118443== by 0x43F693F: do_autovacuum (autovacuum.c:2561)
==2118443== by 0x43F6E2B: AutoVacWorkerMain (autovacuum.c:1604)
==2118443== by 0x43FA9C9: postmaster_child_launch (launch_backend.c:268)
==2118443== by 0x43FDD9E: StartChildProcess (postmaster.c:3991)
==2118443== by 0x43FE008: StartAutovacuumWorker (postmaster.c:4055)
==2118443== by 0x43FF078: process_pm_pmsignal (postmaster.c:3812)
==2118443== by 0x43FF93C: ServerLoop (postmaster.c:1706)
==2118443== by 0x4401080: PostmasterMain (postmaster.c:1403)
==2118443== by 0x432A55F: main (main.c:231)
==2118443== Address 0x10 is not stack'd, malloc'd or (recently) free'd
==2118443==
==2118443== VALGRINDERROR-END
{
<insert_a_suppression_name_here>
Memcheck:Addr8
fun:PopActiveSnapshot
fun:do_autovacuum
fun:AutoVacWorkerMain
fun:postmaster_child_launch
fun:StartChildProcess
fun:StartAutovacuumWorker
fun:process_pm_pmsignal
fun:ServerLoop
fun:PostmasterMain
fun:main
}
==2118443==
==2118443== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==2118443== Access not within mapped region at address 0x10
==2118443== at 0x4634F39: PopActiveSnapshot (snapmgr.c:777)
==2118443== by 0x43F693F: do_autovacuum (autovacuum.c:2561)
==2118443== by 0x43F6E2B: AutoVacWorkerMain (autovacuum.c:1604)
==2118443== by 0x43FA9C9: postmaster_child_launch (launch_backend.c:268)
==2118443== by 0x43FDD9E: StartChildProcess (postmaster.c:3991)
==2118443== by 0x43FE008: StartAutovacuumWorker (postmaster.c:4055)
==2118443== by 0x43FF078: process_pm_pmsignal (postmaster.c:3812)
==2118443== by 0x43FF93C: ServerLoop (postmaster.c:1706)
==2118443== by 0x4401080: PostmasterMain (postmaster.c:1403)
==2118443== by 0x432A55F: main (main.c:231)
==2118443== If you believe this happened as a result of a stack
==2118443== overflow in your program's main thread (unlikely but
==2118443== possible), you can try to increase the size of the
==2118443== main thread stack using the --main-stacksize= flag.
==2118443== The main thread stack size used in this run was 8388608.
2025-11-04 16:44:48.010 CET [2118946][autovacuum worker][110/8:0] LOG: process 2118946 still waiting for ShareUpdateExclusiveLock on relation 16436 of database 5 after 1013.910 ms
2025-11-04 16:44:48.010 CET [2118946][autovacuum worker][110/8:0] DETAIL: Process holding the lock: 2118078. Wait queue: 2118946.
2025-11-04 16:44:48.010 CET [2118946][autovacuum worker][110/8:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:48.098 CET [2118946][autovacuum worker][110/8:0] LOG: process 2118946 acquired ShareUpdateExclusiveLock on relation 16436 of database 5 after 1111.825 ms
2025-11-04 16:44:48.098 CET [2118946][autovacuum worker][110/8:0] CONTEXT: waiting for ShareUpdateExclusiveLock on relation 16436 of database 5
2025-11-04 16:44:48.482 CET [2119181][client backend][8/2:0] LOG: statement: select count(*) from brin_page_items(get_raw_page('brin_wi_idx', 2), 'brin_wi_idx'::regclass)
where not placeholder;
2025-11-04 16:44:48.799 CET [2109989][postmaster][:0] LOG: autovacuum worker (PID 2118443) was terminated by signal 11: Segmentation fault
2025-11-04 16:44:48.799 CET [2109989][postmaster][:0] DETAIL: Failed process was running: autovacuum: BRIN summarize public.brin_packdate_idx 1

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

#7 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Alvaro Herrera (#6)
Re: BRIN autosummarization lacking a snapshot

On 2025-Nov-04, Álvaro Herrera wrote:

2025-11-04 16:44:47.792 CET [2118443][autovacuum worker][108/6:0] ERROR: canceling autovacuum task
2025-11-04 16:44:47.792 CET [2118443][autovacuum worker][108/6:0] CONTEXT: processing work entry for relation "postgres.public.brin_packdate_idx"
2025-11-04 16:44:47.810 CET [2118078][autovacuum worker][103/10:0] LOG: process 2118078 acquired ShareUpdateExclusiveLock on relation 16436 of database 5 after 1414.103 ms

==2118443== VALGRINDERROR-BEGIN
==2118443== Invalid read of size 8
==2118443== at 0x4634F39: PopActiveSnapshot (snapmgr.c:777)
==2118443== by 0x43F693F: do_autovacuum (autovacuum.c:2561)
==2118443== by 0x43F6E2B: AutoVacWorkerMain (autovacuum.c:1604)

Ah, the problem is that perform_work_item() pushes an active snapshot
before calling the BRIN function, but because of the error, we
terminate the transaction before returning, so when we try to pop that
snapshot, it no longer exists. Another way for this to happen would be
an autovacuum worker trying to run an item for a dropped table (the
"goto deleted2" case). Hmm ...

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#8 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Alvaro Herrera (#7)
Re: BRIN autosummarization lacking a snapshot

On 2025-Nov-04, Álvaro Herrera wrote:

Ah, the problem is that perform_work_item() pushes an active snapshot
before calling the BRIN function, but because of the error, we
terminate the transaction before returning, so when we try to Pop that
snapshot, it doesn't exist anymore. Other ways for this to happen would
be an autovacuum worker trying to run an item for a dropped table (the
"goto deleted2" case).

Of course, the answer is just to check ActiveSnapshotSet() beforehand.
I tested this by adding an elog(ERROR) in perform_work_item() -- without
this fix, I see a segfault, which disappears with it.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Debido a que la velocidad de la luz es mucho mayor que la del sonido,
algunas personas nos parecen brillantes un minuto antes
de escuchar las pelotudeces que dicen." (Roberto Fontanarrosa)

Attachments:

0001-fix-brin-failure.patch (text/x-diff, +2 -2)