Understanding WAL - large amount of activity from removing data

Started by Isaac Morland about 3 years ago, 3 messages
#1 Isaac Morland
isaac.morland@gmail.com

I'm encountering some surprising (to me) behaviour related to WAL, and I'm
wondering if anybody can point me at an article that might help me
understand what is happening, or give a brief explanation.

I'm trying to make a slimmed down version of my database for testing
purposes. As part of this, I'm running a query something like this:

UPDATE table1
SET pdfcolumn = 'redacted'
WHERE pdfcolumn IS NOT NULL;

(literally 'redacted', not redacted here for your benefit)

The idea is to replace the actual contents of the column, which are PDF
documents totalling 70GB, with just a short placeholder value, without
affecting the other columns, which are a more ordinary collection - a few
integers and short strings.

The end result will be a database which is way easier to copy around but
which still has all the records of the original; the only change is that an
attempt to access one of the PDFs will not return the actual PDF but rather
a garbage value. For most testing this will make little to no difference.

What I'm finding is that the UPDATE is taking over an hour for 5000
records, and tons of WAL is being generated, several files per minute.
Selecting the non-PDF columns from the entire table takes a few
milliseconds, and the only thing I'm doing with the records is updating
them to much smaller values. Why so much activity just to remove data? The
new rows are tiny.
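
For reference, one way to quantify the WAL a single statement produces,
assuming PostgreSQL 13 or later where EXPLAIN supports a WAL option, is
roughly:

-- PostgreSQL 13+ only: the WAL option reports records, full-page
-- images (fpi), and bytes of WAL the statement generated. ANALYZE
-- really runs the UPDATE, so it is rolled back here.
BEGIN;
EXPLAIN (ANALYZE, WAL)
UPDATE table1
SET pdfcolumn = 'redacted'
WHERE pdfcolumn IS NOT NULL;
ROLLBACK;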

#2 David G. Johnston
david.g.johnston@gmail.com
In reply to: Isaac Morland (#1)
Re: Understanding WAL - large amount of activity from removing data

On Sun, Nov 20, 2022 at 6:24 PM Isaac Morland <isaac.morland@gmail.com>
wrote:

> What I'm finding is that the UPDATE is taking over an hour for 5000
> records, and tons of WAL is being generated, several files per minute.
> Selecting the non-PDF columns from the entire table takes a few
> milliseconds, and the only thing I'm doing with the records is updating
> them to much smaller values. Why so much activity just to remove data? The
> new rows are tiny.

Simplistic answer (partly because the second part of this isn't spelled out
explicitly in the docs that I could find): when you UPDATE, two things
happen. The old record is modified to indicate it has been deleted, and a
new record is inserted. Both of these are written to the WAL, and a record
is always written to the WAL as a self-contained unit, so the old record is
full sized in the newly written WAL. TOAST apparently has an optimization
if you don't change the TOASTed value, but here you are changing it, so
that optimization doesn't apply.
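
A rough way to see that optimization in action - a hypothetical demo table,
not the one from the original question, and it needs PostgreSQL 13+ for
EXPLAIN's WAL option - is:

-- Hypothetical demo table; 'big' is large enough to be TOASTed.
CREATE TABLE toast_demo (id int PRIMARY KEY, small text, big text);
INSERT INTO toast_demo
SELECT g, 'x', repeat('pdf-ish payload ', 100000)
FROM generate_series(1, 10) AS g;

-- TOASTed value untouched: only the small heap tuple is re-logged.
EXPLAIN (ANALYZE, WAL) UPDATE toast_demo SET small = 'y';

-- TOASTed value replaced: the old TOAST chunks must be dealt with,
-- so noticeably more WAL is written.
EXPLAIN (ANALYZE, WAL) UPDATE toast_demo SET big = 'redacted';

DROP TABLE toast_demo;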

David J.

#3 Andres Freund
andres@anarazel.de
In reply to: David G. Johnston (#2)
Re: Understanding WAL - large amount of activity from removing data

Hi,

On 2022-11-20 19:02:12 -0700, David G. Johnston wrote:

> Both of these are written to the WAL, and a record is always written
> to the WAL as a self-contained unit, so the old record is full sized
> in the newly written WAL.

That's not really true. Normally the update record just logs the xmax,
offset, infomask for the old tuple. However, full_page_writes can lead
to the old tuple's whole page being logged.

We do log the old tuple contents if the replica identity of the table is
set to 'FULL' - if you're using that, we'll indeed log the whole old
version of the tuple to the WAL.
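
For reference, a table's replica identity can be checked in pg_class
('table1' below stands in for the table from the first message):

-- relreplident: 'd' = default, 'n' = nothing, 'f' = full, 'i' = index.
SELECT relname, relreplident
FROM pg_class
WHERE relname = 'table1';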

I think the more likely explanation in this case is that deleting the
TOAST values holding the PDFs - which is what you're doing by updating the
column to 'redacted' - has to actually mark all those TOAST tuples as
deleted, which in turn is likely causing a lot of full page writes.
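
A rough way to gauge the scale of that, assuming the table from the first
message: the size of its TOAST table is the data whose pages all have to
be visited, and each page touched for the first time after a checkpoint
gets a full-page image if full_page_writes is on.

-- Size of table1's TOAST table, where the 70GB of PDFs actually live.
SELECT reltoastrelid::regclass AS toast_table,
       pg_size_pretty(pg_relation_size(reltoastrelid)) AS toast_size
FROM pg_class
WHERE relname = 'table1';

SHOW full_page_writes;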

In a case like this you might have better luck forcing the table to be
rewritten with something like

ALTER TABLE tbl ALTER COLUMN data TYPE text USING ('redacted');

which should just drop the old TOAST table, without going through it
one tuple at a time.
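
Mapped onto the names from the first message, and assuming pdfcolumn is
bytea (adjust the type and cast if it is text), that would look roughly
like:

-- Sketch only: rewrites table1 with the placeholder value and drops the
-- old 70GB TOAST table outright instead of deleting its tuples one by
-- one. Takes an ACCESS EXCLUSIVE lock on the table for the duration.
ALTER TABLE table1
  ALTER COLUMN pdfcolumn TYPE bytea USING ('redacted'::bytea);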

Greetings,

Andres Freund