Loading the latest N rows into the cache seems way too fast.

Started by Ronabout 1 year ago7 messagesgeneral
Jump to latest
#1Ron
ronljohnsonjr@gmail.com

PG 9.6.24 and PG 14.15, if it matters.
(Yes, 9.6 is really EOL. I don't control that.)

(I could use pg_prewarm, but the table is much bigger than RAM, and
last_block value only has the newest record if data has never been
deleted. The oldest records regularly get deleted, and then the table is
vacuumed; thus, new records can be anywhere in the table.)

Thus, roll my own cache-loading statement.

The bigint "id" column in "mytbl" is populated from a sequence, and so is
monotonically increasing: the newest records will have the biggest id
values.
The table also has a bytea column that averages about 100KB.

Loading 200K rows is more than 200MB. I expected this "prewarm" statement
to take much longer than 1/2 second. Am I still in the dark ages of
computer speed, or is this statement not doing what I hope it's doing?

$ time psql -h foo bar -Xc "DO \$\$ BEGIN PERFORM * FROM mytbl ORDER BY id
DESC LIMIT 200000 ; END \$\$;"
DO

real 0m0.457s
user 0m0.005s
sys 0m0.004s

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

#2Christoph Moench-Tegeder
cmt@burggraben.net
In reply to: Ron (#1)
Re: Loading the latest N rows into the cache seems way too fast.

## Ron Johnson (ronljohnsonjr@gmail.com):

Loading 200K rows is more than 200MB. I expected this "prewarm" statement
to take much longer than 1/2 second. Am I still in the dark ages of
computer speed, or is this statement not doing what I hope it's doing?

$ time psql -h foo bar -Xc "DO \$\$ BEGIN PERFORM * FROM mytbl ORDER BY id
DESC LIMIT 200000 ; END \$\$;"

You can check what that statement does - e.g. in pg_stat_statements,
or (on an idle database, so the effects aren't lost in the noise) in
pg_stat_database or pg_statio_user_tables.
Between what the storage components of the last decade (e.g. those
SATA SSDs which are already being replaced in the market by NVME)
can deliver (>400MB/s, often marketed as ">500 MB/s" but on SATA that's
optimistic) and the fact that there are most likely some blocks
in the database' buffer and/or the OS buffer, the observed throughput
is not neccessarily unrealistic. With modern "server" hardware, getting
throughput in the "gigabytes per second" range is considered normal and
expected.

Regards,
Christoph

--
Spare Space

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ron (#1)
Re: Loading the latest N rows into the cache seems way too fast.

Ron Johnson <ronljohnsonjr@gmail.com> writes:

The bigint "id" column in "mytbl" is populated from a sequence, and so is
monotonically increasing: the newest records will have the biggest id
values.
The table also has a bytea column that averages about 100KB.

Loading 200K rows is more than 200MB. I expected this "prewarm" statement
to take much longer than 1/2 second. Am I still in the dark ages of
computer speed, or is this statement not doing what I hope it's doing?

It's not pulling in the TOAST storage where the bytea column lives.
(pg_prewarm wouldn't have either, without special pushups.)

regards, tom lane

#4Ron
ronljohnsonjr@gmail.com
In reply to: Tom Lane (#3)
Re: Loading the latest N rows into the cache seems way too fast.

On Mon, Feb 17, 2025 at 4:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ron Johnson <ronljohnsonjr@gmail.com> writes:

The bigint "id" column in "mytbl" is populated from a sequence, and so is
monotonically increasing: the newest records will have the biggest id
values.
The table also has a bytea column that averages about 100KB.

Loading 200K rows is more than 200MB. I expected this "prewarm"

statement

to take much longer than 1/2 second. Am I still in the dark ages of
computer speed, or is this statement not doing what I hope it's doing?

It's not pulling in the TOAST storage where the bytea column lives.
(pg_prewarm wouldn't have either, without special pushups.)

Puzzling, since I ran "PERFORM *". What if I explicitly mentioned the
bytea column's name?

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ron (#4)
Re: Loading the latest N rows into the cache seems way too fast.

Ron Johnson <ronljohnsonjr@gmail.com> writes:

On Mon, Feb 17, 2025 at 4:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's not pulling in the TOAST storage where the bytea column lives.
(pg_prewarm wouldn't have either, without special pushups.)

Puzzling, since I ran "PERFORM *". What if I explicitly mentioned the
bytea column's name?

You'd have to do something that actually used the column's value,
perhaps "md5(byteacol)" or such. (The obvious candidate would be
length(), but I think that is optimized to not fetch or decompress
the whole value.)

regards, tom lane

#6David G. Johnston
david.g.johnston@gmail.com
In reply to: Ron (#4)
Re: Loading the latest N rows into the cache seems way too fast.

On Mon, Feb 17, 2025 at 2:41 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:

On Mon, Feb 17, 2025 at 4:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ron Johnson <ronljohnsonjr@gmail.com> writes:

The bigint "id" column in "mytbl" is populated from a sequence, and so

is

monotonically increasing: the newest records will have the biggest id
values.
The table also has a bytea column that averages about 100KB.

Loading 200K rows is more than 200MB. I expected this "prewarm"

statement

to take much longer than 1/2 second. Am I still in the dark ages of
computer speed, or is this statement not doing what I hope it's doing?

It's not pulling in the TOAST storage where the bytea column lives.
(pg_prewarm wouldn't have either, without special pushups.)

Puzzling, since I ran "PERFORM *". What if I explicitly mentioned the
bytea column's name?

It's more about the system optimizing away data retrieval because you've
indicated you don't care about the contents due to using PERFORM. All it
needs is a pointer to represent the future data, not the data itself. And
PERFORM will never resolve that pointer by itself - so as Tom said your
query would need to force pointer resolution by computing on the data.

David J.

#7Ron
ronljohnsonjr@gmail.com
In reply to: Tom Lane (#5)
Re: Loading the latest N rows into the cache seems way too fast.

On Mon, Feb 17, 2025 at 4:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Ron Johnson <ronljohnsonjr@gmail.com> writes:

On Mon, Feb 17, 2025 at 4:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's not pulling in the TOAST storage where the bytea column lives.
(pg_prewarm wouldn't have either, without special pushups.)

Puzzling, since I ran "PERFORM *". What if I explicitly mentioned the
bytea column's name?

You'd have to do something that actually used the column's value,
perhaps "md5(byteacol)" or such. (The obvious candidate would be
length(), but I think that is optimized to not fetch or decompress
the whole value.)

That's definitely taking a LOT longer...

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!