failed NUMA pages inquiry status: Operation not permitted

Started by Christoph Berg · 6 months ago · 29 messages · pgsql-hackers
#1 Christoph Berg <myon@debian.org>

src/test/regress/expected/numa.out | 13 +++
src/test/regress/expected/numa_1.out | 5 +

numa_1.out is catching this error:

ERROR: libnuma initialization failed or NUMA is not supported on this platform

This is what I'm getting when running PG18 in docker on Debian trixie
(libnuma 2.0.19).

However, on older distributions, the error is different:

postgres=# select * from pg_shmem_allocations_numa;
ERROR: XX000: failed NUMA pages inquiry status: Operation not permitted
LOCATION: pg_get_shmem_allocations_numa, shmem.c:691

This makes the numa regression tests fail in Docker on Debian bookworm
(libnuma 2.0.16) and older and all of the Ubuntu LTS releases.

The attached patch makes it accept these errors, but perhaps it would
be better to detect it in pg_numa_available().

Christoph

Attachments:

0001-numa-Catch-Operation-not-permitted-error.patch (text/x-diff; charset=us-ascii, +30 -1)
#2 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Christoph Berg (#1)
Re: failed NUMA pages inquiry status: Operation not permitted

On 10/16/25 13:38, Christoph Berg wrote:

> src/test/regress/expected/numa.out | 13 +++
> src/test/regress/expected/numa_1.out | 5 +
>
> numa_1.out is catching this error:
>
> ERROR: libnuma initialization failed or NUMA is not supported on this platform
>
> This is what I'm getting when running PG18 in docker on Debian trixie
> (libnuma 2.0.19).
>
> However, on older distributions, the error is different:
>
> postgres=# select * from pg_shmem_allocations_numa;
> ERROR: XX000: failed NUMA pages inquiry status: Operation not permitted
> LOCATION: pg_get_shmem_allocations_numa, shmem.c:691
>
> This makes the numa regression tests fail in Docker on Debian bookworm
> (libnuma 2.0.16) and older and all of the Ubuntu LTS releases.

It's probably more about the kernel version. What kernels are used by
these systems?

> The attached patch makes it accept these errors, but perhaps it would
> be better to detect it in pg_numa_available().

Not sure how that would work. It seems this is some sort of permission
check in numa_move_pages, which is not what pg_numa_available does. Also,
it may depend on the page queried (e.g. whether it's exclusive or
shared by multiple processes).

thanks

--
Tomas Vondra

#3 Christoph Berg <myon@debian.org>
In reply to: Tomas Vondra (#2)
Re: failed NUMA pages inquiry status: Operation not permitted

Re: Tomas Vondra

> It's probably more about the kernel version. What kernels are used by
> these systems?

It's the very same kernel, just different docker containers on the
same system. I have not yet investigated where the problem is coming
from; different libnuma versions seemed like the best bet.

Same (differing) results on both these systems:
Linux turing 6.16.7+deb14-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.16.7-1 (2025-09-11) x86_64 GNU/Linux
Linux jenkins 6.1.0-39-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.148-1 (2025-08-26) x86_64 GNU/Linux

> Not sure how would that work. It seems this is some sort of permission
> check in numa_move_pages, that's not what pg_numa_available does. Also,
> it may depending on the page queried (e.g. whether it's exclusive or
> shared by multiple processes).

It's probably the lack of some process capability in that environment.
Maybe there is a way to query that, but I don't know much about that
yet.

Christoph

#4 Christoph Berg <myon@debian.org>
In reply to: Christoph Berg (#3)
Re: failed NUMA pages inquiry status: Operation not permitted

Re: To Tomas Vondra

> It's the very same kernel, just different docker containers on the
> same system. I did not investigate yet where the problem is coming
> from, different libnuma versions seemed like the best bet.

numactl shows the problem already:

Host system:

$ numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
cpubind: 0
nodebind: 0
membind: 0
preferred:

debian:trixie-slim container:

$ numactl --show
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
No NUMA support available on this system.

debian:bookworm-slim container:

$ numactl --show
get_mempolicy: Operation not permitted
get_mempolicy: Operation not permitted
get_mempolicy: Operation not permitted
get_mempolicy: Operation not permitted
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
cpubind: 0
nodebind: 0
membind: 0
preferred:

Running with sudo does not change the result.

So maybe all that's needed is a get_mempolicy() call in
pg_numa_available()?

Christoph

#5 Christoph Berg <myon@debian.org>
In reply to: Christoph Berg (#4)
Re: failed NUMA pages inquiry status: Operation not permitted

Re: To Tomas Vondra

> So maybe all that's needed is a get_mempolicy() call in
> pg_numa_available() ?

Or perhaps give up on pg_numa_available and just add _1.out and
_2.out variants containing the two different error messages, without
trying to catch the problem.

Christoph

#6 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Christoph Berg (#3)
Re: failed NUMA pages inquiry status: Operation not permitted

On 10/16/25 16:54, Christoph Berg wrote:

> Re: Tomas Vondra
>
>> It's probably more about the kernel version. What kernels are used by
>> these systems?
>
> It's the very same kernel, just different docker containers on the
> same system. I did not investigate yet where the problem is coming
> from, different libnuma versions seemed like the best bet.
>
> Same (differing) results on both these systems:
> Linux turing 6.16.7+deb14-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.16.7-1 (2025-09-11) x86_64 GNU/Linux
> Linux jenkins 6.1.0-39-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.148-1 (2025-08-26) x86_64 GNU/Linux

Hmmm. Those seem like relatively recent kernels.

>> Not sure how would that work. It seems this is some sort of permission
>> check in numa_move_pages, that's not what pg_numa_available does. Also,
>> it may depending on the page queried (e.g. whether it's exclusive or
>> shared by multiple processes).
>
> It's probably the lack of some process capability in that environment.
> Maybe there is a way to query that, but I don't know much about that
> yet.

The move_pages() manpage mentions PTRACE_MODE_READ_REALCREDS (man ptrace),
so maybe that's it.

--
Tomas Vondra

#7 Christoph Berg <myon@debian.org>
In reply to: Christoph Berg (#4)
Re: failed NUMA pages inquiry status: Operation not permitted

> So maybe all that's needed is a get_mempolicy() call in
> pg_numa_available() ?

numactl 2.0.19 --show does this:

	if (numa_available() < 0) {
		show_physcpubind();
		printf("No NUMA support available on this system.\n");
		exit(1);
	}

int numa_available(void)
{
	if (get_mempolicy(NULL, NULL, 0, 0, 0) < 0 && (errno == ENOSYS || errno == EPERM))
		return -1;
	return 0;
}

pg_numa_available is already calling numa_available.

But numactl 2.0.16 has this:

int numa_available(void)
{
	if (get_mempolicy(NULL, NULL, 0, 0, 0) < 0 && errno == ENOSYS)
		return -1;
	return 0;
}

... which is not catching the "permission denied" error I am seeing.

So maybe PG should implement numa_available itself like that. (Or
accept the output difference so the regression tests are passing.)

Christoph

#8 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Christoph Berg (#7)
Re: failed NUMA pages inquiry status: Operation not permitted

On 10/16/25 17:19, Christoph Berg wrote:

> So maybe all that's needed is a get_mempolicy() call in
> pg_numa_available() ?
>
> ...
>
> So maybe PG should implement numa_available itself like that. (Or
> accept the output difference so the regression tests are passing.)

I'm not sure which of those options is better. I'm a bit worried just
accepting the alternative output would hide some failures in the future
(although it's a low risk).

So I'm leaning to adjust pg_numa_init() to also check EPERM, per the
attached patch. It still calls numa_available(), so that we don't
silently miss future libnuma changes.

Can you check this makes it work inside the docker container?

regards

--
Tomas Vondra

Attachments:

0001-Handle-EPERM-in-pg_numa_init.patch (text/x-patch; charset=UTF-8, +11 -2)
#9 Christoph Berg <myon@debian.org>
In reply to: Christoph Berg (#7)
Re: failed NUMA pages inquiry status: Operation not permitted

Re: To Tomas Vondra

> So maybe PG should implement numa_available itself like that.

Following our discussion at pgconf.eu last week, I just implemented
that. The numa and pg_buffercache tests pass in Docker on Debian
bookworm now.

Christoph

Attachments:

v2-0001-Make-pg_numa_init-cope-with-Docker.patch (text/x-diff; charset=us-ascii, +11 -5)
#10 Christoph Berg <myon@debian.org>
In reply to: Tomas Vondra (#8)
Re: failed NUMA pages inquiry status: Operation not permitted

Re: Tomas Vondra

> So I'm leaning to adjust pg_numa_init() to also check EPERM, per the
> attached patch. It still calls numa_available(), so that we don't
> silently miss future libnuma changes.
>
> Can you check this makes it work inside the docker container?

Yes your patch works. (Sorry I meant to test earlier, but RL...)

Christoph

#11 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Christoph Berg (#10)
Re: failed NUMA pages inquiry status: Operation not permitted

On 11/14/25 13:52, Christoph Berg wrote:

> Re: Tomas Vondra
>
>> So I'm leaning to adjust pg_numa_init() to also check EPERM, per the
>> attached patch. It still calls numa_available(), so that we don't
>> silently miss future libnuma changes.
>>
>> Can you check this makes it work inside the docker container?
>
> Yes your patch works. (Sorry I meant to test earlier, but RL...)

Thanks. I've pushed the fix (and backpatched to 18).

regards

--
Tomas Vondra

#12 Christoph Berg <myon@debian.org>
In reply to: Tomas Vondra (#11)
Re: failed NUMA pages inquiry status: Operation not permitted

Re: Tomas Vondra

>>> So I'm leaning to adjust pg_numa_init() to also check EPERM, per the
>>> attached patch. It still calls numa_available(), so that we don't
>>> silently miss future libnuma changes.
>>>
>>> Can you check this makes it work inside the docker container?
>>
>> Yes your patch works. (Sorry I meant to test earlier, but RL...)
>
> Thanks. I've pushed the fix (and backpatched to 18).

It looks like we are not done here yet :(

postgresql-18 is failing here intermittently with this diff:

12:20:24 --- /build/reproducible-path/postgresql-18-18.1/src/test/regress/expected/numa.out 2025-11-10 21:52:06.000000000 +0000
12:20:24 +++ /build/reproducible-path/postgresql-18-18.1/build/src/test/regress/results/numa.out 2025-12-11 11:20:22.618989603 +0000
12:20:24 @@ -6,8 +6,4 @@
12:20:24 -- switch to superuser
12:20:24 \c -
12:20:24 SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa;
12:20:24 - ok
12:20:24 -----
12:20:24 - t
12:20:24 -(1 row)
12:20:24 -
12:20:24 +ERROR: invalid NUMA node id outside of allowed range [0, 0]: -2

That's REL_18_STABLE @ 580b5c, with the Debian packaging on top.

I've seen it on unstable/amd64, unstable/arm64, and Ubuntu
questing/amd64, where libnuma should take care of this itself, without
the extra patch in PG. There was another case on bullseye/amd64 which
has the old libnuma.

It's been frequent enough that it killed 4 out of the 10 builds
currently visible on
https://jengus.postgresql.org/job/postgresql-18-binaries-snapshot/.
(Though to be fair, only one distribution/arch combination was failing
for each of them.)

There is also one instance of it in
https://jengus.postgresql.org/job/postgresql-19-binaries-snapshot/

I currently have no idea what's happening.

Christoph

#13 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Christoph Berg (#12)
Re: failed NUMA pages inquiry status: Operation not permitted

On 12/11/25 13:29, Christoph Berg wrote:

> Re: Tomas Vondra
>
>>>> So I'm leaning to adjust pg_numa_init() to also check EPERM, per the
>>>> attached patch. It still calls numa_available(), so that we don't
>>>> silently miss future libnuma changes.
>>>>
>>>> Can you check this makes it work inside the docker container?
>>>
>>> Yes your patch works. (Sorry I meant to test earlier, but RL...)
>>
>> Thanks. I've pushed the fix (and backpatched to 18).
>
> It looks like we are not done here yet :(
>
> postgresql-18 is failing here intermittently with this diff:
>
> 12:20:24 --- /build/reproducible-path/postgresql-18-18.1/src/test/regress/expected/numa.out 2025-11-10 21:52:06.000000000 +0000
> 12:20:24 +++ /build/reproducible-path/postgresql-18-18.1/build/src/test/regress/results/numa.out 2025-12-11 11:20:22.618989603 +0000
> 12:20:24 @@ -6,8 +6,4 @@
> 12:20:24 -- switch to superuser
> 12:20:24 \c -
> 12:20:24 SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa;
> 12:20:24 - ok
> 12:20:24 -----
> 12:20:24 - t
> 12:20:24 -(1 row)
> 12:20:24 -
> 12:20:24 +ERROR: invalid NUMA node id outside of allowed range [0, 0]: -2
>
> That's REL_18_STABLE @ 580b5c, with the Debian packaging on top.
>
> I've seen it on unstable/amd64, unstable/arm64, and Ubuntu
> questing/amd64, where libnuma should take care of this itself, without
> the extra patch in PG. There was another case on bullseye/amd64 which
> has the old libnuma.
>
> It's been frequent enough so it killed 4 out of the 10 builds
> currently visible on
> https://jengus.postgresql.org/job/postgresql-18-binaries-snapshot/.
> (Though to be fair, only one distribution/arch combination was failing
> for each of them.)
>
> There is also one instance of it in
> https://jengus.postgresql.org/job/postgresql-19-binaries-snapshot/
>
> I currently have no idea what's happening.

Hmmm, strange. -2 is ENOENT, which should mean this:

-ENOENT
The page is not present.

But what does "not present" mean in this context? And why would that be
only intermittent? Presumably this is still running in Docker, so maybe
it's another weird consequence of that?

regards

--
Tomas Vondra

#14 Christoph Berg <myon@debian.org>
In reply to: Tomas Vondra (#13)
Re: failed NUMA pages inquiry status: Operation not permitted

Re: Tomas Vondra

> Hmmm, strange. -2 is ENOENT, which should mean this:
>
> -ENOENT
> The page is not present.
>
> But what does "not present" mean in this context? And why would that be
> only intermittent? Presumably this is still running in Docker, so maybe
> it's another weird consequence of that?

Sorry, I forgot to mention that this is now in the normal apt.pg.o
build environment (chroots without any funky permission restrictions).
I have not tried Docker yet.

I think it was not happening before the backport of the Docker fix,
but I have no idea why that should have broken anything, or why it
would only happen like 3% of the time.

Christoph

#15 Christoph Berg <myon@debian.org>
In reply to: Tomas Vondra (#13)
Re: failed NUMA pages inquiry status: Operation not permitted

Re: Tomas Vondra

> Hmmm, strange. -2 is ENOENT, which should mean this:
>
> -ENOENT
> The page is not present.
>
> But what does "not present" mean in this context? And why would that be
> only intermittent? Presumably this is still running in Docker, so maybe
> it's another weird consequence of that?

I've managed to reproduce it once, running this loop on
18-as-of-today. It errored out after a few hundred iterations:

while psql -c 'SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa'; do :; done

2025-12-16 11:49:35.982 UTC [621807] myon@postgres ERROR: invalid NUMA node id outside of allowed range [0, 0]: -2
2025-12-16 11:49:35.982 UTC [621807] myon@postgres STATEMENT: SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa

That was on the apt.pg.o amd64 build machine while a few things were
just building. Maybe ENOENT "The page is not present" means something
was just swapped out because the machine was under heavy load.

I tried reading the kernel source and it sounds related:

* If the source virtual memory range has any unmapped holes, or if
* the destination virtual memory range is not a whole unmapped hole,
* move_pages() will fail respectively with -ENOENT or -EEXIST. This
* provides a very strict behavior to avoid any chance of memory
* corruption going unnoticed if there are userland race conditions.
* Only one thread should resolve the userland page fault at any given
* time for any given faulting address. This means that if two threads
* try to both call move_pages() on the same destination address at the
* same time, the second thread will get an explicit error from this
* command.
...
* The UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES flag can be specified to
* prevent -ENOENT errors to materialize if there are holes in the
* source virtual range that is being remapped. The holes will be
* accounted as successfully remapped in the retval of the
* command. This is mostly useful to remap hugepage naturally aligned
* virtual regions without knowing if there are transparent hugepage
* in the regions or not, but preventing the risk of having to split
* the hugepmd during the remap.
...
ssize_t move_pages(struct userfaultfd_ctx *ctx, unsigned long dst_start,
		   unsigned long src_start, unsigned long len, __u64 mode)
...
		if (!(mode & UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES)) {
			err = -ENOENT;
			break;

What I don't understand yet is why this move_pages() signature does
not match the one from libnuma and move_pages(2) (note "mode" vs "flags"):

int numa_move_pages(int pid, unsigned long count,
		    void **pages, const int *nodes, int *status, int flags)
{
	return move_pages(pid, count, pages, nodes, status, flags);
}

I guess the answer is somewhere in that gap.

> ERROR: invalid NUMA node id outside of allowed range [0, 0]: -2

Maybe instead of putting sanity checks on what the kernel is
returning, we should just pass that through to the user? (Or perhaps
transform negative numbers to NULL?)

Christoph

#16 Christoph Berg <myon@debian.org>
In reply to: Christoph Berg (#15)
Re: failed NUMA pages inquiry status: Operation not permitted

Re: To Tomas Vondra

> I've managed to reproduce it once, running this loop on
> 18-as-of-today. It errored out after a few 100 iterations:
>
> while psql -c 'SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa'; do :; done
>
> 2025-12-16 11:49:35.982 UTC [621807] myon@postgres ERROR: invalid NUMA node id outside of allowed range [0, 0]: -2
> 2025-12-16 11:49:35.982 UTC [621807] myon@postgres STATEMENT: SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa
>
> That was on the apt.pg.o amd64 build machine while a few things were
> just building. Maybe ENOENT "The page is not present" means something
> was just swapped out because the machine was under heavy load.

I played a bit more with it.

* It seems to trigger only once for a running cluster. The next one
  needs a restart.
* If it doesn't trigger within the first 30s, it probably never will.
* It seems easier to trigger on a system that is under load (I started
  a few pgmodeler compile runs (C++) in parallel).

But none of that answers the "why".

Christoph

#17 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Christoph Berg (#16)
Re: failed NUMA pages inquiry status: Operation not permitted

On 12/16/25 15:48, Christoph Berg wrote:

> Re: To Tomas Vondra
>
>> I've managed to reproduce it once, running this loop on
>> 18-as-of-today. It errored out after a few 100 iterations:
>>
>> while psql -c 'SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa'; do :; done
>>
>> 2025-12-16 11:49:35.982 UTC [621807] myon@postgres ERROR: invalid NUMA node id outside of allowed range [0, 0]: -2
>> 2025-12-16 11:49:35.982 UTC [621807] myon@postgres STATEMENT: SELECT COUNT(*) >= 0 AS ok FROM pg_shmem_allocations_numa
>>
>> That was on the apt.pg.o amd64 build machine while a few things were
>> just building. Maybe ENOENT "The page is not present" means something
>> was just swapped out because the machine was under heavy load.
>
> I played a bit more with it.
>
> * It seems to trigger only once for a running cluster. The next one
> needs a restart
> * If it doesn't trigger within the first 30s, it probably never will
> * It seems easier to trigger on a system that is under load (I started
> a few pgmodeler compile runs in parallel (C++))
>
> But none of that answers the "why".

Hmmm, so this is interesting. I tried this on my workstation (with a
single NUMA node), and I see this:

1) right after opening a connection, I get this

test=# select numa_node, count(*) from pg_buffercache_numa group by 1;
numa_node | count
-----------+-------
0 | 290
-2 | 32478
(2 rows)

2) but a select from pg_shmem_allocations_numa works fine

test=# select numa_node, count(*) from pg_shmem_allocations_numa group by 1;
numa_node | count
-----------+-------
0 | 72
(1 row)

3) and if I repeat the pg_buffercache_numa query, it now works

test=# select numa_node, count(*) from pg_buffercache_numa group by 1;
numa_node | count
-----------+-------
0 | 32768
(1 row)

That's a bit strange. I have no idea why this is happening. If I
reconnect, I start getting the failures again.

regards

--
Tomas Vondra

#18 Christoph Berg <myon@debian.org>
In reply to: Tomas Vondra (#17)
Re: failed NUMA pages inquiry status: Operation not permitted

Re: Tomas Vondra

> 1) right after opening a connection, I get this
>
> test=# select numa_node, count(*) from pg_buffercache_numa group by 1;
> numa_node | count
> -----------+-------
> 0 | 290
> -2 | 32478

Does that mean that the "touch all pages" logic is missing in some
code paths?

But even with that, it seems it can degenerate again, so accepting
-2 in the regression tests would be required to make them stable.

Christoph

#19 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Christoph Berg (#18)
Re: failed NUMA pages inquiry status: Operation not permitted

On 12/16/25 18:54, Christoph Berg wrote:

> Re: Tomas Vondra
>
>> 1) right after opening a connection, I get this
>>
>> test=# select numa_node, count(*) from pg_buffercache_numa group by 1;
>> numa_node | count
>> -----------+-------
>> 0 | 290
>> -2 | 32478
>
> Does that mean that the "touch all pages" logic is missing in some
> code paths?

I did check and AFAICS we are touching the pages in pg_buffercache_numa.

To make it even more confusing, I can no longer reproduce the behavior I
reported yesterday. It just consistently reports "0" and I have no idea
why it changed :-( I did restart since yesterday, so maybe that changed
something.

> But even with that, it seems to be able to degenerate again and
> accepting -2 in the regression tests would be required to make it
> stable.

No opinion yet. Either the -2 can happen occasionally, and then we'd
need to adjust the regression tests. Or maybe it's some thinko, and then
it'd be good to figure out why it's happening.

I find it interesting it does not seem to fail on the buildfarm. Or at
least I'm not aware of such failures. Even a rare failure should show
itself on the buildfarm a couple times, so how come it didn't?

regards

--
Tomas Vondra

#20 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Tomas Vondra (#19)
Re: failed NUMA pages inquiry status: Operation not permitted

On 12/17/25 12:07, Tomas Vondra wrote:

> On 12/16/25 18:54, Christoph Berg wrote:
>
>> Re: Tomas Vondra
>>
>>> 1) right after opening a connection, I get this
>>>
>>> test=# select numa_node, count(*) from pg_buffercache_numa group by 1;
>>> numa_node | count
>>> -----------+-------
>>> 0 | 290
>>> -2 | 32478
>>
>> Does that mean that the "touch all pages" logic is missing in some
>> code paths?
>
> I did check and AFAICS we are touching the pages in pg_buffercache_numa.
>
> To make it even more confusing, I can no longer reproduce the behavior I
> reported yesterday. It just consistently reports "0" and I have no idea
> why it changed :-( I did restart since yesterday, so maybe that changed
> something.

I kept poking at this, and I managed to reproduce it again. The key
seems to be that the system needs to be under pressure, and then it's
reliably reproducible (at least for me).

What I did was create two instances: one to keep the system busy, one
for experimentation. The "busy" one is set to use shared_buffers=16GB
and runs a read-only pgbench:

pgbench -i -s 4500 test
pgbench -S -j 16 -c 64 -T 600 -P 1 test

The system has 64GB of RAM and 12 cores, so this is a lot of load.

Then, the other instance is set to use shared_buffers=4GB, is started
and immediately queried for NUMA info for buffers (in a loop):

pg_ctl -D data -l pg.log start;

for r in $(seq 1 10); do
    psql -p 5001 test -c 'select numa_node, count(*) from pg_buffercache_numa group by 1'
done

pg_ctl -D data -l pg.log stop;

And this often fails like this:

----------------------------------------------------------------------

waiting for server to start.... done
server started
numa_node | count
-----------+---------
0 | 1045302
-2 | 3274
(2 rows)

numa_node | count
-----------+---------
0 | 1048576
(1 row)

numa_node | count
-----------+---------
0 | 1048576
(1 row)

numa_node | count
-----------+---------
0 | 1048576
(1 row)

numa_node | count
-----------+---------
0 | 1048576
(1 row)

numa_node | count
-----------+---------
0 | 1048576
(1 row)

numa_node | count
-----------+---------
0 | 1025321
-2 | 23255
(2 rows)

numa_node | count
-----------+---------
0 | 1038596
-2 | 9980
(2 rows)

numa_node | count
-----------+---------
0 | 1048518
-2 | 58
(2 rows)

numa_node | count
-----------+---------
0 | 1048525
-2 | 51
(2 rows)

waiting for server to shut down.... done
server stopped

----------------------------------------------------------------------

So, it clearly fails quite often. And it can fail even later, after a
run that returned no "-2" buffers.

Clearly, something behaves differently than we thought. I've only seen
this happen on a system with swap; once I removed it, this behavior
disappeared too. So it seems a page can be moved to swap, in which case
we get -2 for its status.

In hindsight, that's not all that surprising. It's interesting it can
happen even with the "touching", but I guess there's a race condition
and the memory can get paged out before we inspect the status. We're
querying batches of pages, which probably makes the window larger.

FWIW I now realized I don't even need two instances. If I try this on
the "busy" instance, I get the -2 values too. Which I find a bit weird.
Because why should those be paged out?

The question is what to do about this. I don't think we can prevent the
-2 values, and error-ing out does not seem great either (most systems
have swap, so -2 may not be all that rare).

In fact, pg_shmem_allocations_numa probably should not error-out either,
because it's now reliably failing (on the busy instance).

I guess the only solution is to accept -2 as a possible value (unknown
node). But that makes regression testing harder, because it means the
output could change a lot ...

regards

--
Tomas Vondra

#21 Christoph Berg <myon@debian.org>
In reply to: Tomas Vondra (#20)

#22 Jakub Wartak <jakub.wartak@enterprisedb.com>
In reply to: Christoph Berg (#21)

#23 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Jakub Wartak (#22)

#24 Jakub Wartak <jakub.wartak@enterprisedb.com>
In reply to: Tomas Vondra (#23)

#25 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Jakub Wartak (#24)

#26 Christoph Berg <myon@debian.org>
In reply to: Tomas Vondra (#25)

#27 Jakub Wartak <jakub.wartak@enterprisedb.com>
In reply to: Tomas Vondra (#25)

#28 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Tomas Vondra (#25)

#29 Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
In reply to: Tomas Vondra (#28)