Add pg_stat_vfdcache view for VFD cache statistics
Hello hackers,
This comes from Tomas's patch idea on his website [1]; I thought this patch makes sense to have.
PostgreSQL's virtual file descriptor (VFD) layer maintains a
per-backend cache of open file descriptors, bounded by
max_files_per_process (default 1000). When the cache is full, the
least-recently-used entry is evicted and its OS fd closed so that a new
file can be opened. On the next access to that file, open() must be
called again, incurring a syscall that a larger cache would have
avoided.
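To make the eviction mechanics concrete, here is a toy model of an LRU-bounded descriptor cache (illustrative only: the names and flat-array structure are assumptions for this sketch, not the actual fd.c implementation, which tracks LRU with a ring over the VFD array):

```c
#include <assert.h>
#include <string.h>

/* Toy LRU-bounded fd cache; illustrative only, not the fd.c code. */
#define CACHE_CAP 3

static int cache[CACHE_CAP];            /* most recently used first */
static int ncached = 0;
static long hits, misses, evictions;

/* Access file "id"; returns 1 on a cache hit, 0 on a miss that
 * would require a real open() syscall. */
static int access_file(int id)
{
    for (int i = 0; i < ncached; i++)
    {
        if (cache[i] == id)
        {
            /* hit: move entry to the front (most recently used) */
            memmove(&cache[1], &cache[0], i * sizeof(int));
            cache[0] = id;
            hits++;
            return 1;
        }
    }

    /* miss: if full, evict the least-recently-used (last) entry */
    if (ncached == CACHE_CAP)
    {
        evictions++;
        ncached--;
    }
    memmove(&cache[1], &cache[0], ncached * sizeof(int));
    cache[0] = id;
    ncached++;
    misses++;
    return 0;
}
```

With a capacity of 3, cycling through files 1-4 in a loop never hits: every access evicts the entry that will be needed soonest, which is exactly the pathology a larger max_files_per_process avoids.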
A trivial example is partitioned tables: a table with 1500
partitions needs more than 1500 file descriptors per full scan (main
fork, vm, ...), which exceeds the default limit and causes
evictions and reopens.
The problem is well understood and the fix is straightforward: raise
max_files_per_process. Tomas showed up to a 4-5x throughput
improvement in [1]; on my end I see somewhat less than that,
depending on the query, but the idea is clear.
AFAIK there is currently no way from inside PostgreSQL to know whether fd
cache pressure is occurring.
The implementation is trivial: because the VFD cache is strictly per-backend,
the counters are also per-backend and require no shared memory or
locking. Three macros (pgstat_count_vfd_hit/miss/eviction) update
fields in PendingVfdCacheStats directly from fd.c.
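In sketch form (the macro names follow the description above; the struct fields and variable name are assumptions about the patch, not quoted from it):

```c
#include <assert.h>

/* Per-backend pending counters; field names are assumptions. */
typedef struct PendingVfdCacheStats
{
    long hits;
    long misses;
    long evictions;
} PendingVfdCacheStats;

static PendingVfdCacheStats pending_vfd_stats;

/* Each backend touches only its own struct, so plain increments
 * suffice: no shared memory, no locking, no atomics. */
#define pgstat_count_vfd_hit()      (pending_vfd_stats.hits++)
#define pgstat_count_vfd_miss()     (pending_vfd_stats.misses++)
#define pgstat_count_vfd_eviction() (pending_vfd_stats.evictions++)
```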
I find this useful; I would love to hear others' thoughts on whether it is.
[1]: https://vondra.me/posts/patch-idea-statistics-for-file-descriptor-cache
Regards,
Ayoub
Attachments:
0001-Add-pg_stat_vfdcache-view-for-VFD-cache-statistics.patch (text/x-patch)
On Sat, Mar 21, 2026 at 5:59 PM KAZAR Ayoub <ma_kazar@esi.dz> wrote:
Hello hackers,
This comes from Tomas's patch idea on his website [1]; I thought this patch makes sense to have.
PostgreSQL's virtual file descriptor (VFD) maintains a
per-backend cache of open file descriptors, bounded by
max_files_per_process (default 1000). When the cache is full, the
least-recently-used entry is evicted so its OS fd is closed, so a new
file can be opened. On the next access to that file, open() must be
called again, incurring a syscall that a larger cache would have
avoided.
A trivial example is with partitioned tables: a table with 1500
partitions requires even more than 1500 file descriptors per full scan (main
fork, vm ...), which is more than the default limit, causing potential evictions and reopens.
The problem is well understood and the fix is straightforward: raise
max_files_per_process. Tomas showed a 4-5x throughput
improvement in [1]; on my end I see somewhat less than that, depending on the query, but the idea is clear.
AFAIK there is currently no way from inside PostgreSQL to know whether fd cache pressure is occurring.
The implementation is trivial: because the VFD cache is strictly per-backend, the counters are also
per-backend and require no shared memory or locking. Three macros (pgstat_count_vfd_hit/miss/eviction) update fields in PendingVfdCacheStats directly from fd.c.
I find this useful; I would love to hear others' thoughts on whether it is.
Hi,
My $0.02: for this to be useful it would need to allow viewing the
global VFD cache picture (across all backends), not just the *current* backend's.
Applications wouldn't call this function anyway, because they would have to be
modified.
In order to get that, you should collect the hits/misses in a local
pending pgstat area (see e.g. pgstat_io, or the simpler pgstat_bgwriter/
checkpointer), as you already do with PendingVfdCacheStats, but then copy them
to the shared-memory pgstat area (with some LWLock* protection) that would be
queryable.
-J.
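The pending-then-flush pattern Jakub describes can be modeled in miniature as follows (single-process sketch; in the real pgstat machinery the shared side lives in shared memory and is protected by an LWLock, which the comments mark):

```c
#include <assert.h>

typedef struct VfdCacheCounters
{
    long hits;
    long misses;
} VfdCacheCounters;

static VfdCacheCounters pending; /* per-backend, lock-free */
static VfdCacheCounters shared;  /* shared memory in the real thing */

/* Fold the backend-local pending counters into the shared totals,
 * then reset the pending side. */
static void flush_vfd_stats(void)
{
    /* LWLockAcquire(stats_lock, LW_EXCLUSIVE); -- in the real code */
    shared.hits += pending.hits;
    shared.misses += pending.misses;
    /* LWLockRelease(stats_lock); */

    pending.hits = 0;
    pending.misses = 0;
}
```

The point of the split is that the hot path (counting) stays lock-free, and the lock is taken only at the comparatively rare flush points.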
Hi!
On 23.03.2026 12:22, Jakub Wartak wrote:
On Sat, Mar 21, 2026 at 5:59 PM KAZAR Ayoub <ma_kazar@esi.dz> wrote:
Hello hackers,
This comes from Tomas's patch idea on his website [1]; I thought this patch makes sense to have.
PostgreSQL's virtual file descriptor (VFD) maintains a
per-backend cache of open file descriptors, bounded by
max_files_per_process (default 1000). When the cache is full, the
least-recently-used entry is evicted so its OS fd is closed, so a new
file can be opened. On the next access to that file, open() must be
called again, incurring a syscall that a larger cache would have
avoided.
That's one use-case. The other one that I've recently come across is
just knowing how many VFD cache entries there are in the first place.
While the number of open files is bounded by max_files_per_process, the
number of cache entries is unbounded. Large databases can easily have
hundreds of thousands of files due to our segmentation scheme.
Workloads that access a big portion of these files can end up spending
very considerable amounts of memory on the VFD cache. For example, with
100,000 VFD entries per backend * 80 bytes per VFD = ~7.6 MiB. With 1000
backends that's almost 10 GiB just for VFD entries, assuming that each
backend over time accumulates that many files.
A production database I looked at recently had ~300,000 files and many
thousands of backends. It spent close to 30 GiB on the VFD cache.
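As a quick sanity check on the arithmetic above (80 bytes per entry is the approximate figure cited here, not a guaranteed constant):

```c
#include <assert.h>

/* Estimated per-backend VFD cache size in MiB. */
static double vfd_cache_mib(long entries, long bytes_per_entry)
{
    return (double) entries * bytes_per_entry / (1024.0 * 1024.0);
}
```

At 100,000 entries this gives ~7.6 MiB per backend, and about 7.5 GiB across 1000 backends, consistent with the rounded figures above.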
I've looked at struct vfd and some simple changes to the struct would
already cut memory consumption in half. I can look into that.
Thoughts?
A trivial example is partitioned tables: a table with 1500
partitions needs more than 1500 file descriptors per full scan (main
fork, vm, ...), which exceeds the default limit and causes evictions and reopens.
The problem is well understood and the fix is straightforward: raise
max_files_per_process. Tomas showed up to a 4-5x throughput
improvement in [1]; on my end I see somewhat less than that, depending on the query, but the idea is clear.
The question is what the kernel makes of that, especially in the
aforementioned case where the number of total files and backends is large.
In the Linux kernel each process that opens some file gets its own struct
file. sizeof(struct file) is ~200 bytes. Hence, increasing
max_files_per_process can measurably impact memory consumption if
changed carelessly. We should document that.
But I guess in most cases it's rather about changing it from 1k to 2k,
rather than changing it from 1k to 100k.
AFAIK there is currently no way from inside PostgreSQL to know whether fd cache pressure is occurring.
The implementation is trivial: because the VFD cache is strictly per-backend, the counters are also
per-backend and require no shared memory or locking. Three macros (pgstat_count_vfd_hit/miss/eviction) update fields in PendingVfdCacheStats directly from fd.c.
I find this useful; I would love to hear others' thoughts on whether it is.
Hi,
My $0.02: for this to be useful it would need to allow viewing the
global VFD cache picture (across all backends), not just the *current* backend's.
Applications wouldn't call this function anyway, because they would have to be
modified.
+1
In order to get that, you should collect the hits/misses in a local
pending pgstat area (see e.g. pgstat_io, or the simpler pgstat_bgwriter/
checkpointer), as you already do with PendingVfdCacheStats, but then copy them
to the shared-memory pgstat area (with some LWLock* protection) that would be
queryable.
I would include here the sum of VFD cache entries across all backends and
the total VFD cache size.
--
David Geier
Hello David and Jakub,
On Mon, Mar 23, 2026 at 1:35 PM David Geier <geidav.pg@gmail.com> wrote:
Hi!
On 23.03.2026 12:22, Jakub Wartak wrote:
On Sat, Mar 21, 2026 at 5:59 PM KAZAR Ayoub <ma_kazar@esi.dz> wrote:
Hello hackers,
This comes from Tomas's patch idea on his website [1]; I thought this
patch makes sense to have.
PostgreSQL's virtual file descriptor (VFD) maintains a
per-backend cache of open file descriptors, bounded by
max_files_per_process (default 1000). When the cache is full, the
least-recently-used entry is evicted so its OS fd is closed, so a new
file can be opened. On the next access to that file, open() must be
called again, incurring a syscall that a larger cache would have
avoided.
That's one use-case. The other one that I've recently come across is
just knowing how many VFD cache entries there are in the first place.
While the number of open files is bounded by max_files_per_process, the
number of cache entries is unbounded. Large databases can easily have
hundreds of thousands of files due to our segmentation scheme.
Workloads that access a big portion of these files can end up spending
very considerable amounts of memory on the VFD cache. For example, with
100,000 VFD entries per backend * 80 bytes per VFD = ~7.6 MiB. With 1000
backends that's almost 10 GiB just for VFD entries, assuming that each
backend over time accumulates that many files.
A production database I looked at recently had ~300,000 files and many
thousands of backends. It spent close to 30 GiB on the VFD cache.
I've looked at struct vfd and some simple changes to the struct would
already cut memory consumption in half. I can look into that.
Thoughts?
Looking forward to this.
What also bothers me in that space is that if a backend allocates 100K
entries in the VFD cache, the cache is never shrunk again: it only grows
(when it needs more than its lifetime maximum) until the backend dies.
Reusing free entries instead of allocating new ones is useful, but a
spike in file openings leaves a long-lived backend holding a cache far
larger than it will ever need again. I don't imagine this is common,
though; what do you think from your experience?
In the Linux kernel each process that opens some file gets its own struct
file. sizeof(struct file) is ~200 bytes. Hence, increasing
max_files_per_process can measurably impact memory consumption if
changed carelessly. We should document that.
But I guess in most cases it's rather about changing it from 1k to 2k,
rather than changing it from 1k to 100k.
Indeed, I agree with this.
AFAIK there is currently no way from inside PostgreSQL to know whether
fd cache pressure is occurring.
The implementation is trivial: because the VFD cache is strictly
per-backend, the counters are also per-backend and require no shared
memory or locking. Three macros (pgstat_count_vfd_hit/miss/eviction)
update fields in PendingVfdCacheStats directly from fd.c.
I find this useful; I would love to hear others' thoughts on whether it is.
Hi,
My $0.02: for this to be useful it would need to allow viewing the
global VFD cache picture (across all backends), not just the *current*
backend's.
Applications wouldn't call this function anyway, because they would have
to be modified.
+1
Would it be reasonable to have both? I changed the way I was thinking
about it.
It's clear one would want global VFD cache monitoring, since its
configuration is global but its effect is mostly backend-specific.
That leads me to think it could also be useful to maintain per-backend
metrics, to help identify which backends are putting pressure on the VFD cache.
In order to get that, you should collect the hits/misses in a local
pending pgstat area (see e.g. pgstat_io, or the simpler pgstat_bgwriter/
checkpointer), as you already do with PendingVfdCacheStats, but then copy them
to the shared-memory pgstat area (with some LWLock* protection) that would be
queryable.
I would include here the sum of VFD cache entries across all backends and
the total VFD cache size.
I'll be doing this soon.
--
David Geier
Regards,
Ayoub Kazar
On 24.03.2026 00:36, KAZAR Ayoub wrote:
I've looked at struct vfd and some simple changes to the struct would
already cut memory consumption in half. I can look into that.
Thoughts?
Looking forward to this.
I'll try to come up with something in the next days.
What also bothers me in that space is that if a backend allocates 100K
entries in the VFD cache, the cache is never shrunk again: it only grows
(when it needs more than its lifetime maximum) until the backend dies.
Reusing free entries instead of allocating new ones is useful, but a
spike in file openings leaves a long-lived backend holding a cache far
larger than it will ever need again. I don't imagine this is common,
though; what do you think from your experience?
Currently the cache is directly mapped by the VFD index. That means we
could only resize down to the maximum used VFD index.
Being able to resize independently of the maximum VFD index would
require changing to a hash map like simplehash.h. I can take a look at how
invasive such a change would be.
--
David Geier
Hello,
On Tue, Mar 24, 2026 at 3:09 PM David Geier <geidav.pg@gmail.com> wrote:
On 24.03.2026 00:36, KAZAR Ayoub wrote:
I've looked at struct vfd and some simple changes to the struct would
already cut memory consumption in half. I can look into that.
Thoughts?
Looking forward to this.
I'll try to come up with something in the next days.
What also bothers me in that space is that if a backend allocates 100K
entries in the VFD cache, the cache is never shrunk again: it only grows
(when it needs more than its lifetime maximum) until the backend dies.
Reusing free entries instead of allocating new ones is useful, but a
spike in file openings leaves a long-lived backend holding a cache far
larger than it will ever need again. I don't imagine this is common,
though; what do you think from your experience?
Currently the cache is directly mapped by the VFD index. That means we
could only resize down to the maximum used VFD index.
Being able to resize independently of the maximum VFD index would
require changing to a hash map like simplehash.h. I can take a look at how
invasive such a change would be.
--
David Geier
I've implemented the recommended global stats view on the VFD cache; the
implementation should also be straightforward, as it follows the same
cumulative shared statistics infrastructure that pgstat_bgwriter and
others use.
The attached v2 patch also contains what David suggested: global cache
size and entry count in the view.
Kind regards,
Ayoub
Attachments:
v2-0001-Add-pg_stat_vfdcache-view-for-VFD-cache-statistics.patch (text/x-patch)
Rebased v2.
On Sun, Mar 29, 2026 at 8:23 PM KAZAR Ayoub <ma_kazar@esi.dz> wrote:
Show quoted text
Hello,
On Tue, Mar 24, 2026 at 3:09 PM David Geier <geidav.pg@gmail.com> wrote:On 24.03.2026 00:36, KAZAR Ayoub wrote:
I've looked at struct vfd and some simple changes to the struct would
already cut memory consumption in half. I can look into that.Thoughts?
Looking forward to this.
I try to come up with something the next days.
What also bothers me in that space is if a backend allocates 100K
entries
in VFD cache, that cache is never shrank ever again,
the cache only grows (if it needs more than its lifetime maximum) untilthe
backend dies, although this is useful as entries are reused if free
instead
of
allocating entries, whether a spike in files openings effects a longliving
backend to keep holding a useless amount of
cache size it will need in the future, i don't imagine this to be common
though, what do you think about this issue from your experience ?Currently the cache is directly mapped by the VFD index. That means we
could only resize down to the maximum used VFD index.Being able to resize independently of the maximum VFD index would
require changing to a hash map like simplehash.h. I can take a look how
invasive such a change would be.--
David GeierI've implemented the recommended global stats view on vfd cache, the
implementation should be also straightforward as it follows the same
cumulative shared statistics infrastructure like pgstat_bgwriter and others
do.Attached is v2 patch also contains what David suggested for global cache
size and entries in the view.Kind regards,
Ayoub
Attachments:
v2-0001-Add-pg_stat_vfdcache-view-for-VFD-cache-statistics.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Add-pg_stat_vfdcache-view-for-VFD-cache-statistics.patchDownload+655-4
Hi!
On 29.03.2026 21:46, KAZAR Ayoub wrote:
I've looked at struct vfd and some simple changes to the struct would
already cut memory consumption in half. I can look into that.
Thoughts?
Looking forward to this.
I'll try to come up with something in the next days.
What also bothers me in that space is that if a backend allocates 100K
entries in the VFD cache, the cache is never shrunk again: it only grows
(when it needs more than its lifetime maximum) until the backend dies.
Reusing free entries instead of allocating new ones is useful, but a
spike in file openings leaves a long-lived backend holding a cache far
larger than it will ever need again. I don't imagine this is common,
though; what do you think from your experience?
Currently the cache is directly mapped by the VFD index. That means we
could only resize down to the maximum used VFD index.
Being able to resize independently of the maximum VFD index would
require changing to a hash map like simplehash.h. I can take a look at how
invasive such a change would be.
That would actually be doable without too much code churn. It would,
however, add some overhead to each cache entry: 1 byte for the
simplehash.h status and 4 bytes for the hash, if we want to avoid
rehashing on each access. Probably we can get away without storing the
hash. We would then have to monitor the cache size and recreate the hash
table when the size has shrunk enough.
Another alternative is pallocing vfd entries so that we can freely move
them around in the vfd cache array. That would mean an ABI change (File
would be an 8-byte pointer instead of a 4-byte integer) but give us much
more flexibility for possible improvements.
For example, apart from compacting without hash map, this would allow us
to allocate variable amounts of memory per entry to, e.g.
- store the file name inline in the struct as a variable-length array
(instead of a pstrdup() pointer) and
- depending on type allocate more or less memory. This is because
non-temporary files don't use the ResOwner and the file size.
I've implemented the recommended global stats view on the VFD cache; the
implementation should also be straightforward, as it follows the same
cumulative shared statistics infrastructure that pgstat_bgwriter and
others use.
The attached v2 patch also contains what David suggested: global cache
size and entry count in the view.
I'll review the patch in the next days. On quick inspection I saw that you
compute the size of a struct vfd by using sizeof(). That works, except
for the filename which is stored as a pointer to a pstrdup() piece of
memory. I guess you can just say:
sizeof(VfdCache[i]) + GetMemoryChunkSpace(VfdCache[i].fileName)
--
David Geier
Hi,
Thanks for working on this, I think having some stats about the vfd
cache would be quite helpful. I took a quick look at the patch, and in
general it goes in the right direction.
Here's a couple comments / suggestions:
1) sgml docs
- Formatting seems a bit wrong, both for the entry in the first table,
and then for the view description later. Clearly different from the
nearby tables after the documentation is built. Did you generate the
tables somehow?
- The ordering seems a bit random, but I'd argue the pg_stat_vfdcache
view should go before pg_stat_wal (at least in the first table).
2) system_views.sql
- missing the REVOKE command
- We don't align the AS clauses with spaces (I like to align my queries,
but here it's a question of consistency with the other commands, and the
alignment makes future diffs larger)
3) fd.c
- Wouldn't it be better to have pgstat_count_vfd_access(hit bool)? That
way you wouldn't even need the new else, it'd be enough to do
pgstat_count_vfd_access(!FileIsNotOpen(file)) at the beginning.
- GetVfdCacheOccupancy does not seem to be called from anywhere
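The single-entry-point variant suggested above might look roughly like this (hypothetical sketch; FileIsNotOpen() is the existing fd.c test, everything else here is made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>

static long vfd_hits, vfd_misses;

/* One call site counts both outcomes; the caller passes whether
 * the file was already open, e.g.
 * pgstat_count_vfd_access(!FileIsNotOpen(file)). */
static inline void pgstat_count_vfd_access(bool hit)
{
    if (hit)
        vfd_hits++;
    else
        vfd_misses++;
}
```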
4) pgstat_vfdcache.c
- I'm not sure usagecount makes sense for these stats, because that's
for cases with a single writer. These stats are written by backends, so
it probably needs a lwlock.
- I'm not sure updating PgStatShared_Backend from pgstat_vfdcache.c is a
good idea, when it's already synced from pgstat_backend. We don't do
that for WAL either, but there's pgstat_flush_backend_entry_wal in
pgstat_backend.c. I suppose vfdcache should do it the same way.
5) pg_proc.dat
- formatting seems a bit inconsistent
6) pgstat.h
- Aren't evictions mostly the same as misses, at least after a while?
- I think it would be useful to report how many file descriptors we
are allowed to open (it's less than max_files_per_process, depending
on the ulimits etc.)
- I know io_uring can consume quite a few descriptors, and it can cause
issues, I wonder if this would make it easier to observe
I also suggest to split the patch into smaller patches, to make it
easier to review and evaluate. Not because of size - the patch is fairly
small. But it's better to not mix multiple features with different
cost/benefit trade-offs, because then it's possible to evaluate them
separately. Maybe even commit the first part and continue discussion
about the following one(s).
This patch seems to mix two different types of stats - global stats of
the vfd cache, and then also per-backend stats. Those seem like very
different things, both in terms of overhead and benefits.
The global cache stats is going to be virtually free (at least the
hits/misses, I'm not sure about the number of entries and bytes), and
it's obviously useful for tuning the max_files_per_process GUC. I'd even
contemplate getting this into PG19, maybe.
The per-backend stats seem like a much harder sell to me, but I can be
convinced. Maybe it's not an issue in terms of overhead, maybe the stats
we get from that are worth it. Not sure. But I'd keep it in a separate
0002 patch, on top of 0001 with just the "global" stats.
regards
--
Tomas Vondra
Hello,
Thanks for the review!
On Tue, Mar 31, 2026 at 8:27 PM Tomas Vondra <tomas@vondra.me> wrote:
Hi,
Thanks for working on this, I think having some stats about the vfd
cache would be quite helpful. I took a quick look at the patch, and in
general it goes in the right direction.
Here's a couple of comments / suggestions:
6) pgstat.h
- Aren't evictions mostly the same as misses, at least after a while?
Correct; the window where they differ is pretty much meaningless, so I
removed it.
- I think it would be useful to report how many file descriptors we
are allowed to open (it's less than max_files_per_process, depending
on the ulimits etc.)
Agreed. This should be max_safe_fds as calculated by the postmaster. I
added it, but let me know whether exporting max_safe_fds the way I did
is acceptable.
- I know io_uring can consume quite a few descriptors, and it can cause
issues, I wonder if this would make it easier to observe
I also suggest to split the patch into smaller patches, to make it
easier to review and evaluate. Not because of size - the patch is fairly
small. But it's better to not mix multiple features with different
cost/benefit trade-offs, because then it's possible to evaluate them
separately. Maybe even commit the first part and continue discussion
about the following one(s).
This patch seems to mix two different types of stats - global stats of
the vfd cache, and then also per-backend stats. Those seem like very
different things, both in terms of overhead and benefits.
The global cache stats is going to be virtually free (at least the
hits/misses, I'm not sure about the number of entries and bytes), and
it's obviously useful for tuning the max_files_per_process GUC. I'd even
contemplate getting this into PG19, maybe.
The per-backend stats seem like a much harder sell to me, but I can be
convinced. Maybe it's not an issue in terms of overhead, maybe the stats
we get from that are worth it. Not sure. But I'd keep it in a separate
0002 patch, on top of 0001 with just the "global" stats.
regards
--
Tomas Vondra
I fixed the style-related issues and followed your suggestion to split
the patch: global stats first, then the per-backend stats (cache size
and entry count reporting). Attached is v3-0001, which does just the
global stats counting.
Once we make sure this is correct, I'll proceed with the per-backend
stats patch.
Regards,
Ayoub