Can't we give better table bloat stats easily?

Started by Bruce Momjianalmost 7 years ago4 messageshackers

bruce@momjian.us

almost 7 years ago

Everywhere I've worked I've seen people struggle with table bloat. It's
hard to even measure how much of it you have or where, let alone actually
fix it.

If you search online you'll find dozens of different queries estimating how
much empty space are in your tables and indexes based on pg_stats
statistics and suppositions about header lengths and padding and plugging
them into formulas of varying credibility.

But isn't this all just silliiness these days? We could actually sum up the
space recorded in the fsm and get a much more trustworthy number in
milliseconds.

I rigged up a quick proof of concept and the code seems super simple and
quick. There's one or two tables where the number is a bit suspect and
there's no fsm if vacuum hasn't run but that seems pretty small potatoes
for such a huge help in reducing user pain.

andres@anarazel.de

almost 7 years ago

In reply to: Bruce Momjian (#1)

Re: Can't we give better table bloat stats easily?

Hi,

On 2019-08-16 20:39:21 -0400, Greg Stark wrote:

But isn't this all just silliiness these days? We could actually sum up the
space recorded in the fsm and get a much more trustworthy number in
milliseconds.

You mean like pgstattuple_approx()?

https://www.postgresql.org/docs/current/pgstattuple.html

Or something different?

I rigged up a quick proof of concept and the code seems super simple and
quick. There's one or two tables where the number is a bit suspect and
there's no fsm if vacuum hasn't run but that seems pretty small potatoes
for such a huge help in reducing user pain.

Hard to comment on what you propose, without more details. But note that
you can't just look at the FSM, because in a lot of workloads it is
often hugely out of date. And fairly obviously it doesn't provide you
with information about how much space is currently occupied by dead
tuples. What pgstattuple_approx does is to use the FSM for blocks that
are all-visible, and look at the page otherwise.

Greetings,

Andres Freund

Masahiko Sawada

sawada.mshk@gmail.com

almost 7 years ago

In reply to: Andres Freund (#2)

Re: Can't we give better table bloat stats easily?

On Sat, Aug 17, 2019 at 9:59 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2019-08-16 20:39:21 -0400, Greg Stark wrote:

But isn't this all just silliiness these days? We could actually sum up the
space recorded in the fsm and get a much more trustworthy number in
milliseconds.

You mean like pgstattuple_approx()?

https://www.postgresql.org/docs/current/pgstattuple.html

Or something different?

I rigged up a quick proof of concept and the code seems super simple and
quick. There's one or two tables where the number is a bit suspect and
there's no fsm if vacuum hasn't run but that seems pretty small potatoes
for such a huge help in reducing user pain.

Hard to comment on what you propose, without more details. But note that
you can't just look at the FSM, because in a lot of workloads it is
often hugely out of date. And fairly obviously it doesn't provide you
with information about how much space is currently occupied by dead
tuples. What pgstattuple_approx does is to use the FSM for blocks that
are all-visible, and look at the page otherwise.

It's just an idea but we could have pgstattuple_approx use sample scan
to estimate the table bloat more faster.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

jeff.janes@gmail.com

almost 7 years ago

In reply to: Bruce Momjian (#1)

Re: Can't we give better table bloat stats easily?

On Fri, Aug 16, 2019 at 8:39 PM Greg Stark <stark@mit.edu> wrote:

Everywhere I've worked I've seen people struggle with table bloat. It's
hard to even measure how much of it you have or where, let alone actually
fix it.

If you search online you'll find dozens of different queries estimating
how much empty space are in your tables and indexes based on pg_stats
statistics and suppositions about header lengths and padding and plugging
them into formulas of varying credibility.

There is not much we can do to suppress bad advice that people post on
their own blogs. If wiki.postgresql.org is hosting bad advice, by all
means we should fix that.

But isn't this all just silliiness these days? We could actually sum up
the space recorded in the fsm and get a much more trustworthy number in
milliseconds.

If you have bloat problems, then you probably have vacuuming problems. If
you have vacuuming problems, how much can you trust fsm anyway?

Cheers,

Jeff