Question about behavior of conditional indexes
Greetings all,
Working on postgres 11.
I'm researching an index growing in size and never shrinking, and not being
used anymore after a while.
The index looks like this:
"index002" btree (action_performed, should_still_perform_action,
action_performed_at DESC) WHERE should_still_perform_action = false
AND action_performed = true
So, there are 2 boolean fields, and the partial index's WHERE clause constrains both. The
table has about 50M rows; the index barely ever goes over 100K matching rows.
The idea is to search for rows matching these conditions quickly, and then
update them. This happens daily.
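For context, a minimal sketch of that daily cycle. The table name "tasks" and the "id" column are placeholders I've made up; only the indexed columns and the predicate are real:

```sql
-- Hypothetical table name "tasks"; only the indexed columns are from the thread.
-- The daily job finds the rows matching the partial index's predicate...
SELECT id
FROM tasks
WHERE action_performed = true
  AND should_still_perform_action = false
ORDER BY action_performed_at DESC;

-- ...and then updates them, flipping a flag so they no longer satisfy
-- the index's WHERE clause:
UPDATE tasks
SET should_still_perform_action = true
WHERE action_performed = true
  AND should_still_perform_action = false;
```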
After that update, the rows no longer match the index's WHERE clause. At this point, does the
row get removed from the index? Or does it stay in there as a dead row?
I'm noticing index bloat on this index and am wondering if all these
updated rows just stay in the index?
The index currently sits at 330MB. If I drop and re-create it, it's 1.5MB.
A cronjob runs a vacuum once per day, I can see the amount of dead rows
dropping in monitoring software.
But should this also take care of indexes? In postgres 11, you can't
reindex concurrently, so I was wondering: are indexes skipped by vacuum?
Or only in the case of conditional (partial) indexes?
So I'm wondering if the behavior is as I described.
Regards,
Koen De Groote
On 21/09/21 22:28, Koen De Groote wrote:
Can you upgrade to pg13? A lot of work was done on indexes in pg12 &
13, so your problem may already be resolved there, at least in part.
Note that pg 14 is due out this month.
Cheers,
Gavin
On Tue, 21 Sept 2021 at 15:59, Koen De Groote <kdg.dev@gmail.com> wrote:
A cronjob runs a vacuum once per day, I can see the amount of dead rows
dropping in monitoring software.
- This doesn't reclaim the space. A plain VACUUM only truncates empty pages at
the end of the file (above the high-water mark); space freed inside the file
is just marked for reuse.
But should this also take care of indexes? In postgres 11, you can't
reindex concurrently, so I was wondering if indexes are skipped by vacuum?
Or only in case of conditional indexes?
- They aren't left untouched by VACUUM, but as I mentioned, reclaiming
space is not VACUUM's job; for that you must execute VACUUM FULL.
AFAIK, rebuilding an index that comes out at 1.5MB should not take long
(regardless of the concurrent reindexing feature).
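To make the distinction concrete, a sketch (the table name "tasks" is a placeholder, not from the thread):

```sql
-- Plain VACUUM: marks dead heap/index entries reusable; file size mostly unchanged.
VACUUM (VERBOSE) tasks;

-- VACUUM FULL: rewrites the table and rebuilds its indexes, returning space
-- to the OS, but holds an ACCESS EXCLUSIVE lock for the duration.
VACUUM FULL tasks;

-- Rebuilding just the bloated index (on PG 11 this blocks writes, since
-- REINDEX CONCURRENTLY is not available):
REINDEX INDEX index002;
```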
You likely need to tune your autovacuum settings so that the index bloat
reaches a steady state and does not continue to increase. When vacuum runs,
it will remove dead tuples (row versions no longer visible to any running
transaction) from each page (8KB block by default) in the file for that
table's data. It will also clean up the indexes, except in newer versions of
Postgres where that behavior becomes optional (see the manual for which
version and whether it still runs by default). If you are I/O bound and
cannot tune the system defaults to autovacuum more proactively (when a
smaller percentage of rows is dead), then perhaps change the settings just
for that table, since it seems to be functioning a bit like a queue. Or you
might consider setting up a daily job to vacuum analyze all tables, if
there is a period of low system activity. If you didn't have an index on
the columns you are updating, then reducing fillfactor would be an option
to increase HOT updates and thereby prevent the index bloat. Alas, that's
not an option when the index has to reference the very values being changed.
"index002" btree (action_performed, should_still_perform_action,
action_performed_at DESC) WHERE should_still_perform_action = false
AND action_performed = true
That index seems odd to me. Why index a single value for the first two
columns? I would recreate it with those skipped. Also, based on the
names, I'd expect the opposite true and false values. That is, the
"interesting" rows should be those where the action has NOT yet been
performed and is still needed. As defined, I'd expect the index to cover
most of the table rather than a small fraction. Perhaps just a typo from
manual obfuscation.
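To illustrate the suggestion (a sketch only; "tasks" is a placeholder table name): since the WHERE clause already pins both booleans to a single value, they add nothing as key columns.

```sql
-- Equivalent but smaller: the predicate does the filtering, so only the
-- timestamp needs to be a key column.
CREATE INDEX CONCURRENTLY index003
    ON tasks (action_performed_at DESC)
    WHERE should_still_perform_action = false
      AND action_performed = true;
```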
For what it's worth, you can create new concurrently, drop old
concurrently, then rename new to old. That would be the same result as a
reindex concurrently.
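That swap pattern would look roughly like this ("tasks" is an assumed table name; the index definition is from the thread):

```sql
-- Build a replacement without blocking writes, then swap names:
CREATE INDEX CONCURRENTLY index002_new
    ON tasks (action_performed, should_still_perform_action,
              action_performed_at DESC)
    WHERE should_still_perform_action = false
      AND action_performed = true;

DROP INDEX CONCURRENTLY index002;
ALTER INDEX index002_new RENAME TO index002;
```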
Thanks for the replies, everyone.
Gavin - I can't upgrade to a more recent version, at least not for the
foreseeable future. From what I'm reading, it's the best path forward, but
there's considerations to be made that I can't overrule.
Ninad - As I suspected about VACUUM and VACUUM FULL. Thanks for confirming
Michael - Yes, my mistakes in manually obfuscating.
Going over the documentation, it seems several additions were made after pg
11, like the "vacuum_index_cleanup" storage parameter and "VACUUM (INDEX_CLEANUP ...)".
From what I'm reading, these did not exist in pg11. Googling a bit shows me
this:
https://www.depesz.com/2019/05/01/waiting-for-postgresql-12-allow-vacuum-to-be-run-with-index-cleanup-disabled/
From which I assume that index cleaning is the default in pg11 and cannot
be controlled. Though it appears not to be "enough" for my use case.
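For reference, those PG 12+ forms look like this (not available on PG 11; "tasks" is a placeholder):

```sql
-- One-off: skip index cleanup for a single vacuum run (PG 12+):
VACUUM (INDEX_CLEANUP FALSE) tasks;

-- Per-table default via storage parameter (PG 12+):
ALTER TABLE tasks SET (vacuum_index_cleanup = off);
```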
There is already a cronjob that does "VACUUM ANALYZE" during the night.
Though it seems the index keeps growing regardless of cleanup... I'm not
sure what to think of that, or what to look for, apart from more aggressive
autovacuum.
I'll look into dropping and recreating this index without those fields, and
the fillfactor you mention.
On Tue, Sep 21, 2021 at 6:15 PM Michael Lewis <mlewis@entrata.com> wrote:
Just to clarify, I suggested fillfactor likely could *not* help because you
are indexing a column whose value changes. The idea with reducing
fillfactor is that rows can be updated with the new versions remaining
in the same 8KB block in the file for table/row storage. If the new version
of the row remains in the same page, then the index pointer doesn't have to
be updated until the old version of the row gets vacuumed away. But alas,
when a value in the index changes, all bets are off. Although, if in your
workflow you frequently update these rows WITHOUT changing the status
columns, then I would certainly consider reducing the fillfactor; it will
mean perpetual "reserved space" (normally called bloat, though that has a
different implication) in the blocks/pages that only hold old records that
won't be getting updates anymore.
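If that frequent-update-without-flag-change case does apply, the setting itself is simple (table name assumed; note it only affects newly written pages):

```sql
-- Leave ~20% of each heap page free so updated row versions can stay on the
-- same page, enabling HOT updates. Applies to newly written pages only;
-- a rewrite (e.g. VACUUM FULL or CLUSTER) is needed for existing pages.
ALTER TABLE tasks SET (fillfactor = 80);
```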
If you check pg_stat_user_tables, you will see the autovacuum count and can
check it periodically to see how often it is doing work. I'd lower
autovacuum_vacuum_scale_factor and perhaps autovacuum_vacuum_cost_delay
(default was 20ms, lowered to 2ms in PG12) to ensure work is done more
often, and that more work is done in each cycle before it pauses to avoid
overloading the I/O system.
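A sketch of both of those suggestions ("tasks" is a placeholder table name):

```sql
-- How often has autovacuum processed this table, and how many dead tuples remain?
SELECT relname, n_dead_tup, last_autovacuum, autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'tasks';

-- Vacuum this table once ~1% of its rows are dead (the global default is 20%),
-- and pause less between cost-limited batches of work:
ALTER TABLE tasks SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay = 2
);
```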