Table not cleaning up dead tuples
Good day. It seems that we have some strange case of VACUUM malfunction and
table bloating.
PostgreSQL 9.5.3
OS (uname -a):
FreeBSD db-host 10.2-RELEASE-p18 FreeBSD 10.2-RELEASE-p18 #0: Sat May 28 08:53:43 UTC 2016 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
General info about our database:
https://gist.github.com/aCLr/dec78ab031749e517550ac11f8233f70
Information about problematic tables:
https://gist.github.com/tarabanton/edf7f8dc26eb7ec37a9cfa3424493871
As you can see, there are 2 to 10 times as many dead tuples as actual rows.
We've tried VACUUM FULL and CLUSTER without any success.
There are no long-term locks, idle-in-transaction sessions, or prepared transactions.
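For reference, the sort of checks we ran to verify that (a sketch, not exhaustive):

```sql
-- sessions sitting idle inside an open transaction
SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE state LIKE 'idle in transaction%';

-- forgotten prepared transactions
SELECT * FROM pg_prepared_xacts;

-- oldest backends holding back the xmin horizon
SELECT pid, backend_xmin, xact_start
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC
LIMIT 5;
```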
We are temporarily working around it like this:
BEGIN;
CREATE TABLE _orders_temp (LIKE orders_y2017_m3 INCLUDING DEFAULTS
INCLUDING CONSTRAINTS INCLUDING INDEXES INCLUDING COMMENTS);
INSERT INTO _orders_temp SELECT * FROM orders_y2017_m3;
ALTER TABLE orders_y2017_m3 RENAME TO __orders_y2017_m3_to_drop;
ALTER TABLE __orders_y2017_m3_to_drop NO INHERIT orders;
ALTER TABLE _orders_temp RENAME TO orders_y2017_m3;
ALTER TABLE orders_y2017_m3 INHERIT orders;
COMMIT;
but the bloat returns again and again.
--
___________________________
With best regards,
Anton Tarabrin
From: Антон Тарабрин <tarabanton@gmail.com>
To: pgsql-general@postgresql.org
Sent: Tuesday, 14 March 2017, 12:09
Subject: [GENERAL] Table not cleaning up dead tuples
Due to the postgres MVCC model, some bloat is to be expected unless your data is totally static.
Are you saying vacuum full and cluster aren't removing the bloat? Sounds unlikely to me.
Issues only arise when you can't manage it. From what you've posted we can see autovacuum is running, but perhaps it's not keeping up with your workload, or your update patterns make it difficult to keep bloat down. We can also see some rollbacks, which I'm sure are part of your problem.
You could try updating to the latest minor version of postgres as there are a few fixes to autovacuum in versions after 9.5.3, but I think analyzing your update patterns and/or tuning autovacuum will be your starting point.
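If it does turn out that autovacuum can't keep up, per-table storage parameters are one place to start; something like the following (the numbers are only a sketch and need tuning for your workload):

```sql
-- trigger autovacuum on the busy partition after ~2% of rows change
-- (the default scale factor is 0.2) and let it work with less throttling
ALTER TABLE orders_y2017_m3 SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_vacuum_cost_delay   = 10
);
```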
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Yep. VACUUM FULL is not helping us on the OLD tables, which are not getting updated and are not used in any requests. The bloat is still there.
This is a production system, so now we are investigating why it's happening.
Information about problematic tables:
https://gist.github.com/tarabanton/edf7f8dc26eb7ec37a9cfa3424493871
At the link above are some results of VACUUM (ANALYZE, VERBOSE) from the source (OLD) table.
On Tue, 14 Mar 2017 at 16:53, Glyn Astill <glynastill@yahoo.co.uk> wrote:
________________________________
From: Антон Тарабрин <tarabanton@gmail.com>
To: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Tuesday, 14 March 2017, 14:05
Subject: Re: [GENERAL] Table not cleaning up dead tuples
So what's the output of vacuum full? Or are you saying you can't sustain the exclusive lock vacuum full would require?
Plain vacuum can only reclaim free space at the end of the table; fragmented dead rows can only be marked available for reuse.
Perhaps give us some idea of activity on your database/tables:
select * from pg_stat_user_tables where relname in ('__orders_y2017_m3_to_drop', '__orders_y2017_m2_to_drop');
select * from pg_stat_database;
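If you have the contrib modules installed, pgstattuple will also give you exact dead tuple and free space figures rather than estimates, e.g.:

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;
-- returns tuple_count, dead_tuple_count, dead_tuple_percent, free_percent, ...
SELECT * FROM pgstattuple('__orders_y2017_m3_to_drop');
```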
These tables are the original ones; they have no activity now. We copied the data to NEW tables and are trying to find the root of the problem.
<THIS_DB> is the target database where the broken tables are located.
----- VACUUM FULL VERBOSE
<THIS_DB>=# VACUUM (FULL, VERBOSE) __orders_y2017_m2_to_drop;
INFO: vacuuming "public.__orders_y2017_m2_to_drop"
INFO: "__orders_y2017_m2_to_drop": found 0 removable, 3179076 nonremovable row versions in 551423 pages
DETAIL: 1778770 dead row versions cannot be removed yet.
CPU 30.92s/102.66u sec elapsed 184.69 sec.
<THIS_DB>=# VACUUM (FULL, VERBOSE) __orders_y2017_m3_to_drop;
INFO: vacuuming "public.__orders_y2017_m3_to_drop"
INFO: "__orders_y2017_m3_to_drop": found 0 removable, 9103104 nonremovable row versions in 1520371 pages
DETAIL: 8396820 dead row versions cannot be removed yet.
CPU 65.00s/284.03u sec elapsed 399.66 sec.
----- DB INFO
<THIS_DB>=# select * from pg_stat_user_tables where relname in ('__orders_y2017_m3_to_drop', '__orders_y2017_m2_to_drop');
 relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
-----------+------------+---------------------------+----------+--------------+-----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+--------------+------------------+---------------+-------------------
 179718008 | public | __orders_y2017_m2_to_drop | 5615 | 7934041177 | 328044580 | 7979850698 | 0 | 3065776 | 0 | 25685 | 3082885 | 1759481 | 0 | 2017-03-14 11:57:40.388527+00 | 2017-03-14 07:37:50.907757+00 | 2017-03-14 11:57:42.656628+00 | 2017-03-13 16:15:55.60846+00 | 5 | 96 | 4 | 15
 207347508 | public | __orders_y2017_m3_to_drop | 1128 | 794959804 | 129799001 | 1292952066 | 706089 | 8377499 | 0 | 118035 | 8937540 | 8406385 | 0 | 2017-03-14 11:57:58.026816+00 | 2017-03-14 10:09:08.597031+00 | 2017-03-14 11:57:59.117331+00 | 2017-03-14 04:11:11.370923+00 | 4 | 11 | 4 | 7
(2 rows)
<THIS_DB>=# select * from pg_stat_database;
 datid | datname | numbackends | xact_commit | xact_rollback | blks_read | blks_hit | tup_returned | tup_fetched | tup_inserted | tup_updated | tup_deleted | conflicts | temp_files | temp_bytes | deadlocks | blk_read_time | blk_write_time | stats_reset
-----------+--------------------+-------------+-------------+---------------+-----------+--------------+---------------+--------------+--------------+-------------+-------------+-----------+------------+------------+-----------+---------------+----------------+-------------------------------
 4906146 | <THIS_DB> | 62 | 24781721 | 5888121 | 492125811 | 348274702788 | 1127846911908 | 250049066062 | 413981238 | 188610068 | 397036 | 0 | 53 | 7507001344 | 1 | 0 | 0 | 2017-03-06 02:33:26.466458+00
 113713583 | sentry | 0 | 350030 | 342 | 11574 | 33444698 | 22519113 | 10577975 | 2438 | 27672 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2017-03-06 02:33:24.156858+00
 148539615 | test | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
 161510793 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
(8 rows)
On Tue, 14 Mar 2017 at 17:37, Glyn Astill <glynastill@yahoo.co.uk> wrote:
Quite a large quantity of rollbacks there. In your initial email the longest running transaction was an autovacuum task wasn't it? Are you sure there are no other long running transactions?
What's the output of:
select min(xact_start) from pg_stat_activity where state<>'idle';
select * from pg_stat_activity where state<>'idle' order by xact_start limit 1;
We're, in general, pretty careful with our DB, as it contains important data.
Most rollbacks are issued by the application (which processes all data inside transactions).
P.S. Times are in UTC (GMT+0).
<THIS_DB>=# select min(xact_start) from pg_stat_activity where
state<>'idle';
min
-------------------------------
2017-03-14 15:36:05.432139+00
(1 row)
<THIS_DB>=# select * from pg_stat_activity where state<>'idle' order by xact_start limit 1;
 datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | waiting | state | backend_xid | backend_xmin | query
---------+---------+-------+----------+---------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+---------+--------+-------------+--------------+--------------------------------------------------
 4906146 | <THIS_DB> | 37235 | 10 | pgsql | | | | | 2017-03-14 05:55:43.287128+00 | 2017-03-14 15:36:05.432139+00 | 2017-03-14 15:36:05.432139+00 | 2017-03-14 15:36:05.432141+00 | f | active | | 1621959045 | autovacuum: VACUUM public.stats_y2017_m3_d13_hk2
(1 row)
On Tue, 14 Mar 2017 at 18:15, Glyn Astill <glynastill@yahoo.co.uk> wrote:
And no prepared transactions you say?
select * from pg_prepared_xacts;
Perhaps someone else will chime in ...
Nope, we checked everything before deciding to write to the mailing list.
We have no idea why it's happening.
On Tue, 14 Mar 2017 at 19:10, Glyn Astill <glynastill@yahoo.co.uk> wrote:
On Tue, Mar 14, 2017 at 5:09 AM, Антон Тарабрин <tarabanton@gmail.com>
wrote:
Good day. It seems that we have some strange case of VACUUM malfunction and table bloating.
PostgreSQL 9.5.3
Are you using replication slots?
See this, fixed in 9.5.5:
commit de396a1cb34626619ddc6fb9dec6d12abee8b589
Author: Andres Freund <andres@anarazel.de>
Date: Wed Aug 17 13:15:03 2016 -0700
Properly re-initialize replication slot shared memory upon creation.
Slot creation did not clear all fields upon creation. After start the
memory is zeroed, but when a physical replication slot was created in
the shared memory of a previously existing logical slot, catalog_xmin
would not be cleared. That in turn would prevent vacuum from doing its
duties.
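If you are using slots, something like this should show whether one is pinning the xmin horizon (a slot with a stale xmin or catalog_xmin will hold back VACUUM's cleanup cutoff):

```sql
SELECT slot_name, slot_type, active, xmin, catalog_xmin
FROM pg_replication_slots;
```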
Cheers,
Jeff
That was exactly that bug. Thanks a lot!