Matview size - space increased on concurrently refresh

Started by Nicola Contuover 6 years ago6 messagesgeneral
Jump to latest
#1Nicola Contu
nicola.contu@gmail.com

Hello,
we noticed with a simple matview we have that refreshing it using the
concurrently item the space always increases of about 120MB .
This only happens if I am reading from that matview and at the same time I
am am refreshing it.

cmdv3=# SELECT
pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)

cmdv3=# refresh materialized view matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT
pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)

cmdv3=# \! date
Fri Jul 12 13:52:51 GMT 2019

cmdv3=# refresh materialized view matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT
pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)

Let's try concurrently.....

cmdv3=# refresh materialized view CONCURRENTLY matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT
pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
261 MB
(1 row)

So the matview is not really used and it does not have anything strange but
that matview growth to 12GB as we refresh it once an hour.
It had the free percent at 97%.
I understand with concurrenlty it needs to take copy of the data while
reading, but this seems to be too much on the space side.

Is this a bug? Or is there anyone can help us understanding this?

Thanks a lot,
Nicola

#2Nicola Contu
nicola.contu@gmail.com
In reply to: Nicola Contu (#1)
Re: Matview size - space increased on concurrently refresh

P.S.: I am on postgres 11.3

Il giorno ven 12 lug 2019 alle ore 16:32 Nicola Contu <
nicola.contu@gmail.com> ha scritto:

Show quoted text

Hello,
we noticed with a simple matview we have that refreshing it using the
concurrently item the space always increases of about 120MB .
This only happens if I am reading from that matview and at the same time I
am am refreshing it.

cmdv3=# SELECT
pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)

cmdv3=# refresh materialized view matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT
pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)

cmdv3=# \! date
Fri Jul 12 13:52:51 GMT 2019

cmdv3=# refresh materialized view matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT
pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)

Let's try concurrently.....

cmdv3=# refresh materialized view CONCURRENTLY matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT
pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
261 MB
(1 row)

So the matview is not really used and it does not have anything strange
but that matview growth to 12GB as we refresh it once an hour.
It had the free percent at 97%.
I understand with concurrenlty it needs to take copy of the data while
reading, but this seems to be too much on the space side.

Is this a bug? Or is there anyone can help us understanding this?

Thanks a lot,
Nicola

#3Kaixi Luo
kaixiluo@gmail.com
In reply to: Nicola Contu (#2)
Re: Matview size - space increased on concurrently refresh

On Fri, Jul 12, 2019 at 4:34 PM Nicola Contu <nicola.contu@gmail.com> wrote:

P.S.: I am on postgres 11.3

Il giorno ven 12 lug 2019 alle ore 16:32 Nicola Contu <
nicola.contu@gmail.com> ha scritto:

Hello,
we noticed with a simple matview we have that refreshing it using the
concurrently item the space always increases of about 120MB .
This only happens if I am reading from that matview and at the same time
I am am refreshing it.

cmdv3=# SELECT
pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)

cmdv3=# refresh materialized view matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT
pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)

cmdv3=# \! date
Fri Jul 12 13:52:51 GMT 2019

cmdv3=# refresh materialized view matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT
pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)

Let's try concurrently.....

cmdv3=# refresh materialized view CONCURRENTLY matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT
pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
261 MB
(1 row)

So the matview is not really used and it does not have anything strange
but that matview growth to 12GB as we refresh it once an hour.
It had the free percent at 97%.
I understand with concurrenlty it needs to take copy of the data while
reading, but this seems to be too much on the space side.

Is this a bug? Or is there anyone can help us understanding this?

Thanks a lot,
Nicola

This is normal and something to be expected. When refreshing the
materialized view, the new data is written to a disk and then the two
tables are diffed. After the refresh finishes, your view size should go
back to normal.

#4Nicola Contu
nicola.contu@gmail.com
In reply to: Kaixi Luo (#3)
Re: Matview size - space increased on concurrently refresh

It does not. That's the issue.
It always increases of 120mb and it reached 12gb instead of just 180mb.

Il dom 14 lug 2019, 21:34 Kaixi Luo <kaixiluo@gmail.com> ha scritto:

Show quoted text

On Fri, Jul 12, 2019 at 4:34 PM Nicola Contu <nicola.contu@gmail.com>
wrote:

P.S.: I am on postgres 11.3

Il giorno ven 12 lug 2019 alle ore 16:32 Nicola Contu <
nicola.contu@gmail.com> ha scritto:

Hello,
we noticed with a simple matview we have that refreshing it using the
concurrently item the space always increases of about 120MB .
This only happens if I am reading from that matview and at the same time
I am am refreshing it.

cmdv3=# SELECT
pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)

cmdv3=# refresh materialized view matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT
pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)

cmdv3=# \! date
Fri Jul 12 13:52:51 GMT 2019

cmdv3=# refresh materialized view matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT
pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
133 MB
(1 row)

Let's try concurrently.....

cmdv3=# refresh materialized view CONCURRENTLY matview_nm_connections;
REFRESH MATERIALIZED VIEW
cmdv3=# SELECT
pg_size_pretty(pg_relation_size('public.matview_nm_connections'::regclass));
pg_size_pretty
----------------
261 MB
(1 row)

So the matview is not really used and it does not have anything strange
but that matview growth to 12GB as we refresh it once an hour.
It had the free percent at 97%.
I understand with concurrenlty it needs to take copy of the data while
reading, but this seems to be too much on the space side.

Is this a bug? Or is there anyone can help us understanding this?

Thanks a lot,
Nicola

This is normal and something to be expected. When refreshing the
materialized view, the new data is written to a disk and then the two
tables are diffed. After the refresh finishes, your view size should go
back to normal.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nicola Contu (#4)
Re: Matview size - space increased on concurrently refresh

[ please do not top-post in your replies, it makes the conversation hard
to follow ]

Nicola Contu <nicola.contu@gmail.com> writes:

Il dom 14 lug 2019, 21:34 Kaixi Luo <kaixiluo@gmail.com> ha scritto:

This is normal and something to be expected. When refreshing the
materialized view, the new data is written to a disk and then the two
tables are diffed. After the refresh finishes, your view size should go
back to normal.

It does not. That's the issue.
It always increases of 120mb and it reached 12gb instead of just 180mb.

A concurrent matview refresh will necessarily leave behind two copies
of any rows it changes, just like any other row-update operation in
Postgres. Once there are no concurrent transactions that can "see"
the old row copies, they should be reclaimable by vacuum.

Since you're not seeing autovacuum reclaim the space automatically,
I hypothesize that you've got autovacuum turned off or dialed down
to unrealistically non-aggressive settings. Or possibly you have
old open transactions that are preventing reclaiming dead rows
(because they can still possibly "see" those rows). Either of those
explanations should imply that you're getting similar bloat in every
other table and matview, though.

You might want to look into pg_stat_all_tables to see what it says
about the last_autovacuum time etc. for that matview. Another source
of insight is to do a manual "vacuum verbose" on the matview and see
what that says about removable and nonremovable rows.

regards, tom lane

#6Nicola Contu
nicola.contu@gmail.com
In reply to: Tom Lane (#5)
Re: Matview size - space increased on concurrently refresh

Il giorno dom 14 lug 2019 alle ore 22:23 Tom Lane <tgl@sss.pgh.pa.us> ha
scritto:

[ please do not top-post in your replies, it makes the conversation hard
to follow ]

Nicola Contu <nicola.contu@gmail.com> writes:

Il dom 14 lug 2019, 21:34 Kaixi Luo <kaixiluo@gmail.com> ha scritto:

This is normal and something to be expected. When refreshing the
materialized view, the new data is written to a disk and then the two
tables are diffed. After the refresh finishes, your view size should go
back to normal.

It does not. That's the issue.
It always increases of 120mb and it reached 12gb instead of just 180mb.

A concurrent matview refresh will necessarily leave behind two copies
of any rows it changes, just like any other row-update operation in
Postgres. Once there are no concurrent transactions that can "see"
the old row copies, they should be reclaimable by vacuum.

Since you're not seeing autovacuum reclaim the space automatically,
I hypothesize that you've got autovacuum turned off or dialed down
to unrealistically non-aggressive settings. Or possibly you have
old open transactions that are preventing reclaiming dead rows
(because they can still possibly "see" those rows). Either of those
explanations should imply that you're getting similar bloat in every
other table and matview, though.

You might want to look into pg_stat_all_tables to see what it says
about the last_autovacuum time etc. for that matview. Another source
of insight is to do a manual "vacuum verbose" on the matview and see
what that says about removable and nonremovable rows.

regards, tom lane

This matview has nothing strange and nothign custom.
We can replicate the matview that is not used by anyone.

cmdv3=# vacuum (full,analyze,verbose) public.matview_nm_connections;
INFO: vacuuming "public.matview_nm_connections"
INFO: "matview_nm_connections": found 0 removable, 295877 nonremovable row
versions in 33654 pages
DETAIL: 0 dead row versions cannot be removed yet.
CPU: user: 0.45 s, system: 0.43 s, elapsed: 1.24 s.
INFO: analyzing "public.matview_nm_connections"
INFO: "matview_nm_connections": scanned 16986 of 16986 pages, containing
295877 live rows and 0 dead rows; 30000 rows in sample, 295877 estimated
total rows
VACUUM

This is an example of full and verbose vacuum. Everytime I refresh it I get
the size increased.

See stats from the pg_stat_all_tables :

cmdv3=# select * from pg_stat_all_tables where relname =
'matview_nm_connections';
relid | schemaname | relname | seq_scan | seq_tup_read
| idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del |
n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze |
last_vacuum | last_autovacuum |
last_analyze | last_autoanalyze
| vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
------------+------------+------------------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------
------+--------------+------------------+---------------+-------------------
3466831733 | public | matview_nm_connections | 3725 | 540992219
| 33235 | 255113 | 96874161 | 0 | 95692276 |
0 | 295877 | 0 | 0 | 2019-07-12
11:58:39.198049+00 | 2019-07-16 11:07:02.765612+00 | 2019-07-17
10:28:08.819679+00 | 2019-07-16 11:03:32.4895
73+00 | 5 | 29 | 11 | 17
(1 row)