Freeze the inserted tuples during CTAS?
Here is the simple patch,
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index dce882012e..0391699423 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -552,7 +552,7 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
myState->rel = intoRelationDesc;
myState->reladdr = intoRelationAddr;
myState->output_cid = GetCurrentCommandId(true);
- myState->ti_options = TABLE_INSERT_SKIP_FSM;
+ myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
MatView code already does this and COPY does this if specified. I’m not sure how
does the community think about this. Actually personally I expect more about the
all-visible setting due to TABLE_INSERT_FROZEN since I could easier use index only scan
if we create an index and table use CTAS, else people have to use index only scan
after vacuum. If people do not expect freeze could we at least introduce a option to
specify the visibility during inserting?
Regards,
Paul
Hi,
I confirm that my analytic workflows often do the CTAS and VACUUM of the
relation right after, before the index creation, to mark stuff as
all-visible for IOS to work. Freezing and marking as visible will help.
On Wed, Jan 27, 2021 at 12:29 PM Paul Guo <guopa@vmware.com> wrote:
Here is the simple patch,
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c index dce882012e..0391699423 100644 --- a/src/backend/commands/createas.c +++ b/src/backend/commands/createas.c @@ -552,7 +552,7 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo) myState->rel = intoRelationDesc; myState->reladdr = intoRelationAddr; myState->output_cid = GetCurrentCommandId(true); - myState->ti_options = TABLE_INSERT_SKIP_FSM; + myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;MatView code already does this and COPY does this if specified. I’m not
sure how
does the community think about this. Actually personally I expect more
about the
all-visible setting due to TABLE_INSERT_FROZEN since I could easier use
index only scan
if we create an index and table use CTAS, else people have to use index
only scan
after vacuum. If people do not expect freeze could we at least introduce a
option to
specify the visibility during inserting?Regards,
Paul
--
Darafei "Komяpa" Praliaskouski
OSM BY Team - http://openstreetmap.by/
On Jan 27, 2021, at 5:33 PM, Darafei Komяpa Praliaskouski <me@komzpa.net<mailto:me@komzpa.net>> wrote:
Hi,
I confirm that my analytic workflows often do the CTAS and VACUUM of the relation right after, before the index creation, to mark stuff as all-visible for IOS to work. Freezing and marking as visible will help.
Thanks for letting me know there is such a real case in production environment.
I attached the short patch. If no more other concerns, I will log the patch on commitfest.
-Paul
On Wed, Jan 27, 2021 at 12:29 PM Paul Guo <guopa@vmware.com<mailto:guopa@vmware.com>> wrote:
Here is the simple patch,
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index dce882012e..0391699423 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -552,7 +552,7 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
myState->rel = intoRelationDesc;
myState->reladdr = intoRelationAddr;
myState->output_cid = GetCurrentCommandId(true);
- myState->ti_options = TABLE_INSERT_SKIP_FSM;
+ myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
MatView code already does this and COPY does this if specified. I’m not sure how
does the community think about this. Actually personally I expect more about the
all-visible setting due to TABLE_INSERT_FROZEN since I could easier use index only scan
if we create an index and table use CTAS, else people have to use index only scan
after vacuum. If people do not expect freeze could we at least introduce a option to
specify the visibility during inserting?
Regards,
Paul
Attachments:
v1-0001-Freeze-the-tuples-during-CTAS.patchapplication/octet-stream; name=v1-0001-Freeze-the-tuples-during-CTAS.patchDownload
From a94f1503abd720ab25c7f0e536f8158ffed3085c Mon Sep 17 00:00:00 2001
From: Paul Guo <paulguo@gmail.com>
Date: Wed, 27 Jan 2021 13:06:16 +0800
Subject: [PATCH v1] Freeze the tuples during CTAS.
Actually this is the behavior in materialized view code also. With this
unncessary vacuum could be avoided.
---
src/backend/commands/createas.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index dce882012e..0391699423 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -552,7 +552,7 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
myState->rel = intoRelationDesc;
myState->reladdr = intoRelationAddr;
myState->output_cid = GetCurrentCommandId(true);
- myState->ti_options = TABLE_INSERT_SKIP_FSM;
+ myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
/*
* If WITH NO DATA is specified, there is no need to set up the state for
--
2.14.3
Attached is the v2 version that fixes a test failure due to plan change (bitmap index scan -> index only scan).
Attachments:
v2-0001-Freeze-the-tuples-during-CTAS.patchapplication/octet-stream; name=v2-0001-Freeze-the-tuples-during-CTAS.patchDownload
From 26213367c7208c3cefe9467c6bc0b1399632a8fc Mon Sep 17 00:00:00 2001
From: Paul Guo <paulguo@gmail.com>
Date: Wed, 27 Jan 2021 13:06:16 +0800
Subject: [PATCH v2] Freeze the tuples during CTAS.
Actually this is the behavior in materialized view code also. With this
unncessary vacuum could be avoided.
---
src/backend/commands/createas.c | 2 +-
src/test/regress/expected/create_index.out | 12 +++++-------
2 files changed, 6 insertions(+), 8 deletions(-)
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index dce882012e..0391699423 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -552,7 +552,7 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
myState->rel = intoRelationDesc;
myState->reladdr = intoRelationAddr;
myState->output_cid = GetCurrentCommandId(true);
- myState->ti_options = TABLE_INSERT_SKIP_FSM;
+ myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
/*
* If WITH NO DATA is specified, there is no need to set up the state for
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index fc6afab58a..63ac9f1425 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1857,14 +1857,12 @@ ANALYZE dupindexcols;
EXPLAIN (COSTS OFF)
SELECT count(*) FROM dupindexcols
WHERE f1 BETWEEN 'WA' AND 'ZZZ' and id < 1000 and f1 ~<~ 'YX';
- QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
Aggregate
- -> Bitmap Heap Scan on dupindexcols
- Recheck Cond: ((f1 >= 'WA'::text) AND (f1 <= 'ZZZ'::text) AND (id < 1000) AND (f1 ~<~ 'YX'::text))
- -> Bitmap Index Scan on dupindexcols_i
- Index Cond: ((f1 >= 'WA'::text) AND (f1 <= 'ZZZ'::text) AND (id < 1000) AND (f1 ~<~ 'YX'::text))
-(5 rows)
+ -> Index Only Scan using dupindexcols_i on dupindexcols
+ Index Cond: ((f1 >= 'WA'::text) AND (f1 <= 'ZZZ'::text) AND (id < 1000) AND (f1 ~<~ 'YX'::text))
+(3 rows)
SELECT count(*) FROM dupindexcols
WHERE f1 BETWEEN 'WA' AND 'ZZZ' and id < 1000 and f1 ~<~ 'YX';
--
2.14.3
On Sun, Feb 21, 2021 at 4:46 PM Paul Guo <guopa@vmware.com> wrote:
Attached is the v2 version that fixes a test failure due to plan change (bitmap index scan -> index only scan).
I think this is a good idea.
BTW, how much does this patch affect the CTAS performance? I expect
it's negligible but If there is much performance degradation due to
populating visibility map, it might be better to provide a way to
disable it.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Mar 3, 2021, at 1:35 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sun, Feb 21, 2021 at 4:46 PM Paul Guo <guopa@vmware.com> wrote:
Attached is the v2 version that fixes a test failure due to plan change (bitmap index scan -> index only scan).
I think this is a good idea.
BTW, how much does this patch affect the CTAS performance? I expect
it's negligible but If there is much performance degradation due to
populating visibility map, it might be better to provide a way to
disable it.
Yes, this is a good suggestion. I did a quick test yesterday.
Configuration: shared_buffers = 1280M and the test system memory is 7G.
Test queries:
checkpoint;
\timing
create table t1 (a, b, c, d) as select i,i,i,i from generate_series(1,20000000) i;
\timing
select pg_size_pretty(pg_relation_size('t1'));
Here are the running time:
HEAD : Time: 10299.268 ms (00:10.299) + 1537.876 ms (00:01.538)
Patch : Time: 12257.044 ms (00:12.257) + 14.247 ms
The table size is 800+MB so the table should be all in the buffer. I was surprised
to see the patch increases the CTAS time by 19.x%, and also it is not better than
"CTAS+VACUUM" on HEAD version. In theory the visibility map buffer change should
not affect that much. I looked at related code again (heap_insert()). I believe
the overhead could decrease along with some discussed CTAS optimization
solutions (multi-insert, or raw-insert, etc).
I tested 'copy' also. The COPY FREEZE does not involve much overhead than COPY
according to the experiement results as below. COPY uses multi-insert. Seems there is
no other difference than CTAS when writing a new table.
COPY TO + VACUUM
Time: 8826.995 ms (00:08.827) + 1599.260 ms (00:01.599)
COPY TO FREEZE + VACUUM
Time: 8836.107 ms (00:08.836) + 13.581 ms
So maybe think about doing freeze in CTAS after optimizing the CTAS performance
later?
By the way, ‘REFRESH MatView’ does freeze by default. Matview is quite similar to CTAS.
I did test it also and the conclusion is similar to that of CTAS. Not sure why FREEZE was
enabled though, maybe I missed something?
On Wed, Mar 10, 2021 at 3:57 PM Paul Guo <guopa@vmware.com> wrote:
On Mar 3, 2021, at 1:35 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sun, Feb 21, 2021 at 4:46 PM Paul Guo <guopa@vmware.com> wrote:
Attached is the v2 version that fixes a test failure due to plan change (bitmap index scan -> index only scan).I think this is a good idea.
BTW, how much does this patch affect the CTAS performance? I expect
it's negligible but If there is much performance degradation due to
populating visibility map, it might be better to provide a way to
disable it.Yes, this is a good suggestion. I did a quick test yesterday.
Configuration: shared_buffers = 1280M and the test system memory is 7G.
Test queries:
checkpoint;
\timing
create table t1 (a, b, c, d) as select i,i,i,i from generate_series(1,20000000) i;
\timing
select pg_size_pretty(pg_relation_size('t1'));Here are the running time:
HEAD : Time: 10299.268 ms (00:10.299) + 1537.876 ms (00:01.538)
Patch : Time: 12257.044 ms (00:12.257) + 14.247 msThe table size is 800+MB so the table should be all in the buffer. I was surprised
to see the patch increases the CTAS time by 19.x%, and also it is not better than
"CTAS+VACUUM" on HEAD version. In theory the visibility map buffer change should
not affect that much. I looked at related code again (heap_insert()). I believe
the overhead could decrease along with some discussed CTAS optimization
solutions (multi-insert, or raw-insert, etc).I tested 'copy' also. The COPY FREEZE does not involve much overhead than COPY
according to the experiement results as below. COPY uses multi-insert. Seems there is
no other difference than CTAS when writing a new table.COPY TO + VACUUM
Time: 8826.995 ms (00:08.827) + 1599.260 ms (00:01.599)
COPY TO FREEZE + VACUUM
Time: 8836.107 ms (00:08.836) + 13.581 msSo maybe think about doing freeze in CTAS after optimizing the CTAS performance
later?
Thank you for testing. That's interesting.
I've also done some benchmarks for CTAS (2GB table creation) and got
similar results:
Patched : 44 sec
HEAD : 34 sec
Since CREATE MATERIALIZED VIEW is also internally treated as CTAS, I
got similar results even for CREATE MATVIEW.
After investigation, it seems to me that the cause of performance
degradation is that heap_insert() set PD_ALL_VISIBLE when inserting a
tuple for the first time on the page (L2133 in heapam.c). This
requires every subsequent heap_insert() to pin a visibility map buffer
(see RelationGetBufferForTuple()). This problem doesn't exist in
heap_multi_insert() since it reads vm buffer once to fill the heap
page with tuples.
Given such relatively big performance degradation, it seems to me that
we should do some optimization for heap_insert() first. Otherwise, we
will end up imposing those costs on all users.
By the way, ‘REFRESH MatView’ does freeze by default. Matview is quite similar to CTAS.
I did test it also and the conclusion is similar to that of CTAS. Not sure why FREEZE was
enabled though, maybe I missed something?
I’m not sure the reason why setting visibility map and PD_ALL_VISIBLE
during REFRESH MATVIEW is enabled by default. By commit 7db0cd2145 and
39b66a91b, heap_insert and heap_multi_insert() sets visibility map
bits if HEAP_INSERT_FROZEN is specified. Looking at the commit
messages, those changes seem to be intended to COPY FREEZE but they
indeed affect REFRESH MATVIEW as well. But I could not find discussion
and mention about REFRESH MATVIEW both in those threads and commit
messages.
One reason that might justify such behavior would be materialized
views are read-only. Since visibility map bits never be cleared, even
if it costs here is some cost to set visibility map during the refresh
setting VM bits and PD_ALL_VISIBLE at creation might win. On the other
hand, a table created by CTAS is read-write. The user might not want
to pay a cost during creating a table if the table is updated
frequently after creation. Not sure. Being said that, I think this
performance degradation of REFRESH MATVIEW could be a problem. There
is no way to avoid the degradation and we also can rely on autovacuum
to set visibility map bits on materialized views. I'll start a new
thread to discuss that.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/