COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Hi,
Jeff Janes raised an issue [1] about PD_ALL_VISIBLE not being set correctly
while loading data via COPY FREEZE and had also posted a draft patch.
I now have what I think is a more complete patch. I took a slightly
different approach: instead of setting the PD_ALL_VISIBLE bit initially and
then not clearing it during insertion, we now recheck the page for
all-frozen, all-visible tuples just before switching to a new page. This
allows us to also set the visibility map bit, as we do in
vacuumlazy.c.
Some special treatment is required to handle the last page before the bulk
insert state is shut down. We could have chosen not to do anything special for the
last page and let it remain unfrozen, but I thought it makes sense to take
that extra effort so that we can completely freeze the table and set all VM
bits at the end of COPY FREEZE.
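For reference, the usage pattern this targets is the usual same-transaction load
(the table and file names below are only illustrative):

-- COPY FREEZE requires that the table was created or truncated in the
-- same (sub)transaction, so a typical load looks like this:
BEGIN;
TRUNCATE mytable;
COPY mytable FROM '/path/to/data.csv' WITH (FORMAT csv, FREEZE);
COMMIT;
-- With the patch, the pages written by the COPY should end up
-- all-visible/all-frozen; pg_visibility can be used to confirm:
-- SELECT * FROM pg_visibility_map('mytable');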
Let me know what you think.
Thanks,
Pavan
[1]: /messages/by-id/CAMkU=1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ@mail.gmail.com
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
copy_freeze_v3.patch (application/octet-stream)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index dc3499349b..63ad1d8d89 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2119,6 +2119,16 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
bistate->current_buf = InvalidBuffer;
}
+/*
+ * CheckAndSetAllVisibleBulkInsertState - check if the buffer currently held in
+ * bistate is all-visible, all-frozen and take necessary action.
+ */
+void
+CheckAndSetAllVisibleBulkInsertState(Relation relation, BulkInsertState bistate)
+{
+ if (bistate->current_buf != InvalidBuffer)
+ CheckAndSetPageAllVisible(relation, bistate->current_buf, NULL);
+}
/*
* heap_insert - insert tuple into a heap
@@ -9079,3 +9089,158 @@ heap_mask(char *pagedata, BlockNumber blkno)
}
}
}
+
+/*
+ * Check if all tuples in this page are frozen and visible, without doing any
+ * extensive checks.
+ *
+ * The only use of this function is when we are doing heap insertions in
+ * HEAP_INSERT_FROZEN mode (e.g. COPY FREEZE). Since the relation is not
+ * visible to any other transactions when running in this mode, we usually
+ * expect the pages to contain only tuples with frozen-xmin and invalid-xmax.
+ * But to guard against the case where our own transaction may have
+ * inserted regular tuples, which are not marked frozen, into the table, and
+ * out of an abundance of caution, we still do this check.
+ *
+ * While we could do more elaborate tests like heap_page_is_all_visible does,
+ * we deliberately try to keep this simple.
+ */
+static bool
+CheckPageIsAllFrozen(Relation relation, Buffer buf)
+{
+ Page page = BufferGetPage(buf);
+ BlockNumber blockno = BufferGetBlockNumber(buf);
+ OffsetNumber offnum,
+ maxoff;
+
+ /*
+ * This is a stripped down version of the line pointer scan in
+ * lazy_scan_heap(). So if you change anything here, also check that code.
+ */
+ maxoff = PageGetMaxOffsetNumber(page);
+ for (offnum = FirstOffsetNumber;
+ offnum <= maxoff;
+ offnum = OffsetNumberNext(offnum))
+ {
+ ItemId itemid;
+ HeapTupleData tuple;
+
+ itemid = PageGetItemId(page, offnum);
+
+ /* Unused or redirect line pointers are of no interest */
+ if (!ItemIdIsUsed(itemid) || ItemIdIsRedirected(itemid))
+ continue;
+
+ ItemPointerSet(&(tuple.t_self), blockno, offnum);
+
+ /*
+ * Dead line pointers can have index pointers pointing to them. So
+ * they can't be treated as visible
+ */
+ if (ItemIdIsDead(itemid))
+ return false;
+
+ Assert(ItemIdIsNormal(itemid));
+
+ tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
+ tuple.t_len = ItemIdGetLength(itemid);
+ tuple.t_tableOid = RelationGetRelid(relation);
+
+ /*
+ * If xmin is not frozen, then something other than the process inserting
+ * tuples in HEAP_INSERT_FROZEN mode has inserted tuples into this page.
+ * Don't mark such a page all-visible and frozen.
+ */
+ if (!HeapTupleHeaderXminFrozen(tuple.t_data))
+ return false;
+
+ /*
+ * Similarly, if xmax is set, be paranoid and don't mark the page as
+ * all-visible and frozen.
+ */
+ if (HeapTupleHeaderGetRawXmax(tuple.t_data) != InvalidTransactionId)
+ return false;
+ } /* scan along page */
+
+ return true;
+}
+
+/*
+ * If we are inserting frozen tuples, then check if the given page
+ * has all frozen tuples and mark the page as all-visible and frozen.
+ *
+ * Caller may pass a valid vmbuffer, in which case a valid vmbuffer will be
+ * returned. If valid vmbuffer is not passed, then we don't return one either.
+ */
+void
+CheckAndSetPageAllVisible(Relation relation, Buffer buffer, Buffer *vmbuffer)
+{
+ BlockNumber targetBlock;
+ Page page;
+ Buffer myvmbuffer = InvalidBuffer;
+
+ /* Nothing to do if we're passed an Invalid buffer */
+ if (!BufferIsValid(buffer))
+ return;
+
+ targetBlock = BufferGetBlockNumber(buffer);
+ page = BufferGetPage(buffer);
+
+ /*
+ * Use the passed-in vmbuffer, if available. Otherwise pin the
+ * required visibility map page before locking the heap page.
+ */
+ if (vmbuffer && BufferIsValid(*vmbuffer))
+ myvmbuffer = *vmbuffer;
+ else
+ visibilitymap_pin(relation, targetBlock, &myvmbuffer);
+
+ LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+ /*
+ * While we are holding the lock on the page, check if all tuples
+ * in the page are marked frozen at insertion. We can safely mark
+ * such page all-visible and set visibility map bits too.
+ */
+ if (CheckPageIsAllFrozen(relation, buffer))
+ PageSetAllVisible(page);
+
+ MarkBufferDirty(buffer);
+
+ /*
+ * All the changes to the heap page have been done. If the all-visible
+ * flag is now set, also set the VM all-visible bit (and, if possible, the
+ * all-frozen bit) unless this has already been done previously.
+ *
+ * Note: This portion of the code resembles what we do in
+ * vacuumlazy.c.
+ */
+ if (PageIsAllVisible(page))
+ {
+ uint8 vm_status = visibilitymap_get_status(relation,
+ targetBlock, &myvmbuffer);
+ uint8 flags = 0;
+
+ /* Set the VM all-visible and all-frozen bits in flags, if needed */
+ if ((vm_status & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ flags |= VISIBILITYMAP_ALL_VISIBLE;
+ if ((vm_status & VISIBILITYMAP_ALL_FROZEN) == 0)
+ flags |= VISIBILITYMAP_ALL_FROZEN;
+
+ Assert(BufferIsValid(myvmbuffer));
+ if (flags != 0)
+ visibilitymap_set(relation, targetBlock, buffer, InvalidXLogRecPtr,
+ myvmbuffer, InvalidTransactionId, flags);
+ }
+
+ LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
+ /*
+ * If the caller passed a valid vmbuffer, return (potentially different)
+ * vmbuffer. Otherwise just release the vmbuffer that we pinned.
+ */
+ if (vmbuffer && BufferIsValid(*vmbuffer))
+ *vmbuffer = myvmbuffer;
+ else if (BufferIsValid(myvmbuffer))
+ ReleaseBuffer(myvmbuffer);
+}
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index a9c8ec43a7..3d978490b6 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -415,7 +415,15 @@ loop:
{
/* easy case */
buffer = ReadBufferBI(relation, targetBlock, RBM_NORMAL, bistate);
- if (PageIsAllVisible(BufferGetPage(buffer)))
+
+ /*
+ * Obtain a pin on the visibility map buffer if the page is known to
+ * be all-visible or we are running in HEAP_INSERT_FROZEN mode. In the
+ * latter case, we may decide to mark the page all-visible before
+ * switching to a new page and hence obtain the pin in advance.
+ */
+ if (PageIsAllVisible(BufferGetPage(buffer)) ||
+ options & HEAP_INSERT_FROZEN)
visibilitymap_pin(relation, targetBlock, vmbuffer);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
}
@@ -516,13 +524,18 @@ loop:
* code above.
*/
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
- if (otherBuffer == InvalidBuffer)
- ReleaseBuffer(buffer);
- else if (otherBlock != targetBlock)
- {
+ if (otherBuffer != InvalidBuffer && otherBlock != targetBlock)
LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
+
+ /*
+ * If we're inserting frozen tuples, then check if the current page is
+ * completely frozen and set the visibility bit.
+ */
+ if (options & HEAP_INSERT_FROZEN)
+ CheckAndSetPageAllVisible(relation, buffer, vmbuffer);
+
+ if ((otherBuffer == InvalidBuffer) || (otherBlock != targetBlock))
ReleaseBuffer(buffer);
- }
/* Without FSM, always fall out of the loop and extend */
if (!use_fsm)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index dbb06397e6..468b62f3e3 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2834,6 +2834,15 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * Note: As of PG12, COPY FREEZE is not supported on
+ * partitioned tables. Nevertheless, have this check in place so
+ * that we do the right thing if it ever gets supported.
+ */
+ if (hi_options & HEAP_INSERT_FROZEN)
+ CheckAndSetAllVisibleBulkInsertState(resultRelInfo->ri_RelationDesc,
+ bistate);
+
/*
* We'd better make the bulk insert mechanism gets a new
* buffer when the partition being inserted into changes.
@@ -3047,6 +3056,15 @@ CopyFrom(CopyState cstate)
firstBufferedLineNo);
}
+ /*
+ * If we are inserting frozen tuples, check if the last page used can also
+ * be marked as all-visible and all-frozen. This ensures that a table can
+ * be fully frozen when the data is loaded.
+ */
+ if (hi_options & HEAP_INSERT_FROZEN)
+ CheckAndSetAllVisibleBulkInsertState(resultRelInfo->ri_RelationDesc,
+ bistate);
+
/* Done, clean up */
error_context_stack = errcallback.previous;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ab0879138f..a31fef5780 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -132,6 +132,10 @@ extern void setLastTid(const ItemPointer tid);
extern BulkInsertState GetBulkInsertState(void);
extern void FreeBulkInsertState(BulkInsertState);
extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
+extern void CheckAndSetAllVisibleBulkInsertState(Relation relation,
+ BulkInsertState bistate);
+extern void CheckAndSetPageAllVisible(Relation relation,
+ Buffer buffer, Buffer *vmbuffer);
extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
int options, BulkInsertState bistate);
Hello Pavan,
Thank you for the patch. It seems to me that while performing COPY
FREEZE, if we've copied tuples into a previously emptied page, we can
set PageSetAllVisible(page) in heap_multi_insert only. Something
like,
bool init = (ItemPointerGetOffsetNumber(&(heaptuples[ndone]->t_self)) == FirstOffsetNumber &&
             PageGetMaxOffsetNumber(page) == FirstOffsetNumber + nthispage - 1);

if (init && (options & HEAP_INSERT_FROZEN))
    PageSetAllVisible(page);
Later, you can skip the pages for
CheckAndSetAllVisibleBulkInsertState() where the PD_ALL_VISIBLE flag is
already set. Do you think it's correct?
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On Thu, 21 Feb 2019 at 15:38, Kuntal Ghosh <kuntalghosh.2007@gmail.com>
wrote:
Thank you for the patch. It seems to me that while performing COPY
FREEZE, if we've copied tuples in a previously emptied page
There won't be any previously emptied pages because of the pre-conditions
required for FREEZE.
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Feb 26, 2019 at 6:46 PM Simon Riggs <simon@2ndquadrant.com> wrote:
On Thu, 21 Feb 2019 at 15:38, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
Thank you for the patch. It seems to me that while performing COPY
FREEZE, if we've copied tuples in a previously emptied page
There won't be any previously emptied pages because of the pre-conditions
required for FREEZE.
Right, I missed that part. Thanks for pointing that out. But, this
optimization is still possible for copying frozen tuples into a newly
created page, right? If the current backend allocates a new page and copies a
bunch of frozen tuples into that page, it can set PD_ALL_VISIBLE in
the same operation.
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
Hi Pavan, thanks for picking this up.
After doing a truncation and '\copy ... with (freeze)' of a table with long
data, I find that the associated toast table has a handful of unfrozen
blocks. I don't know if that is an actual problem, but it does seem a bit
odd, and thus suspicious.
perl -le 'print join "", map rand(), 1..500 foreach 1..1000000' > foo
create table foobar1 (x text);
begin;
truncate foobar1;
\copy foobar1 from foo with (freeze)
commit;
select all_visible,all_frozen,pd_all_visible, count(*) from
pg_visibility('pg_toast.pg_toast_25085') group by 1,2,3;
all_visible | all_frozen | pd_all_visible | count
-------------+------------+----------------+---------
f | f | f | 18
t | t | t | 530,361
(2 rows)
Cheers,
Jeff
On Wed, Feb 27, 2019 at 7:05 AM Jeff Janes <jeff.janes@gmail.com> wrote:
After doing a truncation and '\copy ... with (freeze)' of a table with
long data, I find that the associated toast table has a handful of unfrozen
blocks. I don't know if that is an actual problem, but it does seem a bit
odd, and thus suspicious.
Hi Jeff, thanks for looking at it and the test. I can reproduce the problem,
and quite curiously block number 1 and then every 32672nd block is getting
skipped.
postgres=# select * from pg_visibility('pg_toast.pg_toast_16384') where
all_visible = 'f';
blkno | all_visible | all_frozen | pd_all_visible
--------+-------------+------------+----------------
1 | f | f | f
32673 | f | f | f
65345 | f | f | f
98017 | f | f | f
130689 | f | f | f
163361 | f | f | f
<snip>
Having investigated this a bit, I see that a relcache invalidation arrives
after the 1st and then after every 32672nd block is filled. That clears the
rel->rd_smgr field and we lose the information about the saved target
block. The code then moves to extend the relation again and thus skips the
previously less-than-half-filled block, losing the free space in that block.
postgres=# SELECT * FROM
page_header(get_raw_page('pg_toast.pg_toast_16384', 0));
lsn | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
------------+----------+-------+-------+-------+---------+----------+---------+-----------
1/15B37748 | 0 | 4 | 40 | 64 | 8192 | 8192 |
4 | 0
(1 row)
postgres=# SELECT * FROM
page_header(get_raw_page('pg_toast.pg_toast_16384', 1));
lsn | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
------------+----------+-------+-------+-------+---------+----------+---------+-----------
1/15B39A28 | 0 | 4 | 28 | 7640 | 8192 | 8192 |
4 | 0
(1 row)
postgres=# SELECT * FROM
page_header(get_raw_page('pg_toast.pg_toast_16384', 2));
lsn | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
------------+----------+-------+-------+-------+---------+----------+---------+-----------
1/15B3BE08 | 0 | 4 | 40 | 64 | 8192 | 8192 |
4 | 0
(1 row)
So block 1 has a large amount of free space (upper - lower), which
never gets filled.
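Just to illustrate, a rough way to spot such blocks (assuming the pageinspect
extension and the same toast table name as above; the block range and the
free-space threshold are arbitrary) would be something like:

SELECT blkno, upper - lower AS free_bytes
  FROM generate_series(0, 10) AS blkno,
       LATERAL page_header(get_raw_page('pg_toast.pg_toast_16384', blkno))
 WHERE upper - lower > 4000;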
I am not yet sure what causes the relcache invalidation at regular
intervals. But if I have to guess, it could be because of a new VM (or
FSM?) page getting allocated. I am a bit puzzled because this issue seems to
only occur with toast tables, since I tested the patch while writing it on a
regular table and did not see any block remaining unfrozen. I tested only
up to 450 blocks, but that shouldn't matter because with your test, we see
the problem with block 1 as well. So something to look into in more detail.
While we could potentially fix this by what you'd done in the original
patch and what Kuntal also suggested, i.e. by setting the PD_ALL_VISIBLE
bit during page initialisation itself, I am a bit circumspect about that
approach for two reasons:
1. It requires us to then add extra logic to avoid clearing the bit during
insertions
2. It requires us to also update the VM bit during page init or risk having
divergent views on the page-level bit and the VM bit.
And even if we do that, this newly discovered problem of less-than-half
filled intermediate blocks remains. I wonder if we should instead track the
last used block in BulkInsertState and, if the relcache invalidation flushes
smgr, start inserting again from the last saved block. In fact, we already
track the last used buffer in BulkInsertState and that's enough to know the
last used block.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Feb 21, 2019 at 3:05 PM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
Hi,
Jeff Janes raised an issue [1] about PD_ALL_VISIBLE not being set correctly while loading data via COPY FREEZE and had also posted a draft patch.
I now have what I think is a more complete patch. I took a slightly different approach and instead of setting PD_ALL_VISIBLE bit initially and then not clearing it during insertion, we now recheck the page for all-frozen, all-visible tuples just before switching to a new page. This allows us to then also mark set the visibility map bit, like we do in vacuumlazy.c
I might be missing something but why do we need to recheck whether
each page is all-frozen after insertion? I wonder if we can set
all-frozen without checking all tuples again in this case.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Mon, Mar 11, 2019 at 1:37 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
I might be missing something but why do we need to recheck whether
each page is all-frozen after insertion? I wonder if we can set
all-frozen without checking all tuples again in this case.
It's possible that the user may have inserted unfrozen rows (via regular
INSERT or COPY without FREEZE option) before inserting frozen rows. So we
can't set the all-visible/all-frozen flag unconditionally. I also find it
safer to do an explicit check to ensure we never accidentally mark a page
as all-frozen. Since the check is performed immediately after the page
becomes full and only once per page, there shouldn't be any additional IO
cost and the check should be quite fast.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 12, 2019 at 4:54 PM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
On Mon, Mar 11, 2019 at 1:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I might be missing something but why do we need to recheck whether
each page is all-frozen after insertion? I wonder if we can set
all-frozen without checking all tuples again in this case.
It's possible that the user may have inserted unfrozen rows (via regular
INSERT or COPY without the FREEZE option) before inserting frozen rows.
I think that since COPY FREEZE can be executed only when the table is
created or truncated within the transaction, other users cannot insert
any rows during COPY FREEZE.
So we can't set the all-visible/all-frozen flag unconditionally. I also find it safer to do an explicit check to ensure we never accidentally mark a page as all-frozen. Since the check is performed immediately after the page becomes full and only once per page, there shouldn't be any additional IO cost and the check should be quite fast.
I'd suggest measuring the performance overhead. I can imagine one use
case of COPY FREEZE is loading a very large table. Since, in
addition to setting visibility map bits, this patch could scan a very large
table, I'm concerned about how much performance is degraded by this
patch.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Wed, Mar 13, 2019 at 11:37 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
I think that since COPY FREEZE can be executed only when the table is
created or truncated within the transaction other users cannot insert
any rows during COPY FREEZE.
Right. But the truncating transaction can insert unfrozen rows into the
table before inserting more rows via COPY FREEZE.
postgres=# CREATE EXTENSION pageinspect ;
CREATE EXTENSION
postgres=# BEGIN;
BEGIN
postgres=# TRUNCATE testtab ;
TRUNCATE TABLE
postgres=# INSERT INTO testtab VALUES (100, 200);
INSERT 0 1
postgres=# COPY testtab FROM STDIN WITH (FREEZE);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
1 2
2 3
\.
COPY 2
postgres=# COMMIT;
postgres=# SELECT lp, to_hex(t_infomask) FROM
heap_page_items(get_raw_page('testtab', 0));
lp | to_hex
----+--------
1 | 800
2 | b00
3 | b00
(3 rows)
The first row is inserted by a regular insert and it's not frozen. The next 2
are frozen. We can't mark such a page all-visible, all-frozen.
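For readers following along: decoding those infomask values (bit definitions
from src/include/access/htup_details.h), 0x0800 is HEAP_XMAX_INVALID and
0x0300 is HEAP_XMIN_FROZEN, so 0x800 is a plain insert while 0xb00 also has a
frozen xmin. A query along these lines makes that explicit:

SELECT lp,
       to_hex(t_infomask),
       (t_infomask & x'0300'::int) = x'0300'::int AS xmin_frozen,
       (t_infomask & x'0800'::int) <> 0           AS xmax_invalid
  FROM heap_page_items(get_raw_page('testtab', 0));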
I'd suggest measuring the performance overhead. I can imagine one use
case of COPY FREEZE is loading a very large table. Since, in
addition to setting visibility map bits, this patch could scan a very large
table, I'm concerned about how much performance is degraded by this
patch.
Ok. I will run some tests. But please note that this patch is a bug fix to
address the performance issue that is caused by having to rewrite the
entire table when the all-visible bit is set on the page during the first vacuum.
So while we may do some more work during COPY FREEZE, we're saving a lot of
page writes during the next vacuum. Also, since the scan that we are doing in
this patch is done on a page that should be in the buffer cache, we will
pay a bit in terms of CPU cost, but not anything in terms of IO cost.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Mar 14, 2019 at 5:17 PM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
On Wed, Mar 13, 2019 at 11:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I think that since COPY FREEZE can be executed only when the table is
created or truncated within the transaction other users cannot insert
any rows during COPY FREEZE.
Right. But the truncating transaction can insert unfrozen rows into the
table before inserting more rows via COPY FREEZE.
Understood. Thank you for the explanation!
I'd suggest measuring the performance overhead. I can imagine one use
case of COPY FREEZE is loading a very large table. Since, in addition to
setting visibility map bits, this patch could scan a very large table, I'm
concerned about how much performance is degraded by this patch.
Ok. I will run some tests. But please note that this patch is a bug fix to
address the performance issue that is caused by having to rewrite the entire
table when the all-visible bit is set on the page during the first vacuum. So
while we may do some more work during COPY FREEZE, we're saving a lot of page
writes during the next vacuum. Also, since the scan that we are doing in this
patch is done on a page that should be in the buffer cache, we will pay a bit
in terms of CPU cost, but not anything in terms of IO cost.
Agreed. I had been misunderstanding this patch. The page scan during
COPY FREEZE is necessary and it's much cheaper than doing it in the first
vacuum.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hi Pavan,
I have removed Ibrar as a reviewer since there has been no review from
them in three weeks, and to encourage others to have a look.
Regards,
--
-David
david@pgmasters.net
On Thu, Mar 14, 2019 at 3:54 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
Ok. I will run some tests. But please note that this patch is a bug fix
to address the performance issue that is caused by having to rewrite the
entire table when all-visible bit is set on the page during first vacuum.
So while we may do some more work during COPY FREEZE, we're saving a lot of
page writes during next vacuum. Also, since the scan that we are doing in
this patch is done on a page that should be in the buffer cache, we will
pay a bit in terms of CPU cost, but not anything in terms of IO cost.
Agreed. I had been misunderstanding this patch. The page scan during
COPY FREEZE is necessary and it's much cheaper than doing it in the first
vacuum.
Thanks for agreeing to the need for this bug fix. I ran some simple tests
anyway and here are the results.
The test consists of a simple table with three columns, two integers and
one char(100). I then ran COPY (FREEZE), loading 7M rows, followed by a
VACUUM. The total size of the raw data is about 800MB and the table size in
Postgres is just under 1GB. The results for 3 runs in milliseconds are:
Master:
COPY FREEZE: 40243.725 40309.675 40783.836
VACUUM: 2685.871 2517.445 2508.452
Patched:
COPY FREEZE: 40942.410 40495.303 40638.075
VACUUM: 25.067 35.793 25.390
So there is a slight increase in the time to run COPY FREEZE, but a
significant reduction in time to VACUUM the table. The benefits will only
go up if the table is vacuumed much later when most of the pages are
already written to the disk and removed from shared buffers and/or kernel
cache.
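For anyone wanting to reproduce this, a rough sketch of the setup (the schema
and row count are as described above; the data file and how it was generated
are my assumption):

CREATE TABLE freeze_bench (a int, b int, c char(100));
\timing on
BEGIN;
TRUNCATE freeze_bench;
-- ~7M rows of pre-generated data, roughly 800MB of raw input
COPY freeze_bench FROM '/tmp/freeze_bench.data' WITH (FREEZE);
COMMIT;
VACUUM freeze_bench;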
I hope this satisfies your doubts regarding the performance implications of
the patch.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested
This patch is particularly helpful in processing OpenStreetMap data in PostGIS.
OpenStreetMap is imported as a stream of 300-900 (depending on settings) gigabytes,
which needs a VACUUM after a COPY FREEZE.
With this patch, the first and usually the last transforming query is performed
much faster after the initial load.
I have read this patch and have no outstanding comments on it.
Pavan Deolasee demonstrates the expected speed improvement.
The new status of this patch is: Ready for Committer
Thank you for the performance testing, that's a great improvement!
I've looked at the patch and have comments and questions.
+ /*
+ * While we are holding the lock on the page, check if all tuples
+ * in the page are marked frozen at insertion. We can safely mark
+ * such page all-visible and set visibility map bits too.
+ */
+ if (CheckPageIsAllFrozen(relation, buffer))
+ PageSetAllVisible(page);
+
+ MarkBufferDirty(buffer);
Maybe we don't need to mark the buffer dirty if the page is not set all-visible.
-----
+ if (PageIsAllVisible(page))
+ {
+ uint8 vm_status = visibilitymap_get_status(relation,
+ targetBlock, &myvmbuffer);
+ uint8 flags = 0;
+
+ /* Set the VM all-frozen bit to flag, if needed */
+ if ((vm_status & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ flags |= VISIBILITYMAP_ALL_VISIBLE;
+ if ((vm_status & VISIBILITYMAP_ALL_FROZEN) == 0)
+ flags |= VISIBILITYMAP_ALL_FROZEN;
+
+ Assert(BufferIsValid(myvmbuffer));
+ if (flags != 0)
+ visibilitymap_set(relation, targetBlock, buffer, InvalidXLogRecPtr,
+ myvmbuffer, InvalidTransactionId, flags);
Since CheckPageIsAllFrozen() is used only when inserting frozen tuples,
CheckAndSetPageAllVisible() seems to be implemented for the same
situation. If we have CheckAndSetPageAllVisible() only for this
situation, shouldn't we rather check that the VM status of the page
is 0 and then set both flags on the page? The 'flags' will
always be (VISIBILITYMAP_ALL_FROZEN | VISIBILITYMAP_ALL_VISIBLE) in
the COPY FREEZE case. I'm confused that this function has both code that
assumes some special situations and code that can be used in generic
situations.
-----
Perhaps we can add some tests for this feature to the pg_visibility module.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Fri, Mar 22, 2019 at 12:19 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
I've looked at the patch and have comments and questions.
+ /*
+ * While we are holding the lock on the page, check if all tuples
+ * in the page are marked frozen at insertion. We can safely mark
+ * such page all-visible and set visibility map bits too.
+ */
+ if (CheckPageIsAllFrozen(relation, buffer))
+ PageSetAllVisible(page);
+
+ MarkBufferDirty(buffer);
Maybe we don't need to mark the buffer dirty if the page is not set
all-visible.
Yeah, makes sense. Fixed.
If we have CheckAndSetPageAllVisible() only for this
situation, shouldn't we rather check that the VM status of the page
is 0 and then set both flags on the page? The 'flags' will
always be (VISIBILITYMAP_ALL_FROZEN | VISIBILITYMAP_ALL_VISIBLE) in
the COPY FREEZE case. I'm confused that this function has both code that
assumes some special situations and code that can be used in generic
situations.
If a second COPY FREEZE is run within the same transaction and it starts
inserting into the page used by the previous COPY FREEZE, then the page
will already be marked all-visible/all-frozen. So we can skip repeating the
operation again. While it's quite unlikely that someone will do that, and I
can't think of a situation where only one of those flags will be set, I
don't see any harm in keeping the code as is. This code is borrowed from
vacuumlazy.c and at some point we can even move it to some common location.
Perhaps we can add some tests for this feature to pg_visibility module.
That's a good idea. Please see if the tests included in the attached patch
are enough.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
copy_freeze_v4.patch (application/octet-stream)
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index f0dcb897c4..6ac3e525eb 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -131,6 +131,69 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | t | t
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | f | f
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -140,3 +203,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index c2a7f1d9e4..01a65fdab4 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -72,6 +72,82 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+\.
+copy copyfreeze from stdin;
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -81,3 +157,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 137cc9257d..2ba9778995 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1834,6 +1834,16 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
bistate->current_buf = InvalidBuffer;
}
+/*
+ * CheckAndSetAllVisibleBulkInsertState - check if the buffer currently held in
+ * bistate is all-visible, all-frozen and take necessary action.
+ */
+void
+CheckAndSetAllVisibleBulkInsertState(Relation relation, BulkInsertState bistate)
+{
+ if (bistate->current_buf != InvalidBuffer)
+ CheckAndSetPageAllVisible(relation, bistate->current_buf, NULL);
+}
/*
* heap_insert - insert tuple into a heap
@@ -8763,3 +8773,159 @@ heap_mask(char *pagedata, BlockNumber blkno)
}
}
}
+
+/*
+ * Check if all tuples in this page are frozen and visible, without doing any
+ * extensive checks.
+ *
+ * The only use of this function is when we are doing heap insertions in
+ * HEAP_INSERT_FROZEN mode (e.g. COPY FREEZE). Since the relation is not
+ * visible to any other transactions when running in this mode, we usually
+ * expect the pages to contain only tuples with frozen-xmin and invalid-xmax.
+ * But to guard against the case where our own transaction may have
+ * inserted regular tuples, which are not marked frozen, into the table, and
+ * out of an abundance of caution, we still do this check.
+ *
+ * While we could do more elaborate tests like heap_page_is_all_visible does,
+ * we deliberately try to keep this simple.
+ */
+static bool
+CheckPageIsAllFrozen(Relation relation, Buffer buf)
+{
+ Page page = BufferGetPage(buf);
+ BlockNumber blockno = BufferGetBlockNumber(buf);
+ OffsetNumber offnum,
+ maxoff;
+
+ /*
+ * This is a stripped down version of the line pointer scan in
+ * lazy_scan_heap(). So if you change anything here, also check that code.
+ */
+ maxoff = PageGetMaxOffsetNumber(page);
+ for (offnum = FirstOffsetNumber;
+ offnum <= maxoff;
+ offnum = OffsetNumberNext(offnum))
+ {
+ ItemId itemid;
+ HeapTupleData tuple;
+
+ itemid = PageGetItemId(page, offnum);
+
+ /* Unused or redirect line pointers are of no interest */
+ if (!ItemIdIsUsed(itemid) || ItemIdIsRedirected(itemid))
+ continue;
+
+ ItemPointerSet(&(tuple.t_self), blockno, offnum);
+
+ /*
+ * Dead line pointers can have index pointers pointing to them. So
+ * they can't be treated as visible
+ */
+ if (ItemIdIsDead(itemid))
+ return false;
+
+ Assert(ItemIdIsNormal(itemid));
+
+ tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
+ tuple.t_len = ItemIdGetLength(itemid);
+ tuple.t_tableOid = RelationGetRelid(relation);
+
+ /*
+ * If xmin is not frozen, then something other than the process inserting
+ * tuples in HEAP_INSERT_FROZEN mode has inserted tuples into this page.
+ * Don't mark such a page all-visible and frozen.
+ */
+ if (!HeapTupleHeaderXminFrozen(tuple.t_data))
+ return false;
+
+ /*
+ * Similarly, if xmax is set, be paranoid and don't mark the page as
+ * all-visible and frozen.
+ */
+ if (HeapTupleHeaderGetRawXmax(tuple.t_data) != InvalidTransactionId)
+ return false;
+ } /* scan along page */
+
+ return true;
+}
+
+/*
+ * If we are inserting frozen tuples, then check if the given page
+ * has all frozen tuples and mark the page as all-visible and frozen.
+ *
+ * Caller may pass a valid vmbuffer, in which case a valid vmbuffer will be
+ * returned. If valid vmbuffer is not passed, then we don't return one either.
+ */
+void
+CheckAndSetPageAllVisible(Relation relation, Buffer buffer, Buffer *vmbuffer)
+{
+ BlockNumber targetBlock;
+ Page page;
+ Buffer myvmbuffer = InvalidBuffer;
+
+ /* Nothing to do if we're passed an Invalid buffer */
+ if (!BufferIsValid(buffer))
+ return;
+
+ targetBlock = BufferGetBlockNumber(buffer);
+ page = BufferGetPage(buffer);
+
+ /*
+ * Use the passed-in vmbuffer, if available. Otherwise pin the
+ * required visibility map page before locking the heap page.
+ */
+ if (vmbuffer && BufferIsValid(*vmbuffer))
+ myvmbuffer = *vmbuffer;
+ else
+ visibilitymap_pin(relation, targetBlock, &myvmbuffer);
+
+ LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+ /*
+ * While we are holding the lock on the page, check if all tuples
+ * in the page are marked frozen at insertion. We can safely mark
+ * such page all-visible and set visibility map bits too.
+ */
+ if (CheckPageIsAllFrozen(relation, buffer))
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+ }
+
+ /*
+ * All the changes to the heap page have been done. If the all-visible
+ * flag is now set, also set the VM all-visible bit (and, if possible, the
+ * all-frozen bit) unless this has already been done previously.
+ *
+ * Note: This portion of the code resembles what we do in
+ * vacuumlazy.c.
+ */
+ if (PageIsAllVisible(page))
+ {
+ uint8 vm_status = visibilitymap_get_status(relation,
+ targetBlock, &myvmbuffer);
+ uint8 flags = 0;
+
+ /* Set the VM all-visible and all-frozen bits in flags, if needed */
+ if ((vm_status & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ flags |= VISIBILITYMAP_ALL_VISIBLE;
+ if ((vm_status & VISIBILITYMAP_ALL_FROZEN) == 0)
+ flags |= VISIBILITYMAP_ALL_FROZEN;
+
+ Assert(BufferIsValid(myvmbuffer));
+ if (flags != 0)
+ visibilitymap_set(relation, targetBlock, buffer, InvalidXLogRecPtr,
+ myvmbuffer, InvalidTransactionId, flags);
+ }
+
+ LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
+ /*
+ * If the caller passed a valid vmbuffer, return (potentially different)
+ * vmbuffer. Otherwise just release the vmbuffer that we pinned.
+ */
+ if (vmbuffer && BufferIsValid(*vmbuffer))
+ *vmbuffer = myvmbuffer;
+ else if (BufferIsValid(myvmbuffer))
+ ReleaseBuffer(myvmbuffer);
+}
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 69a7a23874..28619b7616 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -415,7 +415,15 @@ loop:
{
/* easy case */
buffer = ReadBufferBI(relation, targetBlock, RBM_NORMAL, bistate);
- if (PageIsAllVisible(BufferGetPage(buffer)))
+
+ /*
+ * Obtain a pin on the visibility map buffer if the page is known to
+ * be all-visible or we are running in HEAP_INSERT_FROZEN mode. In the
+ * latter case, we may decide to mark the page all-visible before
+ * switching to a new page and hence obtain the pin in advance.
+ */
+ if (PageIsAllVisible(BufferGetPage(buffer)) ||
+ options & HEAP_INSERT_FROZEN)
visibilitymap_pin(relation, targetBlock, vmbuffer);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
}
@@ -516,13 +524,18 @@ loop:
* code above.
*/
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
- if (otherBuffer == InvalidBuffer)
- ReleaseBuffer(buffer);
- else if (otherBlock != targetBlock)
- {
+ if (otherBuffer != InvalidBuffer && otherBlock != targetBlock)
LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
+
+ /*
+ * If we're inserting frozen tuples, then check if the current page is
+ * completely frozen and set the visibility bit.
+ */
+ if (options & HEAP_INSERT_FROZEN)
+ CheckAndSetPageAllVisible(relation, buffer, vmbuffer);
+
+ if ((otherBuffer == InvalidBuffer) || (otherBlock != targetBlock))
ReleaseBuffer(buffer);
- }
/* Without FSM, always fall out of the loop and extend */
if (!use_fsm)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 705df8900b..6dfbf993d5 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2832,6 +2832,15 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * Note: As of PG12, COPY FREEZE is not supported on
+ * partitioned tables. Nevertheless, have this check in place so
+ * that we do the right thing if it ever gets supported.
+ */
+ if (hi_options & HEAP_INSERT_FROZEN)
+ CheckAndSetAllVisibleBulkInsertState(resultRelInfo->ri_RelationDesc,
+ bistate);
+
/*
* We'd better make the bulk insert mechanism gets a new
* buffer when the partition being inserted into changes.
@@ -3046,6 +3055,15 @@ CopyFrom(CopyState cstate)
firstBufferedLineNo);
}
+ /*
+ * If we are inserting frozen tuples, check if the last page used can also
+ * be marked as all-visible and all-frozen. This ensures that a table can
+ * be fully frozen when the data is loaded.
+ */
+ if (hi_options & HEAP_INSERT_FROZEN)
+ CheckAndSetAllVisibleBulkInsertState(resultRelInfo->ri_RelationDesc,
+ bistate);
+
/* Done, clean up */
error_context_stack = errcallback.previous;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3773a4df85..bf32ce295e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -140,6 +140,10 @@ extern void setLastTid(const ItemPointer tid);
extern BulkInsertState GetBulkInsertState(void);
extern void FreeBulkInsertState(BulkInsertState);
extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
+extern void CheckAndSetAllVisibleBulkInsertState(Relation relation,
+ BulkInsertState bistate);
+extern void CheckAndSetPageAllVisible(Relation relation,
+ Buffer buffer, Buffer *vmbuffer);
extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
int options, BulkInsertState bistate);
Thank you for sharing the updated patch!
Thank you for the explanation, agreed.
The patch looks good to me. There is no comment from me.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Wed, Mar 27, 2019 at 9:47 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
The patch looks good to me. There is no comment from me.
Thanks for your review! Updated patch attached, since the previous version
failed to apply after recent changes in master.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
copy_freeze_v5.patch (application/octet-stream)
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index f0dcb897c4..6ac3e525eb 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -131,6 +131,69 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | t | t
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | f | f
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -140,3 +203,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index c2a7f1d9e4..01a65fdab4 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -72,6 +72,82 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+\.
+copy copyfreeze from stdin;
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -81,3 +157,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 05ceb6550d..b9ec083fce 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1849,6 +1849,16 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
bistate->current_buf = InvalidBuffer;
}
+/*
+ * CheckAndSetAllVisibleBulkInsertState - check if the buffer currently held in
+ * bistate is all-visible, all-frozen and take necessary action.
+ */
+void
+CheckAndSetAllVisibleBulkInsertState(Relation relation, BulkInsertState bistate)
+{
+ if (bistate->current_buf != InvalidBuffer)
+ CheckAndSetPageAllVisible(relation, bistate->current_buf, NULL);
+}
/*
* heap_insert - insert tuple into a heap
@@ -8992,3 +9002,159 @@ heap_mask(char *pagedata, BlockNumber blkno)
}
}
}
+
+/*
+ * Check if all tuples in this page are frozen and visible, without doing any
+ * extensive checks.
+ *
+ * The only use of this function is when we are doing heap insertions in
+ * HEAP_INSERT_FROZEN mode (e.g. COPY FREEZE). Since the relation is not
+ * visible to any other transactions when running in this mode, we usually
+ * expect the pages to contain only tuples with frozen-xmin and invalid-xmax.
+ * But to guard against the case where our own transaction may have
+ * inserted regular tuples, which are not marked frozen, into the table, and
+ * out of an abundance of caution, we still do this check.
+ *
+ * While we could do more elaborate tests like heap_page_is_all_visible does,
+ * we deliberately try to keep this simple.
+ */
+static bool
+CheckPageIsAllFrozen(Relation relation, Buffer buf)
+{
+ Page page = BufferGetPage(buf);
+ BlockNumber blockno = BufferGetBlockNumber(buf);
+ OffsetNumber offnum,
+ maxoff;
+
+ /*
+ * This is a stripped down version of the line pointer scan in
+ * lazy_scan_heap(). So if you change anything here, also check that code.
+ */
+ maxoff = PageGetMaxOffsetNumber(page);
+ for (offnum = FirstOffsetNumber;
+ offnum <= maxoff;
+ offnum = OffsetNumberNext(offnum))
+ {
+ ItemId itemid;
+ HeapTupleData tuple;
+
+ itemid = PageGetItemId(page, offnum);
+
+ /* Unused or redirect line pointers are of no interest */
+ if (!ItemIdIsUsed(itemid) || ItemIdIsRedirected(itemid))
+ continue;
+
+ ItemPointerSet(&(tuple.t_self), blockno, offnum);
+
+ /*
+ * Dead line pointers can have index pointers pointing to them. So
+ * they can't be treated as visible
+ */
+ if (ItemIdIsDead(itemid))
+ return false;
+
+ Assert(ItemIdIsNormal(itemid));
+
+ tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
+ tuple.t_len = ItemIdGetLength(itemid);
+ tuple.t_tableOid = RelationGetRelid(relation);
+
+ /*
+ * If xmin is not frozen, then something other than the backend
+ * inserting tuples in HEAP_INSERT_FROZEN mode has inserted tuples into
+ * this page. Don't mark such a page all-visible and frozen.
+ */
+ if (!HeapTupleHeaderXminFrozen(tuple.t_data))
+ return false;
+
+ /*
+ * Similarly, if xmax is set, be paranoid and don't mark the page as
+ * all-visible and frozen.
+ */
+ if (HeapTupleHeaderGetRawXmax(tuple.t_data) != InvalidTransactionId)
+ return false;
+ } /* scan along page */
+
+ return true;
+}
+
+/*
+ * If we are inserting frozen tuples, then check if the given page
+ * has all frozen tuples and mark the page as all-visible and frozen.
+ *
+ * Caller may pass a valid vmbuffer, in which case a valid vmbuffer will be
+ * returned. If a valid vmbuffer is not passed, then we don't return one either.
+ */
+void
+CheckAndSetPageAllVisible(Relation relation, Buffer buffer, Buffer *vmbuffer)
+{
+ BlockNumber targetBlock;
+ Page page;
+ Buffer myvmbuffer = InvalidBuffer;
+
+ /* Nothing to do if we're passed an Invalid buffer */
+ if (!BufferIsValid(buffer))
+ return;
+
+ targetBlock = BufferGetBlockNumber(buffer);
+ page = BufferGetPage(buffer);
+
+ /*
+ * Use the passed-in vmbuffer, if available. Otherwise pin the
+ * required visibility map page before locking the heap page.
+ */
+ if (vmbuffer && BufferIsValid(*vmbuffer))
+ myvmbuffer = *vmbuffer;
+ else
+ visibilitymap_pin(relation, targetBlock, &myvmbuffer);
+
+ LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+ /*
+ * While we are holding the lock on the page, check if all tuples
+ * in the page are marked frozen at insertion. We can safely mark
+ * such a page all-visible and set its visibility map bits too.
+ */
+ if (CheckPageIsAllFrozen(relation, buffer))
+ {
+ PageSetAllVisible(page);
+ MarkBufferDirty(buffer);
+ }
+
+ /*
+ * All the changes to the heap page have been done. If the all-visible
+ * flag is now set, also set the VM all-visible bit (and, if possible, the
+ * all-frozen bit) unless this has already been done previously.
+ *
+ * Note: This portion of the code resembles what we do in
+ * vacuumlazy.c
+ */
+ if (PageIsAllVisible(page))
+ {
+ uint8 vm_status = visibilitymap_get_status(relation,
+ targetBlock, &myvmbuffer);
+ uint8 flags = 0;
+
+ /* Set the VM all-frozen bit to flag, if needed */
+ if ((vm_status & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ flags |= VISIBILITYMAP_ALL_VISIBLE;
+ if ((vm_status & VISIBILITYMAP_ALL_FROZEN) == 0)
+ flags |= VISIBILITYMAP_ALL_FROZEN;
+
+ Assert(BufferIsValid(myvmbuffer));
+ if (flags != 0)
+ visibilitymap_set(relation, targetBlock, buffer, InvalidXLogRecPtr,
+ myvmbuffer, InvalidTransactionId, flags);
+ }
+
+ LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+
+ /*
+ * If the caller passed a valid vmbuffer, return (potentially different)
+ * vmbuffer. Otherwise just release the vmbuffer that we pinned.
+ */
+ if (vmbuffer && BufferIsValid(*vmbuffer))
+ *vmbuffer = myvmbuffer;
+ else if (BufferIsValid(myvmbuffer))
+ ReleaseBuffer(myvmbuffer);
+}
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 69a7a23874..28619b7616 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -415,7 +415,15 @@ loop:
{
/* easy case */
buffer = ReadBufferBI(relation, targetBlock, RBM_NORMAL, bistate);
- if (PageIsAllVisible(BufferGetPage(buffer)))
+
+ /*
+ * Obtain a pin on the visibility map buffer if the page is known to
+ * be all-visible or we are running in HEAP_INSERT_FROZEN mode. In the
+ * latter case, we may decide to mark the page all-visible before
+ * switching to a new page and hence obtain the pin in advance.
+ */
+ if (PageIsAllVisible(BufferGetPage(buffer)) ||
+ options & HEAP_INSERT_FROZEN)
visibilitymap_pin(relation, targetBlock, vmbuffer);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
}
@@ -516,13 +524,18 @@ loop:
* code above.
*/
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
- if (otherBuffer == InvalidBuffer)
- ReleaseBuffer(buffer);
- else if (otherBlock != targetBlock)
- {
+ if (otherBuffer != InvalidBuffer && otherBlock != targetBlock)
LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
+
+ /*
+ * If we're inserting frozen tuples, then check if the current page is
+ * completely frozen and set the visibility bit.
+ */
+ if (options & HEAP_INSERT_FROZEN)
+ CheckAndSetPageAllVisible(relation, buffer, vmbuffer);
+
+ if ((otherBuffer == InvalidBuffer) || (otherBlock != targetBlock))
ReleaseBuffer(buffer);
- }
/* Without FSM, always fall out of the loop and extend */
if (!use_fsm)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c1fd7b78ce..09df70a3ac 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2833,6 +2833,15 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * Note: As of PG12, COPY FREEZE is not supported on
+ * partitioned tables. Nevertheless, have this check in place so
+ * that we do the right thing if it ever gets supported.
+ */
+ if (ti_options & TABLE_INSERT_FROZEN)
+ CheckAndSetAllVisibleBulkInsertState(resultRelInfo->ri_RelationDesc,
+ bistate);
+
/*
* We'd better make the bulk insert mechanism gets a new
* buffer when the partition being inserted into changes.
@@ -3062,6 +3071,15 @@ CopyFrom(CopyState cstate)
firstBufferedLineNo);
}
+ /*
+ * If we are inserting frozen tuples, check if the last page used can also
+ * be marked as all-visible and all-frozen. This ensures that a table can
+ * be fully frozen when the data is loaded.
+ */
+ if (ti_options & TABLE_INSERT_FROZEN)
+ CheckAndSetAllVisibleBulkInsertState(resultRelInfo->ri_RelationDesc,
+ bistate);
+
/* Done, clean up */
error_context_stack = errcallback.previous;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4c077755d5..37480b1f22 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -140,6 +140,10 @@ extern void setLastTid(const ItemPointer tid);
extern BulkInsertState GetBulkInsertState(void);
extern void FreeBulkInsertState(BulkInsertState);
extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
+extern void CheckAndSetAllVisibleBulkInsertState(Relation relation,
+ BulkInsertState bistate);
+extern void CheckAndSetPageAllVisible(Relation relation,
+ Buffer buffer, Buffer *vmbuffer);
extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
int options, BulkInsertState bistate);
@@ -178,7 +182,6 @@ extern TransactionId heap_compute_xid_horizon_for_tuples(Relation rel,
ItemPointerData *items,
int nitems);
-/* in heap/pruneheap.c */
extern void heap_page_prune_opt(Relation relation, Buffer buffer);
extern int heap_page_prune(Relation relation, Buffer buffer,
TransactionId OldestXmin,
Hi,
I've been looking at this patch for a while, and it seems pretty much RFC,
so barring objections I'll take care of that once I do a bit more testing
and review. Unless someone else wants to take care of that.
FWIW I wonder if we should add the code for partitioned tables to
CopyFrom, considering that's unsupported and so can't be tested etc. It's
not a huge amount of code, of course.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 2019-04-03 10:19:17 +0530, Pavan Deolasee wrote:
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c1fd7b78ce..09df70a3ac 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2833,6 +2833,15 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * Note: As of PG12, COPY FREEZE is not supported on
+ * partitioned tables. Nevertheless, have this check in place so
+ * that we do the right thing if it ever gets supported.
+ */
+ if (ti_options & TABLE_INSERT_FROZEN)
+ CheckAndSetAllVisibleBulkInsertState(resultRelInfo->ri_RelationDesc,
+ bistate);
+
 /*
 * We'd better make the bulk insert mechanism gets a new
 * buffer when the partition being inserted into changes.
@@ -3062,6 +3071,15 @@ CopyFrom(CopyState cstate)
 firstBufferedLineNo);
 }
+ /*
+ * If we are inserting frozen tuples, check if the last page used can also
+ * be marked as all-visible and all-frozen. This ensures that a table can
+ * be fully frozen when the data is loaded.
+ */
+ if (ti_options & TABLE_INSERT_FROZEN)
+ CheckAndSetAllVisibleBulkInsertState(resultRelInfo->ri_RelationDesc,
+ bistate);
+
 /* Done, clean up */
 error_context_stack = errcallback.previous;
I'm totally not OK with this from a layering
POV. CheckAndSetAllVisibleBulkInsertState is entirely heap specific
(without being named such), whereas all the heap specific bits are
getting moved below tableam.
Greetings,
Andres Freund
On 2019-Apr-04, Andres Freund wrote:
I'm totally not OK with this from a layering
POV. CheckAndSetAllVisibleBulkInsertState is entirely heap specific
(without being named such), whereas all the heap specific bits are
getting moved below tableam.
This is a fair complaint, but on the other hand the COPY changes for
table AM are still being developed, so there's no ground on which to
rebase this patch. Do you have a timeline on getting the COPY one
committed?
I think it's fair to ask the RMT for an exceptional extension of a
couple of working days for this patch.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 2019-04-04 16:15:54 -0300, Alvaro Herrera wrote:
On 2019-Apr-04, Andres Freund wrote:
I'm totally not OK with this from a layering
POV. CheckAndSetAllVisibleBulkInsertState is entirely heap specific
(without being named such), whereas all the heap specific bits are
getting moved below tableam.
This is a fair complaint, but on the other hand the COPY changes for
table AM are still being developed, so there's no ground on which to
rebase this patch. Do you have a timeline on getting the COPY one
committed?
~2h. Just pondering the naming of some functions etc. Don't think
there's a large interdependency though.
But even if tableam weren't committed, I'd still argue that it's
structurally done wrong in the patch right now. FWIW, I actually think
this whole approach isn't quite right - this shouldn't be done as a
secondary action after we'd already inserted, with a separate
lock-unlock cycle etc.
Also, how is this code even close to correct?
CheckAndSetPageAllVisible() modifies the buffer in a crucial way, and
there's no WAL logging? Without even a comment arguing why that's OK (I
don't think it is)?
Greetings,
Andres Freund
Hi,
On 2019-04-04 12:23:08 -0700, Andres Freund wrote:
Also, how is this code even close to correct?
CheckAndSetPageAllVisible() modifies the buffer in a crucial way, and
there's no WAL logging? Without even a comment arguing why that's OK (I
don't think it is)?
Peter Geoghegan just reminded me over IM that there's actually logging
inside log_heap_visible(), called from visibilitymap_set(). Still lacks
a critical section though.
I still think this is the wrong architecture.
Greetings,
Andres Freund
PS: We're going to have to revamp visibilitymap_set() soon-ish - the
fact that it directly calls heap routines inside is bad; it means that
additional AMs, e.g. zheap, have to reimplement that routine.
On 2019-Apr-04, Andres Freund wrote:
On 2019-04-04 12:23:08 -0700, Andres Freund wrote:
Also, how is this code even close to correct?
CheckAndSetPageAllVisible() modifies the buffer in a crucial way, and
there's no WAL logging? Without even a comment arguing why that's OK (I
don't think it is)?
Peter Geoghegan just reminded me over IM that there's actually logging
inside log_heap_visible(), called from visibilitymap_set(). Still lacks
a critical section though.
Hmm, isn't there already a critical section in visibilitymap_set itself?
I still think this is the wrong architecture.
Hmm.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 2019-04-05 00:06:04 -0300, Alvaro Herrera wrote:
On 2019-Apr-04, Andres Freund wrote:
On 2019-04-04 12:23:08 -0700, Andres Freund wrote:
Also, how is this code even close to correct?
CheckAndSetPageAllVisible() modifies the buffer in a crucial way, and
there's no WAL logging? Without even a comment arguing why that's OK (I
don't think it is)?
Peter Geoghegan just reminded me over IM that there's actually logging
inside log_heap_visible(), called from visibilitymap_set(). Still lacks
a critical section though.
Hmm, isn't there already a critical section in visibilitymap_set itself?
There is, but the proposed code sets all visible on the page, and marks
the buffer dirty, before calling visibilitymap_set.
Greetings,
Andres Freund
Hi,
On 2019-04-05 00:06:04 -0300, Alvaro Herrera wrote:
On 2019-Apr-04, Andres Freund wrote:
I still think this is the wrong architecture.
Hmm.
I think the right approach would be to do all of this in heap_insert and
heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
is specified, remember whether it is either currently empty, or is
already marked as all-visible. If previously empty, mark it as all
visible at the end. If already all visible, there's no need to change
that. If not yet all-visible, no need to do anything, since it can't
have been inserted with COPY FREEZE. Do you see any problem doing it
that way?
Greetings,
Andres Freund
Hi,
On Fri, Apr 5, 2019 at 9:05 AM Andres Freund <andres@anarazel.de> wrote:
I think the right approach would be to do all of this in heap_insert and
heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
is specified, remember whether it is either currently empty, or is
already marked as all-visible. If previously empty, mark it as all
visible at the end. If already all visible, there's no need to change
that. If not yet all-visible, no need to do anything, since it can't
have been inserted with COPY FREEZE.
We're doing roughly the same. If we are running INSERT_FROZEN, whenever
we're about to switch to a new page, we check if the previous page should
be marked all-frozen and do it that way. The special code in copy.c is
necessary to take care of the last page which we don't get to handle in the
regular code path.
Or are you suggesting that we don't even rescan the page for all-frozen
tuples at the end and just simply mark it all-frozen at the start, when the
first tuple is inserted and then don't touch the PD_ALL_VISIBLE/visibility
map bit as we go on inserting more tuples in the page?
Anyways, if major architectural changes are required then it's probably too
late to consider this for PG12, even though it's more of a bug fix and a
candidate for back-patching too.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Apr 5, 2019 at 8:37 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-04-05 00:06:04 -0300, Alvaro Herrera wrote:
Hmm, isn't there already a critical section in visibilitymap_set itself?
There is, but the proposed code sets all visible on the page, and marks
the buffer dirty, before calling visibilitymap_set.
How's it any different than what we're doing at vacuumlazy.c:1322? We set
the page-level bit, mark the buffer dirty and then call
visibilitymap_set(), all outside a critical section.
1300         /* mark page all-visible, if appropriate */
1301         if (all_visible && !all_visible_according_to_vm)
1302         {
1303             uint8       flags = VISIBILITYMAP_ALL_VISIBLE;
1304
1305             if (all_frozen)
1306                 flags |= VISIBILITYMAP_ALL_FROZEN;
1307
1308             /*
1309              * It should never be the case that the visibility map page is set
1310              * while the page-level bit is clear, but the reverse is allowed
1311              * (if checksums are not enabled).  Regardless, set the both bits
1312              * so that we get back in sync.
1313              *
1314              * NB: If the heap page is all-visible but the VM bit is not set,
1315              * we don't need to dirty the heap page.  However, if checksums
1316              * are enabled, we do need to make sure that the heap page is
1317              * dirtied before passing it to visibilitymap_set(), because it
1318              * may be logged.  Given that this situation should only happen in
1319              * rare cases after a crash, it is not worth optimizing.
1320              */
1321             PageSetAllVisible(page);
1322             MarkBufferDirty(buf);
1323             visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
1324                               vmbuffer, visibility_cutoff_xid, flags);
1325         }
As the first para in that comment says, I thought it's ok for the page-level
bit to be set while the visibility map bit is clear, but not vice versa.
The proposed code does not introduce any new behaviour AFAICS. But I might
be missing something.
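For what it's worth, the relationship between the two bits is easy to observe
with the pg_visibility extension. A rough sketch (the table name and data here
are made up purely for illustration):

-- pg_visibility() reports both the VM bits and the page-level flag, so the
-- problematic state (VM bit set while PD_ALL_VISIBLE is clear) would show up
-- as all_visible = t with pd_all_visible = f.
create extension if not exists pg_visibility;
create table vmcheck (a int);
insert into vmcheck select generate_series(1, 1000);
vacuum vmcheck;
select blkno, all_visible, all_frozen, pd_all_visible
  from pg_visibility('vmcheck');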
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@anarazel.de> writes:
I think the right approach would be to do all of this in heap_insert and
heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
is specified, remember whether it is either currently empty, or is
already marked as all-visible. If previously empty, mark it as all
visible at the end. If already all visible, there's no need to change
that. If not yet all-visible, no need to do anything, since it can't
have been inserted with COPY FREEZE. Do you see any problem doing it
that way?
Do we want to add overhead to these hot-spot routines for this purpose?
regards, tom lane
Hi,
On 2019-04-05 09:20:36 +0530, Pavan Deolasee wrote:
On Fri, Apr 5, 2019 at 9:05 AM Andres Freund <andres@anarazel.de> wrote:
I think the right approach would be to do all of this in heap_insert and
heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
is specified, remember whether it is either currently empty, or is
already marked as all-visible. If previously empty, mark it as all
visible at the end. If already all visible, there's no need to change
that. If not yet all-visible, no need to do anything, since it can't
have been inserted with COPY FREEZE.
We're doing roughly the same. If we are running INSERT_FROZEN, whenever
we're about to switch to a new page, we check if the previous page should
be marked all-frozen and do it that way. The special code in copy.c is
necessary to take care of the last page which we don't get to handle in the
regular code path.
Well, it's not the same, because you need extra code from copy.c, extra
lock cycles, and extra WAL logging.
Or are you suggesting that we don't even rescan the page for all-frozen
tuples at the end and just simply mark it all-frozen at the start, when the
first tuple is inserted and then don't touch the PD_ALL_VISIBLE/visibility
map bit as we go on inserting more tuples in the page?
Correct. If done right that should be cheaper (no extra scans, less WAL
logging), without requiring some new dispatch logic from copy.c.
Anyways, if major architectural changes are required then it's probably too
late to consider this for PG12, even though it's more of a bug fix and a
candidate for back-patching too.
Let's just see how bad it looks? I don't feel like we'd need to be super
strict about it. If it looks simple enough I'd e.g. be ok to merge this
soon after freeze, and backpatch around maybe 12.1 or such.
Greetings,
Andres Freund
Hi,
On 2019-04-04 23:57:58 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
I think the right approach would be to do all of this in heap_insert and
heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
is specified, remember whether it is either currently empty, or is
already marked as all-visible. If previously empty, mark it as all
visible at the end. If already all visible, there's no need to change
that. If not yet all-visible, no need to do anything, since it can't
have been inserted with COPY FREEZE. Do you see any problem doing it
that way?
Do we want to add overhead to these hot-spot routines for this purpose?
For heap_multi_insert I can't see it being a problem - it's only used
from copy.c, and the cost should be "smeared" over many tuples. I'd
assume that compared with locking a page, WAL logging, etc, it'd not
even meaningfully show up for heap_insert. Especially because we already
have codepaths for options & HEAP_INSERT_FROZEN in
heap_prepare_insert(), and I'd assume those could be combined.
I think we should measure it, but I don't think that one or two
additional, very well predicted, branches are going to be measurable
in those routines.
The patch, as implemented, has modifications in
RelationGetBufferForTuple(), that seems like they'd be more expensive.
Greetings,
Andres Freund
On Fri, Apr 5, 2019 at 6:58 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@anarazel.de> writes:
I think the right approach would be to do all of this in heap_insert and
heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
is specified, remember whether it is either currently empty, or is
already marked as all-visible. If previously empty, mark it as all
visible at the end. If already all visible, there's no need to change
that. If not yet all-visible, no need to do anything, since it can't
have been inserted with COPY FREEZE. Do you see any problem doing it
that way?
Do we want to add overhead to these hot-spot routines for this purpose?
Sizing the overhead: workflows right now don't end with COPY FREEZE - you
need another VACUUM to set the visibility map bits.
Anything that lets you skip that VACUUM (and is faster than that VACUUM
itself) is helping. You specifically asked for it to be skippable with
FREEZE anyway.
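To spell out the workflow this is about (a sketch only; the file path and table
definition are placeholders, and the last query needs the pg_visibility
extension):

begin;
create table foo (i int);
-- FREEZE is accepted because the table was created in this transaction
copy foo from '/tmp/foo' freeze;
commit;
-- today this extra pass is still needed just to get the visibility map bits
-- set, even though every tuple is already frozen
vacuum foo;
select every(all_visible and all_frozen) from pg_visibility_map('foo');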
--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
Hi,
On 2019-04-05 08:38:34 +0300, Darafei "Komяpa" Praliaskouski wrote:
On Fri, Apr 5, 2019 at 6:58 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@anarazel.de> writes:
I think the right approach would be to do all of this in heap_insert and
heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
is specified, remember whether it is either currently empty, or is
already marked as all-visible. If previously empty, mark it as all
visible at the end. If already all visible, there's no need to change
that. If not yet all-visible, no need to do anything, since it can't
have been inserted with COPY FREEZE. Do you see any problem doing it
that way?
Do we want to add overhead to these hot-spot routines for this purpose?
Sizing the overhead: workflows right now don't end with COPY FREEZE - you
need another VACUUM to set the visibility map bits.
Anything that lets you skip that VACUUM (and is faster than that VACUUM
itself) is helping. You specifically asked for it to be skippable with
FREEZE anyway.
Tom's point was that the routines I was suggesting to adapt above aren't
just used for COPY FREEZE.
Greetings,
Andres Freund
Hi,
On 2019-04-04 21:04:49 -0700, Andres Freund wrote:
On 2019-04-04 23:57:58 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
I think the right approach would be to do all of this in heap_insert and
heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
is specified, remember whether it is either currently empty, or is
already marked as all-visible. If previously empty, mark it as all
visible at the end. If already all visible, there's no need to change
that. If not yet all-visible, no need to do anything, since it can't
have been inserted with COPY FREEZE. Do you see any problem doing it
that way?
Do we want to add overhead to these hot-spot routines for this purpose?
For heap_multi_insert I can't see it being a problem - it's only used
from copy.c, and the cost should be "smeared" over many tuples. I'd
assume that compared with locking a page, WAL logging, etc, it'd not
even meaningfully show up for heap_insert. Especially because we already
have codepaths for options & HEAP_INSERT_FROZEN in
heap_prepare_insert(), and I'd assume those could be combined.
I think we should measure it, but I don't think that one or two
additional, very well predicted, branches are going to be measurable
in those routines.
The patch, as implemented, has modifications in
RelationGetBufferForTuple(), that seems like they'd be more expensive.
Here's a *prototype* patch for this. It only implements what I
described for heap_multi_insert, not for plain inserts. I wanted to see
what others think before investing additional time.
I don't think it's possible to see the overhead of the changed code in
heap_multi_insert(), and probably - with less confidence - that it's
also going to be ok for heap_insert(). But we gotta measure that.
This avoids an extra WAL record for setting empty pages to all visible,
by adding XLH_INSERT_ALL_VISIBLE_SET & XLH_INSERT_ALL_FROZEN_SET, and
setting those when appropriate in heap_multi_insert. Unfortunately
currently visibilitymap_set() doesn't really allow doing this properly,
as it has embedded WAL logging for heap.
I think we should remove the WAL logging from visibilitymap_set(), and
move it to a separate, heap specific, function. Right now different
tableams e.g. would have to reimplement visibilitymap_set(), so that's a
second need to separate that functionality. Let me try to come up with
a proposal.
The patch currently does a vmbuffer_pin() while holding an exclusive
lwlock on the page. That's something we normally try to avoid - but I
think it's probably OK here, because INSERT_FROZEN can only be used to
insert into a new relfilenode (which thus no other session can see). I
think it's preferable to have this logic specific to the
INSERT_FROZEN path, rather than adding nontrivial complications to
RelationGetBufferForTuple().
I noticed that, before this patch, we do a
if (vmbuffer != InvalidBuffer)
ReleaseBuffer(vmbuffer);
after every filled page - that doesn't strike me as particularly smart -
it's pretty likely that the next heap page to be filled is going to be
on the same vm page as the previous iteration.
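As a rough sanity check on that claim: the visibility map uses two bits per
heap page, so a single 8 kB VM page covers on the order of (8192 - 24) * 4
heap pages, i.e. roughly 255 MB of heap (the exact page header size doesn't
matter much for the estimate). In SQL terms:

select (8192 - 24) * 4 as heap_pages_per_vm_page,
       pg_size_pretty(((8192 - 24) * 4) * 8192::bigint) as heap_covered_per_vm_page;

So during a bulk load, consecutive target heap pages will almost always map to
the same VM page.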
I noticed one small oddity that I think is common to all the approaches
presented in this thread so far. After
BEGIN;
TRUNCATE foo;
COPY foo(i) FROM '/tmp/foo' WITH FREEZE;
COPY foo(i) FROM '/tmp/foo' WITH FREEZE;
COPY foo(i) FROM '/tmp/foo' WITH FREEZE;
COMMIT;
we currently end up with pages like:
┌───────┬───────────┬──────────┬───────┬───────┬───────┬─────────┬──────────┬─────────┬───────────┐
│ blkno │ lsn │ checksum │ flags │ lower │ upper │ special │ pagesize │ version │ prune_xid │
├───────┼───────────┼──────────┼───────┼───────┼───────┼─────────┼──────────┼─────────┼───────────┤
│ 0 │ 0/50B5488 │ 0 │ 4 │ 928 │ 960 │ 8192 │ 8192 │ 4 │ 0 │
│ 1 │ 0/50B6360 │ 0 │ 4 │ 928 │ 960 │ 8192 │ 8192 │ 4 │ 0 │
│ 2 │ 0/50B71B8 │ 0 │ 4 │ 928 │ 960 │ 8192 │ 8192 │ 4 │ 0 │
│ 3 │ 0/50B8028 │ 0 │ 4 │ 928 │ 960 │ 8192 │ 8192 │ 4 │ 0 │
│ 4 │ 0/50B8660 │ 0 │ 4 │ 408 │ 5120 │ 8192 │ 8192 │ 4 │ 0 │
│ 5 │ 0/50B94B8 │ 0 │ 4 │ 928 │ 960 │ 8192 │ 8192 │ 4 │ 0 │
│ 6 │ 0/50BA328 │ 0 │ 4 │ 928 │ 960 │ 8192 │ 8192 │ 4 │ 0 │
│ 7 │ 0/50BB180 │ 0 │ 4 │ 928 │ 960 │ 8192 │ 8192 │ 4 │ 0 │
│ 8 │ 0/50BBFD8 │ 0 │ 4 │ 928 │ 960 │ 8192 │ 8192 │ 4 │ 0 │
│ 9 │ 0/50BCF88 │ 0 │ 4 │ 928 │ 960 │ 8192 │ 8192 │ 4 │ 0 │
│ 10 │ 0/50BDDE0 │ 0 │ 4 │ 928 │ 960 │ 8192 │ 8192 │ 4 │ 0 │
│ 11 │ 0/50BEC50 │ 0 │ 4 │ 928 │ 960 │ 8192 │ 8192 │ 4 │ 0 │
│ 12 │ 0/50BFAA8 │ 0 │ 4 │ 928 │ 960 │ 8192 │ 8192 │ 4 │ 0 │
│ 13 │ 0/50C06F8 │ 0 │ 4 │ 792 │ 2048 │ 8192 │ 8192 │ 4 │ 0 │
└───────┴───────────┴──────────┴───────┴───────┴───────┴─────────┴──────────┴─────────┴───────────┘
(14 rows)
Note how block 4 has more space available. That's because the
visibilitymap_pin() called in the first COPY has to vm_extend(), which
in turn does:
/*
* Send a shared-inval message to force other backends to close any smgr
* references they may have for this rel, which we are about to change.
* This is a useful optimization because it means that backends don't have
* to keep checking for creation or extension of the file, which happens
* infrequently.
*/
CacheInvalidateSmgr(rel->rd_smgr->smgr_rnode);
which invalidates ->rd_smgr->smgr_targblock *after* the first COPY,
because that's when the pending smgr invalidations are sent out. That's
far from great, but it doesn't seem to be this patch's fault.
It seems to me we need a separate invalidation that doesn't close the
whole smgr relation, but just invalidates the VM specific fields.
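For reference, a listing like the one above can be produced with the
pageinspect extension, roughly along these lines (just a sketch, reusing the
table name from the example):

select blkno, lsn, checksum, flags, lower, upper, special, pagesize, version, prune_xid
  from generate_series(0, pg_relation_size('foo') / 8192 - 1) as blkno,
       lateral page_header(get_raw_page('foo', blkno::int));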
Greetings,
Andres Freund
Attachments:
0002-WIP-copy-freeze-should-actually-freeze-right.patch (text/x-diff; charset=us-ascii)
From 03a50acf74d9f9e7876b44b75ce94b106f6e4147 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 7 Apr 2019 18:00:10 -0700
Subject: [PATCH 2/2] WIP-copy-freeze-should-actually-freeze-right
---
src/backend/access/heap/heapam.c | 124 +++++++++++++++++++++---
src/backend/access/heap/visibilitymap.c | 2 +-
src/include/access/heapam_xlog.h | 2 +
3 files changed, 116 insertions(+), 12 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a05b6a07ad0..d4040911956 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2115,6 +2115,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
int ndone;
PGAlignedBlock scratch;
Page page;
+ Buffer vmbuffer = InvalidBuffer;
bool needwal;
Size saveFreeSpace;
bool need_tuple_data = RelationIsLogicallyLogged(relation);
@@ -2169,9 +2170,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
while (ndone < ntuples)
{
Buffer buffer;
- Buffer vmbuffer = InvalidBuffer;
+ bool starting_with_empty_page;
bool all_visible_cleared = false;
+ bool all_frozen_set = false;
int nthispage;
+ XLogRecPtr recptr;
CHECK_FOR_INTERRUPTS();
@@ -2184,6 +2187,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
&vmbuffer, NULL);
page = BufferGetPage(buffer);
+ starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
+
+ if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+ all_frozen_set = true;
+
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -2209,7 +2217,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
log_heap_new_cid(relation, heaptup);
}
- if (PageIsAllVisible(page))
+ /*
+ * If the page is all visible, need to clear that, unless we're only
+ * going to add further frozen rows to it.
+ *
+ * If we're only adding already frozen rows, and the page was
+ * previously empty, mark it as all-visible.
+ */
+ if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
all_visible_cleared = true;
PageClearAllVisible(page);
@@ -2217,6 +2232,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+ else if (all_frozen_set)
+ PageSetAllVisible(page);
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2227,7 +2244,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/* XLOG stuff */
if (needwal)
{
- XLogRecPtr recptr;
xl_heap_multi_insert *xlrec;
uint8 info = XLOG_HEAP2_MULTI_INSERT;
char *tupledata;
@@ -2240,8 +2256,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
* If the page was previously empty, we can reinit the page
* instead of restoring the whole thing.
*/
- init = (ItemPointerGetOffsetNumber(&(heaptuples[ndone]->t_self)) == FirstOffsetNumber &&
- PageGetMaxOffsetNumber(page) == FirstOffsetNumber + nthispage - 1);
+ init = starting_with_empty_page;
/* allocate xl_heap_multi_insert struct from the scratch area */
xlrec = (xl_heap_multi_insert *) scratchptr;
@@ -2259,7 +2274,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/* the rest of the scratch space is used for tuple data */
tupledata = scratchptr;
- xlrec->flags = all_visible_cleared ? XLH_INSERT_ALL_VISIBLE_CLEARED : 0;
+ xlrec->flags = 0;
+ Assert((all_visible_cleared == 0 && all_frozen_set == 0) ||
+ all_visible_cleared != all_frozen_set);
+ if (all_visible_cleared)
+ xlrec->flags |= XLH_INSERT_ALL_VISIBLE_CLEARED;
+ if (all_frozen_set)
+ {
+ xlrec->flags |= XLH_INSERT_ALL_VISIBLE_SET;
+ xlrec->flags |= XLH_INSERT_ALL_FROZEN_SET;
+ }
+
xlrec->ntuples = nthispage;
/*
@@ -2333,13 +2358,46 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
END_CRIT_SECTION();
- UnlockReleaseBuffer(buffer);
- if (vmbuffer != InvalidBuffer)
- ReleaseBuffer(vmbuffer);
+ if (all_frozen_set)
+ {
+ Assert(PageIsAllVisible(page));
+ /*
+ * Having to potentially read the page while holding an exclusive
+ * lock on the page isn't great. But we only get here if
+ * HEAP_INSERT_FROZEN is set, and we only do so if the table isn't
+ * readable outside of this sessoin. Therefore doing IO here isn't
+ * that bad.
+ */
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), &vmbuffer);
+
+ /*
+ * FIXME: setting recptr here is a dirty dirty hack, to prevent
+ * visibilitymap_set() from WAL logging.
+ *
+ * It's fine to use InvalidTransactionId here - this is only used
+ * when HEAP_INSERT_FROZEN is specified, which intentionally
+ * violates visibility rules.
+ */
+ visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
+ recptr, vmbuffer,
+ InvalidTransactionId,
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+ }
+
+ UnlockReleaseBuffer(buffer);
ndone += nthispage;
+
+ /*
+ * NB: Only release vmbuffer after inserting all tuples - it's fairly
+ * likely that we'll insert into subsequent heap pages that are likely
+ * to use the same vm page.
+ */
}
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
/*
* We're done with the actual inserts. Check for conflicts again, to
* ensure that all rw-conflicts in to these inserts are detected. Without
@@ -8176,6 +8234,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
BlockNumber blkno;
Buffer buffer;
Page page;
+ Page vmpage;
union
{
HeapTupleHeaderData hdr;
@@ -8200,13 +8259,54 @@ heap_xlog_multi_insert(XLogReaderState *record)
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
+ if (xlrec->flags & (XLH_INSERT_ALL_VISIBLE_CLEARED |
+ XLH_INSERT_ALL_VISIBLE_SET |
+ XLH_INSERT_ALL_FROZEN_SET))
{
Relation reln = CreateFakeRelcacheEntry(rnode);
Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
- visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
+
+ if (xlrec->flags & (XLH_INSERT_ALL_VISIBLE_CLEARED))
+ {
+ Assert(!(xlrec->flags & (XLH_INSERT_ALL_FROZEN_SET |
+ XLH_INSERT_ALL_VISIBLE_SET)));
+ visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
+ }
+ else
+ {
+ int vmbits = 0;
+
+ if (xlrec->flags & (XLH_INSERT_ALL_VISIBLE_SET))
+ vmbits |= VISIBILITYMAP_ALL_VISIBLE;
+ if (xlrec->flags & (XLH_INSERT_ALL_FROZEN_SET))
+ vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ vmpage = BufferGetPage(vmbuffer);
+
+ /*
+ * Don't set the bit if replay has already passed this point.
+ *
+ * It might be safe to do this unconditionally; if replay has passed
+ * this point, we'll replay at least as far this time as we did
+ * before, and if this bit needs to be cleared, the record responsible
+ * for doing so should be again replayed, and clear it. For right
+ * now, out of an abundance of conservatism, we use the same test here
+ * we did for the heap page. If this results in a dropped bit, no
+ * real harm is done; and the next VACUUM will fix it.
+ *
+ * XXX: This seems entirely unnecessary?
+ *
+ * FIXME: Theoretically we should only do this after we've
+ * modified the heap - but it's safe to do it here I think,
+ * because this means that the page previously was empty.
+ */
+ if (lsn > PageGetLSN(vmpage))
+ visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
+ InvalidTransactionId, vmbits);
+ }
+
ReleaseBuffer(vmbuffer);
FreeFakeRelcacheEntry(reln);
}
@@ -8284,6 +8384,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
+ if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_SET)
+ PageSetAllVisible(page);
MarkBufferDirty(buffer);
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 64dfe06b261..8a9f4f4c42e 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -253,7 +253,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
#endif
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
+ //Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
Assert(InRecovery || BufferIsValid(heapBuf));
Assert(flags & VISIBILITYMAP_VALID_BITS);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 22cd13c47fc..557cc7dadcc 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -67,6 +67,8 @@
#define XLH_INSERT_LAST_IN_MULTI (1<<1)
#define XLH_INSERT_IS_SPECULATIVE (1<<2)
#define XLH_INSERT_CONTAINS_NEW_TUPLE (1<<3)
+#define XLH_INSERT_ALL_VISIBLE_SET (1<<4)
+#define XLH_INSERT_ALL_FROZEN_SET (1<<5)
/*
* xl_heap_update flag values, 8 bits are available.
--
2.21.0.dirty
Hi,
On 2019-04-07 18:04:27 -0700, Andres Freund wrote:
Here's a *prototype* patch for this. It only implements what I
described for heap_multi_insert, not for plain inserts. I wanted to see
what others think before investing additional time.
Pavan, are you planning to work on this for v13 CF1? Or have you lost
interest on the topic?
- Andres
On Tue, 28 May 2019 at 4:36 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-04-07 18:04:27 -0700, Andres Freund wrote:
Here's a *prototype* patch for this. It only implements what I
described for heap_multi_insert, not for plain inserts. I wanted to see
what others think before investing additional time.
Pavan, are you planning to work on this for v13 CF1? Or have you lost
interest on the topic?
Yes, I plan to work on it.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi Andres,
On Wed, May 29, 2019 at 1:50 PM Pavan Deolasee <pavan.deolasee@gmail.com>
wrote:
On Tue, 28 May 2019 at 4:36 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2019-04-07 18:04:27 -0700, Andres Freund wrote:
Here's a *prototype* patch for this. It only implements what I
described for heap_multi_insert, not for plain inserts. I wanted to see
what others think before investing additional time.
Pavan, are you planning to work on this for v13 CF1? Or have you lost
interest on the topic?
Yes, I plan to work on it.
I am sorry, but I am not able to find time to get back to this because of
other high priority items. If it still remains unaddressed in the next few
weeks, I will pick it up again. But for now, I am happy if someone wants to
pick and finish the work.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jun 27, 2019 at 11:02 AM Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
On 2019-04-07 18:04:27 -0700, Andres Freund wrote:
Here's a *prototype* patch for this. It only implements what I
described for heap_multi_insert, not for plain inserts. I wanted to see
what others think before investing additional time.
Pavan, are you planning to work on this for v13 CF1? Or have you lost
interest on the topic?
Yes, I plan to work on it.
I am sorry, but I am not able to find time to get back to this because of other high priority items. If it still remains unaddressed in the next few weeks, I will pick it up again. But for now, I am happy if someone wants to pick and finish the work.
Fair enough, I have marked the entry [1] in the coming CF as "Returned
with Feedback". I hope that is okay with you.
[1]: https://commitfest.postgresql.org/23/2009/
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, Jun 29, 2019 at 12:56 AM Amit Kapila <amit.kapila16@gmail.com>
wrote:
On Thu, Jun 27, 2019 at 11:02 AM Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
On 2019-04-07 18:04:27 -0700, Andres Freund wrote:
Here's a *prototype* patch for this. It only implements what I
described for heap_multi_insert, not for plain inserts. I wanted to see
what others think before investing additional time.
Pavan, are you planning to work on this for v13 CF1? Or have you lost
interest on the topic?
Yes, I plan to work on it.
I am sorry, but I am not able to find time to get back to this because
of other high priority items. If it still remains unaddressed in the next
few weeks, I will pick it up again. But for now, I am happy if someone
wants to pick and finish the work.
Fair enough, I have marked the entry [1] in the coming CF as "Returned
with Feedback". I hope that is okay with you.
Hi,
Pavan mentioned in the last email that he is no longer working on this, so I
want to take the lead and finish it. It is a bug and needs to be fixed.
I have rebased the patch against the latest master, added some test
cases (borrowed from Pavan's patch), and did some performance testing with
a table of 700MB (10 million rows):
Without the patch: COPY WITH FREEZE took 21406.692ms and VACUUM took 2478.666ms
With the patch: COPY WITH FREEZE took 23095.985ms and VACUUM took 26.309ms
The performance decrease in COPY with the patch is only about 7%, but we get
a dramatic improvement in VACUUM. In any case, this is a bug fix, so we can
accept the performance hit.
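For reference, this kind of measurement can be reproduced with something along
these lines (a sketch only; the table definition, data file and psql usage are
assumptions, not the exact setup used above):

\timing on
create table freeze_test (a int, b text);
begin;
truncate freeze_test;
copy freeze_test from '/tmp/freeze_test.csv' with (format csv, freeze);
commit;
-- without the patch this VACUUM still has to dirty every page to set the
-- visibility map bits; with it, the bits are already set during COPY
vacuum freeze_test;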
There are two issues left to address.
1 - Andres: It only implements what I described for heap_multi_insert, not
for plain inserts.
I wanted to see what others think before investing additional time.
In which conditions do we need that for plain inserts?
2 - Andres: I think we should remove the WAL logging from
visibilitymap_set(), and
move it to a separate, heap specific, function. Right now different
tableams e.g. would have to reimplement visibilitymap_set(), so that's a
second need to separate that functionality. Let me try to come up with
a proposal.
--
Ibrar Ahmed
Attachments:
0003-copy-freeze-should-actually-freeze-right.patch (application/octet-stream)
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index f0dcb897c4..6ac3e525eb 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -131,6 +131,69 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | t | t
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | f | f
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -140,3 +203,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index c2a7f1d9e4..01a65fdab4 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -72,6 +72,82 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+\.
+copy copyfreeze from stdin;
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -81,3 +157,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index db6fad76bc..c307db345e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2114,6 +2114,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
int ndone;
PGAlignedBlock scratch;
Page page;
+ Buffer vmbuffer = InvalidBuffer;
bool needwal;
Size saveFreeSpace;
bool need_tuple_data = RelationIsLogicallyLogged(relation);
@@ -2168,9 +2169,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
while (ndone < ntuples)
{
Buffer buffer;
- Buffer vmbuffer = InvalidBuffer;
+ bool starting_with_empty_page;
bool all_visible_cleared = false;
+ bool all_frozen_set = false;
int nthispage;
+ XLogRecPtr recptr;
CHECK_FOR_INTERRUPTS();
@@ -2183,6 +2186,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
&vmbuffer, NULL);
page = BufferGetPage(buffer);
+ starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
+
+ if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+ all_frozen_set = true;
+
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -2208,7 +2216,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
log_heap_new_cid(relation, heaptup);
}
- if (PageIsAllVisible(page))
+ /*
+ * If the page is all visible, need to clear that, unless we're only
+ * going to add further frozen rows to it.
+ *
+ * If we're only adding already frozen rows, and the page was
+ * previously empty, mark it as all-visible.
+ */
+ if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
all_visible_cleared = true;
PageClearAllVisible(page);
@@ -2216,6 +2231,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+ else if (all_frozen_set)
+ PageSetAllVisible(page);
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2226,7 +2243,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/* XLOG stuff */
if (needwal)
{
- XLogRecPtr recptr;
xl_heap_multi_insert *xlrec;
uint8 info = XLOG_HEAP2_MULTI_INSERT;
char *tupledata;
@@ -2239,8 +2255,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
* If the page was previously empty, we can reinit the page
* instead of restoring the whole thing.
*/
- init = (ItemPointerGetOffsetNumber(&(heaptuples[ndone]->t_self)) == FirstOffsetNumber &&
- PageGetMaxOffsetNumber(page) == FirstOffsetNumber + nthispage - 1);
+ init = starting_with_empty_page;
/* allocate xl_heap_multi_insert struct from the scratch area */
xlrec = (xl_heap_multi_insert *) scratchptr;
@@ -2258,7 +2273,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/* the rest of the scratch space is used for tuple data */
tupledata = scratchptr;
- xlrec->flags = all_visible_cleared ? XLH_INSERT_ALL_VISIBLE_CLEARED : 0;
+ xlrec->flags = 0;
+ Assert((all_visible_cleared == 0 && all_frozen_set == 0) ||
+ all_visible_cleared != all_frozen_set);
+ if (all_visible_cleared)
+ xlrec->flags |= XLH_INSERT_ALL_VISIBLE_CLEARED;
+ if (all_frozen_set)
+ {
+ xlrec->flags |= XLH_INSERT_ALL_VISIBLE_SET;
+ xlrec->flags |= XLH_INSERT_ALL_FROZEN_SET;
+ }
+
xlrec->ntuples = nthispage;
/*
@@ -2332,13 +2357,46 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
END_CRIT_SECTION();
- UnlockReleaseBuffer(buffer);
- if (vmbuffer != InvalidBuffer)
- ReleaseBuffer(vmbuffer);
+ if (all_frozen_set)
+ {
+ Assert(PageIsAllVisible(page));
+
+ /*
+ * Having to potentially read the page while holding an exclusive
+ * lock on the page isn't great. But we only get here if
+ * HEAP_INSERT_FROZEN is set, and we only do so if the table isn't
+ * readable outside of this sessoin. Therefore doing IO here isn't
+ * that bad.
+ */
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), &vmbuffer);
+
+ /*
+ * FIXME: setting recptr here is a dirty dirty hack, to prevent
+ * visibilitymap_set() from WAL logging.
+ *
+ * It's fine to use InvalidTransactionId here - this is only used
+ * when HEAP_INSERT_FROZEN is specified, which intentionally
+ * violates visibility rules.
+ */
+ visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
+ recptr, vmbuffer,
+ InvalidTransactionId,
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+ }
+ UnlockReleaseBuffer(buffer);
ndone += nthispage;
+
+ /*
+ * NB: Only release vmbuffer after inserting all tuples - it's fairly
+ * likely that we'll insert into subsequent heap pages that are likely
+ * to use the same vm page.
+ */
}
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
/*
* We're done with the actual inserts. Check for conflicts again, to
* ensure that all rw-conflicts in to these inserts are detected. Without
@@ -8226,6 +8284,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
BlockNumber blkno;
Buffer buffer;
Page page;
+ Page vmpage;
union
{
HeapTupleHeaderData hdr;
@@ -8250,13 +8309,54 @@ heap_xlog_multi_insert(XLogReaderState *record)
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
+ if (xlrec->flags & (XLH_INSERT_ALL_VISIBLE_CLEARED |
+ XLH_INSERT_ALL_VISIBLE_SET |
+ XLH_INSERT_ALL_FROZEN_SET))
{
Relation reln = CreateFakeRelcacheEntry(rnode);
Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
- visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
+
+ if (xlrec->flags & (XLH_INSERT_ALL_VISIBLE_CLEARED))
+ {
+ Assert(!(xlrec->flags & (XLH_INSERT_ALL_FROZEN_SET |
+ XLH_INSERT_ALL_VISIBLE_SET)));
+ visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
+ }
+ else
+ {
+ int vmbits = 0;
+
+ if (xlrec->flags & (XLH_INSERT_ALL_VISIBLE_SET))
+ vmbits |= VISIBILITYMAP_ALL_VISIBLE;
+ if (xlrec->flags & (XLH_INSERT_ALL_FROZEN_SET))
+ vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ vmpage = BufferGetPage(vmbuffer);
+
+ /*
+ * Don't set the bit if replay has already passed this point.
+ *
+ * It might be safe to do this unconditionally; if replay has passed
+ * this point, we'll replay at least as far this time as we did
+ * before, and if this bit needs to be cleared, the record responsible
+ * for doing so should be again replayed, and clear it. For right
+ * now, out of an abundance of conservatism, we use the same test here
+ * we did for the heap page. If this results in a dropped bit, no
+ * real harm is done; and the next VACUUM will fix it.
+ *
+ * XXX: This seems entirely unnecessary?
+ *
+ * FIXME: Theoretically we should only do this after we've
+ * modified the heap - but it's safe to do it here I think,
+ * because this means that the page previously was empty.
+ */
+ if (lsn > PageGetLSN(vmpage))
+ visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
+ InvalidTransactionId, vmbits);
+ }
+
ReleaseBuffer(vmbuffer);
FreeFakeRelcacheEntry(reln);
}
@@ -8334,6 +8434,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
+ if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_SET)
+ PageSetAllVisible(page);
MarkBufferDirty(buffer);
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 0a51678c40..876471ceb7 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -254,7 +254,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
#endif
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
+ //Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
Assert(InRecovery || BufferIsValid(heapBuf));
Assert(flags & VISIBILITYMAP_VALID_BITS);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 95d18cdb12..7426ad5e4f 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -67,6 +67,8 @@
#define XLH_INSERT_LAST_IN_MULTI (1<<1)
#define XLH_INSERT_IS_SPECULATIVE (1<<2)
#define XLH_INSERT_CONTAINS_NEW_TUPLE (1<<3)
+#define XLH_INSERT_ALL_VISIBLE_SET (1<<4)
+#define XLH_INSERT_ALL_FROZEN_SET (1<<5)
/*
* xl_heap_update flag values, 8 bits are available.
On Fri, Mar 13, 2020 at 6:58 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
Thanks for picking up this patch. There's a minor typo:
+ * readable outside of this sessoin. Therefore doing IO here isn't
=> session
--
Justin
Thanks, please see the updated and rebased patch. (master
17a28b03645e27d73bf69a95d7569b61e58f06eb)
--
Ibrar Ahmed
Attachments:
0004-copy-freeze-should-actually-freeze-right.patch (application/x-patch)
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index f0dcb897c4..6ac3e525eb 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -131,6 +131,69 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | t | t
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | f | f
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -140,3 +203,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index c2a7f1d9e4..01a65fdab4 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -72,6 +72,82 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+\.
+copy copyfreeze from stdin;
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -81,3 +157,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 29694b8aa4..614958e5ee 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2114,6 +2114,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
int ndone;
PGAlignedBlock scratch;
Page page;
+ Buffer vmbuffer = InvalidBuffer;
bool needwal;
Size saveFreeSpace;
bool need_tuple_data = RelationIsLogicallyLogged(relation);
@@ -2168,9 +2169,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
while (ndone < ntuples)
{
Buffer buffer;
- Buffer vmbuffer = InvalidBuffer;
+ bool starting_with_empty_page;
bool all_visible_cleared = false;
+ bool all_frozen_set = false;
int nthispage;
+ XLogRecPtr recptr;
CHECK_FOR_INTERRUPTS();
@@ -2183,6 +2186,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
&vmbuffer, NULL);
page = BufferGetPage(buffer);
+ starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
+
+ if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+ all_frozen_set = true;
+
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -2216,7 +2224,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
log_heap_new_cid(relation, heaptup);
}
- if (PageIsAllVisible(page))
+ /*
+ * If the page is all visible, need to clear that, unless we're only
+ * going to add further frozen rows to it.
+ *
+ * If we're only adding already frozen rows, and the page was
+ * previously empty, mark it as all-visible.
+ */
+ if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
all_visible_cleared = true;
PageClearAllVisible(page);
@@ -2224,6 +2239,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+ else if (all_frozen_set)
+ PageSetAllVisible(page);
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2234,7 +2251,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/* XLOG stuff */
if (needwal)
{
- XLogRecPtr recptr;
xl_heap_multi_insert *xlrec;
uint8 info = XLOG_HEAP2_MULTI_INSERT;
char *tupledata;
@@ -2247,8 +2263,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
* If the page was previously empty, we can reinit the page
* instead of restoring the whole thing.
*/
- init = (ItemPointerGetOffsetNumber(&(heaptuples[ndone]->t_self)) == FirstOffsetNumber &&
- PageGetMaxOffsetNumber(page) == FirstOffsetNumber + nthispage - 1);
+ init = starting_with_empty_page;
/* allocate xl_heap_multi_insert struct from the scratch area */
xlrec = (xl_heap_multi_insert *) scratchptr;
@@ -2266,7 +2281,17 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/* the rest of the scratch space is used for tuple data */
tupledata = scratchptr;
- xlrec->flags = all_visible_cleared ? XLH_INSERT_ALL_VISIBLE_CLEARED : 0;
+ xlrec->flags = 0;
+ Assert((all_visible_cleared == 0 && all_frozen_set == 0) ||
+ all_visible_cleared != all_frozen_set);
+ if (all_visible_cleared)
+ xlrec->flags |= XLH_INSERT_ALL_VISIBLE_CLEARED;
+ if (all_frozen_set)
+ {
+ xlrec->flags |= XLH_INSERT_ALL_VISIBLE_SET;
+ xlrec->flags |= XLH_INSERT_ALL_FROZEN_SET;
+ }
+
xlrec->ntuples = nthispage;
/*
@@ -2340,13 +2365,46 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
END_CRIT_SECTION();
- UnlockReleaseBuffer(buffer);
- if (vmbuffer != InvalidBuffer)
- ReleaseBuffer(vmbuffer);
+ if (all_frozen_set)
+ {
+ Assert(PageIsAllVisible(page));
+
+ /*
+ * Having to potentially read the page while holding an exclusive
+ * lock on the page isn't great. But we only get here if
+ * HEAP_INSERT_FROZEN is set, and we only do so if the table isn't
+ * readable outside of this session. Therefore doing IO here isn't
+ * that bad.
+ */
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), &vmbuffer);
+
+ /*
+ * FIXME: setting recptr here is a dirty dirty hack, to prevent
+ * visibilitymap_set() from WAL logging.
+ *
+ * It's fine to use InvalidTransactionId here - this is only used
+ * when HEAP_INSERT_FROZEN is specified, which intentionally
+ * violates visibility rules.
+ */
+ visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
+ recptr, vmbuffer,
+ InvalidTransactionId,
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+ }
+ UnlockReleaseBuffer(buffer);
ndone += nthispage;
+
+ /*
+ * NB: Only release vmbuffer after inserting all tuples - it's fairly
+ * likely that we'll insert into subsequent heap pages that are likely
+ * to use the same vm page.
+ */
}
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
/*
* We're done with the actual inserts. Check for conflicts again, to
* ensure that all rw-conflicts in to these inserts are detected. Without
@@ -8224,6 +8282,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
BlockNumber blkno;
Buffer buffer;
Page page;
+ Page vmpage;
union
{
HeapTupleHeaderData hdr;
@@ -8248,13 +8307,54 @@ heap_xlog_multi_insert(XLogReaderState *record)
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
+ if (xlrec->flags & (XLH_INSERT_ALL_VISIBLE_CLEARED |
+ XLH_INSERT_ALL_VISIBLE_SET |
+ XLH_INSERT_ALL_FROZEN_SET))
{
Relation reln = CreateFakeRelcacheEntry(rnode);
Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
- visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
+
+ if (xlrec->flags & (XLH_INSERT_ALL_VISIBLE_CLEARED))
+ {
+ Assert(!(xlrec->flags & (XLH_INSERT_ALL_FROZEN_SET |
+ XLH_INSERT_ALL_VISIBLE_SET)));
+ visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
+ }
+ else
+ {
+ int vmbits = 0;
+
+ if (xlrec->flags & (XLH_INSERT_ALL_VISIBLE_SET))
+ vmbits |= VISIBILITYMAP_ALL_VISIBLE;
+ if (xlrec->flags & (XLH_INSERT_ALL_FROZEN_SET))
+ vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ vmpage = BufferGetPage(vmbuffer);
+
+ /*
+ * Don't set the bit if replay has already passed this point.
+ *
+ * It might be safe to do this unconditionally; if replay has passed
+ * this point, we'll replay at least as far this time as we did
+ * before, and if this bit needs to be cleared, the record responsible
+ * for doing so should be again replayed, and clear it. For right
+ * now, out of an abundance of conservatism, we use the same test here
+ * we did for the heap page. If this results in a dropped bit, no
+ * real harm is done; and the next VACUUM will fix it.
+ *
+ * XXX: This seems entirely unnecessary?
+ *
+ * FIXME: Theoretically we should only do this after we've
+ * modified the heap - but it's safe to do it here I think,
+ * because this means that the page previously was empty.
+ */
+ if (lsn > PageGetLSN(vmpage))
+ visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
+ InvalidTransactionId, vmbits);
+ }
+
ReleaseBuffer(vmbuffer);
FreeFakeRelcacheEntry(reln);
}
@@ -8332,6 +8432,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
+ if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_SET)
+ PageSetAllVisible(page);
MarkBufferDirty(buffer);
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 0a51678c40..812d839612 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -254,7 +254,6 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
#endif
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
Assert(InRecovery || BufferIsValid(heapBuf));
Assert(flags & VISIBILITYMAP_VALID_BITS);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 95d18cdb12..7426ad5e4f 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -67,6 +67,8 @@
#define XLH_INSERT_LAST_IN_MULTI (1<<1)
#define XLH_INSERT_IS_SPECULATIVE (1<<2)
#define XLH_INSERT_CONTAINS_NEW_TUPLE (1<<3)
+#define XLH_INSERT_ALL_VISIBLE_SET (1<<4)
+#define XLH_INSERT_ALL_FROZEN_SET (1<<5)
/*
* xl_heap_update flag values, 8 bits are available.
On Tue, Mar 24, 2020 at 10:06 PM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
On Fri, Mar 13, 2020 at 6:58 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
Thanks for picking up this patch. There's a minor typo:
+ * readable outside of this sessoin. Therefore doing IO here isn't
=> session
--
Justin
Thanks, please see the updated and rebased patch. (master
17a28b03645e27d73bf69a95d7569b61e58f06eb)
--
Ibrar Ahmed
Andres, while fixing the one FIXME in the patch:
" visibilitymap_pin(relation, BufferGetBlockNumber(buffer),
&vmbuffer);
/*
* FIXME: setting recptr here is a dirty dirty hack, to prevent
* visibilitymap_set() from WAL logging.
*
"
I am not able to see any scenario where recptr is not set before reaching
that statement. Can you clarify why you think recptr will not be set at
that point?
--
Ibrar Ahmed
This patch incurs a compiler warning, which probably is quite simple to fix:
heapam.c: In function ‘heap_multi_insert’:
heapam.c:2349:4: error: ‘recptr’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
^
heapam.c:2136:14: note: ‘recptr’ was declared here
XLogRecPtr recptr;
^
Please fix and submit a new version, I'm marking the entry Waiting on Author in
the meantime.
cheers ./daniel
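A note on where the warning comes from: once the recptr declaration is hoisted out of the if (needwal) block, the variable is still assigned only when a WAL record is actually emitted, while the new visibilitymap_set() call after the critical section reads it unconditionally. With wal_level=minimal and a table created or truncated in the same transaction (exactly the COPY FREEZE scenario), needwal can be false. A rough, abridged outline of one way to guard the call, which is the shape the follow-up patch takes (illustrative only, not a compilable excerpt):

	XLogRecPtr	recptr = InvalidXLogRecPtr;	/* only meaningful when needwal */
	...
	if (needwal)
	{
		...
		recptr = XLogInsert(RM_HEAP2_ID, info);
		PageSetLSN(page, recptr);
	}

	END_CRIT_SECTION();

	if (all_frozen_set)
	{
		visibilitymap_pin(relation, BufferGetBlockNumber(buffer), &vmbuffer);
		/* never hand an uninitialized LSN to visibilitymap_set() */
		visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
						  needwal ? recptr : InvalidXLogRecPtr, vmbuffer,
						  InvalidTransactionId,
						  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
	}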
On 01.07.2020 12:38, Daniel Gustafsson wrote:
This patch incurs a compiler warning, which probably is quite simple to fix:
heapam.c: In function ‘heap_multi_insert’:
heapam.c:2349:4: error: ‘recptr’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
^
heapam.c:2136:14: note: ‘recptr’ was declared here
XLogRecPtr recptr;
^
Please fix and submit a new version, I'm marking the entry Waiting on Author in
the meantime.
cheers ./daniel
This patch looks very useful to me, so I want to pick it up.
The patch that fixes the compiler warning is in the attachment. Though, I'm not
entirely satisfied with this fix. Also, the patch contains some FIXME comments.
I'll test it more and send fixes this week.
Questions from the first review pass:
1) Do we need XLH_INSERT_ALL_VISIBLE_SET ? IIUC, in the patch it is always
implied by XLH_INSERT_ALL_FROZEN_SET.
2) What does this comment mean? Does XXX refer to the lsn comparison?
Since it
is definitely necessary to update the VM.
+ * XXX: This seems entirely unnecessary?
+ *
+ * FIXME: Theoretically we should only do this after we've
+ * modified the heap - but it's safe to do it here I think,
+ * because this means that the page previously was empty.
+ */
+ if (lsn > PageGetLSN(vmpage))
+ visibilitymap_set(reln, blkno, InvalidBuffer, lsn,
vmbuffer,
+ InvalidTransactionId, vmbits);
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0005-copy-freeze-should-actually-freeze-right.patchtext/x-patch; charset=UTF-8; name=0005-copy-freeze-should-actually-freeze-right.patchDownload
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index ca4b6e186b..0017e3415c 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -179,6 +179,69 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | t | t
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | f | f
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -188,3 +251,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index f79b54480b..ec1afd4906 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -94,6 +94,82 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+\.
+copy copyfreeze from stdin;
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -103,3 +179,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d881f4cd46..1ee9dc78dd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2074,6 +2074,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
int ndone;
PGAlignedBlock scratch;
Page page;
+ Buffer vmbuffer = InvalidBuffer;
bool needwal;
Size saveFreeSpace;
bool need_tuple_data = RelationIsLogicallyLogged(relation);
@@ -2128,9 +2129,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
while (ndone < ntuples)
{
Buffer buffer;
- Buffer vmbuffer = InvalidBuffer;
+ bool starting_with_empty_page;
bool all_visible_cleared = false;
+ bool all_frozen_set = false;
int nthispage;
+ XLogRecPtr recptr;
CHECK_FOR_INTERRUPTS();
@@ -2143,6 +2146,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
&vmbuffer, NULL);
page = BufferGetPage(buffer);
+ starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
+
+ if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+ all_frozen_set = true;
+
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -2176,7 +2184,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
log_heap_new_cid(relation, heaptup);
}
- if (PageIsAllVisible(page))
+ /*
+ * If the page is all visible, need to clear that, unless we're only
+ * going to add further frozen rows to it.
+ */
+ if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
all_visible_cleared = true;
PageClearAllVisible(page);
@@ -2184,6 +2196,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+ /*
+ * If we're only adding already frozen rows, and the page was
+ * previously empty, mark it as all-visible.
+ */
+ else if (all_frozen_set)
+ PageSetAllVisible(page);
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2194,7 +2212,6 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/* XLOG stuff */
if (needwal)
{
- XLogRecPtr recptr;
xl_heap_multi_insert *xlrec;
uint8 info = XLOG_HEAP2_MULTI_INSERT;
char *tupledata;
@@ -2207,8 +2224,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
* If the page was previously empty, we can reinit the page
* instead of restoring the whole thing.
*/
- init = (ItemPointerGetOffsetNumber(&(heaptuples[ndone]->t_self)) == FirstOffsetNumber &&
- PageGetMaxOffsetNumber(page) == FirstOffsetNumber + nthispage - 1);
+ init = starting_with_empty_page;
/* allocate xl_heap_multi_insert struct from the scratch area */
xlrec = (xl_heap_multi_insert *) scratchptr;
@@ -2226,7 +2242,16 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/* the rest of the scratch space is used for tuple data */
tupledata = scratchptr;
- xlrec->flags = all_visible_cleared ? XLH_INSERT_ALL_VISIBLE_CLEARED : 0;
+ xlrec->flags = 0;
+ Assert (!(all_visible_cleared && all_frozen_set));
+ if (all_visible_cleared)
+ xlrec->flags |= XLH_INSERT_ALL_VISIBLE_CLEARED;
+ if (all_frozen_set)
+ {
+ xlrec->flags |= XLH_INSERT_ALL_VISIBLE_SET;
+ xlrec->flags |= XLH_INSERT_ALL_FROZEN_SET;
+ }
+
xlrec->ntuples = nthispage;
/*
@@ -2300,13 +2325,46 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
END_CRIT_SECTION();
- UnlockReleaseBuffer(buffer);
- if (vmbuffer != InvalidBuffer)
- ReleaseBuffer(vmbuffer);
+ if (all_frozen_set)
+ {
+ Assert(PageIsAllVisible(page));
+
+ /*
+ * Having to potentially read the page while holding an exclusive
+ * lock on the page isn't great. But we only get here if
+ * HEAP_INSERT_FROZEN is set, and we only do so if the table isn't
+ * readable outside of this session. Therefore doing IO here isn't
+ * that bad.
+ */
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), &vmbuffer);
+
+ /*
+ * FIXME: setting recptr here is a dirty dirty hack, to prevent
+ * visibilitymap_set() from WAL logging.
+ *
+ * It's fine to use InvalidTransactionId here - this is only used
+ * when HEAP_INSERT_FROZEN is specified, which intentionally
+ * violates visibility rules.
+ */
+ visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
+ needwal ? recptr:InvalidXLogRecPtr, vmbuffer,
+ InvalidTransactionId,
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+ }
+ UnlockReleaseBuffer(buffer);
ndone += nthispage;
+
+ /*
+ * NB: Only release vmbuffer after inserting all tuples - it's fairly
+ * likely that we'll insert into subsequent heap pages that are likely
+ * to use the same vm page.
+ */
}
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
/*
* We're done with the actual inserts. Check for conflicts again, to
* ensure that all rw-conflicts in to these inserts are detected. Without
@@ -8193,6 +8251,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
BlockNumber blkno;
Buffer buffer;
Page page;
+ Page vmpage;
union
{
HeapTupleHeaderData hdr;
@@ -8217,13 +8276,54 @@ heap_xlog_multi_insert(XLogReaderState *record)
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
+ if (xlrec->flags & (XLH_INSERT_ALL_VISIBLE_CLEARED |
+ XLH_INSERT_ALL_VISIBLE_SET |
+ XLH_INSERT_ALL_FROZEN_SET))
{
Relation reln = CreateFakeRelcacheEntry(rnode);
Buffer vmbuffer = InvalidBuffer;
visibilitymap_pin(reln, blkno, &vmbuffer);
- visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
+
+ if (xlrec->flags & (XLH_INSERT_ALL_VISIBLE_CLEARED))
+ {
+ Assert(!(xlrec->flags & (XLH_INSERT_ALL_FROZEN_SET |
+ XLH_INSERT_ALL_VISIBLE_SET)));
+ visibilitymap_clear(reln, blkno, vmbuffer, VISIBILITYMAP_VALID_BITS);
+ }
+ else
+ {
+ int vmbits = 0;
+
+ if (xlrec->flags & (XLH_INSERT_ALL_VISIBLE_SET))
+ vmbits |= VISIBILITYMAP_ALL_VISIBLE;
+ if (xlrec->flags & (XLH_INSERT_ALL_FROZEN_SET))
+ vmbits |= VISIBILITYMAP_ALL_FROZEN;
+
+ vmpage = BufferGetPage(vmbuffer);
+
+ /*
+ * Don't set the bit if replay has already passed this point.
+ *
+ * It might be safe to do this unconditionally; if replay has passed
+ * this point, we'll replay at least as far this time as we did
+ * before, and if this bit needs to be cleared, the record responsible
+ * for doing so should be again replayed, and clear it. For right
+ * now, out of an abundance of conservatism, we use the same test here
+ * we did for the heap page. If this results in a dropped bit, no
+ * real harm is done; and the next VACUUM will fix it.
+ *
+ * XXX: This seems entirely unnecessary?
+ *
+ * FIXME: Theoretically we should only do this after we've
+ * modified the heap - but it's safe to do it here I think,
+ * because this means that the page previously was empty.
+ */
+ if (lsn > PageGetLSN(vmpage))
+ visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
+ InvalidTransactionId, vmbits);
+ }
+
ReleaseBuffer(vmbuffer);
FreeFakeRelcacheEntry(reln);
}
@@ -8301,6 +8401,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
+ if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_SET)
+ PageSetAllVisible(page);
MarkBufferDirty(buffer);
}
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 0a51678c40..812d839612 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -254,7 +254,6 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf,
elog(DEBUG1, "vm_set %s %d", RelationGetRelationName(rel), heapBlk);
#endif
- Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
Assert(InRecovery || BufferIsValid(heapBuf));
Assert(flags & VISIBILITYMAP_VALID_BITS);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 95d18cdb12..7426ad5e4f 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -67,6 +67,8 @@
#define XLH_INSERT_LAST_IN_MULTI (1<<1)
#define XLH_INSERT_IS_SPECULATIVE (1<<2)
#define XLH_INSERT_CONTAINS_NEW_TUPLE (1<<3)
+#define XLH_INSERT_ALL_VISIBLE_SET (1<<4)
+#define XLH_INSERT_ALL_FROZEN_SET (1<<5)
/*
* xl_heap_update flag values, 8 bits are available.
On Tue, Jul 14, 2020 at 1:51 PM Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
Questions from the first review pass:
1) Do we need XLH_INSERT_ALL_VISIBLE_SET ? IIUC, in the patch it is always
implied by XLH_INSERT_ALL_FROZEN_SET.
I agree that it looks unnecessary to have two separate bits.
2) What does this comment mean? Does XXX refer to the lsn comparison? Since it
is definitely necessary to update the VM.
+ * XXX: This seems entirely unnecessary?
+ *
+ * FIXME: Theoretically we should only do this after we've
+ * modified the heap - but it's safe to do it here I think,
+ * because this means that the page previously was empty.
+ */
+ if (lsn > PageGetLSN(vmpage))
+ visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
+ InvalidTransactionId, vmbits);
I wondered about that too. The comment which precedes it was, I
believe, originally written by me, and copied here from
heap_xlog_visible(). But it's not clear that it's good practice to just
copy the comment like this. If the same logic applies, the code should
say that we're doing the same thing here as in heap_xlog_visible() for
consistency, or some such thing; after all, that's the primary place
where that happens. But it looks like the XXX might have been added by
a second person who thought that we didn't need this logic at all, and
the FIXME by a third person who thought it was in the wrong place, so
the whole thing is really confusing at this point.
I'm pretty worried about this, too:
+ * FIXME: setting recptr here is a dirty dirty hack, to prevent
+ * visibilitymap_set() from WAL logging.
That is indeed a dirty hack, and something needs to be done about it.
I wonder if it was really all that smart to try to make the
HEAP2_MULTI_INSERT do this instead of just issuing separate WAL
records to mark it all-visible afterwards, but I don't see any reason
why this can't be made to work. It needs substantially more polishing
than it's had, though, I think.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 31.07.2020 23:28, Robert Haas wrote:
On Tue, Jul 14, 2020 at 1:51 PM Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
Questions from the first review pass:
1) Do we need XLH_INSERT_ALL_VISIBLE_SET ? IIUC, in the patch it is always
implied by XLH_INSERT_ALL_FROZEN_SET.
I agree that it looks unnecessary to have two separate bits.
2) What does this comment mean? Does XXX refer to the lsn comparison? Since it
is definitely necessary to update the VM.
+ * XXX: This seems entirely unnecessary?
+ *
+ * FIXME: Theoretically we should only do this after we've
+ * modified the heap - but it's safe to do it here I think,
+ * because this means that the page previously was empty.
+ */
+ if (lsn > PageGetLSN(vmpage))
+ visibilitymap_set(reln, blkno, InvalidBuffer, lsn, vmbuffer,
+ InvalidTransactionId, vmbits);
I wondered about that too. The comment which precedes it was, I
believe, originally written by me, and copied here from
heap_xlog_visible(). But it's not clear very good practice to just
copy the comment like this. If the same logic applies, the code should
say that we're doing the same thing here as in heap_xlog_visible() for
consistency, or some such thing; after all, that's the primary place
where that happens. But it looks like the XXX might have been added by
a second person who thought that we didn't need this logic at all, and
the FIXME by a third person who thought it was in the wrong place, so
the whole thing is really confusing at this point.
I'm pretty worried about this, too:
+ * FIXME: setting recptr here is a dirty dirty hack, to prevent
+ * visibilitymap_set() from WAL logging.
That is indeed a dirty hack, and something needs to be done about it.
I wonder if it was really all that smart to try to make the
HEAP2_MULTI_INSERT do this instead of just issuing separate WAL
records to mark it all-visible afterwards, but I don't see any reason
why this can't be made to work. It needs substantially more polishing
than it's had, though, I think.
New version of the patch is in the attachment.
New design is more conservative and simpler:
- pin the visibility map page in advance;
- set PageAllVisible;
- call visibilitymap_set() with its own XlogRecord, just like in other
places.
It allows to remove most of the "hacks" and keep code clean.
The patch passes tests added in previous versions.
I haven't tested performance yet, though. Maybe after tests, I'll bring
some optimizations back.
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
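As a reading aid, a condensed and abridged excerpt of that flow as it appears in the attached patch (the patch itself, below, is authoritative):

	/* hio.c, RelationGetBufferForTuple(): for COPY FREEZE into an empty
	 * page, pin the VM page up front, while doing I/O is still safe. */
	if ((options & HEAP_INSERT_FROZEN) &&
		PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0)
		visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);

	/* heapam.c, heap_multi_insert(): inside the critical section, mark the
	 * heap page all-visible instead of clearing the bit. */
	if (all_frozen_set)
		PageSetAllVisible(page);

	/* ... and after END_CRIT_SECTION(), set the VM bits through the normal
	 * path, letting visibilitymap_set() emit its own WAL record. */
	if (all_frozen_set &&
		visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer))
		visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
						  InvalidXLogRecPtr, vmbuffer,
						  InvalidTransactionId,
						  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);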
Attachments:
copy-freeze-vm_freeze_v1.patchtext/x-patch; charset=UTF-8; name=copy-freeze-vm_freeze_v1.patchDownload
commit 7a5dfaa525ba89b86663de417638cdcb30ed147b
Author: anastasia <a.lubennikova@postgrespro.ru>
Date: Sun Aug 2 15:28:51 2020 +0300
copy-freeze-vm_freeze_v1.patch
Set VM all_visible and all_frozen bits, when COPY FREEZE inserts tuples into empty page
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index ca4b6e186b..0017e3415c 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -179,6 +179,69 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | t | t
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | f | f
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -188,3 +251,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index f79b54480b..ec1afd4906 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -94,6 +94,82 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+\.
+copy copyfreeze from stdin;
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -103,3 +179,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2c9bb0c7ee..d6f9266bd7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2084,6 +2084,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
int ndone;
PGAlignedBlock scratch;
Page page;
+ Buffer vmbuffer = InvalidBuffer;
bool needwal;
Size saveFreeSpace;
bool need_tuple_data = RelationIsLogicallyLogged(relation);
@@ -2138,8 +2139,9 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
while (ndone < ntuples)
{
Buffer buffer;
- Buffer vmbuffer = InvalidBuffer;
+ bool starting_with_empty_page;
bool all_visible_cleared = false;
+ bool all_frozen_set = false;
int nthispage;
CHECK_FOR_INTERRUPTS();
@@ -2147,12 +2149,20 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* Find buffer where at least the next tuple will fit. If the page is
* all-visible, this will also pin the requisite visibility map page.
+ *
+ * Also pin visibility map page if COPY FREEZE inserts tuples into an
+ * empty page. See all_frozen_set below.
*/
buffer = RelationGetBufferForTuple(relation, heaptuples[ndone]->t_len,
InvalidBuffer, options, bistate,
&vmbuffer, NULL);
page = BufferGetPage(buffer);
+ starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
+
+ if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+ all_frozen_set = true;
+
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -2186,7 +2196,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
log_heap_new_cid(relation, heaptup);
}
- if (PageIsAllVisible(page))
+ /*
+ * If the page is all visible, need to clear that, unless we're only
+ * going to add further frozen rows to it.
+ */
+ if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
all_visible_cleared = true;
PageClearAllVisible(page);
@@ -2194,6 +2208,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+ /*
+ * If we're only adding already frozen rows, and the page was
+ * previously empty, mark it as all-visible.
+ */
+ else if (all_frozen_set)
+ PageSetAllVisible(page);
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2217,8 +2237,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
* If the page was previously empty, we can reinit the page
* instead of restoring the whole thing.
*/
- init = (ItemPointerGetOffsetNumber(&(heaptuples[ndone]->t_self)) == FirstOffsetNumber &&
- PageGetMaxOffsetNumber(page) == FirstOffsetNumber + nthispage - 1);
+ init = starting_with_empty_page;
/* allocate xl_heap_multi_insert struct from the scratch area */
xlrec = (xl_heap_multi_insert *) scratchptr;
@@ -2236,7 +2255,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/* the rest of the scratch space is used for tuple data */
tupledata = scratchptr;
- xlrec->flags = all_visible_cleared ? XLH_INSERT_ALL_VISIBLE_CLEARED : 0;
+ xlrec->flags = 0;
+ Assert (!(all_visible_cleared && all_frozen_set));
+ if (all_visible_cleared)
+ xlrec->flags |= XLH_INSERT_ALL_VISIBLE_CLEARED;
+ if (all_frozen_set)
+ xlrec->flags |= XLH_INSERT_ALL_FROZEN_SET;
+
xlrec->ntuples = nthispage;
/*
@@ -2310,13 +2335,39 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
END_CRIT_SECTION();
- UnlockReleaseBuffer(buffer);
- if (vmbuffer != InvalidBuffer)
- ReleaseBuffer(vmbuffer);
+ if (all_frozen_set)
+ {
+ Assert(PageIsAllVisible(page));
+ /*
+ * vmbuffer should be already pinned by RelationGetBufferForTuple,
+ * Though, it's fine if is not. all_frozen is just an optimization,
+ */
+ if (visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer))
+ /*
+ * It's fine to use InvalidTransactionId here - this is only used
+ * when HEAP_INSERT_FROZEN is specified, which intentionally
+ * violates visibility rules.
+ */
+ visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
+ InvalidXLogRecPtr, vmbuffer,
+ InvalidTransactionId,
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+ }
+
+ UnlockReleaseBuffer(buffer);
ndone += nthispage;
+
+ /*
+ * NB: Only release vmbuffer after inserting all tuples - it's fairly
+ * likely that we'll insert into subsequent heap pages that are likely
+ * to use the same vm page.
+ */
}
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
/*
* We're done with the actual inserts. Check for conflicts again, to
* ensure that all rw-conflicts in to these inserts are detected. Without
@@ -8203,6 +8254,7 @@ heap_xlog_multi_insert(XLogReaderState *record)
BlockNumber blkno;
Buffer buffer;
Page page;
+ Page vmpage;
union
{
HeapTupleHeaderData hdr;
@@ -8227,7 +8279,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
+ if ((xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED) &&
+ !(xlrec->flags & XLH_INSERT_ALL_FROZEN_SET))
{
Relation reln = CreateFakeRelcacheEntry(rnode);
Buffer vmbuffer = InvalidBuffer;
@@ -8311,6 +8364,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ PageSetAllVisible(page);
MarkBufferDirty(buffer);
}
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index aa3f14c019..ec435869cf 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -142,6 +142,9 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
/* Figure out which pins we need but don't have. */
need_to_pin_buffer1 = PageIsAllVisible(BufferGetPage(buffer1))
&& !visibilitymap_pin_ok(block1, *vmbuffer1);
+
+ // TODO do we need special code for COPY FREEZE here?
+
need_to_pin_buffer2 = buffer2 != InvalidBuffer
&& PageIsAllVisible(BufferGetPage(buffer2))
&& !visibilitymap_pin_ok(block2, *vmbuffer2);
@@ -422,6 +425,14 @@ loop:
buffer = ReadBufferBI(relation, targetBlock, RBM_NORMAL, bistate);
if (PageIsAllVisible(BufferGetPage(buffer)))
visibilitymap_pin(relation, targetBlock, vmbuffer);
+ /*
+ * This is for COPY FREEZE needs. If page is empty,
+ * pin vmbuffer to set all_frozen bit
+ */
+ if ((options & HEAP_INSERT_FROZEN) &&
+ (PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0))
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
}
else if (otherBlock == targetBlock)
@@ -608,6 +619,16 @@ loop:
PageInit(page, BufferGetPageSize(buffer), 0);
MarkBufferDirty(buffer);
+ /*
+ * This is for COPY FREEZE needs. If page is empty,
+ * pin vmbuffer to set all_frozen bit
+ */
+ if ((options & HEAP_INSERT_FROZEN))
+ {
+ Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ }
+
/*
* Release the file-extension lock; it's now OK for someone else to extend
* the relation some more.
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 95d18cdb12..f01360f6ad 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -67,6 +67,8 @@
#define XLH_INSERT_LAST_IN_MULTI (1<<1)
#define XLH_INSERT_IS_SPECULATIVE (1<<2)
#define XLH_INSERT_CONTAINS_NEW_TUPLE (1<<3)
+/* all_frozen_set always implies all_visible_set */
+#define XLH_INSERT_ALL_FROZEN_SET (1<<4)
/*
* xl_heap_update flag values, 8 bits are available.
On Mon, Aug 3, 2020 at 2:29 PM Anastasia Lubennikova <
a.lubennikova@postgrespro.ru> wrote:
New version of the patch is in the attachment.
New design is more conservative and simpler:
- pin the visibility map page in advance;
- set PageAllVisible;
- call visibilitymap_set() with its own XlogRecord, just like in other places.
It allows to remove most of the "hacks" and keep code clean.
The patch passes tests added in previous versions.
I haven't tested performance yet, though. Maybe after tests, I'll bring
some optimizations back.
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Here are some performance results with a patched and unpatched master
branch.
The table used for the test contains three columns (integer, text,
varchar).
The total number of rows is 10000000 in total.
Unpatched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
COPY: 9069.432 ms vacuum: 2567.961ms
COPY: 9004.533 ms vacuum: 2553.075ms
COPY: 8832.422 ms vacuum: 2540.742ms
Patched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
COPY: 10031.723 ms vacuum: 127.524 ms
COPY: 9985.109 ms vacuum: 39.953 ms
COPY: 9283.373 ms vacuum: 37.137 ms
Time to take the copy slightly increased but the vacuum time significantly
decreased.
--
Ibrar Ahmed
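The vacuum speedup is what one would expect: with the VM bits already set at load time, lazy vacuum can skip those pages instead of reading and rescanning them. Roughly, and greatly simplified from the skipping logic in vacuumlazy.c (shown only to explain the measurement, not as an excerpt of the patch):

	/* pages already marked all-visible/all-frozen in the VM need no work */
	if (visibilitymap_get_status(onerel, blkno, &vmbuffer) & VISIBILITYMAP_ALL_FROZEN)
		continue;		/* nothing to prune, nothing left to freeze */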
Unfortunately the latest patch doesn't apply cleanly on the master branch. Can you please share an updated one.
On Mon, Aug 17, 2020 at 2:19 PM Hamid Akhtar <hamid.akhtar@gmail.com> wrote:
Unfortunately the latest patch doesn't apply cleanly on the master branch.
Can you please share an updated one.
Please see the attached patch rebased with master (
a28d731a1187e8d9d8c2b6319375fcbf0a8debd5)
--
Ibrar Ahmed
Attachments:
copy-freeze-vm_freeze_v2.patchapplication/octet-stream; name=copy-freeze-vm_freeze_v2.patchDownload
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index ca4b6e186b..0017e3415c 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -179,6 +179,69 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | t | t
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | f | f
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -188,3 +251,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index f79b54480b..ec1afd4906 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -94,6 +94,82 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+\.
+copy copyfreeze from stdin;
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -103,3 +179,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8eb276e464..37c182a6dc 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2105,6 +2105,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
int ndone;
PGAlignedBlock scratch;
Page page;
+ Buffer vmbuffer = InvalidBuffer;
bool needwal;
Size saveFreeSpace;
bool need_tuple_data = RelationIsLogicallyLogged(relation);
@@ -2159,8 +2160,9 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
while (ndone < ntuples)
{
Buffer buffer;
- Buffer vmbuffer = InvalidBuffer;
+ bool starting_with_empty_page;
bool all_visible_cleared = false;
+ bool all_frozen_set = false;
int nthispage;
CHECK_FOR_INTERRUPTS();
@@ -2168,12 +2170,20 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* Find buffer where at least the next tuple will fit. If the page is
* all-visible, this will also pin the requisite visibility map page.
+ *
+ * Also pin visibility map page if COPY FREEZE inserts tuples into an
+ * empty page. See all_frozen_set below.
*/
buffer = RelationGetBufferForTuple(relation, heaptuples[ndone]->t_len,
InvalidBuffer, options, bistate,
&vmbuffer, NULL);
page = BufferGetPage(buffer);
+ starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
+
+ if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+ all_frozen_set = true;
+
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -2207,7 +2217,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
log_heap_new_cid(relation, heaptup);
}
- if (PageIsAllVisible(page))
+ /*
+ * If the page is all visible, need to clear that, unless we're only
+ * going to add further frozen rows to it.
+ */
+ if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
all_visible_cleared = true;
PageClearAllVisible(page);
@@ -2215,6 +2229,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+ /*
+ * If we're only adding already frozen rows, and the page was
+ * previously empty, mark it as all-visible.
+ */
+ else if (all_frozen_set)
+ PageSetAllVisible(page);
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2238,8 +2258,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
* If the page was previously empty, we can reinit the page
* instead of restoring the whole thing.
*/
- init = (ItemPointerGetOffsetNumber(&(heaptuples[ndone]->t_self)) == FirstOffsetNumber &&
- PageGetMaxOffsetNumber(page) == FirstOffsetNumber + nthispage - 1);
+ init = starting_with_empty_page;
/* allocate xl_heap_multi_insert struct from the scratch area */
xlrec = (xl_heap_multi_insert *) scratchptr;
@@ -2257,7 +2276,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/* the rest of the scratch space is used for tuple data */
tupledata = scratchptr;
- xlrec->flags = all_visible_cleared ? XLH_INSERT_ALL_VISIBLE_CLEARED : 0;
+ xlrec->flags = 0;
+ Assert (!(all_visible_cleared && all_frozen_set));
+ if (all_visible_cleared)
+ xlrec->flags |= XLH_INSERT_ALL_VISIBLE_CLEARED;
+ if (all_frozen_set)
+ xlrec->flags |= XLH_INSERT_ALL_FROZEN_SET;
+
xlrec->ntuples = nthispage;
/*
@@ -2331,13 +2356,39 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
END_CRIT_SECTION();
- UnlockReleaseBuffer(buffer);
- if (vmbuffer != InvalidBuffer)
- ReleaseBuffer(vmbuffer);
+ if (all_frozen_set)
+ {
+ Assert(PageIsAllVisible(page));
+ /*
+ * vmbuffer should be already pinned by RelationGetBufferForTuple,
+ * Though, it's fine if is not. all_frozen is just an optimization,
+ */
+ if (visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer))
+ /*
+ * It's fine to use InvalidTransactionId here - this is only used
+ * when HEAP_INSERT_FROZEN is specified, which intentionally
+ * violates visibility rules.
+ */
+ visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
+ InvalidXLogRecPtr, vmbuffer,
+ InvalidTransactionId,
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+ }
+
+ UnlockReleaseBuffer(buffer);
ndone += nthispage;
+
+ /*
+ * NB: Only release vmbuffer after inserting all tuples - it's fairly
+ * likely that we'll insert into subsequent heap pages that are likely
+ * to use the same vm page.
+ */
}
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
/*
* We're done with the actual inserts. Check for conflicts again, to
* ensure that all rw-conflicts in to these inserts are detected. Without
@@ -8255,7 +8306,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
+ if ((xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED) &&
+ !(xlrec->flags & XLH_INSERT_ALL_FROZEN_SET))
{
Relation reln = CreateFakeRelcacheEntry(rnode);
Buffer vmbuffer = InvalidBuffer;
@@ -8339,6 +8391,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ PageSetAllVisible(page);
MarkBufferDirty(buffer);
}
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index aa3f14c019..ec435869cf 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -142,6 +142,9 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
/* Figure out which pins we need but don't have. */
need_to_pin_buffer1 = PageIsAllVisible(BufferGetPage(buffer1))
&& !visibilitymap_pin_ok(block1, *vmbuffer1);
+
+ // TODO do we need special code for COPY FREEZE here?
+
need_to_pin_buffer2 = buffer2 != InvalidBuffer
&& PageIsAllVisible(BufferGetPage(buffer2))
&& !visibilitymap_pin_ok(block2, *vmbuffer2);
@@ -422,6 +425,14 @@ loop:
buffer = ReadBufferBI(relation, targetBlock, RBM_NORMAL, bistate);
if (PageIsAllVisible(BufferGetPage(buffer)))
visibilitymap_pin(relation, targetBlock, vmbuffer);
+ /*
+ * This is for COPY FREEZE needs. If page is empty,
+ * pin vmbuffer to set all_frozen bit
+ */
+ if ((options & HEAP_INSERT_FROZEN) &&
+ (PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0))
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
}
else if (otherBlock == targetBlock)
@@ -608,6 +619,16 @@ loop:
PageInit(page, BufferGetPageSize(buffer), 0);
MarkBufferDirty(buffer);
+ /*
+ * This is for COPY FREEZE needs. If page is empty,
+ * pin vmbuffer to set all_frozen bit
+ */
+ if ((options & HEAP_INSERT_FROZEN))
+ {
+ Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ }
+
/*
* Release the file-extension lock; it's now OK for someone else to extend
* the relation some more.
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index aa17f7df84..16c1fd5673 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -69,6 +69,9 @@
#define XLH_INSERT_CONTAINS_NEW_TUPLE (1<<3)
#define XLH_INSERT_ON_TOAST_RELATION (1<<4)
+/* all_frozen_set always implies all_visible_set */
+#define XLH_INSERT_ALL_FROZEN_SET (1<<5)
+
/*
* xl_heap_update flag values, 8 bits are available.
*/
On 2020-Aug-14, Ibrar Ahmed wrote:
The table used for the test contains three columns (integer, text,
varchar).
The total number of rows is 10000000.

Unpatched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
COPY: 9069.432 ms vacuum: 2567.961 ms
COPY: 9004.533 ms vacuum: 2553.075 ms
COPY: 8832.422 ms vacuum: 2540.742 ms

Patched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
COPY: 10031.723 ms vacuum: 127.524 ms
COPY: 9985.109 ms vacuum: 39.953 ms
COPY: 9283.373 ms vacuum: 37.137 ms

Time to take the copy slightly increased but the vacuum time significantly
decreased.
"Slightly"? It seems quite a large performance drop to me -- more than
10%. Where is that time being spent? Andres said in [1] that he
thought the performance shouldn't be affected noticeably, but this
doesn't seem to hold true. As I understand, the idea was that there
would be little or no additional WAL records .. only flags in the
existing record. So what is happening?
[1]: /messages/by-id/20190408010427.4l63qr7h2fjcyp77@alap3.anarazel.de
Also, when Andres posted this patch first, he said this was only for
heap_multi_insert because it was a prototype. But I think we expect
that the table_insert path (CIM_SINGLE mode in copy) should also receive
that treatment.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 18.08.2020 02:54, Alvaro Herrera wrote:
On 2020-Aug-14, Ibrar Ahmed wrote:
The table used for the test contains three columns (integer, text,
varchar).
The total number of rows is 10000000.

Unpatched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
COPY: 9069.432 ms vacuum: 2567.961 ms
COPY: 9004.533 ms vacuum: 2553.075 ms
COPY: 8832.422 ms vacuum: 2540.742 ms

Patched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
COPY: 10031.723 ms vacuum: 127.524 ms
COPY: 9985.109 ms vacuum: 39.953 ms
COPY: 9283.373 ms vacuum: 37.137 ms

Time to take the copy slightly increased but the vacuum time significantly
decreased.

"Slightly"? It seems quite a large performance drop to me -- more than
10%. Where is that time being spent? Andres said in [1] that he
thought the performance shouldn't be affected noticeably, but this
doesn't seem to hold true. As I understand, the idea was that there
would be little or no additional WAL records .. only flags in the
existing record. So what is happening?

[1] /messages/by-id/20190408010427.4l63qr7h2fjcyp77@alap3.anarazel.de
I agree that 10% performance drop is not what we expect with this patch.
Ibrar, can you share more info about your tests? I'd like to reproduce
this slowdown and fix it, if necessary.
I've run some tests on my laptop and COPY FREEZE shows the same time for
both versions, while VACUUM is much faster on the patched version. I've
also checked WAL generation and it shows that the patch works correctly
as it doesn't add any records for COPY.
Not patched:
Time: 54883,356 ms (00:54,883)
Time: 65373,333 ms (01:05,373)
Time: 64684,592 ms (01:04,685)
VACUUM Time: 60861,670 ms (01:00,862)
COPY wal_bytes 3765 MB
VACUUM wal_bytes 6015 MB
table size 5971 MB
Patched:
Time: 53142,947 ms (00:53,143)
Time: 65420,812 ms (01:05,421)
Time: 66600,114 ms (01:06,600)
VACUUM Time: 63,401 ms
COPY wal_bytes 3765 MB
VACUUM wal_bytes 30 kB
table size 5971 MB
The test script is attached.
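(For reference, a benchmark along these lines can be sketched as below. This
is only a minimal sketch: the table name, the row count, the data generated
via COPY FROM PROGRAM, and the WAL measurement via pg_current_wal_insert_lsn()
are my assumptions, not the contents of the attached script.)

-- requires superuser (COPY FROM PROGRAM); all names and numbers are illustrative
create table freeze_test (a int, b text);

select pg_current_wal_insert_lsn() as start_lsn \gset
\timing on
begin;
truncate freeze_test;   -- truncated in the same transaction, so FREEZE is allowed
copy freeze_test from program 'seq 10000000 | awk ''{print $1 "\thello"}''' with (freeze);
commit;
\timing off
select pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_insert_lsn(), :'start_lsn')) as copy_wal_bytes;

select pg_current_wal_insert_lsn() as start_lsn \gset
\timing on
vacuum freeze_test;
\timing off
select pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_insert_lsn(), :'start_lsn')) as vacuum_wal_bytes;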
Also, when Andres posted this patch first, he said this was only for
heap_multi_insert because it was a prototype. But I think we expect
that the table_insert path (CIM_SINGLE mode in copy) should also receive
that treatment.
I am afraid that extra checks for COPY FREEZE in heap_insert() will
slow down normal insertions.
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
copy-freeze-vm_freeze_v2.patch (text/x-patch)
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index ca4b6e186b..0017e3415c 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -179,6 +179,69 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | t | t
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | f | f
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -188,3 +251,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index f79b54480b..ec1afd4906 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -94,6 +94,82 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+\.
+copy copyfreeze from stdin;
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -103,3 +179,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8eb276e464..37c182a6dc 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2105,6 +2105,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
int ndone;
PGAlignedBlock scratch;
Page page;
+ Buffer vmbuffer = InvalidBuffer;
bool needwal;
Size saveFreeSpace;
bool need_tuple_data = RelationIsLogicallyLogged(relation);
@@ -2159,8 +2160,9 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
while (ndone < ntuples)
{
Buffer buffer;
- Buffer vmbuffer = InvalidBuffer;
+ bool starting_with_empty_page;
bool all_visible_cleared = false;
+ bool all_frozen_set = false;
int nthispage;
CHECK_FOR_INTERRUPTS();
@@ -2168,12 +2170,20 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* Find buffer where at least the next tuple will fit. If the page is
* all-visible, this will also pin the requisite visibility map page.
+ *
+ * Also pin visibility map page if COPY FREEZE inserts tuples into an
+ * empty page. See all_frozen_set below.
*/
buffer = RelationGetBufferForTuple(relation, heaptuples[ndone]->t_len,
InvalidBuffer, options, bistate,
&vmbuffer, NULL);
page = BufferGetPage(buffer);
+ starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
+
+ if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+ all_frozen_set = true;
+
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -2207,7 +2217,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
log_heap_new_cid(relation, heaptup);
}
- if (PageIsAllVisible(page))
+ /*
+ * If the page is all visible, need to clear that, unless we're only
+ * going to add further frozen rows to it.
+ */
+ if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
all_visible_cleared = true;
PageClearAllVisible(page);
@@ -2215,6 +2229,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+ /*
+ * If we're only adding already frozen rows, and the page was
+ * previously empty, mark it as all-visible.
+ */
+ else if (all_frozen_set)
+ PageSetAllVisible(page);
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2238,8 +2258,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
* If the page was previously empty, we can reinit the page
* instead of restoring the whole thing.
*/
- init = (ItemPointerGetOffsetNumber(&(heaptuples[ndone]->t_self)) == FirstOffsetNumber &&
- PageGetMaxOffsetNumber(page) == FirstOffsetNumber + nthispage - 1);
+ init = starting_with_empty_page;
/* allocate xl_heap_multi_insert struct from the scratch area */
xlrec = (xl_heap_multi_insert *) scratchptr;
@@ -2257,7 +2276,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/* the rest of the scratch space is used for tuple data */
tupledata = scratchptr;
- xlrec->flags = all_visible_cleared ? XLH_INSERT_ALL_VISIBLE_CLEARED : 0;
+ xlrec->flags = 0;
+ Assert (!(all_visible_cleared && all_frozen_set));
+ if (all_visible_cleared)
+ xlrec->flags |= XLH_INSERT_ALL_VISIBLE_CLEARED;
+ if (all_frozen_set)
+ xlrec->flags |= XLH_INSERT_ALL_FROZEN_SET;
+
xlrec->ntuples = nthispage;
/*
@@ -2331,13 +2356,39 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
END_CRIT_SECTION();
- UnlockReleaseBuffer(buffer);
- if (vmbuffer != InvalidBuffer)
- ReleaseBuffer(vmbuffer);
+ if (all_frozen_set)
+ {
+ Assert(PageIsAllVisible(page));
+ /*
+ * vmbuffer should be already pinned by RelationGetBufferForTuple,
+ * Though, it's fine if is not. all_frozen is just an optimization,
+ */
+ if (visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer))
+ /*
+ * It's fine to use InvalidTransactionId here - this is only used
+ * when HEAP_INSERT_FROZEN is specified, which intentionally
+ * violates visibility rules.
+ */
+ visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
+ InvalidXLogRecPtr, vmbuffer,
+ InvalidTransactionId,
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+ }
+
+ UnlockReleaseBuffer(buffer);
ndone += nthispage;
+
+ /*
+ * NB: Only release vmbuffer after inserting all tuples - it's fairly
+ * likely that we'll insert into subsequent heap pages that are likely
+ * to use the same vm page.
+ */
}
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
/*
* We're done with the actual inserts. Check for conflicts again, to
* ensure that all rw-conflicts in to these inserts are detected. Without
@@ -8255,7 +8306,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
+ if ((xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED) &&
+ !(xlrec->flags & XLH_INSERT_ALL_FROZEN_SET))
{
Relation reln = CreateFakeRelcacheEntry(rnode);
Buffer vmbuffer = InvalidBuffer;
@@ -8339,6 +8391,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ PageSetAllVisible(page);
MarkBufferDirty(buffer);
}
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index aa3f14c019..ec435869cf 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -142,6 +142,9 @@ GetVisibilityMapPins(Relation relation, Buffer buffer1, Buffer buffer2,
/* Figure out which pins we need but don't have. */
need_to_pin_buffer1 = PageIsAllVisible(BufferGetPage(buffer1))
&& !visibilitymap_pin_ok(block1, *vmbuffer1);
+
+ // TODO do we need special code for COPY FREEZE here?
+
need_to_pin_buffer2 = buffer2 != InvalidBuffer
&& PageIsAllVisible(BufferGetPage(buffer2))
&& !visibilitymap_pin_ok(block2, *vmbuffer2);
@@ -422,6 +425,14 @@ loop:
buffer = ReadBufferBI(relation, targetBlock, RBM_NORMAL, bistate);
if (PageIsAllVisible(BufferGetPage(buffer)))
visibilitymap_pin(relation, targetBlock, vmbuffer);
+ /*
+ * This is for COPY FREEZE needs. If page is empty,
+ * pin vmbuffer to set all_frozen bit
+ */
+ if ((options & HEAP_INSERT_FROZEN) &&
+ (PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0))
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
}
else if (otherBlock == targetBlock)
@@ -608,6 +619,16 @@ loop:
PageInit(page, BufferGetPageSize(buffer), 0);
MarkBufferDirty(buffer);
+ /*
+ * This is for COPY FREEZE needs. If page is empty,
+ * pin vmbuffer to set all_frozen bit
+ */
+ if ((options & HEAP_INSERT_FROZEN))
+ {
+ Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ }
+
/*
* Release the file-extension lock; it's now OK for someone else to extend
* the relation some more.
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index aa17f7df84..16c1fd5673 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -69,6 +69,9 @@
#define XLH_INSERT_CONTAINS_NEW_TUPLE (1<<3)
#define XLH_INSERT_ON_TOAST_RELATION (1<<4)
+/* all_frozen_set always implies all_visible_set */
+#define XLH_INSERT_ALL_FROZEN_SET (1<<5)
+
/*
* xl_heap_update flag values, 8 bits are available.
*/
On Wed, Aug 19, 2020 at 6:15 PM Anastasia Lubennikova <
a.lubennikova@postgrespro.ru> wrote:
On 18.08.2020 02:54, Alvaro Herrera wrote:
On 2020-Aug-14, Ibrar Ahmed wrote:
The table used for the test contains three columns (integer, text,
varchar).
The total number of rows is 10000000.

Unpatched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
COPY: 9069.432 ms vacuum: 2567.961 ms
COPY: 9004.533 ms vacuum: 2553.075 ms
COPY: 8832.422 ms vacuum: 2540.742 ms

Patched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
COPY: 10031.723 ms vacuum: 127.524 ms
COPY: 9985.109 ms vacuum: 39.953 ms
COPY: 9283.373 ms vacuum: 37.137 ms

Time to take the copy slightly increased but the vacuum time significantly
decreased.

"Slightly"? It seems quite a large performance drop to me -- more than
10%. Where is that time being spent? Andres said in [1] that he
thought the performance shouldn't be affected noticeably, but this
doesn't seem to hold true. As I understand, the idea was that there
would be little or no additional WAL records .. only flags in the
existing record. So what is happening?

[1] /messages/by-id/20190408010427.4l63qr7h2fjcyp77@alap3.anarazel.de

I agree that 10% performance drop is not what we expect with this patch.
Ibrar, can you share more info about your tests? I'd like to reproduce
this slowdown and fix it, if necessary.

I've run some tests on my laptop and COPY FREEZE shows the same time for
both versions, while VACUUM is much faster on the patched version. I've
also checked WAL generation and it shows that the patch works correctly
as it doesn't add any records for COPY.

Not patched:
Time: 54883,356 ms (00:54,883)
Time: 65373,333 ms (01:05,373)
Time: 64684,592 ms (01:04,685)
VACUUM Time: 60861,670 ms (01:00,862)
COPY wal_bytes 3765 MB
VACUUM wal_bytes 6015 MB
table size 5971 MB

Patched:
Time: 53142,947 ms (00:53,143)
Time: 65420,812 ms (01:05,421)
Time: 66600,114 ms (01:06,600)
VACUUM Time: 63,401 ms
COPY wal_bytes 3765 MB
VACUUM wal_bytes 30 kB
table size 5971 MB

The test script is attached.

Also, when Andres posted this patch first, he said this was only for
heap_multi_insert because it was a prototype. But I think we expect
that the table_insert path (CIM_SINGLE mode in copy) should also receive
that treatment.

I am afraid that extra checks for COPY FREEZE in heap_insert() will
slow down normal insertions.
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Here is my test;
postgres=# BEGIN;
BEGIN
postgres=*# TRUNCATE foo;
TRUNCATE TABLE
postgres=*# COPY foo(id, name, address) FROM '/home/ibrar/bar.csv'
DELIMITER ',' FREEZE;
COPY 10000000
postgres=*# COMMIT;
COMMIT
postgres=# VACUUM;
VACUUM
postgres=# SELECT count(*) FROM foo;
count
----------
10000000
(1 row)
--
Ibrar Ahmed
On 21.08.2020 19:43, Ibrar Ahmed wrote:
On Wed, Aug 19, 2020 at 6:15 PM Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
On 18.08.2020 02:54, Alvaro Herrera wrote:
On 2020-Aug-14, Ibrar Ahmed wrote:
The table used for the test contains three columns (integer, text,
varchar).
The total number of rows is 10000000.

Unpatched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
COPY: 9069.432 ms vacuum: 2567.961 ms
COPY: 9004.533 ms vacuum: 2553.075 ms
COPY: 8832.422 ms vacuum: 2540.742 ms

Patched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
COPY: 10031.723 ms vacuum: 127.524 ms
COPY: 9985.109 ms vacuum: 39.953 ms
COPY: 9283.373 ms vacuum: 37.137 ms

Time to take the copy slightly increased but the vacuum time significantly
decreased.

"Slightly"? It seems quite a large performance drop to me -- more than
10%. Where is that time being spent? Andres said in [1] that he
thought the performance shouldn't be affected noticeably, but this
doesn't seem to hold true. As I understand, the idea was that there
would be little or no additional WAL records .. only flags in the
existing record. So what is happening?

[1] /messages/by-id/20190408010427.4l63qr7h2fjcyp77@alap3.anarazel.de

I agree that 10% performance drop is not what we expect with this
patch.
Ibrar, can you share more info about your tests? I'd like to reproduce
this slowdown and fix it, if necessary.

Here is my test;
postgres=# BEGIN;
BEGIN
postgres=*# TRUNCATE foo;
TRUNCATE TABLE
postgres=*# COPY foo(id, name, address) FROM '/home/ibrar/bar.csv'
DELIMITER ',' FREEZE;
COPY 10000000
--
Ibrar Ahmed
I've repeated the test and didn't notice any slowdown for COPY FREEZE.
Test data is here [1].
The numbers do fluctuate a bit, but there is no dramatic difference
between the master and patched versions. So I assume that the performance
drop in your test has something to do with measurement error,
unless you have some non-default configuration that could affect it.
patched:
COPY: 12327,090 ms vacuum: 37,555 ms
COPY: 12939,540 ms vacuum: 35,703 ms
COPY: 12245,819 ms vacuum: 36,273 ms
master:
COPY: 13253,605 ms vacuum: 3592,849 ms
COPY: 12619,428 ms vacuum: 4253,836 ms
COPY: 12512,940 ms vacuum: 4009,847 ms
I also slightly cleaned up comments, so the new version of the patch is
attached. As this is just a performance optimization, documentation is
not needed. It would be great if other reviewers could run some
independent performance tests, as I believe that this patch is ready for
committer.
[1]: https://drive.google.com/file/d/11r19NX6yyPjvxdDub8Ce-kmApRurp4Nx/view
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
copy-freeze-vm_freeze_v3.patch (text/x-patch)
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index ca4b6e186b..0017e3415c 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -179,6 +179,69 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | t | t
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | f | f
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -188,3 +251,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index f79b54480b..ec1afd4906 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -94,6 +94,82 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+\.
+copy copyfreeze from stdin;
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -103,3 +179,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9b5f417eac..338854d4ae 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2105,6 +2105,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
int ndone;
PGAlignedBlock scratch;
Page page;
+ Buffer vmbuffer = InvalidBuffer;
bool needwal;
Size saveFreeSpace;
bool need_tuple_data = RelationIsLogicallyLogged(relation);
@@ -2159,8 +2160,9 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
while (ndone < ntuples)
{
Buffer buffer;
- Buffer vmbuffer = InvalidBuffer;
+ bool starting_with_empty_page;
bool all_visible_cleared = false;
+ bool all_frozen_set = false;
int nthispage;
CHECK_FOR_INTERRUPTS();
@@ -2168,12 +2170,20 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* Find buffer where at least the next tuple will fit. If the page is
* all-visible, this will also pin the requisite visibility map page.
+ *
+ * Also pin visibility map page if COPY FREEZE inserts tuples into an
+ * empty page. See all_frozen_set below.
*/
buffer = RelationGetBufferForTuple(relation, heaptuples[ndone]->t_len,
InvalidBuffer, options, bistate,
&vmbuffer, NULL);
page = BufferGetPage(buffer);
+ starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
+
+ if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+ all_frozen_set = true;
+
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -2207,7 +2217,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
log_heap_new_cid(relation, heaptup);
}
- if (PageIsAllVisible(page))
+ /*
+ * If the page is all visible, need to clear that, unless we're only
+ * going to add further frozen rows to it.
+ */
+ if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
all_visible_cleared = true;
PageClearAllVisible(page);
@@ -2215,6 +2229,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+ /*
+ * If we're only adding already frozen rows, and the page was
+ * previously empty, mark it as all-visible.
+ */
+ else if (all_frozen_set)
+ PageSetAllVisible(page);
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2238,8 +2258,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
* If the page was previously empty, we can reinit the page
* instead of restoring the whole thing.
*/
- init = (ItemPointerGetOffsetNumber(&(heaptuples[ndone]->t_self)) == FirstOffsetNumber &&
- PageGetMaxOffsetNumber(page) == FirstOffsetNumber + nthispage - 1);
+ init = starting_with_empty_page;
/* allocate xl_heap_multi_insert struct from the scratch area */
xlrec = (xl_heap_multi_insert *) scratchptr;
@@ -2257,7 +2276,13 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/* the rest of the scratch space is used for tuple data */
tupledata = scratchptr;
- xlrec->flags = all_visible_cleared ? XLH_INSERT_ALL_VISIBLE_CLEARED : 0;
+ xlrec->flags = 0;
+ Assert (!(all_visible_cleared && all_frozen_set));
+ if (all_visible_cleared)
+ xlrec->flags |= XLH_INSERT_ALL_VISIBLE_CLEARED;
+ if (all_frozen_set)
+ xlrec->flags |= XLH_INSERT_ALL_FROZEN_SET;
+
xlrec->ntuples = nthispage;
/*
@@ -2331,13 +2356,39 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
END_CRIT_SECTION();
- UnlockReleaseBuffer(buffer);
- if (vmbuffer != InvalidBuffer)
- ReleaseBuffer(vmbuffer);
+ if (all_frozen_set)
+ {
+ Assert(PageIsAllVisible(page));
+ /*
+ * vmbuffer should be already pinned by RelationGetBufferForTuple,
+ * Though, it's fine if is not. all_frozen is just an optimization.
+ */
+ if (visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer))
+ /*
+ * It's fine to use InvalidTransactionId here - this is only used
+ * when HEAP_INSERT_FROZEN is specified, which intentionally
+ * violates visibility rules.
+ */
+ visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
+ InvalidXLogRecPtr, vmbuffer,
+ InvalidTransactionId,
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+ }
+
+ UnlockReleaseBuffer(buffer);
ndone += nthispage;
+
+ /*
+ * NB: Only release vmbuffer after inserting all tuples - it's fairly
+ * likely that we'll insert into subsequent heap pages that are likely
+ * to use the same vm page.
+ */
}
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
/*
* We're done with the actual inserts. Check for conflicts again, to
* ensure that all rw-conflicts in to these inserts are detected. Without
@@ -8246,7 +8297,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
*/
- if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
+ if ((xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED) &&
+ !(xlrec->flags & XLH_INSERT_ALL_FROZEN_SET))
{
Relation reln = CreateFakeRelcacheEntry(rnode);
Buffer vmbuffer = InvalidBuffer;
@@ -8330,6 +8382,8 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ PageSetAllVisible(page);
MarkBufferDirty(buffer);
}
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index aa3f14c019..b264ff53d7 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -422,6 +422,14 @@ loop:
buffer = ReadBufferBI(relation, targetBlock, RBM_NORMAL, bistate);
if (PageIsAllVisible(BufferGetPage(buffer)))
visibilitymap_pin(relation, targetBlock, vmbuffer);
+ /*
+ * This is for COPY FREEZE needs. If page is empty,
+ * pin vmbuffer to set all_frozen bit
+ */
+ if ((options & HEAP_INSERT_FROZEN) &&
+ (PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0))
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
}
else if (otherBlock == targetBlock)
@@ -608,6 +616,16 @@ loop:
PageInit(page, BufferGetPageSize(buffer), 0);
MarkBufferDirty(buffer);
+ /*
+ * This is for COPY FREEZE needs. If page is empty,
+ * pin vmbuffer to set all_frozen bit
+ */
+ if ((options & HEAP_INSERT_FROZEN))
+ {
+ Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ }
+
/*
* Release the file-extension lock; it's now OK for someone else to extend
* the relation some more.
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index aa17f7df84..16c1fd5673 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -69,6 +69,9 @@
#define XLH_INSERT_CONTAINS_NEW_TUPLE (1<<3)
#define XLH_INSERT_ON_TOAST_RELATION (1<<4)
+/* all_frozen_set always implies all_visible_set */
+#define XLH_INSERT_ALL_FROZEN_SET (1<<5)
+
/*
* xl_heap_update flag values, 8 bits are available.
*/
On Thu, Aug 27, 2020 at 2:14 AM Anastasia Lubennikova <
a.lubennikova@postgrespro.ru> wrote:
On 21.08.2020 19:43, Ibrar Ahmed wrote:
On Wed, Aug 19, 2020 at 6:15 PM Anastasia Lubennikova <
a.lubennikova@postgrespro.ru> wrote:
On 18.08.2020 02:54, Alvaro Herrera wrote:
On 2020-Aug-14, Ibrar Ahmed wrote:
The table used for the test contains three columns (integer, text,
varchar).
The total number of rows is 10000000.

Unpatched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
COPY: 9069.432 ms vacuum: 2567.961 ms
COPY: 9004.533 ms vacuum: 2553.075 ms
COPY: 8832.422 ms vacuum: 2540.742 ms

Patched (Master: 92c12e46d5f1e25fc85608a6d6a19b8f5ea02600)
COPY: 10031.723 ms vacuum: 127.524 ms
COPY: 9985.109 ms vacuum: 39.953 ms
COPY: 9283.373 ms vacuum: 37.137 ms

Time to take the copy slightly increased but the vacuum time significantly
decreased.

"Slightly"? It seems quite a large performance drop to me -- more than
10%. Where is that time being spent? Andres said in [1] that he
thought the performance shouldn't be affected noticeably, but this
doesn't seem to hold true. As I understand, the idea was that there
would be little or no additional WAL records .. only flags in the
existing record. So what is happening?

[1] /messages/by-id/20190408010427.4l63qr7h2fjcyp77@alap3.anarazel.de

I agree that 10% performance drop is not what we expect with this patch.
Ibrar, can you share more info about your tests? I'd like to reproduce
this slowdown and fix it, if necessary.

Here is my test;
postgres=# BEGIN;
BEGIN
postgres=*# TRUNCATE foo;
TRUNCATE TABLE
postgres=*# COPY foo(id, name, address) FROM '/home/ibrar/bar.csv'
DELIMITER ',' FREEZE;
COPY 10000000
--
Ibrar Ahmed

I've repeated the test and didn't notice any slowdown for COPY FREEZE.
Test data is here [1].
The numbers do fluctuate a bit, but there is no dramatic difference
between the master and patched versions. So I assume that the performance
drop in your test has something to do with measurement error, unless you
have some non-default configuration that could affect it.

patched:
COPY: 12327,090 ms vacuum: 37,555 ms
COPY: 12939,540 ms vacuum: 35,703 ms
COPY: 12245,819 ms vacuum: 36,273 ms

master:
COPY: 13253,605 ms vacuum: 3592,849 ms
COPY: 12619,428 ms vacuum: 4253,836 ms
COPY: 12512,940 ms vacuum: 4009,847 ms

I also slightly cleaned up comments, so the new version of the patch is
attached. As this is just a performance optimization, documentation is
not needed. It would be great if other reviewers could run some
independent performance tests, as I believe that this patch is ready for
committer.

[1] https://drive.google.com/file/d/11r19NX6yyPjvxdDub8Ce-kmApRurp4Nx/view

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

I gave another try with the latest v3 patch on the latest master branch
(ff60394a8c9a7af8b32de420ccb54a20a0f019c1) with all default settings.
11824.495 ms is the median with master and 11884.089 ms is the median with the patch.
Note: there are two changes from my earlier run: (1) I used the v3 patch and (2) the test
was done on the latest master (ff60394a8c9a7af8b32de420ccb54a20a0f019c1).
Master (ff60394a8c9a7af8b32de420ccb54a20a0f019c1)
postgres=# \timing
postgres=# BEGIN;
postgres=*# TRUNCATE foo;
postgres=*# COPY foo(id, name, address) FROM '/Users/ibrar/bar.csv'
DELIMITER ',' FREEZE;
Time: 11824.495 ms (00:11.824)
postgres=*# COMMIT;
Restart
postgres=# \timing
postgres=# BEGIN;
postgres=*# TRUNCATE foo;
postgres=*# COPY foo(id, name, address) FROM '/Users/ibrar/bar.csv'
DELIMITER ',' FREEZE;
Time: 14096.987 ms (00:14.097)
postgres=*# commit;
Restart
postgres=# \timing
postgres=# BEGIN;
postgres=*# TRUNCATE foo;
postgres=*# COPY foo(id, name, address) FROM '/Users/ibrar/bar.csv'
DELIMITER ',' FREEZE;
Time: 11108.289 ms (00:11.108)
postgres=*# commit;
Patched (ff60394a8c9a7af8b32de420ccb54a20a0f019c1)
postgres=# \timing
postgres=# BEGIN;
postgres=*# TRUNCATE foo;
postgres=*# COPY foo(id, name, address) FROM '/Users/ibrar/bar.csv'
DELIMITER ',' FREEZE;
Time: 10749.945 ms (00:10.750)
postgres=*# commit;
Restart
postgres=# \timing
postgres=# BEGIN;
postgres=*# TRUNCATE foo;
postgres=*# COPY foo(id, name, address) FROM '/Users/ibrar/bar.csv'
DELIMITER ',' FREEZE;
Time: 14274.361 ms (00:14.274)
postgres=*# commit;
Restart
postgres=# \timing
postgres=# BEGIN;
postgres=*# TRUNCATE foo;
postgres=*# COPY foo(id, name, address) FROM '/Users/ibrar/bar.csv'
DELIMITER ',' FREEZE;
Time: 11884.089 ms (00:11.884)
postgres=*# commit;
--
Ibrar Ahmed
Status update for a commitfest entry.
This patch is ReadyForCommitter. It applies and passes the CI. There are no unanswered questions in the discussion.
The discussion started in 2015 with a patch by Jeff Janes. Later it was revived by Pavan Deolasee, then picked up by Ibrar Ahmed, and finally it was rewritten by me, so I moved myself from reviewers to authors as well.
The latest version was reviewed and tested by Ibrar Ahmed. The patch doesn't affect COPY FREEZE performance and significantly decreases the time of the following VACUUM.
I have tested the patch on my laptop (mem 16GB, SSD 512GB) using the
data introduced upthread, and saw that VACUUM after COPY FREEZE is
nearly 60 times faster than on the current master branch. Quite impressive.
By the way, I noticed following comment:
+ /*
+ * vmbuffer should be already pinned by RelationGetBufferForTuple,
+ * Though, it's fine if is not. all_frozen is just an optimization.
+ */
could be enhanced like below. What do you think?
+ /*
+ * vmbuffer should be already pinned by RelationGetBufferForTuple.
+ * Though, it's fine if it is not. all_frozen is just an optimization.
+ */
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
Hi,
I might be somewhat late to the party, but I've done a bit of
benchmarking too ;-) I used TPC-H data from a 100GB test, and tried
different combinations of COPY [FREEZE] and VACUUM [FREEZE], both on
current master and with the patch.
The results look like this (the columns say what combination of COPY and
VACUUM was used - e.g. -/FREEZE means plain COPY and VACUUM FREEZE)
master:

              -/-     FREEZE/-    -/FREEZE    FREEZE/FREEZE
  ----------------------------------------------------------------
  COPY       2471       2477        2486          2484
  VACUUM      228        209         453           206

patched:

              -/-     FREEZE/-    -/FREEZE    FREEZE/FREEZE
  ----------------------------------------------------------------
  COPY       2459       2445        2458          2446
  VACUUM      227          0         467             0
So I don't really observe any measurable slowdowns in the COPY part (in
fact there seems to be a tiny speedup, but it might be just noise). In
the VACUUM part, there's clear speedup when the data was already frozen
by COPY (Yes, those are zeroes, because it took less than 1 second.)
So that looks pretty awesome, I guess.
For the record, these tests were run on a server with NVMe SSD, so
hopefully reliable and representative numbers.
A couple minor comments about the code:
1) Maybe add a comment before the block setting xlrec->flags in
heap_multi_insert. It's not very complex, but it used to be a bit
simpler, and all the other pieces around have comments, so it won't
hurt.
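(As an illustration only - the exact wording is not in the thread - the kind
of comment meant here might read:)

/*
 * Choose visibility-related flags for the WAL record.  The two flags are
 * mutually exclusive: either we cleared PD_ALL_VISIBLE because ordinary
 * rows were added to an all-visible page, or we set it because
 * HEAP_INSERT_FROZEN filled a previously empty page with frozen rows only.
 */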
2) I find the "if (all_frozen_set)" block a bit strange. It's a matter
of personal preference, but I'd just use a single level of nesting, i.e.
something like this:
/* if everything frozen, the whole page has to be visible */
Assert(!(all_frozen_set && !PageIsAllVisible(page)));
/*
* If we've frozen everything on the page, and if we're already
* holding pin on the vmbuffer, record that in the visibilitymap.
* If we're not holding the pin, it's OK to skip this - it's just
* an optimization.
*
* It's fine to use InvalidTransactionId here - this is only used
* when HEAP_INSERT_FROZEN is specified, which intentionally
* violates visibility rules.
*/
if (all_frozen_set &&
visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer))
visibilitymap_set(...);
IMHO it's easier to read, but YMMV. I've also reworded the comment a bit
to say what we're doing etc. And I've moved the comment from inside the
if block into the main comment - that was making it harder to read too.
3) I see RelationGetBufferForTuple does this:
/*
* This is for COPY FREEZE needs. If page is empty,
* pin vmbuffer to set all_frozen bit
*/
if ((options & HEAP_INSERT_FROZEN) &&
(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0))
visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
so is it actually possible to get into the (all_frozen_set) without
holding a pin on the visibilitymap? I haven't investigated this so
maybe there are other ways to get into that code block. But the new
additions to hio.c get the pin too.
4) In heap_xlog_multi_insert we now have this:
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);
IIUC it makes no sense to have both flags at the same time, right? So
what about adding
Assert(!(XLH_INSERT_ALL_FROZEN_SET && XLH_INSERT_ALL_VISIBLE_CLEARED));
to check that?
5) Not sure we need to explicitly say this is for COPY FREEZE in all the
blocks added to hio.c. IMO it's sufficient to use HEAP_INSERT_FROZEN in
the condition, at this level of abstraction.
I wonder what to do about the heap_insert - I know there are concerns it
would negatively impact regular insert, but is it really true? I suppose
this optimization would be valuable even for cases where multi-insert is
not possible.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 30.10.2020 03:42, Tomas Vondra wrote:
Hi,
I might be somewhat late to the party, but I've done a bit of
benchmarking too ;-) I used TPC-H data from a 100GB test, and tried
different combinations of COPY [FREEZE] and VACUUM [FREEZE], both on
current master and with the patch.

So I don't really observe any measurable slowdowns in the COPY part (in
fact there seems to be a tiny speedup, but it might be just noise). In
the VACUUM part, there's clear speedup when the data was already frozen
by COPY (Yes, those are zeroes, because it took less than 1 second.)

So that looks pretty awesome, I guess.
For the record, these tests were run on a server with NVMe SSD, so
hopefully reliable and representative numbers.
Thank you for the review.
A couple minor comments about the code:
2) I find the "if (all_frozen_set)" block a bit strange. It's a matter
of personal preference, but I'd just use a single level of nesting, i.e.
something like this:

/* if everything frozen, the whole page has to be visible */
Assert(!(all_frozen_set && !PageIsAllVisible(page)));

/*
 * If we've frozen everything on the page, and if we're already
 * holding pin on the vmbuffer, record that in the visibilitymap.
 * If we're not holding the pin, it's OK to skip this - it's just
 * an optimization.
 *
 * It's fine to use InvalidTransactionId here - this is only used
 * when HEAP_INSERT_FROZEN is specified, which intentionally
 * violates visibility rules.
 */
if (all_frozen_set &&
    visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer))
    visibilitymap_set(...);

IMHO it's easier to read, but YMMV. I've also reworded the comment a bit
to say what we're doing etc. And I've moved the comment from inside the
if block into the main comment - that was making it harder to read too.
I agree that it's a matter of taste. I've updated comments and left
nesting unchanged to keep assertions simple.
3) I see RelationGetBufferForTuple does this:
/*
* This is for COPY FREEZE needs. If page is empty,
* pin vmbuffer to set all_frozen bit
*/
if ((options & HEAP_INSERT_FROZEN) &&
(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0))
visibilitymap_pin(relation, BufferGetBlockNumber(buffer),
vmbuffer);

so is it actually possible to get into the (all_frozen_set) without
holding a pin on the visibilitymap? I haven't investigated this so
maybe there are other ways to get into that code block. But the new
additions to hio.c get the pin too.
I was thinking that GetVisibilityMapPins() could somehow release the pin.
I gave it a second look, and now I don't think it's possible to get into
this code block without a pin, so I converted this check into an
assertion.
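(That is, the visibilitymap_pin_ok() check in heap_multi_insert becomes an
assertion followed by an unconditional visibilitymap_set() call, as in the
v4 patch attached below:)

Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
                  InvalidXLogRecPtr, vmbuffer,
                  InvalidTransactionId,
                  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);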
4) In heap_xlog_multi_insert we now have this:
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
PageSetAllVisible(page);

IIUC it makes no sense to have both flags at the same time, right? So
what about adding

Assert(!(XLH_INSERT_ALL_FROZEN_SET && XLH_INSERT_ALL_VISIBLE_CLEARED));

to check that?
Agree.
I placed this assertion at the very beginning of the function. It also
helped to simplify the code a bit.
I also noticed that we were not updating the visibility map for all_frozen
in heap_xlog_multi_insert. Fixed.
I wonder what to do about the heap_insert - I know there are concerns it
would negatively impact regular insert, but is it really true? I suppose
this optimization would be valuable even for cases where multi-insert is
not possible.
Do we have something like INSERT .. FREEZE? I only see
TABLE_INSERT_FROZEN set for COPY FREEZE and for matview operations. Can
you explain what use case we are trying to optimize by extending this
patch to heap_insert()?
The new version is attached.
I've also fixed a typo in the comment, per Tatsuo Ishii's suggestion.
Also, I tested this patch with replication and found no issues.
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
copy-freeze-vm_freeze_v4.patch (text/x-patch)
commit 395716e277ac4be22b9b61311c301a69db0f9101
Author: anastasia <a.lubennikova@postgrespro.ru>
Date: Mon Nov 2 16:27:48 2020 +0300
Teach COPY FREEZE to set PD_ALL_VISIBLE and visibility map bits.
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index ca4b6e186b..0017e3415c 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -179,6 +179,69 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | t | t
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | f | f
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -188,3 +251,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index f79b54480b..ec1afd4906 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -94,6 +94,82 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+\.
+copy copyfreeze from stdin;
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -103,3 +179,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1585861a02..e1664c33e1 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2103,6 +2103,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
int ndone;
PGAlignedBlock scratch;
Page page;
+ Buffer vmbuffer = InvalidBuffer;
bool needwal;
Size saveFreeSpace;
bool need_tuple_data = RelationIsLogicallyLogged(relation);
@@ -2157,8 +2158,9 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
while (ndone < ntuples)
{
Buffer buffer;
- Buffer vmbuffer = InvalidBuffer;
+ bool starting_with_empty_page;
bool all_visible_cleared = false;
+ bool all_frozen_set = false;
int nthispage;
CHECK_FOR_INTERRUPTS();
@@ -2166,12 +2168,20 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* Find buffer where at least the next tuple will fit. If the page is
* all-visible, this will also pin the requisite visibility map page.
+ *
+ * Also pin visibility map page if COPY FREEZE inserts tuples into an
+ * empty page. See all_frozen_set below.
*/
buffer = RelationGetBufferForTuple(relation, heaptuples[ndone]->t_len,
InvalidBuffer, options, bistate,
&vmbuffer, NULL);
page = BufferGetPage(buffer);
+ starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
+
+ if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+ all_frozen_set = true;
+
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -2205,7 +2215,11 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
log_heap_new_cid(relation, heaptup);
}
- if (PageIsAllVisible(page))
+ /*
+ * If the page is all visible, need to clear that, unless we're only
+ * going to add further frozen rows to it.
+ */
+ if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
all_visible_cleared = true;
PageClearAllVisible(page);
@@ -2213,6 +2227,12 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+ /*
+ * If we're only adding already frozen rows, and the page was
+ * previously empty, mark it as all-visible.
+ */
+ else if (all_frozen_set)
+ PageSetAllVisible(page);
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2236,8 +2256,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
* If the page was previously empty, we can reinit the page
* instead of restoring the whole thing.
*/
- init = (ItemPointerGetOffsetNumber(&(heaptuples[ndone]->t_self)) == FirstOffsetNumber &&
- PageGetMaxOffsetNumber(page) == FirstOffsetNumber + nthispage - 1);
+ init = starting_with_empty_page;
/* allocate xl_heap_multi_insert struct from the scratch area */
xlrec = (xl_heap_multi_insert *) scratchptr;
@@ -2255,7 +2274,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/* the rest of the scratch space is used for tuple data */
tupledata = scratchptr;
- xlrec->flags = all_visible_cleared ? XLH_INSERT_ALL_VISIBLE_CLEARED : 0;
+ xlrec->flags = 0;
+ /* check mutually exclusive flags */
+ Assert (!(all_visible_cleared && all_frozen_set));
+ if (all_visible_cleared)
+ xlrec->flags |= XLH_INSERT_ALL_VISIBLE_CLEARED;
+ if (all_frozen_set)
+ xlrec->flags |= XLH_INSERT_ALL_FROZEN_SET;
+
xlrec->ntuples = nthispage;
/*
@@ -2329,13 +2355,39 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
END_CRIT_SECTION();
- UnlockReleaseBuffer(buffer);
- if (vmbuffer != InvalidBuffer)
- ReleaseBuffer(vmbuffer);
+ if (all_frozen_set)
+ {
+ /* if everything frozen, the whole page has to be visible */
+ Assert(PageIsAllVisible(page));
+ /*
+ * If we've frozen everything on the page, and if we're already
+ * holding pin on the vmbuffer, record that in the visibilitymap.
+ *
+ * It's fine to use InvalidTransactionId here - this is only used
+ * when HEAP_INSERT_FROZEN is specified, which intentionally
+ * violates visibility rules.
+ */
+ Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
+ visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
+ InvalidXLogRecPtr, vmbuffer,
+ InvalidTransactionId,
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+ }
+
+ UnlockReleaseBuffer(buffer);
ndone += nthispage;
+
+ /*
+ * NB: Only release vmbuffer after inserting all tuples - it's fairly
+ * likely that we'll insert into subsequent heap pages that are likely
+ * to use the same vm page.
+ */
}
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
/*
* We're done with the actual inserts. Check for conflicts again, to
* ensure that all rw-conflicts in to these inserts are detected. Without
@@ -8240,6 +8292,10 @@ heap_xlog_multi_insert(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
+ /* check mutually exclusive flags */
+ Assert (!((xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
+ && (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)));
+
/*
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
@@ -8328,6 +8384,9 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
+ /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ PageSetAllVisible(page);
MarkBufferDirty(buffer);
}
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index ca357410a2..349bb43249 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -433,6 +433,13 @@ loop:
buffer = ReadBufferBI(relation, targetBlock, RBM_NORMAL, bistate);
if (PageIsAllVisible(BufferGetPage(buffer)))
visibilitymap_pin(relation, targetBlock, vmbuffer);
+ /*
+ * If the page is empty, pin vmbuffer to set all_frozen bit.
+ */
+ if ((options & HEAP_INSERT_FROZEN) &&
+ (PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0))
+ visibilitymap_pin(relation, targetBlock, vmbuffer);
+
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
}
else if (otherBlock == targetBlock)
@@ -619,6 +626,15 @@ loop:
PageInit(page, BufferGetPageSize(buffer), 0);
MarkBufferDirty(buffer);
+ /*
+ * The page is empty, pin vmbuffer to set all_frozen bit.
+ */
+ if (options & HEAP_INSERT_FROZEN)
+ {
+ Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ }
+
/*
* Release the file-extension lock; it's now OK for someone else to extend
* the relation some more.
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 1525194112..41185b3977 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -69,6 +69,9 @@
#define XLH_INSERT_CONTAINS_NEW_TUPLE (1<<3)
#define XLH_INSERT_ON_TOAST_RELATION (1<<4)
+/* all_frozen_set always implies all_visible_set */
+#define XLH_INSERT_ALL_FROZEN_SET (1<<5)
+
/*
* xl_heap_update flag values, 8 bits are available.
*/
On Mon, Nov 02, 2020 at 04:44:22PM +0300, Anastasia Lubennikova wrote:
On 30.10.2020 03:42, Tomas Vondra wrote:
Hi,
I might be somewhat late to the party, but I've done a bit of
benchmarking too ;-) I used TPC-H data from a 100GB test, and tried
different combinations of COPY [FREEZE] and VACUUM [FREEZE], both on
current master and with the patch.So I don't really observe any measurable slowdowns in the COPY part (in
fact there seems to be a tiny speedup, but it might be just noise). In
the VACUUM part, there's clear speedup when the data was already frozen
by COPY (Yes, those are zeroes, because it took less than 1 second.)So that looks pretty awesome, I guess.
For the record, these tests were run on a server with NVMe SSD, so
hopefully reliable and representative numbers.Thank you for the review.
A couple minor comments about the code:
2) I find the "if (all_frozen_set)" block a bit strange. It's a matter
of personal preference, but I'd just use a single level of nesting, i.e.
something like this:��� /* if everything frozen, the whole page has to be visible */
��� Assert(!(all_frozen_set && !PageIsAllVisible(page)));��� /*
���� * If we've frozen everything on the page, and if we're already
���� * holding pin on the vmbuffer, record that in the visibilitymap.
���� * If we're not holding the pin, it's OK to skip this - it's just
���� * an optimization.
���� *
���� * It's fine to use InvalidTransactionId here - this is only used
���� * when HEAP_INSERT_FROZEN is specified, which intentionally
���� * violates visibility rules.
���� */
��� if (all_frozen_set &&
������� visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer))
����visibilitymap_set(...);IMHO it's easier to read, but YMMV. I've also reworded the comment a bit
to say what we're doing etc. And I've moved the comment from inside the
if block into the main comment - that was making it harder to read too.I agree that it's a matter of taste. I've updated comments and left
nesting unchanged to keep assertions simple.3) I see RelationGetBufferForTuple does this:
��� /*
���� * This is for COPY FREEZE needs. If page is empty,
���� * pin vmbuffer to set all_frozen bit
���� */
��� if ((options & HEAP_INSERT_FROZEN) &&
������� (PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0))
������� visibilitymap_pin(relation, BufferGetBlockNumber(buffer),
vmbuffer);so is it actually possible to get into the (all_frozen_set) without
holding a pin on the visibilitymap? I haven't investigated this so
maybe there are other ways to get into that code block. But the new
additions to hio.c get the pin too.I was thinking that GetVisibilityMapPins() can somehow unset the pin.
I gave it a second look. And now I don't think it's possible to get
into this code block without a pin.� So, I converted this check into
an assertion.4) In heap_xlog_multi_insert we now have this:
��� if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
������� PageClearAllVisible(page);
��� if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
������� PageSetAllVisible(page);IIUC it makes no sense to have both flags at the same time, right? So
what about adding��� Assert(!(XLH_INSERT_ALL_FROZEN_SET &&
XLH_INSERT_ALL_VISIBLE_CLEARED));to check that?
Agree.
I placed this assertion at the very beginning of the function. It also
helped to simplify the code a bit.

I also noticed that we were not updating the visibility map for
all_frozen from heap_xlog_multi_insert. Fixed.

I wonder what to do about the heap_insert - I know there are concerns it
would negatively impact regular insert, but is it really true? I suppose
this optimization would be valuable even for cases where multi-insert is
not possible.

Do we have something like INSERT .. FREEZE? I only see
TABLE_INSERT_FROZEN set for COPY FREEZE and for matview operations.
Can you explain, what use-case are we trying to optimize by extending
this patch to heap_insert()?
I might be mistaken, but isn't copy forced to use heap_insert for a
bunch of reasons? For example in the presence of before/after triggers,
statement triggers on partitioned tables, or with volatile functions.
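For instance, a trivial BEFORE ROW trigger should be enough to force the
single-insert path (a hypothetical sketch; table, function, and data file
names are made up):

create table trig_test (a int, b text);
create function trig_test_noop() returns trigger language plpgsql
    as $$ begin return new; end $$;
create trigger trig_test_before before insert on trig_test
    for each row execute function trig_test_noop();

begin;
truncate trig_test;
copy trig_test from '/tmp/data' freeze;   -- presumably goes through heap_insert
commit;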
The new version is attached.
I've also fixed a typo in the comment, per Tatsuo Ishii's suggestion.
Also, I tested this patch with replication and found no issues.
Thanks. I'll take a look.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
I started looking at this patch again, hoping to get it committed in
this CF, but I think there's a regression in handling TOAST tables
(compared to the v3 patch submitted by Pavan in February 2019).
The test I'm running is very simple (see test.sql; a rough sketch follows
the steps below):
1) start a transaction
2) create a table with a text column
3) copy freeze data into it
4) use pg_visibility to see how many blocks are all_visible both in the
main table and its TOAST table
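A minimal sketch of what such a test script might look like (the actual
test.sql is a separate attachment and isn't reproduced here; table and
file names are invented):

create extension if not exists pg_visibility;

begin;
create table t (a int, b text);
copy t from '/tmp/data' freeze;
commit;

-- pages vs. pages NOT all_visible, for the main table and its TOAST table
select count(*) as pages,
       count(*) filter (where not all_visible) as not_all_visible
  from pg_visibility('t');

select count(*) as pages,
       count(*) filter (where not all_visible) as not_all_visible
  from pg_visibility((select reltoastrelid from pg_class where relname = 't'));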
For v3 patch (applied on top of 278584b526 and s/hi_options/ti_options)
I get this:
pages NOT all_visible
------------------------------------------
main 637 0
toast 50001 3
There was some discussion about relcache invalidations causing a couple
TOAST pages not to be marked as all_visible, etc.
However, for this patch on master I get this
pages NOT all_visible
------------------------------------------
main 637 0
toast 50001 50001
So no pages in TOAST are marked as all_visible. I haven't investigated
what's causing this, but IMO that needs fixing to make this patch RFC.
Attached is the test script I'm using, and a v5 of the patch - rebased
on current master, with some minor tweaks to comments etc.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
0001-Set-PD_ALL_VISIBLE-and-visibility-map-bits-in-COP-v5.patch (text/x-patch)
From 6ccaa4f526b38fbdd2f3a38ac3bd51e96fb140b6 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas.vondra@postgresql.org>
Date: Sun, 10 Jan 2021 20:30:29 +0100
Subject: [PATCH] Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE
Make sure COPY FREEZE marks the pages as PD_ALL_VISIBLE and updates the
visibility map. Until now it only marked individual tuples as frozen,
but page-level flags were not updated.
This is a fairly old patch, and multiple people worked on it. The first
version was written by Jeff Janes, and then reworked by Pavan Deolasee
and Anastasia Lubennikova.
Author: Pavan Deolasee, Anastasia Lubennikova, Jeff Janes
Reviewed-by: Kuntal Ghosh, Jeff Janes, Tomas Vondra, Masahiko Sawada, Andres Freund, Ibrar Ahmed, Robert Haas, Tatsuro Ishii
Discussion: https://postgr.es/m/CABOikdN-ptGv0mZntrK2Q8OtfUuAjqaYMGmkdU1dCKFtUxVLrg@mail.gmail.com
Discussion: https://postgr.es/m/CAMkU%3D1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ%40mail.gmail.com
---
.../pg_visibility/expected/pg_visibility.out | 64 +++++++++++++++
contrib/pg_visibility/sql/pg_visibility.sql | 77 +++++++++++++++++++
src/backend/access/heap/heapam.c | 76 ++++++++++++++++--
src/backend/access/heap/hio.c | 17 ++++
src/include/access/heapam_xlog.h | 3 +
5 files changed, 229 insertions(+), 8 deletions(-)
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index ca4b6e186b..0017e3415c 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -179,6 +179,69 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | t | t
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | f | f
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -188,3 +251,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index f79b54480b..ec1afd4906 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -94,6 +94,82 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+\.
+copy copyfreeze from stdin;
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -103,3 +179,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 53e997cd55..32cc010cb7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2103,6 +2103,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
int ndone;
PGAlignedBlock scratch;
Page page;
+ Buffer vmbuffer = InvalidBuffer;
bool needwal;
Size saveFreeSpace;
bool need_tuple_data = RelationIsLogicallyLogged(relation);
@@ -2157,8 +2158,9 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
while (ndone < ntuples)
{
Buffer buffer;
- Buffer vmbuffer = InvalidBuffer;
+ bool starting_with_empty_page;
bool all_visible_cleared = false;
+ bool all_frozen_set = false;
int nthispage;
CHECK_FOR_INTERRUPTS();
@@ -2166,12 +2168,20 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* Find buffer where at least the next tuple will fit. If the page is
* all-visible, this will also pin the requisite visibility map page.
+ *
+ * Also pin visibility map page if COPY FREEZE inserts tuples into an
+ * empty page. See all_frozen_set below.
*/
buffer = RelationGetBufferForTuple(relation, heaptuples[ndone]->t_len,
InvalidBuffer, options, bistate,
&vmbuffer, NULL);
page = BufferGetPage(buffer);
+ starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
+
+ if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+ all_frozen_set = true;
+
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -2205,7 +2215,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
log_heap_new_cid(relation, heaptup);
}
- if (PageIsAllVisible(page))
+ /*
+ * If the page is all visible, need to clear that, unless we're only
+ * going to add further frozen rows to it.
+ *
+ * If we're only adding already frozen rows to a previously empty
+ * page, mark it as all-visible.
+ */
+ if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
all_visible_cleared = true;
PageClearAllVisible(page);
@@ -2213,6 +2230,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+ else if (all_frozen_set)
+ PageSetAllVisible(page);
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2236,8 +2255,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
* If the page was previously empty, we can reinit the page
* instead of restoring the whole thing.
*/
- init = (ItemPointerGetOffsetNumber(&(heaptuples[ndone]->t_self)) == FirstOffsetNumber &&
- PageGetMaxOffsetNumber(page) == FirstOffsetNumber + nthispage - 1);
+ init = starting_with_empty_page;
/* allocate xl_heap_multi_insert struct from the scratch area */
xlrec = (xl_heap_multi_insert *) scratchptr;
@@ -2255,7 +2273,15 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/* the rest of the scratch space is used for tuple data */
tupledata = scratchptr;
- xlrec->flags = all_visible_cleared ? XLH_INSERT_ALL_VISIBLE_CLEARED : 0;
+ /* check that the mutually exclusive flags are not both set */
+ Assert (!(all_visible_cleared && all_frozen_set));
+
+ xlrec->flags = 0;
+ if (all_visible_cleared)
+ xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+ if (all_frozen_set)
+ xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
+
xlrec->ntuples = nthispage;
/*
@@ -2329,13 +2355,39 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
END_CRIT_SECTION();
- UnlockReleaseBuffer(buffer);
- if (vmbuffer != InvalidBuffer)
- ReleaseBuffer(vmbuffer);
+ /*
+ * If we've frozen everything on the page, update the visibilitymap.
+ * We're already holding pin on the vmbuffer.
+ */
+ if (all_frozen_set)
+ {
+ Assert(PageIsAllVisible(page));
+ Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
+ /*
+ * It's fine to use InvalidTransactionId here - this is only used
+ * when HEAP_INSERT_FROZEN is specified, which intentionally
+ * violates visibility rules.
+ */
+ visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
+ InvalidXLogRecPtr, vmbuffer,
+ InvalidTransactionId,
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+ }
+
+ UnlockReleaseBuffer(buffer);
ndone += nthispage;
+
+ /*
+ * NB: Only release vmbuffer after inserting all tuples - it's fairly
+ * likely that we'll insert into subsequent heap pages that are likely
+ * to use the same vm page.
+ */
}
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
/*
* We're done with the actual inserts. Check for conflicts again, to
* ensure that all rw-conflicts in to these inserts are detected. Without
@@ -8265,6 +8317,10 @@ heap_xlog_multi_insert(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
+ /* check that the mutually exclusive flags are not both set */
+ Assert (!((xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED) &&
+ (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)));
+
/*
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
@@ -8354,6 +8410,10 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
+ /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ PageSetAllVisible(page);
+
MarkBufferDirty(buffer);
}
if (BufferIsValid(buffer))
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index fac3b8e9ff..2d23b3ef71 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -433,6 +433,14 @@ loop:
buffer = ReadBufferBI(relation, targetBlock, RBM_NORMAL, bistate);
if (PageIsAllVisible(BufferGetPage(buffer)))
visibilitymap_pin(relation, targetBlock, vmbuffer);
+
+ /*
+ * If the page is empty, pin vmbuffer to set all_frozen bit later.
+ */
+ if ((options & HEAP_INSERT_FROZEN) &&
+ (PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0))
+ visibilitymap_pin(relation, targetBlock, vmbuffer);
+
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
}
else if (otherBlock == targetBlock)
@@ -619,6 +627,15 @@ loop:
PageInit(page, BufferGetPageSize(buffer), 0);
MarkBufferDirty(buffer);
+ /*
+ * The page is empty, pin vmbuffer to set all_frozen bit.
+ */
+ if (options & HEAP_INSERT_FROZEN)
+ {
+ Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ }
+
/*
* Release the file-extension lock; it's now OK for someone else to extend
* the relation some more.
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 51586b883d..178d49710a 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -69,6 +69,9 @@
#define XLH_INSERT_CONTAINS_NEW_TUPLE (1<<3)
#define XLH_INSERT_ON_TOAST_RELATION (1<<4)
+/* all_frozen_set always implies all_visible_set */
+#define XLH_INSERT_ALL_FROZEN_SET (1<<5)
+
/*
* xl_heap_update flag values, 8 bits are available.
*/
--
2.26.2
On 11.01.2021 01:35, Tomas Vondra wrote:
Hi,
I started looking at this patch again, hoping to get it committed in
this CF, but I think there's a regression in handling TOAST tables
(compared to the v3 patch submitted by Pavan in February 2019).

The test I'm running is very simple (see test.sql):
1) start a transaction
2) create a table with a text column
3) copy freeze data into it
4) use pg_visibility to see how many blocks are all_visible both in the
main table and its TOAST table

For v3 patch (applied on top of 278584b526 and
s/hi_options/ti_options) I get this:

pages NOT all_visible
------------------------------------------
main 637 0
toast 50001 3

There was some discussion about relcache invalidations causing a
couple TOAST pages not to be marked as all_visible, etc.

However, for this patch on master I get this

pages NOT all_visible
------------------------------------------
main 637 0
toast 50001 50001

So no pages in TOAST are marked as all_visible. I haven't investigated
what's causing this, but IMO that needs fixing to make this patch RFC.

Attached is the test script I'm using, and a v5 of the patch - rebased
on current master, with some minor tweaks to comments etc.
Thank you for attaching the test script. I reproduced the problem. This
regression occurs because TOAST internally uses heap_insert().
You have asked upthread about adding this optimization to heap_insert().
I wrote a quick fix, see the attached patch 0002. The TOAST test passes
now, but I haven't tested performance or any other use-cases yet.
I'm going to test it properly in a couple of days and share results.
With this change a lot of new code is repeated in heap_insert() and
heap_multi_insert(). I think it's fine, because these functions already
have a lot in common.
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0001-Set-PD_ALL_VISIBLE-and-visibility-map-bits-in-COP-v5.patch (text/x-patch)
0002_handle_HEAP_INSERT_FROZEN_in_heap_insert.patch (text/x-patch)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 32cc010cb7..3663ff4b83 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1862,8 +1862,12 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
+ bool starting_with_empty_page;
bool all_visible_cleared = false;
+ bool all_frozen_set = false;
+ uint8 vmstatus = 0;
/*
* Fill in tuple header fields and toast the tuple if necessary.
@@ -1876,11 +1880,36 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
/*
* Find buffer to insert this tuple into. If the page is all visible,
* this will also pin the requisite visibility map page.
+ *
+ * Also pin visibility map page if COPY FREEZE inserts tuples into an
+ * empty page. See all_frozen_set below.
*/
buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
InvalidBuffer, options, bistate,
&vmbuffer, NULL);
+
+ /*
+ * If we're inserting frozen entry into an empty page,
+ * set visibility map bits and PageAllVisible() hint.
+ *
+ * If we're inserting frozen entry into already all_frozen page,
+ * preserve this state.
+ */
+ if (options & HEAP_INSERT_FROZEN)
+ {
+ page = BufferGetPage(buffer);
+
+ starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
+
+ if (visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer))
+ vmstatus = visibilitymap_get_status(relation,
+ BufferGetBlockNumber(buffer), &vmbuffer);
+
+ if ((starting_with_empty_page || vmstatus & VISIBILITYMAP_ALL_FROZEN))
+ all_frozen_set = true;
+ }
+
/*
* We're about to do the actual insert -- but check for conflict first, to
* avoid possibly having to roll back work we've just done.
@@ -1904,7 +1933,14 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
RelationPutHeapTuple(relation, buffer, heaptup,
(options & HEAP_INSERT_SPECULATIVE) != 0);
- if (PageIsAllVisible(BufferGetPage(buffer)))
+ /*
+ * If the page is all visible, need to clear that, unless we're only
+ * going to add further frozen rows to it.
+ *
+ * If we're only adding already frozen rows to a previously empty
+ * or all visible page, mark it as all-visible.
+ */
+ if (PageIsAllVisible(BufferGetPage(buffer)) && !(options & HEAP_INSERT_FROZEN))
{
all_visible_cleared = true;
PageClearAllVisible(BufferGetPage(buffer));
@@ -1912,6 +1948,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
ItemPointerGetBlockNumber(&(heaptup->t_self)),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+ else if (all_frozen_set)
+ PageSetAllVisible(page);
/*
* XXX Should we set PageSetPrunable on this page ?
@@ -1959,6 +1997,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xlrec.flags = 0;
if (all_visible_cleared)
xlrec.flags |= XLH_INSERT_ALL_VISIBLE_CLEARED;
+ if (all_frozen_set)
+ xlrec.flags = XLH_INSERT_ALL_FROZEN_SET;
if (options & HEAP_INSERT_SPECULATIVE)
xlrec.flags |= XLH_INSERT_IS_SPECULATIVE;
Assert(ItemPointerGetBlockNumber(&heaptup->t_self) == BufferGetBlockNumber(buffer));
@@ -2007,6 +2047,29 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
END_CRIT_SECTION();
+ /*
+ * If we've frozen everything on the page, update the visibilitymap.
+ * We're already holding pin on the vmbuffer.
+ *
+ * No need to update the visibilitymap if it had all_frozen bit set
+ * before this insertion.
+ */
+ if (all_frozen_set && ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0))
+ {
+ Assert(PageIsAllVisible(page));
+ Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
+
+ /*
+ * It's fine to use InvalidTransactionId here - this is only used
+ * when HEAP_INSERT_FROZEN is specified, which intentionally
+ * violates visibility rules.
+ */
+ visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
+ InvalidXLogRecPtr, vmbuffer,
+ InvalidTransactionId,
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+ }
+
UnlockReleaseBuffer(buffer);
if (vmbuffer != InvalidBuffer)
ReleaseBuffer(vmbuffer);
@@ -8197,6 +8260,10 @@ heap_xlog_insert(XLogReaderState *record)
ItemPointerSetBlockNumber(&target_tid, blkno);
ItemPointerSetOffsetNumber(&target_tid, xlrec->offnum);
+ /* check that the mutually exclusive flags are not both set */
+ Assert (!((xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED) &&
+ (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)));
+
/*
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
@@ -8267,6 +8334,11 @@ heap_xlog_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
+ /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ PageSetAllVisible(page);
+
+
MarkBufferDirty(buffer);
}
if (BufferIsValid(buffer))
On 1/11/21 10:00 PM, Anastasia Lubennikova wrote:
On 11.01.2021 01:35, Tomas Vondra wrote:
Hi,
I started looking at this patch again, hoping to get it committed in
this CF, but I think there's a regression in handling TOAST tables
(compared to the v3 patch submitted by Pavan in February 2019).

The test I'm running is very simple (see test.sql):
1) start a transaction
2) create a table with a text column
3) copy freeze data into it
4) use pg_visibility to see how many blocks are all_visible both in the
main table and its TOAST table

For v3 patch (applied on top of 278584b526 and
s/hi_options/ti_options) I get this:

pages NOT all_visible
------------------------------------------
main 637 0
toast 50001 3

There was some discussion about relcache invalidations causing a
couple TOAST pages not to be marked as all_visible, etc.

However, for this patch on master I get this

pages NOT all_visible
------------------------------------------
main 637 0
toast 50001 50001

So no pages in TOAST are marked as all_visible. I haven't investigated
what's causing this, but IMO that needs fixing to make this patch RFC.

Attached is the test script I'm using, and a v5 of the patch - rebased
on current master, with some minor tweaks to comments etc.

Thank you for attaching the test script. I reproduced the problem. This
regression occurs because TOAST internally uses heap_insert().

You have asked upthread about adding this optimization to heap_insert().

I wrote a quick fix, see the attached patch 0002. The TOAST test passes
now, but I haven't tested performance or any other use-cases yet.
I'm going to test it properly in a couple of days and share results.
Thanks. I think it's important to make this work for TOAST tables - it
often stores most of the data, and it was working in v3 of the patch. I
haven't looked into the details, but if it's really just due to TOAST
using heap_insert, I'd say it just confirms the importance of tweaking
heap_insert too.
With this change a lot of new code is repeated in heap_insert() and
heap_multi_insert(). I think it's fine, because these functions already
have a lot in common.
Understood. IMHO a bit of redundancy is not a big issue, but I haven't
looked at the code yet. Let's get it working first, then we can decide
if some refactoring is appropriate.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 12.01.2021 00:51, Tomas Vondra wrote:
On 1/11/21 10:00 PM, Anastasia Lubennikova wrote:
On 11.01.2021 01:35, Tomas Vondra wrote:
Hi,
I started looking at this patch again, hoping to get it committed in
this CF, but I think there's a regression in handling TOAST tables
(compared to the v3 patch submitted by Pavan in February 2019).

The test I'm running is very simple (see test.sql):
1) start a transaction
2) create a table with a text column
3) copy freeze data into it
4) use pg_visibility to see how many blocks are all_visible both in the
main table and its TOAST table

For v3 patch (applied on top of 278584b526 and
s/hi_options/ti_options) I get this:

pages NOT all_visible
------------------------------------------
main 637 0
toast 50001 3

There was some discussion about relcache invalidations causing a
couple TOAST pages not to be marked as all_visible, etc.

However, for this patch on master I get this

pages NOT all_visible
------------------------------------------
main 637 0
toast 50001 50001

So no pages in TOAST are marked as all_visible. I haven't
investigated what's causing this, but IMO that needs fixing to make
this patch RFC.

Attached is the test script I'm using, and a v5 of the patch -
rebased on current master, with some minor tweaks to comments etc.

Thank you for attaching the test script. I reproduced the problem.
This regression occurs because TOAST internally uses heap_insert().

You have asked upthread about adding this optimization to heap_insert().

I wrote a quick fix, see the attached patch 0002. The TOAST test
passes now, but I haven't tested performance or any other use-cases yet.
I'm going to test it properly in a couple of days and share results.

Thanks. I think it's important to make this work for TOAST tables - it
often stores most of the data, and it was working in v3 of the patch.
I haven't looked into the details, but if it's really just due to
TOAST using heap_insert, I'd say it just confirms the importance of
tweaking heap_insert too.
I've tested performance. All tests were run on my laptop, latest master
with and without patches, all default settings, except that autovacuum
was disabled and the pg_stat_statements extension was installed.

VACUUM is significantly faster with the patch, as it only checks the
visibility map. COPY speed fluctuates a lot between tests, but I didn't
notice any trend.

I would expect a minor slowdown with the patch, as we need to handle
visibility map pages during COPY FREEZE. But in some runs the patched
version was faster than current master, so the impact of the patch is
insignificant.

I ran 3 different tests:
1) Regular table (final size 5972 MB)
patched | master
COPY FREEZE data 3 times:
33384,544 ms 31135,037 ms
31666,226 ms 31158,294 ms
32802,783 ms 33599,234 ms
VACUUM
54,185 ms 48445,584 ms
2) Table with TOAST (final size 1207 MB where 1172 MB is in toast table)
patched | master
COPY FREEZE data 3 times:
368166,743 ms 383231,077 ms
398695,018 ms 454572,630 ms
410174,682 ms 567847,288 ms
VACUUM
43,159 ms 6648,302 ms
3) Table with a trivial BEFORE INSERT trigger (final size 5972 MB)
patched | master
COPY FREEZE data 3 times:
73544,225 ms 64967,802 ms
90960,584 ms 71826,362 ms
81356,025 ms 80023,041 ms
VACUUM
49,626 ms 40100,097 ms
I took another look at yesterday's patch and it looks fine to me. So
now I am waiting for your review.
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Thanks. These patches seem to resolve the TOAST table issue, freezing it
as expected. I think the code duplication is not an issue, but I wonder
why heap_insert uses this condition:
/*
* ...
*
* No need to update the visibilitymap if it had all_frozen bit set
* before this insertion.
*/
if (all_frozen_set && ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0))
while heap_multi_insert only does this:
if (all_frozen_set) { ... }
I haven't looked at the details, but shouldn't both do the same thing?
I've done some benchmarks, comparing master and patched version on a
bunch of combinations (logged/unlogged, no before-insert trigger,
trigger filtering everything/nothing). On master, the results are:
group copy vacuum
-----------------------------------------------
logged / no trigger 4672 162
logged / trigger (all) 4914 162
logged / trigger (none) 1219 11
unlogged / no trigger 4132 156
unlogged / trigger (all) 4292 156
unlogged / trigger (none) 1275 11
and patched looks like this:
group copy vacuum
-----------------------------------------------
logged / no trigger 4669 12
logged / trigger (all) 4874 12
logged / trigger (none) 1233 11
unlogged / no trigger 4054 11
unlogged / trigger (all) 4185 12
unlogged / trigger (none) 1222 10
This looks pretty nice - there are no regressions, just speedups in the
vacuum step. The SQL script used is attached.
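The attached script isn't reproduced here; roughly, each combination looks
something like the following sketch (the simplest case, logged table with
no trigger; table and data file names are invented):

\timing on
create table t (a int, b text);

begin;
truncate t;
copy t from '/tmp/data' freeze;   -- the "copy" column above
commit;

vacuum t;                         -- the "vacuum" column above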
However, I've also repeated the test counting all-frozen pages in both
the main table and TOAST table, and I get this:
master
======
select count(*) from pg_visibility((select reltoastrelid from pg_class
where relname = 't'));
count
--------
100000
(1 row)
select count(*) from pg_visibility((select reltoastrelid from pg_class
where relname = 't')) where not all_visible;
count
--------
100000
(1 row)
patched
=======
select count(*) from pg_visibility((select reltoastrelid from pg_class
where relname = 't'));
count
--------
100002
(1 row)
select count(*) from pg_visibility((select reltoastrelid from pg_class
where relname = 't')) where not all_visible;
count
--------
0
(1 row)
That is - all TOAST pages are frozen (as expected, which is good). But
now there are 100002 pages, not just 100000 pages. That is, we're now
creating 2 extra pages, for some reason. I recall Pavan reported a similar
issue with every 32768-th page not being properly filled, but I'm not
sure if that's the same issue.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 12.01.2021 22:30, Tomas Vondra wrote:
Thanks. These patches seem to resolve the TOAST table issue, freezing
it as expected. I think the code duplication is not an issue, but I
wonder why heap_insert uses this condition:

    /*
     * ...
     *
     * No need to update the visibilitymap if it had all_frozen bit set
     * before this insertion.
     */
    if (all_frozen_set && ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0))

while heap_multi_insert only does this:

    if (all_frozen_set) { ... }

I haven't looked at the details, but shouldn't both do the same thing?
I decided to add this check for heap_insert() to avoid unneeded calls of
visibilitymap_set(). If we insert tuples one by one, this check makes
sure we only call it once per page.
In my understanding, heap_multi_insert() inserts tuples in batches, so
it doesn't need this optimization.
However, I've also repeated the test counting all-frozen pages in both
the main table and TOAST table, and I get this:

patched
=======

select count(*) from pg_visibility((select reltoastrelid from pg_class
where relname = 't'));

count
--------
100002
(1 row)

select count(*) from pg_visibility((select reltoastrelid from pg_class
where relname = 't')) where not all_visible;

count
--------
0
(1 row)

That is - all TOAST pages are frozen (as expected, which is good). But
now there are 100002 pages, not just 100000 pages. That is, we're now
creating 2 extra pages, for some reason. I recall Pavan reported a
similar issue with every 32768-th page not being properly filled, but
I'm not sure if that's the same issue.

regards
As Pavan correctly figured out before, the problem is that
RelationGetBufferForTuple() moves to the next page, losing free space in
the block:
... I see that a relcache invalidation arrives
after 1st and then after every 32672th block is filled. That clears the
rel->rd_smgr field and we lose the information about the saved target
block. The code then moves to extend the relation again and thus
skips the
previously less-than-half filled block, losing the free space in that
block.
The reason for this cache invalidation is the vm_extend() call, which
happens every 32672 blocks.
RelationGetBufferForTuple() tries to use the last page, but for some
reason this code is under the 'use_fsm' check. And COPY FROM doesn't use
the FSM (see TABLE_INSERT_SKIP_FSM).
    /*
     * If the FSM knows nothing of the rel, try the last page before we
     * give up and extend.  This avoids one-tuple-per-page syndrome during
     * bootstrapping or in a recently-started system.
     */
    if (targetBlock == InvalidBlockNumber)
    {
        BlockNumber nblocks = RelationGetNumberOfBlocks(relation);

        if (nblocks > 0)
            targetBlock = nblocks - 1;
    }
I think we can use this code without regard to 'use_fsm'. With this
change, the number of toast rel pages is correct. The patch is attached.
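Roughly, the idea is to run that same fallback even when use_fsm is false,
i.e. something like this (a sketch of the shape of the change, not
necessarily the exact hunk in the attached patch):

    /*
     * Moved/duplicated outside the "if (use_fsm)" branch: try the last
     * existing page before extending, so that losing the cached target
     * block to a relcache invalidation doesn't leave half-filled pages
     * behind when the FSM is skipped (TABLE_INSERT_SKIP_FSM).
     */
    if (targetBlock == InvalidBlockNumber)
    {
        BlockNumber nblocks = RelationGetNumberOfBlocks(relation);

        if (nblocks > 0)
            targetBlock = nblocks - 1;
    }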
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0001-Set-PD_ALL_VISIBLE-and-visibility-map-bits-in-COP-v7.patch (text/x-patch)
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index ca4b6e186b..0017e3415c 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -179,6 +179,69 @@ select pg_truncate_visibility_map('test_partition');
(1 row)
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | t | t
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | f | f
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+copy copyfreeze from stdin;
+copy copyfreeze from stdin freeze;
+commit;
+select * from pg_visibility_map('copyfreeze');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | f | f
+ 2 | t | t
+(3 rows)
+
+select * from pg_check_frozen('copyfreeze');
+ t_ctid
+--------
+(0 rows)
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -188,3 +251,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index f79b54480b..ec1afd4906 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -94,6 +94,82 @@ select count(*) > 0 from pg_visibility_map_summary('test_partition');
select * from pg_check_frozen('test_partition'); -- hopefully none
select pg_truncate_visibility_map('test_partition');
+-- test copy freeze
+create table copyfreeze (a int, b char(1500));
+
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- load half the rows via regular COPY and rest via COPY FREEZE. The pages
+-- which are touched by regular COPY must not be set all-visible/all-frozen. On
+-- the other hand, pages allocated by COPY FREEZE should be marked
+-- all-frozen/all-visible.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
+-- Try a mix of regular COPY and COPY FREEZE.
+begin;
+truncate copyfreeze;
+copy copyfreeze from stdin freeze;
+1 '1'
+2 '2'
+3 '3'
+4 '4'
+5 '5'
+\.
+copy copyfreeze from stdin;
+6 '6'
+\.
+copy copyfreeze from stdin freeze;
+7 '7'
+8 '8'
+9 '9'
+10 '10'
+11 '11'
+12 '12'
+\.
+commit;
+select * from pg_visibility_map('copyfreeze');
+select * from pg_check_frozen('copyfreeze');
+
-- cleanup
drop table test_partitioned;
drop view test_view;
@@ -103,3 +179,4 @@ drop server dummy_server;
drop foreign data wrapper dummy;
drop materialized view matview_visibility_test;
drop table regular_table;
+drop table copyfreeze;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 5b9cfb26cf..3158a4de0d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1880,8 +1880,12 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
TransactionId xid = GetCurrentTransactionId();
HeapTuple heaptup;
Buffer buffer;
+ Page page;
Buffer vmbuffer = InvalidBuffer;
+ bool starting_with_empty_page;
bool all_visible_cleared = false;
+ bool all_frozen_set = false;
+ uint8 vmstatus = 0;
/*
* Fill in tuple header fields and toast the tuple if necessary.
@@ -1894,11 +1898,36 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
/*
* Find buffer to insert this tuple into. If the page is all visible,
* this will also pin the requisite visibility map page.
+ *
+ * Also pin visibility map page if COPY FREEZE inserts tuples into an
+ * empty page. See all_frozen_set below.
*/
buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
InvalidBuffer, options, bistate,
&vmbuffer, NULL);
+
+ /*
+ * If we're inserting frozen entry into an empty page,
+ * set visibility map bits and PageAllVisible() hint.
+ *
+ * If we're inserting frozen entry into already all_frozen page,
+ * preserve this state.
+ */
+ if (options & HEAP_INSERT_FROZEN)
+ {
+ page = BufferGetPage(buffer);
+
+ starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
+
+ if (visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer))
+ vmstatus = visibilitymap_get_status(relation,
+ BufferGetBlockNumber(buffer), &vmbuffer);
+
+ if ((starting_with_empty_page || vmstatus & VISIBILITYMAP_ALL_FROZEN))
+ all_frozen_set = true;
+ }
+
/*
* We're about to do the actual insert -- but check for conflict first, to
* avoid possibly having to roll back work we've just done.
@@ -1922,7 +1951,14 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
RelationPutHeapTuple(relation, buffer, heaptup,
(options & HEAP_INSERT_SPECULATIVE) != 0);
- if (PageIsAllVisible(BufferGetPage(buffer)))
+ /*
+ * If the page is all visible, need to clear that, unless we're only
+ * going to add further frozen rows to it.
+ *
+ * If we're only adding already frozen rows to a previously empty
+ * or all visible page, mark it as all-visible.
+ */
+ if (PageIsAllVisible(BufferGetPage(buffer)) && !(options & HEAP_INSERT_FROZEN))
{
all_visible_cleared = true;
PageClearAllVisible(BufferGetPage(buffer));
@@ -1930,6 +1966,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
ItemPointerGetBlockNumber(&(heaptup->t_self)),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+ else if (all_frozen_set)
+ PageSetAllVisible(page);
/*
* XXX Should we set PageSetPrunable on this page ?
@@ -1977,6 +2015,8 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
xlrec.flags = 0;
if (all_visible_cleared)
xlrec.flags |= XLH_INSERT_ALL_VISIBLE_CLEARED;
+ if (all_frozen_set)
+ xlrec.flags = XLH_INSERT_ALL_FROZEN_SET;
if (options & HEAP_INSERT_SPECULATIVE)
xlrec.flags |= XLH_INSERT_IS_SPECULATIVE;
Assert(ItemPointerGetBlockNumber(&heaptup->t_self) == BufferGetBlockNumber(buffer));
@@ -2025,6 +2065,29 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
END_CRIT_SECTION();
+ /*
+ * If we've frozen everything on the page, update the visibilitymap.
+ * We're already holding pin on the vmbuffer.
+ *
+ * No need to update the visibilitymap if it had all_frozen bit set
+ * before this insertion.
+ */
+ if (all_frozen_set && ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0))
+ {
+ Assert(PageIsAllVisible(page));
+ Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
+
+ /*
+ * It's fine to use InvalidTransactionId here - this is only used
+ * when HEAP_INSERT_FROZEN is specified, which intentionally
+ * violates visibility rules.
+ */
+ visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
+ InvalidXLogRecPtr, vmbuffer,
+ InvalidTransactionId,
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+ }
+
UnlockReleaseBuffer(buffer);
if (vmbuffer != InvalidBuffer)
ReleaseBuffer(vmbuffer);
@@ -2121,6 +2184,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
int ndone;
PGAlignedBlock scratch;
Page page;
+ Buffer vmbuffer = InvalidBuffer;
bool needwal;
Size saveFreeSpace;
bool need_tuple_data = RelationIsLogicallyLogged(relation);
@@ -2175,8 +2239,9 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
while (ndone < ntuples)
{
Buffer buffer;
- Buffer vmbuffer = InvalidBuffer;
+ bool starting_with_empty_page;
bool all_visible_cleared = false;
+ bool all_frozen_set = false;
int nthispage;
CHECK_FOR_INTERRUPTS();
@@ -2184,12 +2249,20 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/*
* Find buffer where at least the next tuple will fit. If the page is
* all-visible, this will also pin the requisite visibility map page.
+ *
+ * Also pin visibility map page if COPY FREEZE inserts tuples into an
+ * empty page. See all_frozen_set below.
*/
buffer = RelationGetBufferForTuple(relation, heaptuples[ndone]->t_len,
InvalidBuffer, options, bistate,
&vmbuffer, NULL);
page = BufferGetPage(buffer);
+ starting_with_empty_page = PageGetMaxOffsetNumber(page) == 0;
+
+ if (starting_with_empty_page && (options & HEAP_INSERT_FROZEN))
+ all_frozen_set = true;
+
/* NO EREPORT(ERROR) from here till changes are logged */
START_CRIT_SECTION();
@@ -2223,7 +2296,14 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
log_heap_new_cid(relation, heaptup);
}
- if (PageIsAllVisible(page))
+ /*
+ * If the page is all visible, need to clear that, unless we're only
+ * going to add further frozen rows to it.
+ *
+ * If we're only adding already frozen rows to a previously empty
+ * page, mark it as all-visible.
+ */
+ if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
{
all_visible_cleared = true;
PageClearAllVisible(page);
@@ -2231,6 +2311,8 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
BufferGetBlockNumber(buffer),
vmbuffer, VISIBILITYMAP_VALID_BITS);
}
+ else if (all_frozen_set)
+ PageSetAllVisible(page);
/*
* XXX Should we set PageSetPrunable on this page ? See heap_insert()
@@ -2254,8 +2336,7 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
* If the page was previously empty, we can reinit the page
* instead of restoring the whole thing.
*/
- init = (ItemPointerGetOffsetNumber(&(heaptuples[ndone]->t_self)) == FirstOffsetNumber &&
- PageGetMaxOffsetNumber(page) == FirstOffsetNumber + nthispage - 1);
+ init = starting_with_empty_page;
/* allocate xl_heap_multi_insert struct from the scratch area */
xlrec = (xl_heap_multi_insert *) scratchptr;
@@ -2273,7 +2354,15 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
/* the rest of the scratch space is used for tuple data */
tupledata = scratchptr;
- xlrec->flags = all_visible_cleared ? XLH_INSERT_ALL_VISIBLE_CLEARED : 0;
+ /* check that the mutually exclusive flags are not both set */
+ Assert (!(all_visible_cleared && all_frozen_set));
+
+ xlrec->flags = 0;
+ if (all_visible_cleared)
+ xlrec->flags = XLH_INSERT_ALL_VISIBLE_CLEARED;
+ if (all_frozen_set)
+ xlrec->flags = XLH_INSERT_ALL_FROZEN_SET;
+
xlrec->ntuples = nthispage;
/*
@@ -2347,13 +2436,39 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
END_CRIT_SECTION();
- UnlockReleaseBuffer(buffer);
- if (vmbuffer != InvalidBuffer)
- ReleaseBuffer(vmbuffer);
+ /*
+ * If we've frozen everything on the page, update the visibilitymap.
+ * We're already holding pin on the vmbuffer.
+ */
+ if (all_frozen_set)
+ {
+ Assert(PageIsAllVisible(page));
+ Assert(visibilitymap_pin_ok(BufferGetBlockNumber(buffer), vmbuffer));
+
+ /*
+ * It's fine to use InvalidTransactionId here - this is only used
+ * when HEAP_INSERT_FROZEN is specified, which intentionally
+ * violates visibility rules.
+ */
+ visibilitymap_set(relation, BufferGetBlockNumber(buffer), buffer,
+ InvalidXLogRecPtr, vmbuffer,
+ InvalidTransactionId,
+ VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
+ }
+ UnlockReleaseBuffer(buffer);
ndone += nthispage;
+
+ /*
+ * NB: Only release vmbuffer after inserting all tuples - it's fairly
+ * likely that we'll insert into subsequent heap pages that are likely
+ * to use the same vm page.
+ */
}
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
/*
* We're done with the actual inserts. Check for conflicts again, to
* ensure that all rw-conflicts in to these inserts are detected. Without
@@ -8655,6 +8770,10 @@ heap_xlog_insert(XLogReaderState *record)
ItemPointerSetBlockNumber(&target_tid, blkno);
ItemPointerSetOffsetNumber(&target_tid, xlrec->offnum);
+ /* check that the mutually exclusive flags are not both set */
+ Assert (!((xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED) &&
+ (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)));
+
/*
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
@@ -8725,6 +8844,11 @@ heap_xlog_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
+ /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ PageSetAllVisible(page);
+
+
MarkBufferDirty(buffer);
}
if (BufferIsValid(buffer))
@@ -8775,6 +8899,10 @@ heap_xlog_multi_insert(XLogReaderState *record)
XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
+ /* check that the mutually exclusive flags are not both set */
+ Assert (!((xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED) &&
+ (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)));
+
/*
* The visibility map may need to be fixed even if the heap page is
* already up-to-date.
@@ -8864,6 +8992,10 @@ heap_xlog_multi_insert(XLogReaderState *record)
if (xlrec->flags & XLH_INSERT_ALL_VISIBLE_CLEARED)
PageClearAllVisible(page);
+ /* XLH_INSERT_ALL_FROZEN_SET implies that all tuples are visible */
+ if (xlrec->flags & XLH_INSERT_ALL_FROZEN_SET)
+ PageSetAllVisible(page);
+
MarkBufferDirty(buffer);
}
if (BufferIsValid(buffer))
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index fac3b8e9ff..fb7ad0bab4 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -396,19 +396,19 @@ RelationGetBufferForTuple(Relation relation, Size len,
* target.
*/
targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
+ }
- /*
- * If the FSM knows nothing of the rel, try the last page before we
- * give up and extend. This avoids one-tuple-per-page syndrome during
- * bootstrapping or in a recently-started system.
- */
- if (targetBlock == InvalidBlockNumber)
- {
- BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
+ /*
+ * If the FSM knows nothing of the rel, try the last page before we
+ * give up and extend. This avoids one-tuple-per-page syndrome during
+ * bootstrapping or in a recently-started system.
+ */
+ if (targetBlock == InvalidBlockNumber)
+ {
+ BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
- if (nblocks > 0)
- targetBlock = nblocks - 1;
- }
+ if (nblocks > 0)
+ targetBlock = nblocks - 1;
}
loop:
@@ -433,6 +433,14 @@ loop:
buffer = ReadBufferBI(relation, targetBlock, RBM_NORMAL, bistate);
if (PageIsAllVisible(BufferGetPage(buffer)))
visibilitymap_pin(relation, targetBlock, vmbuffer);
+
+ /*
+ * If the page is empty, pin vmbuffer to set all_frozen bit later.
+ */
+ if ((options & HEAP_INSERT_FROZEN) &&
+ (PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0))
+ visibilitymap_pin(relation, targetBlock, vmbuffer);
+
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
}
else if (otherBlock == targetBlock)
@@ -619,6 +627,15 @@ loop:
PageInit(page, BufferGetPageSize(buffer), 0);
MarkBufferDirty(buffer);
+ /*
+ * The page is empty, pin vmbuffer to set all_frozen bit.
+ */
+ if (options & HEAP_INSERT_FROZEN)
+ {
+ Assert(PageGetMaxOffsetNumber(BufferGetPage(buffer)) == 0);
+ visibilitymap_pin(relation, BufferGetBlockNumber(buffer), vmbuffer);
+ }
+
/*
* Release the file-extension lock; it's now OK for someone else to extend
* the relation some more.
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 51586b883d..178d49710a 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -69,6 +69,9 @@
#define XLH_INSERT_CONTAINS_NEW_TUPLE (1<<3)
#define XLH_INSERT_ON_TOAST_RELATION (1<<4)
+/* all_frozen_set always implies all_visible_set */
+#define XLH_INSERT_ALL_FROZEN_SET (1<<5)
+
/*
* xl_heap_update flag values, 8 bits are available.
*/
On 1/16/21 4:11 PM, Anastasia Lubennikova wrote:
...
As Pavan correctly figured out before, the problem is that
RelationGetBufferForTuple() moves to the next page, losing free space in
the block: ... I see that a relcache invalidation arrives
after the 1st and then after every 32672th block is filled. That clears the
rel->rd_smgr field and we lose the information about the saved target
block. The code then moves to extend the relation again and thus skips the
previously less-than-half-filled block, losing the free space in that
block.
The reason for this cache invalidation is the vm_extend() call, which
happens every 32672 blocks.

RelationGetBufferForTuple() tries to use the last page, but for some
reason this code is under the 'use_fsm' check. And COPY FROM doesn't use
the FSM (see TABLE_INSERT_SKIP_FSM).

    /*
     * If the FSM knows nothing of the rel, try the last page before we
     * give up and extend. This avoids one-tuple-per-page syndrome during
     * bootstrapping or in a recently-started system.
     */
    if (targetBlock == InvalidBlockNumber)
    {
        BlockNumber nblocks = RelationGetNumberOfBlocks(relation);

        if (nblocks > 0)
            targetBlock = nblocks - 1;
    }

I think we can use this code without regard to 'use_fsm'. With this
change, the number of toast rel pages is correct. The patch is attached.
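In other words, the proposed fix amounts to the shape below - a simplified
sketch of the target-block selection in RelationGetBufferForTuple() (hio.c),
with the bulk-insert and relation-extension branches elided, where the
last-page fallback now runs regardless of use_fsm:

    BlockNumber targetBlock = InvalidBlockNumber;

    if (use_fsm)
    {
        /* Ask the free space map for a page with enough room. */
        targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
    }

    /*
     * Last-page fallback, now outside the use_fsm branch: even when the
     * caller skips the FSM (as COPY does via TABLE_INSERT_SKIP_FSM), try
     * the last existing block before extending the relation, so a partly
     * filled block is not abandoned after a relcache invalidation clears
     * the saved target block.
     */
    if (targetBlock == InvalidBlockNumber)
    {
        BlockNumber nblocks = RelationGetNumberOfBlocks(relation);

        if (nblocks > 0)
            targetBlock = nblocks - 1;
    }

This simply extends to the no-FSM case the same "try the last page before
extending" behaviour the function already used during bootstrapping or in a
recently-started system.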
Thanks for the updated patch, this version looks OK to me - I've marked
it as RFC. I'll do a bit more testing, review, and then I'll get it
committed.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 1/16/21 11:18 PM, Tomas Vondra wrote:
...
Thanks for the updated patch, this version looks OK to me - I've marked
it as RFC. I'll do a bit more testing, review, and then I'll get it
committed.
Pushed. Thanks everyone for the effort put into this patch. The first
version was sent in 2015, so it took quite a bit of time.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Pushed. Thanks everyone for the effort put into this patch. The first
version was sent in 2015, so it took quite a bit of time.
Great news. Thanks to everyone who has been working on this.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On Mon, Jan 18, 2021 at 3:02 AM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:
Pushed. Thanks everyone for the effort put into this patch. The first
version was sent in 2015, so it took quite a bit of time.
Thanks Tomas, Anastasia and everyone else who worked on the patch and
ensured that it gets into the tree.
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Pavan Deolasee <pavan.deolasee@gmail.com> writes:
On Mon, Jan 18, 2021 at 3:02 AM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:
Pushed. Thanks everyone for the effort put into this patch. The first
version was sent in 2015, so it took quite a bit of time.
Thanks Tomas, Anastasia and everyone else who worked on the patch and
ensured that it gets into the tree.
Buildfarm results suggest that the test case is unstable under
CLOBBER_CACHE_ALWAYS:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2021-01-19%2020%3A27%3A46
This might mean an actual bug, or just that the test isn't robust.
regards, tom lane
On 1/23/21 1:10 AM, Tom Lane wrote:
Pavan Deolasee <pavan.deolasee@gmail.com> writes:
On Mon, Jan 18, 2021 at 3:02 AM Tomas Vondra <tomas.vondra@enterprisedb.com>
wrote:
Pushed. Thanks everyone for the effort put into this patch. The first
version was sent in 2015, so it took quite a bit of time.
Thanks Tomas, Anastasia and everyone else who worked on the patch and
ensured that it gets into the tree.
Buildfarm results suggest that the test case is unstable under
CLOBBER_CACHE_ALWAYS:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2021-01-19%2020%3A27%3A46
This might mean an actual bug, or just that the test isn't robust.
Yeah :-( It seems I've committed the v5 patch, not the v7 addressing
exactly this issue (which I've actually pointed out and asked to be
fixed). Oh well ... I'll get this fixed tomorrow - I have the fix, and I
have verified that it passes with CLOBBER_CACHE_ALWAYS, but pushing it
at 5 AM does not seem like a good idea.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
I've pushed the fix, after a couple of extra rounds of careful testing.
I noticed that the existing pg_visibility regression tests don't check
whether we freeze the TOAST rows too (failing to do that was one of the
symptoms). It'd be good to add that, because that would fail even
without CLOBBER_CACHE_ALWAYS, so attached is a test I propose to add.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
toast-visibility-test.patchtext/x-patch; charset=UTF-8; name=toast-visibility-test.patchDownload
diff --git a/contrib/pg_visibility/data/toast.data b/contrib/pg_visibility/data/toast.data
new file mode 100644
index 0000000000..0aa5e4eb9a
--- /dev/null
+++ b/contrib/pg_visibility/data/toast.data
@@ -0,0 +1,10 @@
+1 54457a2daa04a110cff4512a80005d477b16279b92ed1bb18e0946af4fa425ed0f88c4089e4b41e1f62ec004f9ce358a53b91edc9d459004d22de05ade13016a0871ab9239433f1703d7776940c2b97a75acd1558d62a10ea00b20bb65f6b5ccb542ec21553c6e1fcde2a465bd70147bac2960565e605ffeb8b708f526549f135f492db9533d7d6b54c059d538dbfda4431af420245b20dafaaec92767b8b5c68f81e27726ef9e8d9d53a629f5ff86b705fef2db5f2d7377a3795a14f15a048fb73d89c4db0b83c3cffa338250f88b28a22f5bd4003a3a11d23470edc9cc258114df51eaf1436391834cbc808b6bae8639a783a1311abba0d4848842a4c0b75a0ac4f22e855e09f33994d45fb9b8af0ee32340d34bf39989dcce163907a407a793702b8e05d457c92b55877a7efd7cc19778d55175373099233c357982da46ed8cd7647e845272e1f620ab266a2bfe7de214c9b9b1cac97a8489623041c7a1ff3051ba801a1075d1883ce39b69d1e4835369be8bfd7050a50f1b77c37f8a4a1b5ae58dfdd327c5452fbc18c0729304daf8c2e9456585533447503ecbc29d812078724464feebd7d14eac2a8e2e0edbd735b2d13e6579a09672a21b948a241bde8db29df07d77c2919a69fdcb06e9ddb568f65dcf6de63b88bd0db421a2ffa811e56a5087d66b3a553c788b4897c16369acc913b4691edc2c132d71d5f7c49b70cdcb35750f04d0cc4b513614906c087f062f50e3d2be1570166cf696ed5013ec8d15ea72d83b70b17588d8ae82f2097d4cd2c48317d8dbdf5d8b2d6263a45c07ecccbfd803345dffe3ab076cca26b88b066ed6072e0486f6a3c518a962ec59487c02f34ab31239b08754cafa10bf5d8f2305ed1537f9c80fc4a74a8bf94b5a7d346ad7766f850e64ecebf2eb0efcd0d9d8e622c70da0f5e6b50b1486842f0b7a8279acd412ce9678c3589596c600f310545aa1dac129d73fb1a3231757c0b3c64442c81efa3b8c5185f82a67ed3cc9ab03f842997ceb563802a4a675c9f68127d0b94425f2ccede51ad289e372321fbc464f49a749b897a4ba5315a773d21bc31d671fac9b9a0cfe84f9f3d5f449e23c479b4791f91708e3e675dd176eb988e9ef79c24ed40be1ebb3b2874bd0f0906c04c3a5258fc371457bd5490e3ba340ccb07943e9cfc12426bfe1436c2c61b29f038bae18a6a54ade6465210d9a1fea6919ac99393873ecb3dd7e4b141f9b1398a13ddc68deb4ced98b79c554914db8bb21d8646a3e0488c27d43783506dcd1dc4033fb22df03c3b15723f7a0c80e6d9096223704ddeadff797bc75b6f5dcb18a3153e46da6ff6eb3f938fa659e2e0fe9a78eb3544902006d12bae15e4b212d10f34c8c9f92efd7f1740d5cf8eaf3ca940c2a69d4add0356c5988a14d86bf942893ab48ce596ec5e4ce2e9ae868e6691faa7b01da73beca61e63f1a4665629fc8678dac12c6694241c76650b0b80c6db838818e7d97b239cc6c532f7f1703d7a9c046cf4eda425a7567a57d68654edb4e6d6fa6646c53711fdc5092ea56fc3d9d2fd583725bf318fc15b5fab8781ee10796ea2eb7448145a0de54f26216ad61549f56c0b0842d37eb183e8023d7e932760a5231c3d8c8e5566464d035cccf97140ccef3f282cfe36e4fbb8d935ba23fbebdf20f3f99405eb6a8f086f55079ffe71d62e7dc10efa5a54304aaa7248f5f93321a781ef19acecdaf77a9fb61dff984255fe919b233fe88c72c412b3fa704518d8a33b591400578c0c8c0429bb3d5fa1cf54144a68f69437da80586cf878ffd59d0d2fb43e36398efb3b76e14051c49f758a9a1d99d28d7291777832fe78158a7d5768d12900927627204c976369a05c8dbcf04dc2767258598f4759f8e7a9a7ed4a61505bd2875c49a29a114086ca24a0c07cf0a40c1ad2583b17e2344646a6e795c8b722073365372821524ce4acda220d2d9a1a519f5e4170b8ea1ce66e20653f4ef0abc978cbaf1e1a7f5641a0b95aea3fc313d27749b0308c0fa86b5094df134b06a572a8ae9deda4f4321121f36a4194675609845abf1a54f996a54efa045a8c7612e2589a7ebe4e53b3e051cbe7e655c6953e945fde4a4d03dc1a6b146e6aab74cb33cebb1b1f6ddf578d67ceab9a40383cfdc53b75d86945a2e3b3b0c7103fdf1807f757cafc9c2c6d638f90ddbe4d80ed0ee57740db9c799d98205b617e7b9fe9e11c3296d0744d951e2e1a466f0c046d82096ad7cf5e0ddb1258f
+2 54457a2daa04a110cff4512a80005d477b16279b92ed1bb18e0946af4fa425ed0f88c4089e4b41e1f62ec004f9ce358a53b91edc9d459004d22de05ade13016a0871ab9239433f1703d7776940c2b97a75acd1558d62a10ea00b20bb65f6b5ccb542ec21553c6e1fcde2a465bd70147bac2960565e605ffeb8b708f526549f135f492db9533d7d6b54c059d538dbfda4431af420245b20dafaaec92767b8b5c68f81e27726ef9e8d9d53a629f5ff86b705fef2db5f2d7377a3795a14f15a048fb73d89c4db0b83c3cffa338250f88b28a22f5bd4003a3a11d23470edc9cc258114df51eaf1436391834cbc808b6bae8639a783a1311abba0d4848842a4c0b75a0ac4f22e855e09f33994d45fb9b8af0ee32340d34bf39989dcce163907a407a793702b8e05d457c92b55877a7efd7cc19778d55175373099233c357982da46ed8cd7647e845272e1f620ab266a2bfe7de214c9b9b1cac97a8489623041c7a1ff3051ba801a1075d1883ce39b69d1e4835369be8bfd7050a50f1b77c37f8a4a1b5ae58dfdd327c5452fbc18c0729304daf8c2e9456585533447503ecbc29d812078724464feebd7d14eac2a8e2e0edbd735b2d13e6579a09672a21b948a241bde8db29df07d77c2919a69fdcb06e9ddb568f65dcf6de63b88bd0db421a2ffa811e56a5087d66b3a553c788b4897c16369acc913b4691edc2c132d71d5f7c49b70cdcb35750f04d0cc4b513614906c087f062f50e3d2be1570166cf696ed5013ec8d15ea72d83b70b17588d8ae82f2097d4cd2c48317d8dbdf5d8b2d6263a45c07ecccbfd803345dffe3ab076cca26b88b066ed6072e0486f6a3c518a962ec59487c02f34ab31239b08754cafa10bf5d8f2305ed1537f9c80fc4a74a8bf94b5a7d346ad7766f850e64ecebf2eb0efcd0d9d8e622c70da0f5e6b50b1486842f0b7a8279acd412ce9678c3589596c600f310545aa1dac129d73fb1a3231757c0b3c64442c81efa3b8c5185f82a67ed3cc9ab03f842997ceb563802a4a675c9f68127d0b94425f2ccede51ad289e372321fbc464f49a749b897a4ba5315a773d21bc31d671fac9b9a0cfe84f9f3d5f449e23c479b4791f91708e3e675dd176eb988e9ef79c24ed40be1ebb3b2874bd0f0906c04c3a5258fc371457bd5490e3ba340ccb07943e9cfc12426bfe1436c2c61b29f038bae18a6a54ade6465210d9a1fea6919ac99393873ecb3dd7e4b141f9b1398a13ddc68deb4ced98b79c554914db8bb21d8646a3e0488c27d43783506dcd1dc4033fb22df03c3b15723f7a0c80e6d9096223704ddeadff797bc75b6f5dcb18a3153e46da6ff6eb3f938fa659e2e0fe9a78eb3544902006d12bae15e4b212d10f34c8c9f92efd7f1740d5cf8eaf3ca940c2a69d4add0356c5988a14d86bf942893ab48ce596ec5e4ce2e9ae868e6691faa7b01da73beca61e63f1a4665629fc8678dac12c6694241c76650b0b80c6db838818e7d97b239cc6c532f7f1703d7a9c046cf4eda425a7567a57d68654edb4e6d6fa6646c53711fdc5092ea56fc3d9d2fd583725bf318fc15b5fab8781ee10796ea2eb7448145a0de54f26216ad61549f56c0b0842d37eb183e8023d7e932760a5231c3d8c8e5566464d035cccf97140ccef3f282cfe36e4fbb8d935ba23fbebdf20f3f99405eb6a8f086f55079ffe71d62e7dc10efa5a54304aaa7248f5f93321a781ef19acecdaf77a9fb61dff984255fe919b233fe88c72c412b3fa704518d8a33b591400578c0c8c0429bb3d5fa1cf54144a68f69437da80586cf878ffd59d0d2fb43e36398efb3b76e14051c49f758a9a1d99d28d7291777832fe78158a7d5768d12900927627204c976369a05c8dbcf04dc2767258598f4759f8e7a9a7ed4a61505bd2875c49a29a114086ca24a0c07cf0a40c1ad2583b17e2344646a6e795c8b722073365372821524ce4acda220d2d9a1a519f5e4170b8ea1ce66e20653f4ef0abc978cbaf1e1a7f5641a0b95aea3fc313d27749b0308c0fa86b5094df134b06a572a8ae9deda4f4321121f36a4194675609845abf1a54f996a54efa045a8c7612e2589a7ebe4e53b3e051cbe7e655c6953e945fde4a4d03dc1a6b146e6aab74cb33cebb1b1f6ddf578d67ceab9a40383cfdc53b75d86945a2e3b3b0c7103fdf1807f757cafc9c2c6d638f90ddbe4d80ed0ee57740db9c799d98205b617e7b9fe9e11c3296d0744d951e2e1a466f0c046d82096ad7cf5e0ddb1258f
+3 54457a2daa04a110cff4512a80005d477b16279b92ed1bb18e0946af4fa425ed0f88c4089e4b41e1f62ec004f9ce358a53b91edc9d459004d22de05ade13016a0871ab9239433f1703d7776940c2b97a75acd1558d62a10ea00b20bb65f6b5ccb542ec21553c6e1fcde2a465bd70147bac2960565e605ffeb8b708f526549f135f492db9533d7d6b54c059d538dbfda4431af420245b20dafaaec92767b8b5c68f81e27726ef9e8d9d53a629f5ff86b705fef2db5f2d7377a3795a14f15a048fb73d89c4db0b83c3cffa338250f88b28a22f5bd4003a3a11d23470edc9cc258114df51eaf1436391834cbc808b6bae8639a783a1311abba0d4848842a4c0b75a0ac4f22e855e09f33994d45fb9b8af0ee32340d34bf39989dcce163907a407a793702b8e05d457c92b55877a7efd7cc19778d55175373099233c357982da46ed8cd7647e845272e1f620ab266a2bfe7de214c9b9b1cac97a8489623041c7a1ff3051ba801a1075d1883ce39b69d1e4835369be8bfd7050a50f1b77c37f8a4a1b5ae58dfdd327c5452fbc18c0729304daf8c2e9456585533447503ecbc29d812078724464feebd7d14eac2a8e2e0edbd735b2d13e6579a09672a21b948a241bde8db29df07d77c2919a69fdcb06e9ddb568f65dcf6de63b88bd0db421a2ffa811e56a5087d66b3a553c788b4897c16369acc913b4691edc2c132d71d5f7c49b70cdcb35750f04d0cc4b513614906c087f062f50e3d2be1570166cf696ed5013ec8d15ea72d83b70b17588d8ae82f2097d4cd2c48317d8dbdf5d8b2d6263a45c07ecccbfd803345dffe3ab076cca26b88b066ed6072e0486f6a3c518a962ec59487c02f34ab31239b08754cafa10bf5d8f2305ed1537f9c80fc4a74a8bf94b5a7d346ad7766f850e64ecebf2eb0efcd0d9d8e622c70da0f5e6b50b1486842f0b7a8279acd412ce9678c3589596c600f310545aa1dac129d73fb1a3231757c0b3c64442c81efa3b8c5185f82a67ed3cc9ab03f842997ceb563802a4a675c9f68127d0b94425f2ccede51ad289e372321fbc464f49a749b897a4ba5315a773d21bc31d671fac9b9a0cfe84f9f3d5f449e23c479b4791f91708e3e675dd176eb988e9ef79c24ed40be1ebb3b2874bd0f0906c04c3a5258fc371457bd5490e3ba340ccb07943e9cfc12426bfe1436c2c61b29f038bae18a6a54ade6465210d9a1fea6919ac99393873ecb3dd7e4b141f9b1398a13ddc68deb4ced98b79c554914db8bb21d8646a3e0488c27d43783506dcd1dc4033fb22df03c3b15723f7a0c80e6d9096223704ddeadff797bc75b6f5dcb18a3153e46da6ff6eb3f938fa659e2e0fe9a78eb3544902006d12bae15e4b212d10f34c8c9f92efd7f1740d5cf8eaf3ca940c2a69d4add0356c5988a14d86bf942893ab48ce596ec5e4ce2e9ae868e6691faa7b01da73beca61e63f1a4665629fc8678dac12c6694241c76650b0b80c6db838818e7d97b239cc6c532f7f1703d7a9c046cf4eda425a7567a57d68654edb4e6d6fa6646c53711fdc5092ea56fc3d9d2fd583725bf318fc15b5fab8781ee10796ea2eb7448145a0de54f26216ad61549f56c0b0842d37eb183e8023d7e932760a5231c3d8c8e5566464d035cccf97140ccef3f282cfe36e4fbb8d935ba23fbebdf20f3f99405eb6a8f086f55079ffe71d62e7dc10efa5a54304aaa7248f5f93321a781ef19acecdaf77a9fb61dff984255fe919b233fe88c72c412b3fa704518d8a33b591400578c0c8c0429bb3d5fa1cf54144a68f69437da80586cf878ffd59d0d2fb43e36398efb3b76e14051c49f758a9a1d99d28d7291777832fe78158a7d5768d12900927627204c976369a05c8dbcf04dc2767258598f4759f8e7a9a7ed4a61505bd2875c49a29a114086ca24a0c07cf0a40c1ad2583b17e2344646a6e795c8b722073365372821524ce4acda220d2d9a1a519f5e4170b8ea1ce66e20653f4ef0abc978cbaf1e1a7f5641a0b95aea3fc313d27749b0308c0fa86b5094df134b06a572a8ae9deda4f4321121f36a4194675609845abf1a54f996a54efa045a8c7612e2589a7ebe4e53b3e051cbe7e655c6953e945fde4a4d03dc1a6b146e6aab74cb33cebb1b1f6ddf578d67ceab9a40383cfdc53b75d86945a2e3b3b0c7103fdf1807f757cafc9c2c6d638f90ddbe4d80ed0ee57740db9c799d98205b617e7b9fe9e11c3296d0744d951e2e1a466f0c046d82096ad7cf5e0ddb1258f
+4 54457a2daa04a110cff4512a80005d477b16279b92ed1bb18e0946af4fa425ed0f88c4089e4b41e1f62ec004f9ce358a53b91edc9d459004d22de05ade13016a0871ab9239433f1703d7776940c2b97a75acd1558d62a10ea00b20bb65f6b5ccb542ec21553c6e1fcde2a465bd70147bac2960565e605ffeb8b708f526549f135f492db9533d7d6b54c059d538dbfda4431af420245b20dafaaec92767b8b5c68f81e27726ef9e8d9d53a629f5ff86b705fef2db5f2d7377a3795a14f15a048fb73d89c4db0b83c3cffa338250f88b28a22f5bd4003a3a11d23470edc9cc258114df51eaf1436391834cbc808b6bae8639a783a1311abba0d4848842a4c0b75a0ac4f22e855e09f33994d45fb9b8af0ee32340d34bf39989dcce163907a407a793702b8e05d457c92b55877a7efd7cc19778d55175373099233c357982da46ed8cd7647e845272e1f620ab266a2bfe7de214c9b9b1cac97a8489623041c7a1ff3051ba801a1075d1883ce39b69d1e4835369be8bfd7050a50f1b77c37f8a4a1b5ae58dfdd327c5452fbc18c0729304daf8c2e9456585533447503ecbc29d812078724464feebd7d14eac2a8e2e0edbd735b2d13e6579a09672a21b948a241bde8db29df07d77c2919a69fdcb06e9ddb568f65dcf6de63b88bd0db421a2ffa811e56a5087d66b3a553c788b4897c16369acc913b4691edc2c132d71d5f7c49b70cdcb35750f04d0cc4b513614906c087f062f50e3d2be1570166cf696ed5013ec8d15ea72d83b70b17588d8ae82f2097d4cd2c48317d8dbdf5d8b2d6263a45c07ecccbfd803345dffe3ab076cca26b88b066ed6072e0486f6a3c518a962ec59487c02f34ab31239b08754cafa10bf5d8f2305ed1537f9c80fc4a74a8bf94b5a7d346ad7766f850e64ecebf2eb0efcd0d9d8e622c70da0f5e6b50b1486842f0b7a8279acd412ce9678c3589596c600f310545aa1dac129d73fb1a3231757c0b3c64442c81efa3b8c5185f82a67ed3cc9ab03f842997ceb563802a4a675c9f68127d0b94425f2ccede51ad289e372321fbc464f49a749b897a4ba5315a773d21bc31d671fac9b9a0cfe84f9f3d5f449e23c479b4791f91708e3e675dd176eb988e9ef79c24ed40be1ebb3b2874bd0f0906c04c3a5258fc371457bd5490e3ba340ccb07943e9cfc12426bfe1436c2c61b29f038bae18a6a54ade6465210d9a1fea6919ac99393873ecb3dd7e4b141f9b1398a13ddc68deb4ced98b79c554914db8bb21d8646a3e0488c27d43783506dcd1dc4033fb22df03c3b15723f7a0c80e6d9096223704ddeadff797bc75b6f5dcb18a3153e46da6ff6eb3f938fa659e2e0fe9a78eb3544902006d12bae15e4b212d10f34c8c9f92efd7f1740d5cf8eaf3ca940c2a69d4add0356c5988a14d86bf942893ab48ce596ec5e4ce2e9ae868e6691faa7b01da73beca61e63f1a4665629fc8678dac12c6694241c76650b0b80c6db838818e7d97b239cc6c532f7f1703d7a9c046cf4eda425a7567a57d68654edb4e6d6fa6646c53711fdc5092ea56fc3d9d2fd583725bf318fc15b5fab8781ee10796ea2eb7448145a0de54f26216ad61549f56c0b0842d37eb183e8023d7e932760a5231c3d8c8e5566464d035cccf97140ccef3f282cfe36e4fbb8d935ba23fbebdf20f3f99405eb6a8f086f55079ffe71d62e7dc10efa5a54304aaa7248f5f93321a781ef19acecdaf77a9fb61dff984255fe919b233fe88c72c412b3fa704518d8a33b591400578c0c8c0429bb3d5fa1cf54144a68f69437da80586cf878ffd59d0d2fb43e36398efb3b76e14051c49f758a9a1d99d28d7291777832fe78158a7d5768d12900927627204c976369a05c8dbcf04dc2767258598f4759f8e7a9a7ed4a61505bd2875c49a29a114086ca24a0c07cf0a40c1ad2583b17e2344646a6e795c8b722073365372821524ce4acda220d2d9a1a519f5e4170b8ea1ce66e20653f4ef0abc978cbaf1e1a7f5641a0b95aea3fc313d27749b0308c0fa86b5094df134b06a572a8ae9deda4f4321121f36a4194675609845abf1a54f996a54efa045a8c7612e2589a7ebe4e53b3e051cbe7e655c6953e945fde4a4d03dc1a6b146e6aab74cb33cebb1b1f6ddf578d67ceab9a40383cfdc53b75d86945a2e3b3b0c7103fdf1807f757cafc9c2c6d638f90ddbe4d80ed0ee57740db9c799d98205b617e7b9fe9e11c3296d0744d951e2e1a466f0c046d82096ad7cf5e0ddb1258f
+5 54457a2daa04a110cff4512a80005d477b16279b92ed1bb18e0946af4fa425ed0f88c4089e4b41e1f62ec004f9ce358a53b91edc9d459004d22de05ade13016a0871ab9239433f1703d7776940c2b97a75acd1558d62a10ea00b20bb65f6b5ccb542ec21553c6e1fcde2a465bd70147bac2960565e605ffeb8b708f526549f135f492db9533d7d6b54c059d538dbfda4431af420245b20dafaaec92767b8b5c68f81e27726ef9e8d9d53a629f5ff86b705fef2db5f2d7377a3795a14f15a048fb73d89c4db0b83c3cffa338250f88b28a22f5bd4003a3a11d23470edc9cc258114df51eaf1436391834cbc808b6bae8639a783a1311abba0d4848842a4c0b75a0ac4f22e855e09f33994d45fb9b8af0ee32340d34bf39989dcce163907a407a793702b8e05d457c92b55877a7efd7cc19778d55175373099233c357982da46ed8cd7647e845272e1f620ab266a2bfe7de214c9b9b1cac97a8489623041c7a1ff3051ba801a1075d1883ce39b69d1e4835369be8bfd7050a50f1b77c37f8a4a1b5ae58dfdd327c5452fbc18c0729304daf8c2e9456585533447503ecbc29d812078724464feebd7d14eac2a8e2e0edbd735b2d13e6579a09672a21b948a241bde8db29df07d77c2919a69fdcb06e9ddb568f65dcf6de63b88bd0db421a2ffa811e56a5087d66b3a553c788b4897c16369acc913b4691edc2c132d71d5f7c49b70cdcb35750f04d0cc4b513614906c087f062f50e3d2be1570166cf696ed5013ec8d15ea72d83b70b17588d8ae82f2097d4cd2c48317d8dbdf5d8b2d6263a45c07ecccbfd803345dffe3ab076cca26b88b066ed6072e0486f6a3c518a962ec59487c02f34ab31239b08754cafa10bf5d8f2305ed1537f9c80fc4a74a8bf94b5a7d346ad7766f850e64ecebf2eb0efcd0d9d8e622c70da0f5e6b50b1486842f0b7a8279acd412ce9678c3589596c600f310545aa1dac129d73fb1a3231757c0b3c64442c81efa3b8c5185f82a67ed3cc9ab03f842997ceb563802a4a675c9f68127d0b94425f2ccede51ad289e372321fbc464f49a749b897a4ba5315a773d21bc31d671fac9b9a0cfe84f9f3d5f449e23c479b4791f91708e3e675dd176eb988e9ef79c24ed40be1ebb3b2874bd0f0906c04c3a5258fc371457bd5490e3ba340ccb07943e9cfc12426bfe1436c2c61b29f038bae18a6a54ade6465210d9a1fea6919ac99393873ecb3dd7e4b141f9b1398a13ddc68deb4ced98b79c554914db8bb21d8646a3e0488c27d43783506dcd1dc4033fb22df03c3b15723f7a0c80e6d9096223704ddeadff797bc75b6f5dcb18a3153e46da6ff6eb3f938fa659e2e0fe9a78eb3544902006d12bae15e4b212d10f34c8c9f92efd7f1740d5cf8eaf3ca940c2a69d4add0356c5988a14d86bf942893ab48ce596ec5e4ce2e9ae868e6691faa7b01da73beca61e63f1a4665629fc8678dac12c6694241c76650b0b80c6db838818e7d97b239cc6c532f7f1703d7a9c046cf4eda425a7567a57d68654edb4e6d6fa6646c53711fdc5092ea56fc3d9d2fd583725bf318fc15b5fab8781ee10796ea2eb7448145a0de54f26216ad61549f56c0b0842d37eb183e8023d7e932760a5231c3d8c8e5566464d035cccf97140ccef3f282cfe36e4fbb8d935ba23fbebdf20f3f99405eb6a8f086f55079ffe71d62e7dc10efa5a54304aaa7248f5f93321a781ef19acecdaf77a9fb61dff984255fe919b233fe88c72c412b3fa704518d8a33b591400578c0c8c0429bb3d5fa1cf54144a68f69437da80586cf878ffd59d0d2fb43e36398efb3b76e14051c49f758a9a1d99d28d7291777832fe78158a7d5768d12900927627204c976369a05c8dbcf04dc2767258598f4759f8e7a9a7ed4a61505bd2875c49a29a114086ca24a0c07cf0a40c1ad2583b17e2344646a6e795c8b722073365372821524ce4acda220d2d9a1a519f5e4170b8ea1ce66e20653f4ef0abc978cbaf1e1a7f5641a0b95aea3fc313d27749b0308c0fa86b5094df134b06a572a8ae9deda4f4321121f36a4194675609845abf1a54f996a54efa045a8c7612e2589a7ebe4e53b3e051cbe7e655c6953e945fde4a4d03dc1a6b146e6aab74cb33cebb1b1f6ddf578d67ceab9a40383cfdc53b75d86945a2e3b3b0c7103fdf1807f757cafc9c2c6d638f90ddbe4d80ed0ee57740db9c799d98205b617e7b9fe9e11c3296d0744d951e2e1a466f0c046d82096ad7cf5e0ddb1258f
+6 54457a2daa04a110cff4512a80005d477b16279b92ed1bb18e0946af4fa425ed0f88c4089e4b41e1f62ec004f9ce358a53b91edc9d459004d22de05ade13016a0871ab9239433f1703d7776940c2b97a75acd1558d62a10ea00b20bb65f6b5ccb542ec21553c6e1fcde2a465bd70147bac2960565e605ffeb8b708f526549f135f492db9533d7d6b54c059d538dbfda4431af420245b20dafaaec92767b8b5c68f81e27726ef9e8d9d53a629f5ff86b705fef2db5f2d7377a3795a14f15a048fb73d89c4db0b83c3cffa338250f88b28a22f5bd4003a3a11d23470edc9cc258114df51eaf1436391834cbc808b6bae8639a783a1311abba0d4848842a4c0b75a0ac4f22e855e09f33994d45fb9b8af0ee32340d34bf39989dcce163907a407a793702b8e05d457c92b55877a7efd7cc19778d55175373099233c357982da46ed8cd7647e845272e1f620ab266a2bfe7de214c9b9b1cac97a8489623041c7a1ff3051ba801a1075d1883ce39b69d1e4835369be8bfd7050a50f1b77c37f8a4a1b5ae58dfdd327c5452fbc18c0729304daf8c2e9456585533447503ecbc29d812078724464feebd7d14eac2a8e2e0edbd735b2d13e6579a09672a21b948a241bde8db29df07d77c2919a69fdcb06e9ddb568f65dcf6de63b88bd0db421a2ffa811e56a5087d66b3a553c788b4897c16369acc913b4691edc2c132d71d5f7c49b70cdcb35750f04d0cc4b513614906c087f062f50e3d2be1570166cf696ed5013ec8d15ea72d83b70b17588d8ae82f2097d4cd2c48317d8dbdf5d8b2d6263a45c07ecccbfd803345dffe3ab076cca26b88b066ed6072e0486f6a3c518a962ec59487c02f34ab31239b08754cafa10bf5d8f2305ed1537f9c80fc4a74a8bf94b5a7d346ad7766f850e64ecebf2eb0efcd0d9d8e622c70da0f5e6b50b1486842f0b7a8279acd412ce9678c3589596c600f310545aa1dac129d73fb1a3231757c0b3c64442c81efa3b8c5185f82a67ed3cc9ab03f842997ceb563802a4a675c9f68127d0b94425f2ccede51ad289e372321fbc464f49a749b897a4ba5315a773d21bc31d671fac9b9a0cfe84f9f3d5f449e23c479b4791f91708e3e675dd176eb988e9ef79c24ed40be1ebb3b2874bd0f0906c04c3a5258fc371457bd5490e3ba340ccb07943e9cfc12426bfe1436c2c61b29f038bae18a6a54ade6465210d9a1fea6919ac99393873ecb3dd7e4b141f9b1398a13ddc68deb4ced98b79c554914db8bb21d8646a3e0488c27d43783506dcd1dc4033fb22df03c3b15723f7a0c80e6d9096223704ddeadff797bc75b6f5dcb18a3153e46da6ff6eb3f938fa659e2e0fe9a78eb3544902006d12bae15e4b212d10f34c8c9f92efd7f1740d5cf8eaf3ca940c2a69d4add0356c5988a14d86bf942893ab48ce596ec5e4ce2e9ae868e6691faa7b01da73beca61e63f1a4665629fc8678dac12c6694241c76650b0b80c6db838818e7d97b239cc6c532f7f1703d7a9c046cf4eda425a7567a57d68654edb4e6d6fa6646c53711fdc5092ea56fc3d9d2fd583725bf318fc15b5fab8781ee10796ea2eb7448145a0de54f26216ad61549f56c0b0842d37eb183e8023d7e932760a5231c3d8c8e5566464d035cccf97140ccef3f282cfe36e4fbb8d935ba23fbebdf20f3f99405eb6a8f086f55079ffe71d62e7dc10efa5a54304aaa7248f5f93321a781ef19acecdaf77a9fb61dff984255fe919b233fe88c72c412b3fa704518d8a33b591400578c0c8c0429bb3d5fa1cf54144a68f69437da80586cf878ffd59d0d2fb43e36398efb3b76e14051c49f758a9a1d99d28d7291777832fe78158a7d5768d12900927627204c976369a05c8dbcf04dc2767258598f4759f8e7a9a7ed4a61505bd2875c49a29a114086ca24a0c07cf0a40c1ad2583b17e2344646a6e795c8b722073365372821524ce4acda220d2d9a1a519f5e4170b8ea1ce66e20653f4ef0abc978cbaf1e1a7f5641a0b95aea3fc313d27749b0308c0fa86b5094df134b06a572a8ae9deda4f4321121f36a4194675609845abf1a54f996a54efa045a8c7612e2589a7ebe4e53b3e051cbe7e655c6953e945fde4a4d03dc1a6b146e6aab74cb33cebb1b1f6ddf578d67ceab9a40383cfdc53b75d86945a2e3b3b0c7103fdf1807f757cafc9c2c6d638f90ddbe4d80ed0ee57740db9c799d98205b617e7b9fe9e11c3296d0744d951e2e1a466f0c046d82096ad7cf5e0ddb1258f
+7 54457a2daa04a110cff4512a80005d477b16279b92ed1bb18e0946af4fa425ed0f88c4089e4b41e1f62ec004f9ce358a53b91edc9d459004d22de05ade13016a0871ab9239433f1703d7776940c2b97a75acd1558d62a10ea00b20bb65f6b5ccb542ec21553c6e1fcde2a465bd70147bac2960565e605ffeb8b708f526549f135f492db9533d7d6b54c059d538dbfda4431af420245b20dafaaec92767b8b5c68f81e27726ef9e8d9d53a629f5ff86b705fef2db5f2d7377a3795a14f15a048fb73d89c4db0b83c3cffa338250f88b28a22f5bd4003a3a11d23470edc9cc258114df51eaf1436391834cbc808b6bae8639a783a1311abba0d4848842a4c0b75a0ac4f22e855e09f33994d45fb9b8af0ee32340d34bf39989dcce163907a407a793702b8e05d457c92b55877a7efd7cc19778d55175373099233c357982da46ed8cd7647e845272e1f620ab266a2bfe7de214c9b9b1cac97a8489623041c7a1ff3051ba801a1075d1883ce39b69d1e4835369be8bfd7050a50f1b77c37f8a4a1b5ae58dfdd327c5452fbc18c0729304daf8c2e9456585533447503ecbc29d812078724464feebd7d14eac2a8e2e0edbd735b2d13e6579a09672a21b948a241bde8db29df07d77c2919a69fdcb06e9ddb568f65dcf6de63b88bd0db421a2ffa811e56a5087d66b3a553c788b4897c16369acc913b4691edc2c132d71d5f7c49b70cdcb35750f04d0cc4b513614906c087f062f50e3d2be1570166cf696ed5013ec8d15ea72d83b70b17588d8ae82f2097d4cd2c48317d8dbdf5d8b2d6263a45c07ecccbfd803345dffe3ab076cca26b88b066ed6072e0486f6a3c518a962ec59487c02f34ab31239b08754cafa10bf5d8f2305ed1537f9c80fc4a74a8bf94b5a7d346ad7766f850e64ecebf2eb0efcd0d9d8e622c70da0f5e6b50b1486842f0b7a8279acd412ce9678c3589596c600f310545aa1dac129d73fb1a3231757c0b3c64442c81efa3b8c5185f82a67ed3cc9ab03f842997ceb563802a4a675c9f68127d0b94425f2ccede51ad289e372321fbc464f49a749b897a4ba5315a773d21bc31d671fac9b9a0cfe84f9f3d5f449e23c479b4791f91708e3e675dd176eb988e9ef79c24ed40be1ebb3b2874bd0f0906c04c3a5258fc371457bd5490e3ba340ccb07943e9cfc12426bfe1436c2c61b29f038bae18a6a54ade6465210d9a1fea6919ac99393873ecb3dd7e4b141f9b1398a13ddc68deb4ced98b79c554914db8bb21d8646a3e0488c27d43783506dcd1dc4033fb22df03c3b15723f7a0c80e6d9096223704ddeadff797bc75b6f5dcb18a3153e46da6ff6eb3f938fa659e2e0fe9a78eb3544902006d12bae15e4b212d10f34c8c9f92efd7f1740d5cf8eaf3ca940c2a69d4add0356c5988a14d86bf942893ab48ce596ec5e4ce2e9ae868e6691faa7b01da73beca61e63f1a4665629fc8678dac12c6694241c76650b0b80c6db838818e7d97b239cc6c532f7f1703d7a9c046cf4eda425a7567a57d68654edb4e6d6fa6646c53711fdc5092ea56fc3d9d2fd583725bf318fc15b5fab8781ee10796ea2eb7448145a0de54f26216ad61549f56c0b0842d37eb183e8023d7e932760a5231c3d8c8e5566464d035cccf97140ccef3f282cfe36e4fbb8d935ba23fbebdf20f3f99405eb6a8f086f55079ffe71d62e7dc10efa5a54304aaa7248f5f93321a781ef19acecdaf77a9fb61dff984255fe919b233fe88c72c412b3fa704518d8a33b591400578c0c8c0429bb3d5fa1cf54144a68f69437da80586cf878ffd59d0d2fb43e36398efb3b76e14051c49f758a9a1d99d28d7291777832fe78158a7d5768d12900927627204c976369a05c8dbcf04dc2767258598f4759f8e7a9a7ed4a61505bd2875c49a29a114086ca24a0c07cf0a40c1ad2583b17e2344646a6e795c8b722073365372821524ce4acda220d2d9a1a519f5e4170b8ea1ce66e20653f4ef0abc978cbaf1e1a7f5641a0b95aea3fc313d27749b0308c0fa86b5094df134b06a572a8ae9deda4f4321121f36a4194675609845abf1a54f996a54efa045a8c7612e2589a7ebe4e53b3e051cbe7e655c6953e945fde4a4d03dc1a6b146e6aab74cb33cebb1b1f6ddf578d67ceab9a40383cfdc53b75d86945a2e3b3b0c7103fdf1807f757cafc9c2c6d638f90ddbe4d80ed0ee57740db9c799d98205b617e7b9fe9e11c3296d0744d951e2e1a466f0c046d82096ad7cf5e0ddb1258f
+8 54457a2daa04a110cff4512a80005d477b16279b92ed1bb18e0946af4fa425ed0f88c4089e4b41e1f62ec004f9ce358a53b91edc9d459004d22de05ade13016a0871ab9239433f1703d7776940c2b97a75acd1558d62a10ea00b20bb65f6b5ccb542ec21553c6e1fcde2a465bd70147bac2960565e605ffeb8b708f526549f135f492db9533d7d6b54c059d538dbfda4431af420245b20dafaaec92767b8b5c68f81e27726ef9e8d9d53a629f5ff86b705fef2db5f2d7377a3795a14f15a048fb73d89c4db0b83c3cffa338250f88b28a22f5bd4003a3a11d23470edc9cc258114df51eaf1436391834cbc808b6bae8639a783a1311abba0d4848842a4c0b75a0ac4f22e855e09f33994d45fb9b8af0ee32340d34bf39989dcce163907a407a793702b8e05d457c92b55877a7efd7cc19778d55175373099233c357982da46ed8cd7647e845272e1f620ab266a2bfe7de214c9b9b1cac97a8489623041c7a1ff3051ba801a1075d1883ce39b69d1e4835369be8bfd7050a50f1b77c37f8a4a1b5ae58dfdd327c5452fbc18c0729304daf8c2e9456585533447503ecbc29d812078724464feebd7d14eac2a8e2e0edbd735b2d13e6579a09672a21b948a241bde8db29df07d77c2919a69fdcb06e9ddb568f65dcf6de63b88bd0db421a2ffa811e56a5087d66b3a553c788b4897c16369acc913b4691edc2c132d71d5f7c49b70cdcb35750f04d0cc4b513614906c087f062f50e3d2be1570166cf696ed5013ec8d15ea72d83b70b17588d8ae82f2097d4cd2c48317d8dbdf5d8b2d6263a45c07ecccbfd803345dffe3ab076cca26b88b066ed6072e0486f6a3c518a962ec59487c02f34ab31239b08754cafa10bf5d8f2305ed1537f9c80fc4a74a8bf94b5a7d346ad7766f850e64ecebf2eb0efcd0d9d8e622c70da0f5e6b50b1486842f0b7a8279acd412ce9678c3589596c600f310545aa1dac129d73fb1a3231757c0b3c64442c81efa3b8c5185f82a67ed3cc9ab03f842997ceb563802a4a675c9f68127d0b94425f2ccede51ad289e372321fbc464f49a749b897a4ba5315a773d21bc31d671fac9b9a0cfe84f9f3d5f449e23c479b4791f91708e3e675dd176eb988e9ef79c24ed40be1ebb3b2874bd0f0906c04c3a5258fc371457bd5490e3ba340ccb07943e9cfc12426bfe1436c2c61b29f038bae18a6a54ade6465210d9a1fea6919ac99393873ecb3dd7e4b141f9b1398a13ddc68deb4ced98b79c554914db8bb21d8646a3e0488c27d43783506dcd1dc4033fb22df03c3b15723f7a0c80e6d9096223704ddeadff797bc75b6f5dcb18a3153e46da6ff6eb3f938fa659e2e0fe9a78eb3544902006d12bae15e4b212d10f34c8c9f92efd7f1740d5cf8eaf3ca940c2a69d4add0356c5988a14d86bf942893ab48ce596ec5e4ce2e9ae868e6691faa7b01da73beca61e63f1a4665629fc8678dac12c6694241c76650b0b80c6db838818e7d97b239cc6c532f7f1703d7a9c046cf4eda425a7567a57d68654edb4e6d6fa6646c53711fdc5092ea56fc3d9d2fd583725bf318fc15b5fab8781ee10796ea2eb7448145a0de54f26216ad61549f56c0b0842d37eb183e8023d7e932760a5231c3d8c8e5566464d035cccf97140ccef3f282cfe36e4fbb8d935ba23fbebdf20f3f99405eb6a8f086f55079ffe71d62e7dc10efa5a54304aaa7248f5f93321a781ef19acecdaf77a9fb61dff984255fe919b233fe88c72c412b3fa704518d8a33b591400578c0c8c0429bb3d5fa1cf54144a68f69437da80586cf878ffd59d0d2fb43e36398efb3b76e14051c49f758a9a1d99d28d7291777832fe78158a7d5768d12900927627204c976369a05c8dbcf04dc2767258598f4759f8e7a9a7ed4a61505bd2875c49a29a114086ca24a0c07cf0a40c1ad2583b17e2344646a6e795c8b722073365372821524ce4acda220d2d9a1a519f5e4170b8ea1ce66e20653f4ef0abc978cbaf1e1a7f5641a0b95aea3fc313d27749b0308c0fa86b5094df134b06a572a8ae9deda4f4321121f36a4194675609845abf1a54f996a54efa045a8c7612e2589a7ebe4e53b3e051cbe7e655c6953e945fde4a4d03dc1a6b146e6aab74cb33cebb1b1f6ddf578d67ceab9a40383cfdc53b75d86945a2e3b3b0c7103fdf1807f757cafc9c2c6d638f90ddbe4d80ed0ee57740db9c799d98205b617e7b9fe9e11c3296d0744d951e2e1a466f0c046d82096ad7cf5e0ddb1258f
+9 54457a2daa04a110cff4512a80005d477b16279b92ed1bb18e0946af4fa425ed0f88c4089e4b41e1f62ec004f9ce358a53b91edc9d459004d22de05ade13016a0871ab9239433f1703d7776940c2b97a75acd1558d62a10ea00b20bb65f6b5ccb542ec21553c6e1fcde2a465bd70147bac2960565e605ffeb8b708f526549f135f492db9533d7d6b54c059d538dbfda4431af420245b20dafaaec92767b8b5c68f81e27726ef9e8d9d53a629f5ff86b705fef2db5f2d7377a3795a14f15a048fb73d89c4db0b83c3cffa338250f88b28a22f5bd4003a3a11d23470edc9cc258114df51eaf1436391834cbc808b6bae8639a783a1311abba0d4848842a4c0b75a0ac4f22e855e09f33994d45fb9b8af0ee32340d34bf39989dcce163907a407a793702b8e05d457c92b55877a7efd7cc19778d55175373099233c357982da46ed8cd7647e845272e1f620ab266a2bfe7de214c9b9b1cac97a8489623041c7a1ff3051ba801a1075d1883ce39b69d1e4835369be8bfd7050a50f1b77c37f8a4a1b5ae58dfdd327c5452fbc18c0729304daf8c2e9456585533447503ecbc29d812078724464feebd7d14eac2a8e2e0edbd735b2d13e6579a09672a21b948a241bde8db29df07d77c2919a69fdcb06e9ddb568f65dcf6de63b88bd0db421a2ffa811e56a5087d66b3a553c788b4897c16369acc913b4691edc2c132d71d5f7c49b70cdcb35750f04d0cc4b513614906c087f062f50e3d2be1570166cf696ed5013ec8d15ea72d83b70b17588d8ae82f2097d4cd2c48317d8dbdf5d8b2d6263a45c07ecccbfd803345dffe3ab076cca26b88b066ed6072e0486f6a3c518a962ec59487c02f34ab31239b08754cafa10bf5d8f2305ed1537f9c80fc4a74a8bf94b5a7d346ad7766f850e64ecebf2eb0efcd0d9d8e622c70da0f5e6b50b1486842f0b7a8279acd412ce9678c3589596c600f310545aa1dac129d73fb1a3231757c0b3c64442c81efa3b8c5185f82a67ed3cc9ab03f842997ceb563802a4a675c9f68127d0b94425f2ccede51ad289e372321fbc464f49a749b897a4ba5315a773d21bc31d671fac9b9a0cfe84f9f3d5f449e23c479b4791f91708e3e675dd176eb988e9ef79c24ed40be1ebb3b2874bd0f0906c04c3a5258fc371457bd5490e3ba340ccb07943e9cfc12426bfe1436c2c61b29f038bae18a6a54ade6465210d9a1fea6919ac99393873ecb3dd7e4b141f9b1398a13ddc68deb4ced98b79c554914db8bb21d8646a3e0488c27d43783506dcd1dc4033fb22df03c3b15723f7a0c80e6d9096223704ddeadff797bc75b6f5dcb18a3153e46da6ff6eb3f938fa659e2e0fe9a78eb3544902006d12bae15e4b212d10f34c8c9f92efd7f1740d5cf8eaf3ca940c2a69d4add0356c5988a14d86bf942893ab48ce596ec5e4ce2e9ae868e6691faa7b01da73beca61e63f1a4665629fc8678dac12c6694241c76650b0b80c6db838818e7d97b239cc6c532f7f1703d7a9c046cf4eda425a7567a57d68654edb4e6d6fa6646c53711fdc5092ea56fc3d9d2fd583725bf318fc15b5fab8781ee10796ea2eb7448145a0de54f26216ad61549f56c0b0842d37eb183e8023d7e932760a5231c3d8c8e5566464d035cccf97140ccef3f282cfe36e4fbb8d935ba23fbebdf20f3f99405eb6a8f086f55079ffe71d62e7dc10efa5a54304aaa7248f5f93321a781ef19acecdaf77a9fb61dff984255fe919b233fe88c72c412b3fa704518d8a33b591400578c0c8c0429bb3d5fa1cf54144a68f69437da80586cf878ffd59d0d2fb43e36398efb3b76e14051c49f758a9a1d99d28d7291777832fe78158a7d5768d12900927627204c976369a05c8dbcf04dc2767258598f4759f8e7a9a7ed4a61505bd2875c49a29a114086ca24a0c07cf0a40c1ad2583b17e2344646a6e795c8b722073365372821524ce4acda220d2d9a1a519f5e4170b8ea1ce66e20653f4ef0abc978cbaf1e1a7f5641a0b95aea3fc313d27749b0308c0fa86b5094df134b06a572a8ae9deda4f4321121f36a4194675609845abf1a54f996a54efa045a8c7612e2589a7ebe4e53b3e051cbe7e655c6953e945fde4a4d03dc1a6b146e6aab74cb33cebb1b1f6ddf578d67ceab9a40383cfdc53b75d86945a2e3b3b0c7103fdf1807f757cafc9c2c6d638f90ddbe4d80ed0ee57740db9c799d98205b617e7b9fe9e11c3296d0744d951e2e1a466f0c046d82096ad7cf5e0ddb1258f
+10 54457a2daa04a110cff4512a80005d477b16279b92ed1bb18e0946af4fa425ed0f88c4089e4b41e1f62ec004f9ce358a53b91edc9d459004d22de05ade13016a0871ab9239433f1703d7776940c2b97a75acd1558d62a10ea00b20bb65f6b5ccb542ec21553c6e1fcde2a465bd70147bac2960565e605ffeb8b708f526549f135f492db9533d7d6b54c059d538dbfda4431af420245b20dafaaec92767b8b5c68f81e27726ef9e8d9d53a629f5ff86b705fef2db5f2d7377a3795a14f15a048fb73d89c4db0b83c3cffa338250f88b28a22f5bd4003a3a11d23470edc9cc258114df51eaf1436391834cbc808b6bae8639a783a1311abba0d4848842a4c0b75a0ac4f22e855e09f33994d45fb9b8af0ee32340d34bf39989dcce163907a407a793702b8e05d457c92b55877a7efd7cc19778d55175373099233c357982da46ed8cd7647e845272e1f620ab266a2bfe7de214c9b9b1cac97a8489623041c7a1ff3051ba801a1075d1883ce39b69d1e4835369be8bfd7050a50f1b77c37f8a4a1b5ae58dfdd327c5452fbc18c0729304daf8c2e9456585533447503ecbc29d812078724464feebd7d14eac2a8e2e0edbd735b2d13e6579a09672a21b948a241bde8db29df07d77c2919a69fdcb06e9ddb568f65dcf6de63b88bd0db421a2ffa811e56a5087d66b3a553c788b4897c16369acc913b4691edc2c132d71d5f7c49b70cdcb35750f04d0cc4b513614906c087f062f50e3d2be1570166cf696ed5013ec8d15ea72d83b70b17588d8ae82f2097d4cd2c48317d8dbdf5d8b2d6263a45c07ecccbfd803345dffe3ab076cca26b88b066ed6072e0486f6a3c518a962ec59487c02f34ab31239b08754cafa10bf5d8f2305ed1537f9c80fc4a74a8bf94b5a7d346ad7766f850e64ecebf2eb0efcd0d9d8e622c70da0f5e6b50b1486842f0b7a8279acd412ce9678c3589596c600f310545aa1dac129d73fb1a3231757c0b3c64442c81efa3b8c5185f82a67ed3cc9ab03f842997ceb563802a4a675c9f68127d0b94425f2ccede51ad289e372321fbc464f49a749b897a4ba5315a773d21bc31d671fac9b9a0cfe84f9f3d5f449e23c479b4791f91708e3e675dd176eb988e9ef79c24ed40be1ebb3b2874bd0f0906c04c3a5258fc371457bd5490e3ba340ccb07943e9cfc12426bfe1436c2c61b29f038bae18a6a54ade6465210d9a1fea6919ac99393873ecb3dd7e4b141f9b1398a13ddc68deb4ced98b79c554914db8bb21d8646a3e0488c27d43783506dcd1dc4033fb22df03c3b15723f7a0c80e6d9096223704ddeadff797bc75b6f5dcb18a3153e46da6ff6eb3f938fa659e2e0fe9a78eb3544902006d12bae15e4b212d10f34c8c9f92efd7f1740d5cf8eaf3ca940c2a69d4add0356c5988a14d86bf942893ab48ce596ec5e4ce2e9ae868e6691faa7b01da73beca61e63f1a4665629fc8678dac12c6694241c76650b0b80c6db838818e7d97b239cc6c532f7f1703d7a9c046cf4eda425a7567a57d68654edb4e6d6fa6646c53711fdc5092ea56fc3d9d2fd583725bf318fc15b5fab8781ee10796ea2eb7448145a0de54f26216ad61549f56c0b0842d37eb183e8023d7e932760a5231c3d8c8e5566464d035cccf97140ccef3f282cfe36e4fbb8d935ba23fbebdf20f3f99405eb6a8f086f55079ffe71d62e7dc10efa5a54304aaa7248f5f93321a781ef19acecdaf77a9fb61dff984255fe919b233fe88c72c412b3fa704518d8a33b591400578c0c8c0429bb3d5fa1cf54144a68f69437da80586cf878ffd59d0d2fb43e36398efb3b76e14051c49f758a9a1d99d28d7291777832fe78158a7d5768d12900927627204c976369a05c8dbcf04dc2767258598f4759f8e7a9a7ed4a61505bd2875c49a29a114086ca24a0c07cf0a40c1ad2583b17e2344646a6e795c8b722073365372821524ce4acda220d2d9a1a519f5e4170b8ea1ce66e20653f4ef0abc978cbaf1e1a7f5641a0b95aea3fc313d27749b0308c0fa86b5094df134b06a572a8ae9deda4f4321121f36a4194675609845abf1a54f996a54efa045a8c7612e2589a7ebe4e53b3e051cbe7e655c6953e945fde4a4d03dc1a6b146e6aab74cb33cebb1b1f6ddf578d67ceab9a40383cfdc53b75d86945a2e3b3b0c7103fdf1807f757cafc9c2c6d638f90ddbe4d80ed0ee57740db9c799d98205b617e7b9fe9e11c3296d0744d951e2e1a466f0c046d82096ad7cf5e0ddb1258f
diff --git a/contrib/pg_visibility/expected/pg_visibility.out b/contrib/pg_visibility/expected/pg_visibility.out
index 315633bfea..1faa5f743e 100644
--- a/contrib/pg_visibility/expected/pg_visibility.out
+++ b/contrib/pg_visibility/expected/pg_visibility.out
@@ -242,6 +242,40 @@ select * from pg_check_frozen('copyfreeze');
--------
(0 rows)
+-- test copy freeze
+create table copyfreeze_toast (a int, b text);
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze_toast;
+\copy copyfreeze_toast from 'data/toast.data' with (freeze);
+commit;
+select * from pg_visibility_map('copyfreeze_toast');
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+(1 row)
+
+select * from pg_check_frozen('copyfreeze_toast');
+ t_ctid
+--------
+(0 rows)
+
+select * from pg_visibility_map((select reltoastrelid from pg_class where relname = 'copyfreeze_toast'));
+ blkno | all_visible | all_frozen
+-------+-------------+------------
+ 0 | t | t
+ 1 | t | t
+ 2 | t | t
+ 3 | t | t
+ 4 | t | t
+(5 rows)
+
+select * from pg_check_frozen((select reltoastrelid from pg_class where relname = 'copyfreeze_toast'));
+ t_ctid
+--------
+(0 rows)
+
-- cleanup
drop table test_partitioned;
drop view test_view;
diff --git a/contrib/pg_visibility/sql/pg_visibility.sql b/contrib/pg_visibility/sql/pg_visibility.sql
index ff3538f996..58ada78ac7 100644
--- a/contrib/pg_visibility/sql/pg_visibility.sql
+++ b/contrib/pg_visibility/sql/pg_visibility.sql
@@ -170,6 +170,20 @@ commit;
select * from pg_visibility_map('copyfreeze');
select * from pg_check_frozen('copyfreeze');
+-- test copy freeze
+create table copyfreeze_toast (a int, b text);
+
+-- load all rows via COPY FREEZE and ensure that all pages are set all-visible
+-- and all-frozen.
+begin;
+truncate copyfreeze_toast;
+\copy copyfreeze_toast from 'data/toast.data' with (freeze);
+commit;
+select * from pg_visibility_map('copyfreeze_toast');
+select * from pg_check_frozen('copyfreeze_toast');
+select * from pg_visibility_map((select reltoastrelid from pg_class where relname = 'copyfreeze_toast'));
+select * from pg_check_frozen((select reltoastrelid from pg_class where relname = 'copyfreeze_toast'));
+
-- cleanup
drop table test_partitioned;
drop view test_view;