COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Hi,
Jeff Janes raised an issue [1]/messages/by-id/CAMkU=1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ@mail.gmail.com about PD_ALL_VISIBLE not being set correctly
while loading data via COPY FREEZE and had also posted a draft patch.
I now have what I think is a more complete patch. I took a slightly
different approach and instead of setting PD_ALL_VISIBLE bit initially and
then not clearing it during insertion, we now recheck the page for
all-frozen, all-visible tuples just before switching to a new page. This
allows us to then also mark set the visibility map bit, like we do in
vacuumlazy.c
Some special treatment is required to handle the last page before bulk
insert it shutdown. We could have chosen not to do anything special for the
last page and let it remain unfrozen, but I thought it makes sense to take
that extra effort so that we can completely freeze the table and set all VM
bits at the end of COPY FREEZE.
Let me know what you think.
Thanks,
Pavan
[1]: /messages/by-id/CAMkU=1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ@mail.gmail.com
/messages/by-id/CAMkU=1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ@mail.gmail.com
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
copy_freeze_v3.patchapplication/octet-stream; name=copy_freeze_v3.patchDownload+206-6
Hello Pavan,
Thank you for the patch. It seems to me that while performing COPY
FREEZE, if we've copied tuples in a previously emptied page, we can
set the PageSetAllVisible(page) in heap_muli_insert only. Something
like,
bool init = (ItemPointerGetOffsetNumber(&(heaptuples[ndone]->t_self))
== FirstOffsetNumber &&
PageGetMaxOffsetNumber(page) == FirstOffsetNumber + nthispage - 1);
if (init && (options & HEAP_INSERT_FROZEN))
PageSetAllVisible(page);
Later, you can skip the pages for
CheckAndSetAllVisibleBulkInsertState() where PD_ALL_VISIBLE flag is
already set. Do you think it's correct?
On Thu, Feb 21, 2019 at 11:35 AM Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
Hi,
Jeff Janes raised an issue [1] about PD_ALL_VISIBLE not being set correctly while loading data via COPY FREEZE and had also posted a draft patch.
I now have what I think is a more complete patch. I took a slightly different approach and instead of setting PD_ALL_VISIBLE bit initially and then not clearing it during insertion, we now recheck the page for all-frozen, all-visible tuples just before switching to a new page. This allows us to then also mark set the visibility map bit, like we do in vacuumlazy.c
Some special treatment is required to handle the last page before bulk insert it shutdown. We could have chosen not to do anything special for the last page and let it remain unfrozen, but I thought it makes sense to take that extra effort so that we can completely freeze the table and set all VM bits at the end of COPY FREEZE.
Let me know what you think.
Thanks,
Pavan[1] /messages/by-id/CAMkU=1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ@mail.gmail.com
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On Thu, 21 Feb 2019 at 15:38, Kuntal Ghosh <kuntalghosh.2007@gmail.com>
wrote:
Thank you for the patch. It seems to me that while performing COPY
FREEZE, if we've copied tuples in a previously emptied page
There won't be any previously emptied pages because of the pre-conditions
required for FREEZE.
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Feb 26, 2019 at 6:46 PM Simon Riggs <simon@2ndquadrant.com> wrote:
On Thu, 21 Feb 2019 at 15:38, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
Thank you for the patch. It seems to me that while performing COPY
FREEZE, if we've copied tuples in a previously emptied pageThere won't be any previously emptied pages because of the pre-conditions required for FREEZE.
Right, I missed that part. Thanks for pointing that out. But, this
optimization is still possible for copying frozen tuples in a newly
created page, right? If current backend allocates a new page, copies a
bunch of frozen tuples in that page, it can set the PD_ALL_VISIBLE in
the same operation.
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On Thu, Feb 21, 2019 at 1:05 AM Pavan Deolasee <pavan.deolasee@gmail.com>
wrote:
Hi,
Jeff Janes raised an issue [1] about PD_ALL_VISIBLE not being set
correctly while loading data via COPY FREEZE and had also posted a draft
patch.I now have what I think is a more complete patch. I took a slightly
different approach and instead of setting PD_ALL_VISIBLE bit initially and
then not clearing it during insertion, we now recheck the page for
all-frozen, all-visible tuples just before switching to a new page. This
allows us to then also mark set the visibility map bit, like we do in
vacuumlazy.cSome special treatment is required to handle the last page before bulk
insert it shutdown. We could have chosen not to do anything special for the
last page and let it remain unfrozen, but I thought it makes sense to take
that extra effort so that we can completely freeze the table and set all VM
bits at the end of COPY FREEZE.Let me know what you think.
Hi Pavan, thanks for picking this up.
After doing a truncation and '\copy ... with (freeze)' of a table with long
data, I find that the associated toast table has a handful of unfrozen
blocks. I don't know if that is an actual problem, but it does seem a bit
odd, and thus suspicious.
perl -le 'print join "", map rand(), 1..500 foreach 1..1000000' > foo
create table foobar1 (x text);
begin;
truncate foobar1;
\copy foobar1 from foo with (freeze)
commit;
select all_visible,all_frozen,pd_all_visible, count(*) from
pg_visibility('pg_toast.pg_toast_25085') group by 1,2,3;
all_visible | all_frozen | pd_all_visible | count
-------------+------------+----------------+---------
f | f | f | 18
t | t | t | 530,361
(2 rows)
Cheers,
Jeff
Show quoted text
On Wed, Feb 27, 2019 at 7:05 AM Jeff Janes <jeff.janes@gmail.com> wrote:
After doing a truncation and '\copy ... with (freeze)' of a table with
long data, I find that the associated toast table has a handful of unfrozen
blocks. I don't know if that is an actual problem, but it does seem a bit
odd, and thus suspicious.
Hi Jeff, thanks for looking at it and the test. I can reproduce the problem
and quite curiously block number 1 and then every 32672th block is getting
skipped.
postgres=# select * from pg_visibility('pg_toast.pg_toast_16384') where
all_visible = 'f';
blkno | all_visible | all_frozen | pd_all_visible
--------+-------------+------------+----------------
1 | f | f | f
32673 | f | f | f
65345 | f | f | f
98017 | f | f | f
130689 | f | f | f
163361 | f | f | f
<snip>
Having investigated this a bit, I see that a relcache invalidation arrives
after 1st and then after every 32672th block is filled. That clears the
rel->rd_smgr field and we lose the information about the saved target
block. The code then moves to extend the relation again and thus skips the
previously less-than-half filled block, losing the free space in that block.
postgres=# SELECT * FROM
page_header(get_raw_page('pg_toast.pg_toast_16384', 0));
lsn | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
------------+----------+-------+-------+-------+---------+----------+---------+-----------
1/15B37748 | 0 | 4 | 40 | 64 | 8192 | 8192 |
4 | 0
(1 row)
postgres=# SELECT * FROM
page_header(get_raw_page('pg_toast.pg_toast_16384', 1));
lsn | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
------------+----------+-------+-------+-------+---------+----------+---------+-----------
1/15B39A28 | 0 | 4 | 28 | 7640 | 8192 | 8192 |
4 | 0
(1 row)
postgres=# SELECT * FROM
page_header(get_raw_page('pg_toast.pg_toast_16384', 2));
lsn | checksum | flags | lower | upper | special | pagesize |
version | prune_xid
------------+----------+-------+-------+-------+---------+----------+---------+-----------
1/15B3BE08 | 0 | 4 | 40 | 64 | 8192 | 8192 |
4 | 0
(1 row)
So the block 1 has a large amount of free space (upper - lower), which
never gets filled.
I am not yet sure what causes the relcache invalidation at regular
intervals. But if I have to guess, it could be because of a new VM (or
FSM?) page getting allocated. I am bit puzzled because this issue seems to
only occur with toast tables since I tested the patch while writing it on a
regular table and did not see any block remaining unfrozen. I tested only
upto 450 blocks, but that shouldn't matter because with your test, we see
the problem with block 1 as well. So something to look into in more detail.
While we could potentially fix this by what you'd done in the original
patch and what Kuntal also suggested, i.e. by setting the PD_ALL_VISIBLE
bit during page initialisation itself, I am a bit circumspect about that
approach for two reasons:
1. It requires us to then add extra logic to avoid clearing the bit during
insertions
2. It requires us to also update the VM bit during page init or risk having
divergent views on the page-level bit and the VM bit.
And even if we do that, this newly discovered problem of less-than-half
filled intermediate blocks remain. I wonder if we should instead track the
last used block in BulkInsertState and if the relcache invalidation flushes
smgr, start inserting again from the last saved block. In fact, we already
track the last used buffer in BulkInsertState and that's enough to know the
last used block.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Feb 21, 2019 at 3:05 PM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
Hi,
Jeff Janes raised an issue [1] about PD_ALL_VISIBLE not being set correctly while loading data via COPY FREEZE and had also posted a draft patch.
I now have what I think is a more complete patch. I took a slightly different approach and instead of setting PD_ALL_VISIBLE bit initially and then not clearing it during insertion, we now recheck the page for all-frozen, all-visible tuples just before switching to a new page. This allows us to then also mark set the visibility map bit, like we do in vacuumlazy.c
I might be missing something but why do we need to recheck whether
each pages is all-frozen after insertion? I wonder if we can set
all-frozen without checking all tuples again in this case.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Mon, Mar 11, 2019 at 1:37 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
I might be missing something but why do we need to recheck whether
each pages is all-frozen after insertion? I wonder if we can set
all-frozen without checking all tuples again in this case.
It's possible that the user may have inserted unfrozen rows (via regular
INSERT or COPY without FREEZE option) before inserting frozen rows. So we
can't set the all-visible/all-frozen flag unconditionally. I also find it
safer to do an explicit check to ensure we never accidentally mark a page
as all-frozen. Since the check is performed immediately after the page
becomes full and only once per page, there shouldn't be any additional IO
cost and the check should be quite fast.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Mar 12, 2019 at 4:54 PM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
On Mon, Mar 11, 2019 at 1:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I might be missing something but why do we need to recheck whether
each pages is all-frozen after insertion? I wonder if we can set
all-frozen without checking all tuples again in this case.It's possible that the user may have inserted unfrozen rows (via regular INSERT or COPY without FREEZE option) before inserting frozen rows.
I think that since COPY FREEZE can be executed only when the table is
created or truncated within the transaction other users cannot insert
any rows during COPY FREEZE.
So we can't set the all-visible/all-frozen flag unconditionally. I also find it safer to do an explicit check to ensure we never accidentally mark a page as all-frozen. Since the check is performed immediately after the page becomes full and only once per page, there shouldn't be any additional IO cost and the check should be quite fast.
I'd suggest to measure performance overhead. I can imagine one use
case of COPY FREEZE is the loading a very large table. Since in
addition to set visibility map bits this patch could scan a very large
table I'm concerned that how much performance is degraded by this
patch.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Wed, Mar 13, 2019 at 11:37 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
I think that since COPY FREEZE can be executed only when the table is
created or truncated within the transaction other users cannot insert
any rows during COPY FREEZE.
Right. But the truncating transaction can insert unfrozen rows into the
table before inserting more rows via COPY FREEZE.
postgres=# CREATE EXTENSION pageinspect ;
CREATE EXTENSION
postgres=# BEGIN;
BEGIN
postgres=# TRUNCATE testtab ;
TRUNCATE TABLE
postgres=# INSERT INTO testtab VALUES (100, 200);
INSERT 0 1
postgres=# COPY testtab FROM STDIN WITH (FREEZE);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
1 2
2 3
\.
COPY 2
postgres=# COMMIT;
postgres=# SELECT lp, to_hex(t_infomask) FROM
heap_page_items(get_raw_page('testtab', 0));
lp | to_hex
----+--------
1 | 800
2 | b00
3 | b00
(3 rows)
The first row in inserted by regular insert and it's not frozen. The next 2
are frozen. We can't mark such as page all-visible, all-frozen.
I'd suggest to measure performance overhead. I can imagine one use
case of COPY FREEZE is the loading a very large table. Since in
addition to set visibility map bits this patch could scan a very large
table I'm concerned that how much performance is degraded by this
patch.
Ok. I will run some tests. But please note that this patch is a bug fix to
address the performance issue that is caused by having to rewrite the
entire table when all-visible bit is set on the page during first vacuum.
So while we may do some more work during COPY FREEZE, we're saving a lot of
page writes during next vacuum. Also, since the scan that we are doing in
this patch is done on a page that should be in the buffer cache, we will
pay a bit in terms of CPU cost, but not anything in terms of IO cost.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Mar 14, 2019 at 5:17 PM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
On Wed, Mar 13, 2019 at 11:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I think that since COPY FREEZE can be executed only when the table is
created or truncated within the transaction other users cannot insert
any rows during COPY FREEZE.Right. But the truncating transaction can insert unfrozen rows into the table before inserting more rows via COPY FREEZE.
postgres=# CREATE EXTENSION pageinspect ;
CREATE EXTENSION
postgres=# BEGIN;
BEGIN
postgres=# TRUNCATE testtab ;
TRUNCATE TABLE
postgres=# INSERT INTO testtab VALUES (100, 200);
INSERT 0 1
postgres=# COPY testtab FROM STDIN WITH (FREEZE);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.1 2
2 3
\.COPY 2
postgres=# COMMIT;postgres=# SELECT lp, to_hex(t_infomask) FROM heap_page_items(get_raw_page('testtab', 0));
lp | to_hex
----+--------
1 | 800
2 | b00
3 | b00
(3 rows)The first row in inserted by regular insert and it's not frozen. The next 2 are frozen. We can't mark such as page all-visible, all-frozen.
Understood. Thank you for explanation!
I'd suggest to measure performance overhead. I can imagine one use
case of COPY FREEZE is the loading a very large table. Since in
addition to set visibility map bits this patch could scan a very large
table I'm concerned that how much performance is degraded by this
patch.Ok. I will run some tests. But please note that this patch is a bug fix to address the performance issue that is caused by having to rewrite the entire table when all-visible bit is set on the page during first vacuum. So while we may do some more work during COPY FREEZE, we're saving a lot of page writes during next vacuum. Also, since the scan that we are doing in this patch is done on a page that should be in the buffer cache, we will pay a bit in terms of CPU cost, but not anything in terms of IO cost.
Agreed. I had been misunderstanding this patch. The page scan during
COPY FREEZE is necessary and it's very cheaper than doing in the first
vacuum.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Hi Pavan,
On 3/14/19 2:20 PM, Masahiko Sawada wrote:
On Thu, Mar 14, 2019 at 5:17 PM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
Ok. I will run some tests. But please note that this patch is a bug fix to address the performance issue that is caused by having to rewrite the entire table when all-visible bit is set on the page during first vacuum. So while we may do some more work during COPY FREEZE, we're saving a lot of page writes during next vacuum. Also, since the scan that we are doing in this patch is done on a page that should be in the buffer cache, we will pay a bit in terms of CPU cost, but not anything in terms of IO cost.
Agreed. I had been misunderstanding this patch. The page scan during
COPY FREEZE is necessary and it's very cheaper than doing in the first
vacuum.
I have removed Ibrar as a reviewer since there has been no review from
them in three weeks, and too encourage others to have a look.
Regards,
--
-David
david@pgmasters.net
On Thu, Mar 14, 2019 at 3:54 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
Ok. I will run some tests. But please note that this patch is a bug fix
to address the performance issue that is caused by having to rewrite the
entire table when all-visible bit is set on the page during first vacuum.
So while we may do some more work during COPY FREEZE, we're saving a lot of
page writes during next vacuum. Also, since the scan that we are doing in
this patch is done on a page that should be in the buffer cache, we will
pay a bit in terms of CPU cost, but not anything in terms of IO cost.Agreed. I had been misunderstanding this patch. The page scan during
COPY FREEZE is necessary and it's very cheaper than doing in the first
vacuum.
Thanks for agreeing to the need of this bug fix. I ran some simple tests
anyways and here are the results.
The test consists of a simple table with three columns, two integers and
one char(100). I then ran COPY (FREEZE), loading 7M rows, followed by a
VACUUM. The total size of the raw data is about 800MB and the table size in
Postgres is just under 1GB. The results for 3 runs in milliseconds are:
Master:
COPY FREEZE: 40243.725 40309.675 40783.836
VACUUM: 2685.871 2517.445 2508.452
Patched:
COPY FREEZE: 40942.410 40495.303 40638.075
VACUUM: 25.067 35.793 25.390
So there is a slight increase in the time to run COPY FREEZE, but a
significant reduction in time to VACUUM the table. The benefits will only
go up if the table is vacuumed much later when most of the pages are
already written to the disk and removed from shared buffers and/or kernel
cache.
I hope this satisfies your doubts regarding performance implications of the
patch.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested
This patch is particularly helpful in processing OpenStreetMap Data in PostGIS.
OpenStreetMap is imported as a stream of 300-900 (depending on settings) gigabytes, that are needing a VACUUM after a COPY FREEZE.
With this patch, the first and usually the last transforming query is performed much faster after initial load.
I have read this patch and have no outstanding comments on it.
Pavan Deolasee demonstrates the expected speed improvement.
The new status of this patch is: Ready for Committer
On Thu, Mar 21, 2019 at 11:27 PM Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
On Thu, Mar 14, 2019 at 3:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Ok. I will run some tests. But please note that this patch is a bug fix to address the performance issue that is caused by having to rewrite the entire table when all-visible bit is set on the page during first vacuum. So while we may do some more work during COPY FREEZE, we're saving a lot of page writes during next vacuum. Also, since the scan that we are doing in this patch is done on a page that should be in the buffer cache, we will pay a bit in terms of CPU cost, but not anything in terms of IO cost.
Agreed. I had been misunderstanding this patch. The page scan during
COPY FREEZE is necessary and it's very cheaper than doing in the first
vacuum.Thanks for agreeing to the need of this bug fix. I ran some simple tests anyways and here are the results.
The test consists of a simple table with three columns, two integers and one char(100). I then ran COPY (FREEZE), loading 7M rows, followed by a VACUUM. The total size of the raw data is about 800MB and the table size in Postgres is just under 1GB. The results for 3 runs in milliseconds are:
Master:
COPY FREEZE: 40243.725 40309.675 40783.836
VACUUM: 2685.871 2517.445 2508.452Patched:
COPY FREEZE: 40942.410 40495.303 40638.075
VACUUM: 25.067 35.793 25.390So there is a slight increase in the time to run COPY FREEZE, but a significant reduction in time to VACUUM the table. The benefits will only go up if the table is vacuumed much later when most of the pages are already written to the disk and removed from shared buffers and/or kernel cache.
I hope this satisfies your doubts regarding performance implications of the patch.
Thank you for the performance testing, that's a great improvement!
I've looked at the patch and have comments and questions.
+ /*
+ * While we are holding the lock on the page, check if all tuples
+ * in the page are marked frozen at insertion. We can safely mark
+ * such page all-visible and set visibility map bits too.
+ */
+ if (CheckPageIsAllFrozen(relation, buffer))
+ PageSetAllVisible(page);
+
+ MarkBufferDirty(buffer);
Maybe we don't need to mark the buffer dirty if the page is not set all-visible.
-----
+ if (PageIsAllVisible(page))
+ {
+ uint8 vm_status = visibilitymap_get_status(relation,
+ targetBlock, &myvmbuffer);
+ uint8 flags = 0;
+
+ /* Set the VM all-frozen bit to flag, if needed */
+ if ((vm_status & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ flags |= VISIBILITYMAP_ALL_VISIBLE;
+ if ((vm_status & VISIBILITYMAP_ALL_FROZEN) == 0)
+ flags |= VISIBILITYMAP_ALL_FROZEN;
+
+ Assert(BufferIsValid(myvmbuffer));
+ if (flags != 0)
+ visibilitymap_set(relation, targetBlock, buffer, InvalidXLogRecPtr,
+ myvmbuffer, InvalidTransactionId, flags);
Since CheckPageIsAllFrozen() is used only when inserting frozen tuples
CheckAndSetPageAllVisible() seems to be implemented for the same
situation. If we have CheckAndSetPageAllVisible() for only this
situation we would rather need to check that the VM status of the page
should be 0 and then set two flags to the page? The 'flags' will
always be (VISIBILITYMAP_ALL_FROZEN | VISIBILITYMAP_ALL_VISIBLE) in
copy freeze case. I'm confused that this function has both code that
assumes some special situations and code that can be used in generic
situations.
-----
Perhaps we can add some tests for this feature to pg_visibility module.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Fri, Mar 22, 2019 at 12:19 PM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
I've looked at the patch and have comments and questions.
+ /* + * While we are holding the lock on the page, check if all tuples + * in the page are marked frozen at insertion. We can safely mark + * such page all-visible and set visibility map bits too. + */ + if (CheckPageIsAllFrozen(relation, buffer)) + PageSetAllVisible(page); + + MarkBufferDirty(buffer);Maybe we don't need to mark the buffer dirty if the page is not set
all-visible.
Yeah, makes sense. Fixed.
If we have CheckAndSetPageAllVisible() for only this
situation we would rather need to check that the VM status of the page
should be 0 and then set two flags to the page? The 'flags' will
always be (VISIBILITYMAP_ALL_FROZEN | VISIBILITYMAP_ALL_VISIBLE) in
copy freeze case. I'm confused that this function has both code that
assumes some special situations and code that can be used in generic
situations.
If a second COPY FREEZE is run within the same transaction and if starts
inserting into the page used by the previous COPY FREEZE, then the page
will already be marked all-visible/all-frozen. So we can skip repeating the
operation again. While it's quite unlikely that someone will do that and I
can't think of a situation where only one of those flags will be set, I
don't see a harm in keeping the code as is. This code is borrowed from
vacuumlazy.c and at some point we can even move it to some common location.
Perhaps we can add some tests for this feature to pg_visibility module.
That's a good idea. Please see if the tests included in the attached patch
are enough.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
copy_freeze_v4.patchapplication/octet-stream; name=copy_freeze_v4.patchDownload+348-6
Thank you for sharing the updated patch!
On Tue, Mar 26, 2019 at 6:26 PM Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
On Fri, Mar 22, 2019 at 12:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I've looked at the patch and have comments and questions.
+ /* + * While we are holding the lock on the page, check if all tuples + * in the page are marked frozen at insertion. We can safely mark + * such page all-visible and set visibility map bits too. + */ + if (CheckPageIsAllFrozen(relation, buffer)) + PageSetAllVisible(page); + + MarkBufferDirty(buffer);Maybe we don't need to mark the buffer dirty if the page is not set all-visible.
Yeah, makes sense. Fixed.
If we have CheckAndSetPageAllVisible() for only this
situation we would rather need to check that the VM status of the page
should be 0 and then set two flags to the page? The 'flags' will
always be (VISIBILITYMAP_ALL_FROZEN | VISIBILITYMAP_ALL_VISIBLE) in
copy freeze case. I'm confused that this function has both code that
assumes some special situations and code that can be used in generic
situations.If a second COPY FREEZE is run within the same transaction and if starts inserting into the page used by the previous COPY FREEZE, then the page will already be marked all-visible/all-frozen. So we can skip repeating the operation again. While it's quite unlikely that someone will do that and I can't think of a situation where only one of those flags will be set, I don't see a harm in keeping the code as is. This code is borrowed from vacuumlazy.c and at some point we can even move it to some common location.
Thank you for explanation, agreed.
Perhaps we can add some tests for this feature to pg_visibility module.
That's a good idea. Please see if the tests included in the attached patch are enough.
The patch looks good to me. There is no comment from me.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Wed, Mar 27, 2019 at 9:47 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
The patch looks good to me. There is no comment from me.
Thanks for your review! Updated patch attached since patch failed to apply
after recent changes in the master.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
copy_freeze_v5.patchapplication/octet-stream; name=copy_freeze_v5.patchDownload+348-7
Hi,
I've been looking at this patch for a while, and it seems pretty much RFC,
so barring objections I'll take care of that once I do a bit more testing
and review. Unless someone else wants to take care of that.
FWIW I wonder if we should add the code for partitioned tables to
CopyFrom, considering that's unsupported and so can't be tested etc. It's
not a huge amount of code, of course.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 2019-04-03 10:19:17 +0530, Pavan Deolasee wrote:
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index c1fd7b78ce..09df70a3ac 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -2833,6 +2833,15 @@ CopyFrom(CopyState cstate) !has_instead_insert_row_trig && resultRelInfo->ri_FdwRoutine == NULL;+ /* + * Note: As of PG12, COPY FREEZE is not supported on + * partitioned table. Nevertheless have this check in place so + * that we do the right thing if it ever gets supported. + */ + if (ti_options & TABLE_INSERT_FROZEN) + CheckAndSetAllVisibleBulkInsertState(resultRelInfo->ri_RelationDesc, + bistate); + /* * We'd better make the bulk insert mechanism gets a new * buffer when the partition being inserted into changes. @@ -3062,6 +3071,15 @@ CopyFrom(CopyState cstate) firstBufferedLineNo); }+ /* + * If we are inserting frozen tuples, check if the last page used can also + * be marked as all-visible and all-frozen. This ensures that a table can + * be fully frozen when the data is loaded. + */ + if (ti_options & TABLE_INSERT_FROZEN) + CheckAndSetAllVisibleBulkInsertState(resultRelInfo->ri_RelationDesc, + bistate); + /* Done, clean up */ error_context_stack = errcallback.previous;
I'm totally not OK with this from a layering
POV. CheckAndSetAllVisibleBulkInsertState is entirely heap specific
(without being named such), whereas all the heap specific bits are
getting moved below tableam.
Greetings,
Andres Freund