Freeze avoidance of very large table.
Hi all,
I'd like to propose a read-only table feature as a way to avoid full scans
of very large tables.
A WIP patch is attached.
- Background
Postgres can keep a tuple forever by freezing it, but freezing tuples
requires scanning the whole table.
That can negatively affect system performance, especially in very
large database systems.
There is no command that guarantees a whole table has been
completely frozen,
so postgres needs to keep freezing tuples even if we have not written to the table at all.
We need a DDL command that ensures all tuples are frozen and marks the table
as read-only, as one way to avoid full scans of very large
tables.
This topic has already been discussed before; it was originally proposed by Simon.
- Feature
I tried to implement this feature as ALTER TABLE SET READ ONLY
and ALTER TABLE SET READ WRITE.
The feature I'm imagining is attached to this mail as a WIP patch.
The patch does the following.
* Add a new column relreadonly to pg_class.
* Add new syntax ALTER TABLE SET READ ONLY and ALTER TABLE SET READ WRITE.
* When marking a table read-only, all tuples of the table are frozen in
one pass while holding ShareLock (like VACUUM FREEZE),
and then pg_class.relreadonly is set to true.
* When un-marking read-only, just set pg_class.relreadonly to false.
* If the table has a TOAST table, the TOAST table is marked as well at the same time.
* Writes and vacuum on a read-only table are completely restricted
or ignored,
e.g., INSERT, UPDATE, DELETE, explicit vacuum, autovacuum.
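The intended enforcement can be pictured with a small standalone sketch (plain C, not the patch's actual code; the struct and function names here are made up for illustration):

```c
#include <stdbool.h>

/* Stand-in for a relation's pg_class entry; the patch adds a
 * 'relreadonly' boolean column (name per the proposal above). */
typedef struct RelationData
{
    const char *relname;
    bool        relreadonly;
} RelationData;

/* Sketch of the check a DML path would make before modifying a table:
 * writes are rejected while the relation is marked read-only. */
static bool
CheckRelationNotReadOnly(const RelationData *rel)
{
    return !rel->relreadonly;
}

/* Sketch of how (auto)vacuum would skip read-only tables entirely. */
static bool
relation_needs_vacuum(const RelationData *rel, bool has_dead_tuples)
{
    if (rel->relreadonly)
        return false;
    return has_dead_tuples;
}
```

The same relreadonly flag gates both paths, which is why the proposal can promise that a read-only table is never rewritten or re-vacuumed.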
There are a few problems, though none are critical.
* Freezing all tuples is quite similar to VACUUM FREEZE,
but calling lazy_vacuum_rel() would be overkill, I think.
* The lock level needs consideration.
Please give me feedback.
Regards,
-------
Sawada Masahiko
Attachments:
000_read_only_table_v0.patch (text/x-patch, +308/-21)
On 4/3/15 12:59 AM, Sawada Masahiko wrote:
+ case HEAPTUPLE_LIVE:
+ case HEAPTUPLE_RECENTLY_DEAD:
+ case HEAPTUPLE_INSERT_IN_PROGRESS:
+ case HEAPTUPLE_DELETE_IN_PROGRESS:
+     if (heap_prepare_freeze_tuple(tuple.t_data, freezelimit,
+                                   mxactcutoff, &frozen[nfrozen]))
+         frozen[nfrozen++].offset = offnum;
+     break;
This doesn't seem safe enough to me. Can't there be tuples that are
still new enough that they can't be frozen, and are still live? I don't
think it's safe to leave tuples as dead either, even if they're hinted.
The hint may not be written. Also, the patch seems to be completely
ignoring actually freezing the toast relation; I can't see how that's
actually safe.
I'd feel a heck of a lot safer if any time heap_prepare_freeze_tuple
returned false we did a second check on the tuple to ensure it was truly
frozen.
Somewhat related... instead of forcing the freeze to happen
synchronously, can't we set this up so a table is in one of three
states? Read/Write, Read Only, Frozen. AT_SetReadOnly and
AT_SetReadWrite would simply change to the appropriate state, and all
the vacuum infrastructure would continue to process those tables as it
does today. lazy_vacuum_rel would become responsible for tracking if
there were any non-frozen tuples if it was also attempting a freeze. If
it discovered there were none, AND the table was marked as ReadOnly,
then it would change the table state to Frozen and set relfrozenxid =
InvalidTransactionId and relminmxid = InvalidMultiXactId. AT_SetReadWrite
could change relfrozenxid to its own Xid as an optimization. Doing it
that way leaves all the complicated vacuum code in one place, and would
eliminate concerns about race conditions with still running
transactions, etc.
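The three-state idea above can be sketched as a tiny state machine (all names here are hypothetical; the real patch would track this state in pg_class and do the promotion inside the vacuum code):

```c
#include <stdbool.h>

/* Hypothetical three-state model; names are illustrative. */
typedef enum
{
    REL_READ_WRITE,
    REL_READ_ONLY,
    REL_FROZEN
} RelFreezeState;

/* ALTER TABLE ... SET READ ONLY / SET READ WRITE just flip the state;
 * neither one scans the table. */
static RelFreezeState
at_set_read_only(void)
{
    return REL_READ_ONLY;
}

static RelFreezeState
at_set_read_write(void)
{
    return REL_READ_WRITE;
}

/* Only vacuum promotes ReadOnly to Frozen, and only if the state was
 * ReadOnly when vacuum started, stayed ReadOnly throughout, and no
 * unfrozen tuple was seen during the scan. */
static RelFreezeState
vacuum_maybe_promote(RelFreezeState state_at_start,
                     RelFreezeState state_now,
                     bool saw_unfrozen_tuple)
{
    if (state_at_start == REL_READ_ONLY &&
        state_now == REL_READ_ONLY &&
        !saw_unfrozen_tuple)
        return REL_FROZEN;
    return state_now;
}
```

Keeping the promotion decision entirely inside the vacuum path is what avoids the race conditions with still-running transactions that the paragraph above mentions.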
BTW, you also need to put things in place to ensure it's impossible to
unfreeze a tuple in a relation that's marked ReadOnly or Frozen. I'm not
sure what the right way to do that would be.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 4/4/15 5:10 PM, Jim Nasby wrote:
[...]
BTW, you also need to put things in place to ensure it's impossible to
unfreeze a tuple in a relation that's marked ReadOnly or Frozen. I'm not
sure what the right way to do that would be.
Answering my own question... I think visibilitymap_clear() would be the
right place. AFAICT this is basically as critical as clearing the VM,
and that function has the Relation, so it can see what mode the relation
is in.
There is another possibility here, too. We can completely divorce a
ReadOnly mode (which I think is useful for other things besides
freezing) from the question of whether we need to force-freeze a
relation if we create a FrozenMap, similar to the visibility map. This
has the added advantage of helping freeze scans on relations that are
not ReadOnly in the case of tables that are insert-mostly or any other
pattern where most pages stay all-frozen.
Prior to the visibility map this would have been a rather daunting
project, but I believe this could piggyback on the VM code rather
nicely. Anytime you clear the VM you clearly must clear the FrozenMap as
well. The logic for setting the FM is clearly different, but that would
be entirely self-contained to vacuum. Unlike the VM, I don't see any
point to marking special bits in the page itself for FM.
It would be nice if each bit in the FM covered multiple pages, but that
can be optimized later.
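The FrozenMap invariant described above (clearing a VM bit must also clear the FM bit) can be modeled with a toy bitmap pair; this is illustrative C only, since the real visibility map lives in shared buffers and is WAL-logged:

```c
#include <stdbool.h>

/* Toy visibility map + frozen map, one bit per heap page. */
#define TOY_MAX_PAGES 1024

typedef struct
{
    unsigned char vm[TOY_MAX_PAGES / 8];  /* all-visible bits */
    unsigned char fm[TOY_MAX_PAGES / 8];  /* all-frozen bits  */
} PageMaps;

static void set_bit(unsigned char *map, int page)
{
    map[page / 8] |= (unsigned char) (1 << (page % 8));
}

static void clear_bit(unsigned char *map, int page)
{
    map[page / 8] &= (unsigned char) ~(1 << (page % 8));
}

static bool test_bit(const unsigned char *map, int page)
{
    return (map[page / 8] >> (page % 8)) & 1;
}

/* Vacuum sets the FM bit only for pages it proved all-frozen; such a
 * page must already be all-visible. */
static void fm_set_frozen(PageMaps *maps, int page)
{
    if (test_bit(maps->vm, page))
        set_bit(maps->fm, page);
}

/* Any DML that clears a page's visibility bit must clear its frozen
 * bit too -- the invariant argued for above. */
static void vm_clear(PageMaps *maps, int page)
{
    clear_bit(maps->vm, page);
    clear_bit(maps->fm, page);
}
```

Because the FM is only ever cleared where the VM is cleared, a set FM bit always implies a set VM bit, which is what lets a freeze scan skip those pages.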
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On Sat, Apr 4, 2015 at 3:10 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 4/3/15 12:59 AM, Sawada Masahiko wrote:
[code snippet trimmed] This doesn't seem safe enough to me. Can't there be tuples that are still
new enough that they can't be frozen, and are still live?
Yep. I've set a table to read only while it contained unfreezable tuples,
and the tuples remain unfrozen yet the read-only action claims to have
succeeded.
Somewhat related... instead of forcing the freeze to happen synchronously,
can't we set this up so a table is in one of three states? Read/Write, Read
Only, Frozen. [...]
+1 here as well. I might want to set tables to read only for reasons other
than to avoid repeated freezing (after all, the name of the command
suggests it is a general-purpose thing), and I wouldn't want to automatically
trigger a vacuum freeze in the process.
Cheers,
Jeff
On Sun, Apr 5, 2015 at 8:21 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
[...]
Thank you for comments.
Somewhat related... instead of forcing the freeze to happen synchronously, can't we set this up so a table is in one of three states? Read/Write, Read Only, Frozen. [...]
I agree with the three states Read/Write, ReadOnly and Frozen.
But I'm not sure when we should freeze the tuples, i.e., scan the whole table.
I think that any changes to the table are completely
ignored/restricted if the table is marked as a ReadOnly table,
and marking it as ReadOnly is accompanied by freezing its tuples.
A Frozen table ensures that all tuples of the table have been completely
frozen, so it needs to scan the whole table as well;
i.e., we would need to scan the whole table twice, right?
[...] We can completely divorce a ReadOnly mode from the question of whether we need to force-freeze a relation if we create a FrozenMap, similar to the visibility map. [...]
I was actually thinking of this idea (an FM) to avoid freezing all tuples.
As you said, it might not be a good idea (or overkill) for the reason
for setting a table to read-only to be avoidance of repeated freezing.
I'm attempting to design an FM to avoid freezing relations as well.
Is it enough for each FM bit to indicate that the corresponding
page is completely frozen?
Regards,
-------
Sawada Masahiko
On 4/6/15 1:46 AM, Sawada Masahiko wrote:
[...]
e.g., we should need to scan whole table at two times. right?
No. You would be free to set a table as ReadOnly any time you wanted,
without scanning anything. All that setting does is disable any DML on
the table.
The Frozen state would only be set by the vacuum code, IFF:
- The table state is ReadOnly *at the start of vacuum* and did not
change during vacuum
- Vacuum ensured that there were no un-frozen tuples in the table
That does not necessitate 2 scans.
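The single-scan argument can be sketched like this (toy types only; real vacuum compares a tuple's xmin against a cutoff xid rather than consulting a 'freezable' flag):

```c
#include <stdbool.h>

/* Toy tuple: 'freezable' stands in for the real xmin-vs-cutoff test. */
typedef struct
{
    bool frozen;
    bool freezable;
} ToyTuple;

/* One pass over the table: freeze what can be frozen, and remember
 * whether anything unfrozen remains.  If this returns true and the
 * table was ReadOnly the whole time, the caller may mark it Frozen --
 * no second scan required. */
static bool
vacuum_scan_all_frozen(ToyTuple *tuples, int ntuples)
{
    bool all_frozen = true;

    for (int i = 0; i < ntuples; i++)
    {
        if (tuples[i].frozen)
            continue;
        if (tuples[i].freezable)
            tuples[i].frozen = true;   /* frozen in this same pass */
        else
            all_frozen = false;        /* still too new; try next vacuum */
    }
    return all_frozen;
}
```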
[...]
Is it enough that each bit of FM has information that corresponding
pages are completely frozen on each bit?
If I'm understanding your implied question correctly, I don't think
there would actually be any relationship between FM and marking
ReadOnly. It would come into play if we wanted to do the Frozen state,
but if we have the FM, marking an entire relation as Frozen becomes a
lot less useful. What's going to happen with a VACUUM FREEZE once we
have FM is that vacuum will be able to skip reading pages if they are
all-visible *and* the FM shows them as frozen, whereas today we can't
use the VM to skip pages if scan_all is true.
For simplicity, I would start out with each FM bit representing a single
page. That means the FM would be very similar in operation to the VM;
the only difference would be when a bit in the FM was set. I would
absolutely split this into 2 patches as well; one for ReadOnly (and skip
the Frozen status for now), and one for FM.
When I looked at the VM code briefly it occurred to me that it might be
quite difficult to have 1 FM bit represent multiple pages. The issue is
the locking necessary between VACUUM and clearing a FM bit. In the VM
that's handled by the cleanup lock, but that will only work at a page
level. We'd need something to ensure that nothing came in and performed
DML while the vacuum code was getting ready to set a FM bit. There's
probably several ways this could be accomplished, but I think it would
be foolish to try and do anything about it in the initial patch.
Especially because it's only supposition that there would be much
benefit to having multiple pages per bit.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On Mon, Apr 6, 2015 at 10:17 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
[...]
No. You would be free to set a table as ReadOnly any time you wanted,
without scanning anything. All that setting does is disable any DML on the
table. The Frozen state would only be set by the vacuum code, IFF:
- The table state is ReadOnly *at the start of vacuum* and did not change
during vacuum
- Vacuum ensured that there were no un-frozen tuples in the table
That does not necessitate 2 scans.
I understood this concept, and have a question, as I wrote below.
[...]
For simplicity, I would start out with each FM bit representing a single
page. That means the FM would be very similar in operation to the VM; the
only difference would be when a bit in the FM was set. I would absolutely
split this into 2 patches as well; one for ReadOnly (and skip the Frozen
status for now), and one for FM. [...]
Yes, I will separate the patch into two patches.
I'd like to confirm whether what I'm thinking is correct here.
In the first version of the patch, each FM bit represents a single page
and indicates whether all tuples of that page have been completely
frozen; that would be one patch.
The second patch adds the three states and the read-only table, which
disables any writes to the table. The trigger that changes the state from
Read/Write to Read-Only is ALTER TABLE SET READ ONLY. The trigger that
changes from Read-Only to Frozen is vacuum, and only when the table was
marked Read-Only at the time the vacuum started *and* the vacuum found no
unfrozen tuples (counting pages skipped by referring to the FM as frozen).
If we support the FM, we would be able to avoid repeatedly freezing the
whole table even if the table has not been marked Read-Only.
In order to change the state to Frozen, we need to run VACUUM FREEZE or
wait for autovacuum. Generally, the cutoff xid threshold differs between
VACUUM (and autovacuum) and VACUUM FREEZE, so we could not expect to
change the state via explicit vacuum or autovacuum. Inevitably, we would
need to run both ALTER TABLE SET READ ONLY and VACUUM FREEZE to change
the state to Frozen.
I think we should also add DDL that does both freezing tuples and
changing the state in one pass, like ALTER TABLE SET READ ONLY WITH FREEZE
or ALTER TABLE SET FROZEN.
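The proposed one-shot DDL could behave roughly like this sketch (the command name and everything here are hypothetical, not from the actual patch):

```c
#include <stdbool.h>

/* Hypothetical one-shot path for ALTER TABLE ... SET READ ONLY WITH
 * FREEZE (or SET FROZEN). */
typedef enum { TOY_RW, TOY_RO, TOY_FROZEN } ToyState;

static ToyState
set_read_only_with_freeze(int nunfrozen_after_freeze_pass)
{
    ToyState s = TOY_RO;        /* first: no further DML can arrive */

    /* then run an aggressive freeze pass; if it left nothing unfrozen,
     * promote straight to Frozen, otherwise stay ReadOnly and let a
     * later vacuum finish the job */
    if (nunfrozen_after_freeze_pass == 0)
        s = TOY_FROZEN;
    return s;
}
```

Setting the read-only mark before freezing is what makes the single pass sufficient: no write can unfreeze a tuple once the mark is in place.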
Regards,
-------
Sawada Masahiko
On 4/6/15 11:12 AM, Sawada Masahiko wrote:
On Mon, Apr 6, 2015 at 10:17 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 4/6/15 1:46 AM, Sawada Masahiko wrote:
On Sun, Apr 5, 2015 at 8:21 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Sat, Apr 4, 2015 at 3:10 PM, Jim Nasby <Jim.Nasby@bluetreble.com>
wrote:On 4/3/15 12:59 AM, Sawada Masahiko wrote:
+ case HEAPTUPLE_LIVE: + case HEAPTUPLE_RECENTLY_DEAD: + case HEAPTUPLE_INSERT_IN_PROGRESS: + case HEAPTUPLE_DELETE_IN_PROGRESS: + if (heap_prepare_freeze_tuple(tuple.t_data, freezelimit, + mxactcutoff, &frozen[nfrozen])) + frozen[nfrozen++].offset = offnum; + break;This doesn't seem safe enough to me. Can't there be tuples that are
still
new enough that they can't be frozen, and are still live?Yep. I've set a table to read only while it contained unfreezable
tuples,
and the tuples remain unfrozen yet the read-only action claims to have
succeeded.Somewhat related... instead of forcing the freeze to happen
synchronously,
can't we set this up so a table is in one of three states? Read/Write,
Read
Only, Frozen. AT_SetReadOnly and AT_SetReadWrite would simply change to
the
appropriate state, and all the vacuum infrastructure would continue to
process those tables as it does today. lazy_vacuum_rel would become
responsible for tracking if there were any non-frozen tuples if it was
also
attempting a freeze. If it discovered there were none, AND the table was
marked as ReadOnly, then it would change the table state to Frozen and
set
relfrozenxid = InvalidTransactionId and relminxid = InvalidMultiXactId.
AT_SetReadWrite could change relfrozenxid to it's own Xid as an
optimization. Doing it that way leaves all the complicated vacuum code
in
one place, and would eliminate concerns about race conditions with still
running transactions, etc.+1 here as well. I might want to set tables to read only for reasons
other
than to avoid repeated freezing. (After all, the name of the command
suggests it is a general purpose thing) and wouldn't want to
automatically
trigger a vacuum freeze in the process.Thank you for comments.
I agree with the three states: Read/Write, ReadOnly and Frozen.
But I'm not sure when we should freeze the tuples, i.e., scan the whole table. I think any changes to a table are completely ignored/restricted if the table is marked ReadOnly, and that marking a table ReadOnly is accompanied by freezing its tuples. A Frozen table ensures that all of its tuples have been completely frozen, so it also needs a whole-table scan. That is, we would need to scan the whole table twice, right?

No. You would be free to set a table as ReadOnly any time you wanted, without scanning anything. All that setting does is disable any DML on the table.

The Frozen state would only be set by the vacuum code, IFF:
- The table state is ReadOnly *at the start of vacuum* and did not change during vacuum
- Vacuum ensured that there were no un-frozen tuples in the table

That does not necessitate 2 scans.

I understood this concept, and have a question, as I wrote below.
+1 here as well. I might want to set tables to read only for reasons other than to avoid repeated freezing (after all, the name of the command suggests it is a general-purpose thing) and wouldn't want to automatically trigger a vacuum freeze in the process.

There is another possibility here, too. We can completely divorce a ReadOnly mode (which I think is useful for other things besides freezing) from the question of whether we need to force-freeze a relation if we create a FrozenMap, similar to the visibility map. This has the added advantage of helping freeze scans on relations that are not ReadOnly in the case of tables that are insert-mostly, or any other pattern where most pages stay all-frozen.

Prior to the visibility map this would have been a rather daunting project, but I believe this could piggyback on the VM code rather nicely. Anytime you clear the VM you clearly must clear the FrozenMap as well. The logic for setting the FM is clearly different, but that would be entirely self-contained to vacuum. Unlike the VM, I don't see any point to marking special bits in the page itself for FM.

I was thinking of this idea (FM) to avoid freezing all tuples, actually.
As you said, it might not be a good idea (or might be overkill) for the reason to set a table read-only to be avoiding repeated freezing. I'm attempting to design the FM to avoid freezing relations as well. Is it enough that each bit of the FM indicates whether the corresponding page is completely frozen?

If I'm understanding your implied question correctly, I don't think there would actually be any relationship between FM and marking ReadOnly. It would come into play if we wanted to do the Frozen state, but if we have the FM, marking an entire relation as Frozen becomes a lot less useful. What's going to happen with a VACUUM FREEZE once we have FM is that vacuum will be able to skip reading pages if they are all-visible *and* the FM shows them as frozen, whereas today we can't use the VM to skip pages if scan_all is true.

For simplicity, I would start out with each FM bit representing a single page. That means the FM would be very similar in operation to the VM; the only difference would be when a bit in the FM was set. I would absolutely split this into 2 patches as well: one for ReadOnly (and skip the Frozen status for now), and one for FM.
When I looked at the VM code briefly it occurred to me that it might be
quite difficult to have 1 FM bit represent multiple pages. The issue is the
locking necessary between VACUUM and clearing a FM bit. In the VM that's
handled by the cleanup lock, but that will only work at a page level. We'd
need something to ensure that nothing came in and performed DML while the
vacuum code was getting ready to set a FM bit. There's probably several ways
this could be accomplished, but I think it would be foolish to try and do
anything about it in the initial patch. Especially because it's only
supposition that there would be much benefit to having multiple pages per
bit.

Yes, I will separate the patch into two patches.
I'd like to confirm whether what I'm thinking is correct here. In the first version of the patch, each FM bit represents a single page and indicates whether all tuples on that page have been completely frozen; that would be one patch.
Yes.
The second patch adds 3 states and read-only table which disable to
Actually, I would start simply with ReadOnly and ReadWrite.
As I understand it, the goal here is to prevent huge amounts of periodic
freeze work due to XID wraparound. I don't think we need the Freeze
state to accomplish that.
With a single bit per page in the Frozen Map, checking a 800GB table
would require reading a mere 100MB of FM. That's pretty tiny, and
largely accomplishes the goal.
Obviously it would be nice to eliminate even that 100MB read, but I
suggest you leave that for a 3rd patch. I think you'll find that just
getting the first 2 accomplished will be a significant amount of work.
Also, note that you don't really even need the ReadOnly patch. As long
as you're not actually touching the table at all the FM will eventually
read as everything is frozen; that gets you 80% of the way there. So I'd
suggest starting with the FM, then doing ReadOnly, and only then
attempting to add the Frozen state.
any write to table. The trigger that changes state from Read/Write to Read-Only is ALTER TABLE SET READ ONLY, and the trigger that changes from Read-Only to Frozen is vacuum, and only when the table was already marked Read-Only when the vacuum started *and* the vacuum did not freeze any tuple (including pages skipped by referring to the FM). If we support the FM, we would be able to avoid repeatedly freezing the whole table even if the table has not been marked Read-Only.

In order to change state to Frozen, we need to run VACUUM FREEZE or wait for autovacuum to run. Generally, the cutoff XID threshold differs between VACUUM (and autovacuum) and VACUUM FREEZE, so we could not expect the state to change via a plain explicit vacuum or autovacuum. Inevitably, we would need to run both ALTER TABLE SET READ ONLY and VACUUM FREEZE to change the state to Frozen. I think we should also add DDL that both freezes the tuples and changes the state in one pass, like ALTER TABLE SET READ ONLY WITH FREEZE or ALTER TABLE SET FROZEN.

Regards,
-------
Sawada Masahiko
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Apr 06, 2015 at 12:07:47PM -0500, Jim Nasby wrote:
...
As I understand it, the goal here is to prevent huge amounts of
periodic freeze work due to XID wraparound. I don't think we need
the Freeze state to accomplish that.

With a single bit per page in the Frozen Map, checking a 800GB table
would require reading a mere 100MB of FM. That's pretty tiny, and
largely accomplishes the goal.

Obviously it would be nice to eliminate even that 100MB read, but I
suggest you leave that for a 3rd patch. I think you'll find that
just getting the first 2 accomplished will be a significant amount
of work.
Hi,
I may have my math wrong, but 800GB ~ 100M pages or 12.5MB and not
100MB.
Regards,
Ken
On 4/6/15 12:29 PM, ktm@rice.edu wrote:
On Mon, Apr 06, 2015 at 12:07:47PM -0500, Jim Nasby wrote:
...
As I understand it, the goal here is to prevent huge amounts of
periodic freeze work due to XID wraparound. I don't think we need
the Freeze state to accomplish that.

With a single bit per page in the Frozen Map, checking a 800GB table
would require reading a mere 100MB of FM. That's pretty tiny, and
largely accomplishes the goal.

Obviously it would be nice to eliminate even that 100MB read, but I
suggest you leave that for a 3rd patch. I think you'll find that
just getting the first 2 accomplished will be a significant amount
of work.

Hi,
I may have my math wrong, but 800GB ~ 100M pages or 12.5MB and not
100MB.
Doh! 8 bits per byte and all that...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 04/06/2015 10:07 AM, Jim Nasby wrote:
Actually, I would start simply with ReadOnly and ReadWrite.
As I understand it, the goal here is to prevent huge amounts of periodic
freeze work due to XID wraparound. I don't think we need the Freeze
state to accomplish that.

With a single bit per page in the Frozen Map, checking a 800GB table
would require reading a mere 100MB of FM. That's pretty tiny, and
largely accomplishes the goal.

Obviously it would be nice to eliminate even that 100MB read, but I
suggest you leave that for a 3rd patch. I think you'll find that just
getting the first 2 accomplished will be a significant amount of work.

Also, note that you don't really even need the ReadOnly patch. As long
as you're not actually touching the table at all the FM will eventually
read as everything is frozen; that gets you 80% of the way there. So I'd
suggest starting with the FM, then doing ReadOnly, and only then
attempting to add the Frozen state.
+1
There was some reason why we didn't have Freeze Map before, though;
IIRC these were the problems:
1. would need to make sure it gets sync'd to disk and/or WAL-logged
2. every time a page is modified, the map would need to get updated
3. Yet Another Relation File (not inconsequential for the cases we're
discussing).
Also, given that the Visibility Map necessarily needs to have the
superset of the Frozen Map, maybe combining them in some way would make
sense.
I agree with Jim that if we have a trustworthy Frozen Map, having a
ReadOnly flag is of marginal value, unless such a ReadOnly flag allowed
us to skip updating the individual row XIDs entirely. I can think of
some ways to do that, but they have severe tradeoffs.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Josh Berkus wrote:
I agree with Jim that if we have a trustworthy Frozen Map, having a
ReadOnly flag is of marginal value, unless such a ReadOnly flag allowed
us to skip updating the individual row XIDs entirely. I can think of
some ways to do that, but they have severe tradeoffs.
If you're thinking that the READ ONLY flag is only useful for freezing,
then yeah maybe it's of marginal value. But in the foreign key
constraint area, consider that you could have tables with
frequently-referenced PKs marked as READ ONLY -- then you don't need to
acquire row locks when inserting/updating rows in the referencing
tables. That might give you a good performance benefit that's not in
any way related to freezing, as well as reducing your multixact
consumption rate.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 04/06/2015 11:35 AM, Alvaro Herrera wrote:
Josh Berkus wrote:
I agree with Jim that if we have a trustworthy Frozen Map, having a
ReadOnly flag is of marginal value, unless such a ReadOnly flag allowed
us to skip updating the individual row XIDs entirely. I can think of
some ways to do that, but they have severe tradeoffs.

If you're thinking that the READ ONLY flag is only useful for freezing,
then yeah maybe it's of marginal value. But in the foreign key
constraint area, consider that you could have tables with
frequently-referenced PKs marked as READ ONLY -- then you don't need to
acquire row locks when inserting/updating rows in the referencing
tables. That might give you a good performance benefit that's not in
any way related to freezing, as well as reducing your multixact
consumption rate.
Hmmmm. Yeah, that would make it worthwhile, although it would be a
fairly obscure bit of performance optimization for anyone not on this
list ;-)
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 4/6/15 1:28 PM, Josh Berkus wrote:
On 04/06/2015 10:07 AM, Jim Nasby wrote:
...

+1
There was some reason why we didn't have Freeze Map before, though;
IIRC these were the problems:

1. would need to make sure it gets sync'd to disk and/or WAL-logged
Same as VM.
2. every time a page is modified, the map would need to get updated
Not every time; just the first time after the FM bit for a page was set. It would only be set by vacuum, just like the VM.
3. Yet Another Relation File (not inconsequential for the cases we're
discussing).
Sure, which is why I think it might be interesting to either allow for
more than one page per bit, or perhaps some form of compression. That
said, I don't think it's worth worrying about too much because it's
still a 64,000-1 ratio with 8k pages. If you use 32k pages it becomes
256,000-1, or 4GB of FM for 1PB of heap.
Also, given that the Visibility Map necessarily needs to have the
superset of the Frozen Map, maybe combining them in some way would make
sense.
The thing is, I think in many workloads the patterns here will actually
be radically different, in that it's way easier to get a page to be
all-visible than it is to freeze it.
Perhaps there's something we can do here when we look at other ways to
reduce space usage for FM (and maybe VM too), but I don't think now is
the time to put effort into this.
I agree with Jim that if we have a trustworthy Frozen Map, having a
ReadOnly flag is of marginal value, unless such a ReadOnly flag allowed
us to skip updating the individual row XIDs entirely. I can think of
some ways to do that, but they have severe tradeoffs.
Aside from Alvaro's points, I think many users would find it useful as
an easy way to ensure no one is writing to a table, which could be
valuable for any number of reasons. As long as the patch isn't too
complicated I don't see a reason not to do it.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 6 Apr 2015 09:17, "Jim Nasby" <Jim.Nasby@bluetreble.com> wrote:
No. You would be free to set a table as ReadOnly any time you wanted,
without scanning anything. All that setting does is disable any DML on the
table.
The Frozen state would only be set by the vacuum code, IFF:
- The table state is ReadOnly *at the start of vacuum* and did not change
during vacuum
- Vacuum ensured that there were no un-frozen tuples in the table
That does not necessitate 2 scans.
This is exactly what I would suggest.
Only I would suggest thinking of it in terms of two orthogonal boolean
flags rather than three states. It's easier to reason about whether a table
has a specific property than trying to control a state machine in a
predefined pathway.
So I would say the two flags are:
READONLY: guarantees nothing can be dirtied
ALLFROZEN: guarantees no unfrozen tuples are present
In practice you can't have the latter without the former, since vacuum can't
know everything is frozen unless it knows nobody is inserting. But perhaps
there will be cases in the future where that's not true.
Incidentally there are a number of other optimisations that I have had in mind
that are only possible on frozen read-only tables:
1) Compression: compress the pages and pack them one after the other. Build
a new fork with offsets for each page.
2) Automatic partition elimination where the statistics track the minimum
and maximum value per partition (and number of tuples) and treat them as
implicit constraints. In particular it would magically make read only empty
parent partitions be excluded regardless of the where clause.
On 4/6/15 5:18 PM, Greg Stark wrote:
Only I would suggest thinking of it in terms of two orthogonal boolean
flags rather than three states. It's easier to reason about whether a
table has a specific property than trying to control a state machine in
a predefined pathway.

So I would say the two flags are:
READONLY: guarantees nothing can be dirtied
ALLFROZEN: guarantees no unfrozen tuples are present

In practice you can't have the latter without the former since vacuum
can't know everything is frozen unless it knows nobody is inserting. But
perhaps there will be cases in the future where that's not true.
I'm not so sure about that. There's a logical state progression here
(see below). ISTM it's easier to just enforce that in one place instead
of a bunch of places having to check multiple conditions. But, I'm not
wed to a single field.
Incidentally there are a number of other optimisations that I have had in mind that are only possible on frozen read-only tables:

1) Compression: compress the pages and pack them one after the other. Build a new fork with offsets for each page.

2) Automatic partition elimination, where the statistics track the minimum and maximum value per partition (and number of tuples) and treat them as implicit constraints. In particular it would magically make read-only empty parent partitions be excluded regardless of the where clause.
AFAICT neither of those actually requires ALLFROZEN, no? You'll need to
uncompact and re-compact for #1 when you actually freeze (which maybe
isn't worth it), but freezing isn't absolutely required. #2 would only
require that everything in the relation is visible; not frozen.
I think there's value here to having an ALLVISIBLE state as well as
ALLFROZEN.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On Tue, Apr 7, 2015 at 7:53 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
...
Based on many suggestions, I'm going to deal with the FM first, as one patch. It would be a simple mechanism, similar to the VM. In the first patch:
- Each bit of the FM represents a single page
- A bit is set only by vacuum
- A bit is unset by inserts, updates and deletes
Second, I'll deal with a simple read-only table and 2 states, Read/Write (default) and ReadOnly, as one patch. ISTM that having the Frozen state needs more discussion. A read-only table just disallows any updates to the table, controlled by a read-only flag in pg_class, and the DDL commands that change this status would be like ALTER TABLE SET READ ONLY, or READ WRITE.

Also, as Alvaro suggested, a read-only table affects not only freezing but also performance optimization; I'll consider including that when I deal with the read-only table.
Regards,
-------
Sawada Masahiko
On Tue, Apr 7, 2015 at 11:22 AM, Sawada Masahiko <sawada.mshk@gmail.com> wrote:
...
The attached WIP patch adds a Frozen Map, which enables us to avoid vacuuming the whole table even when a full scan would otherwise be required to prevent XID wraparound failures.

The Frozen Map is a bitmap with one bit per heap page, quite similar to the Visibility Map. A set bit means that all tuples on the heap page are completely frozen, so we don't need to vacuum-freeze that page.

A bit is set when vacuum (or autovacuum) figures out that all tuples on the corresponding heap page are completely frozen, and a bit is cleared when an INSERT or UPDATE (for the new heap page only) is executed.

The current patch adds a new source file, src/backend/access/heap/frozenmap.c, which is quite similar to visibilitymap.c. They have similar code but are kept separate for now; I could refactor them into something like a common bitmap.c if needed.

Also, when skipping vacuum using the visibility map, we only skip runs of at least SKIP_PAGES_THRESHOLD consecutive pages; no such mechanism exists for the frozen map yet.
Please give me feedback.
Regards,
-------
Sawada Masahiko
Attachments:
000_frozenmap_WIP.patchtext/x-diff; charset=US-ASCII; name=000_frozenmap_WIP.patchDownload+1201-67
On Mon, Apr 20, 2015 at 04:45:34PM +0900, Sawada Masahiko wrote:
Attached WIP patch adds Frozen Map which enables us to avoid whole
table vacuuming even when full scan is required: preventing XID
wraparound failures.

Frozen Map is a bitmap with one bit per heap page, and quite similar
to Visibility Map. A set bit means that all tuples on heap page are
completely frozen, therefore we don't need to do vacuum freeze that
page.
A bit is set when vacuum(or autovacuum) figures out that all tuples on
corresponding heap page are completely frozen, and a bit is cleared
when INSERT and UPDATE(only new heap page) are executed.
So, this patch avoids reading the all-frozen pages if it has not been
modified since the last VACUUM FREEZE? Since it is already frozen, the
running VACUUM FREEZE will not modify the page or generate WAL, so is it
really worth maintaining a new per-page bitmap just to avoid the
sequential scan of tables every 200 million transactions? I would like to see
us reduce the need for VACUUM FREEZE, rather than go in this direction.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
On 4/20/15 1:48 PM, Bruce Momjian wrote:
...

So, this patch avoids reading the all-frozen pages if it has not been modified since the last VACUUM FREEZE? Since it is already frozen, the running VACUUM FREEZE will not modify the page or generate WAL, so is it really worth maintaining a new per-page bitmap just to avoid the sequential scan of tables every 200 million transactions? I would like to see us reduce the need for VACUUM FREEZE, rather than go in this direction.
How would you propose we do that?
I also think there's better ways we could handle *all* our cleanup work.
Tuples have a definite lifespan, and there's potentially a lot of
efficiency to be gained if we could track tuples through their stages of
life... but I don't see any easy ways to do that.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com