Some other things about contrib/bloom and generic_xlog.c
1. It doesn't seem like generic_xlog.c has thought very carefully about
the semantics of the "hole" between pd_lower and pd_upper. The mainline
XLOG code goes to some lengths to ensure that the hole stays all-zeroes;
for example RestoreBlockImage() explicitly zeroes the hole when restoring
from a full-page image that has a hole. But generic_xlog.c's redo routine
does not do anything comparable, nor does GenericXLogFinish make any
effort to ensure that the "hole" is all-zeroes after normal application of
a generic update. The reason this is of interest is that it means the
contents of the "hole" could diverge between master and slave, or differ
between the original state of a database and what it is after a crash and
recovery. That would at least complicate forensic comparisons of pages,
and I think it might also break checksumming. We thought that this was
important enough to take the trouble of explicitly zeroing holes during
mainline XLOG replay. Shouldn't generic_xlog.c take the same trouble?
2. Unless I'm missing something, contrib/bloom is making no effort
to ensure that bloom index pages can be distinguished from other pages
by pg_filedump. That's fine if your expectation is that bloom will always
be a toy with no use in production; but otherwise, not so much. You need
to make sure that the last two bytes of the page special space contain a
uniquely identifiable code; compare spgist_page_id in SPGiST indexes.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Apr 10, 2016 at 1:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
1. It doesn't seem like generic_xlog.c has thought very carefully about
the semantics of the "hole" between pd_lower and pd_upper. The mainline
XLOG code goes to some lengths to ensure that the hole stays all-zeroes;
for example RestoreBlockImage() explicitly zeroes the hole when restoring
from a full-page image that has a hole.
Yes.
But generic_xlog.c's redo routine
does not do anything comparable, nor does GenericXLogFinish make any
effort to ensure that the "hole" is all-zeroes after normal application of
a generic update. The reason this is of interest is that it means the
contents of the "hole" could diverge between master and slave, or differ
between the original state of a database and what it is after a crash and
recovery. That would at least complicate forensic comparisons of pages,
and I think it might also break checksumming. We thought that this was
important enough to take the trouble of explicitly zeroing holes during
mainline XLOG replay. Shouldn't generic_xlog.c take the same trouble?
One look at checksum_impl.h shows up that the page hole is taken into
account in the checksum calculation, so we definitely has better zero
the page. And looking at generic_xlog.c, this code either logs in a
full page, or a delta.
Actually, as I look at this code for the first time, I find that
GenericXLogFinish() is a very confusing interface. It makes no
distinction between a page and the meta data associated to this page,
this is controlled by a flag isNew and buffer data is saved as some
delta. Actually, fullmage is not exactly true, this may refer to a
page that has a hole. It seems to me that we should not have one but
two routines: GenericXLogRegisterBuffer and
GenericXLogRegisterBufData, similar to what the normal XLOG routines
offer.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
mainline XLOG replay. Shouldn't generic_xlog.c take the same trouble?
Yes, it should. Alexander now works on it.
2. Unless I'm missing something, contrib/bloom is making no effort
to ensure that bloom index pages can be distinguished from other pages
by pg_filedump. That's fine if your expectation is that bloom will always
be a toy with no use in production; but otherwise, not so much. You need
to make sure that the last two bytes of the page special space contain a
uniquely identifiable code; compare spgist_page_id in SPGiST indexes.
Now contrib/bloom is a canonical example how to implement yourown index and how
to use generic WAL interface. And it is an useful toy: in some cases it could
help in production system, patch attached. I'm a little dissatisfied with patch
because I had to add unused field to BloomPageOpaqueData, in opposite case size
of BloomPageOpaqueData struct equals 6 bytes but special size is MAXALIGNED, so,
last two bytes will be unused. If unused field is not a problem, I will push
this patch.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi, Tom!
On Sun, Apr 10, 2016 at 7:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
1. It doesn't seem like generic_xlog.c has thought very carefully about
the semantics of the "hole" between pd_lower and pd_upper. The mainline
XLOG code goes to some lengths to ensure that the hole stays all-zeroes;
for example RestoreBlockImage() explicitly zeroes the hole when restoring
from a full-page image that has a hole. But generic_xlog.c's redo routine
does not do anything comparable, nor does GenericXLogFinish make any
effort to ensure that the "hole" is all-zeroes after normal application of
a generic update. The reason this is of interest is that it means the
contents of the "hole" could diverge between master and slave, or differ
between the original state of a database and what it is after a crash and
recovery. That would at least complicate forensic comparisons of pages,
and I think it might also break checksumming. We thought that this was
important enough to take the trouble of explicitly zeroing holes during
mainline XLOG replay. Shouldn't generic_xlog.c take the same trouble?2. Unless I'm missing something, contrib/bloom is making no effort
to ensure that bloom index pages can be distinguished from other pages
by pg_filedump. That's fine if your expectation is that bloom will always
be a toy with no use in production; but otherwise, not so much. You need
to make sure that the last two bytes of the page special space contain a
uniquely identifiable code; compare spgist_page_id in SPGiST indexes.
Thank you for spotting these issues.
I'm going to prepare patches for fixing both of them.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Michael Paquier <michael.paquier@gmail.com> writes:
Actually, as I look at this code for the first time, I find that
GenericXLogFinish() is a very confusing interface. It makes no
distinction between a page and the meta data associated to this page,
this is controlled by a flag isNew and buffer data is saved as some
delta. Actually, fullmage is not exactly true, this may refer to a
page that has a hole. It seems to me that we should not have one but
two routines: GenericXLogRegisterBuffer and
GenericXLogRegisterBufData, similar to what the normal XLOG routines
offer.
Hmm. Maybe the documentation leaves something to be desired, but
I thought that the interface was reasonable if you grant that the
goal is to be easy to use rather than to have maximum performance.
The calling code only has to concern itself with making the actual
changes to the target pages, not with constructing WAL descriptions
of those changes. Also, the fact that the changes don't have to be
made within a critical section is a bigger help than it might sound.
So I don't really have any problems with the API as such. I tried
to improve the docs a day or two ago, but maybe that needs further
work.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Teodor Sigaev <teodor@sigaev.ru> writes:
2. Unless I'm missing something, contrib/bloom is making no effort
to ensure that bloom index pages can be distinguished from other pages
by pg_filedump. That's fine if your expectation is that bloom will always
be a toy with no use in production; but otherwise, not so much. You need
to make sure that the last two bytes of the page special space contain a
uniquely identifiable code; compare spgist_page_id in SPGiST indexes.
Now contrib/bloom is a canonical example how to implement yourown index and how
to use generic WAL interface. And it is an useful toy: in some cases it could
help in production system, patch attached. I'm a little dissatisfied with patch
because I had to add unused field to BloomPageOpaqueData, in opposite case size
of BloomPageOpaqueData struct equals 6 bytes but special size is MAXALIGNED, so,
last two bytes will be unused. If unused field is not a problem, I will push
this patch.
Yes, it will mean that the special space will have to be 8 bytes not 4.
But on the other hand, that only makes a difference on 32-bit machines;
if MAXALIGN is 8 then you won't be able to fit anything into those bytes
anyway. And someday there might be a use for the other two bytes ...
so I'm not particularly concerned by the "wasted space" argument.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Apr 11, 2016 at 11:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael.paquier@gmail.com> writes:
Actually, as I look at this code for the first time, I find that
GenericXLogFinish() is a very confusing interface. It makes no
distinction between a page and the meta data associated to this page,
this is controlled by a flag isNew and buffer data is saved as some
delta. Actually, fullmage is not exactly true, this may refer to a
page that has a hole. It seems to me that we should not have one but
two routines: GenericXLogRegisterBuffer and
GenericXLogRegisterBufData, similar to what the normal XLOG routines
offer.Hmm. Maybe the documentation leaves something to be desired, but
I thought that the interface was reasonable if you grant that the
goal is to be easy to use rather than to have maximum performance.
The calling code only has to concern itself with making the actual
changes to the target pages, not with constructing WAL descriptions
of those changes. Also, the fact that the changes don't have to be
made within a critical section is a bigger help than it might sound.So I don't really have any problems with the API as such. I tried
to improve the docs a day or two ago, but maybe that needs further
work.
Well, I am not saying that the given interface does nothing, far from
that. Though I think that we could take advantage of the rmgr
RM_GENERIC_ID introduced by this interface to be able to generate
custom WAL records and pass them through a stream.
As given out now, we are able to do the following:
- Log a full page
- Log a delta of a full page, which is actually data associated to a page.
- At recovery, replay those full pages with a delta.
What I thought we should be able to do with that should not be only
limited to indexes, aka to:
- Be able to register and log full pages
- Be able to log data associated to a block
- Be able to have record data
- Use at recovery custom routines to apply those WAL records
The first 3 ones are done via the set of routines generating WAL
records for the rmgr RM_GENERIC_ID. The 4th one needs a hook in
generic_redo. Looking at the thread where this patch has been debated
I am not seeing a discussion regarding that.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Michael Paquier <michael.paquier@gmail.com> writes:
As given out now, we are able to do the following:
- Log a full page
- Log a delta of a full page, which is actually data associated to a page.
- At recovery, replay those full pages with a delta.
Right.
What I thought we should be able to do with that should not be only
limited to indexes, aka to:
- Be able to register and log full pages
- Be able to log data associated to a block
- Be able to have record data
- Use at recovery custom routines to apply those WAL records
I'm not following your distinction between a "page" and a "block",
perhaps. Also, the entire point here is that extensions DON'T
get to have custom apply routines, because we have no way for
replay to know about such apply routines. (It surely cannot look
in the catalogs for them, but there's no other suitable infrastructure
either.) In turn, that means there's little value in having any custom
data associated with the WAL records.
If we ever solve the registering-custom-replay-routines conundrum,
it'll be time to think about what the frontend API for that ought
to be. But this patch is not pretending to solve that, and indeed is
mainly showing that it's possible to have a useful WAL extension API
that doesn't require solving it.
In any case, the existence of this API doesn't foreclose adding
other APIs (perhaps backed by other RM_GENERIC_ID WAL record types)
later on. So I don't think we need to insist that it do everything
anyone will ever want.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
... BTW, with respect to the documentation angle, it seems to me
that it'd be better if GenericXLogRegister were renamed to
GenericXLogRegisterBuffer, or perhaps GenericXLogRegisterPage.
I think this would make the documentation clearer, and it would
also make it easier to add other sorts of Register actions later,
if we ever think of some (which seems not unlikely, really).
Another thing to think about is whether we're going to regret
hard-wiring the third argument as a boolean. Should we consider
making it a bitmask of flags, instead? It's not terribly hard
to think of other flags we might want there in future; for example
maybe something to tell GenericXLogFinish whether it's worth trying
to identify data movement on the page rather than just doing the
byte-by-byte delta calculation.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Apr 12, 2016 at 9:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael.paquier@gmail.com> writes:
What I thought we should be able to do with that should not be only
limited to indexes, aka to:
- Be able to register and log full pages
- Be able to log data associated to a block
- Be able to have record data
- Use at recovery custom routines to apply those WAL recordsI'm not following your distinction between a "page" and a "block",
perhaps.
s/block/page/. Sorry for the confusion.
Also, the entire point here is that extensions DON'T
get to have custom apply routines, because we have no way for
replay to know about such apply routines. (It surely cannot look
in the catalogs for them, but there's no other suitable infrastructure
either.) In turn, that means there's little value in having any custom
data associated with the WAL records.
Well, yes, the startup process has no knowledge of that. That's why it
would need to go through a hook, but the startup process has no
knowledge of routines loaded via _PG_init perhaps? I recall that it
loaded them. The weakness with this interface is that one needs to be
sure that this hook is actually loaded properly.
If we ever solve the registering-custom-replay-routines conundrum,
it'll be time to think about what the frontend API for that ought
to be. But this patch is not pretending to solve that, and indeed is
mainly showing that it's possible to have a useful WAL extension API
that doesn't require solving it.
Yep. I am not saying the contrary. This patch solves its own category
of things with its strict page-level control.
In any case, the existence of this API doesn't foreclose adding
other APIs (perhaps backed by other RM_GENERIC_ID WAL record types)
later on. So I don't think we need to insist that it do everything
anyone will ever want.
Perhaps I am just confused by the interface. Linking the idea of a
page delta with GenericXLogRegister is not that intuitive, knowing
that what it should actually be GenericXLogRegisterBuffer and we
should have GenericXLogAddDelta, that applies a page difference on top
of an existing one.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Apr 12, 2016 at 9:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
... BTW, with respect to the documentation angle, it seems to me
that it'd be better if GenericXLogRegister were renamed to
GenericXLogRegisterBuffer, or perhaps GenericXLogRegisterPage.
I think this would make the documentation clearer, and it would
also make it easier to add other sorts of Register actions later,
if we ever think of some (which seems not unlikely, really).
Funny thing. I just suggested the same just above :) With a second
routine to generate a delta difference from a page to keep the
knowledge of this delta in its own code path.
Another thing to think about is whether we're going to regret
hard-wiring the third argument as a boolean. Should we consider
making it a bitmask of flags, instead? It's not terribly hard
to think of other flags we might want there in future; for example
maybe something to tell GenericXLogFinish whether it's worth trying
to identify data movement on the page rather than just doing the
byte-by-byte delta calculation.
Yes. Definitely this interface needs more thoughts. I'd think of
GenericXLogFinish as a more generic entry point.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 12 April 2016 at 08:36, Michael Paquier <michael.paquier@gmail.com>
wrote:
Also, the entire point here is that extensions DON'T
get to have custom apply routines, because we have no way for
replay to know about such apply routines. (It surely cannot look
in the catalogs for them, but there's no other suitable infrastructure
either.) In turn, that means there's little value in having any custom
data associated with the WAL records.Well, yes, the startup process has no knowledge of that. That's why it
would need to go through a hook, but the startup process has no
knowledge of routines loaded via _PG_init perhaps?
The only way we really have at the moment is shared_preload_libraries.
This got discussed much earlier, possibly on a previous iteration of the
generic xlog discussions rather than this specific thread. IIRC the gist
was that we need a way to:
- Flag the need for a redo routine so that replay stops if that redo
routine isn't available
- Find out *which* redo routine the generic log message needs during redo
- Get a pointer to that routine to actually call it
... and also possibly allow the admin to force skip of redo calls for a
given extension (but not others) in case of a major problem.
If you rely on shared_preload_libraries, then "oops, I forgot to put
myextension in shared_preload_libraries on the downstream" becomes a Bad
Thing. There's no way to go back and redo the work you've passed over. You
have to recreate the standby. Worse, it's silently wrong. The server has no
idea it was meant to do anything and no way to tell the user it couldn't do
what it was meant to do.
We can't register redo routines in the catalogs because we don't have
access to the syscaches etc during redo (right?).
So how do we look at a generic log record, say "ok, the upstream that wrote
this says it needs to invoke registered generic xlog hook 42, which is
<func> from <extension> at <ptr>" ?
Personally I'd be willing to accept the limitations of the s_p_l and hook
approach to the point where I think we should add the hook and accept the
caveats of its use ... but I understand the problems with it. I understand
not having custom redo routine support in this iteration of the patch. It's
somewhat orthogonal job to this patch anyway - while handy for specific
relation generic xlog, custom redo routines make most sense when combined
with generic xlog that isn't necessarily associated with a specific
relation. This patch doesn't support that.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Apr 12, 2016 at 10:54 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 12 April 2016 at 08:36, Michael Paquier wrote:
The only way we really have at the moment is shared_preload_libraries.
That's exactly my point.
If you rely on shared_preload_libraries, then "oops, I forgot to put
myextension in shared_preload_libraries on the downstream" becomes a Bad
Thing. There's no way to go back and redo the work you've passed over. You
have to recreate the standby. Worse, it's silently wrong. The server has no
idea it was meant to do anything and no way to tell the user it couldn't do
what it was meant to do.
Well, one playing with the generic WAL record facility is
We can't register redo routines in the catalogs because we don't have access
to the syscaches etc during redo (right?).
Yes, the startup process does not have that knowledge.
So how do we look at a generic log record, say "ok, the upstream that wrote
this says it needs to invoke registered generic xlog hook 42, which is
<func> from <extension> at <ptr>" ?Personally I'd be willing to accept the limitations of the s_p_l and hook
approach to the point where I think we should add the hook and accept the
caveats of its use ... but I understand the problems with it.
Honestly, so do I, and I could accept the use of a hook in
generic_redo in this code path which is really... Generic. Any other
infrastructure is going to be one brick shy of a load, and we are
actually sure that those routines are getting loaded once we reach the
first redo routine as far as I know. We could for example force the
hook to set some validation flags that are then checked in the generic
redo routine, and mark in the WAL record itself that this record
should have used the hook. If the record is replayed and this hook is
missing, then do a FATAL and prevent recovery to move on.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 12 April 2016 at 10:09, Michael Paquier <michael.paquier@gmail.com>
wrote:
On Tue, Apr 12, 2016 at 10:54 AM, Craig Ringer <craig@2ndquadrant.com>
wrote:
If you rely on shared_preload_libraries, then "oops, I forgot to put
myextension in shared_preload_libraries on the downstream" becomes a Bad
Thing. There's no way to go back and redo the work you've passed over.You
have to recreate the standby. Worse, it's silently wrong. The server has
no
idea it was meant to do anything and no way to tell the user it couldn't
do
what it was meant to do.
Well, one playing with the generic WAL record facility is
Yeah... that's what people say about a lot of things.
If it's there, someone will shoot their foot off with it and blame us.
There's a point where you have to say "well, that was dumb, you had to take
the safety off, remove the barrel plug attached to the giant sign saying
'do not remove', find and insert the bolt, and load the ammunition yourself
first, maybe you shouldn't have done that?"
However, that doesn't mean we should make it easy for a simple oversight to
have serious data correctness implications. I'm on the fence about whether
this counts; enough so that I wouldn't fight hard to get it in even if a
patch existed, which it doesn't, and we weren't past feature freeze, which
we are.
Any other
infrastructure is going to be one brick shy of a load
I disagree. It's very useful, just not for what you (and I) want to use it
for. It's still quite reasonable for custom index representations, and it
can be extended later.
We could for example force the
hook to set some validation flags that are then checked in the generic
redo routine, and mark in the WAL record itself that this record
should have used the hook. If the record is replayed and this hook is
missing, then do a FATAL and prevent recovery to move on.
Right, but _which_ routine on the standby? How does the standby know which
hook must be called? You can't just say "any hook"; there might be multiple
ones interested in different generic WAL messages. You need a registration
system of some sort and a way for the standby and master to agree on how to
identify extensions that have redo hooks for generic WAL. Then you need to
be able to look up that registration in some manner during redo.
The simplest registration system would be something like a function called
at _PG_init time that passes a globally-unique integer identifier for the
extension that maps to some kind of central web-based registry where we
hand out IDs. Forever. Then the standby says "this xlog needs extension
with generic xlog id 42 but it's missing". But we all know how well those
generic global identifier systems work when people are testing and
prototyping stuff, want to change versions incompatibly, etc....
So I disagree it's as simple as a validation flag.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Apr 12, 2016 at 3:08 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:
What I thought we should be able to do with that should not be only
limited to indexes, aka to:
- Be able to register and log full pages
- Be able to log data associated to a block
- Be able to have record data
- Use at recovery custom routines to apply those WAL records
The first 3 ones are done via the set of routines generating WAL
records for the rmgr RM_GENERIC_ID. The 4th one needs a hook in
generic_redo. Looking at the thread where this patch has been debated
I am not seeing a discussion regarding that.
I've designed generic xlog under following restrictions:
- We don't want users to register their own redo functions since it could
affect reliability. Unfortunately, I can't find original discussion now.
- API should be as simple as possible for extension author.
However, I think we should add some way for custom redo functions one day.
If we will add pluggable storage engines (I hope one day we will), then we
would face some engines which don't use our buffer manager. For such
pluggable storage engines, current generic WAL wouldn't work, but
user-defined redo functions would.
Now, my view is that we will need kind of two-phase recovery in order to
provide user-defined redo functions:
1) In the first phase, all built-in objects are recovered. After this
phase, we can use system catalog.
2) In the second phase, user-defined redo functions are called on the base
of consistent system catalog.
I think we can implement such two-phase recovery even now on the base of
generic logical messages. We can create logical slot. When extension is
loaded it starts second phase of recovery by reading generic logical
messages from this logical slot. Logical slot position should be also
advanced on checkpoint.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Mon, Apr 11, 2016 at 9:54 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
If you rely on shared_preload_libraries, then "oops, I forgot to put
myextension in shared_preload_libraries on the downstream" becomes a Bad
Thing. There's no way to go back and redo the work you've passed over. You
have to recreate the standby. Worse, it's silently wrong. The server has no
idea it was meant to do anything and no way to tell the user it couldn't do
what it was meant to do.
No, that's not right. If you rely on shared_preload_libraries, then
you'd just have the server die like this:
FATAL: I don't know how to replay a record for extension rmgr "craigrules".
HINT: Add the necessary library to shared_preload_libraries and
restart the server.
That seems 100% fine, although maybe we could adopt the convention
that the user-visible extension rmgr name should match the shared
library name, just as we do for logical decoding plugins (IIRC), and
the server can try to load a library by that name and continue.
We can't register redo routines in the catalogs because we don't have access
to the syscaches etc during redo (right?).
Correct.
So how do we look at a generic log record, say "ok, the upstream that wrote
this says it needs to invoke registered generic xlog hook 42, which is
<func> from <extension> at <ptr>" ?
Record enough information in WAL that the rmgrs can have names instead
of ID numbers. Stick the list of extension rmgrs in use into each
checkpoint record and call it good.
Personally I'd be willing to accept the limitations of the s_p_l and hook
approach to the point where I think we should add the hook and accept the
caveats of its use ... but I understand the problems with it. I understand
not having custom redo routine support in this iteration of the patch. It's
somewhat orthogonal job to this patch anyway - while handy for specific
relation generic xlog, custom redo routines make most sense when combined
with generic xlog that isn't necessarily associated with a specific
relation. This patch doesn't support that.
Yeah. And while this patch may (probably does) have technical defects
that need to be cured, I don't think that's an argument against this
approach. For prototyping, this is totally fine. For
performance-critical stuff, probably not so much. But if you don't
give people a way to prototype stuff and extend the system, then they
won't be as likely to innovate and make new stuff, and then we won't
get as many new patches for core proposing new and innovative things.
I think it's great that we are finally getting close to make the
access method system truly extensible - that has been a clear need for
a long time, and I think we are going about it in the right way. We
can add more in the future.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Apr 10, 2016 at 7:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
1. It doesn't seem like generic_xlog.c has thought very carefully about
the semantics of the "hole" between pd_lower and pd_upper. The mainline
XLOG code goes to some lengths to ensure that the hole stays all-zeroes;
for example RestoreBlockImage() explicitly zeroes the hole when restoring
from a full-page image that has a hole. But generic_xlog.c's redo routine
does not do anything comparable, nor does GenericXLogFinish make any
effort to ensure that the "hole" is all-zeroes after normal application of
a generic update. The reason this is of interest is that it means the
contents of the "hole" could diverge between master and slave, or differ
between the original state of a database and what it is after a crash and
recovery. That would at least complicate forensic comparisons of pages,
and I think it might also break checksumming. We thought that this was
important enough to take the trouble of explicitly zeroing holes during
mainline XLOG replay. Shouldn't generic_xlog.c take the same trouble?
Attached patch is intended to fix this. It zeroes "hole" in both
GenericXLogFinish() and generic_redo().
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
generic-xlog-zero-gap.patchapplication/octet-stream; name=generic-xlog-zero-gap.patchDownload
diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c
new file mode 100644
index 072838a..ce6fc80
*** a/src/backend/access/transam/generic_xlog.c
--- b/src/backend/access/transam/generic_xlog.c
*************** GenericXLogFinish(GenericXLogState *stat
*** 338,343 ****
--- 338,344 ----
{
PageData *pageData = &state->pages[i];
Page page;
+ PageHeader pageHeader;
if (BufferIsInvalid(pageData->buffer))
continue;
*************** GenericXLogFinish(GenericXLogState *stat
*** 363,368 ****
--- 364,377 ----
XLogRegisterBuffer(i, pageData->buffer, REGBUF_STANDARD);
XLogRegisterBufData(i, pageData->delta, pageData->deltaLen);
}
+
+ /*
+ * Explicitly zero "hole" between pd_lower and pd_upper in order to
+ * avoid divergence between master and slave during replication.
+ */
+ pageHeader = (PageHeader) page;
+ memset(page + pageHeader->pd_lower, 0,
+ pageHeader->pd_upper - pageHeader->pd_lower);
}
/* Insert xlog record */
*************** generic_redo(XLogReaderState *record)
*** 473,478 ****
--- 482,488 ----
if (action == BLK_NEEDS_REDO)
{
Page page;
+ PageHeader pageHeader;
char *blockDelta;
Size blockDeltaSize;
*************** generic_redo(XLogReaderState *record)
*** 481,486 ****
--- 491,504 ----
blockDelta = XLogRecGetBlockData(record, block_id, &blockDeltaSize);
applyPageRedo(page, blockDelta, blockDeltaSize);
+ /*
+ * Explicitly zero "hole" between pd_lower and pd_upper in order to
+ * avoid divergence between master and slave during replication.
+ */
+ pageHeader = (PageHeader) page;
+ memset(page + pageHeader->pd_lower, 0,
+ pageHeader->pd_upper - pageHeader->pd_lower);
+
PageSetLSN(page, lsn);
MarkBufferDirty(buffers[block_id]);
}
On Tue, Apr 12, 2016 at 3:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
... BTW, with respect to the documentation angle, it seems to me
that it'd be better if GenericXLogRegister were renamed to
GenericXLogRegisterBuffer, or perhaps GenericXLogRegisterPage.
I think this would make the documentation clearer, and it would
also make it easier to add other sorts of Register actions later,
if we ever think of some (which seems not unlikely, really).Another thing to think about is whether we're going to regret
hard-wiring the third argument as a boolean. Should we consider
making it a bitmask of flags, instead? It's not terribly hard
to think of other flags we might want there in future; for example
maybe something to tell GenericXLogFinish whether it's worth trying
to identify data movement on the page rather than just doing the
byte-by-byte delta calculation.
I agree with both of these ideas. Patch is attached.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
generic-xlog-register.patchapplication/octet-stream; name=generic-xlog-register.patchDownload
diff --git a/contrib/bloom/blinsert.c b/contrib/bloom/blinsert.c
new file mode 100644
index 330d7fd..9d2298b
*** a/contrib/bloom/blinsert.c
--- b/contrib/bloom/blinsert.c
*************** flushCachedPage(Relation index, BloomBui
*** 49,55 ****
GenericXLogState *state;
state = GenericXLogStart(index);
! page = GenericXLogRegister(state, buffer, true);
memcpy(page, buildstate->data, BLCKSZ);
GenericXLogFinish(state);
UnlockReleaseBuffer(buffer);
--- 49,55 ----
GenericXLogState *state;
state = GenericXLogStart(index);
! page = GenericXLogRegisterBuffer(state, buffer, true);
memcpy(page, buildstate->data, BLCKSZ);
GenericXLogFinish(state);
UnlockReleaseBuffer(buffer);
*************** blinsert(Relation index, Datum *values,
*** 221,227 ****
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
state = GenericXLogStart(index);
! page = GenericXLogRegister(state, buffer, false);
if (BloomPageAddItem(&blstate, page, itup))
{
--- 221,227 ----
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
state = GenericXLogStart(index);
! page = GenericXLogRegisterBuffer(state, buffer, false);
if (BloomPageAddItem(&blstate, page, itup))
{
*************** blinsert(Relation index, Datum *values,
*** 268,274 ****
state = GenericXLogStart(index);
/* get modifiable copy of metapage */
! metaPage = GenericXLogRegister(state, metaBuffer, false);
metaData = BloomPageGetMeta(metaPage);
if (nStart >= metaData->nEnd)
--- 268,274 ----
state = GenericXLogStart(index);
/* get modifiable copy of metapage */
! metaPage = GenericXLogRegisterBuffer(state, metaBuffer, false);
metaData = BloomPageGetMeta(metaPage);
if (nStart >= metaData->nEnd)
*************** blinsert(Relation index, Datum *values,
*** 279,285 ****
buffer = ReadBuffer(index, blkno);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
! page = GenericXLogRegister(state, buffer, false);
if (BloomPageAddItem(&blstate, page, itup))
{
--- 279,285 ----
buffer = ReadBuffer(index, blkno);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
! page = GenericXLogRegisterBuffer(state, buffer, false);
if (BloomPageAddItem(&blstate, page, itup))
{
*************** blinsert(Relation index, Datum *values,
*** 305,311 ****
*/
buffer = BloomNewBuffer(index);
! page = GenericXLogRegister(state, buffer, true);
BloomInitPage(page, 0);
if (!BloomPageAddItem(&blstate, page, itup))
--- 305,311 ----
*/
buffer = BloomNewBuffer(index);
! page = GenericXLogRegisterBuffer(state, buffer, true);
BloomInitPage(page, 0);
if (!BloomPageAddItem(&blstate, page, itup))
diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
new file mode 100644
index 6c7dc1d..be14fd7
*** a/contrib/bloom/blutils.c
--- b/contrib/bloom/blutils.c
*************** BloomInitMetapage(Relation index)
*** 417,423 ****
/* Initialize contents of meta page */
state = GenericXLogStart(index);
! metaPage = GenericXLogRegister(state, metaBuffer, true);
BloomInitPage(metaPage, BLOOM_META);
metadata = BloomPageGetMeta(metaPage);
--- 417,423 ----
/* Initialize contents of meta page */
state = GenericXLogStart(index);
! metaPage = GenericXLogRegisterBuffer(state, metaBuffer, true);
BloomInitPage(metaPage, BLOOM_META);
metadata = BloomPageGetMeta(metaPage);
diff --git a/contrib/bloom/blvacuum.c b/contrib/bloom/blvacuum.c
new file mode 100644
index ee40ebb..2a1e8a5
*** a/contrib/bloom/blvacuum.c
--- b/contrib/bloom/blvacuum.c
*************** blbulkdelete(IndexVacuumInfo *info, Inde
*** 65,71 ****
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
gxlogState = GenericXLogStart(index);
! page = GenericXLogRegister(gxlogState, buffer, false);
if (BloomPageIsDeleted(page))
{
--- 65,71 ----
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
gxlogState = GenericXLogStart(index);
! page = GenericXLogRegisterBuffer(gxlogState, buffer, false);
if (BloomPageIsDeleted(page))
{
*************** blbulkdelete(IndexVacuumInfo *info, Inde
*** 145,151 ****
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
gxlogState = GenericXLogStart(index);
! page = GenericXLogRegister(gxlogState, buffer, false);
metaData = BloomPageGetMeta(page);
memcpy(metaData->notFullPage, notFullPage, sizeof(BlockNumber) * countPage);
--- 145,151 ----
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
gxlogState = GenericXLogStart(index);
! page = GenericXLogRegisterBuffer(gxlogState, buffer, false);
metaData = BloomPageGetMeta(page);
memcpy(metaData->notFullPage, notFullPage, sizeof(BlockNumber) * countPage);
diff --git a/doc/src/sgml/generic-wal.sgml b/doc/src/sgml/generic-wal.sgml
new file mode 100644
index 2398d86..b644bfb
*** a/doc/src/sgml/generic-wal.sgml
--- b/doc/src/sgml/generic-wal.sgml
***************
*** 31,45 ****
<listitem>
<para>
! <function>page = GenericXLogRegister(state, buffer, isNew)</> —
! register a buffer to be modified within the current generic WAL
record. This function returns a pointer to a temporary copy of the
buffer's page, where modifications should be made. (Do not modify the
! buffer's contents directly.) The third argument indicates if the page
! is new; if true, this will result in a full-page image rather than a
! delta update being included in the WAL record.
! <function>GenericXLogRegister</> can be repeated if the WAL-logged
! action needs to modify multiple pages.
</para>
</listitem>
--- 31,46 ----
<listitem>
<para>
! <function>page = GenericXLogRegisterBuffer(state, buffer, flags)</>
! — register a buffer to be modified within the current generic WAL
record. This function returns a pointer to a temporary copy of the
buffer's page, where modifications should be made. (Do not modify the
! buffer's contents directly.) The third argument contains set of flags
! indicating how buffer should be logged. When
! <literal>GENERIC_XLOG_FULL_IMAGE</> is set then a full-page image rather
! than a delta update being included in the WAL record. There could appear
! more flags in future. <function>GenericXLogRegisterBuffer</> can be
! repeated if the WAL-logged action needs to modify multiple pages.
</para>
</listitem>
***************
*** 71,83 ****
<itemizedlist>
<listitem>
<para>
! No direct modifications of buffers are allowed! All modifications
! must be done in copies acquired from <function>GenericXLogRegister()</>.
In other words, code that makes generic WAL records should never call
<function>BufferGetPage()</> for itself. However, it remains the
caller's responsibility to pin/unpin and lock/unlock the buffers at
appropriate times. Exclusive lock must be held on each target buffer
! from before <function>GenericXLogRegister()</> until after
<function>GenericXLogFinish()</>.
</para>
</listitem>
--- 72,84 ----
<itemizedlist>
<listitem>
<para>
! No direct modifications of buffers are allowed! All modifications must
! be done in copies acquired from <function>GenericXLogRegisterBuffer()</>.
In other words, code that makes generic WAL records should never call
<function>BufferGetPage()</> for itself. However, it remains the
caller's responsibility to pin/unpin and lock/unlock the buffers at
appropriate times. Exclusive lock must be held on each target buffer
! from before <function>GenericXLogRegisterBuffer()</> until after
<function>GenericXLogFinish()</>.
</para>
</listitem>
diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c
new file mode 100644
index 072838a..d2fc910
*** a/src/backend/access/transam/generic_xlog.c
--- b/src/backend/access/transam/generic_xlog.c
***************
*** 50,56 ****
typedef struct
{
Buffer buffer; /* registered buffer */
! bool fullImage; /* are we taking a full image of this page? */
int deltaLen; /* space consumed in delta field */
char image[BLCKSZ]; /* copy of page image for modification */
char delta[MAX_DELTA_SIZE]; /* delta between page images */
--- 50,56 ----
typedef struct
{
Buffer buffer; /* registered buffer */
! uint32 flags; /* buffer flags */
int deltaLen; /* space consumed in delta field */
char image[BLCKSZ]; /* copy of page image for modification */
char delta[MAX_DELTA_SIZE]; /* delta between page images */
*************** GenericXLogStart(Relation relation)
*** 282,288 ****
* If the buffer is already registered, just return its existing entry.
*/
Page
! GenericXLogRegister(GenericXLogState *state, Buffer buffer, bool isNew)
{
int block_id;
--- 282,288 ----
* If the buffer is already registered, just return its existing entry.
*/
Page
! GenericXLogRegisterBuffer(GenericXLogState *state, Buffer buffer, uint16 flags)
{
int block_id;
*************** GenericXLogRegister(GenericXLogState *st
*** 295,301 ****
{
/* Empty slot, so use it (there cannot be a match later) */
page->buffer = buffer;
! page->fullImage = isNew;
memcpy(page->image,
BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST),
BLCKSZ);
--- 295,301 ----
{
/* Empty slot, so use it (there cannot be a match later) */
page->buffer = buffer;
! page->flags = flags;
memcpy(page->image,
BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST),
BLCKSZ);
*************** GenericXLogFinish(GenericXLogState *stat
*** 345,351 ****
page = BufferGetPage(pageData->buffer, NULL, NULL,
BGP_NO_SNAPSHOT_TEST);
! if (pageData->fullImage)
{
/* A full page image does not require anything special */
memcpy(page, pageData->image, BLCKSZ);
--- 345,351 ----
page = BufferGetPage(pageData->buffer, NULL, NULL,
BGP_NO_SNAPSHOT_TEST);
! if (pageData->flags & GENERIC_XLOG_FULL_IMAGE)
{
/* A full page image does not require anything special */
memcpy(page, pageData->image, BLCKSZ);
diff --git a/src/include/access/generic_xlog.h b/src/include/access/generic_xlog.h
new file mode 100644
index 01743e3..c422955
*** a/src/include/access/generic_xlog.h
--- b/src/include/access/generic_xlog.h
***************
*** 22,35 ****
#define MAX_GENERIC_XLOG_PAGES XLR_NORMAL_MAX_BLOCK_ID
/* state of generic xlog record construction */
struct GenericXLogState;
typedef struct GenericXLogState GenericXLogState;
/* API for construction of generic xlog records */
extern GenericXLogState *GenericXLogStart(Relation relation);
! extern Page GenericXLogRegister(GenericXLogState *state, Buffer buffer,
! bool isNew);
extern XLogRecPtr GenericXLogFinish(GenericXLogState *state);
extern void GenericXLogAbort(GenericXLogState *state);
--- 22,38 ----
#define MAX_GENERIC_XLOG_PAGES XLR_NORMAL_MAX_BLOCK_ID
+ /* Flags for GenericXLogRegisterBuffer */
+ #define GENERIC_XLOG_FULL_IMAGE 1
+
/* state of generic xlog record construction */
struct GenericXLogState;
typedef struct GenericXLogState GenericXLogState;
/* API for construction of generic xlog records */
extern GenericXLogState *GenericXLogStart(Relation relation);
! extern Page GenericXLogRegisterBuffer(GenericXLogState *state, Buffer buffer,
! uint16 flags);
extern XLogRecPtr GenericXLogFinish(GenericXLogState *state);
extern void GenericXLogAbort(GenericXLogState *state);
On 12 April 2016 at 20:48, Robert Haas <robertmhaas@gmail.com> wrote:
So how do we look at a generic log record, say "ok, the upstream that
wrote
this says it needs to invoke registered generic xlog hook 42, which is
<func> from <extension> at <ptr>" ?Record enough information in WAL that the rmgrs can have names instead
of ID numbers. Stick the list of extension rmgrs in use into each
checkpoint record and call it good.
Repeating the mapping at each checkpoint sounds pretty reasonable and means
we always know what we need. There's no need to bloat each record with an
extension name and no need for any kind of ugly global registration. The
mapping table would be small and simple. I like it.
Of course, it's all maybe-in-future stuff at this point, but I think that's
a really good way to approach it.
There's no way around the fact that user defined redo functions can affect
reliability. But then, so can user-defined data types, functions,
bgworkers, plpython functions loading ctypes, plpython functions getting
creative in the datadir, and admins who paste into the wrong window. The
scope for problems is somewhat greater but not IMO prohibitively so.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
Attached patch is intended to fix this. It zeroes "hole" in both
GenericXLogFinish() and generic_redo().
Pushed. I rewrote the comments a bit and modified GenericXLogFinish
so that it doesn't copy data only to overwrite it with zeroes.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I agree with both of these ideas. Patch is attached.
Hmm, you accidentally forget to change flag type of GenericXLogRegister in
contrib/bloom signature.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Apr 12, 2016 at 6:34 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
I agree with both of these ideas. Patch is attached.
Hmm, you accidentally forget to change flag type of GenericXLogRegister in
contrib/bloom signature.
Fixed.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
generic-xlog-register-2.patchapplication/octet-stream; name=generic-xlog-register-2.patchDownload
diff --git a/contrib/bloom/blinsert.c b/contrib/bloom/blinsert.c
new file mode 100644
index 330d7fd..4e3fe2f
*** a/contrib/bloom/blinsert.c
--- b/contrib/bloom/blinsert.c
*************** flushCachedPage(Relation index, BloomBui
*** 49,55 ****
GenericXLogState *state;
state = GenericXLogStart(index);
! page = GenericXLogRegister(state, buffer, true);
memcpy(page, buildstate->data, BLCKSZ);
GenericXLogFinish(state);
UnlockReleaseBuffer(buffer);
--- 49,55 ----
GenericXLogState *state;
state = GenericXLogStart(index);
! page = GenericXLogRegisterBuffer(state, buffer, GENERIC_XLOG_FULL_IMAGE);
memcpy(page, buildstate->data, BLCKSZ);
GenericXLogFinish(state);
UnlockReleaseBuffer(buffer);
*************** blinsert(Relation index, Datum *values,
*** 221,227 ****
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
state = GenericXLogStart(index);
! page = GenericXLogRegister(state, buffer, false);
if (BloomPageAddItem(&blstate, page, itup))
{
--- 221,227 ----
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
state = GenericXLogStart(index);
! page = GenericXLogRegisterBuffer(state, buffer, 0);
if (BloomPageAddItem(&blstate, page, itup))
{
*************** blinsert(Relation index, Datum *values,
*** 268,274 ****
state = GenericXLogStart(index);
/* get modifiable copy of metapage */
! metaPage = GenericXLogRegister(state, metaBuffer, false);
metaData = BloomPageGetMeta(metaPage);
if (nStart >= metaData->nEnd)
--- 268,274 ----
state = GenericXLogStart(index);
/* get modifiable copy of metapage */
! metaPage = GenericXLogRegisterBuffer(state, metaBuffer, 0);
metaData = BloomPageGetMeta(metaPage);
if (nStart >= metaData->nEnd)
*************** blinsert(Relation index, Datum *values,
*** 279,285 ****
buffer = ReadBuffer(index, blkno);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
! page = GenericXLogRegister(state, buffer, false);
if (BloomPageAddItem(&blstate, page, itup))
{
--- 279,285 ----
buffer = ReadBuffer(index, blkno);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
! page = GenericXLogRegisterBuffer(state, buffer, 0);
if (BloomPageAddItem(&blstate, page, itup))
{
*************** blinsert(Relation index, Datum *values,
*** 305,311 ****
*/
buffer = BloomNewBuffer(index);
! page = GenericXLogRegister(state, buffer, true);
BloomInitPage(page, 0);
if (!BloomPageAddItem(&blstate, page, itup))
--- 305,311 ----
*/
buffer = BloomNewBuffer(index);
! page = GenericXLogRegisterBuffer(state, buffer, GENERIC_XLOG_FULL_IMAGE);
BloomInitPage(page, 0);
if (!BloomPageAddItem(&blstate, page, itup))
diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
new file mode 100644
index 6c7dc1d..a3605f0
*** a/contrib/bloom/blutils.c
--- b/contrib/bloom/blutils.c
*************** BloomInitMetapage(Relation index)
*** 417,423 ****
/* Initialize contents of meta page */
state = GenericXLogStart(index);
! metaPage = GenericXLogRegister(state, metaBuffer, true);
BloomInitPage(metaPage, BLOOM_META);
metadata = BloomPageGetMeta(metaPage);
--- 417,424 ----
/* Initialize contents of meta page */
state = GenericXLogStart(index);
! metaPage = GenericXLogRegisterBuffer(state, metaBuffer,
! GENERIC_XLOG_FULL_IMAGE);
BloomInitPage(metaPage, BLOOM_META);
metadata = BloomPageGetMeta(metaPage);
diff --git a/contrib/bloom/blvacuum.c b/contrib/bloom/blvacuum.c
new file mode 100644
index ee40ebb..7467afb
*** a/contrib/bloom/blvacuum.c
--- b/contrib/bloom/blvacuum.c
*************** blbulkdelete(IndexVacuumInfo *info, Inde
*** 65,71 ****
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
gxlogState = GenericXLogStart(index);
! page = GenericXLogRegister(gxlogState, buffer, false);
if (BloomPageIsDeleted(page))
{
--- 65,71 ----
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
gxlogState = GenericXLogStart(index);
! page = GenericXLogRegisterBuffer(gxlogState, buffer, 0);
if (BloomPageIsDeleted(page))
{
*************** blbulkdelete(IndexVacuumInfo *info, Inde
*** 145,151 ****
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
gxlogState = GenericXLogStart(index);
! page = GenericXLogRegister(gxlogState, buffer, false);
metaData = BloomPageGetMeta(page);
memcpy(metaData->notFullPage, notFullPage, sizeof(BlockNumber) * countPage);
--- 145,151 ----
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
gxlogState = GenericXLogStart(index);
! page = GenericXLogRegisterBuffer(gxlogState, buffer, 0);
metaData = BloomPageGetMeta(page);
memcpy(metaData->notFullPage, notFullPage, sizeof(BlockNumber) * countPage);
diff --git a/doc/src/sgml/generic-wal.sgml b/doc/src/sgml/generic-wal.sgml
new file mode 100644
index 2398d86..b644bfb
*** a/doc/src/sgml/generic-wal.sgml
--- b/doc/src/sgml/generic-wal.sgml
***************
*** 31,45 ****
<listitem>
<para>
! <function>page = GenericXLogRegister(state, buffer, isNew)</> —
! register a buffer to be modified within the current generic WAL
record. This function returns a pointer to a temporary copy of the
buffer's page, where modifications should be made. (Do not modify the
! buffer's contents directly.) The third argument indicates if the page
! is new; if true, this will result in a full-page image rather than a
! delta update being included in the WAL record.
! <function>GenericXLogRegister</> can be repeated if the WAL-logged
! action needs to modify multiple pages.
</para>
</listitem>
--- 31,46 ----
<listitem>
<para>
! <function>page = GenericXLogRegisterBuffer(state, buffer, flags)</>
! — register a buffer to be modified within the current generic WAL
record. This function returns a pointer to a temporary copy of the
buffer's page, where modifications should be made. (Do not modify the
! buffer's contents directly.) The third argument contains set of flags
! indicating how buffer should be logged. When
! <literal>GENERIC_XLOG_FULL_IMAGE</> is set then a full-page image rather
! than a delta update being included in the WAL record. There could appear
! more flags in future. <function>GenericXLogRegisterBuffer</> can be
! repeated if the WAL-logged action needs to modify multiple pages.
</para>
</listitem>
***************
*** 71,83 ****
<itemizedlist>
<listitem>
<para>
! No direct modifications of buffers are allowed! All modifications
! must be done in copies acquired from <function>GenericXLogRegister()</>.
In other words, code that makes generic WAL records should never call
<function>BufferGetPage()</> for itself. However, it remains the
caller's responsibility to pin/unpin and lock/unlock the buffers at
appropriate times. Exclusive lock must be held on each target buffer
! from before <function>GenericXLogRegister()</> until after
<function>GenericXLogFinish()</>.
</para>
</listitem>
--- 72,84 ----
<itemizedlist>
<listitem>
<para>
! No direct modifications of buffers are allowed! All modifications must
! be done in copies acquired from <function>GenericXLogRegisterBuffer()</>.
In other words, code that makes generic WAL records should never call
<function>BufferGetPage()</> for itself. However, it remains the
caller's responsibility to pin/unpin and lock/unlock the buffers at
appropriate times. Exclusive lock must be held on each target buffer
! from before <function>GenericXLogRegisterBuffer()</> until after
<function>GenericXLogFinish()</>.
</para>
</listitem>
diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c
new file mode 100644
index 072838a..d2fc910
*** a/src/backend/access/transam/generic_xlog.c
--- b/src/backend/access/transam/generic_xlog.c
***************
*** 50,56 ****
typedef struct
{
Buffer buffer; /* registered buffer */
! bool fullImage; /* are we taking a full image of this page? */
int deltaLen; /* space consumed in delta field */
char image[BLCKSZ]; /* copy of page image for modification */
char delta[MAX_DELTA_SIZE]; /* delta between page images */
--- 50,56 ----
typedef struct
{
Buffer buffer; /* registered buffer */
! uint32 flags; /* buffer flags */
int deltaLen; /* space consumed in delta field */
char image[BLCKSZ]; /* copy of page image for modification */
char delta[MAX_DELTA_SIZE]; /* delta between page images */
*************** GenericXLogStart(Relation relation)
*** 282,288 ****
* If the buffer is already registered, just return its existing entry.
*/
Page
! GenericXLogRegister(GenericXLogState *state, Buffer buffer, bool isNew)
{
int block_id;
--- 282,288 ----
* If the buffer is already registered, just return its existing entry.
*/
Page
! GenericXLogRegisterBuffer(GenericXLogState *state, Buffer buffer, uint16 flags)
{
int block_id;
*************** GenericXLogRegister(GenericXLogState *st
*** 295,301 ****
{
/* Empty slot, so use it (there cannot be a match later) */
page->buffer = buffer;
! page->fullImage = isNew;
memcpy(page->image,
BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST),
BLCKSZ);
--- 295,301 ----
{
/* Empty slot, so use it (there cannot be a match later) */
page->buffer = buffer;
! page->flags = flags;
memcpy(page->image,
BufferGetPage(buffer, NULL, NULL, BGP_NO_SNAPSHOT_TEST),
BLCKSZ);
*************** GenericXLogFinish(GenericXLogState *stat
*** 345,351 ****
page = BufferGetPage(pageData->buffer, NULL, NULL,
BGP_NO_SNAPSHOT_TEST);
! if (pageData->fullImage)
{
/* A full page image does not require anything special */
memcpy(page, pageData->image, BLCKSZ);
--- 345,351 ----
page = BufferGetPage(pageData->buffer, NULL, NULL,
BGP_NO_SNAPSHOT_TEST);
! if (pageData->flags & GENERIC_XLOG_FULL_IMAGE)
{
/* A full page image does not require anything special */
memcpy(page, pageData->image, BLCKSZ);
diff --git a/src/include/access/generic_xlog.h b/src/include/access/generic_xlog.h
new file mode 100644
index 01743e3..c422955
*** a/src/include/access/generic_xlog.h
--- b/src/include/access/generic_xlog.h
***************
*** 22,35 ****
#define MAX_GENERIC_XLOG_PAGES XLR_NORMAL_MAX_BLOCK_ID
/* state of generic xlog record construction */
struct GenericXLogState;
typedef struct GenericXLogState GenericXLogState;
/* API for construction of generic xlog records */
extern GenericXLogState *GenericXLogStart(Relation relation);
! extern Page GenericXLogRegister(GenericXLogState *state, Buffer buffer,
! bool isNew);
extern XLogRecPtr GenericXLogFinish(GenericXLogState *state);
extern void GenericXLogAbort(GenericXLogState *state);
--- 22,38 ----
#define MAX_GENERIC_XLOG_PAGES XLR_NORMAL_MAX_BLOCK_ID
+ /* Flags for GenericXLogRegisterBuffer */
+ #define GENERIC_XLOG_FULL_IMAGE 1
+
/* state of generic xlog record construction */
struct GenericXLogState;
typedef struct GenericXLogState GenericXLogState;
/* API for construction of generic xlog records */
extern GenericXLogState *GenericXLogStart(Relation relation);
! extern Page GenericXLogRegisterBuffer(GenericXLogState *state, Buffer buffer,
! uint16 flags);
extern XLogRecPtr GenericXLogFinish(GenericXLogState *state);
extern void GenericXLogAbort(GenericXLogState *state);
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
I agree with both of these ideas. Patch is attached.
Pushed with minor cleanup.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Apr 12, 2016 at 6:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
I agree with both of these ideas. Patch is attached.
Pushed with minor cleanup.
Great, thanks!
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tue, Apr 12, 2016 at 9:36 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
Repeating the mapping at each checkpoint sounds pretty reasonable and means
we always know what we need. There's no need to bloat each record with an
extension name and no need for any kind of ugly global registration. The
mapping table would be small and simple. I like it.Of course, it's all maybe-in-future stuff at this point, but I think that's
a really good way to approach it.There's no way around the fact that user defined redo functions can affect
reliability. But then, so can user-defined data types, functions, bgworkers,
plpython functions loading ctypes, plpython functions getting creative in
the datadir, and admins who paste into the wrong window. The scope for
problems is somewhat greater but not IMO prohibitively so.
I am in agreement with you on both the merits of this particular thing
and the general principle you are articulating regarding how to think
about these things.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers