Sequence Access Method WIP

Started by Simon Riggs about 13 years ago, 124 messages, hackers
#1Simon Riggs
simon@2ndQuadrant.com

SeqAm allows you to specify a plugin that alters the behaviour for
sequence allocation and resetting, aimed specifically at clustering
systems.

New command options on end of statement allow syntax
CREATE SEQUENCE foo_seq
USING globalseq WITH (custom_option=setting);
which allows you to specify an alternate Sequence Access Method for a
sequence object,
or you can specify a default_sequence_mgr as a USERSET parameter
SET default_sequence_mgr = globalseq;

Existing sequences can be modified to use a different SeqAM, by calling
ALTER SEQUENCE foo_seq USING globalseq;

SeqAM is similar to IndexAM: There is a separate catalog table for
SeqAMs, but no specific API to create them. Initdb creates one
sequence am, called "local", which is the initial default. If
default_sequence_mgr is set to '' or 'local' then we use the local
seqam. The local seqam's functions are included in core.

Status is still "Work In Progress". Having said that, most of the grunt
work is done, and if we agree the shape of this is right, it's
relatively easy-going code.

postgres=# select oid, * from pg_seqam;
-[ RECORD 1 ]+--------------------
oid | 3839
seqamname | local
seqamalloc | seqam_local_alloc
seqamsetval | seqam_local_setval
seqamoptions | seqam_local_options

postgres=# select relname, relam from pg_class where relname = 'foo2';
relname | relam
---------+-------
foo2 | 3839

postgres=# create sequence foo5 using global;
ERROR: access method "global" does not exist

Footprint
backend/access/Makefile | 2
backend/access/common/reloptions.c | 26 +++
backend/access/sequence/Makefile | 17 ++
backend/access/sequence/seqam.c | 278 +++++++++++++++++++++++++++++++++++++
backend/catalog/Makefile | 2
backend/commands/sequence.c | 132 +++++++++++++++--
backend/commands/tablecmds.c | 3
backend/nodes/copyfuncs.c | 4
backend/nodes/equalfuncs.c | 4
backend/parser/gram.y | 84 ++++++++++-
backend/parser/parse_utilcmd.c | 4
backend/utils/cache/catcache.c | 6
backend/utils/cache/syscache.c | 23 +++
backend/utils/misc/guc.c | 12 +
include/access/reloptions.h | 6
include/access/seqam.h | 27 +++
include/catalog/indexing.h | 5
include/catalog/pg_proc.h | 6
include/catalog/pg_seqam.h | 70 +++++++++
include/nodes/parsenodes.h | 8 -
include/utils/guc.h | 1
include/utils/rel.h | 22 +-
include/utils/syscache.h | 2
23 files changed, 706 insertions(+), 38 deletions(-)

Tasks to complete
* contrib module for example/testing
* Docs

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

seqam.v5.patch (application/octet-stream, +706 -38)
#2Simon Riggs
simon@2ndQuadrant.com
In reply to: Simon Riggs (#1)
Re: Sequence Access Method WIP

On 16 January 2013 00:40, Simon Riggs <simon@2ndquadrant.com> wrote:

SeqAm allows you to specify a plugin that alters the behaviour for
sequence allocation and resetting, aimed specifically at clustering
systems.

New command options on end of statement allow syntax
CREATE SEQUENCE foo_seq
USING globalseq

Production version of this, ready for commit to PG 9.4

Includes test extension which allows sequences without gaps - "gapless".

Test using seq_test.psql after creating extension.

No dependencies on other patches.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

seqam.v33.patch (application/octet-stream, +1106 -352)
gapless.tar (application/x-tar)
seq_test.psql (application/octet-stream)
#3Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#2)
Re: Sequence Access Method WIP

On 14.11.2013 22:10, Simon Riggs wrote:

On 16 January 2013 00:40, Simon Riggs <simon@2ndquadrant.com> wrote:

SeqAm allows you to specify a plugin that alters the behaviour for
sequence allocation and resetting, aimed specifically at clustering
systems.

New command options on end of statement allow syntax
CREATE SEQUENCE foo_seq
USING globalseq

Production version of this, ready for commit to PG 9.4

Includes test extension which allows sequences without gaps - "gapless".

Test using seq_test.psql after creating extension.

No dependencies on other patches.

It's pretty hard to review this without seeing the "other"
implementation you're envisioning to use this API. But I'll try:

I wonder if the level of abstraction is right. The patch assumes that
the on-disk storage of all sequences is the same, so the access method
can't change that. But then it leaves the details of actually updating
the on-disk block, WAL-logging and all, as the responsibility of the
access method. Surely that's going to look identical in all the seqams,
if they all use the same on-disk format. That also means that the
sequence access methods can't be implemented as plugins, as plugins
can't do WAL-logging.

The comment in seqam.c says that there's a private column reserved for
access method-specific data, called am_data, but that seems to be the
only mention of am_data in the patch.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#3)
Re: Sequence Access Method WIP

On 2013-11-15 20:08:30 +0200, Heikki Linnakangas wrote:

It's pretty hard to review this without seeing the "other"
implementation you're envisioning to use this API. But I'll try:

We've written a distributed sequence implementation against it, so it
works at least for that use case.

While certainly not release worthy, it still might be interesting to
look at.
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/bdr/bdr_seq.c;h=c9480c8021882f888e35080f0e7a7494af37ae27;hb=bdr

bdr_sequencer_fill_sequence pre-allocates ranges of values,
bdr_sequence_alloc returns them upon nextval().

I wonder if the level of abstraction is right. The patch assumes that the
on-disk storage of all sequences is the same, so the access method can't
change that. But then it leaves the details of actually updating the on-disk
block, WAL-logging and all, as the responsibility of the access method.
Surely that's going to look identical in all the seqams, if they all use the
same on-disk format. That also means that the sequence access methods can't
be implemented as plugins, as plugins can't do WAL-logging.

Well, it exposes log_sequence_tuple() - together with the added "am
private" column of pg_sequence, that allows you to do quite a few different
things. I think unless we really implement pluggable RMGRs - which I
don't really see coming - we cannot be radically different.

The comment in seqam.c says that there's a private column reserved for
access method-specific data, called am_data, but that seems to be the only
mention of am_data in the patch.

It's amdata, not am_data :/. Directly at the end of pg_sequence.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#5Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#3)
Re: Sequence Access Method WIP

On 15 November 2013 15:08, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

I wonder if the level of abstraction is right.

That is the big question and not something to shy away from.

What is presented is not the first thought, by a long way. Andres'
contribution to the patch is mainly around this point, so the seq am
is designed with the needs of the main use case in mind.

I'm open to suggested changes but I would say that practical usage
beats changes suggested for purity.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#6Peter Eisentraut
peter_e@gmx.net
In reply to: Simon Riggs (#2)
Re: Sequence Access Method WIP

On 11/14/13, 3:10 PM, Simon Riggs wrote:

On 16 January 2013 00:40, Simon Riggs <simon@2ndquadrant.com> wrote:

SeqAm allows you to specify a plugin that alters the behaviour for
sequence allocation and resetting, aimed specifically at clustering
systems.

New command options on end of statement allow syntax
CREATE SEQUENCE foo_seq
USING globalseq

Production version of this, ready for commit to PG 9.4

This patch doesn't apply anymore.

Also, you set this to "returned with feedback" in the CF app. Please
verify whether that was intentional.

#7Simon Riggs
simon@2ndQuadrant.com
In reply to: Peter Eisentraut (#6)
Re: Sequence Access Method WIP

On 15 November 2013 15:48, Peter Eisentraut <peter_e@gmx.net> wrote:

On 11/14/13, 3:10 PM, Simon Riggs wrote:

On 16 January 2013 00:40, Simon Riggs <simon@2ndquadrant.com> wrote:

SeqAm allows you to specify a plugin that alters the behaviour for
sequence allocation and resetting, aimed specifically at clustering
systems.

New command options on end of statement allow syntax
CREATE SEQUENCE foo_seq
USING globalseq

Production version of this, ready for commit to PG 9.4

This patch doesn't apply anymore.

Yes, a patch conflict was just committed, will repost.

Also, you set this to "returned with feedback" in the CF app. Please
verify whether that was intentional.

Not sure that was me, if so, corrected.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#8Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#7)
Re: Sequence Access Method WIP

On 15.11.2013 21:00, Simon Riggs wrote:

On 15 November 2013 15:48, Peter Eisentraut <peter_e@gmx.net> wrote:

Also, you set this to "returned with feedback" in the CF app. Please
verify whether that was intentional.

Not sure that was me, if so, corrected.

It was me, sorry. I figured this needs such a large restructuring that
it's not going to be committable in this commitfest. But I'm happy to
keep the discussion going if you think otherwise.

- Heikki

#9Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#4)
Re: Sequence Access Method WIP

On 15.11.2013 20:21, Andres Freund wrote:

On 2013-11-15 20:08:30 +0200, Heikki Linnakangas wrote:

It's pretty hard to review this without seeing the "other"
implementation you're envisioning to use this API. But I'll try:

We've written a distributed sequence implementation against it, so it
works at least for that use case.

While certainly not release worthy, it still might be interesting to
look at.
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/bdr/bdr_seq.c;h=c9480c8021882f888e35080f0e7a7494af37ae27;hb=bdr

bdr_sequencer_fill_sequence pre-allocates ranges of values,
bdr_sequence_alloc returns them upon nextval().

Thanks. That pokes into the low-level details of sequence tuples, just
like I was afraid it would.

I wonder if the level of abstraction is right. The patch assumes that the
on-disk storage of all sequences is the same, so the access method can't
change that. But then it leaves the details of actually updating the on-disk
block, WAL-logging and all, as the responsibility of the access method.
Surely that's going to look identical in all the seqams, if they all use the
same on-disk format. That also means that the sequence access methods can't
be implemented as plugins, as plugins can't do WAL-logging.

Well, it exposes log_sequence_tuple() - together with the added "am
private" column of pg_sequence, that allows you to do quite a few different
things. I think unless we really implement pluggable RMGRs - which I
don't really see coming - we cannot be radically different.

The proposed abstraction leaks like a sieve. The plugin should not need
to touch anything but the private amdata field. log_sequence_tuple() is
way too low-level.

Just as a thought-experiment, imagine that we decided to re-implement
sequences so that all the sequence values are stored in one big table,
or flat-file in the data directory, instead of the current
implementation where we have a one-block relation file for each
sequence. If the sequence access method API is any good, it should stay
unchanged. That's clearly not the case with the proposed API.

The comment in seqam.c says that there's a private column reserved for
access method-specific data, called am_data, but that seems to be the only
mention of am_data in the patch.

It's amdata, not am_data :/. Directly at the end of pg_sequence.

Ah, got it.

Stepping back a bit and looking at this problem from a higher level, why
do you need to hack this stuff into the sequences? Couldn't you just
define a new set of functions, say bdr_currval() and bdr_nextval(), to
operate on these new kinds of sequences? You're not making much use of
the existing sequence infrastructure, anyway, so it might be best to
just keep the implementation completely separate. If you need it for
compatibility with applications, you could create facade
currval/nextval() functions that call the built-in version or the bdr
version depending on the argument.

- Heikki

#10Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#9)
Re: Sequence Access Method WIP

On 2013-11-18 10:54:42 +0200, Heikki Linnakangas wrote:

On 15.11.2013 20:21, Andres Freund wrote:

Well, it exposes log_sequence_tuple() - together with the added "am
private" column of pg_sequence, that allows you to do quite a few different
things. I think unless we really implement pluggable RMGRs - which I
don't really see coming - we cannot be radically different.

The proposed abstraction leaks like a sieve. The plugin should not need to
touch anything but the private amdata field. log_sequence_tuple() is way too
low-level.

Why? It's *less* low level than a good part of other crash recovery code
we have. I have quite some doubts about the layering, but it's imo
pretty sensible to centralize the wal logging code that plugins couldn't
write.

If you look at what e.g. the _alloc callback currently gets passed:
a) Relation: Important for metadata like TupleDesc, name etc.
b) SeqTable entry: Important to set up state for cur/lastval
c) Buffer of the tuple: LSN etc.
d) HeapTuple itself: a place to store the tuple's changed state

And if you then look at what gets modified in that callback:
Form_pg_sequence->amdata
->is_called
->last_value
->log_cnt
SeqTable ->last
SeqTable ->cached
SeqTable ->last_valid

You need is_called, last_valid because of our current representation of
sequence state (which we could change, to remove that need). The rest
contains values that need to be set depending on how you want your
sequence to behave:
* amdata is obvious.
* You need log_cnt to influence/remember how big the chunks you WAL-log at
once are. You need to control that e.g. if you want a sequence that
doesn't leak values in crashes.
* cached is needed to control the per-backend caching.

Just as a thought-experiment, imagine that we decided to re-implement
sequences so that all the sequence values are stored in one big table, or
flat-file in the data directory, instead of the current implementation where
we have a one-block relation file for each sequence. If the sequence access
method API is any good, it should stay unchanged. That's clearly not the
case with the proposed API.

I don't think we can entirely abstract away the reality that now they are
based on relations and might not be at some later point. Otherwise we'd
have to invent an API that copied all the data that's stored in struct
RelationData. Like name, owner, ...
Which non-SQL-accessible API that we provide has a chance of providing that
level of consistency in the face of fundamental refactoring? I'd say
none.

Stepping back a bit and looking at this problem from a higher level, why do
you need to hack this stuff into the sequences? Couldn't you just define a
new set of functions, say bdr_currval() and bdr_nextval(), to operate on
these new kinds of sequences?

Basically two things:
a) User interface. For one everyone would need to reimplement the entire
sequence DDL from start. For another it means it's hard to write
(client) code that doesn't depend on a specific implementation.
b) It's not actually easy to get similar semantics in "userspace". How
would you emulate the crash-safe but non-transactional semantics of
sequences without copying most of sequence.c? Without writing
XLogInsert()s, which we cannot do from out-of-tree code?

You're not making much use of the existing sequence infrastructure, anyway,
so it might be best to just keep the implementation completely separate. If
you need it for compatibility with applications, you could create facade
currval/nextval() functions that call the built-in version or the bdr
version depending on the argument.

That doesn't get you very far:
a) the default functions created by sequences are pg_catalog
prefixed. So you'd need to hack up the catalog to get your own functions
to be used if you want the application to work transparently. In which
case you need to remember the former function because you now cannot call
it normally anymore. Yuck.
b) One would need nearly all of sequence.c again. You need the state
across calls, the locking, the WAL logging, DDL support. Pretty much
the only thing *not* used would be nextval_internal() and do_setval().

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#11Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#10)
Re: Sequence Access Method WIP

On 18.11.2013 11:48, Andres Freund wrote:

On 2013-11-18 10:54:42 +0200, Heikki Linnakangas wrote:

On 15.11.2013 20:21, Andres Freund wrote:

Well, it exposes log_sequence_tuple() - together with the added "am
private" column of pg_sequence, that allows you to do quite a few different
things. I think unless we really implement pluggable RMGRs - which I
don't really see coming - we cannot be radically different.

The proposed abstraction leaks like a sieve. The plugin should not need to
touch anything but the private amdata field. log_sequence_tuple() is way too
low-level.

Why? It's *less* low level than a good part of other crash recovery code
we have. I have quite some doubts about the layering, but it's imo
pretty sensible to centralize the wal logging code that plugins couldn't
write.

It doesn't go far enough, it's still too *low*-level. The sequence AM
implementation shouldn't need to have direct access to the buffer page
at all.

If you look at what e.g. the _alloc callback currently gets passed:
a) Relation: Important for metadata like TupleDesc, name etc.
b) SeqTable entry: Important to set up state for cur/lastval
c) Buffer of the tuple: LSN etc.
d) HeapTuple itself: a place to store the tuple's changed state

And if you then look at what gets modified in that callback:
Form_pg_sequence->amdata
->is_called
->last_value
->log_cnt
SeqTable ->last
SeqTable ->cached
SeqTable ->last_valid

You need is_called, last_valid because of our current representation of
sequence state (which we could change, to remove that need). The rest
contains values that need to be set depending on how you want your
sequence to behave:
* amdata is obvious.
* You need log_cnt to influence/remember how big the chunks you WAL-log at
once are. You need to control that e.g. if you want a sequence that
doesn't leak values in crashes.
* cached is needed to control the per-backend caching.

I don't think the sequence AM should be in control of 'cached'. The
caching is done outside the AM. And log_cnt probably should be passed to
the _alloc function directly as an argument, ie. the server code asks
the AM to allocate N new values in one call.

I'm thinking that the alloc function should look something like this:

seqam_alloc(Relation seqrel, int nrequested, Datum am_private)

And it should return:

int64 value - the first value allocated.
int nvalues - the number of values allocated.
am_private - updated private data.

The backend code handles the caching and logging of values. When it has
exhausted all the cached values (or doesn't have any yet), it calls the
AM's alloc function to get a new batch. The AM returns the new batch,
and updates its private state as necessary. Then the backend code
updates the relation file with the new values and the AM's private data.
WAL-logging and checkpointing is the backend's responsibility.
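
A minimal sketch of that division of labour, under stated assumptions: every name below (SeqAllocResult, SeqCache, local_alloc, sketch_nextval) is hypothetical and not taken from the patch, the Relation argument is omitted, and a plain int64_t stands in for the Datum holding the AM's private data. It only illustrates the idea that the backend owns the cache and asks the AM for a fresh batch when the cache runs dry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result of one AM alloc call: a contiguous batch of values. */
typedef struct SeqAllocResult
{
    int64_t value;       /* first value allocated */
    int     nvalues;     /* number of values allocated */
    int64_t am_private;  /* updated AM-private state (stand-in for a Datum) */
} SeqAllocResult;

/* Hypothetical callback type; the proposal also passes the Relation. */
typedef SeqAllocResult (*seqam_alloc_fn) (int nrequested, int64_t am_private);

/* Backend-side state: the cache lives outside the AM. */
typedef struct SeqCache
{
    int64_t next;        /* next cached value to hand out */
    int     remaining;   /* cached values left */
    int64_t am_private;  /* AM state, stored by the backend on the AM's behalf */
    int     alloc_calls; /* how many times the AM was asked for a batch */
} SeqCache;

/* A trivial example AM: consecutive values, private state = last value. */
static SeqAllocResult
local_alloc(int nrequested, int64_t am_private)
{
    SeqAllocResult r;

    r.value = am_private + 1;
    r.nvalues = nrequested;
    r.am_private = am_private + nrequested;
    return r;
}

/* nextval as driven by the backend: refill from the AM only when empty. */
static int64_t
sketch_nextval(SeqCache *cache, seqam_alloc_fn alloc, int batch)
{
    if (cache->remaining == 0)
    {
        SeqAllocResult r = alloc(batch, cache->am_private);

        assert(r.nvalues > 0);
        cache->next = r.value;
        cache->remaining = r.nvalues;
        cache->am_private = r.am_private;
        cache->alloc_calls++;
        /* A real backend would update the relation file and WAL-log here. */
    }
    cache->remaining--;
    return cache->next++;
}
```

With a batch size of 3, the first three calls to sketch_nextval() are served from a single alloc call, and the AM never sees a buffer or a WAL record.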

Just as a thought-experiment, imagine that we decided to re-implement
sequences so that all the sequence values are stored in one big table, or
flat-file in the data directory, instead of the current implementation where
we have a one-block relation file for each sequence. If the sequence access
method API is any good, it should stay unchanged. That's clearly not the
case with the proposed API.

I don't think we can entirely abstract away the reality that now they are
based on relations and might not be at some later point. Otherwise we'd
have to invent an API that copied all the data that's stored in struct
RelationData. Like name, owner, ...
Which non-SQL-accessible API that we provide has a chance of providing that
level of consistency in the face of fundamental refactoring? I'd say
none.

I'm not thinking that we'd change sequences to not be relations. I'm
thinking that we might not want to store the state as a one-page file
anymore. In fact that was just discussed in the other thread titled
"init_sequence spill to hash table".

b) It's not actually easy to get similar semantics in "userspace". How
would you emulate the crash-safe but non-transactional semantics of
sequences without copying most of sequence.c? Without writing
XLogInsert()s, which we cannot do from out-of-tree code?

heap_inplace_update()

- Heikki

#12Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#11)
Re: Sequence Access Method WIP

On 2013-11-18 12:50:21 +0200, Heikki Linnakangas wrote:

On 18.11.2013 11:48, Andres Freund wrote:
I don't think the sequence AM should be in control of 'cached'. The caching
is done outside the AM. And log_cnt probably should be passed to the _alloc
function directly as an argument, ie. the server code asks the AM to
allocate N new values in one call.

Sounds sane.

I'm thinking that the alloc function should look something like this:

seqam_alloc(Relation seqrel, int nrequested, Datum am_private)

I don't think we can avoid giving access to the other columns of
pg_sequence, stuff like increment, limits et al. need to be available
for reading, so that'd also need to get passed in. And we need to signal
that am_private can be NULL, otherwise we'd end up with ugly ways to
signal that.
So I'd say to pass in the entire tuple, and return a copy? Alternatively
we can return am_private as a 'struct varlena *', so we can properly
signal empty values.

We also need a way to set am_private from outside
seqam_alloc/setval/... Many of the fancier sequences that people talked
about will need preallocation somewhere in the background. As proposed
that's easy enough to write using log_sequence_tuple(), this way we'd
need something that calls a callback with the appropriate buffer lock
held. So maybe a seqam_update(Relation seqrel, ...) callback that gets
called when somebody executes pg_sequence_update(oid)?

It'd probably be a good idea to provide a generic function for checking
whether a new value falls in the boundaries of the sequence's min, max +
error handling.
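
A rough sketch of what such a generic check could look like, purely for illustration: SeqBounds and seq_check_bounds are hypothetical names, not part of the patch, and the error handling is reduced to a boolean return where real backend code would presumably raise an error. The wrap-around branch assumes CYCLE-style behaviour (restart at the minimum when overshooting the maximum, and at the maximum when undershooting the minimum).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the sequence parameters needed for a range check. */
typedef struct SeqBounds
{
    int64_t min_value;
    int64_t max_value;
    bool    cycle;      /* wrap around instead of failing */
} SeqBounds;

/*
 * Check a candidate value against the sequence limits.
 * Returns true and stores the value to use in *result, or returns false
 * if the value is out of range and the sequence does not cycle.
 */
static bool
seq_check_bounds(const SeqBounds *b, int64_t candidate, int64_t *result)
{
    assert(b->min_value <= b->max_value);

    if (candidate >= b->min_value && candidate <= b->max_value)
    {
        *result = candidate;
        return true;
    }
    if (!b->cycle)
        return false;   /* caller reports "reached maximum/minimum value" */

    /* Overshoot wraps to the minimum, undershoot wraps to the maximum. */
    *result = (candidate > b->max_value) ? b->min_value : b->max_value;
    return true;
}
```

An AM's alloc callback could call a helper like this for each value it is about to hand out, so every AM reports limit violations the same way.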

I'm not thinking that we'd change sequences to not be relations. I'm
thinking that we might not want to store the state as a one-page file
anymore. In fact that was just discussed in the other thread titled
"init_sequence spill to hash table".

Yes, I read and even commented in that thread... But nothing in the
current proposed API would prevent you from going in that direction,
you'd just get passed in a different tuple/buffer.

b) It's not actually easy to get similar semantics in "userspace". How
would you emulate the crash-safe but non-transactional semantics of
sequences without copying most of sequence.c? Without writing
XLogInsert()s, which we cannot do from out-of-tree code?

heap_inplace_update()

That gets the crashsafe part partially (doesn't allow making the tuple
wider than before), but not the caching/stateful part et al. The point
is that you need most of sequence.c again.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#13Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#11)
Re: Sequence Access Method WIP

On 18 November 2013 07:50, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

It doesn't go far enough, it's still too *low*-level. The sequence AM
implementation shouldn't need to have direct access to the buffer page at
all.

I don't think the sequence AM should be in control of 'cached'. The caching
is done outside the AM. And log_cnt probably should be passed to the _alloc
function directly as an argument, ie. the server code asks the AM to
allocate N new values in one call.

I can't see what the rationale of your arguments is. All index AMs
write WAL and control buffer locking etc.

Do you have a new use case that shows why changes should happen? We
can't just redesign things based upon arbitrary decisions about what
things should or should not be possible via the API.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#14Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#13)
Re: Sequence Access Method WIP

On 18.11.2013 13:48, Simon Riggs wrote:

On 18 November 2013 07:50, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

It doesn't go far enough, it's still too *low*-level. The sequence AM
implementation shouldn't need to have direct access to the buffer page at
all.

I don't think the sequence AM should be in control of 'cached'. The caching
is done outside the AM. And log_cnt probably should be passed to the _alloc
function directly as an argument, ie. the server code asks the AM to
allocate N new values in one call.

I can't see what the rationale of your arguments is. All index AMs
write WAL and control buffer locking etc.

Index AMs are completely responsible for the on-disk structure, while
with the proposed API, both the AM and the backend are intimately aware
of the on-disk representation. Such a shared responsibility is not a
good thing in an API. I would also be fine with going 100% in the index
AM direction, removing all knowledge of the on-disk layout from the
backend code and moving it into the AMs. Then you could actually implement
the discussed "store all sequences in a single file" change by writing a
new sequence AM for it.

- Heikki

#15Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#2)
Re: Sequence Access Method WIP

On 14.11.2013 22:10, Simon Riggs wrote:

Includes test extension which allows sequences without gaps - "gapless".

I realize this is just for demonstration purposes, but it's worth noting
that it doesn't actually guarantee that when you use the sequence to
populate a column in the table, the column would not have gaps.
Sequences are not transactional, so rollbacks will still produce gaps.
The documentation is misleading on that point. Without a strong
guarantee, it's a pretty useless extension.

- Heikki

#16Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#15)
Re: Sequence Access Method WIP

On 18 November 2013 07:36, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

On 14.11.2013 22:10, Simon Riggs wrote:

Includes test extension which allows sequences without gaps - "gapless".

I realize this is just for demonstration purposes, but it's worth noting
that it doesn't actually guarantee that when you use the sequence to
populate a column in the table, the column would not have gaps. Sequences
are not transactional, so rollbacks will still produce gaps. The
documentation is misleading on that point. Without a strong guarantee, it's
a pretty useless extension.

True.

If I fix that problem, I should change the name to "lockup" sequences,
since only one transaction at a time could use the nextval.

Should I change the documentation, or just bin the idea?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#17Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#14)
Re: Sequence Access Method WIP

On 18 November 2013 07:06, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

On 18.11.2013 13:48, Simon Riggs wrote:

On 18 November 2013 07:50, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

It doesn't go far enough, it's still too *low*-level. The sequence AM
implementation shouldn't need to have direct access to the buffer page at
all.

I don't think the sequence AM should be in control of 'cached'. The caching
is done outside the AM. And log_cnt probably should be passed to the _alloc
function directly as an argument, ie. the server code asks the AM to
allocate N new values in one call.

I can't see what the rationale of your arguments is. All index AMs
write WAL and control buffer locking etc.

Index AMs are completely responsible for the on-disk structure, while with
the proposed API, both the AM and the backend are intimately aware of the
on-disk representation. Such a shared responsibility is not a good thing in
an API. I would also be fine with going 100% in the index AM direction,
removing all knowledge of the on-disk layout from the backend code and moving
it into the AMs. Then you could actually implement the discussed "store all
sequences in a single file" change by writing a new sequence AM for it.

I think the way to resolve this is to do both of these things, i.e. a
two-level API:

1. Implement the SeqAM API at the most generic level. Add a nextval() call
as well as alloc().

2. Also implement the proposed changes to alloc().

So a SeqAM would implement either nextval() or alloc(), but not both.

Global sequences as envisaged for BDR would use a special alloc() call.
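
To make the two-level idea concrete, here is a minimal C sketch of one way
it could look. Every name in it (SeqAm, SeqState, SeqCache, local_alloc,
direct_nextval, server_nextval) is hypothetical and is not taken from the
patch. It assumes an AM fills in exactly one of the two callbacks, and that
for alloc() the server passes the number of values it wants and does the
caching itself, as suggested earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types: illustration only, not the patch's API. */
typedef struct SeqState
{
    int64_t last_value;     /* last value reserved by the AM */
    int64_t increment;      /* step between consecutive values */
} SeqState;

/* Level 1: the AM returns one value per call and owns everything else. */
typedef int64_t (*seqam_nextval_fn) (SeqState *state);

/* Level 2: the server asks for n values; the AM returns the first one. */
typedef int64_t (*seqam_alloc_fn) (SeqState *state, int64_t n);

typedef struct SeqAm
{
    const char *name;
    seqam_nextval_fn nextval;   /* set this ... */
    seqam_alloc_fn alloc;       /* ... or this, but not both */
} SeqAm;

/* Cache kept by the server, outside the AM. */
typedef struct SeqCache
{
    int64_t next;           /* next cached value to hand out */
    int64_t remaining;      /* number of cached values left */
} SeqCache;

/* An alloc()-style AM: reserve n values in one step. */
int64_t
local_alloc(SeqState *state, int64_t n)
{
    int64_t first = state->last_value + state->increment;

    state->last_value += n * state->increment;
    return first;
}

/* A nextval()-style AM: one value per call, no server-side cache. */
int64_t
direct_nextval(SeqState *state)
{
    state->last_value += state->increment;
    return state->last_value;
}

/* Server entry point: dispatch to whichever level the AM implements. */
int64_t
server_nextval(const SeqAm *am, SeqState *state, SeqCache *cache,
               int64_t cache_size)
{
    int64_t value;

    /* exactly one of the two callbacks must be provided */
    assert((am->nextval != NULL) != (am->alloc != NULL));

    if (am->nextval != NULL)
        return am->nextval(state);

    if (cache->remaining == 0)
    {
        /* refill: ask the AM for cache_size values in a single call */
        cache->next = am->alloc(state, cache_size);
        cache->remaining = cache_size;
    }

    value = cache->next;
    cache->next += state->increment;
    cache->remaining--;
    return value;
}
```

In this sketch the alloc() path never exposes the cache to the AM, while
the nextval() path bypasses the cache entirely.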

I don't think that is too much work, but I want to do this just once...

Thoughts on exact next steps for implementation please?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#18Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#17)
Re: Sequence Access Method WIP

On 24.11.2013 19:23, Simon Riggs wrote:

On 18 November 2013 07:06, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

On 18.11.2013 13:48, Simon Riggs wrote:

On 18 November 2013 07:50, Heikki Linnakangas <hlinnakangas@vmware.com>
wrote:

It doesn't go far enough, it's still too *low*-level. The sequence AM
implementation shouldn't need to have direct access to the buffer page at
all.

I don't think the sequence AM should be in control of 'cached'. The caching
is done outside the AM. And log_cnt probably should be passed to the _alloc
function directly as an argument, i.e. the server code asks the AM to
allocate N new values in one call.

I can't see what the rationale of your arguments is. All index AMs
write WAL and control buffer locking, etc.

Index AMs are completely responsible for the on-disk structure, while with
the proposed API, both the AM and the backend are intimately aware of the
on-disk representation. Such a shared responsibility is not a good thing in
an API. I would also be fine with going 100% in the index AM direction,
removing all knowledge of the on-disk layout from the backend code and moving
it into the AMs. Then you could actually implement the discussed "store all
sequences in a single file" change by writing a new sequence AM for it.

I think the way to resolve this is to do both of these things, i.e. a
two-level API:

1. Implement the SeqAM API at the most generic level. Add a nextval() call
as well as alloc().

2. Also implement the proposed changes to alloc().

The proposed changes to alloc() would still suffer from all the problems
that I complained about. Adding a new API alongside doesn't help with that.

- Heikki

#19Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#18)
Re: Sequence Access Method WIP

On 25 November 2013 04:01, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

The proposed changes to alloc() would still suffer from all the problems
that I complained about. Adding a new API alongside doesn't help with that.

You made two proposals. I suggested implementing both.

What would you have me do?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#20Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#16)
Re: Sequence Access Method WIP

On 11/24/13 19:15, Simon Riggs wrote:

On 18 November 2013 07:36, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

On 14.11.2013 22:10, Simon Riggs wrote:

Includes test extension which allows sequences without gaps - "gapless".

I realize this is just for demonstration purposes, but it's worth noting
that it doesn't actually guarantee that when you use the sequence to
populate a column in the table, the column would not have gaps. Sequences
are not transactional, so rollbacks will still produce gaps. The
documentation is misleading on that point. Without a strong guarantee, it's
a pretty useless extension.

True.

If I fix that problem, I should change the name to "lockup" sequences,
since only one transaction at a time could use the nextval.

Should I change the documentation, or just bin the idea?

Just bin it. It would be useful if it could guarantee gaplessness, but I
don't see how to do that.
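
For readers following along, the rollback gap discussed in this exchange
can be reproduced with a toy model. This is a minimal C sketch, not
PostgreSQL code: ToySequence, ToyTable, toy_nextval and toy_insert are
hypothetical names, and a "transaction" is reduced to a flag saying whether
the insert commits. The only property it relies on is the one stated above,
that advancing a sequence is not undone by a rollback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_ROWS 16

/* Hypothetical toy types: illustration only. */
typedef struct ToySequence
{
    int64_t last_value;     /* advanced by nextval, never rolled back */
} ToySequence;

typedef struct ToyTable
{
    int64_t ids[TOY_MAX_ROWS];  /* ids of committed rows only */
    int     nrows;
} ToyTable;

/* Non-transactional: the increment survives whatever the caller does. */
int64_t
toy_nextval(ToySequence *seq)
{
    return ++seq->last_value;
}

/*
 * One toy "transaction": take a value from the sequence, then either
 * commit the row or roll back. Returns the value that was consumed.
 */
int64_t
toy_insert(ToySequence *seq, ToyTable *tab, bool commit)
{
    int64_t id = toy_nextval(seq);

    if (commit)
    {
        assert(tab->nrows < TOY_MAX_ROWS);
        tab->ids[tab->nrows++] = id;
    }
    /* on rollback the row is discarded, but id has already been used */
    return id;
}
```

Running a committed insert, a rolled-back insert and another committed
insert against a fresh sequence leaves the table with ids 1 and 3. Closing
that gap would mean holding the sequence until commit, which is the "only
one transaction at a time" behaviour mentioned above.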

- Heikki
