[RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility

Started by Hannu Krosing over 13 years ago · 34 messages · hackers
#1 Hannu Krosing
hannu@tm.ee

Hello PostgreSQL and replication hackers,

This mail is an additional RFC which proposes a simple way to extend the
new logical replication feature so it can cover most usages of
skytools/pgq/londiste.

While the current work for BDR/LCR (bi-directional replication/logical
replication) using WAL is theoretically enough to cover the _replication_
offered by Londiste, it falls short in one important way: there is
currently no support for pure queueing, that is, for "streams" of data
which do not need to be stored in the source database.

Fortunately there is a simple solution - do not store it in the source
database :)

The only thing needed for adding this is to have a table type which

a) generates an INSERT record in WAL

and

b) does not actually store the data in a local file

If implemented in userspace it would be a VIEW (or table) with a
before/instead trigger which logs the inserted data and then cancels
the insert.
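
As a rough illustration of the userspace variant, here is a toy model in Python rather than anything resembling PostgreSQL internals. The names `LogOnlyTable`, `wal` and `heap` are hypothetical and exist only for this sketch: an insert is appended to a log and then dropped, so nothing is ever stored locally.

```python
from dataclasses import dataclass, field


@dataclass
class LogOnlyTable:
    """Toy model of a log-only table: inserts reach the log, never the heap."""
    name: str
    wal: list = field(default_factory=list)   # stands in for the WAL stream
    heap: list = field(default_factory=list)  # local storage; stays empty

    def insert(self, row: dict) -> None:
        # a) emit an INSERT record to the log ...
        self.wal.append(("INSERT", self.name, dict(row)))
        # b) ... and skip the heap write, like a trigger cancelling the insert.

    def select(self) -> list:
        # Nothing was stored locally, so a local read sees no rows.
        return list(self.heap)


events = LogOnlyTable("events")
events.insert({"id": 1, "payload": "hello"})
events.insert({"id": 2, "payload": "world"})
```

After the two inserts the log holds two records while a local read returns nothing, which is the behaviour points a) and b) ask for.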

I'm sure this could be implemented, but I leave the technical
discussion to those who are currently deep in WAL
generation/reconstruction.

If we implement logged-only tables / queues, we would not only enable a
more performant pgQ replacement for implementing full Londiste / skytools
functionality, but would also become a very strong player as a persistent
basis for message queueing solutions like ActiveMQ, StorMQ, any Advanced
Message Queuing Protocol (AMQP) implementation and so on.

Comments?

Hannu Krosing

#2 Simon Riggs
simon@2ndQuadrant.com
In reply to: Hannu Krosing (#1)
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility

On 16 October 2012 09:56, Hannu Krosing <hannu@2ndquadrant.com> wrote:

> If we implement logged only tables / queues we would not only enable a more
> performant pgQ replacement for implementing full Londiste / skytools
> functionality but would also become a very strong player to be used as
> persistent basis for message queueing solutions like ActiveMQ, StorMQ, any
> Advanced Message Queuing Protocol (AMQP) and so on.

Hmm, I was assuming that we'd be able to do that by just writing extra
WAL directly. But now you've made me think about it, that would be
very ugly.

Doing it this way, as you suggest, would allow us to write WAL records
for queuing/replication to specific queue ids. It also allows us to
have privileges assigned. So this looks like a good idea and might
even be possible for 9.3.

I've got a feeling we may want the word QUEUE again in the future, so
I think we should call this a MESSAGE QUEUE.

CREATE MESSAGE QUEUE foo;
DROP MESSAGE QUEUE foo;

GRANT INSERT ON MESSAGE QUEUE foo TO ...;
REVOKE INSERT ON MESSAGE QUEUE foo FROM ...;

Rules wouldn't work. DELETE and UPDATE wouldn't work, nor would SELECT.

Things for next release: Triggers, SELECT sees a stream of changes,
CHECK clauses to constrain what can be written.

One question: would we require the INSERT statement to parse against a
tupledesc, or would it be just a single blob of TEXT or can we send
any payload? I'd suggest just a single blob of TEXT, since that can be
XML or JSON etc easily enough.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#3 Hannu Krosing
hannu@tm.ee
In reply to: Simon Riggs (#2)
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility

On 10/16/2012 11:18 AM, Simon Riggs wrote:

> I've got a feeling we may want the word QUEUE again in the future, so
> I think we should call this a MESSAGE QUEUE.
>
> CREATE MESSAGE QUEUE foo;
> DROP MESSAGE QUEUE foo;

I would like this to be very similar to a table, so it would be

CREATE MESSAGE QUEUE(fieldname type, ...) foo;

perhaps even allowing defaults and constraints. Again, this
depends on how complex the implementation would be.

For the receiving side it would look like a table with only inserts,
and in this case there could even be a possibility to use it as
a remote log table.

#4 Simon Riggs
simon@2ndQuadrant.com
In reply to: Hannu Krosing (#3)
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility

On 16 October 2012 10:29, Hannu Krosing <hannu@2ndquadrant.com> wrote:

> I would like this to be very similar to a table, so it would be
>
> CREATE MESSAGE QUEUE(fieldname type, ...) foo;
>
> perhaps even allowing defaults and constraints. Again, this
> depends on how complex the implementation would be.

Presumably just CHECK constraints, not UNIQUE or FKs.
Indexes would not be allowed.

> For the receiving side it would look like a table with only inserts,
> and in this case there could even be a possibility to use it as
> a remote log table.

The queue data would be available via the API, so it can look like anything.

It would be good to identify this with a new rmgr id.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#5 Hannu Krosing
hannu@tm.ee
In reply to: Hannu Krosing (#3)
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility

On 10/16/2012 11:29 AM, Hannu Krosing wrote:

> I would like this to be very similar to a table, so it would be
>
> CREATE MESSAGE QUEUE(fieldname type, ...) foo;
>
> perhaps even allowing defaults and constraints. Again, this
> depends on how complex the implementation would be.
>
> For the receiving side it would look like a table with only inserts,
> and in this case there could even be a possibility to use it as
> a remote log table.

To clarify - this is intended to be a mirror image of an UNLOGGED table.

That is, as much as possible a full table, except that no data gets
written, which means that

a) indexes do not make any sense
b) exclusion and unique constraints don't make any sense
c) select, update and delete always see an empty table

All of these should probably throw an error, analogous to how VIEWs
currently work.

It could also be described as a write-only table, except that it is
possible to materialise it as a real table on the receiving side.

#6 Josh Berkus
josh@agliodbs.com
In reply to: Hannu Krosing (#1)
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility

Hannu,

Can you explain in more detail how this would be used on the receiving
side? I'm unable to picture it from your description.

I'm also a bit reluctant to call this a "message queue", since it lacks
the features required for it to be used as an application-level queue.
"REPLICATION MESSAGE", maybe?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#7 Simon Riggs
simon@2ndQuadrant.com
In reply to: Josh Berkus (#6)
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility

On 16 October 2012 23:03, Josh Berkus <josh@agliodbs.com> wrote:

> Can you explain in more detail how this would be used on the receiving
> side? I'm unable to picture it from your description.

This will allow implementation of pgq in core, as discussed many times
at cluster hackers meetings.

> I'm also a bit reluctant to call this a "message queue", since it lacks
> the features required for it to be used as an application-level queue.

It's the input end of an application-level queue. In this design the
queue is like a table, so we need SQL grammar to support this new type
of object. "Replication message" doesn't describe this, since it has
little if anything to do with replication and, if anything, it's a
message type, not a message.

You're right that Hannu needs to specify the rest of the design and
outline the API. The storage of the queue is "in WAL", which raises
questions about how the API will guarantee we read just once from the
queue and what happens when the queue overflows. The simple answer would
be we put everything in a table somewhere else, but that needs more
careful specification to show we have both ends of the queue and a
working design.

Do we need a new object at all? Can we not just define a record type,
then define messages using that type? At the moment I think the
named-object approach works better, but we should consider that.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#8 Hannu Krosing
hannu@tm.ee
In reply to: Josh Berkus (#6)
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility

On 10/17/2012 12:03 AM, Josh Berkus wrote:

> Hannu,
>
> Can you explain in more detail how this would be used on the receiving
> side? I'm unable to picture it from your description.

It would be used similarly to how the event tables in pgQ (from skytools)
are used - as a source of "events" to be replayed on the subscriber side.

(For discussion's sake let's just call this LOGGED ONLY TABLE, as opposed
to the UNLOGGED TABLE we already have.)

The simplest usage would be implementing "remote log tables", that is,
tables where you do INSERT on the master side, but it "inserts" only
a logical WAL record and nothing else.

On the subscriber side your replay process reads this WAL record as an
"insert event" and, if the table is declared as an ordinary table on the
subscriber, it performs an insert there.

This would make it trivial to implement a persistent remote log table
with the minimal required amount of writing on the master side.

We could even implement a log table which also captures log entries
from aborted transactions by treating ROLLBACK as COMMIT for this
table.

But the subscriber side could also do other things instead of (or in
addition to) filling a log table. For example, it could create a
partitioned table instead of the plain table defined on the provider side.

There is support for this, and several example replay agents, in the
skytools package, all based on pgQ.

Or you could do computations/materialised views based on "events" from
the table.

Or you could use the "insert events"/WAL records as a base for some
other remote processing, like sending out e-mails.

There is also support for these kinds of things in skytools.

> I'm also a bit reluctant to call this a "message queue", since it lacks
> the features required for it to be used as an application-level queue.
> "REPLICATION MESSAGE", maybe?

Initially I'd just stick with LOG ONLY TABLE or QUEUE based on what
it does, not on how it could be used.

LOGGED ONLY TABLE is a very technical description of the realisation - I'd
prefer it to work as much like a table as possible, similar to how VIEW
currently works - for all usages that make sense, you can simply
substitute it for a TABLE.

QUEUE emphasizes the aspect of a logged only table that it accepts
"records" in a certain order, persists these and then guarantees
that they can be read out in exactly the same order - all this being
guaranteed by existing WAL mechanisms.

It is not meant to be a full implementation of an application level
queuing system though, but just the capture, persisting and distribution
parts.

Using this as an "application level queue" needs a set of interface
functions to extract the events and also to keep track of the processed
events. As there is no general consensus on what these should be (like
whether processing the same event twice is allowed), this part is left
for specific queue consumer implementations.
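
A minimal sketch of what such consumer-side interface functions might look like, written in Python purely for illustration. The `Consumer` class, its `next_batch` and `ack` methods, and the list standing in for the WAL-ordered event stream are all assumptions made for this example; none of them are part of pgQ or of this proposal. The consumer keeps only a position, which gives at-least-once behaviour: a batch that was read but never acknowledged is handed out again.

```python
class Consumer:
    """Tracks how far one consumer has processed an append-only event stream."""

    def __init__(self, stream):
        self.stream = stream   # events in log order (stand-in for decoded WAL)
        self.position = 0      # index of the first unprocessed event

    def next_batch(self, limit):
        # Reading does not advance the position; only ack() does.
        return self.stream[self.position:self.position + limit]

    def ack(self, count):
        # Mark `count` events as processed, never moving past the end.
        self.position = min(self.position + count, len(self.stream))


stream = ["e1", "e2", "e3"]
consumer = Consumer(stream)
first = consumer.next_batch(2)   # first two events
retry = consumer.next_batch(2)   # same two events again: nothing was acked
consumer.ack(len(retry))
rest = consumer.next_batch(2)    # the remaining event
```

Whether a repeated batch like `retry` is acceptable is exactly the kind of policy that differs between consumer implementations.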

--------------------
Hannu Krosing

#9 Bruce Momjian
bruce@momjian.us
In reply to: Hannu Krosing (#8)
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility

On Wed, Oct 17, 2012 at 11:26 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:

> The simplest usage would be implementing "remote log tables", that is,
> tables where you do INSERT on the master side, but it "inserts" only
> a logical WAL record and nothing else.
>
> On the subscriber side your replay process reads this WAL record as an
> "insert event" and, if the table is declared as an ordinary table on the
> subscriber, it performs an insert there.

What kinds of applications would need that?

--
greg

#10 Chris Browne
cbbrowne@acm.org
In reply to: Bruce Momjian (#9)
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility

Well, replication is arguably a relevant case.

For Slony, the origin/master node never cares about logged changes - that
data is only processed on replicas. Now, that's certainly a little
weaselly - the log data (sl_log_*) has got to get read to get to the
replica.

This suggests, nonetheless, a curiously different table structure than is
usual, and I could see this offering interesting possibilities.

The log tables are only useful to read in transaction order, which is
pretty well the order data gets written to WAL, so perhaps we could have
savings by only writing data to WAL...

It occurs to me that this notion might exist as a special sort of table,
interesting for pgq as well as Slony, which consists of:

- table data is stored only in WAL
- an index supports quick access to this data, residing in WAL
- TOASTing perhaps unneeded?
- index might want to be on additional attributes
- the triggers-on-log-tables thing Slony 2.2 does means we want these
tables to support triggers
- if data is only held in WAL, we need to hold the WAL until (mumble,
later, when known to be replicated)
- might want to mix local updates with updates imported from remote nodes

I think it's a misnomer to think this is about having the data not locally
accessible. Rather, it has a pretty curious access and storage pattern.

And a slick pgq queue would likely make a good Slony log, too.

#11 Josh Berkus
josh@agliodbs.com
In reply to: Hannu Krosing (#8)
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility

> It is not meant to be a full implementation of application level queuing
> system though but just the capture, persisting and distribution parts
>
> Using this as an "application level queue" needs a set of interface
> functions to extract the events and also to keep track of the processed
> events. As there is no general consensus what these should be (like if
> processing same event twice is allowed) this part is left for specific
> queue consumer implementations.

Well, but AFAICT, you've already prohibited features through your design
which are essential to application-level queues, and are implemented by,
for example, pgQ.

1. your design only allows the queue to be read on replicas, not on the
node where the item was inserted.

2. if you can't UPDATE or DELETE queue items -- or LOCK them -- how on
earth would a client know which items they have executed and which they
haven't?

3. Double-down on #2 in a multithreaded environment.

For an application-level queue, the base functionality is:

ADD ITEM
READ NEXT (#) ITEM(S)
LOCK ITEM
DELETE ITEM

More sophisticated and useful queues also allow:

READ NEXT UNLOCKED ITEM
LOCK NEXT UNLOCKED ITEM
UPDATE ITEM
READ NEXT (#) UNSEEN ITEM(S)

The design you describe seems to prohibit pretty much all of the above
operations after READ NEXT. This makes it completely useless as an
application-level queue.

And, for that matter, if your new queue only accepts INSERTs, why not
just improve LISTEN/NOTIFY so that it's readable on replicas? What does
this design buy you that that doesn't?

Quite possibly you have plans which answer all of the above, but they
aren't at all clear in your RFC.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#12 Chris Browne
cbbrowne@acm.org
In reply to: Josh Berkus (#11)
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility

On Wed, Oct 17, 2012 at 4:25 PM, Josh Berkus <josh@agliodbs.com> wrote:

> Well, but AFAICT, you've already prohibited features through your design
> which are essential to application-level queues, and are implemented by,
> for example, pgQ.
>
> 1. your design only allows the queue to be read on replicas, not on the
> node where the item was inserted.

I commented separately on this; I'm pretty sure there needs to be a
way to read the queue on a replica, yes, indeed.

> 2. if you can't UPDATE or DELETE queue items -- or LOCK them -- how on
> earth would a client know which items they have executed and which they
> haven't?

If the items are actually stored in WAL, then it seems well and truly
impossible to do any of those three things directly.

What could be done, instead, would be to add "successor" items to
indicate that they have been dealt with, in effect, back-references.

You don't get to UPDATE or DELETE; instead, you do something like:

INSERT into queue (reference_to_xid, reference_to_id_in_xid, action)
values (old_xid_1, old_id_within_xid_1, 'COMPLETED'), (old_xid_2,
old_id_within_xid_2, 'CANCELLED');
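
To make the back-reference idea concrete, here is a small Python sketch. All names are hypothetical; the record fields loosely mirror the columns in the INSERT above. The queue is append-only, and an item counts as pending until some later record refers back to it.

```python
def pending_items(log):
    """Return work items that no later successor record has closed."""
    closed = set()
    items = []
    for record in log:
        if record["kind"] == "item":
            items.append(record)
        else:
            # Successor record: points back at (xid, id_in_xid) of an item.
            closed.add((record["ref_xid"], record["ref_id"]))
    return [r for r in items if (r["xid"], r["id_in_xid"]) not in closed]


log = [
    {"kind": "item", "xid": 100, "id_in_xid": 1, "payload": "send mail"},
    {"kind": "item", "xid": 100, "id_in_xid": 2, "payload": "resize image"},
    {"kind": "item", "xid": 101, "id_in_xid": 1, "payload": "rebuild index"},
    {"kind": "done", "ref_xid": 100, "ref_id": 1, "action": "COMPLETED"},
    {"kind": "done", "ref_xid": 101, "ref_id": 1, "action": "CANCELLED"},
]
still_pending = pending_items(log)
```

Only the item with no successor record remains pending. Nothing is ever updated or deleted, at the cost of scanning (or indexing) the successor records.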

In a distributed context, it's possible that multiple nodes could be
reading from the same queue, so that while "process at least once" is
no trouble, "process at most once" is just plain troublesome.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

#13 Simon Riggs
simon@2ndQuadrant.com
In reply to: Josh Berkus (#11)
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility

On 17 October 2012 21:25, Josh Berkus <josh@agliodbs.com> wrote:

> Well, but AFAICT, you've already prohibited features through your design
> which are essential to application-level queues, and are implemented by,
> for example, pgQ.
>
> 1. your design only allows the queue to be read on replicas, not on the
> node where the item was inserted.
>
> 2. if you can't UPDATE or DELETE queue items -- or LOCK them -- how on
> earth would a client know which items they have executed and which they
> haven't?
>
> 3. Double-down on #2 in a multithreaded environment.

It's hard to work out how to reply to this because it's just so off
base. I don't agree with the restrictions you think you see at all,
saying it politely rather than giving a one word answer.

The problem here is you phrase these things with too much certainty,
seeing only barriers. The "how on earth?" vibe is not appropriate at
all. It's perfectly fine to ask for answers to those difficult
questions, but don't presume that there are no answers, or that you
know with certainty they are even hard ones. By phrasing things in
such a closed way the only way forwards is through you, which does not
help.

All we're discussing is moving a successful piece of software into
core, which has been discussed for years at the international
technical meetings we've both been present at. I think an open
viewpoint on the feasibility of that would be reasonable, especially
when it comes from one of the original designers.

I apologise for making a personal comment, but this does affect the
technical discussion.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#14 Simon Riggs
simon@2ndQuadrant.com
In reply to: Hannu Krosing (#8)
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility

On 17 October 2012 11:26, Hannu Krosing <hannu@2ndquadrant.com> wrote:

> LOGGED ONLY TABLE is very technical description of realisation - I'd
> prefer it to work as much like a table as possible, similar to how VIEW
> currently works - for all usages that make sense, you can simply
> substitute it for a TABLE
>
> QUEUE emphasizes the aspect of logged only table that it accepts
> "records" in a certain order, persists these and then guarantees
> that they can be read out in exactly the same order - all this being
> guaranteed by existing WAL mechanisms.
>
> It is not meant to be a full implementation of application level queuing
> system though but just the capture, persisting and distribution parts
>
> Using this as an "application level queue" needs a set of interface
> functions to extract the events and also to keep track of the processed
> events. As there is no general consensus what these should be (like if
> processing same event twice is allowed) this part is left for specific
> queue consumer implementations.

The two halves of the queue are the TAIL/entry point and the HEAD/exit
point. As you point out, these could be on different servers,
wherever the logical changes flow to, but could also be on the same
server. When the head and tail are on the same server, the MESSAGE
QUEUE syntax seems appropriate, but I agree that calling it that when
it's just a head or just a tail seems slightly misleading.

I guess the question is whether we provide a full implementation or
just the first half.

We do, I think, want a full queue implementation in core. We also want
to allow other queue implementations to interface with Postgres, so we
probably want to allow "first half" only as well. Meaning we want both
head and tail separately in core code. The question is whether we
require both head and tail in core before we allow commit, to which I
would say I think adding the tail first is OK, and adding the head
later when we know exactly the design.

Having said that, the LOGGING ONLY syntax makes me shiver. Better name?

I should also add that this is a switchable sync/asynchronous
transactional queue, whereas LISTEN/NOTIFY is a synchronous
transactional queue.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#15 Josh Berkus
josh@agliodbs.com
In reply to: Simon Riggs (#13)
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility

Simon,

> It's hard to work out how to reply to this because it's just so off
> base. I don't agree with the restrictions you think you see at all,
> saying it politely rather than giving a one word answer.

You have inside knowledge of Hannu's design. I am merely going from his
description *on this list*, because that's all I have to go on.

He requested comments, so here I am, commenting. I'm *hoping* that it's
merely the description which is poor and not the conception of the
feature. *As Hannu described the feature* it sounds useless and
obscure, and miles away from powering any kind of general queueing
mechanism. Or anything we discussed at the clustering meetings.

And, again, if you didn't want comments, you shouldn't have posted an RFC.

> All we're discussing is moving a successful piece of software into
> core, which has been discussed for years at the international
> technical meetings we've both been present at. I think an open
> viewpoint on the feasibility of that would be reasonable, especially
> when it comes from one of the original designers.

When I ask you for technical clarification or bring up potential
problems with a 2Q feature, you consistently treat it as a personal
attack and are emotionally defensive instead of answering my technical
questions. This, in turn, frustrates the heck out of me (and others)
because we can't get the technical questions answered. I don't want you
to justify yourself, I want a clear technical spec.

I'm asking these questions because I'm excited about ReplicationII, and
I want it to be the best feature it can possibly be.

Or, as we tell many new contributors, "We wouldn't bring up potential
problems and ask lots of questions if we weren't interested in the feature."

Now, on to the technical questions:

>> QUEUE emphasizes the aspect of logged only table that it accepts
>> "records" in a certain order, persists these and then guarantees
>> that they can be read out in exactly the same order - all this being
>> guaranteed by existing WAL mechanisms.
>>
>> It is not meant to be a full implementation of application level queuing
>> system though but just the capture, persisting and distribution parts
>>
>> Using this as an "application level queue" needs a set of interface
>> functions to extract the events and also to keep track of the processed
>> events. As there is no general consensus what these should be (like if
>> processing same event twice is allowed) this part is left for specific
>> queue consumer implementations.

While implementations vary, I think you'll find that the set of
operations required for a full-featured application queue are remarkably
similar across projects. Personally, I've worked with celery, Redis,
AMQ, and RabbitMQ, as well as a custom solution on top of pgQ. The
design, as you've described it, makes several of these requirements
unreasonably convoluted to implement.

It sounds to me like the needs of internal queueing and application
queueing may be hopelessly divergent. That was always possible, and
maybe the answer is to forget about application queueing and focus on
making this mechanism work for replication and for matviews, the two
features we *know* we want it for. Which don't need the application
queueing features I described AFAIK.

> The two halves of the queue are the TAIL/entry point and the HEAD/exit
> point. As you point out these could be on different servers,
> wherever the logical changes flow to, but could also be on the same
> server. When the head and tail are on the same server, the MESSAGE
> QUEUE syntax seems appropriate, but I agree that calling it that when
> it's just a head or just a tail seems slightly misleading.

Yeah, that's why I was asking for clarification; the way Hannu described
it, it sounded like it *couldn't* be read on the insert node, but only
on a replica.

> We do, I think, want a full queue implementation in core. We also want
> to allow other queue implementations to interface with Postgres, so we
> probably want to allow "first half" only as well. Meaning we want both
> head and tail separately in core code. The question is whether we
> require both head and tail in core before we allow commit, to which I
> would say I think adding the tail first is OK, and adding the head
> later when we know exactly the design.

I'm just pointing out that some of the requirements of the design for
the replication queue may conflict with a design for a full-featured
application queue.

I don't quite follow you on what you mean by "head" vs. "tail". Explain?

Having said that, the LOGGING ONLY syntax makes me shiver. Better name?

I suck at names. Sorry.

I should also add that this is a switchable sync/asynchronous
transactional queue, whereas LISTEN/NOTIFY is a synchronous
transactional queue.

Thanks for explaining.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

#16Claudio Freire
klaussfreire@gmail.com
In reply to: Josh Berkus (#15)
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility

On Thu, Oct 18, 2012 at 2:33 PM, Josh Berkus <josh@agliodbs.com> wrote:

I should also add that this is a switchable sync/asynchronous
transactional queue, whereas LISTEN/NOTIFY is a synchronous
transactional queue.

Thanks for explaining.

New here, I missed half the conversation, but since it's been brought
up and (to me wrongfully) dismissed, I'd like to propose:

NOTIFY [ALL|ONE] [REMOTE|LOCAL|CLUSTER|DOWNSTREAM] ASYNCHRONOUSLY
LISTEN [REMOTE|LOCAL|CLUSTER|UPSTREAM] too for good measure.

That ought to work out fine as SQL constructs go, implementation aside.

That's not enough for matviews, but it is IMO a good starting point.
All you need after that are triggers for notifying automatically upon
insert, and some mechanism to attach triggers to a channel on the
receiving side.

Since channels are limited to short strings, maybe a different kind of
object (but with similar manipulation syntax) ought to be created. The
CREATE QUEUE command, in fact, could be creating such a channel. The
channel itself won't be WAL-only, just the messages going through it.
This (I think) would solve locking issues.

#17Hannu Krosing
hannu@tm.ee
In reply to: Josh Berkus (#15)
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility

On 10/18/2012 07:33 PM, Josh Berkus wrote:

Simon,

It's hard to work out how to reply to this because it's just so off
base. I don't agree with the restrictions you think you see at all,
saying it politely rather than giving a one word answer.

You have inside knowledge of Hannu's design.

Actually Simon has currently no more knowledge of this specific
design than you do - I posted this on this list as soon as I had figured
it out as a possible solution of a specific problem of supporting
full pgQ/Londiste functionality in WAL based logical replication
with minimal overhead.

(well, actually I let it settle a few weeks, but I did not discuss
this off-list before).

Simon may have better grasp of it thanks to having done work
on the BDR/Logical Replication design and thus having better or
at least more recent understanding of issues involved in Logical
Replication.

When mapping londiste/Slony message capture to Logical WAL,
the WAL already _is_ the event queue for replication.
LOG ONLY tables make it usable for non-replication things as well,
using the same mechanisms. (The equivalent in a trigger-based
system would be a log trigger which captures the insert event and then
cancels the insert.)

I am merely going from his
description *on this list*, because that's all I have to go on.

He requested comments, so here I am, commenting. I'm *hoping* that it's
merely the description which is poor and not the conception of the
feature. *As Hannu described the feature* it sounds useless and
obscure, and miles away from powering any kind of general queueing
mechanism.

If we describe a queue as something you put stuff in at one end and
get it out in the same or some other specific order at the other end, then
WAL _is_ a queue when you use it for replication (if you just write to it,
then it is a "Log"; if you write and read, it is a "Queue").

That is, the WAL already is a persistent and ordered (that is how WAL
works) stream of messages ("WAL records") that are generated on the
"master" and replayed on one or more consumers (called "slaves" in the
case of simple replication).

All it takes to make this scenario work is keeping track of LSN or simply
log position on the slave side.
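
As a rough illustration of that bookkeeping (not code from this thread), here is a minimal Python sketch. It assumes an in-memory list standing in for the WAL and a list index standing in for the LSN; the `Log` and `Consumer` names are invented for the example.

```python
# Illustrative only: an in-memory list stands in for the WAL, and the list
# index stands in for an LSN. None of these names exist in PostgreSQL.
class Log:
    def __init__(self):
        self.records = []

    def append(self, payload):
        """Append a record and return its position (the stand-in LSN)."""
        self.records.append(payload)
        return len(self.records) - 1

    def read_from(self, position):
        """Yield (position, payload) pairs in log order, starting at position."""
        for pos in range(position, len(self.records)):
            yield pos, self.records[pos]


class Consumer:
    def __init__(self, log):
        self.log = log
        self.next_position = 0   # the only state the consumer has to keep
        self.applied = []

    def catch_up(self):
        """Apply every record not yet seen and advance the saved position."""
        for pos, payload in self.log.read_from(self.next_position):
            self.applied.append(payload)
            self.next_position = pos + 1


log = Log()
for event in ("a", "b", "c"):
    log.append(event)

consumer = Consumer(log)
consumer.catch_up()
log.append("d")
consumer.catch_up()          # resumes after "c", so nothing is applied twice
print(consumer.applied)      # ['a', 'b', 'c', 'd']
```

The point of the sketch is that the producer side keeps no per-consumer state; the saved position lives entirely with the consumer.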

What you seem to be wanting is support for cooperative consumers,
that is, multiple consumers on the same queue working together and
sharing the work of processing the incoming events.

This can be easily achieved using a single ordered event stream and
extra bookkeeping structures on the consumer side (look at cooperative
consumer samples in skytools).
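
A hedged sketch of what such consumer-side bookkeeping could look like, in Python. This is not the skytools API; `CooperativeGroup`, `claim` and `finish` are hypothetical names, and the round-robin loop only simulates two workers taking turns.

```python
# Hypothetical sketch: names are invented; this is not the skytools API.
class CooperativeGroup:
    """Consumer-side bookkeeping shared by workers reading one ordered stream."""

    def __init__(self, events):
        self.events = events      # the single ordered event stream
        self.next_unclaimed = 0   # position of the first event nobody has taken
        self.done = {}            # position -> name of the worker that finished it

    def claim(self):
        """Hand the next unclaimed event to the caller, or None when drained."""
        if self.next_unclaimed >= len(self.events):
            return None
        pos = self.next_unclaimed
        self.next_unclaimed += 1
        return pos, self.events[pos]

    def finish(self, pos, worker):
        """Record that the given worker has processed the event at pos."""
        self.done[pos] = worker


group = CooperativeGroup(["e0", "e1", "e2", "e3", "e4"])
workers = ["w1", "w2"]
turn = 0
while True:
    claimed = group.claim()
    if claimed is None:
        break
    pos, event = claimed
    group.finish(pos, workers[turn % len(workers)])   # simulate taking turns
    turn += 1

print(group.done)
```

Every event is claimed exactly once, and the stream itself stays a plain ordered log; all the sharing logic sits on the receiving end.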

What I suggested was an optimisation for the case where you know that you
will never need the data on the master side and are only interested in it
on the slave side.

By writing rows/events/messages only to the log (or stream or queue), you
avoid the need to later clean it up on the master by either DELETE or
TRUNCATE or rotating tables.

For both physical and logical streaming the WAL _is_ the queue of events
that were recorded on the master and need to be replayed on the slave.

Thanks to introducing logical replication, it now makes sense to have
actions recorded _only_ in this queue, and this is what the whole RFC was
about.

I recommend that you introduce yourself a bit to skytools/pgQ to get a
better feel for the things I am talking about. Londiste is just one
application built on pgQ, a general event logging, transport and
transform/replay system (that is what I'd call queueing :) ).

pgQ does have its roots in Slony (and earlier) replication systems,
but it is by no means _only_ a replication system.

The LOG ONLY tables are _not_ needed for pure replication (like Slony), but
they make replication + queueing type solutions like skytools/pgQ much more
efficient, as they do away with the need to maintain the queued data on
the master side, where it will never be needed (just to repeat this once
more).

Or anything we discussed at the clustering meetings.

And, again, if you didn't want comments, you shouldn't have posted an RFC.

I did want comments and as far as I know I do not see you as hostile :)

I do understand that what you mean by QUEUE (and especially by
MESSAGE QUEUE) is different from what I described.
You seem to want specifically an implementation of cooperative
consumers for a generic queue.

The answer is yes, it is possible to build this on WAL, or on the table-based
event logs/queues of londiste / slony. It just takes a little extra
management on the receiving side to do the record locking and
distribution between cooperating consumers.

All we're discussing is moving a successful piece of software into
core, which has been discussed for years at the international
technical meetings we've both been present at. I think an open
viewpoint on the feasibility of that would be reasonable, especially
when it comes from one of the original designers.

When I ask you for technical clarification or bring up potential
problems with a 2Q feature, you consistently treat it as a personal
attack and are emotionally defensive instead of answering my technical
questions. This, in turn, frustrates the heck out of me (and others)
because we can't get the technical questions answered. I don't want you
to justify yourself, I want a clear technical spec.

Currently the "clear tech spec" is just this:

* works as a table on INSERTs up to inserting the logical WAL record
describing the insert, but no data is inserted locally.

with all things that follow from the local table having no data
- unique constraints don't make sense
- indexes make no sense
- updates and deletes hit no data
- etc. . .
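
To make these semantics concrete, here is a small Python model. It is only a sketch under stated assumptions: `LogOnlyTable` and the list used as `wal` are invented stand-ins, not PostgreSQL internals or a proposed implementation.

```python
# Hypothetical model: LogOnlyTable and the list-based "wal" are stand-ins
# invented for illustration, not PostgreSQL internals.
class LogOnlyTable:
    def __init__(self, name, wal):
        self.name = name
        self.wal = wal     # shared list standing in for the logical WAL stream
        self.rows = []     # local storage; stays empty by design

    def insert(self, row):
        # Record the insert in the log, but store nothing locally.
        self.wal.append(("INSERT", self.name, dict(row)))

    def select(self):
        return list(self.rows)

    def update(self, predicate, changes):
        # There are no local rows, so nothing can match.
        matched = [row for row in self.rows if predicate(row)]
        for row in matched:
            row.update(changes)
        return len(matched)

    def delete(self, predicate):
        before = len(self.rows)
        self.rows = [row for row in self.rows if not predicate(row)]
        return before - len(self.rows)


wal = []
events = LogOnlyTable("events", wal)
events.insert({"id": 1, "msg": "hello"})
events.insert({"id": 2, "msg": "world"})

print(len(wal))                                      # 2
print(events.select())                               # []
print(events.update(lambda r: True, {"msg": "x"}))   # 0
print(events.delete(lambda r: True))                 # 0
```

Both inserts reach the log, while reads, updates and deletes on the local side find nothing, which is the behaviour the list above describes.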

I'm asking these questions because I'm excited about ReplicationII, and
I want it to be the best feature it can possibly be.

Or, as we tell many new contributors, "We wouldn't bring up potential
problems and ask lots of questions if we weren't interested in the feature."

Now, on to the technical questions:

QUEUE emphasizes the aspect of a logged only table that it accepts
"records" in a certain order, persists these and then guarantees
that they can be read out in exactly the same order - all this being
guaranteed by existing WAL mechanisms.

It is not meant to be a full implementation of application level queuing
system though but just the capture, persisting and distribution parts

Using this as an "application level queue" needs a set of interface
functions to extract the events and also to keep track of the processed
events. As there is no general consensus what these should be (like if
processing same event twice is allowed) this part is left for specific
queue consumer implementations.

While implementations vary, I think you'll find that the set of
operations required for a full-featured application queue are remarkably
similar across projects. Personally, I've worked with celery, Redis,
AMQ, and RabbitMQ, as well as a custom solution on top of pgQ. The
design, as you've described it, makes several of these requirements
unreasonably convoluted to implement.

As Simon explained, the initial RFC was just about not keeping the
data in a local table if we know it will never be accessed (at least not
for anything except vacuum and delete/truncate).

This is something that made no sense for physical replication.

It sounds to me like the needs of internal queueing and application
queueing may be hopelessly divergent. That was always possible, and
maybe the answer is to forget about application queueing and focus on
making this mechanism work for replication and for matviews, the two
features we *know* we want it for. Which don't need the application
queueing features I described AFAIK.

The two halves of the queue are the TAIL/entry point and the HEAD/exit
point. As you point out these could be on the different servers,
wherever the logical changes flow to, but could also be on the same
server. When the head and tail are on the same server, the MESSAGE
QUEUE syntax seems appropriate, but I agree that calling it that when
it's just a head or just a tail seems slightly misleading.

Yeah, that's why I was asking for clarification; the way Hannu described
it, it sounded like it *couldn't* be read on the insert node, but only
on a replica.

Well, the reading is done the same way any WAL reading is done -
you subscribe to the stream and from that point on get the records
in LSN order.

It is very hard for me to tell for sure whether the walsender->walreceiver
combo "reads the events" on the master or the slave side.

We do, I think, want a full queue implementation in core. We also want
to allow other queue implementations to interface with Postgres, so we
probably want to allow "first half" only as well. Meaning we want both
head and tail separately in core code. The question is whether we
require both head and tail in core before we allow commit, to which I
would say I think adding the tail first is OK, and adding the head
later when we know exactly the design.

I'm just pointing out that some of the requirements of the design for
the replication queue may conflict with a design for a full-featured
application queue.

I don't quite follow you on what you mean by "head" vs. "tail". Explain?

HEAD is the queue producer, where the events go in (any insert on master)

TAIL (to avoid another word) is where they come out
(walsender -> walreceiver moving the events to slave)

Think of an analogy with a snake feeding on berries used by
an ant colony to get the nutrients in the berries to its nest :)

And there is no processing inside the snake - the work of
distributing said nutrients once they have arrived at the nest has
to be organised by the cooperative colony of ants on that end; the
snake just guarantees that the berries arrive in the same order they
went in.

I guess this organisation of work after the events are delivered is
what you were after when asking about "an application level queue".

Having said that, the LOGGING ONLY syntax makes me shiver. Better name?

I guess WRITE ONLY tables would get us more publicity, but it would not be
entirely correct, as the data is readable from the log.

Hannu

#18Hannu Krosing
hannu@tm.ee
In reply to: Claudio Freire (#16)
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility

On 10/18/2012 08:36 PM, Claudio Freire wrote:

The CREATE QUEUE command, in fact, could be creating
such a channel. The channel itself won't be WAL-only, just
the messages going through it. This (I think) would solve locking issues.

Hmm. Maybe we should think of implementing this as REMOTE TABLE, that
is a table which gets no real data stored locally, but all inserts go
through WAL and are replayed as real inserts on the slave side.

Then if you want matviews or partitioned table, you just attach triggers to
the table on slave side to do them.
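
A minimal Python sketch of that flow, under loudly stated assumptions: REMOTE TABLE does not exist, the list `wal` stands in for the logical change stream, and `Replica`, `master_insert` and `add_to_totals` are names made up for the example.

```python
from collections import defaultdict

# Hypothetical model of the REMOTE TABLE idea; nothing here is a real
# PostgreSQL interface.
wal = []   # stand-in for the logical change stream


def master_insert(table, row):
    """On the master the row goes only into the stream, not into a table."""
    wal.append((table, row))


class Replica:
    def __init__(self):
        self.tables = defaultdict(list)     # real storage on the replica
        self.triggers = defaultdict(list)   # table name -> callbacks run after insert

    def replay(self, stream):
        """Apply each streamed insert as a real insert, then fire triggers."""
        for table, row in stream:
            self.tables[table].append(row)
            for trigger in self.triggers[table]:
                trigger(row)


replica = Replica()
totals = defaultdict(int)   # a materialized-view-like summary kept by a trigger


def add_to_totals(row):
    totals[row["customer"]] += row["amount"]


replica.triggers["orders"].append(add_to_totals)

master_insert("orders", {"customer": "ann", "amount": 5})
master_insert("orders", {"customer": "bob", "amount": 7})
master_insert("orders", {"customer": "ann", "amount": 3})
replica.replay(wal)

print(dict(totals))   # {'ann': 8, 'bob': 7}
```

The master never holds the rows; the replica stores them and the trigger keeps the summary current as each insert is replayed.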

This would be tangential to their use as pure queues which would happen
at the level of plugins to logical replication.

--------------
Hannu

#19Chris Browne
cbbrowne@acm.org
In reply to: Hannu Krosing (#17)
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility

On Thu, Oct 18, 2012 at 2:56 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:

* works as table on INSERTS up to inserting logical WAL record describing
the insert but no data is inserted locally.

with all things that follow from the local table having no data
- unique constraints don't make sense
- indexes make no sense
- updates and deletes hit no data
- etc. . .

Yep, I think I was understanding those aspects.

I think I disagree that "indexes make no sense."

I think that it would be meaningful to have an index type for this,
one that is a pointer at WAL records, to enable efficiently jumping to
the right WAL log to start accessing a data stream, given an XID.
That's a fundamentally different sort of index than we have today
(much the way that hash indexes, GiST indexes, and BTrees differ from
one another).
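
As a rough sketch of the idea (purely illustrative, not an existing index type), such an index could map an XID to the log position where reading should begin. The Python below assumes XIDs are recorded in increasing order and ignores XID wraparound and the difference between XID order and commit order; `XidIndex` and the sample positions are invented.

```python
import bisect


# Hypothetical sketch of an XID -> log position lookup; it deliberately
# ignores XID wraparound and commit ordering.
class XidIndex:
    """Parallel sorted lists of XIDs and the log positions where they start."""

    def __init__(self):
        self.xids = []
        self.positions = []

    def add(self, xid, position):
        # Assumes entries are added in increasing XID order.
        self.xids.append(xid)
        self.positions.append(position)

    def start_position(self, xid):
        """Return the position of the first indexed XID >= xid, or None."""
        i = bisect.bisect_left(self.xids, xid)
        return self.positions[i] if i < len(self.xids) else None


index = XidIndex()
for xid, pos in [(100, 0), (105, 4096), (110, 8192)]:
    index.add(xid, pos)

print(index.start_position(105))   # 4096
print(index.start_position(106))   # 8192
print(index.start_position(111))   # None
```

The index entries point into the log rather than at heap tuples, which is what makes it a different kind of index from the existing ones.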

I'm having a hard time thinking about what happens if you have
cascaded replication, and want to carry records downstream. In that
case, the XIDs from the original system aren't miscible with the XIDs
in a message queue on a downstream database, and I'm not sure what
we'd want to do. Keep the original XIDs in a side attribute, maybe?
It seems weird, at any rate. Or perhaps data from foreign sources has
got to go into a separate queue/'sorta-table', and thereby have two
XIDs, the "source system XID" and the "when we loaded it in locally
XID."
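
One way to picture the "two XIDs" option is the Python sketch below. It is an assumption-laden illustration only: `QueuedEvent`, `Node` and `load_from_upstream` are hypothetical names, and each node simply hands out XIDs from its own counter.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


# Hypothetical sketch: every name here is invented for illustration.
@dataclass(frozen=True)
class QueuedEvent:
    payload: str
    local_xid: int                     # XID assigned when written on this node
    source_xid: Optional[int] = None   # XID on the originating node, if foreign


class Node:
    def __init__(self, first_xid):
        self._xids = count(first_xid)  # each node has its own XID sequence
        self.queue = []

    def insert(self, payload):
        """A locally produced event has no source XID."""
        event = QueuedEvent(payload, next(self._xids))
        self.queue.append(event)
        return event

    def load_from_upstream(self, event):
        """Re-queue an upstream event, keeping the XID of its original source."""
        source = event.local_xid if event.source_xid is None else event.source_xid
        loaded = QueuedEvent(event.payload, next(self._xids), source)
        self.queue.append(loaded)
        return loaded


origin = Node(first_xid=500)
middle = Node(first_xid=40)
leaf = Node(first_xid=9000)

first = origin.insert("order created")
second = middle.load_from_upstream(first)
third = leaf.load_from_upstream(second)
print(third)
```

Here the source XID survives two hops of cascading while each node still orders the event by its own local XID.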
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

#20Ants Aasma
ants.aasma@cybertec.at
In reply to: Hannu Krosing (#18)
Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility

On Thu, Oct 18, 2012 at 10:03 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:

Hmm. Maybe we should think of implementing this as REMOTE TABLE, that
is a table which gets no real data stored locally but all insert got through
WAL and are replayed as real inserts on slave side.

FWIW, MySQL calls this exact concept the "black hole" storage engine.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

#21Hannu Krosing
hannu@tm.ee
In reply to: Ants Aasma (#20)
#22Hannu Krosing
hannu@tm.ee
In reply to: Chris Browne (#19)
#23Simon Riggs
simon@2ndQuadrant.com
In reply to: Josh Berkus (#15)
In reply to: Hannu Krosing (#21)
#25Josh Berkus
josh@agliodbs.com
In reply to: Hannu Krosing (#17)
#26Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Josh Berkus (#25)
#27Bruce Momjian
bruce@momjian.us
In reply to: Chris Browne (#10)
#28Hannu Krosing
hannu@tm.ee
In reply to: Bruce Momjian (#27)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#28)
#30Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#11)
#31Hannu Krosing
hannu@tm.ee
In reply to: Tom Lane (#29)
#32Hannu Krosing
hannu@tm.ee
In reply to: Robert Haas (#30)
#33Josh Berkus
josh@agliodbs.com
In reply to: Hannu Krosing (#32)
#34Josh Berkus
josh@agliodbs.com
In reply to: Josh Berkus (#33)