Can Postgres Not Do This Safely ?!?

Started by Karl Pickett, October 2010 · 12 messages · pgsql-general
#1 Karl Pickett
karl.pickett@gmail.com

Hello Postgres Hackers,

We have a simple 'event log' table that is insert only (by multiple
concurrent clients). It has an integer primary key. We want to do
incremental queries of this table every 5 minutes or so, i.e. "select
* from events where id > LAST_ID_I_GOT" to insert into a separate
reporting database. The problem is, this simple approach has a race
that will forever skip uncommitted events. I.e., if 5000 was
committed sooner than 4999, and we get 5000, we will never go back and
get 4999 when it finally commits. How can we solve this? Basically
it's a phantom row problem but it spans transactions.

I looked at checking the internal 'xmin' column but the docs say that
is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
value. I don't get it. All I want is to make sure I skip over any
rows that are newer than the oldest currently running transaction.
Has nobody else run into this before?

Thank you very much.

--
Karl Pickett

#2 Peter Geoghegan
peter.geoghegan86@gmail.com
In reply to: Karl Pickett (#1)
Re: Can Postgres Not Do This Safely ?!?

On 29 October 2010 03:04, Karl Pickett <karl.pickett@gmail.com> wrote:

Hello Postgres Hackers,

We have a simple 'event log' table that is insert only (by multiple
concurrent clients).  It has an integer primary key.  We want to do
incremental queries of this table every 5 minutes or so, i.e. "select
* from events where id > LAST_ID_I_GOT" to insert into a separate
reporting database.  The problem is, this simple approach has a race
that will forever skip uncommitted events.  I.e., if 5000 was
committed sooner than 4999, and we get 5000, we will never go back and
get 4999 when it finally commits.  How can we solve this?  Basically
it's a phantom row problem but it spans transactions.

I looked at checking the internal 'xmin' column but the docs say that
is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
value.  I don't get it.   All I want is to make sure I skip over any
rows that are newer than the oldest currently running transaction.
Has nobody else run into this before?

If I understand your question correctly, you want a "gapless" PK:

http://www.varlena.com/GeneralBits/130.php
--
Regards,
Peter Geoghegan

#3 Karl Pickett
karl.pickett@gmail.com
In reply to: Peter Geoghegan (#2)
Re: Can Postgres Not Do This Safely ?!?

On Fri, Oct 29, 2010 at 2:53 AM, Peter Geoghegan
<peter.geoghegan86@gmail.com> wrote:

On 29 October 2010 03:04, Karl Pickett <karl.pickett@gmail.com> wrote:

Hello Postgres Hackers,

We have a simple 'event log' table that is insert only (by multiple
concurrent clients).  It has an integer primary key.  We want to do
incremental queries of this table every 5 minutes or so, i.e. "select
* from events where id > LAST_ID_I_GOT" to insert into a separate
reporting database.  The problem is, this simple approach has a race
that will forever skip uncommitted events.  I.e., if 5000 was
committed sooner than 4999, and we get 5000, we will never go back and
get 4999 when it finally commits.  How can we solve this?  Basically
it's a phantom row problem but it spans transactions.

I looked at checking the internal 'xmin' column but the docs say that
is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
value.  I don't get it.   All I want is to make sure I skip over any
rows that are newer than the oldest currently running transaction.
Has nobody else run into this before?

If I understand your question correctly, you want a "gapless" PK:

http://www.varlena.com/GeneralBits/130.php
--
Regards,
Peter Geoghegan

That's interesting, but we're fine with gaps in the sequence that are
never filled. We also don't want to add a performance penalty for
concurrent writers. We just don't want any ids to appear (commit)
after we got a later id. To clarify, we are using a plain serial
primary key and we already have plenty of holes - that's fine. We
just want to do an incremental 'tail -f' of this giant table (along
with some joins) to feed into a reporting server every few minutes.
So we're treating it like a queue, but not deleting anything and
having absolute real-time data is not required.

It appears that theoretical options are:

1. Start a serializable transaction and wait until all earlier
transactions are gone (query pg_stat_activity or something?)
2. Ignore rows that were created later than any other in progress transactions

Both of these options assume that serials can never go backward as
they're handed out to connections / xids. I think that's safe to
assume?

Either would be fine, I just don't know if they're officially
supported by postgres.
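
Option 2 can be sketched concretely. This assumes an extra column (here
called ins_txid, a name invented for this example) filled with
txid_current() at insert time, plus a cursor saved between runs - roughly
the idea PGQ builds on:

```sql
-- One-time setup: record the inserting transaction's 64-bit txid.
-- "ins_txid" is an invented column name, not anything built into Postgres.
ALTER TABLE events ADD COLUMN ins_txid bigint NOT NULL DEFAULT txid_current();

-- Each incremental run:
-- 1. Capture the current snapshot horizon; every txid below it is finished.
SELECT txid_snapshot_xmin(txid_current_snapshot()) AS next_xmin;

-- 2. Fetch rows whose inserting transaction finished since the last run.
--    :last_xmin is the next_xmin saved by the previous run (0 on the first).
SELECT *
FROM events
WHERE ins_txid >= :last_xmin
  AND ins_txid <  :next_xmin;
```

Nothing in [last_xmin, next_xmin) can still be in flight, so a late commit
like the 4999-after-5000 case is picked up on a later run instead of being
skipped forever.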

--
Karl Pickett

#4 Craig Ringer
craig@2ndquadrant.com
In reply to: Karl Pickett (#1)
Re: Can Postgres Not Do This Safely ?!?

On 10/29/2010 10:04 AM, Karl Pickett wrote:

Hello Postgres Hackers,

We have a simple 'event log' table that is insert only (by multiple
concurrent clients). It has an integer primary key. We want to do
incremental queries of this table every 5 minutes or so, i.e. "select
* from events where id> LAST_ID_I_GOT" to insert into a separate
reporting database.

Essentially, in a table populated by concurrent inserts by many
transactions which may commit out of order, you want a way to say "get
me all tuples inserted since I last asked". Or, really "get me all
tuples that became visible since I last looked".

I've never found a good answer for this. If there is one, it'd be
wonderful for trigger-free, efficient replication of individual tables
using batches. The problem is that - because of commit ordering - there
doesn't seem to be any way to match a *range* of transactions, you have
to match a *list* of individual transaction IDs that committed since you
last ran. And you need a way to generate and maintain that list,
preferably only including transactions that touched the table of interest.

I looked at checking the internal 'xmin' column but the docs say that
is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
value. I don't get it. All I want is to make sure I skip over any
rows that are newer than the oldest currently running transaction.

Oh, so you don't care if you get the same tuple multiple times if
there's some old, long running transaction? You're just trying to avoid
repeatedly grabbing the REALLY old stuff?

In that case xmin is what you want. You may have to be aware of xid
wraparound issues, but I don't know much more about dealing with them
than the term.
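
As a rough illustration of that (epoch-ignoring, so it breaks across xid
wraparound - treat it strictly as a sketch):

```sql
-- Compare each row's raw xmin to the snapshot horizon. xid has no direct
-- cast to an integer type, hence the text round-trip; the 64-bit snapshot
-- value is reduced mod 2^32 to match. This IGNORES the epoch, so it is
-- NOT safe across xid wraparound.
SELECT *
FROM events
WHERE id > :last_id
  AND xmin::text::bigint <
      txid_snapshot_xmin(txid_current_snapshot()) % 4294967296;
```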

--
Craig Ringer

#5 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Karl Pickett (#1)
Re: Can Postgres Not Do This Safely ?!?

On Thursday 28 October 2010 7:04:48 pm Karl Pickett wrote:

Hello Postgres Hackers,

We have a simple 'event log' table that is insert only (by multiple
concurrent clients). It has an integer primary key. We want to do
incremental queries of this table every 5 minutes or so, i.e. "select
* from events where id > LAST_ID_I_GOT" to insert into a separate
reporting database. The problem is, this simple approach has a race
that will forever skip uncommitted events. I.e., if 5000 was
committed sooner than 4999, and we get 5000, we will never go back and
get 4999 when it finally commits. How can we solve this? Basically
it's a phantom row problem but it spans transactions.

I looked at checking the internal 'xmin' column but the docs say that
is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
value. I don't get it.

http://www.postgresql.org/docs/8.4/interactive/functions-info.html#FUNCTIONS-TXID-SNAPSHOT-PARTS
"The internal transaction ID type (xid) is 32 bits wide and wraps around every 4
billion transactions. However, these functions export a 64-bit format that is
extended with an "epoch" counter so it will not wrap around during the life of
an installation. The data type used by these functions, txid_snapshot, stores
information about transaction ID visibility at a particular moment in time. Its
components are described in Table 9-53. "

So:
Current snapshot:

test=> SELECT txid_current_snapshot();
txid_current_snapshot
-----------------------
5098:5098:

xmin of snapshot:
test=> SELECT txid_snapshot_xmin(txid_current_snapshot());
txid_snapshot_xmin
--------------------
5098
(1 row)

All I want is to make sure I skip over any
rows that are newer than the oldest currently running transaction.
Has nobody else run into this before?

Thank you very much.

--
Karl Pickett

--
Adrian Klaver
adrian.klaver@gmail.com

#6 Karl Pickett
karl.pickett@gmail.com
In reply to: Adrian Klaver (#5)
Re: Can Postgres Not Do This Safely ?!?

On Fri, Oct 29, 2010 at 8:58 AM, Adrian Klaver <adrian.klaver@gmail.com> wrote:

On Thursday 28 October 2010 7:04:48 pm Karl Pickett wrote:

Hello Postgres Hackers,

We have a simple 'event log' table that is insert only (by multiple
concurrent clients).  It has an integer primary key.  We want to do
incremental queries of this table every 5 minutes or so, i.e. "select
* from events where id > LAST_ID_I_GOT" to insert into a separate
reporting database.  The problem is, this simple approach has a race
that will forever skip uncommitted events.  I.e., if 5000 was
committed sooner than 4999, and we get 5000, we will never go back and
get 4999 when it finally commits.  How can we solve this?  Basically
it's a phantom row problem but it spans transactions.

I looked at checking the internal 'xmin' column but the docs say that
is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
value.  I don't get it.

http://www.postgresql.org/docs/8.4/interactive/functions-info.html#FUNCTIONS-TXID-SNAPSHOT-PARTS
"The internal transaction ID type (xid) is 32 bits wide and wraps around every 4
billion transactions. However, these functions export a 64-bit format that is
extended with an "epoch" counter so it will not wrap around during the life of
an installation. The data type used by these functions, txid_snapshot, stores
information about transaction ID visibility at a particular moment in time. Its
components are described in Table 9-53. "

So:
Current snapshot:

test=> SELECT txid_current_snapshot();
 txid_current_snapshot
-----------------------
 5098:5098:

xmin of snapshot:
test=> SELECT txid_snapshot_xmin(txid_current_snapshot());
 txid_snapshot_xmin
--------------------
              5098
(1 row)

So what happens when txid_snapshot_xmin() goes over 4 billion, and the
table's xmin doesn't? You can't compare a 32-bit value that rolls
over to a 64-bit one that doesn't.

All I want is to make sure I skip over any
rows that are newer than the oldest currently running transaction.
Has nobody else run into this before?

Thank you very much.

--
Karl Pickett

--
Adrian Klaver
adrian.klaver@gmail.com

--
Karl Pickett

#7 Merlin Moncure
mmoncure@gmail.com
In reply to: Karl Pickett (#1)
Re: Can Postgres Not Do This Safely ?!?

On Thu, Oct 28, 2010 at 10:04 PM, Karl Pickett <karl.pickett@gmail.com> wrote:

Hello Postgres Hackers,

We have a simple 'event log' table that is insert only (by multiple
concurrent clients).  It has an integer primary key.  We want to do
incremental queries of this table every 5 minutes or so, i.e. "select
* from events where id > LAST_ID_I_GOT" to insert into a separate
reporting database.  The problem is, this simple approach has a race
that will forever skip uncommitted events.  I.e., if 5000 was
committed sooner than 4999, and we get 5000, we will never go back and
get 4999 when it finally commits.  How can we solve this?  Basically
it's a phantom row problem but it spans transactions.

I looked at checking the internal 'xmin' column but the docs say that
is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
value.  I don't get it.   All I want is to make sure I skip over any
rows that are newer than the oldest currently running transaction.
Has nobody else run into this before?

You don't have a sequence problem so much as a wrong implementation
problem. Sequences are always *grabbed* in order but they can hit the
table out of order and there is a time lag between when the sequence
value is generated and the transaction commits. If I issue 'begin',
insert a log record, and hold the commit for 5 minutes you are going
to skip the record because you are only looking at the last processed
record. Your algorithm is going to fail if you use a sequence,
timestamp, or gapless sequence to manage your queue position. You
need to divide your log records into two logical sets, processed and
unprocessed, and look at the set as a whole.

I would suggest staging your unprocessed records to a queue table and
having your writer consume them and move them to a processed table.
You can also look at already built queuing implementations like PGQ
written by our spectacularly skilled friends at Skype (haven't used it
myself, but I've heard it's good!).

merlin

#8 Andy Colson
andy@squeakycode.net
In reply to: Merlin Moncure (#7)
Re: Can Postgres Not Do This Safely ?!?

On 10/29/2010 9:49 AM, Merlin Moncure wrote:

On Thu, Oct 28, 2010 at 10:04 PM, Karl Pickett<karl.pickett@gmail.com> wrote:

Hello Postgres Hackers,

We have a simple 'event log' table that is insert only (by multiple
concurrent clients). It has an integer primary key. We want to do
incremental queries of this table every 5 minutes or so, i.e. "select
* from events where id> LAST_ID_I_GOT" to insert into a separate
reporting database. The problem is, this simple approach has a race
that will forever skip uncommitted events. I.e., if 5000 was
committed sooner than 4999, and we get 5000, we will never go back and
get 4999 when it finally commits. How can we solve this? Basically
it's a phantom row problem but it spans transactions.

I looked at checking the internal 'xmin' column but the docs say that
is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
value. I don't get it. All I want is to make sure I skip over any
rows that are newer than the oldest currently running transaction.
Has nobody else run into this before?

You don't have a sequence problem so much as a wrong implementation
problem. Sequences are always *grabbed* in order but they can hit the
table out of order and there is a time lag between when the sequence
value is generated and the transaction commits. If I issue 'begin',
insert a log record, and hold the commit for 5 minutes you are going
to skip the record because you are only looking at the last processed
record. Your algorithm is going to fail if you use a sequence,
timestamp, or gapless sequence to manage your queue position. You
need to divide your log records into two logical sets, processed and
unprocessed, and look at the set as a whole.

I would suggest staging your unprocessed records to a queue table and
having your writer consume them and move them to a processed table.
You can also look at already built queuing implementations like PGQ
written by our spectacularly skilled friends at Skype (haven't used it
myself, but I've heard it's good!).

merlin

Yep, you don't want a sequence. You want a flag.

add a boolean "processed" flag, default it to false.

then every 5 minutes run this:

begin
insert into logged select * from events where processed = false;
update events set processed = true where processed = false;
commit;

or, if you want to select them and do something to them:

begin
select * from events where processed = false;
... do your processing on each, which would include inserting it...
update events set processed = true where processed = false;
commit;

Just make sure you do it all in the same transaction, so the update sees
the exact same set as the select.

You could also create a partial index on processed to keep track of
just those that are false.
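
Such an index might look like this (a sketch, using the table and column
names from the examples above - a partial index, so only the rows still
waiting are indexed):

```sql
-- Only unprocessed rows are indexed, so the index stays small no
-- matter how large the events table grows.
CREATE INDEX events_unprocessed_idx ON events (id)
WHERE processed = false;
```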

-Andy

#9 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Karl Pickett (#6)
Re: Can Postgres Not Do This Safely ?!?

On 10/29/2010 07:32 AM, Karl Pickett wrote:

On Fri, Oct 29, 2010 at 8:58 AM, Adrian Klaver<adrian.klaver@gmail.com> wrote:

On Thursday 28 October 2010 7:04:48 pm Karl Pickett wrote:

Hello Postgres Hackers,

We have a simple 'event log' table that is insert only (by multiple
concurrent clients). It has an integer primary key. We want to do
incremental queries of this table every 5 minutes or so, i.e. "select
* from events where id> LAST_ID_I_GOT" to insert into a separate
reporting database. The problem is, this simple approach has a race
that will forever skip uncommitted events. I.e., if 5000 was
committed sooner than 4999, and we get 5000, we will never go back and
get 4999 when it finally commits. How can we solve this? Basically
it's a phantom row problem but it spans transactions.

I looked at checking the internal 'xmin' column but the docs say that
is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
value. I don't get it.

http://www.postgresql.org/docs/8.4/interactive/functions-info.html#FUNCTIONS-TXID-SNAPSHOT-PARTS
"The internal transaction ID type (xid) is 32 bits wide and wraps around every 4
billion transactions. However, these functions export a 64-bit format that is
extended with an "epoch" counter so it will not wrap around during the life of
an installation. The data type used by these functions, txid_snapshot, stores
information about transaction ID visibility at a particular moment in time. Its
components are described in Table 9-53. "

So:
Current snapshot:

test=> SELECT txid_current_snapshot();
txid_current_snapshot
-----------------------
5098:5098:

xmin of snapshot:
test=> SELECT txid_snapshot_xmin(txid_current_snapshot());
txid_snapshot_xmin
--------------------
5098
(1 row)

So what happens when txid_snapshot_xmin() goes over 4 billion, and the
table's xmin doesn't? You can't compare a 32 bit value that rolls
over to a 64 bit that doesn't.

The long explanation is here:
http://www.postgresql.org/docs/9.0/interactive/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

The short version as I understand it is that if everything is working
correctly the XID (hence xmin) values exist in a continuous loop where 2
billion are in the past and 2 billion are in the future(assuming default
settings). At some point the old values are frozen i.e. replaced with a
special FrozenXID. This would mean that the *snapshot functions should
only return currently valid xmins. Since I have never rolled over a
database I can only speak to theory as I understand it.

All I want is to make sure I skip over any
rows that are newer than the oldest currently running transaction.
Has nobody else run into this before?

Thank you very much.

--
Karl Pickett

--
Adrian Klaver
adrian.klaver@gmail.com

--
Adrian Klaver
adrian.klaver@gmail.com

#10 Jeff Davis
pgsql@j-davis.com
In reply to: Andy Colson (#8)
Re: Can Postgres Not Do This Safely ?!?

On Fri, 2010-10-29 at 16:57 -0500, Andy Colson wrote:

begin
insert into logged select * from events where processed = false;
update events set processed = true where processed = false;
commit;

There's a race condition there. The SELECT in the INSERT statement may
read 5 tuples, then a concurrent transaction inserts a 6th tuple, then
you do an update on all 6 tuples.

begin
select * from events where processed = false;
... do you processing on each, which would include inserting it...
update events set processed = true where processed = false;
commit;

Same problem here.

Just make sure you do it all in the same transaction, so the update sees
the exact same set as the select.

You need to use SERIALIZABLE isolation level for this to work. The
default is READ COMMITTED.
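
Under SERIALIZABLE, Andy's batch might look like this (a sketch; in the
releases current at the time, SERIALIZABLE means snapshot isolation, which
is enough here because both statements then read the same snapshot):

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;
-- Both statements see one snapshot, so the UPDATE flags exactly the rows
-- the INSERT copied; a row committed in between stays processed = false
-- and is picked up on the next run.
INSERT INTO logged SELECT * FROM events WHERE processed = false;
UPDATE events SET processed = true WHERE processed = false;
COMMIT;
```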

Or better yet, use Merlin's suggestion of PgQ. They've already worked
this out in a safe, efficient way. It's the basis for Londiste, a
replication system.

Regards,
Jeff Davis

#11 bricklen
bricklen@gmail.com
In reply to: Jeff Davis (#10)
Re: Can Postgres Not Do This Safely ?!?

On Fri, Oct 29, 2010 at 4:59 PM, Jeff Davis <pgsql@j-davis.com> wrote:

On Fri, 2010-10-29 at 16:57 -0500, Andy Colson wrote:

begin
insert into logged select * from events where processed = false;
update events set processed = true where processed = false;
commit;

There's a race condition there. The SELECT in the INSERT statement may
read 5 tuples, then a concurrent transaction inserts a 6th tuple, then
you do an update on all 6 tuples.

begin
select * from events where processed = false;
... do you processing on each, which would include inserting it...
update events set processed = true where processed = false;
commit;

Same problem here.

Just make sure you do it all in the same transaction, so the update sees
the exact same set as the select.

Regards,
       Jeff Davis

As stated earlier in the thread, there is a race condition within that
transaction, but you could also use a temp table to get the ids that
you are about to process.
Maybe something like this (untested):

begin;
create temp table _idlist on commit drop as
  select id from eventlog where processed is false;
insert into othertable
  select e.* from eventlog as e
  join _idlist as i on (i.id = e.id);
update eventlog set processed = true
  from _idlist as i
  where eventlog.id = i.id;
commit;

#12 Merlin Moncure
mmoncure@gmail.com
In reply to: Jeff Davis (#10)
Re: Can Postgres Not Do This Safely ?!?

On Fri, Oct 29, 2010 at 7:59 PM, Jeff Davis <pgsql@j-davis.com> wrote:

On Fri, 2010-10-29 at 16:57 -0500, Andy Colson wrote:

begin
insert into logged select * from events where processed = false;
update events set processed = true where processed = false;
commit;

There's a race condition there. The SELECT in the INSERT statement may
read 5 tuples, then a concurrent transaction inserts a 6th tuple, then
you do an update on all 6 tuples.

begin
select * from events where processed = false;
... do you processing on each, which would include inserting it...
update events set processed = true where processed = false;
commit;

Same problem here.

Just make sure you do it all in the same transaction, so the update sees
the exact same set as the select.

You need to use SERIALIZABLE isolation level for this to work. The
default is READ COMMITTED.

Or better yet, use Merlin's suggestion of PgQ. They've already worked
this out in a safe, efficient way. It's the basis for Londiste, a
replication system.

With multiple writers, single reader, it's usually ok to go the ad hoc
route. Just read a bunch of records, do something with them, and
delete/move them. This will become even easier when wCTE becomes
available, and you can do something like: with foo as (delete from
stuff_to_do returning * limit something), bar as (insert into
stuff_done select * from foo) select do_stuff(...) from foo;
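
Spelled out in the writable-CTE syntax that later shipped in PostgreSQL 9.1
(table names are hypothetical, and DELETE takes no LIMIT, so the batch is
bounded via a subselect):

```sql
-- Atomically pull a bounded batch off the queue and archive it.
WITH batch AS (
    DELETE FROM stuff_to_do
    WHERE id IN (SELECT id FROM stuff_to_do ORDER BY id LIMIT 100)
    RETURNING *
)
INSERT INTO stuff_done
SELECT * FROM batch;
```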

Heavy artillery like PGQ becomes necessary IMO when you have multiple
readers pulling things off the queue and doing things. Doing this in
ad hoc fashion would require separating the 'getting stuff from the
queue' and the 'doing stuff'. This is harder than it looks, and the
skytools people have apparently got it completely worked out (adding a
3rd party dependency is never something to be taken lightly though).
I'm a little old school in the way I do things -- I try to work the
problem down to where the simpler solutions work. I do a *lot* of
batch processing though, and maybe I should put learning PGQ on my
radar.

merlin