select where not exists returning multiple rows?

Started by Chris Dumoulinover 14 years ago6 messagesgeneral

chris@blaze.io

over 14 years ago

We're using postgresql 9.1, and we've got a table that looks like this:

testdb=# \d item
Table "public.item"
Column | Type | Modifiers
-------+----------+-----------
sig | bigint | not null
type | smallint |
data | text |
Indexes:
"item_pkey" PRIMARY KEY, btree (sig)

And we're doing an insert like this:
INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS (
SELECT NULL FROM Item WHERE Sig=$4)

In this case $1 and $4 should always be the same. The idea is to insert
if the row doesn't already exist.
We're getting primary key constraint violations:

011-10-31 22:50:26 CDT STATEMENT: INSERT INTO Item (Sig, Type, Data)
SELECT $1,$2,$3 WHERE NOT EXISTS ( SELECT NULL FROM Item WHERE Sig=$4
FOR UPDATE)
2011-10-31 22:52:56 CDT ERROR: duplicate key value violates unique
constraint "item_pkey"
2011-10-31 22:52:56 CDT DETAIL: Key (sig)=(-4668668895560071572)
already exists.

I don't see how it's possible to get duplicate rows here, unless maybe
the "select where not exists" is somehow returning multiple rows.
Any ideas what's going on here?

Thanks,
Chris

mmoncure@gmail.com

over 14 years ago

In reply to: Chris Dumoulin (#1)

Re: select where not exists returning multiple rows?

On Wed, Nov 2, 2011 at 6:22 AM, Chris Dumoulin <chris@blaze.io> wrote:

We're using postgresql 9.1, and we've got a table that looks like this:

testdb=# \d item
Table "public.item"
Column | Type | Modifiers
-------+----------+-----------
sig | bigint | not null
type | smallint |
data | text |
Indexes:
"item_pkey" PRIMARY KEY, btree (sig)

And we're doing an insert like this:
INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS ( SELECT
NULL FROM Item WHERE Sig=$4)

In this case $1 and $4 should always be the same. The idea is to insert if
the row doesn't already exist.
We're getting primary key constraint violations:

011-10-31 22:50:26 CDT STATEMENT: INSERT INTO Item (Sig, Type, Data) SELECT
$1,$2,$3 WHERE NOT EXISTS ( SELECT NULL FROM Item WHERE Sig=$4 FOR UPDATE)
2011-10-31 22:52:56 CDT ERROR: duplicate key value violates unique
constraint "item_pkey"
2011-10-31 22:52:56 CDT DETAIL: Key (sig)=(-4668668895560071572) already
exists.

I don't see how it's possible to get duplicate rows here, unless maybe the
"select where not exists" is somehow returning multiple rows.
Any ideas what's going on here?

race condition. lock the table first or retry the insert.

merlin

chris@blaze.io

over 14 years ago

In reply to: Merlin Moncure (#2)

Re: select where not exists returning multiple rows?

On 11-11-02 08:49 AM, Merlin Moncure wrote:

On Wed, Nov 2, 2011 at 6:22 AM, Chris Dumoulin<chris@blaze.io> wrote:

We're using postgresql 9.1, and we've got a table that looks like this:

testdb=# \d item
Table "public.item"
Column | Type | Modifiers
-------+----------+-----------
sig | bigint | not null
type | smallint |
data | text |
Indexes:
"item_pkey" PRIMARY KEY, btree (sig)

And we're doing an insert like this:
INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS ( SELECT
NULL FROM Item WHERE Sig=$4)

In this case $1 and $4 should always be the same. The idea is to insert if
the row doesn't already exist.
We're getting primary key constraint violations:

011-10-31 22:50:26 CDT STATEMENT: INSERT INTO Item (Sig, Type, Data) SELECT
$1,$2,$3 WHERE NOT EXISTS ( SELECT NULL FROM Item WHERE Sig=$4 FOR UPDATE)
2011-10-31 22:52:56 CDT ERROR: duplicate key value violates unique
constraint "item_pkey"
2011-10-31 22:52:56 CDT DETAIL: Key (sig)=(-4668668895560071572) already
exists.

I don't see how it's possible to get duplicate rows here, unless maybe the
"select where not exists" is somehow returning multiple rows.
Any ideas what's going on here?

race condition. lock the table first or retry the insert.

merlin

Could you elaborate a little more on the race condition? Are you
suggesting that if two threads executed this statement at the same time,
the results from the inner "SELECT NULL ..." in one of the threads could
be incorrect by the time that thread did the INSERT? I thought about
this possibility and tried "SELECT NULL ... FOR UPDATE", but still saw
the same problem.

Thanks,
Chris

Martijn van Oosterhout

kleptog@svana.org

over 14 years ago

In reply to: Chris Dumoulin (#1)

Re: select where not exists returning multiple rows?

On Wed, Nov 02, 2011 at 07:22:09AM -0400, Chris Dumoulin wrote:

And we're doing an insert like this:
INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS
( SELECT NULL FROM Item WHERE Sig=$4)

In this case $1 and $4 should always be the same.

FWIW, If they're always going to be the same, you can put that it the query,
like so:

INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS
( SELECT NULL FROM Item WHERE Sig=$1)

Saves a parameter.

I don't see how it's possible to get duplicate rows here, unless
maybe the "select where not exists" is somehow returning multiple
rows.
Any ideas what's going on here?

As pointed out by others, you don't say if it this is a race condition
between processes or if it always does this.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

He who writes carelessly confesses thereby at the very outset that he does
not attach much importance to his own thoughts.

-- Arthur Schopenhauer

chris@blaze.io

over 14 years ago

In reply to: Martijn van Oosterhout (#4)

Re: select where not exists returning multiple rows?

On 11-11-02 09:13 AM, Martijn van Oosterhout wrote:

On Wed, Nov 02, 2011 at 07:22:09AM -0400, Chris Dumoulin wrote:

And we're doing an insert like this:
INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS
( SELECT NULL FROM Item WHERE Sig=$4)

In this case $1 and $4 should always be the same.

FWIW, If they're always going to be the same, you can put that it the query,
like so:

INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS
( SELECT NULL FROM Item WHERE Sig=$1)

Saves a parameter.

I don't see how it's possible to get duplicate rows here, unless
maybe the "select where not exists" is somehow returning multiple
rows.
Any ideas what's going on here?

As pointed out by others, you don't say if it this is a race condition
between processes or if it always does this.

It's only happening intermittently, but it doesn't appear to be a race
condition; I'm pretty sure there's only one thread or process issuing
this statement.

Thanks,
Chris

Show quoted text

Have a nice day,

mmoncure@gmail.com

over 14 years ago

In reply to: Chris Dumoulin (#5)

Re: select where not exists returning multiple rows?

On Wed, Nov 2, 2011 at 8:20 AM, Chris Dumoulin <chris@blaze.io> wrote:

On 11-11-02 09:13 AM, Martijn van Oosterhout wrote:

On Wed, Nov 02, 2011 at 07:22:09AM -0400, Chris Dumoulin wrote:

And we're doing an insert like this:
INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS
( SELECT NULL FROM Item WHERE Sig=$4)

In this case $1 and $4 should always be the same.

FWIW, If they're always going to be the same, you can put that it the
query,
like so:

INSERT INTO Item (Sig, Type, Data) SELECT $1,$2,$3 WHERE NOT EXISTS
( SELECT NULL FROM Item WHERE Sig=$1)

Saves a parameter.

I don't see how it's possible to get duplicate rows here, unless
maybe the "select where not exists" is somehow returning multiple
rows.
Any ideas what's going on here?

As pointed out by others, you don't say if it this is a race condition
between processes or if it always does this.

It's only happening intermittently, but it doesn't appear to be a race
condition; I'm pretty sure there's only one thread or process issuing this
statement.

Pretty sure? you need to be 100% sure. *Somebody* was worried about
concurrency in the code, because the actual statement in the log has
'FOR UPDATE' -- your example does not. Intermittent failures is
classic race condition behavior. The reason for the race is that your
select happens before the insert does so that process A and B can
select at approximately the same time and both make the decision to
insert on the same key...bam. Logging all statements will positively
prove this.

select <constants> where exists ... does not return > 1 rows ever and
there is precisely 0% chance you've uncovered a server bug that is
causing it to :-).

solve the problem by:
a: LOCK the table before making the insert, making sure to wrap the
lock and the insert in the same transaction (this should be the
default method)
b. retry the transaction on failure in the client
c. or on the server if you push the insert into a function.

merlin