On duplicate ignore
Hi,
Ours is a web-based application. We're trying to implement ON DUPLICATE
IGNORE behaviour for one of our application tables, named EMAILLIST. After a
quick Google search, I found the following "easy & convenient" single SQL
statement to use:
INSERT INTO EMAILLIST (EMAIL)
SELECT 'john@example.net'
WHERE NOT EXISTS (SELECT 1 FROM EMAILLIST WHERE EMAIL =
'john@example.net');
My question is: as a single-threaded INSERT, this will *definitely* work.
Since ours is a web-based application, will this also work in a concurrent,
multi-threaded environment? In other words, will EMAIL be truly unique
when INSERT calls are concurrent?
Regards,
Gnanam
Just create a unique index on EMAIL column and handle error if it comes
Thanks,
Atul Goel
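A sketch of this approach against the EMAILLIST table from the original post (the index name is illustrative):

```sql
-- Enforce uniqueness at the database level.
CREATE UNIQUE INDEX emaillist_email_idx ON emaillist (email);

-- A duplicate INSERT now fails with SQLSTATE 23505 (unique_violation),
-- which the application can catch and ignore:
INSERT INTO emaillist (email) VALUES ('john@example.net');
```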
-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Gnanakumar
Sent: 18 January 2012 11:04
To: pgsql-general@postgresql.org
Subject: [GENERAL] On duplicate ignore
Just create a unique index on EMAIL column and handle error if it comes
Thanks for your suggestion. Of course, I do understand that this could be
enforced/imposed at the database-level at any time. But I'm trying to find
out whether this could be solved at the application layer itself. Any
thoughts/ideas?
Hey Gnanakumar,
2012/1/18 Gnanakumar <gnanam@zoniac.com>
Just create a unique index on EMAIL column and handle error if it comes
Thanks for your suggestion. Of course, I do understand that this could be
enforced/imposed at the database-level at any time. But I'm trying to find
out whether this could be solved at the application layer itself. Any
thoughts/ideas?
Exactly: at the application level you just need to ignore
the unique constraint violation error reported by the backend.
You may also wrap the INSERT statement in a PL/pgSQL
function or in a DO statement and catch the exception
generated by the backend.
// Dmitriy.
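A minimal sketch of the DO-statement variant Dmitriy describes, assuming a unique index already exists on EMAIL:

```sql
DO $$
BEGIN
    INSERT INTO emaillist (email) VALUES ('john@example.net');
EXCEPTION
    WHEN unique_violation THEN
        NULL;  -- duplicate email: silently ignore it
END;
$$;
```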
There is no way you can make sure of this at the application level alone. Think of an uncommitted row and another session inserting at the same moment in time.
Thanks,
Atul Goel
-----Original Message-----
From: Gnanakumar [mailto:gnanam@zoniac.com]
Sent: 18 January 2012 12:59
To: Atul Goel; pgsql-general@postgresql.org
Subject: RE: [GENERAL] On duplicate ignore
Just create a unique index on EMAIL column and handle error if it
comes
Thanks for your suggestion. Of course, I do understand that this could be enforced/imposed at the database-level at any time. But I'm trying to find out whether this could be solved at the application layer itself. Any thoughts/ideas?
* Gnanakumar:
Just create a unique index on EMAIL column and handle error if it comes
Thanks for your suggestion. Of course, I do understand that this could be
enforced/imposed at the database-level at any time. But I'm trying to find
out whether this could be solved at the application layer itself. Any
thoughts/ideas?
If you use serializable transactions in PostgreSQL 9.1, you can
implement such constraints in the application without additional
locking. However, with concurrent writes and without an index, the rate
of detected serialization violations and resulting transaction aborts
will be high.
--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
On Thu, Jan 19, 2012 at 7:54 AM, Florian Weimer <fweimer@bfk.de> wrote:
If you use serializable transactions in PostgreSQL 9.1, you can
implement such constraints in the application without additional
locking. However, with concurrent writes and without an index, the rate
of detected serialization violations and resulting transaction aborts
will be high.
No, you sadly can't. PostgreSQL doesn't yet support proper predicate
locking to allow the application to be sure that the OP's original
statement, and ones like it, don't have a race condition. A unique
index is the only way to be sure.
On Thu, Jan 19, 2012 at 9:49 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
No, you sadly can't. PostgreSQL doesn't yet support proper predicate
locking to allow the application to be sure that the OP's original
statement, and ones like it, don't have a race condition. A unique
index is the only way to be sure.
Wait, did 9.1 implement proper predicate locking to allow this? If so
I apologize for being out of the loop on the new versions.
* Scott Marlowe:
No, you sadly can't. PostgreSQL doesn't yet support proper predicate
locking to allow the application to be sure that the OP's original
statement, and ones like it, don't have a race condition. A unique
index is the only way to be sure.
Huh? This was one of the major new features in PostgreSQL 9.1.
--
Florian Weimer <fweimer@bfk.de>
At 10:54 PM 1/19/2012, Florian Weimer wrote:
If you use serializable transactions in PostgreSQL 9.1, you can
implement such constraints in the application without additional
locking. However, with concurrent writes and without an index, the rate
of detected serialization violations and resulting transaction aborts
will be high.
Would writing application-side code to handle those transaction
aborts in 9.1 be much easier than writing code to handle transaction
aborts/DB exceptions due to unique constraint violations? These
transaction aborts have to be handled differently (e.g. retried for X
seconds/Y tries) from other sorts of transaction aborts (not retried).
Otherwise I don't see the benefit of this feature for this scenario.
Unless of course you get significantly better performance by not
having a unique constraint.
If insert performance is not an issue and code simplicity is
preferred, one could lock the table (with an exclusive lock mode),
then do the selects and inserts, that way your code can assume that
any transaction aborts are due to actual problems rather than
concurrency. Which often means less code to write :).
Regards,
Link.
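A sketch of the table-locking approach Lincoln describes, using the statement from the original post (SHARE ROW EXCLUSIVE conflicts with itself and with ordinary writes, so concurrent inserters queue up, while plain reads are still allowed):

```sql
BEGIN;
-- Only one session at a time may hold this lock, so the
-- NOT EXISTS check and the INSERT run without a race.
LOCK TABLE emaillist IN SHARE ROW EXCLUSIVE MODE;
INSERT INTO emaillist (email)
SELECT 'john@example.net'
WHERE NOT EXISTS (SELECT 1 FROM emaillist WHERE email = 'john@example.net');
COMMIT;
```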
* Lincoln Yeoh:
Would writing application-side code to handle those transaction aborts
in 9.1 be much easier than writing code to handle transaction
aborts/DB exceptions due to unique constraint violations? These
transaction aborts have to be handled differently (e.g. retried for X
seconds/Y tries) from other sorts of transaction aborts (not retried).
There's a separate error code, so it's easier to deal with in theory.
However, I don't think that's sufficient justification for removing the
unique constraints.
Otherwise I don't see the benefit of this feature for this
scenario. Unless of course you get significantly better performance by
not having a unique constraint.
Performance is worse.
If insert performance is not an issue and code simplicity is
preferred, one could lock the table (with an exclusive lock mode),
then do the selects and inserts, that way your code can assume that
any transaction aborts are due to actual problems rather than
concurrency. Which often means less code to write :).
Choosing the right lock is a bit tricky because you usually want to
block INSERTs only. Explicit locks on a hash of the unique column,
using pg_advisory_xact_lock, are often an alternative.
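A sketch of the advisory-lock alternative Florian mentions. hashtext() is an internal (undocumented) PostgreSQL hash function; any stable mapping of the key to an integer would do:

```sql
BEGIN;
-- Serialize inserts for this particular email only; the transaction-scoped
-- advisory lock is released automatically at commit or rollback.
SELECT pg_advisory_xact_lock(hashtext('john@example.net'));
INSERT INTO emaillist (email)
SELECT 'john@example.net'
WHERE NOT EXISTS (SELECT 1 FROM emaillist WHERE email = 'john@example.net');
COMMIT;
```

Unlike LOCK TABLE, this blocks only sessions inserting the same (or a hash-colliding) email, so unrelated inserts proceed in parallel.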
--
Florian Weimer <fweimer@bfk.de>
At 04:27 PM 1/20/2012, Florian Weimer wrote:
There's a separate error code, so it's easier to deal with in theory.
Is there a simple way to get postgresql to retry a transaction, or
does the application have to actually reissue all the necessary
statements again?
I'd personally prefer to use locking and selects to avoid transaction
aborts whether due to unique constraint violations or due to
serialization violations.
But I'm lazy ;).
Regards,
Link.
* Lincoln Yeoh:
Is there a simple way to get postgresql to retry a transaction, or
does the application have to actually reissue all the necessary
statements again?
The application has to re-run the transaction, which might result in the
execution of different statements. In the INSERT-or-UPDATE case, the
new attempt will have to use an UPDATE instead of an INSERT, so replaying
the statements verbatim will not work.
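A server-side sketch of this INSERT-or-UPDATE retry loop in PL/pgSQL, modelled on the merge_db example in the PostgreSQL documentation (the function name is illustrative; since EMAILLIST has nothing to update, the "update" branch is just an existence check):

```sql
CREATE FUNCTION add_email(p_email text) RETURNS void AS $$
BEGIN
    LOOP
        -- If the row already exists, we are done.
        PERFORM 1 FROM emaillist WHERE email = p_email;
        IF FOUND THEN
            RETURN;
        END IF;
        BEGIN
            INSERT INTO emaillist (email) VALUES (p_email);
            RETURN;
        EXCEPTION
            WHEN unique_violation THEN
                NULL;  -- another session inserted concurrently; loop and re-check
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```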
I'd personally prefer to use locking and selects to avoid transaction
aborts whether due to unique constraint violations or due to
serialization violations.
Once you address the restart issue, transactional code is simpler and
easier to check for correctness.
Restarting transactions has other benefits, too. For instance, you can
restart your PostgreSQL server process, and your applications will just
keep running.
--
Florian Weimer <fweimer@bfk.de>