Simple Atomic Relationship Insert

Started by Robert DiFalcoabout 11 years ago28 messagesgeneral
Jump to latest
#1Robert DiFalco
robert.difalco@gmail.com

Let's say I have two tables like this (I'm leaving stuff out for
simplicity):

CREATE SEQUENCE HOMETOWN_SEQ_GEN START 1 INCREMENT 1;
CREATE TABLE hometowns (
id INTEGER DEFAULT nextval('HOMETOWN_SEQ_GEN'),
name VARCHAR,
PRIMARY KEY (id),
UNIQUE(name)
);

CREATE SEQUENCE USER_SEQ_GEN START 1 INCREMENT 1;
CREATE TABLE users (
id BIGINT DEFAULT nextval('USER_SEQ_GEN'),
hometown_id INTEGER,
name VARCHAR NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (hometown_id) REFERENCES hometowns(id)
);

The hometowns table is populate as users are created. For example, a
client may submit {"name":"Robert", "hometown":"Portland"}.

The hometowns table will never be updated, only either queries or inserted.

So given this I need to INSERT a row into "users" and either SELECT the
hometowns.id that matches "Portland" or if it doesn't exist I INSERT it
returning the hometowns.id".

Normally I would do by first doing a SELECT on hometown. If I don't get
anything I do an INSERT into hometown RETURNING the id. If THAT throws an
error then I do the SELECT again. Now I'm finally ready to INSERT into
users using the hometowns.id from the above steps.

But wow, that seems like a lot of code for a simple "Add if doesn't exist"
foreign key relationship -- but this is how I've always done.

So my question. Is there a simpler, more performant, or thread-safe way to
do this?

Thanks!

#2John McKown
john.archie.mckown@gmail.com
In reply to: Robert DiFalco (#1)
Re: Simple Atomic Relationship Insert

On Tue, Jan 13, 2015 at 10:42 AM, Robert DiFalco <robert.difalco@gmail.com>
wrote:

Let's say I have two tables like this (I'm leaving stuff out for
simplicity):

CREATE SEQUENCE HOMETOWN_SEQ_GEN START 1 INCREMENT 1;
CREATE TABLE hometowns (
id INTEGER DEFAULT nextval('HOMETOWN_SEQ_GEN'),
name VARCHAR,
PRIMARY KEY (id),
UNIQUE(name)
);

CREATE SEQUENCE USER_SEQ_GEN START 1 INCREMENT 1;
CREATE TABLE users (
id BIGINT DEFAULT nextval('USER_SEQ_GEN'),
hometown_id INTEGER,
name VARCHAR NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (hometown_id) REFERENCES hometowns(id)
);

The hometowns table is populate as users are created. For example, a
client may submit {"name":"Robert", "hometown":"Portland"}.

The hometowns table will never be updated, only either queries or inserted.

So given this I need to INSERT a row into "users" and either SELECT the
hometowns.id that matches "Portland" or if it doesn't exist I INSERT it
returning the hometowns.id".

Normally I would do by first doing a SELECT on hometown. If I don't get
anything I do an INSERT into hometown RETURNING the id. If THAT throws an
error then I do the SELECT again. Now I'm finally ready to INSERT into
users using the hometowns.id from the above steps.

But wow, that seems like a lot of code for a simple "Add if doesn't exist"
foreign key relationship -- but this is how I've always done.

So my question. Is there a simpler, more performant, or thread-safe way to
do this?

Thanks!

​What occurs to me is to simply do an INSERT into the "hometowns" table and
just ignore the "already exists" return indication. Then do a SELECT to get
the hometowns​ id which now exists, then INSERT the users. but I could
easily be overlooking some reason why this wouldn't work properly.

--

While a transcendent vocabulary is laudable, one must be eternally careful
so that the calculated objective of communication does not become ensconced
in obscurity. In other words, eschew obfuscation.

111,111,111 x 111,111,111 = 12,345,678,987,654,321

Maranatha! <><
John McKown

#3Robert DiFalco
robert.difalco@gmail.com
In reply to: John McKown (#2)
Re: Simple Atomic Relationship Insert

Thanks John. I've been seeing a lot of examples like this lately. Does the
following approach have any advantages over traditional approaches?

WITH sel AS (
SELECT id FROM hometowns WHERE name = 'Portland'
), ins AS (
INSERT INTO hometowns(name)
SELECT 'Portland'
WHERE NOT EXISTS (SELECT 1 FROM sel)
RETURNING id
)
INSERT INTO users(name, hometown_id)
VALUES ('Robert', SELECT id FROM ins UNION ALL SELECT id FROM sel);

On Tue, Jan 13, 2015 at 8:50 AM, John McKown <john.archie.mckown@gmail.com>
wrote:

Show quoted text

On Tue, Jan 13, 2015 at 10:42 AM, Robert DiFalco <robert.difalco@gmail.com

wrote:

Let's say I have two tables like this (I'm leaving stuff out for
simplicity):

CREATE SEQUENCE HOMETOWN_SEQ_GEN START 1 INCREMENT 1;
CREATE TABLE hometowns (
id INTEGER DEFAULT nextval('HOMETOWN_SEQ_GEN'),
name VARCHAR,
PRIMARY KEY (id),
UNIQUE(name)
);

CREATE SEQUENCE USER_SEQ_GEN START 1 INCREMENT 1;
CREATE TABLE users (
id BIGINT DEFAULT nextval('USER_SEQ_GEN'),
hometown_id INTEGER,
name VARCHAR NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (hometown_id) REFERENCES hometowns(id)
);

The hometowns table is populate as users are created. For example, a
client may submit {"name":"Robert", "hometown":"Portland"}.

The hometowns table will never be updated, only either queries or
inserted.

So given this I need to INSERT a row into "users" and either SELECT the
hometowns.id that matches "Portland" or if it doesn't exist I INSERT it
returning the hometowns.id".

Normally I would do by first doing a SELECT on hometown. If I don't get
anything I do an INSERT into hometown RETURNING the id. If THAT throws an
error then I do the SELECT again. Now I'm finally ready to INSERT into
users using the hometowns.id from the above steps.

But wow, that seems like a lot of code for a simple "Add if doesn't
exist" foreign key relationship -- but this is how I've always done.

So my question. Is there a simpler, more performant, or thread-safe way
to do this?

Thanks!

​What occurs to me is to simply do an INSERT into the "hometowns" table
and just ignore the "already exists" return indication. Then do a SELECT to
get the hometowns​ id which now exists, then INSERT the users. but I could
easily be overlooking some reason why this wouldn't work properly.

--

While a transcendent vocabulary is laudable, one must be eternally careful
so that the calculated objective of communication does not become ensconced
in obscurity. In other words, eschew obfuscation.

111,111,111 x 111,111,111 = 12,345,678,987,654,321

Maranatha! <><
John McKown

#4John McKown
john.archie.mckown@gmail.com
In reply to: Robert DiFalco (#3)
Re: Simple Atomic Relationship Insert

On Tue, Jan 13, 2015 at 11:45 AM, Robert DiFalco <robert.difalco@gmail.com>
wrote:

Thanks John. I've been seeing a lot of examples like this lately. Does the
following approach have any advantages over traditional approaches?
​​

WITH sel AS (
SELECT id FROM hometowns WHERE name = 'Portland'
), ins AS (
INSERT INTO hometowns(name)
SELECT 'Portland'
WHERE NOT EXISTS (SELECT 1 FROM sel)
RETURNING id
)
INSERT INTO users(name, hometown_id)
VALUES ('Robert', SELECT id FROM ins UNION ALL SELECT id FROM sel);

​Oh, that is very clever. I've not see such a thing before. Thanks.​

​I've added it to my stable of "tricks". Which aren't really tricks, just
really nice new methods to do something.

​The main advantage that I can see is that it is a single SQL statement to
send to the server. That makes it "self contained" so that it would be more
difficult for someone to accidentally mess it up. On the other hand, CTEs
are still a bit new (at least to me) and so the "why it works" might not be
very obvious to other programmers who might need to maintain the
application.​ To many this "lack of obviousness" is a detriment. To me, it
means "update your knowledge". But then, I am sometimes a arrogant BOFH.
Add that to my being an surly old curmudgeon, and you can end up with some
bad advice when in a "corporate" environment. The minus, at present, is
that it is "clever" and so may violate corporate coding standards due to
"complexity". Or maybe I just work for a staid company.

--

While a transcendent vocabulary is laudable, one must be eternally careful
so that the calculated objective of communication does not become ensconced
in obscurity. In other words, eschew obfuscation.

111,111,111 x 111,111,111 = 12,345,678,987,654,321

Maranatha! <><
John McKown

#5Robert DiFalco
robert.difalco@gmail.com
In reply to: John McKown (#4)
Re: Simple Atomic Relationship Insert

This CTE approach doesn't appear to play well with multiple concurrent
transactions/connections.

On Tue, Jan 13, 2015 at 10:05 AM, John McKown <john.archie.mckown@gmail.com>
wrote:

Show quoted text

On Tue, Jan 13, 2015 at 11:45 AM, Robert DiFalco <robert.difalco@gmail.com

wrote:

Thanks John. I've been seeing a lot of examples like this lately. Does
the following approach have any advantages over traditional approaches?
​​

WITH sel AS (
SELECT id FROM hometowns WHERE name = 'Portland'
), ins AS (
INSERT INTO hometowns(name)
SELECT 'Portland'
WHERE NOT EXISTS (SELECT 1 FROM sel)
RETURNING id
)
INSERT INTO users(name, hometown_id)
VALUES ('Robert', SELECT id FROM ins UNION ALL SELECT id FROM sel);

​Oh, that is very clever. I've not see such a thing before. Thanks.​

​I've added it to my stable of "tricks". Which aren't really tricks, just
really nice new methods to do something.

​The main advantage that I can see is that it is a single SQL statement to
send to the server. That makes it "self contained" so that it would be more
difficult for someone to accidentally mess it up. On the other hand, CTEs
are still a bit new (at least to me) and so the "why it works" might not be
very obvious to other programmers who might need to maintain the
application.​ To many this "lack of obviousness" is a detriment. To me, it
means "update your knowledge". But then, I am sometimes a arrogant BOFH.
Add that to my being an surly old curmudgeon, and you can end up with some
bad advice when in a "corporate" environment. The minus, at present, is
that it is "clever" and so may violate corporate coding standards due to
"complexity". Or maybe I just work for a staid company.

--

While a transcendent vocabulary is laudable, one must be eternally careful
so that the calculated objective of communication does not become ensconced
in obscurity. In other words, eschew obfuscation.

111,111,111 x 111,111,111 = 12,345,678,987,654,321

Maranatha! <><
John McKown

#6Brian Dunavant
brian@omniti.com
In reply to: Robert DiFalco (#5)
Re: Simple Atomic Relationship Insert

What issue are you having? I'd imagine you have a race condition on
the insert into hometowns, but you'd have that same race condition in
your app code using a more traditional 3 query version as well.

I often use CTEs like this to make things atomic. It allows me to
remove transactional code out of the app and also to increase
performance by reducing the back-and-forth to the db.
http://omniti.com/seeds/writable-ctes-improve-performance

On Tue, Jan 13, 2015 at 4:21 PM, Robert DiFalco
<robert.difalco@gmail.com> wrote:

This CTE approach doesn't appear to play well with multiple concurrent
transactions/connections.

On Tue, Jan 13, 2015 at 10:05 AM, John McKown <john.archie.mckown@gmail.com>
wrote:

On Tue, Jan 13, 2015 at 11:45 AM, Robert DiFalco
<robert.difalco@gmail.com> wrote:

Thanks John. I've been seeing a lot of examples like this lately. Does
the following approach have any advantages over traditional approaches?

WITH sel AS (
SELECT id FROM hometowns WHERE name = 'Portland'
), ins AS (
INSERT INTO hometowns(name)
SELECT 'Portland'
WHERE NOT EXISTS (SELECT 1 FROM sel)
RETURNING id
)
INSERT INTO users(name, hometown_id)
VALUES ('Robert', SELECT id FROM ins UNION ALL SELECT id FROM sel);

Oh, that is very clever. I've not see such a thing before. Thanks.

I've added it to my stable of "tricks". Which aren't really tricks, just
really nice new methods to do something.

The main advantage that I can see is that it is a single SQL statement to
send to the server. That makes it "self contained" so that it would be more
difficult for someone to accidentally mess it up. On the other hand, CTEs
are still a bit new (at least to me) and so the "why it works" might not be
very obvious to other programmers who might need to maintain the
application. To many this "lack of obviousness" is a detriment. To me, it
means "update your knowledge". But then, I am sometimes a arrogant BOFH. Add
that to my being an surly old curmudgeon, and you can end up with some bad
advice when in a "corporate" environment. The minus, at present, is that it
is "clever" and so may violate corporate coding standards due to
"complexity". Or maybe I just work for a staid company.

--
While a transcendent vocabulary is laudable, one must be eternally careful
so that the calculated objective of communication does not become ensconced
in obscurity. In other words, eschew obfuscation.

111,111,111 x 111,111,111 = 12,345,678,987,654,321

Maranatha! <><
John McKown

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Robert DiFalco
robert.difalco@gmail.com
In reply to: Brian Dunavant (#6)
Re: Simple Atomic Relationship Insert

Well, traditionally I would create a LOOP where I tried the SELECT, if
there was nothing I did the INSERT, if that raised an exception I would
repeat the LOOP.

What's the best way to do it with the CTE? Currently I have the following
which gives me Duplicate Key Exceptions when two sessions try to insert the
same record at the same time.

CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)
RETURNS INTEGER AS $
DECLARE hometown_id INTEGER;
BEGIN
WITH sel AS (
SELECT id FROM hometowns WHERE name = hometown_name
), ins AS (
INSERT INTO hometowns (name)
SELECT hometown_name
WHERE NOT EXISTS(SELECT 1 FROM sel)
RETURNING id
)
SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
RETURN hometown_id;
END;
$ LANGUAGE plpgsql;

And that is no bueno. Should I just put the whole thing in a LOOP?

#8Brian Dunavant
brian@omniti.com
In reply to: Robert DiFalco (#7)
Re: Simple Atomic Relationship Insert

With the single CTE I don't believe you can do a full upsert loop. If
you're doing this inside of a postgres function, your changes are
already atomic, so I don't believe by switching you are buying
yourself much (if anything) by using a CTE query instead of something
more traditional here.

The advantages of switching to a CTE would be if this code was all
being done inside of the app code with multiple queries.

On Tue, Jan 13, 2015 at 4:39 PM, Robert DiFalco
<robert.difalco@gmail.com> wrote:

Well, traditionally I would create a LOOP where I tried the SELECT, if there
was nothing I did the INSERT, if that raised an exception I would repeat the
LOOP.

What's the best way to do it with the CTE? Currently I have the following
which gives me Duplicate Key Exceptions when two sessions try to insert the
same record at the same time.

CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS
INTEGER AS $
DECLARE hometown_id INTEGER;
BEGIN
WITH sel AS (
SELECT id FROM hometowns WHERE name = hometown_name
), ins AS (
INSERT INTO hometowns (name)
SELECT hometown_name
WHERE NOT EXISTS(SELECT 1 FROM sel)
RETURNING id
)
SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
RETURN hometown_id;
END;
$ LANGUAGE plpgsql;

And that is no bueno. Should I just put the whole thing in a LOOP?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Robert DiFalco
robert.difalco@gmail.com
In reply to: Brian Dunavant (#8)
Re: Simple Atomic Relationship Insert

This seems to get rid of the INSERT race condition.

CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)
RETURNS INTEGER AS $
DECLARE hometown_id INTEGER;
BEGIN
LOOP
BEGIN
WITH sel AS (
SELECT id FROM hometowns WHERE name = hometown_name
), ins AS (
INSERT INTO hometowns (name)
SELECT hometown_name
WHERE NOT EXISTS(SELECT 1 FROM sel)
RETURNING id
)
SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
RETURN hometown_id;

EXCEPTION WHEN unique_violation
THEN
END;
END LOOP;
END;
$ LANGUAGE plpgsql;

On Tue, Jan 13, 2015 at 1:53 PM, Brian Dunavant <brian@omniti.com> wrote:

Show quoted text

With the single CTE I don't believe you can do a full upsert loop. If
you're doing this inside of a postgres function, your changes are
already atomic, so I don't believe by switching you are buying
yourself much (if anything) by using a CTE query instead of something
more traditional here.

The advantages of switching to a CTE would be if this code was all
being done inside of the app code with multiple queries.

On Tue, Jan 13, 2015 at 4:39 PM, Robert DiFalco
<robert.difalco@gmail.com> wrote:

Well, traditionally I would create a LOOP where I tried the SELECT, if

there

was nothing I did the INSERT, if that raised an exception I would repeat

the

LOOP.

What's the best way to do it with the CTE? Currently I have the following
which gives me Duplicate Key Exceptions when two sessions try to insert

the

same record at the same time.

CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)

RETURNS

INTEGER AS $
DECLARE hometown_id INTEGER;
BEGIN
WITH sel AS (
SELECT id FROM hometowns WHERE name = hometown_name
), ins AS (
INSERT INTO hometowns (name)
SELECT hometown_name
WHERE NOT EXISTS(SELECT 1 FROM sel)
RETURNING id
)
SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
RETURN hometown_id;
END;
$ LANGUAGE plpgsql;

And that is no bueno. Should I just put the whole thing in a LOOP?

#10Brian Dunavant
brian@omniti.com
In reply to: Robert DiFalco (#9)
Re: Simple Atomic Relationship Insert

The loop to run it twice handles that yes. I don't think that buys
you anything over a more traditional non-cte method though. I'd run
them a few thousand times to see if there's any difference in runtimes
but my guess is the CTE version would be slightly slower here.

v_id integer;
BEGIN;
select id into v_id from hometowns where name = hometown_name;
BEGIN
insert into hometowns (name)
select hometown_name where v_id is null
returning id into v_id;
EXCEPTION WHEN unique_violation
THEN
select id into v_id from hometowns where name = hometown_name;
END;
insert into users (name, hometown_id)
values ('Robert', v_id);
END;

On Tue, Jan 13, 2015 at 5:06 PM, Robert DiFalco
<robert.difalco@gmail.com> wrote:

This seems to get rid of the INSERT race condition.

CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS
INTEGER AS $
DECLARE hometown_id INTEGER;
BEGIN
LOOP
BEGIN
WITH sel AS (
SELECT id FROM hometowns WHERE name = hometown_name
), ins AS (
INSERT INTO hometowns (name)
SELECT hometown_name
WHERE NOT EXISTS(SELECT 1 FROM sel)
RETURNING id
)
SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
RETURN hometown_id;

EXCEPTION WHEN unique_violation
THEN
END;
END LOOP;
END;
$ LANGUAGE plpgsql;

On Tue, Jan 13, 2015 at 1:53 PM, Brian Dunavant <brian@omniti.com> wrote:

With the single CTE I don't believe you can do a full upsert loop. If
you're doing this inside of a postgres function, your changes are
already atomic, so I don't believe by switching you are buying
yourself much (if anything) by using a CTE query instead of something
more traditional here.

The advantages of switching to a CTE would be if this code was all
being done inside of the app code with multiple queries.

On Tue, Jan 13, 2015 at 4:39 PM, Robert DiFalco
<robert.difalco@gmail.com> wrote:

Well, traditionally I would create a LOOP where I tried the SELECT, if
there
was nothing I did the INSERT, if that raised an exception I would repeat
the
LOOP.

What's the best way to do it with the CTE? Currently I have the
following
which gives me Duplicate Key Exceptions when two sessions try to insert
the
same record at the same time.

CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)
RETURNS
INTEGER AS $
DECLARE hometown_id INTEGER;
BEGIN
WITH sel AS (
SELECT id FROM hometowns WHERE name = hometown_name
), ins AS (
INSERT INTO hometowns (name)
SELECT hometown_name
WHERE NOT EXISTS(SELECT 1 FROM sel)
RETURNING id
)
SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
RETURN hometown_id;
END;
$ LANGUAGE plpgsql;

And that is no bueno. Should I just put the whole thing in a LOOP?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#11Thomas Kellerer
spam_eater@gmx.net
In reply to: Brian Dunavant (#6)
Re: Simple Atomic Relationship Insert

Brian Dunavant wrote on 13.01.2015 22:33:

What issue are you having? I'd imagine you have a race condition on
the insert into hometowns, but you'd have that same race condition in
your app code using a more traditional 3 query version as well.

I often use CTEs like this to make things atomic. It allows me to
remove transactional code out of the app and also to increase
performance by reducing the back-and-forth to the db.
http://omniti.com/seeds/writable-ctes-improve-performance

Craig Ringer explained some of the pitfalls of this approach here:

http://dba.stackexchange.com/questions/78510/why-is-cte-open-to-lost-updates

which is a follow up question based on this: http://stackoverflow.com/a/8702291/330315

Thomas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#12Brian Dunavant
brian@omniti.com
In reply to: Thomas Kellerer (#11)
Re: Simple Atomic Relationship Insert

A very good point, but it does not apply as here (and in my article)
we are not using updates, only insert and select.

On Tue, Jan 13, 2015 at 6:03 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:

Brian Dunavant wrote on 13.01.2015 22:33:

What issue are you having? I'd imagine you have a race condition on
the insert into hometowns, but you'd have that same race condition in
your app code using a more traditional 3 query version as well.

I often use CTEs like this to make things atomic. It allows me to
remove transactional code out of the app and also to increase
performance by reducing the back-and-forth to the db.
http://omniti.com/seeds/writable-ctes-improve-performance

Craig Ringer explained some of the pitfalls of this approach here:

http://dba.stackexchange.com/questions/78510/why-is-cte-open-to-lost-updates

which is a follow up question based on this:
http://stackoverflow.com/a/8702291/330315

Thomas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#13Robert DiFalco
robert.difalco@gmail.com
In reply to: Brian Dunavant (#12)
Re: Simple Atomic Relationship Insert

Good points. I guess my feeling is that if there can be a race condition on
INSERT then the CTE version is not truly atomic, hence the LOOP.

On Tue, Jan 13, 2015 at 3:11 PM, Brian Dunavant <brian@omniti.com> wrote:

Show quoted text

A very good point, but it does not apply as here (and in my article)
we are not using updates, only insert and select.

On Tue, Jan 13, 2015 at 6:03 PM, Thomas Kellerer <spam_eater@gmx.net>
wrote:

Brian Dunavant wrote on 13.01.2015 22:33:

What issue are you having? I'd imagine you have a race condition on
the insert into hometowns, but you'd have that same race condition in
your app code using a more traditional 3 query version as well.

I often use CTEs like this to make things atomic. It allows me to
remove transactional code out of the app and also to increase
performance by reducing the back-and-forth to the db.
http://omniti.com/seeds/writable-ctes-improve-performance

Craig Ringer explained some of the pitfalls of this approach here:

http://dba.stackexchange.com/questions/78510/why-is-cte-open-to-lost-updates

which is a follow up question based on this:
http://stackoverflow.com/a/8702291/330315

Thomas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#14Berend Tober
btober@broadstripe.net
In reply to: John McKown (#2)
Re: Simple Atomic Relationship Insert

John McKown wrote:

On Tue, Jan 13, 2015 at 10:42 AM, Robert DiFalco
<robert.difalco@gmail.com <mailto:robert.difalco@gmail.com>>wrote:

Let's say I have two tables like this (I'm leaving stuff out for
simplicity):

CREATE SEQUENCE HOMETOWN_SEQ_GEN START 1 INCREMENT 1;
CREATE TABLE hometowns (
id INTEGER DEFAULT nextval('HOMETOWN_SEQ_GEN'),
name VARCHAR,
PRIMARY KEY (id),
UNIQUE(name)
);

CREATE SEQUENCE USER_SEQ_GEN START 1 INCREMENT 1;
CREATE TABLE users (
id BIGINT DEFAULT nextval('USER_SEQ_GEN'),
hometown_id INTEGER,
name VARCHAR NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (hometown_id) REFERENCES hometowns(id)
);

The hometowns table is populate as users are created. For example,
a client may submit {"name":"Robert", "hometown":"Portland"}.

The hometowns table will never be updated, only either queries or
inserted.

So given this I need to INSERT a row into "users" and either SELECT
the hometowns.id <http://hometowns.id&gt; that matches "Portland" or if
it doesn't exist I INSERT it returning the hometowns.id
<http://hometowns.id&gt;&quot;.

Normally I would do by first doing a SELECT on hometown. If I don't
get anything I do an INSERT into hometown RETURNING the id. If THAT
throws an error then I do the SELECT again. Now I'm finally ready to
INSERT into users using the hometowns.id <http://hometowns.id&gt; from
the above steps.

But wow, that seems like a lot of code for a simple "Add if doesn't
exist" foreign key relationship -- but this is how I've always done.

So my question. Is there a simpler, more performant, or thread-safe
way to do this?

Thanks!

​What occurs to me is to simply do an INSERT into the "hometowns" table
and just ignore the "already exists" return indication. Then do a SELECT
to get the hometowns​ id which now exists, then INSERT the users. but I
could easily be overlooking some reason why this wouldn't work properly.

And one more approach:

CREATE SEQUENCE HOMETOWN_SEQ_GEN START 1 INCREMENT 1;
CREATE TABLE hometowns (
id INTEGER DEFAULT nextval('HOMETOWN_SEQ_GEN'),
name VARCHAR,
PRIMARY KEY (id),
UNIQUE(name)
);

CREATE SEQUENCE USER_SEQ_GEN START 1 INCREMENT 1;
CREATE TABLE users (
id BIGINT DEFAULT nextval('USER_SEQ_GEN'),
hometown_id INTEGER,
name VARCHAR NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (hometown_id) REFERENCES hometowns(id)
);

create or replace view user_town as
select
users.name as username,
hometowns.name as hometownname
from users
join hometowns
on hometowns.id = users.hometown_id;

create rule user_town_exists as on insert to user_town
where exists(select id from hometowns where (hometowns.name =
new.hometownname))
do
insert into users (name, hometown_id)
values (new.username, (select id from hometowns where
(hometowns.name = new.hometownname)));

create rule user_town_not_exists as on insert to user_town
where not exists(select id from hometowns where (hometowns.name =
new.hometownname))
do (
insert into hometowns (name) values (new.hometownname);
insert into users (name, hometown_id)
values (new.username, (select id from hometowns where
(hometowns.name = new.hometownname)));
);

create rule user_town_nothing as on insert to user_town
do instead nothing;

---
This email is free from viruses and malware because avast! Antivirus protection is active.
http://www.avast.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

In reply to: Robert DiFalco (#13)
Re: Simple Atomic Relationship Insert

I don't like loops to catch "failure" condition... can you possibly fail
to stop?

In a stored procedure (or with auto-commit turned off in any
transaction)... You can avoid any race condition by using a semaphore
(e.g. you lock "something" for the duration of the critical part of your
processing so that anything that "would" update that data just waits).
[
http://www.postgresql.org/docs/9.4/static/applevel-consistency.html#NON-SERIALIZABLE-CONSISTENCY
]

When you have a sequence of steps that need to be serialized across
processes, choose or even create a table to use for locking. SELECT FOR
UPDATE "a" row in that table. Every process trying to access that row
now waits until the first transaction to get the exclusive lock
commits/rollbacks (no actual update need be executed). How atomic the
lock is depends on what you use for your lock. (poor levels of atomicity
will lead to performance problems that are hard to diagnose)

For the most recent version of this I've done... we used the unique
business key in an project based audit table. (the table was totally
unrelated to the work being done other than it had the same business key
values and locking wouldn't interfere with other processing.) So if you
had to "create" something to lock...

-- create something to lock... uniqueness is not required but allowed
for if you need it
INSERT INTO LOCKIT (table, biz_key) VALUES
('hometowns',hometown_name);
insert into users(name, hometown_id) VALUES ('Robert', SELECT
select_hometown_id(hometown_name));

CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)
RETURNS INTEGER AS $
DECLARE hometown_id INTEGER;
BEGIN

SELECT FOR UPDATE from LOCKIT where table = 'hometowns' and
biz_key = hometown_name; -- "wait"

WITH sel AS (
SELECT id FROM hometowns WHERE name = 'Portland'
), ins AS (
INSERT INTO hometowns(name)
SELECT 'Portland'
WHERE NOT EXISTS (SELECT 1 FROM sel)
RETURNING id
)

SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;

RETURN hometown_id;
END;
$ LANGUAGE plpgsql;

Only one process will be able to execute the CTE at a time - always -
and no looping required.

Roxanne

On 1/13/2015 6:52 PM, Robert DiFalco wrote:

Good points. I guess my feeling is that if there can be a race
condition on INSERT then the CTE version is not truly atomic, hence
the LOOP.

On Tue, Jan 13, 2015 at 3:11 PM, Brian Dunavant <brian@omniti.com
<mailto:brian@omniti.com>> wrote:

A very good point, but it does not apply as here (and in my article)
we are not using updates, only insert and select.

On Tue, Jan 13, 2015 at 6:03 PM, Thomas Kellerer
<spam_eater@gmx.net <mailto:spam_eater@gmx.net>> wrote:

Brian Dunavant wrote on 13.01.2015 22:33:

What issue are you having? I'd imagine you have a race

condition on

the insert into hometowns, but you'd have that same race

condition in

your app code using a more traditional 3 query version as well.

I often use CTEs like this to make things atomic. It allows me to
remove transactional code out of the app and also to increase
performance by reducing the back-and-forth to the db.
http://omniti.com/seeds/writable-ctes-improve-performance

Craig Ringer explained some of the pitfalls of this approach here:

http://dba.stackexchange.com/questions/78510/why-is-cte-open-to-lost-updates

which is a follow up question based on this:
http://stackoverflow.com/a/8702291/330315

Thomas

--
Sent via pgsql-general mailing list

(pgsql-general@postgresql.org <mailto:pgsql-general@postgresql.org>)

To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org
<mailto:pgsql-general@postgresql.org>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
[At other schools] I think the most common fault in general is to teach students how to pass exams instead of teaching them the science.
Donald Knuth

#16Daniel Verite
daniel@manitou-mail.org
In reply to: Roxanne Reid-Bennett (#15)
Re: Simple Atomic Relationship Insert

Roxanne Reid-Bennett wrote:

When you have a sequence of steps that need to be serialized across
processes, choose or even create a table to use for locking

This can also be done with an advisory lock, presumably faster:
http://www.postgresql.org/docs/current/static/explicit-locking.html

DECLARE
lock_key int := hashtext(hometown_name);
BEGIN
SELECT pg_advisory_xact_lock(lock_key,0);
// check for existence and insert if it doesn't exist
END;

When several sessions try to insert the same town (or strictly speaking, with
the same hash), only one of them will be allowed to proceed, the others being
put to wait until the first one commits or rollbacks, and so on until every
session gets through. The lock is automatically released at the end of the
transaction. It makes no difference if the "check and insert" is crammed into
a single SQL statement or several statements in procedural style.

This technique is intended to work with the default "Read Committed"
isolation level, which allows the concurrent transactions to "see" the new
row inserted by the single other transaction that got the "it does not yet
exist" result in the check for existence, and proceeded to insert and
eventually commit.

Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#17Robert DiFalco
robert.difalco@gmail.com
In reply to: Brian Dunavant (#10)
Re: Simple Atomic Relationship Insert

I must be doing something wrong because both of these approaches are giving
me deadlock exceptions.

On Tue, Jan 13, 2015 at 2:21 PM, Brian Dunavant <brian@omniti.com> wrote:

Show quoted text

The loop to run it twice handles that yes. I don't think that buys
you anything over a more traditional non-cte method though. I'd run
them a few thousand times to see if there's any difference in runtimes
but my guess is the CTE version would be slightly slower here.

v_id integer;
BEGIN;
select id into v_id from hometowns where name = hometown_name;
BEGIN
insert into hometowns (name)
select hometown_name where v_id is null
returning id into v_id;
EXCEPTION WHEN unique_violation
THEN
select id into v_id from hometowns where name = hometown_name;
END;
insert into users (name, hometown_id)
values ('Robert', v_id);
END;

On Tue, Jan 13, 2015 at 5:06 PM, Robert DiFalco
<robert.difalco@gmail.com> wrote:

This seems to get rid of the INSERT race condition.

CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)

RETURNS

INTEGER AS $
DECLARE hometown_id INTEGER;
BEGIN
LOOP
BEGIN
WITH sel AS (
SELECT id FROM hometowns WHERE name = hometown_name
), ins AS (
INSERT INTO hometowns (name)
SELECT hometown_name
WHERE NOT EXISTS(SELECT 1 FROM sel)
RETURNING id
)
SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
RETURN hometown_id;

EXCEPTION WHEN unique_violation
THEN
END;
END LOOP;
END;
$ LANGUAGE plpgsql;

On Tue, Jan 13, 2015 at 1:53 PM, Brian Dunavant <brian@omniti.com>

wrote:

With the single CTE I don't believe you can do a full upsert loop. If
you're doing this inside of a postgres function, your changes are
already atomic, so I don't believe by switching you are buying
yourself much (if anything) by using a CTE query instead of something
more traditional here.

The advantages of switching to a CTE would be if this code was all
being done inside of the app code with multiple queries.

On Tue, Jan 13, 2015 at 4:39 PM, Robert DiFalco
<robert.difalco@gmail.com> wrote:

Well, traditionally I would create a LOOP where I tried the SELECT, if
there
was nothing I did the INSERT, if that raised an exception I would

repeat

the
LOOP.

What's the best way to do it with the CTE? Currently I have the
following
which gives me Duplicate Key Exceptions when two sessions try to

insert

the
same record at the same time.

CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)
RETURNS
INTEGER AS $
DECLARE hometown_id INTEGER;
BEGIN
WITH sel AS (
SELECT id FROM hometowns WHERE name = hometown_name
), ins AS (
INSERT INTO hometowns (name)
SELECT hometown_name
WHERE NOT EXISTS(SELECT 1 FROM sel)
RETURNING id
)
SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM

sel;

RETURN hometown_id;
END;
$ LANGUAGE plpgsql;

And that is no bueno. Should I just put the whole thing in a LOOP?

#18Robert DiFalco
robert.difalco@gmail.com
In reply to: Brian Dunavant (#10)
Re: Simple Atomic Relationship Insert

FWIW I was using the select_hometown_id FUNCTION like this:

INSERT INTO users(...) values(..., select_hometown_id('Portland, OR'));

On Tue, Jan 13, 2015 at 2:21 PM, Brian Dunavant <brian@omniti.com> wrote:

Show quoted text

The loop to run it twice handles that yes. I don't think that buys
you anything over a more traditional non-cte method though. I'd run
them a few thousand times to see if there's any difference in runtimes
but my guess is the CTE version would be slightly slower here.

v_id integer;
BEGIN;
select id into v_id from hometowns where name = hometown_name;
BEGIN
insert into hometowns (name)
select hometown_name where v_id is null
returning id into v_id;
EXCEPTION WHEN unique_violation
THEN
select id into v_id from hometowns where name = hometown_name;
END;
insert into users (name, hometown_id)
values ('Robert', v_id);
END;

On Tue, Jan 13, 2015 at 5:06 PM, Robert DiFalco
<robert.difalco@gmail.com> wrote:

This seems to get rid of the INSERT race condition.

CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)

RETURNS

INTEGER AS $
DECLARE hometown_id INTEGER;
BEGIN
LOOP
BEGIN
WITH sel AS (
SELECT id FROM hometowns WHERE name = hometown_name
), ins AS (
INSERT INTO hometowns (name)
SELECT hometown_name
WHERE NOT EXISTS(SELECT 1 FROM sel)
RETURNING id
)
SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM sel;
RETURN hometown_id;

EXCEPTION WHEN unique_violation
THEN
END;
END LOOP;
END;
$ LANGUAGE plpgsql;

On Tue, Jan 13, 2015 at 1:53 PM, Brian Dunavant <brian@omniti.com>

wrote:

With the single CTE I don't believe you can do a full upsert loop. If
you're doing this inside of a postgres function, your changes are
already atomic, so I don't believe by switching you are buying
yourself much (if anything) by using a CTE query instead of something
more traditional here.

The advantages of switching to a CTE would be if this code was all
being done inside of the app code with multiple queries.

On Tue, Jan 13, 2015 at 4:39 PM, Robert DiFalco
<robert.difalco@gmail.com> wrote:

Well, traditionally I would create a LOOP where I tried the SELECT, if
there
was nothing I did the INSERT, if that raised an exception I would

repeat

the
LOOP.

What's the best way to do it with the CTE? Currently I have the
following
which gives me Duplicate Key Exceptions when two sessions try to

insert

the
same record at the same time.

CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR)
RETURNS
INTEGER AS $
DECLARE hometown_id INTEGER;
BEGIN
WITH sel AS (
SELECT id FROM hometowns WHERE name = hometown_name
), ins AS (
INSERT INTO hometowns (name)
SELECT hometown_name
WHERE NOT EXISTS(SELECT 1 FROM sel)
RETURNING id
)
SELECT id INTO hometown_id FROM ins UNION ALL SELECT id FROM

sel;

RETURN hometown_id;
END;
$ LANGUAGE plpgsql;

And that is no bueno. Should I just put the whole thing in a LOOP?

#19Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Daniel Verite (#16)
Re: Simple Atomic Relationship Insert

On 1/14/15 8:28 AM, Daniel Verite wrote:

Roxanne Reid-Bennett wrote:

When you have a sequence of steps that need to be serialized across
processes, choose or even create a table to use for locking

This can also be done with an advisory lock, presumably faster:
http://www.postgresql.org/docs/current/static/explicit-locking.html

DECLARE
lock_key int := hashtext(hometown_name);
BEGIN
SELECT pg_advisory_xact_lock(lock_key,0);
// check for existence and insert if it doesn't exist
END;

I doubt that's going to be any faster than the preferred approach, which is documented in the ERROR TRAPPING section of the plpgsql docs (Example 40-2): http://www.postgresql.org/docs/devel/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

In reply to: Robert DiFalco (#18)
Re: Simple Atomic Relationship Insert

On 1/15/2015 6:12 PM, Robert DiFalco wrote:

FWIW I was using the select_hometown_id FUNCTION like this:

INSERT INTO users(...) values(..., select_hometown_id('Portland, OR'));

try this: (if you still get deadlocks, uncomment the advisory lock
[thanks Daniel] and try again)
Logically I suppose it might run faster to do the select, then insert
"if". I almost always write these as insert first - because it's the
more restrictive lock.

CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name VARCHAR) RETURNS
INTEGER AS
$BODY$
DECLARE
v_id integer;
BEGIN
-- perform pg_advisory_xact_lock(hashtext(hometown_name));
BEGIN
insert into hometowns (name)
select hometown_name where not exists (select id from hometowns
where name = hometown_name)
returning id into v_id;
IF (v_id IS NULL) THEN
select id into v_id from hometowns where name = hometown_name;
END IF;
return v_id;
EXCEPTION
WHEN OTHERS THEN
-- choose your poison, this really shouldn't get here
END;

return null;
END;
$BODY$
LANGUAGE plpgsql;

On Tue, Jan 13, 2015 at 2:21 PM, Brian Dunavant <brian@omniti.com
<mailto:brian@omniti.com>> wrote:

The loop to run it twice handles that yes. I don't think that buys
you anything over a more traditional non-cte method though. I'd run
them a few thousand times to see if there's any difference in runtimes
but my guess is the CTE version would be slightly slower here.

v_id integer;
BEGIN;
select id into v_id from hometowns where name = hometown_name;
BEGIN
insert into hometowns (name)
select hometown_name where v_id is null
returning id into v_id;
EXCEPTION WHEN unique_violation
THEN
select id into v_id from hometowns where name = hometown_name;
END;
insert into users (name, hometown_id)
values ('Robert', v_id);
END;

On Tue, Jan 13, 2015 at 5:06 PM, Robert DiFalco
<robert.difalco@gmail.com <mailto:robert.difalco@gmail.com>> wrote:

This seems to get rid of the INSERT race condition.

CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name

VARCHAR) RETURNS

INTEGER AS $
DECLARE hometown_id INTEGER;
BEGIN
LOOP
BEGIN
WITH sel AS (
SELECT id FROM hometowns WHERE name = hometown_name
), ins AS (
INSERT INTO hometowns (name)
SELECT hometown_name
WHERE NOT EXISTS(SELECT 1 FROM sel)
RETURNING id
)
SELECT id INTO hometown_id FROM ins UNION ALL SELECT id

FROM sel;

RETURN hometown_id;

EXCEPTION WHEN unique_violation
THEN
END;
END LOOP;
END;
$ LANGUAGE plpgsql;

On Tue, Jan 13, 2015 at 1:53 PM, Brian Dunavant

<brian@omniti.com <mailto:brian@omniti.com>> wrote:

With the single CTE I don't believe you can do a full upsert

loop. If

you're doing this inside of a postgres function, your changes are
already atomic, so I don't believe by switching you are buying
yourself much (if anything) by using a CTE query instead of

something

more traditional here.

The advantages of switching to a CTE would be if this code was all
being done inside of the app code with multiple queries.

On Tue, Jan 13, 2015 at 4:39 PM, Robert DiFalco
<robert.difalco@gmail.com <mailto:robert.difalco@gmail.com>> wrote:

Well, traditionally I would create a LOOP where I tried the

SELECT, if

there
was nothing I did the INSERT, if that raised an exception I

would repeat

the
LOOP.

What's the best way to do it with the CTE? Currently I have the
following
which gives me Duplicate Key Exceptions when two sessions try

to insert

the
same record at the same time.

CREATE OR REPLACE FUNCTION select_hometown_id(hometown_name

VARCHAR)

RETURNS
INTEGER AS $
DECLARE hometown_id INTEGER;
BEGIN
WITH sel AS (
SELECT id FROM hometowns WHERE name = hometown_name
), ins AS (
INSERT INTO hometowns (name)
SELECT hometown_name
WHERE NOT EXISTS(SELECT 1 FROM sel)
RETURNING id
)
SELECT id INTO hometown_id FROM ins UNION ALL SELECT id

FROM sel;

RETURN hometown_id;
END;
$ LANGUAGE plpgsql;

And that is no bueno. Should I just put the whole thing in a

LOOP?

--
[At other schools] I think the most common fault in general is to teach students how to pass exams instead of teaching them the science.
Donald Knuth

#21Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Roxanne Reid-Bennett (#20)
#22Robert DiFalco
robert.difalco@gmail.com
In reply to: Jim Nasby (#21)
#23Daniel Verite
daniel@manitou-mail.org
In reply to: Robert DiFalco (#17)
#24Robert DiFalco
robert.difalco@gmail.com
In reply to: Daniel Verite (#23)
In reply to: Jim Nasby (#21)
#26Robert DiFalco
robert.difalco@gmail.com
In reply to: Roxanne Reid-Bennett (#25)
#27Robert DiFalco
robert.difalco@gmail.com
In reply to: Robert DiFalco (#26)
In reply to: Robert DiFalco (#27)