How to always run UPDATE FROM despite missing records in the source table?

Started by Alexander Farberabout 7 years ago6 messagesgeneral
Jump to latest
#1Alexander Farber
alexander.farber@gmail.com

Good afternoon

I have prepared a simplified test case for my question:
https://www.db-fiddle.com/f/22jfWnsvqD8hVeFPXsyLbV/0

In PostgreSQL 10.6 there are 2 tables:

CREATE TABLE users (
uid SERIAL PRIMARY KEY,
created timestamptz NOT NULL,
visited timestamptz NOT NULL,
ip inet NOT NULL,
lat double precision,
lng double precision
);

CREATE TABLE geoip (
block inet PRIMARY KEY,
lat double precision,
lng double precision
);

CREATE INDEX ON geoip USING SPGIST (block);

which are filled with the following test data:

INSERT INTO users (created, visited, ip) VALUES
(now(), now(), '1.2.3.4'::inet),
(now(), now(), '1.2.3.5'::inet),
(now(), now(), '1.2.3.6'::inet);

INSERT INTO geoip (block, lat, lng) VALUES
('1.2.3.0/24', -33.4940, 143.2104),
('10.0.0.0/8', 34.6617, 133.9350);

Then in a stored function I run the following UPDATE command -

UPDATE users u SET
visited = now(),
ip = '10.10.10.10'::inet,
lat = i.lat,
lng = i.lng
FROM geoip i
WHERE u.uid = 1 AND '10.10.10.10'::inet <<= i.block;

(the 1 and the ip address are actually in_uid and in_ip parameters in my
stored function).

The above query works well and updates all 4 fields in the users table.

However the following query does not work as intended and does not update
any fields, because there is no matching block in the geoip table found:

UPDATE users u SET
visited = now(), -- HOW TO ALWAYS UPDATE THIS FIELD?
ip = '20.20.20.20'::inet, -- HOW TO ALWAYS UPDATE THIS FIELD?
lat = i.lat,
lng = i.lng
FROM geoip i
WHERE u.uid = 2 AND '20.20.20.20'::inet <<= i.block;

The field visited and ip however should be always updated - regardless if
the block was found or not.

Kind of LEFT JOIN, but for UPDATE - how to achieve this please?

The only workaround that I could think of is -

UPDATE users SET
visited = now(),
ip = '20.20.20.20'::inet,
lat = (SELECT lat FROM geoip WHERE '20.20.20.20'::inet <<= block),
lng = (SELECT lng FROM geoip WHERE '20.20.20.20'::inet <<= block)
WHERE uid = 2;

But that would run the same subquery twice (correct?) and my geoip table is
already slow with 3073410 records (and that is why I am trying to cache its
lat and lng values in the users table on each user login event)

Regards
Alex

#2Kevin Brannen
KBrannen@efji.com
In reply to: Alexander Farber (#1)
RE: How to always run UPDATE FROM despite missing records in the source table?

From: Alexander Farber <alexander.farber@gmail.com>

* The only workaround that I could think of is -

UPDATE users SET
visited = now(),
ip = '20.20.20.20'::inet,
lat = (SELECT lat FROM geoip WHERE '20.20.20.20'::inet <<= block),
lng = (SELECT lng FROM geoip WHERE '20.20.20.20'::inet <<= block)
WHERE uid = 2;

But that would run the same subquery twice (correct?) and my geoip table is already slow with 3073410 records (and that is why I am trying to cache its lat and lng values in the users table on each user login event)

Have you considered using a WITH clause to get the data so the query is only run once?
See section 7.8.2 at https://www.postgresql.org/docs/9.6/queries-with.html
Kevin
This e-mail transmission, and any documents, files or previous e-mail messages attached to it, may contain confidential information. If you are not the intended recipient, or a person responsible for delivering it to the intended recipient, you are hereby notified that any disclosure, distribution, review, copy or use of any of the information contained in or attached to this message is STRICTLY PROHIBITED. If you have received this transmission in error, please immediately notify us by reply e-mail, and destroy the original transmission and its attachments without reading them or saving them to disk. Thank you.

#3Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Alexander Farber (#1)
Re: How to always run UPDATE FROM despite missing records in the source table?

On 1/11/19 4:50 AM, Alexander Farber wrote:

Good afternoon

I have prepared a simplified test case for my question:
https://www.db-fiddle.com/f/22jfWnsvqD8hVeFPXsyLbV/0

In PostgreSQL 10.6 there are 2 tables:

CREATE TABLE users (
        uid SERIAL PRIMARY KEY,
        created       timestamptz NOT NULL,
        visited       timestamptz NOT NULL,
        ip            inet        NOT NULL,
        lat           double precision,
        lng           double precision
  );

  CREATE TABLE geoip (
        block   inet    PRIMARY KEY,
        lat     double precision,
        lng     double precision
);

CREATE INDEX ON geoip USING SPGIST (block);

which are filled with the following test data:

INSERT INTO users (created, visited, ip) VALUES
  (now(), now(), '1.2.3.4'::inet),
  (now(), now(), '1.2.3.5'::inet),
  (now(), now(), '1.2.3.6'::inet);

INSERT INTO geoip (block, lat, lng) VALUES
 ('1.2.3.0/24 <http://1.2.3.0/24&gt;&#39;, -33.4940, 143.2104),
 ('10.0.0.0/8 <http://10.0.0.0/8&gt;&#39;, 34.6617, 133.9350);

Then in a stored function I run the following UPDATE command -

UPDATE users u SET
    visited = now(),
    ip      = '10.10.10.10'::inet,
    lat     = i.lat,
    lng     = i.lng
FROM geoip i
WHERE u.uid = 1 AND '10.10.10.10'::inet <<= i.block;

(the 1 and the ip address are actually in_uid and in_ip parameters in my
stored function).

The above query works well and updates all 4 fields in the users table.

However the following query does not work as intended and does not
update any fields, because there is no matching block in the geoip table
found:

UPDATE users u SET
    visited = now(),                  -- HOW TO ALWAYS UPDATE THIS FIELD?
    ip      = '20.20.20.20'::inet, -- HOW TO ALWAYS UPDATE THIS FIELD?
    lat     = i.lat,
    lng     = i.lng
FROM geoip i
WHERE u.uid = 2 AND '20.20.20.20'::inet <<= i.block;

The field visited and ip however should be always updated - regardless
if the block was found or not.

Kind of LEFT JOIN, but for UPDATE - how to achieve this please?

The only workaround that I could think of is -

UPDATE users SET
    visited = now(),
    ip      = '20.20.20.20'::inet,
    lat     = (SELECT lat FROM geoip WHERE '20.20.20.20'::inet <<= block),
    lng     = (SELECT lng FROM geoip WHERE '20.20.20.20'::inet <<= block)
WHERE uid = 2;

But that would run the same subquery twice (correct?) and my geoip table
is already slow with 3073410 records (and that is why I am trying to
cache its lat and lng values in the users table on each user login event)

Why not put a test for the block in the function and then use different
UPDATE's depending on the result?

Pseudo code:

IF ip IN block THEN
UPDATE users u SET
visited = now(),
ip = '10.10.10.10'::inet,
lat = i.lat,
lng = i.lng
FROM geoip i
WHERE u.uid = 1;

ELSE

UPDATE users u SET
visited = now(),
ip = '20.20.20.20'::inet
FROM geoip i
WHERE u.uid = 2 ;

END IF;

Regards
Alex

--
Adrian Klaver
adrian.klaver@aklaver.com

#4Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Alexander Farber (#1)
Re: How to always run UPDATE FROM despite missing records in the source table?

"Alexander" == Alexander Farber <alexander.farber@gmail.com> writes:

Alexander> However the following query does not work as intended and
Alexander> does not update any fields, because there is no matching
Alexander> block in the geoip table found:

Alexander> UPDATE users u SET
Alexander> visited = now(), -- HOW TO ALWAYS UPDATE THIS FIELD?
Alexander> ip = '20.20.20.20'::inet, -- HOW TO ALWAYS UPDATE THIS FIELD?
Alexander> lat = i.lat,
Alexander> lng = i.lng
Alexander> FROM geoip i
Alexander> WHERE u.uid = 2 AND '20.20.20.20'::inet <<= i.block;

Alexander> The field visited and ip however should be always updated -
Alexander> regardless if the block was found or not.

Alexander> Kind of LEFT JOIN, but for UPDATE - how to achieve this please?

It can be done like this (this assumes you want to preserve the previous
values of u.lat/u.lng if the block was not found; if you want to set
them to null instead, then remove the coalesce() calls):

UPDATE users u
SET visited = now(),
ip = v.ip,
lat = coalesce(i.lat, u.lat),
lng = coalesce(i.lng, u.lng)
FROM (VALUES ('20.20.20.20'::inet)) v(ip)
LEFT JOIN geoip i ON (v.ip <<= i.block)
WHERE u.uid = 2;

Alexander> But that would run the same subquery twice (correct?) and my
Alexander> geoip table is already slow with 3073410 records

Slow even with a gist or spgist index? what does the explain analyze
look like?

(You could also try using the ip4r module; I've not done any serious
benchmarking to see if it's faster than the built-in index types, though
it has some theoretical advantages due to not being restricted to CIDR
ranges. In pg versions before the built-in inet type got a gist index
method, ip4r was _the_ way to do ip block lookups for geoip etc.)

--
Andrew (irc:RhodiumToad)

#5Alexander Farber
alexander.farber@gmail.com
In reply to: Adrian Klaver (#3)
Re: How to always run UPDATE FROM despite missing records in the source table?

Thank you Adrian -

On Fri, Jan 11, 2019 at 4:55 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

On 1/11/19 4:50 AM, Alexander Farber wrote:

https://www.db-fiddle.com/f/22jfWnsvqD8hVeFPXsyLbV/0

Why not put a test for the block in the function and then use different
UPDATE's depending on the result?

I didn't want to use IF and switch that statement to PL/pgSQL from pure
SQL, so Andrew's answer

UPDATE users u SET
visited = now(),
ip = v.ip,
lat = i.lat,
lng = i.lng
FROM (VALUES ('20.20.20.20'::inet)) v(ip)
LEFT JOIN geoip i ON (v.ip <<= i.block)
WHERE u.uid = 2;

suits me better, even though I wonder what is the (VALUES
('20.20.20.20'::inet)) v(ip) construct there, some temporary table which is
then LEGT JOINed to the geoip table?

Also, Andrew you have been right - with spgist index my queries against
geoip are fast enough, I was looking at the wrong spot in my EXPLAIN
ANALYZE output (the average values are slow, I am going to cache them soon)

Regards
Alex

#6Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Alexander Farber (#5)
Re: How to always run UPDATE FROM despite missing records in the source table?

"Alexander" == Alexander Farber <alexander.farber@gmail.com> writes:

Alexander> even though I wonder what is the (VALUES
Alexander> ('20.20.20.20'::inet)) v(ip) construct there, some temporary
Alexander> table which is then LEGT JOINed to the geoip table?

The SQL spec calls it a <table value constructor>. The VALUES clause
behaves like a SELECT that returns a fixed number of rows (1 or more)
whose columns contain the results of the specified expressions. The
v(ip) part is just a table and column alias (I omitted the optional AS
keyword out of long habit) to name the constructed table.

--
Andrew (irc:RhodiumToad)