Help with slow table update
Hi.
I have a plpgsql procedure that updates a few similar tables.
For some reason, updates on one of the tables take a lot longer than updates
on the other ones. The difference is, say, 7 seconds vs. 80 milliseconds.
The procedure uses cursors and record variables to do the updates. For
example:
update r_agrio_total set
unserved = unserved + (agrow->>'unserved')::numeric(38),
r_brkconn = mush_brk_conn(r_brkconn, q_item.r_brkconn)
where
tagid = _tagid and
unitid = (akey->>'unitid')::numeric and
placement = (akey->>'placement')::numeric and
device_type = (akey->>'device_type')::numeric;
There is another table (xq_agr) that is read record by record, and for each
of those records, such an update is executed.
I was trying to EXPLAIN ANALYZE the updates to see where the time could be
spent.
There are only 24 rows in the "bad" table, and 3,400 rows in the "good"
table. So, for the "bad" table, most of the updates will be on the same
rows. The times were measured on processing 100 original records.
When I'm analyzing pure update statements, I don't see anything strange.
"bad" table: explain analyze update r_agrio_total set unconfirmed =
unconfirmed +0 where tagid = 1000 and unitid = 1000 and placement = 0 and
device_type = 100;
RESULT:
Update on r_agrio_total (cost=0.42..4.46 rows=1 width=321) (actual
time=0.253..0.253 rows=0 loops=1)
-> Index Scan using tag_r_agrio_total on r_agrio_total
(cost=0.42..4.46 rows=1 width=321) (actual time=0.037..0.041 rows=1
loops=1)
Index Cond: (tagid = 1000::numeric)
Filter: ((unitid = 1000::numeric) AND (placement = 0::numeric) AND
(device_type = 100::numeric))
Rows Removed by Filter: 7
Total runtime: 0.282 ms
"good" table: explain analyze update r_agrio_hourly set unconfirmed =
unconfirmed +0 where tagid = 1000 and unitid = 1000 and placement = 0 and
device_type = 100 and rowdate = '2015-02-23T13';
RESULT:
Update on r_agrio_hourly (cost=0.42..17.36 rows=6 width=329) (actual
time=0.102..0.102 rows=0 loops=1)
-> Index Scan using u_r_agrio_hourly on r_agrio_hourly
(cost=0.42..17.36 rows=6 width=329) (actual time=0.047..0.048 rows=1
loops=1)
Index Cond: ((tagid = 1000::numeric) AND (unitid = 1000::numeric)
AND ((rowdate)::text = '2015-02-23T13'::text) AND (device_type =
100::numeric) AND (placement = 0::numeric))
Total runtime: 0.135 ms
When I try doing it with a WITH statement (really, to apply the actual data
that the plpgsql function uses), there is something strange with the "bad"
table.
explain analyze
with SRC as (select * from xq_agr where id = 914830)
update r_agrio_total set
unconfirmed = unconfirmed +
(SRC.r_agrio->>'unconfirmed')::numeric(38)
from SRC
where
tagid = (SRC.r_agrio->'key'->>'tagid')::numeric and
unitid = (SRC.r_agrio->'key'->>'unit')::numeric and
placement = (SRC.r_agrio->'key'->>'placement')::numeric and
device_type = (SRC.r_agrio->'key'->>'device_type')::numeric;
RESULT:
Update on r_agrio_total (cost=8.91..32777.51 rows=19331 width=409)
(actual time=0.107..0.107 rows=0 loops=1)
CTE src
-> Index Scan using xq_agr_pkey on xq_agr (cost=0.42..8.44 rows=1
width=379) (actual time=0.026..0.027 rows=1 loops=1)
Index Cond: (id = 914830)
-> Nested Loop (cost=0.46..32769.07 rows=19331 width=409) (actual
time=0.107..0.107 rows=0 loops=1)
-> CTE Scan on src (cost=0.00..0.02 rows=1 width=88) (actual
time=0.032..0.033 rows=1 loops=1)
-> Index Scan using u_r_agrio_total on r_agrio_total
(*cost=0.46..32285.78
rows=19331* width=321) (actual time=0.001..0.001 rows=0 loops=1)
Index Cond: ((tagid = (((src.r_agrio -> 'key'::text) ->>
'tagid'::text))::numeric) AND (unitid = (((src.r_agrio -> 'key'::text) ->>
'unit'::text))::numeric) AND (device_type = (((src.r_agrio -> 'key'::text)
->> 'device_type'::text))::numeric) AND (placement = (((src.r_agrio ->
'key'::text) ->> 'placement'::text))::numeric))
Total runtime: 0.155 ms
explain analyze
with SRC as (select * from xq_agr where id = 914830)
update r_agrio_hourly set
unconfirmed = unconfirmed +
(SRC.r_agrio->>'unconfirmed')::numeric(38)
from SRC
where
tagid = (SRC.r_agrio->'key'->>'tagid')::numeric and
unitid = (SRC.r_agrio->'key'->>'unit')::numeric and
placement = (SRC.r_agrio->'key'->>'placement')::numeric and
device_type = (SRC.r_agrio->'key'->>'device_type')::numeric and
rowdate = (SRC.r_agrio->'key'->>'rowdate');
RESULT:
Update on r_agrio_hourly (cost=8.91..52.91 rows=20 width=417) (actual
time=0.123..0.123 rows=0 loops=1)
CTE src
-> Index Scan using xq_agr_pkey on xq_agr (cost=0.42..8.44 rows=1
width=379) (actual time=0.023..0.024 rows=1 loops=1)
Index Cond: (id = 914830)
-> Nested Loop (cost=0.47..44.47 rows=20 width=417) (actual
time=0.121..0.121 rows=0 loops=1)
-> CTE Scan on src (cost=0.00..0.02 rows=1 width=88) (actual
time=0.030..0.031 rows=1 loops=1)
-> Index Scan using u_r_agrio_hourly on r_agrio_hourly
(*cost=0.47..43.95
rows=20* width=329) (actual time=0.000..0.000 rows=0 loops=1)
Index Cond: ((tagid = (((src.r_agrio -> 'key'::text) ->>
'tagid'::text))::numeric) AND (unitid = (((src.r_agrio -> 'key'::text) ->>
'unit'::text))::numeric) AND ((rowdate)::text = ((src.r_agrio ->
'key'::text) ->> 'rowdate'::text)) AND (device_type = (((src.r_agrio ->
'key'::text) ->> 'device_type'::text))::numeric) AND (placement =
(((src.r_agrio -> 'key'::text) ->> 'placement'::text))::numeric))
Total runtime: 0.176 ms
I've tried doing VACUUM FULL ANALYZE and rebuilding indexes. Vacuum did help
somewhat, but the time differential is still huge (that 7 sec vs. 80 ms).
The table structure (some fields thrown out just to save output length):
=> \d r_agrio_total
Column | Type | Modifiers
-------------------+---------------+------------------------------------------------------------
id | bigint | not null default
nextval('r_agrio_total_id_seq'::regclass)
tagid | numeric(38,0) | not null
unitid | numeric(38,0) | not null
device_type | numeric(38,0) | not null
placement | numeric(38,0) | not null default 0
unserved | numeric(38,0) | not null default 0
r_brkconn | json |
Indexes:
"r_agrio_total_pkey" PRIMARY KEY, btree (id)
"u_r_agrio_total" UNIQUE, btree (tagid, unitid, device_type, placement)
"unit_r_agrio_total" btree (unitid)
"tag_r_agrio_total" btree (tagid)
=> \d r_agrio_hourly
Column | Type |
Modifiers
-------------------+-----------------------+-------------------------------------------------------------
id | bigint | not null default
nextval('r_agrio_hourly_id_seq'::regclass)
tagid | numeric(38,0) | not null
unitid | numeric(38,0) | not null
rowdate | character varying(15) | not null
device_type | numeric(38,0) | not null
placement | numeric(38,0) | not null default 0
unserved | numeric(38,0) | not null default 0
unconfirmed | numeric(38,0) | not null default 0
r_brkconn | json |
Indexes:
"r_agrio_hourly_pkey" PRIMARY KEY, btree (id)
"u_r_agrio_hourly" UNIQUE, btree (tagid, unitid, rowdate, device_type,
placement)
"unit_r_agrio_hourly" btree (unitid)
"rowdate_r_agrio_hourly" btree (rowdate)
"tag_r_agrio_hourly" btree (tagid)
Would appreciate any help on this, thank you!
On 4/9/15 6:18 PM, Pawel Veselov wrote:
What is the ->> operator? I'm not familiar with it.
numeric is FAR slower than int/bigint. Unless you *really* need 38
digits, use one of the other types.
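If the extra digits turn out to be unnecessary, the switch itself is mechanical, though it rewrites the table and its indexes (the column list here is just illustrative):

```sql
-- Rewrites the whole table; do this in a maintenance window.
ALTER TABLE r_agrio_total
    ALTER COLUMN tagid TYPE bigint,
    ALTER COLUMN unitid TYPE bigint,
    ALTER COLUMN device_type TYPE bigint,
    ALTER COLUMN placement TYPE bigint;
```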
Cursors tend to make things slow. Avoid them if you can.
As for your specific question, I suggest you modify the plpgsql function
so that it's doing an EXPLAIN ANALYZE on the slow table. EXPLAIN ANALYZE
actually returns a recordset the same way a SELECT would, with a single
column of type text. So you just need to do something with that output.
The easiest thing would be to replace this in your function:
UPDATE slow_table SET ...
to this (untested)
RETURN QUERY EXPLAIN ANALYZE UPDATE slow_table SET ...
and change the function so it returns SETOF text instead of whatever it
returns now.
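In case RETURN QUERY does not accept the utility statement directly on your version, a dynamic-SQL variant of the same idea might look like this (untested sketch; the function name and WHERE clause are illustrative):

```sql
-- Hypothetical debugging wrapper: run the slow UPDATE under
-- EXPLAIN ANALYZE and hand the plan text back to the caller.
CREATE OR REPLACE FUNCTION debug_slow_update(_tagid numeric)
RETURNS SETOF text
LANGUAGE plpgsql AS $$
DECLARE
    ln text;
BEGIN
    FOR ln IN EXECUTE
        'EXPLAIN ANALYZE
         UPDATE r_agrio_total
            SET unconfirmed = unconfirmed + 0
          WHERE tagid = $1'
        USING _tagid
    LOOP
        RETURN NEXT ln;
    END LOOP;
END;
$$;

-- SELECT * FROM debug_slow_update(1000);
```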
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Sun, Apr 12, 2015 at 5:40 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
What is the ->> operator? I'm not familiar with it.
It extracts JSON value as a string.
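For example, with a literal mirroring the akey->>'unitid' pattern used in the function:

```sql
-- '->' keeps the result as json; '->>' extracts it as text,
-- which is why the casts to numeric are needed in the updates.
SELECT ('{"key": {"unitid": "42"}}'::json -> 'key' ->> 'unitid')::numeric;
-- returns 42
```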
numeric is FAR slower than int/bigint. Unless you *really* need 38 digits,
use one of the other types.
Yeah, that's on purpose.
Cursors tend to make things slow. Avoid them if you can.
Is there an alternative to iterating over a number of rows, where a direct
update query is not an option?
I really doubt that the actual processing logic, including the use of
types, has anything to do with my problem. This is based on the fact that,
out of the tables that are being changed, only one is exhibiting the
problem. All of the involved tables have nearly the same structure, and
have the same logical operations performed on them. I thought maybe the
"bad" table is slow because it was first in the list, and Postgres was
caching the function results, but I moved things around, and the pattern is
the same.
As for your specific question, I suggest you modify the plpgsql function
so that it's doing an EXPLAIN ANALYZE on the slow table. EXPLAIN ANALYZE
actually returns a recordset the same way a SELECT would, with a single
column of type text. So you just need to do something with that output. The
easiest thing would be to replace this in your function:UPDATE slow_table SET ...
to this (untested)
RETURN QUERY EXPLAIN ANALYZE UPDATE slow_table SET ...
and change the function so it returns SETOF text instead of whatever it
returns now.
Thank you, that made it a lot easier to see what's really going on.
But the outcome is somewhat the same. The "bad" table analysis shows a very
high cost and thousands of rows, where the table contains only 24 rows.
This time, however, the actual run time is shown, and one can see where the
time is spent (previously I was just summing clock_timestamp() deltas around
the update statements to see where the problem was).
r_agrio_hourly - "good", r_agrio_total - "bad".
Update on r_agrio_hourly (cost=0.42..970.32 rows=250 width=329) (actual
time=2.248..2.248 rows=0 loops=1)
-> Index Scan using u_r_agrio_hourly on r_agrio_hourly
(cost=0.42..970.32 rows=250 width=329) (actual time=0.968..1.207 rows=1
loops=1)
Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric)
AND ((rowdate)::text = '2015-04-09T23'::text) AND (device_type =
3::numeric) AND (placement = 2::numeric))
Total runtime: 2.281 ms
Update on r_agrio_total (cost=0.42..45052.56 rows=12068 width=321)
(actual time=106.766..106.766 rows=0 loops=1)
-> Index Scan using u_r_agrio_total on r_agrio_total
(cost=0.42..45052.56 rows=12068 width=321) (actual time=0.936..32.626
rows=1 loops=1)
Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric)
AND (device_type = 3::numeric) AND (placement = 2::numeric))
Total runtime: 106.793 ms
Update on r_agrio_hourly (cost=0.42..978.49 rows=252 width=329) (actual
time=1.796..1.796 rows=0 loops=1)
-> Index Scan using u_r_agrio_hourly on r_agrio_hourly
(cost=0.42..978.49 rows=252 width=329) (actual time=1.054..1.090 rows=1
loops=1)
Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric)
AND ((rowdate)::text = '2015-04-09T23'::text) AND (device_type =
1::numeric) AND (placement = 2::numeric))
Total runtime: 1.824 ms
Update on r_agrio_total (cost=0.42..11553.27 rows=3017 width=321) (actual
time=102.685..102.685 rows=0 loops=1)
-> Index Scan using u_r_agrio_total on r_agrio_total
(cost=0.42..11553.27 rows=3017 width=321) (actual time=0.675..33.375
rows=1 loops=1)
Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric)
AND (device_type = 1::numeric) AND (placement = 2::numeric))
Total runtime: 102.711 ms
Update on r_agrio_hourly (cost=0.42..1005.13 rows=259 width=329) (actual
time=1.962..1.962 rows=0 loops=1)
-> Index Scan using u_r_agrio_hourly on r_agrio_hourly
(cost=0.42..1005.13 rows=259 width=329) (actual time=0.978..1.124 rows=1
loops=1)
Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric)
AND ((rowdate)::text = '2015-04-09T23'::text) AND (device_type =
2::numeric) AND (placement = 2::numeric))
Total runtime: 1.992 ms
Update on r_agrio_total (cost=0.42..66541.26 rows=18103 width=321)
(actual time=108.574..108.574 rows=0 loops=1)
-> Index Scan using u_r_agrio_total on r_agrio_total
(cost=0.42..66541.26 rows=18103 width=321) (actual time=0.761..33.245
rows=1 loops=1)
Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric)
AND (device_type = 2::numeric) AND (placement = 2::numeric))
Total runtime: 108.601 ms
(and there are ~800 more, with the same pattern).
Thank you!
On Mon, Apr 13, 2015 at 5:01 PM, Pawel Veselov <pawel.veselov@gmail.com>
wrote:
What is it you expect to see here?
What are the results (count and times) for:
SELECT count(*) FROM r_agrio_total WHERE tagid = 1002 and unitid = 1002;
SELECT count(*) FROM r_agrio_hourly WHERE tagid = 1002 and unitid = 1002;
More queries along this line might be needed. The underlying question is
how many index rows need to be skipped over on "total" to get the final
result - or rather, are the columns in the index in descending order of
cardinality?
Any chance you can perform a "REINDEX" - maybe there is some bloat
present? There are queries to help discern if that may be the case; I do
not know them off the top of my head, but just doing it might be acceptable
and is definitely quicker if so.
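For instance, if the pgstattuple extension is available, dead-space and leaf-density numbers are easy to read off (sketch; needs appropriate privileges):

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Heap bloat: a high dead_tuple_percent or free_percent is suspicious.
SELECT dead_tuple_percent, free_percent
  FROM pgstattuple('r_agrio_total');

-- Index bloat: low avg_leaf_density / high fragmentation suggests REINDEX.
SELECT avg_leaf_density, leaf_fragmentation
  FROM pgstatindex('u_r_agrio_total');

REINDEX TABLE r_agrio_total;
```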
I'm still not really following your presentation but maybe my thoughts
will spark something.
David J.
On 4/13/15 7:01 PM, Pawel Veselov wrote:
Cursors tend to make things slow. Avoid them if you can.
Is there an alternative to iterating over a number of rows, where a
direct update query is not an option?
I'm guessing that you're essentially processing a queue. Take a look at
/messages/by-id/552C750F.2010401@BlueTreble.com for
some ideas. Basically, not only do cursors have non-trivial overhead, but
doing a ton of single-row queries is going to have non-trivial
overhead itself.
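For the tables shown earlier, one way to collapse the loop might be to aggregate the queue first and then apply a single set-based UPDATE per target table (sketch only; it ignores the json-merge column, which would still need a custom aggregate or a second pass):

```sql
-- One UPDATE instead of one per xq_agr row: pre-aggregate by key.
WITH agg AS (
    SELECT (r_agrio->'key'->>'tagid')::numeric       AS tagid,
           (r_agrio->'key'->>'unit')::numeric        AS unitid,
           (r_agrio->'key'->>'placement')::numeric   AS placement,
           (r_agrio->'key'->>'device_type')::numeric AS device_type,
           sum((r_agrio->>'unconfirmed')::numeric)   AS unconfirmed
      FROM xq_agr
     GROUP BY 1, 2, 3, 4
)
UPDATE r_agrio_total t
   SET unconfirmed = t.unconfirmed + agg.unconfirmed
  FROM agg
 WHERE t.tagid = agg.tagid
   AND t.unitid = agg.unitid
   AND t.placement = agg.placement
   AND t.device_type = agg.device_type;
```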
r_agrio_hourly - "good", r_agrio_total - "bad".
Update on r_agrio_hourly (cost=0.42..970.32 rows=250 width=329)
(actual time=2.248..2.248 rows=0 loops=1)
-> Index Scan using u_r_agrio_hourly on r_agrio_hourly
(cost=0.42..970.32 rows=250 width=329) (actual time=0.968..1.207
rows=1 loops=1)
Index Cond: ((tagid = 1002::numeric) AND (unitid =
1002::numeric) AND ((rowdate)::text = '2015-04-09T23'::text) AND
(device_type = 3::numeric) AND (placement = 2::numeric))
Total runtime: 2.281 ms
Update on r_agrio_total (cost=0.42..45052.56 rows=12068 width=321)
(actual time=106.766..106.766 rows=0 loops=1)
-> Index Scan using u_r_agrio_total on r_agrio_total
(cost=0.42..45052.56 rows=12068 width=321) (actual time=0.936..32.626
rows=1 loops=1)
Index Cond: ((tagid = 1002::numeric) AND (unitid =
1002::numeric) AND (device_type = 3::numeric) AND (placement = 2::numeric))
Total runtime: 106.793 ms
Keep in mind that the estimated cost is not terribly useful; it's the
actual times that matter.
I suspect what's happening here is a combination of things. First, the
hourly table is basically living in cache, but the total table is not.
That means that when you go to find a row in the total table you're
actually hitting the disk instead of pulling the data from memory.
Second, you may have a lot of dead rows in the total table. I suspect
this because of the very large amount of time the index scan is taking.
Unless you're running on an old 10MB MFM drive you'd be pretty hard
pressed for even 2 IO operations (one for the index leaf page and one
for the heap page) to take 32ms. I suspect the index scan is having to
read many dead rows in before it finds a live one, and incurring
multiple IOs. Switching to EXPLAIN (analyze, buffers) would help confirm
that.
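The buffers form separates cache hits from real reads, e.g.:

```sql
EXPLAIN (ANALYZE, BUFFERS)
UPDATE r_agrio_total
   SET unconfirmed = unconfirmed + 0
 WHERE tagid = 1002 AND unitid = 1002
   AND device_type = 3 AND placement = 2;
-- In the output, "Buffers: shared hit=... read=..." under the index
-- scan shows how many pages came from cache vs. disk; a large count
-- on a 24-row table points at dead index/heap entries being scanned.
```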
Third, I think something odd is happening with the update itself. I'm
pretty sure that the index scan itself is visiting the heap pages, so
each page should be in shared buffers by the time each tuple hits the
update node. That makes me wonder what on earth is taking 60ms to update
the tuple. I suspect it's going into either finding a free buffer to put
the new tuple on, or waiting to try and extend the relation. Selecting
ctid from the freshly updated rows and comparing the first number to the
total number of pages in the heap would show if the new tuples are all
ending up at the end of the heap.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Mon, Apr 13, 2015 at 6:03 PM, David G. Johnston <
david.g.johnston@gmail.com> wrote:
On Mon, Apr 13, 2015 at 5:01 PM, Pawel Veselov <pawel.veselov@gmail.com>
wrote:
r_agrio_hourly - "good", r_agrio_total - "bad".
Update on r_agrio_hourly (cost=0.42..970.32 rows=250 width=329) (actual
time=2.248..2.248 rows=0 loops=1)
-> Index Scan using u_r_agrio_hourly on r_agrio_hourly
(cost=0.42..970.32 rows=250 width=329) (actual time=0.968..1.207 rows=1
loops=1)
Index Cond: ((tagid = 1002::numeric) AND (unitid =
1002::numeric) AND ((rowdate)::text = '2015-04-09T23'::text) AND
(device_type = 3::numeric) AND (placement = 2::numeric))
Total runtime: 2.281 ms
Update on r_agrio_total (cost=0.42..45052.56 rows=12068 width=321)
(actual time=106.766..106.766 rows=0 loops=1)
-> Index Scan using u_r_agrio_total on r_agrio_total
(cost=0.42..45052.56 rows=12068 width=321) (actual time=0.936..32.626
rows=1 loops=1)
Index Cond: ((tagid = 1002::numeric) AND (unitid =
1002::numeric) AND (device_type = 3::numeric) AND (placement = 2::numeric))
Total runtime: 106.793 ms
What is it you expect to see here?
What are the results (count and times) for:
SELECT count(*) FROM r_agrio_total WHERE tagid = 1002 and unitid = 1002;
Result: 8 (the whole table is 24 rows). It returns somewhat with a stumble,
but relatively quickly.
db=> explain analyze SELECT count(*) FROM r_agrio_total WHERE tagid = 1002
and unitid = 1002;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=4.45..4.46 rows=1 width=0) (actual time=327.194..327.195
rows=1 loops=1)
-> Index Scan using tag_r_agrio_total on r_agrio_total
(cost=0.42..4.45 rows=1 width=0) (actual time=0.039..327.189 rows=8
loops=1)
Index Cond: (tagid = 1002::numeric)
Filter: (unitid = 1002::numeric)
Total runtime: 327.228 ms
SELECT count(*) FROM r_agrio_hourly WHERE tagid = 1002 and unitid = 1002;
Result is 2869. Returns somewhat quickly. Explain analyze is crazy though:
db=> explain analyze SELECT count(*) FROM r_agrio_hourly WHERE tagid = 1002
and unitid = 1002;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=68134.68..68134.69 rows=1 width=0) (actual
time=15177.211..15177.211 rows=1 loops=1)
-> Index Scan using adunit_r_agrio_hourly on r_agrio_hourly
(cost=0.42..67027.10 rows=443035 width=0) (actual time=0.096..15175.730
rows=2869 loops=1)
Index Cond: (unitid = 1002::numeric)
Filter: (tagid = 1002::numeric)
Total runtime: 15177.240 ms
More queries along this line might be needed. The underlying question is
how many index rows need to be skipped over on "total" to get the final
result - or rather are the columns in the index in descending order of
cardinality?
Idea is - both tables have unique multi-field indices, and each update hits
exactly one row from that index, no more, and all fields from the index are
locked with equality condition on the update. All of the updates (within a
transaction) would always work on a small subset of rows (max a few
hundred, ever; in this case, it may be around 10). I expect it to be
possible for the server to keep the active working set in the cache at all
times. Since the index is unique, there shouldn't be a reason to re-scan
the table, if a cached row is found, no?
Any chance you can perform a "REINDEX" - maybe there is some bloat
present? There are queries to help discern if that may be the case; I do
not know them off the top of my head, but just doing it might be acceptable
and is definitely quicker if so.
That's the thing - I've done both vacuum full, and re-index. The very first
time I did vacuum full things improved (60 seconds to 7 seconds). Re-index
didn't improve anything (but it was done after vacuum full).
I'm still not really following your presentation but maybe my thoughts
will spark something.
Thank you! I hope I clarified this some :)
On Mon, Apr 13, 2015 at 7:37 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 4/13/15 7:01 PM, Pawel Veselov wrote:
Cursors tend to make things slow. Avoid them if you can.
Is there an alternative to iterating over a number of rows, where a
direct update query is not an option?
I really doubt that the actual processing logic, including the use of
types, has anything to do with my problem. This is based on the fact that
out of the tables that are being changed, only one is exhibiting the
problem. All of the involved tables have nearly the same structure, and
have the same logical operations performed on them. I thought may be the
"bad" table is slow because it was first in the list, and Postgres was
caching the function results, but I moved things around, and the pattern is
the same.
I'm guessing that you're essentially processing a queue. Take a look at
/messages/by-id/552C750F.2010401@BlueTreble.com for
some ideas. Basically, not only do cursors have non-trivial overhead, doing
a ton of single-row queries is going to have a non-trivial overhead itself.
Thank you for the pointers. PgQ sounds interesting, it has to be remote for
RDS (I use RDS), but I'll try implementing a solution based on it.
However, for all the times that is being spent during this update, the
breakdown is:
update "total" table: 10.773033
update "hourly" table: 00.179711
update "daily" table: 01.082467
update some other table (actually, it has cardinality similar to "total"
table): 00.168287
clean the queue table: 00.021733
overhead: 00.014922
The "overhead" is time taken to run the whole procedure, minus all these
other times that have been counted.
(some notes about the "daily" table below)
As for your specific question, I suggest you modify the plpgsql
function so that it's doing an EXPLAIN ANALYZE on the slow table.
EXPLAIN ANALYZE actually returns a recordset the same way a SELECT
would, with a single column of type text. So you just need to do
something with that output. The easiest thing would be to replace
this in your function:
UPDATE slow_table SET ...
to this (untested)
RETURN QUERY EXPLAIN ANALYZE UPDATE slow_table SET ...
and change the function so it returns SETOF text instead of whatever
it returns now.
Thank you, that made it a lot easier to see into what's really going on.
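A hedged sketch of that change. In plpgsql, RETURN QUERY expects a query, so the EXPLAIN typically has to be issued through EXECUTE; slow_table, hits, and _tagid below are placeholders, not names from the real schema:

```sql
-- Untested sketch: run the problem UPDATE under EXPLAIN ANALYZE and
-- stream the plan text back to the caller.
CREATE OR REPLACE FUNCTION debug_slow_update(_tagid numeric)
RETURNS SETOF text
LANGUAGE plpgsql AS $$
BEGIN
  RETURN QUERY EXECUTE
    'EXPLAIN ANALYZE UPDATE slow_table SET hits = hits + 1 WHERE tagid = $1'
    USING _tagid;
END;
$$;

-- SELECT * FROM debug_slow_update(1002);
```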
But the outcome is somewhat the same. The "bad" table analysis shows a
very high cost, and thousands of rows, where the table contains only 24
rows. This time, however, the actual run time is shown, and one can see
where the time is spent (I was using just a sum of clock_time()s around
the update statements to see where the problem is).
r_agrio_hourly - "good", r_agrio_total - "bad".
Update on r_agrio_hourly (cost=0.42..970.32 rows=250 width=329)
(actual time=2.248..2.248 rows=0 loops=1)
-> Index Scan using u_r_agrio_hourly on r_agrio_hourly
(cost=0.42..970.32 rows=250 width=329) (actual time=0.968..1.207
rows=1 loops=1)
Index Cond: ((tagid = 1002::numeric) AND (unitid =
1002::numeric) AND ((rowdate)::text = '2015-04-09T23'::text) AND
(device_type = 3::numeric) AND (placement = 2::numeric))
Total runtime: 2.281 ms
Update on r_agrio_total (cost=0.42..45052.56 rows=12068 width=321)
(actual time=106.766..106.766 rows=0 loops=1)
-> Index Scan using u_r_agrio_total on r_agrio_total
(cost=0.42..45052.56 rows=12068 width=321) (actual time=0.936..32.626
rows=1 loops=1)
Index Cond: ((tagid = 1002::numeric) AND (unitid =
1002::numeric) AND (device_type = 3::numeric) AND (placement =
2::numeric))
Total runtime: 106.793 ms
Keep in mind that the estimated cost is not terribly useful; it's the
actual times that matter.
I suspect what's happening here is a combination of things. First, the
hourly table is basically living in cache, but the total table is not. That
means that when you go to find a row in the total table you're actually
hitting the disk instead of pulling the data from memory.
Second, you may have a lot of dead rows in the total table. I suspect this
because of the very large amount of time the index scan is taking. Unless
you're running on an old 10MB MFM drive you'd be pretty hard pressed for
even 2 IO operations (one for the index leaf page and one for the heap
page) to take 32ms. I suspect the index scan is having to read many dead
rows in before it finds a live one, and incurring multiple IOs. Switching to
EXPLAIN (analyze, buffers) would help confirm that.
That looks most likely to me as well. Most of the updates in a single
"batch", for the total table would be on the same record, while for hourly
table it's a lot less. Logically, the tables contain identical data, except
that hourly table breaks it down per hour, and total table contains the
data for all times. The daily table contains the same data per day.
So, if I compared the tables, the "total" table has the smallest amount of
rows, followed by daily, and then by hourly. However, total table gets the
most amount of updates on the same row, followed by daily followed by
hourly. It seems to me that the more of these in-place updates are done on
the same rows, the worse this particular situation gets (as you can see from
the above, the daily table already takes over a second to update).
Here is the explain analyze output with buffers:
Update on r_agrio_hourly (cost=0.42..986.10 rows=254 width=329) (actual
time=5.945..5.945 rows=0 loops=1)
Buffers: shared hit=827 read=431 dirtied=17
-> Index Scan using u_r_agrio_hourly on r_agrio_hourly
(cost=0.42..986.10 rows=254 width=329) (actual time=2.220..2.645 rows=1
loops=1)
Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric)
AND ((rowdate)::text = '2015-04-09T23'::text) AND (device_type =
3::numeric) AND (placement = 2::numeric))
Buffers: shared hit=402 read=14 dirtied=11
Total runtime: 6.005 ms
Update on r_agrio_total (cost=0.42..6.49 rows=1 width=321) (actual
time=320.494..320.494 rows=0 loops=1)
Buffers: shared hit=170298 read=79791 dirtied=6 written=41
-> Index Scan using tag_r_agrio_total on r_agrio_total
(cost=0.42..6.49 rows=1 width=321) (actual time=0.977..233.330 rows=1
loops=1)
Index Cond: (tagid = 1002::numeric)
Filter: ((unitid = 1002::numeric) AND (placement = 2::numeric) AND
(device_type = 3::numeric))
Rows Removed by Filter: 7
Buffers: shared hit=26817 read=59973 written=41
Total runtime: 320.530 ms
Update on r_agrio_hourly (cost=0.42..989.90 rows=255 width=329) (actual
time=2.605..2.605 rows=0 loops=1)
Buffers: shared hit=817 read=429 dirtied=16
-> Index Scan using u_r_agrio_hourly on r_agrio_hourly
(cost=0.42..989.90 rows=255 width=329) (actual time=1.606..1.644 rows=1
loops=1)
Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric)
AND ((rowdate)::text = '2015-04-09T23'::text) AND (device_type =
1::numeric) AND (placement = 2::numeric))
Buffers: shared hit=133 read=283 dirtied=12
Total runtime: 2.634 ms
Update on r_agrio_total (cost=0.42..6.49 rows=1 width=321) (actual
time=283.720..283.720 rows=0 loops=1)
Buffers: shared hit=196398 read=53955 dirtied=5 written=18
-> Index Scan using tag_r_agrio_total on r_agrio_total
(cost=0.42..6.49 rows=1 width=321) (actual time=0.868..210.946 rows=1
loops=1)
Index Cond: (tagid = 1002::numeric)
Filter: ((unitid = 1002::numeric) AND (placement = 2::numeric) AND
(device_type = 1::numeric))
Rows Removed by Filter: 7
Buffers: shared hit=39758 read=47033 written=18
Total runtime: 283.757 ms
Update on r_agrio_hourly (cost=0.42..1020.91 rows=263 width=329) (actual
time=1.850..1.850 rows=0 loops=1)
Buffers: shared hit=1211 read=35 dirtied=15
-> Index Scan using u_r_agrio_hourly on r_agrio_hourly
(cost=0.42..1020.91 rows=263 width=329) (actual time=0.976..1.087 rows=1
loops=1)
Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric)
AND ((rowdate)::text = '2015-04-09T23'::text) AND (device_type =
2::numeric) AND (placement = 2::numeric))
Buffers: shared hit=399 read=17 dirtied=11
Total runtime: 1.878 ms
Update on r_agrio_total (cost=0.42..6.49 rows=1 width=321) (actual
time=287.481..287.481 rows=0 loops=1)
Buffers: shared hit=181548 read=68207 dirtied=4 written=24
-> Index Scan using tag_r_agrio_total on r_agrio_total
(cost=0.42..6.49 rows=1 width=321) (actual time=0.785..206.947 rows=1
loops=1)
Index Cond: (tagid = 1002::numeric)
Filter: ((unitid = 1002::numeric) AND (placement = 2::numeric) AND
(device_type = 2::numeric))
Rows Removed by Filter: 7
Buffers: shared hit=31168 read=55623 written=24
Total runtime: 287.506 ms
Third, I think something odd is happening with the update itself. I'm
pretty sure that the index scan itself is visiting the heap pages, so each
page should be in shared buffers by the time each tuple hits the update
node. That makes me wonder what on earth is taking 60ms to update the
tuple. I suspect it's going into either finding a free buffer to put the
new tuple on, or waiting to try and extend the relation. Selecting ctid
from the freshly updated rows and comparing the first number to the total
number of pages in the heap would show if the new tuples are all ending up
at the end of the heap.
before:
ctid
-----------
(25536,8)
(77292,7)
(77292,6)
(77292,5)
(77292,4)
(77292,3)
(77292,2)
(77292,1)
after:
ctid
-----------
(77314,2)
(77314,1)
(77313,7)
(77313,6)
(77313,5)
(77313,4)
(77313,8)
(77312,9)
But, I'm sorry, I don't know how to determine "total number of pages in the
heap"...
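For what it's worth, the page count can be read from the catalog; something along these lines (relname as in this thread) shows both the last-vacuumed estimate and the exact current figure:

```sql
SELECT relpages,                                           -- planner's estimate
       pg_relation_size(oid)
         / current_setting('block_size')::int AS pages_now -- exact, right now
FROM   pg_class
WHERE  relname = 'r_agrio_total';
```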
I wonder if what I need to do, considering that I update a lot of "the
same" rows as I process this queue, is to create a temp table, update the
rows there, and then update the actual tables once at the end...
On 4/14/15 1:28 PM, Pawel Veselov wrote:
I wonder if what I need to do, considering that I update a lot of "the
same" rows as I process this queue, is to create a temp table, update
the rows there, and then update the actual tables once at the end...
That's what I'd do.
The other option would be to use a constraint trigger paired with a
per-row trigger on the hourly table to drive the daily table, and on the
daily table to drive the total table. The way that would work is the
per-row table would simply keep track of all the unique records that
were changed in a statement (presumably by putting them in a temp
table). Once the statement is "done", the constraint trigger would fire;
it would summarize all the changed data and do a much smaller number of
updates to the table being summarized into.
BTW, you also made a comment about not having to hit the table if you
look at something in an index. You can only do that if all the data you
need is in the index, AND the page with the record is marked as being
all-visible (google for Postgres Visibility Map). If that's not the case
then you still have to pull the row in the table in, in order to
determine visibility. The only case where you can still avoid hitting
the table is something like a NOT EXISTS; if you can't find any entries
in the index for something then they definitely won't be in the table.
But remember that if you update or delete a row, removing it from an
index, the data will stay in that index until vacuum comes along.
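As an aside, whether the all-visible condition is actually being met can be checked from the catalog; a rough probe against the table in question:

```sql
-- relallvisible counts the pages the visibility map marks all-visible;
-- when it lags far behind relpages, index-only scans must still hit the heap.
SELECT relallvisible, relpages
FROM   pg_class
WHERE  relname = 'r_agrio_total';
```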
Also, there's no point in doing a REINDEX after a VACUUM FULL; vacuum
full rebuilds all the indexes for you.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On Tue, Apr 14, 2015 at 1:15 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 4/14/15 1:28 PM, Pawel Veselov wrote:
I wonder if what I need to do, considering that I update a lot of "the
same" rows as I process this queue, is to create a temp table, update
the rows there, and then update the actual tables once at the end...
That's what I'd do.
Well, in short, I changed (repeat the body of loop for how many tables are
there)
LOOP (item)
UPDATE table with item
IF not found INSERT item INTO table; END IF;
END LOOP;
to:
CREATE TEMP TABLE xq_table (like table) on commit drop;
LOOP (item)
LOOP
UPDATE xq_table with item;
exit when found;
INSERT INTO xq_table select * from table for update;
continue when found;
INSERT item INTO xq_table;
exit;
END LOOP;
END LOOP;
UPDATE table a set (rows) = (xq.rows)
FROM xq_table xq
WHERE (a.keys) = (xq.keys)
That works significantly faster. The final update statement is very fast.
The process is somewhat slow in the beginning as it sucks in records from
"total" into "xq_total", but once all of that is moved into the temp table,
it rushes through the rest.
The other option would be to use a constraint trigger paired with a
per-row trigger on the hourly table to drive the daily table, and on the
daily table to drive the total table. The way that would work is the
per-row table would simply keep track of all the unique records that were
changed in a statement (presumably by putting them in a temp table). Once
the statement is "done", the constraint trigger would fire; it would
summarize all the changed data and do a much smaller number of updates to
the table being summarized into.
I'm not sure how I would be able to avoid the same number of changes on the
total table, trigger would fire on each update, won't it? So, same problem
with a lot of changes on a table...
BTW, you also made a comment about not having to hit the table if you look
at something in an index. You can only do that if all the data you need is
in the index, AND the page with the record is marked as being all-visible
(google for Postgres Visibility Map). If that's not the case then you still
have to pull the row in the table in, in order to determine visibility. The
only case where you can still avoid hitting the table is something like a
NOT EXISTS; if you can't find any entries in the index for something then
they definitely won't be in the table.
What I was saying is that if a table has a unique index, and there is
cached fact that a particular index value points to a particular row, there
shouldn't be a need to re-scan the index again to search for any more
matching values (which would be necessary if the index was not unique).
Again, all considering the size of the index, the amount of different index
values that are being queried, etc.
But remember that if you update or delete a row, removing it from an
index, the data will stay in that index until vacuum comes along.
Also, there's no point in doing a REINDEX after a VACUUM FULL; vacuum full
rebuilds all the indexes for you.
I was being desperate :)
I still think there is something very wrong with this particular table.
First, I have production systems that employ this function on way larger
data set, and there is no problem (so far, but still). This machine is part
of a test deployment, there is no constant load, the only data that is
being written now is when I do these tests. Vacuuming should prune all that
dead stuff, and if it's absent, it's unclear where the time is spent
navigating/updating the table with 24 rows :)
On 4/14/15 4:44 PM, Pawel Veselov wrote:
On Tue, Apr 14, 2015 at 1:15 PM, Jim Nasby <Jim.Nasby@bluetreble.com
<mailto:Jim.Nasby@bluetreble.com>> wrote:
On 4/14/15 1:28 PM, Pawel Veselov wrote:
I wonder if what I need to do, considering that I update a lot
of "the
same" rows as I process this queue, is to create a temp table,
update
the rows there, and then update the actual tables once at the end...
That's what I'd do.
Well, in short, I changed (repeat the body of loop for how many tables
are there)LOOP (item)
UPDATE table with item
IF not found INSERT item INTO table; END IF;
END LOOP;
to:
CREATE TEMP TABLE xq_table (like table) on commit drop;
LOOP (item)
LOOP
UPDATE xq_table with item;
exit when found;
INSERT INTO xq_table select * from table for update;
continue when found;
INSERT item INTO xq_table;
exit;
END LOOP;
END LOOP;
UPDATE table a set (rows) = (xq.rows)
FROM xq_table xq
WHERE (a.keys) = (xq.keys)
That works significantly faster. The final update statement is very
fast. The process is somewhat slow in the beginning as it sucks in
records from "total" into "xq_total", but once all of that is moved into
the temp table, it rushes through the rest.
Databases like to think in sets. It will generally be more efficient to
do set operations instead of a bunch of row-by-row stuff.
Since you're pulling all of this from some other table your best bet is
probably something like:
CREATE TEMP TABLE raw AS DELETE FROM queue WHERE ... RETURNING *;
CREATE TEMP VIEW hourly_v AS SELECT ... FROM raw GROUP BY;
UPDATE ar_hourly SET ... FROM hourly_v JOIN ...;
INSERT INTO ar_hourly SELECT FROM hourly_v LEFT JOIN ar_hourly ON ...;
-- Same thing for daily
-- Same thing for total
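One caveat with the first line of that sketch: CREATE TABLE ... AS cannot wrap a DELETE directly, so the RETURNING set is usually routed through a writable CTE instead (queue and batch_id are placeholder names):

```sql
-- Drain the queue into a temp table in one round trip; the CTE's
-- RETURNING rows feed the INSERT.
CREATE TEMP TABLE raw (LIKE queue) ON COMMIT DROP;

WITH drained AS (
  DELETE FROM queue WHERE batch_id = 1 RETURNING *
)
INSERT INTO raw SELECT * FROM drained;
```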
The other option would be to use a constraint trigger paired with a
per-row trigger on the hourly table to drive the daily table, and on
the daily table to drive the total table. The way that would work is
the per-row table would simply keep track of all the unique records
that were changed in a statement (presumably by putting them in a
temp table). Once the statement is "done", the constraint trigger
would fire; it would summarize all the changed data and do a much
smaller number of updates to the table being summarized into.
I'm not sure how I would be able to avoid the same number of changes on
the total table, trigger would fire on each update, won't it? So, same
problem with a lot of changes on a table...
The difference is that you'd be doing plain INSERTs into a temp table
and then summarizing that. That's going to be a LOT more efficient than
a slew of updates on an existing table.
BTW, you also made a comment about not having to hit the table if
you look at something in an index. You can only do that if all the
data you need is in the index, AND the page with the record is
marked as being all-visible (google for Postgres Visibility Map). If
that's not the case then you still have to pull the row in the table
in, in order to determine visibility. The only case where you can
still avoid hitting the table is something like a NOT EXISTS; if you
can't find any entries in the index for something then they
definitely won't be in the table.
What I was saying is that if a table has a unique index, and there is
cached fact that a particular index value points to a particular row,
there shouldn't be a need to re-scan the index again to search for any
more matching values (which would be necessary if the index was not
unique). Again, all considering the size of the index, the amount of
different index values that are being queried, etc.
It still has to rescan because of visibility concerns.
But remember that if you update or delete a row, removing it from an
index, the data will stay in that index until vacuum comes along.
Also, there's no point in doing a REINDEX after a VACUUM FULL;
vacuum full rebuilds all the indexes for you.
I was being desperate :)
I still think there is something very wrong with this particular table.
First, I have production systems that employ this function on way larger
data set, and there is no problem (so far, but still). This machine is
part of a test deployment, there is no constant load, the only data that
is being written now is when I do these tests. Vacuuming should prune
all that dead stuff, and if it's absent, it's unclear where the time is
spent navigating/updating the table with 24 rows :)
I think you definitely have a problem with dead rows, as evidenced by
the huge improvement VACUUM FULL made.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On Tue, Apr 14, 2015 at 3:29 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 4/14/15 4:44 PM, Pawel Veselov wrote:
On Tue, Apr 14, 2015 at 1:15 PM, Jim Nasby <Jim.Nasby@bluetreble.com
<mailto:Jim.Nasby@bluetreble.com>> wrote:
On 4/14/15 1:28 PM, Pawel Veselov wrote:
I wonder if what I need to do, considering that I update a lot
of "the
same" rows as I process this queue, is to create a temp table,
update
the rows there, and then update the actual tables once at the
end...
That's what I'd do.
Well, in short, I changed (repeat the body of loop for how many tables
are there)LOOP (item)
UPDATE table with item
IF not found INSERT item INTO table; END IF;
END LOOP;
to:
CREATE TEMP TABLE xq_table (like table) on commit drop;
LOOP (item)
LOOP
UPDATE xq_table with item;
exit when found;
INSERT INTO xq_table select * from table for update;
continue when found;
INSERT item INTO xq_table;
exit;
END LOOP;
END LOOP;
UPDATE table a set (rows) = (xq.rows)
FROM xq_table xq
WHERE (a.keys) = (xq.keys)
That works significantly faster. The final update statement is very
fast. The process is somewhat slow in the beginning as it sucks in
records from "total" into "xq_total", but once all of that is moved into
the temp table, it rushes through the rest.
Databases like to think in sets. It will generally be more efficient to do
set operations instead of a bunch of row-by-row stuff.
Since you're pulling all of this from some other table your best bet is
probably something like:
CREATE TEMP TABLE raw AS DELETE FROM queue WHERE ... RETURNING *;
CREATE TEMP VIEW hourly_v AS SELECT ... FROM raw GROUP BY;
UPDATE ar_hourly SET ... FROM hourly_v JOIN ...;
INSERT INTO ar_hourly SELECT FROM hourly_v LEFT JOIN ar_hourly ON ...;
-- Same thing for daily
-- Same thing for total
In my previous post, there was a problem with that pseudo-code, as it's
missing inserts into the final table at the end of loop, for those records
that need to be inserted and not updated.
This is where using sets becomes really tedious, as Postgres severely lacks
an upsert-like statement.
I don't think there are joins allowed in UPDATE statement, so I will need
to use WITH query, right?
Also, I'm not sure how LEFT JOIN will help me isolate and insert "missed"
entries...
Would it be OK to replace upsert part with merging into a temp table, then
deleting and inserting from temp table? Is there any penalty for
insert/delete comparing to update?
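Yes, a writable CTE is the usual pre-9.5 workaround; a sketch with a deliberately reduced, hypothetical column list (and note it is not safe against concurrent inserters without extra locking):

```sql
-- Update the rows that exist, then insert whatever the UPDATE did not touch.
WITH updated AS (
  UPDATE r_agrio_total t
  SET    unserved = x.unserved
  FROM   xq_table x
  WHERE  t.tagid = x.tagid AND t.unitid = x.unitid
  RETURNING t.tagid, t.unitid
)
INSERT INTO r_agrio_total (tagid, unitid, unserved)
SELECT x.tagid, x.unitid, x.unserved
FROM   xq_table x
WHERE  NOT EXISTS (
  SELECT 1 FROM updated u
  WHERE  u.tagid = x.tagid AND u.unitid = x.unitid
);
```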
[skipped]
But remember that if you update or delete a row, removing it from an
index, the data will stay in that index until vacuum comes along.
Also, there's no point in doing a REINDEX after a VACUUM FULL;
vacuum full rebuilds all the indexes for you.
I was being desperate :)
I still think there is something very wrong with this particular table.
First, I have production systems that employ this function on way larger
data set, and there is no problem (so far, but still). This machine is
part of a test deployment, there is no constant load, the only data that
is being written now is when I do these tests. Vacuuming should prune
all that dead stuff, and if it's absent, it's unclear where the time is
spent navigating/updating the table with 24 rows :)
I think you definitely have a problem with dead rows, as evidenced by the
huge improvement VACUUM FULL made.
But it's not clear why (and not reasonable, IMHO, that) it wouldn't improve
past current point.
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Pawel Veselov
Sent: Tuesday, April 14, 2015 8:01 PM
To: Jim Nasby
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Help with slow table update
[skipped]
This is where using sets becomes really tedious, as Postgres severely lacks an upsert-like statement.
I don't think there are joins allowed in UPDATE statement, so I will need to use WITH query, right?
Also, I'm not sure how LEFT JOIN will help me isolate and insert "missed" entries...
Would it be OK to replace upsert part with merging into a temp table, then deleting and inserting from temp table? Is there any penalty for insert/delete comparing to update?
[skipped]
Yes, you can do UPDATE with joins (http://www.postgresql.org/docs/9.4/static/sql-update.html) like this:
UPDATE table1 A SET col1 = B.col2
FROM table2 B
WHERE A.col3 = B.col4;
Regards,
Igor Neyman
[skipped]
This is where using sets becomes really tedious, as Postgres severely
lacks an upsert-like statement.
I don't think there are joins allowed in UPDATE statement, so I will need
to use WITH query, right?
Also, I'm not sure how LEFT JOIN will help me isolate and insert "missed"
entries...
[skipped]
Yes, you can do UPDATE with joins (
http://www.postgresql.org/docs/9.4/static/sql-update.html) like this:
UPDATE table1 A SET col1 = B.col2
FROM table2 B
WHERE A.col3 = B.col4;
I meant using a JOIN operator in the UPDATE statement. But it's still
possible, though through a WITH query.
[skipped]
But remember that if you update or delete a row, removing it from an
index, the data will stay in that index until vacuum comes along.
Also, there's no point in doing a REINDEX after a VACUUM FULL;
vacuum full rebuilds all the indexes for you.I was being desperate :)
I still think there is something very wrong with this particular table.
First, I have production systems that employ this function on way larger
data set, and there is no problem (so far, but still). This machine is
part of a test deployment, there is no constant load, the only data that
is being written now is when I do these tests. Vacuuming should prune
all that dead stuff, and if it's absent, it's unclear where the time is
spent navigating/updating the table with 24 rows :)
I think you definitely have a problem with dead rows, as evidenced by the
huge improvement VACUUM FULL made.
But it's not clear why (and not reasonable, IMHO, that) it wouldn't
improve past current point.
What I should've done is 'VACUUM FULL VERBOSE'. Once I did, it told me
there were 800k dead rows that can't be removed. After digging around I
found some dangling prepared transactions, going back months. Once I threw
those away, and re-vacuumed, things got back to normal.
Thanks for all your help and advice.
On Sat, Apr 18, 2015 at 10:24 AM, Pawel Veselov <pawel.veselov@gmail.com>
wrote:
I found some dangling prepared transactions
How do you find and remove these?
On 4/19/15 9:53 PM, Tim Uckun wrote:
On Sat, Apr 18, 2015 at 10:24 AM, Pawel Veselov <pawel.veselov@gmail.com
<mailto:pawel.veselov@gmail.com>> wrote:
I found some dangling prepared transactions
How do you find and remove these?
SELECT * FROM pg_prepared_xacts;
ROLLBACK PREPARED 'gid';
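Fleshed out a little: ROLLBACK PREPARED takes the gid string reported by pg_prepared_xacts, and it's the old entries that hold back vacuum (the one-day cutoff is an arbitrary example):

```sql
-- Stale two-phase transactions pin the xmin horizon, so vacuum cannot
-- reclaim rows deleted after they were prepared.
SELECT gid, prepared, owner, database
FROM   pg_prepared_xacts
WHERE  prepared < now() - interval '1 day'
ORDER  BY prepared;

-- Then, for each stale entry:
-- ROLLBACK PREPARED 'gid-from-the-query-above';
```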
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com