query locks up when run concurrently

Started by azhwkd · over 9 years ago · 19 messages · general
#1 azhwkd
azhwkd@gmail.com

Greetings!

I'm using Postgres version 9.5.5 on an Ubuntu 16.04.1 server,
installed through apt-get.

I have a query which, run alone, usually completes in about 300ms.
When run in my application, this query consistently locks up and bogs
down all connections of the connection pool (in the application this
query is run up to 10 times in parallel with different parameters).
What's really weird is that I can re-run one of the hung queries from
the command line while it's hung, and it will complete as expected
while the hung queries continue to use 100% CPU time.

The query in question is this:

insert into group_history ("group", id, sub_category, "date", aa, ab,
                           bb, ba, quantity, "hour")
(select
    a."group",
    a.id,
    b.sub_category,
    to_timestamp($2)::date as "date",
    max(a.aa / a.quantity) as aa,
    min(a.aa / a.quantity) as ab,
    max(a.bb / a.quantity) as bb,
    min(a.bb / a.quantity) as ba,
    sum(a.quantity) as quantity,
    extract('hour' from to_timestamp($2)) as "hour"
 from tbla a
 join tblb b on a.id = b.id
 where a."group" = $1 and b."group" = $1
 group by a."group", a.id, b.sub_category
);

When I run perf on the system while the query is running 10 times in
parallel, it looks like this:

Samples: 4M of event 'cpu-clock', Event count (approx.): 18972107951
Overhead Shared Object Symbol
17.95% postgres [.] heap_hot_search_buffer
5.64% postgres [.] heap_page_prune_opt
4.62% postgres [.] hash_search_with_hash_value
3.80% postgres [.] LWLockRelease
3.73% postgres [.] 0x00000000002f420d
2.50% postgres [.] _bt_checkkeys
2.48% postgres [.] hash_any
2.45% postgres [.] 0x00000000002f41e7
2.10% postgres [.] slot_getattr
1.80% postgres [.] ResourceOwnerForgetBuffer
1.58% postgres [.] LWLockAcquire
1.58% postgres [.] ReadBufferExtended
1.54% postgres [.] index_fetch_heap
1.47% postgres [.] MemoryContextReset
1.43% postgres [.] btgettuple
1.38% postgres [.] 0x00000000002d710c
1.36% postgres [.] 0x00000000002d70a5
1.35% postgres [.] ExecQual

Explain (Analyze, Verbose) output:

                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on public.group_history  (cost=10254.36..10315.16 rows=2432 width=62) (actual time=1833.967..1833.967 rows=0 loops=1)
   ->  Subquery Scan on "*SELECT*"  (cost=10254.36..10315.16 rows=2432 width=62) (actual time=353.880..376.490 rows=6139 loops=1)
         Output: "*SELECT*"."group", "*SELECT*".id, "*SELECT*".sub_category, "*SELECT*"."when", "*SELECT*".aa, "*SELECT*".ab, "*SELECT*".bb, "*SELECT*".ba, "*SELECT*".quantity, "*SELECT*"."hour"
         ->  HashAggregate  (cost=10254.36..10278.68 rows=2432 width=28) (actual time=353.871..367.144 rows=6139 loops=1)
               Output: a."group", a.id, b.sub_category, '2016-11-20'::date, max((a.aa / a.quantity)), min((a.aa / a.quantity)), max((a.bb / a.quantity)), min((a.bb / a.quantity)), sum(a.quantity), '21'::double precision
               Group Key: a."group", a.id, b.sub_category
               ->  Hash Join  (cost=5558.64..10181.40 rows=2432 width=28) (actual time=193.949..294.106 rows=30343 loops=1)
                     Output: a."group", a.id, a.aa, a.quantity, a.bb, b.sub_category
                     Hash Cond: (b.id = a.id)
                     ->  Bitmap Heap Scan on public.auctions_extra b  (cost=685.19..4719.06 rows=30550 width=8) (actual time=56.678..111.038 rows=30343 loops=1)
                           Output: b.sub_category, b.id
                           Recheck Cond: (b."group" = 7)
                           Heap Blocks: exact=289
                           ->  Bitmap Index Scan on auction_extra_pk  (cost=0.00..677.55 rows=30550 width=0) (actual time=55.966..55.966 rows=30343 loops=1)
                                 Index Cond: (b."group" = 7)
                     ->  Hash  (cost=4280.62..4280.62 rows=30627 width=28) (actual time=137.160..137.160 rows=30343 loops=1)
                           Output: a."group", a.id, a.aa, a.quantity, a.bb, a.id
                           Buckets: 16384  Batches: 4  Memory Usage: 638kB
                           ->  Bitmap Heap Scan on public.tbla a  (cost=689.78..4280.62 rows=30627 width=28) (actual time=58.530..117.064 rows=30343 loops=1)
                                 Output: a."group", a.id, a.aa, a.quantity, a.bb, a.id
                                 Recheck Cond: (a."group" = 7)
                                 Heap Blocks: exact=254
                                 ->  Bitmap Index Scan on tbla_pk  (cost=0.00..682.12 rows=30627 width=0) (actual time=57.801..57.801 rows=30343 loops=1)
                                       Index Cond: (a."group" = 7)
 Planning time: 0.475 ms
 Trigger group_history_trigger: time=1442.561 calls=6139
 Execution time: 1834.119 ms

group_history_trigger:

CREATE OR REPLACE FUNCTION public.group_history_partition_function()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
declare
    _new_date  timestamptz;
    _tablename text;
    _startdate text;
begin
    -- Take the inbound "when" value and truncate it to midnight of that date
    _new_date  := date_trunc('day', new."when");
    _startdate := to_char(_new_date, 'YYYY_MM_DD');
    _tablename := 'group_history_' || _startdate;

    -- Insert the current record into the correct partition
    execute 'INSERT INTO public.' || quote_ident(_tablename) || ' VALUES ($1.*)
             on conflict ("group", id, sub_category, "when", "hour") do
             update set aa = excluded.aa,
                        ab = excluded.ab,
                        bb = excluded.bb,
                        ba = excluded.ba,
                        quantity = excluded.quantity' using new;
    return null;
end
$function$;

Has anyone experienced this before or even has a solution for this?
I'd be very grateful.

Kind regards
Sebastian

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: azhwkd (#1)
Re: query locks up when run concurrently

On 11/22/2016 11:41 PM, azhwkd@gmail.com wrote:

Greetings!

I'm using postgres version 9.5.5 on a ubuntu 16.04.1 server
installation which was installed through apt-get.

I have a query which if run alone usually completes in about 300ms.
When run in my application this query constantly locks up and bogs
down all connections of the connection pool (In the application this
query is run up to 10 times in parallel with different parameters).
What's really weird is that I can re-run one of the hung queries from
the command line while it's hung and it will complete as expected
while the hung queries continue to use 100% CPU time.

Have you looked in the Postgres logs to see if there are any relevant
messages?

Best guess is that running in parallel is creating a lock problem where
one of the queries is holding a lock that is keeping the others from
completing.

What relationships do the group_history and group_history_* tables have
to other tables?

What is the definition of the trigger that fires the trigger function you posted?


--
Adrian Klaver
adrian.klaver@aklaver.com


#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: azhwkd (#1)
Re: query locks up when run concurrently

azhwkd@gmail.com writes:

I have a query which if run alone usually completes in about 300ms.
When run in my application this query constantly locks up and bogs
down all connections of the connection pool (In the application this
query is run up to 10 times in parallel with different parameters).
What's really weird is that I can re-run one of the hung queries from
the command line while it's hung and it will complete as expected
while the hung queries continue to use 100% CPU time.

Judging from the EXPLAIN timing, most of the work is in the trigger,
which leads me to wonder if the parallel calls are likely to be fighting
over inserting/updating the same row in the group_history partition
tables. Or are you certain that they should be hitting different rows?

regards, tom lane


#4 azhwkd
azhwkd@gmail.com
In reply to: Adrian Klaver (#2)
Re: query locks up when run concurrently

Greetings!

Yes I had a look at the logfiles but there is not a single logfile
generated when I'm reproducing this issue.
Concerning locks, I used the queries at
https://wiki.postgresql.org/wiki/Lock_Monitoring and they came up empty.
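
For reference, the wiki's queries boil down to pairing waiting and granted
entries in pg_locks, roughly like this simplified sketch (the actual wiki
versions differ in detail):

SELECT blocked.pid        AS blocked_pid,
       blocked_act.query  AS blocked_query,
       blocking.pid       AS blocking_pid,
       blocking_act.query AS blocking_query
FROM pg_locks blocked
JOIN pg_stat_activity blocked_act ON blocked_act.pid = blocked.pid
JOIN pg_locks blocking
  ON blocking.locktype = blocked.locktype
 AND blocking.database IS NOT DISTINCT FROM blocked.database
 AND blocking.relation IS NOT DISTINCT FROM blocked.relation
 AND blocking.transactionid IS NOT DISTINCT FROM blocked.transactionid
 AND blocking.pid <> blocked.pid
JOIN pg_stat_activity blocking_act ON blocking_act.pid = blocking.pid
WHERE NOT blocked.granted
  AND blocking.granted;

As far as I understand it, this only surfaces heavyweight-lock waits, so a
backend that burns CPU without a pending lock request would not show up here.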
The group_history table and its sub-tables do not have any foreign keys as
they only contain calculated data and the source data is removed after some
time.
The trigger which calls the partition function below is defined as follows:

create trigger group_history_trigger
    before insert on item_history
    for each row execute procedure group_history_partition_function();

Kind regards,
Sebastian


#5 azhwkd
azhwkd@gmail.com
In reply to: Tom Lane (#3)
Re: query locks up when run concurrently

Greetings!

The parallel calls should not be working on the same row. Each query
services a different group ID on its own, and there is no overlap.

Kind regards,
Sebastian


#6 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: azhwkd (#4)
Re: query locks up when run concurrently

On 11/23/2016 01:47 PM, azhwkd wrote:

Greetings!

Yes I had a look at the logfiles but there is not a single logfile
generated when I'm reproducing this issue.

We are talking about the Postgres logfile, correct?

I have to believe a log file exists, but maybe there were no entries at that time.

What are your logging settings?

Concerning locks, I used the queries
at https://wiki.postgresql.org/wiki/Lock_Monitoring and they came up empty.
The group_history table and its sub-tables do not have any foreign keys
as they only contain calculated data and the source data is removed
after some time.
The trigger which calls the partition function below is defined as follows:

create
trigger group_history_trigger before insert
on
item_history for each row execute procedure
group_history_partition_function()

This trigger is on the table item_history not group history, is that
correct?

If so how does the item_history table fit into this?

Kind regards,
Sebastian

--
Adrian Klaver
adrian.klaver@aklaver.com


#7 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: azhwkd (#5)
Re: query locks up when run concurrently

On 11/23/2016 01:52 PM, azhwkd wrote:

Greetings!

The parallel calls should not be working on the same row. Each query
services a different group ID on it's own and there is no overlap.

Except that the INSERT query in the trigger function is working on dates,
not group ids.
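
If two sessions ever did resolve to the same conflict key in the same
partition, the second INSERT ... ON CONFLICT would block until the first
transaction ends. A hypothetical two-session sketch (partition name and
conflict target taken from the trigger function posted earlier):

-- session 1, transaction left open:
begin;
insert into public.group_history_2016_11_20
values (7, 1, 2, '2016-11-20', 10, 5, 8, 4, 3, 21)
on conflict ("group", id, sub_category, "when", "hour")
do update set quantity = excluded.quantity;

-- session 2, same key: blocks here until session 1 commits or rolls back
insert into public.group_history_2016_11_20
values (7, 1, 2, '2016-11-20', 11, 6, 9, 5, 4, 21)
on conflict ("group", id, sub_category, "when", "hour")
do update set quantity = excluded.quantity;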


--
Adrian Klaver
adrian.klaver@aklaver.com


#8 azhwkd
azhwkd@gmail.com
In reply to: Adrian Klaver (#6)
Re: query locks up when run concurrently

I'm sorry, I worded this quite poorly. I meant to say that there were
no log lines added to the Postgres logfile at the time.
I hope these are the settings you were referring to (I did not change
anything in the config files vs. the default):

log_destination = 'stderr'
logging_collector = off
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_file_mode = 0600
log_truncate_on_rotation = off
log_rotation_age = 1d
log_rotation_size = 10MB
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'
event_source = 'PostgreSQL'
client_min_messages = notice
log_min_messages = warning
log_min_error_statement = error
log_min_duration_statement = -1
debug_print_parse = off
debug_print_rewritten = off
debug_print_plan = off
debug_pretty_print = on
log_checkpoints = off
log_connections = off
log_disconnections = off
log_duration = off
log_error_verbosity = default
log_hostname = off
log_line_prefix = '%t '
log_lock_waits = off
log_statement = 'none'
log_temp_files = -1
log_timezone = 'UTC'

I'm sorry, it seems I copied the trigger definition from the wrong
table (item_history is a test table I created yesterday while trying
things; it is exactly the same as group_history, just with a different
name).
The trigger on the group_history table is identical except for the
table name.

Kind regards,
Sebastian


#9 azhwkd
azhwkd@gmail.com
In reply to: Adrian Klaver (#7)
Re: query locks up when run concurrently

The group ID is part of the primary key of the group_history table. My
understanding is that two INSERTs with different group IDs should not
collide in this case, or am I wrong in thinking this?

The table definition for group_history is the following:

CREATE TABLE public.group_history (
    "group" int4 NOT NULL,
    id int4 NOT NULL,
    sub_category int4 NOT NULL,
    "date" date NOT NULL,
    aa int8 NOT NULL,
    ab int8 NOT NULL,
    bb int8 NOT NULL,
    ba int8 NOT NULL,
    quantity int8 NOT NULL,
    "hour" int2 NOT NULL,
    CONSTRAINT group_history_pk
        PRIMARY KEY ("group", id, sub_category, "date", "hour")
)
WITH (
    OIDS=FALSE
);
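
For completeness: each daily group_history_YYYY_MM_DD table would be an
inheritance child of group_history with a unique index matching the
trigger's ON CONFLICT target. A hypothetical child definition (the real
tables may differ; note that the trigger's conflict target says "when"
where this parent table says "date", and the two must line up for ON
CONFLICT to find its index):

CREATE TABLE public.group_history_2016_11_20 (
    CHECK ("date" = DATE '2016-11-20'),
    CONSTRAINT group_history_2016_11_20_pk
        PRIMARY KEY ("group", id, sub_category, "date", "hour")
) INHERITS (public.group_history);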

Kind regards,
Sebastian


#10 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: azhwkd (#8)
Re: query locks up when run concurrently

On 11/23/2016 10:26 PM, azhwkd wrote:

I'm sorry. I worded this quite poorly. I meant to say that there were
no log lines added to the postgres logfile at the time.
I hope these are the settings you were referring to (I did not change
anything in the config files vs. the default)

log_destination = 'stderr'
logging_collector = off
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_file_mode = 0600
log_truncate_on_rotation = off
log_rotation_age = 1d
log_rotation_size = 10MB
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'
event_source = 'PostgreSQL'
client_min_messages = notice
log_min_messages = warning
log_min_error_statement = error
log_min_duration_statement = -1
debug_print_parse = off
debug_print_rewritten = off
debug_print_plan = off
debug_pretty_print = on
log_checkpoints = off
log_connections = off
log_disconnections = off
log_duration = off
log_error_verbosity = default
log_hostname = off
log_line_prefix = '%t '
log_lock_waits = off
log_statement = 'none'
log_temp_files = -1
log_timezone = 'UTC'

While troubleshooting this I would enable log_connections and
log_disconnections and change log_statement to 'mod'. This will help you
see when and what is happening when you do the concurrent queries. FYI,
you need to reload the server for the changes to take effect.
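
Concretely, the changed lines in postgresql.conf would look something
like this; log_lock_waits is also worth turning on here, since it logs
any lock wait that lasts longer than deadlock_timeout (1s by default):

log_connections = on
log_disconnections = on
log_statement = 'mod'     # logs DDL plus INSERT/UPDATE/DELETE statements
log_lock_waits = on       # reports waits longer than deadlock_timeout

Running SELECT pg_reload_conf(); as a superuser (or pg_ctl reload)
applies these without a restart.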


--
Adrian Klaver
adrian.klaver@aklaver.com


#11 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: azhwkd (#9)
Re: query locks up when run concurrently

On 11/23/2016 10:41 PM, azhwkd wrote:

The group ID is part of the primary key of the group_history table. My
understanding is that two INSERTs with different group IDs should not
collide in this case, or am I wrong in thinking this?

The suspicion is that this:

insert into group_history ("group", id, sub_category, "date", aa, ab,
bb, ba, quantity, "hour") ....

is not the problem, but that this (from the trigger function) is:

execute 'INSERT INTO public.' || quote_ident(_tablename) || ' VALUES ($1.*)
on conflict ("group", id, sub_category, "when", "hour") do

....

where _tablename is:

_tablename := 'group_history_'||_startdate;

It would be nice to see the schema definitions for group_history and at
least one of the group_history_startdate tables.

The table definition for group_history is the following:

CREATE TABLE public.group_history (
    "group" int4 NOT NULL,
    id int4 NOT NULL,
    sub_category int4 NOT NULL,
    "date" date NOT NULL,
    aa int8 NOT NULL,
    ab int8 NOT NULL,
    bb int8 NOT NULL,
    ba int8 NOT NULL,
    quantity int8 NOT NULL,
    "hour" int2 NOT NULL,
    CONSTRAINT group_history_pk
        PRIMARY KEY ("group", id, sub_category, "date", "hour")
)
WITH (
    OIDS=FALSE
);

Kind regards,
Sebastian

--
Adrian Klaver
adrian.klaver@aklaver.com


#12 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: azhwkd (#9)
Re: query locks up when run concurrently

On 11/23/2016 10:41 PM, azhwkd wrote:

The group ID is part of the primary key of the group_history table. My
understanding is that two INSERTs with different group IDs should not
collide in this case, or am I wrong in thinking this?

The table definition for group_history is the following:

CREATE TABLE public.group_history (
    "group" int4 NOT NULL,
    id int4 NOT NULL,
    sub_category int4 NOT NULL,
    "date" date NOT NULL,
    aa int8 NOT NULL,
    ab int8 NOT NULL,
    bb int8 NOT NULL,
    ba int8 NOT NULL,
    quantity int8 NOT NULL,
    "hour" int2 NOT NULL,
    CONSTRAINT group_history_pk
        PRIMARY KEY ("group", id, sub_category, "date", "hour")
)
WITH (
    OIDS=FALSE
);

Me slaps head

Windy night last night + lack of sleep + insufficient caffeine = Not
seeing the table definition before hitting Send on previous post.

Kind regards,
Sebastian

2016-11-24 0:28 GMT+01:00 Adrian Klaver <adrian.klaver@aklaver.com>:

On 11/23/2016 01:52 PM, azhwkd wrote:

Greetings!

The parallel calls should not be working on the same row. Each query
services a different group ID on its own and there is no overlap.

Except the INSERT query in the trigger function is working on dates, not
group IDs.

Kind regards,
Sebastian

Tom Lane <tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>> schrieb am Mi.,
23. Nov. 2016 um 17:47 Uhr:

azhwkd@gmail.com <mailto:azhwkd@gmail.com> writes:

I have a query which if run alone usually completes in about 300ms.
When run in my application this query constantly locks up and bogs
down all connections of the connection pool (In the application this
query is run up to 10 times in parallel with different parameters).
What's really weird is that I can re-run one of the hung queries

from

the command line while it's hung and it will complete as expected
while the hung queries continue to use 100% CPU time.

Judging from the EXPLAIN timing, most of the work is in the trigger,
which leads me to wonder if the parallel calls are likely to be
fighting
over inserting/updating the same row in the group_history partition
tables. Or are you certain that they should be hitting different
rows?

regards, tom lane

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Adrian Klaver
adrian.klaver@aklaver.com


#13Adrian Klaver
adrian.klaver@aklaver.com
In reply to: azhwkd (#9)
Re: query locks up when run concurrently

On 11/23/2016 10:41 PM, azhwkd wrote:

The group ID is part of the primary key of the group_history table. My
understanding is that two INSERTs with different group IDs should not
collide in this case, or am I wrong in thinking this?

After fresh coffee:

In your first post you had this:
"(In the application this
query is run up to 10 times in parallel with different parameters)"

where the parameters look to be a timestamp and a group id.

Is it possible that the combination of parameters over the 10 different
queries is not actually selecting rows in each result set that are
unique over (group,id,sub_category,"date","hour") between result sets?
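Adrian's question can be checked mechanically rather than by eye. A minimal Go sketch (the `Key` struct, the sample data, and the `findOverlaps` helper are assumptions for illustration, not part of the application) that flags key tuples appearing in more than one result set:

```go
// Hypothetical check: do the rows produced for the up-to-10 parallel
// queries overlap on the primary key (group, id, sub_category, date, hour)?
package main

import "fmt"

type Key struct {
	Group, ID, SubCategory int
	Date                   string // "YYYY-MM-DD"
	Hour                   int
}

// findOverlaps returns every key that appears in more than one result set.
func findOverlaps(resultSets [][]Key) []Key {
	seen := make(map[Key]int) // key -> index of result set that first produced it
	var dups []Key
	for i, rs := range resultSets {
		for _, k := range rs {
			if first, ok := seen[k]; ok && first != i {
				dups = append(dups, k)
			} else if !ok {
				seen[k] = i
			}
		}
	}
	return dups
}

func main() {
	sets := [][]Key{
		{{1, 10, 5, "2016-11-23", 14}},
		{{2, 10, 5, "2016-11-23", 14}}, // different group -> no conflict
		{{1, 10, 5, "2016-11-23", 14}}, // same key as set 0 -> conflict
	}
	fmt.Println(findOverlaps(sets)) // → [{1 10 5 2016-11-23 14}]
}
```

Dumping each query's result set and running it through something like this would settle whether the sets really are disjoint over the primary key.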

The table definition for group_history is the following:

CREATE TABLE public.group_history (
group int4 NOT NULL,
id int4 NOT NULL,
sub_category int4 NOT NULL,
"date" date NOT NULL,
aa int8 NOT NULL,
ab int8 NOT NULL,
bb int8 NOT NULL,
ba int8 NOT NULL,
quantity int8 NOT NULL,
"hour" int2 NOT NULL,
CONSTRAINT group_history_pk PRIMARY KEY (group,id,sub_category,"date","hour")
)
WITH (
OIDS=FALSE
);

Kind regards,
Sebastian

--
Adrian Klaver
adrian.klaver@aklaver.com


#14azhwkd
azhwkd@gmail.com
In reply to: Adrian Klaver (#13)
Re: query locks up when run concurrently

It should not be possible because a group does not return to the
update pool until its update has finished.
I watched the queries in a postgres client and there was no overlap I could see.
I don't really know what to make of this behavior; sometimes when I
start the application a few updates go through before it eventually
locks up completely, and sometimes it locks up immediately - always with
heap_hot_search_buffer using ~20% of all CPU time on the system.

2016-11-24 19:14 GMT+01:00 Adrian Klaver <adrian.klaver@aklaver.com>:

On 11/23/2016 10:41 PM, azhwkd wrote:

The group ID is part of the primary key of the group_history table. My
understanding is that two INSERTs with different group IDs should not
collide in this case, or am I wrong in thinking this?

After fresh coffee:

In your first post you had this:
"(In the application this
query is run up to 10 times in parallel with different parameters)"

where the parameters look to be a timestamp and a group id.

Is it possible that the combination of parameters over the 10 different
queries is not actually selecting rows in each result set that are unique
over (group,id,sub_category,"date","hour") between result sets?

The table definition for group_history is the following:

CREATE TABLE public.group_history (
group int4 NOT NULL,
id int4 NOT NULL,
sub_category int4 NOT NULL,
"date" date NOT NULL,
aa int8 NOT NULL,
ab int8 NOT NULL,
bb int8 NOT NULL,
ba int8 NOT NULL,
quantity int8 NOT NULL,
"hour" int2 NOT NULL,
CONSTRAINT group_history_pk PRIMARY KEY
(group,id,sub_category,"date","hour")
)
WITH (
OIDS=FALSE
);

Kind regards,
Sebastian

--
Adrian Klaver
adrian.klaver@aklaver.com


#15Adrian Klaver
adrian.klaver@aklaver.com
In reply to: azhwkd (#14)
Re: query locks up when run concurrently

On 11/24/2016 01:23 PM, azhwkd wrote:

It should not be possible because a group does not return to the
update pool before the update hasn't finished.

So what is this 'update pool' and what is driving/using it?

In other words how is the determination of the parameters done?

To be more specific, the implication is that a group id can be reused,
so what determines that?

I watched the queries in a postgres client and there was no overlap I could see.

Was this a visual inspection or did you dump the results of the various
query/parameter combinations into tables and do an SQL comparison?

I don't really know what to make from this behavior, sometimes when I
start the application a few updates go through and eventually it will
lock up completely and sometimes it locks up immediately - always with

Is there a common thread with regard to the parameters in use when
things lock up?

heap_hot_search_buffer using ~20% of all CPU time on the system.

2016-11-24 19:14 GMT+01:00 Adrian Klaver <adrian.klaver@aklaver.com>:

On 11/23/2016 10:41 PM, azhwkd wrote:

The group ID is part of the primary key of the group_history table. My
understanding is that two INSERTs with different group IDs should not
collide in this case, or am I wrong in thinking this?

--
Adrian Klaver
adrian.klaver@aklaver.com


#16azhwkd
azhwkd@gmail.com
In reply to: Adrian Klaver (#15)
Re: query locks up when run concurrently

Adrian Klaver <adrian.klaver@aklaver.com> schrieb am Do., 24. Nov. 2016 um
22:34 Uhr:

On 11/24/2016 01:23 PM, azhwkd wrote:

It should not be possible because a group does not return to the
update pool before the update hasn't finished.

So what is this 'update pool' and what is driving/using it?

In other words how is the determination of the parameters done?

To be more specific, the implication is that a group id can be reused so
what determines that?

The application is written in Go. Every group ID has its own goroutine, and
the routine blocks until the SQL statement returns.
The update process starts with a check against an external API endpoint; if
there is new data available, the routine downloads it, parses it, and
inserts the data into two tables. Once that is done, the routine continues
to execute the statement in question, using the data it inserted before for
the calculation. Only once this finishes will the routine start over again.

I watched the queries in a postgres client and there was no overlap I

could see.

Was this a visual inspection or did you dump the results of the various
query/parameter combinations into tables and do an SQL comparison?

I inspected it visually and also dumped all variables into a file directly
from the application.

I don't really know what to make from this behavior, sometimes when I
start the application a few updates go through and eventually it will
lock up completely and sometimes it locks up immediately - always with

Is there a common thread with regard to the parameters in use when
things lock up?

Do you mean whether it always locks up on the same parameters? If so,
then it does not, sadly.


heap_hot_search_buffer using ~20% of all CPU time on the system.

2016-11-24 19:14 GMT+01:00 Adrian Klaver <adrian.klaver@aklaver.com>:

On 11/23/2016 10:41 PM, azhwkd wrote:

The group ID is part of the primary key of the group_history table. My
understanding is that two INSERTs with different group IDs should not
collide in this case, or am I wrong in thinking this?

--
Adrian Klaver
adrian.klaver@aklaver.com

#17rob stone
floriparob@gmail.com
In reply to: azhwkd (#14)
Re: query locks up when run concurrently

On Thu, 2016-11-24 at 22:23 +0100, azhwkd wrote:

It should not be possible because a group does not return to the
update pool before the update hasn't finished.
I watched the queries in a postgres client and there was no overlap I
could see.
I don't really know what to make from this behavior, sometimes when I
start the application a few updates go through and eventually it will
lock up completely and sometimes it locks up immediately - always
with
heap_hot_search_buffer using ~20% of all CPU time on the system.

Hello Sebastian,
You stated that the application is written using go (www.golang.org).
Are you able to run the application with the -race flag?

HTH,
Rob

From wikipedia:-

Lack of race condition safety

There are no restrictions on how goroutines access shared data, making
race conditions possible. Specifically, unless a program explicitly
synchronizes via channels or other means, writes from one goroutine
might be partly, entirely, or not at all visible to another, often with
no guarantees about ordering of writes. Furthermore, Go's internal data
structures like interface values, slice headers, hash tables, and
string headers are not immune to race conditions, so type and memory
safety can be violated in multithreaded programs that modify shared
instances of those types without synchronization.

Instead of language support, safe concurrent programming thus relies on
conventions; for example, Chisnall recommends an idiom called "aliases
xor mutable", meaning that passing a mutable value (or pointer) over a
channel signals a transfer of ownership over the value to its receiver.
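The "aliases xor mutable" idiom from the excerpt can be sketched in a few lines of Go; the `Batch` type here is a hypothetical stand-in for any mutable value whose ownership is handed off to a worker:

```go
// Minimal sketch of ownership transfer over a channel: the sender passes
// a pointer to a mutable value and stops touching it afterwards, so only
// one goroutine owns (and may mutate) it at any time.
package main

import "fmt"

type Batch struct{ Rows []int }

func main() {
	work := make(chan *Batch)
	result := make(chan int)

	go func() {
		b := <-work // receiver now owns the batch exclusively
		sum := 0
		for _, r := range b.Rows {
			sum += r
		}
		result <- sum
	}()

	b := &Batch{Rows: []int{1, 2, 3}}
	work <- b // ownership transferred; sender must not mutate b after this
	fmt.Println(<-result) // prints 6
}
```

The channel send/receive pair also establishes a happens-before edge, so the receiver sees all of the sender's prior writes to the batch.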


#18Adrian Klaver
adrian.klaver@aklaver.com
In reply to: azhwkd (#16)
Re: query locks up when run concurrently

On 11/24/2016 02:14 PM, azhwkd wrote:

Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> schrieb am Do., 24. Nov. 2016 um
22:34 Uhr:

On 11/24/2016 01:23 PM, azhwkd wrote:

It should not be possible because a group does not return to the
update pool before the update hasn't finished.

So what is this 'update pool' and what is driving/using it?

In other words how is the determination of the parameters done?

To be more specific, the implication is that a group id can be reused so
what determines that?

The application is written in go. Every group ID has its own go routine
and the routine is blocked until the SQL statement returns.
The update process starts with a check to an external API endpoint and
if there is new data available the routine is downloading it, parsing it
and inserting the data into 2 tables. Once that is done, the routine
continues to execute the statement in question using the data it
inserted before for the calculation. Only once this finishes will the
routine start over again.

I watched the queries in a postgres client and there was no

overlap I could see.

Was this a visual inspection or did you dump the results of the various
query/parameter combinations into tables and do an SQL comparison?

I inspected it visually and also dumped all variables into a file
directly from the application.

I don't really know what to make from this behavior, sometimes when I
start the application a few updates go through and eventually it will
lock up completely and sometimes it locks up immediately - always with

Is there a common thread with regard to the parameters in use when
things lock up?

Do you mean if it always locks on the same parameters? If so then it
does not, sadly

Yes, that would have been too easy. I'm out of ideas for the moment. Rob
Stone's post looks promising though.

--
Adrian Klaver
adrian.klaver@aklaver.com


#19azhwkd
azhwkd@gmail.com
In reply to: rob stone (#17)
Re: query locks up when run concurrently

Greetings!

I tried running the application under the race detector, but no data races
were detected.
To narrow this down further I removed the call to the function in question
from the pool and instead ran it one by one from a queue. This seems to
have helped, and the query didn't lock up anymore.
I also took this one step further and left the locked-up queries sitting
there for a day, and they eventually finished. Who would have thought a
~2-second query could blow up to ~20 hours.
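The queue-based fix described above might look roughly like this in Go (a sketch only; `runHistoryInsert` is a hypothetical placeholder for the real statement): group IDs are pushed onto a channel and a single worker drains it, so only one copy of the query is ever in flight.

```go
// Hypothetical sketch of serializing the updates: instead of firing the
// statement from 10 goroutines at once, a single worker processes group
// IDs from a queue one at a time.
package main

import "fmt"

func runHistoryInsert(group int) error { return nil } // placeholder

// drainQueue processes groups strictly one at a time and reports how many
// updates completed.
func drainQueue(queue <-chan int) int {
	n := 0
	for g := range queue {
		if err := runHistoryInsert(g); err != nil {
			fmt.Println("group", g, "failed:", err)
			continue
		}
		n++
	}
	return n
}

func main() {
	queue := make(chan int, 10)
	for g := 1; g <= 10; g++ {
		queue <- g
	}
	close(queue)
	fmt.Println(drainQueue(queue), "updates completed") // 10 updates completed
}
```

This trades concurrency for predictability: total throughput drops, but the parallel queries can no longer contend with each other inside Postgres.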

rob stone <floriparob@gmail.com> schrieb am Fr., 25. Nov. 2016 um 02:34 Uhr:

On Thu, 2016-11-24 at 22:23 +0100, azhwkd wrote:

It should not be possible because a group does not return to the
update pool before the update hasn't finished.
I watched the queries in a postgres client and there was no overlap I
could see.
I don't really know what to make from this behavior, sometimes when I
start the application a few updates go through and eventually it will
lock up completely and sometimes it locks up immediately - always
with
heap_hot_search_buffer using ~20% of all CPU time on the system.

Hello Sebastian,
You stated that the application is written using go (www.golang.org).
Are you able to run the application with the -race flag?

HTH,
Rob

From wikipedia:-

Lack of race condition safety

There are no restrictions on how goroutines access shared data, making
race conditions possible. Specifically, unless a program explicitly
synchronizes via channels or other means, writes from one goroutine
might be partly, entirely, or not at all visible to another, often with
no guarantees about ordering of writes. Furthermore, Go's internal data
structures like interface values, slice headers, hash tables, and
string headers are not immune to race conditions, so type and memory
safety can be violated in multithreaded programs that modify shared
instances of those types without synchronization.

Instead of language support, safe concurrent programming thus relies on
conventions; for example, Chisnall recommends an idiom called "aliases
xor mutable", meaning that passing a mutable value (or pointer) over a
channel signals a transfer of ownership over the value to its receiver.