EXPLAIN SELECT .. does not return

Started by David Linkover 20 years ago7 messagesgeneral
Jump to latest
#1David Link
dlink@soundscan.com

Hi, This has become a major problem for us. Thank you in advance for
your help.

OS: SUSE Linux 2.6.5-7.191-bigsmp
PostgreSQL: 7.4.8
Application: ModPerl Web application using DBI.pm
Database size: 100 Gb, 1025 Tables.

Problem: EXPLAIN SELECT ... does not return.

Description:

The Application uses an EXPLAIN cost to determine whether a client's dynamic
request for data is too demanding for the server so it can gracefully deny
them. (Currently, anything over cost=0.00..500000.00).

The system gets about 3000 page requests a day.

Certain SQL Queries, I believe those with many table joins, when run as
EXPLAIN plans, never return. As a result they seem to stay churning in the
system. Once that happens other queries build up and the performance of the
whole database server grinds to a halt. Postgresql never dies, but
eventually, user requests start timing out.

This happens on average two or three times a week. I kill an offending
process and all's well again. I have not been able to identify with
certainty
an offending SQL statement.

Config params, that have changed from default:

tcpip_socket = true
max_connections = 200
shared_buffers = 2000
sort_mem = 1048576
vacuum_mem = 65536
max_fsm_pages = 100000
max_fsm_relations = 1000
max_files_per_process = 1000
fsync = false
wal_sync_method = fsync
wal_buffers = 800
checkpoint_segments = 30
commit_delay = 100
commit_siblings = 50
effective_cache_size = 1000
random_page_cost = 4
geqo = true
geqo_threshold = 14
default_statistics_target = 100
from_collapse_limit = 13
join_collapse_limit = 13

Note: we load lumps of data ea. week. Then primarily it is a readonly
database.

#2Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: David Link (#1)
Re: EXPLAIN SELECT .. does not return

On 12/5/05, David Link <dlink@soundscan.com> wrote:

Hi, This has become a major problem for us. Thank you in advance for
your help.

OS: SUSE Linux 2.6.5-7.191-bigsmp
PostgreSQL: 7.4.8
Application: ModPerl Web application using DBI.pm
Database size: 100 Gb, 1025 Tables.

Problem: EXPLAIN SELECT ... does not return.

Description:

The Application uses an EXPLAIN cost to determine whether a client's dynamic
request for data is too demanding for the server so it can gracefully deny
them. (Currently, anything over cost=0.00..500000.00).

The system gets about 3000 page requests a day.

Certain SQL Queries, I believe those with many table joins, when run as
EXPLAIN plans, never return. As a result they seem to stay churning in the
system. Once that happens other queries build up and the performance of the
whole database server grinds to a halt. Postgresql never dies, but
eventually, user requests start timing out.

This happens on average two or three times a week. I kill an offending
process and all's well again. I have not been able to identify with
certainty
an offending SQL statement.

Config params, that have changed from default:

tcpip_socket = true
max_connections = 200
shared_buffers = 2000
sort_mem = 1048576
vacuum_mem = 65536
max_fsm_pages = 100000
max_fsm_relations = 1000
max_files_per_process = 1000
fsync = false
wal_sync_method = fsync
wal_buffers = 800
checkpoint_segments = 30
commit_delay = 100
commit_siblings = 50
effective_cache_size = 1000
random_page_cost = 4
geqo = true
geqo_threshold = 14
default_statistics_target = 100
from_collapse_limit = 13
join_collapse_limit = 13

Note: we load lumps of data ea. week. Then primarily it is a readonly
database.

when you have thoses cases, you can take a look in pg_stats_activity
to find the offending query...

or simply logs all queries

--
Atentamente,
Jaime Casanova
(DBA: DataBase Aniquilator ;)

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Link (#1)
Re: EXPLAIN SELECT .. does not return

David Link <dlink@soundscan.com> writes:

Certain SQL Queries, I believe those with many table joins, when run as
EXPLAIN plans, never return.

I'd guess that one or all of these settings are excessive:

geqo_threshold = 14
from_collapse_limit = 13
join_collapse_limit = 13

Keep in mind that the planning cost is exponential in these limits,
eg geqo_threshold = 14 probably allows planning times about 14 times
greater than geqo_threshold = 13.

While I'm looking:

shared_buffers = 2000

That seems extremely low for modern machines.

sort_mem = 1048576

That, on the other hand, is almost certainly way too high for a system-wide
setting. You're promising you have 1Gb available for *each* sort.

max_fsm_pages = 100000

And this way too low for a 100Gb database, unless most of the tables
never see any UPDATEs or DELETEs.

wal_buffers = 800

Seems a bit high, especially considering you have fsync disabled and
thus there is no benefit whatever to buffering WAL.

commit_delay = 100
commit_siblings = 50

Have you measured any benefit to having this turned on?

All in all it looks like your configuration settings were chosen by
throwing darts :-(

regards, tom lane

#4David Link
dlink@soundscan.com
In reply to: Jaime Casanova (#2)
Re: EXPLAIN SELECT .. does not return

Jaime Casanova wrote:

when you have thoses cases, you can take a look in pg_stats_activity
to find the offending query...

or simply logs all queries

Thanks for the advice. I also turned on stat_command_string

#5David Link
dlink@soundscan.com
In reply to: Tom Lane (#3)
Re: EXPLAIN SELECT .. does not return

Tom Lane wrote:

David Link <dlink@soundscan.com> writes:

Certain SQL Queries, I believe those with many table joins, when run as
EXPLAIN plans, never return.

I'd guess that one or all of these settings are excessive:

geqo_threshold = 14
from_collapse_limit = 13
join_collapse_limit = 13

Keep in mind that the planning cost is exponential in these limits,
eg geqo_threshold = 14 probably allows planning times about 14 times
greater than geqo_threshold = 13.

While I'm looking:

shared_buffers = 2000

That seems extremely low for modern machines.

sort_mem = 1048576

That, on the other hand, is almost certainly way too high for a system-wide
setting. You're promising you have 1Gb available for *each* sort.

max_fsm_pages = 100000

And this way too low for a 100Gb database, unless most of the tables
never see any UPDATEs or DELETEs.

wal_buffers = 800

Seems a bit high, especially considering you have fsync disabled and
thus there is no benefit whatever to buffering WAL.

commit_delay = 100
commit_siblings = 50

Have you measured any benefit to having this turned on?

All in all it looks like your configuration settings were chosen by
throwing darts :-(

regards, tom lane

Thanks for your reply, Tom. Different folks have made different
suggestions. Can you suggest more reasonable values for these? But
more importantly, do you think the problem I am having is due to these
configuration short comings?

Thanks much.
David

#6Joshua D. Drake
jd@commandprompt.com
In reply to: David Link (#5)
Re: EXPLAIN SELECT .. does not return

Thanks for your reply, Tom. Different folks have made different
suggestions. Can you suggest more reasonable values for these? But
more importantly, do you think the problem I am having is due to these
configuration short comings?

Thanks much.

David take a look at:
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

Joshua D. Drake

Show quoted text

David

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Link (#5)
Re: EXPLAIN SELECT .. does not return

David Link <dlink@soundscan.com> writes:

more importantly, do you think the problem I am having is due to these
configuration short comings?

Yeah, the planning thresholds ...

regards, tom lane