Slow query on primary server runs fast on hot standby

Started by Kaixi Luoover 9 years ago3 messagesgeneral
Jump to latest
#1Kaixi Luo
kaixiluo@gmail.com

Hello,

I have a primary PostgreSQL server with 64GB of RAM that is replicated
using streaming replication to a hot standby server with 16GB of RAM. My
problem is as follows: I've detected a query that takes a long time to run
on my primary server but runs very fast on the standby server. I did an
EXPLAIN ANALYZE on the query:

EXPLAIN ANALYZE
SELECT this_.id AS id1_31_0_,
this_.nom AS nom2_31_0_,
this_.nom_slug AS nom3_31_0_,
this_.descripcio AS descripc4_31_0_,
this_.url AS url5_31_0_,
this_.data_captura AS data6_31_0_,
this_.data_publicacio AS data7_31_0_,
this_.propietari AS propieta8_31_0_,
this_.privacitat AS privacit9_31_0_,
this_.desnivellpujada AS desnive10_31_0_,
this_.desnivellbaixada AS desnive11_31_0_,
this_.longitud AS longitu13_31_0_,
this_.beginpoint AS beginpo14_31_0_,
this_.endpoint AS endpoin15_31_0_,
this_.caixa3d AS caixa16_31_0_,
this_.pic_id AS pic17_31_0_,
this_.skill AS skill18_31_0_,
this_.spatial_type AS spatial19_31_0_,
this_.tags_cached AS tags20_31_0_,
this_.images_cached AS images21_31_0_,
this_.ncomments AS ncommen22_31_0_,
this_.group_order AS group23_31_0_,
this_.author AS author24_31_0_,
this_.proper_a AS proper25_31_0_,
this_.duration AS duratio26_31_0_,
this_.isloop AS isloop27_31_0_,
this_.seo_country AS seo28_31_0_,
this_.seo_region AS seo29_31_0_,
this_.seo_place AS seo30_31_0_,
this_.source AS source31_31_0_,
this_.source_name AS source32_31_0_,
this_.api_key AS api33_31_0_,
this_.ratingui AS ratingu34_31_0_,
this_.nratings AS nrating35_31_0_,
this_.trailrank AS trailra36_31_0_,
this_.ncoords AS ncoords37_31_0_,
this_.egeom AS egeom38_31_0_,
this_.elevels AS elevels39_31_0_,
this_.elevations AS elevati40_31_0_,
this_.nphotoswpts AS nphotos41_31_0_,
this_.nfavourited AS nfavour42_31_0_,
this_.ncompanions AS ncompan43_31_0_,
this_.group_id AS group44_31_0_
FROM spatial_artifact this_
WHERE this_.group_id IS NULL
AND this_.propietari=7649
ORDER BY this_.id DESC LIMIT 20

*--PRIMARY SERVER *(EXPLAIN ANALYZE output)

"Limit (cost=0.43..22734.71 rows=20 width=604) (actual
time=1804.124..293469.085 rows=20 loops=1)"
" -> Index Scan Backward using "PK_SPATIAL_ARTIFACT" on spatial_artifact
this_ (cost=0.43..7776260.84 rows=6841 width=604) (actual
time=1804.121..293469.056 rows=20 loops=1)"
" Filter: ((group_id IS NULL) AND (propietari = 7649))"
" Rows Removed by Filter: 2848286"
"Total runtime: *293469.135 ms*"

*--STANDBY SERVER *(EXPLAIN ANALYZE output)

"Limit (cost=23533.73..23533.78 rows=20 width=604) (actual
time=2.566..2.569 rows=20 loops=1)"
" -> Sort (cost=23533.73..23550.83 rows=6841 width=604) (actual
time=2.566..2.567 rows=20 loops=1)"
" Sort Key: id"
" Sort Method: top-N heapsort Memory: 35kB"
" -> Index Scan using idx_own_spas on spatial_artifact this_
(cost=0.43..23351.70 rows=6841 width=604) (actual time=0.037..2.119
rows=618 loops=1)"
" Index Cond: (propietari = 7649)"
"Total runtime: *2.612 ms*"

I've run ANALYZE on my table and have reindexed the index idx_own_spas on
my primary server, but it hasn't helped.

Here is the postgresql config of my two servers:

*--PRIMARY SERVER *(postgresql.conf)
shared_buffers = 8GB
work_mem = 42MB
maintenance_work_mem = 2GB
effective_cache_size = 44GB

*--**STANDBY** SERVER *(postgresql.conf)
shared_buffers = 800MB
work_mem = 20MB
maintenance_work_mem = 128MB
effective_cache_size = 1024MB

Could you shed some light into why this is happening? Thank you.

Cheers,

Kaixi

#2Sameer Kumar
sameer.kumar@ashnik.com
In reply to: Kaixi Luo (#1)
Re: Slow query on primary server runs fast on hot standby

On Fri, Jul 15, 2016 at 4:17 PM Kaixi Luo <kaixiluo@gmail.com> wrote:

Hello,

I have a primary PostgreSQL server with 64GB of RAM that is replicated
using streaming replication to a hot standby server with 16GB of RAM.

Do you have different hardware configuration for master and standby? I am
not sure if that is the right thing to do. I usually prefer them to be
equally sized and have same configuration. But may be someone else with
more experience can comment better on that aspect.

My problem is as follows: I've detected a query that takes a long time to
run on my primary server but runs very fast on the standby server. I did an
EXPLAIN ANALYZE on the query:

What is the version of PostgreSQL on both servers? Check with

select version();

EXPLAIN ANALYZE

SELECT this_.id AS id1_31_0_,
this_.nom AS nom2_31_0_,
this_.nom_slug AS nom3_31_0_,
this_.descripcio AS descripc4_31_0_,
this_.url AS url5_31_0_,
this_.data_captura AS data6_31_0_,
this_.data_publicacio AS data7_31_0_,
this_.propietari AS propieta8_31_0_,
this_.privacitat AS privacit9_31_0_,
this_.desnivellpujada AS desnive10_31_0_,
this_.desnivellbaixada AS desnive11_31_0_,
this_.longitud AS longitu13_31_0_,
this_.beginpoint AS beginpo14_31_0_,
this_.endpoint AS endpoin15_31_0_,
this_.caixa3d AS caixa16_31_0_,
this_.pic_id AS pic17_31_0_,
this_.skill AS skill18_31_0_,
this_.spatial_type AS spatial19_31_0_,
this_.tags_cached AS tags20_31_0_,
this_.images_cached AS images21_31_0_,
this_.ncomments AS ncommen22_31_0_,
this_.group_order AS group23_31_0_,
this_.author AS author24_31_0_,
this_.proper_a AS proper25_31_0_,
this_.duration AS duratio26_31_0_,
this_.isloop AS isloop27_31_0_,
this_.seo_country AS seo28_31_0_,
this_.seo_region AS seo29_31_0_,
this_.seo_place AS seo30_31_0_,
this_.source AS source31_31_0_,
this_.source_name AS source32_31_0_,
this_.api_key AS api33_31_0_,
this_.ratingui AS ratingu34_31_0_,
this_.nratings AS nrating35_31_0_,
this_.trailrank AS trailra36_31_0_,
this_.ncoords AS ncoords37_31_0_,
this_.egeom AS egeom38_31_0_,
this_.elevels AS elevels39_31_0_,
this_.elevations AS elevati40_31_0_,
this_.nphotoswpts AS nphotos41_31_0_,
this_.nfavourited AS nfavour42_31_0_,
this_.ncompanions AS ncompan43_31_0_,
this_.group_id AS group44_31_0_
FROM spatial_artifact this_
WHERE this_.group_id IS NULL
AND this_.propietari=7649
ORDER BY this_.id DESC LIMIT 20

*--PRIMARY SERVER *(EXPLAIN ANALYZE output)

"Limit (cost=0.43..22734.71 rows=20 width=604) (actual
time=1804.124..293469.085 rows=20 loops=1)"
" -> Index Scan Backward using "PK_SPATIAL_ARTIFACT" on spatial_artifact
this_ (cost=0.43..7776260.84 rows=6841 width=604) (actual
time=1804.121..293469.056 rows=20 loops=1)"
" Filter: ((group_id IS NULL) AND (propietari = 7649))"
" Rows Removed by Filter: 2848286"
"Total runtime: *293469.135 ms*"

*--STANDBY SERVER *(EXPLAIN ANALYZE output)

"Limit (cost=23533.73..23533.78 rows=20 width=604) (actual
time=2.566..2.569 rows=20 loops=1)"
" -> Sort (cost=23533.73..23550.83 rows=6841 width=604) (actual
time=2.566..2.567 rows=20 loops=1)"
" Sort Key: id"
" Sort Method: top-N heapsort Memory: 35kB"
" -> Index Scan using idx_own_spas on spatial_artifact this_
(cost=0.43..23351.70 rows=6841 width=604) (actual time=0.037..2.119
rows=618 loops=1)"
" Index Cond: (propietari = 7649)"
"Total runtime: *2.612 ms*"

I've run ANALYZE on my table and have reindexed the index idx_own_spas on
my primary server, but it hasn't helped.

Here is the postgresql config of my two servers:

*--PRIMARY SERVER *(postgresql.conf)
shared_buffers = 8GB
work_mem = 42MB
maintenance_work_mem = 2GB
effective_cache_size = 44GB

what are the values of random_page_cost and seq_page_cost?

Also what might help here is the number of rows and pages in the table -
select relpages,reltuples, relname from pg_class where relname in ('
idx_own_spas ','spatial_artifact','PK_SPATIAL_ARTIFACT');

*--**STANDBY** SERVER *(postgresql.conf)
shared_buffers = 800MB
work_mem = 20MB
maintenance_work_mem = 128MB
effective_cache_size = 1024MB

Could you shed some light into why this is happening? Thank you.

Cheers,

Kaixi

--
--
Best Regards
Sameer Kumar | DB Solution Architect
*ASHNIK PTE. LTD.*

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com

#3Kaixi Luo
kaixiluo@gmail.com
In reply to: Sameer Kumar (#2)
Re: Slow query on primary server runs fast on hot standby

Do you have different hardware configuration for master and standby?

Unfortunately, I do. Changing this is beyond my control at the moment.
Also, I made a mistake in my first email. The standby server has 32GB of
RAM. Here are the specs:

*PRIMARY SERVER*
CPU: Intel Xeon E5-1650 v2 @ 3.50GHz
RAM: 64GBDDR3 ECC
SSD disk: SAMSUNG MZ7WD240HAFV-00003

*STANDBY SERVER*
CPU: Intel(R) Xeon(R) CPU E31245 @ 3.30GHz
RAM: 32GBDDR3 ECC
SAS disk: SEAGATE ST3300657SS

What is the version of PostgreSQL on both servers?

PostgreSQL 9.3.4

what are the values of random_page_cost and seq_page_cost?

Both instances are using the default values:
random_page_cost = 4
seq_page_cost = 1

select relpages,reltuples, relname from pg_class where relname in ('

idx_own_spas ','spatial_artifact','PK_SPATIAL_ARTIFACT');

relpages | reltuples | relname
----------+-------------+---------------------
49217 | 9.70814e+06 | PK_SPATIAL_ARTIFACT
14329 | 5.22472e+06 | idx_own_spas
3423824 | 1.11087e+07 | spatial_artifact
(3 rows)

On Fri, Jul 15, 2016 at 11:12 AM, Sameer Kumar <sameer.kumar@ashnik.com>
wrote:

Show quoted text

On Fri, Jul 15, 2016 at 4:17 PM Kaixi Luo <kaixiluo@gmail.com> wrote:

Hello,

I have a primary PostgreSQL server with 64GB of RAM that is replicated
using streaming replication to a hot standby server with 16GB of RAM.

Do you have different hardware configuration for master and standby? I am
not sure if that is the right thing to do. I usually prefer them to be
equally sized and have same configuration. But may be someone else with
more experience can comment better on that aspect.

My problem is as follows: I've detected a query that takes a long time to
run on my primary server but runs very fast on the standby server. I did an
EXPLAIN ANALYZE on the query:

What is the version of PostgreSQL on both servers? Check with

select version();

EXPLAIN ANALYZE

SELECT this_.id AS id1_31_0_,
this_.nom AS nom2_31_0_,
this_.nom_slug AS nom3_31_0_,
this_.descripcio AS descripc4_31_0_,
this_.url AS url5_31_0_,
this_.data_captura AS data6_31_0_,
this_.data_publicacio AS data7_31_0_,
this_.propietari AS propieta8_31_0_,
this_.privacitat AS privacit9_31_0_,
this_.desnivellpujada AS desnive10_31_0_,
this_.desnivellbaixada AS desnive11_31_0_,
this_.longitud AS longitu13_31_0_,
this_.beginpoint AS beginpo14_31_0_,
this_.endpoint AS endpoin15_31_0_,
this_.caixa3d AS caixa16_31_0_,
this_.pic_id AS pic17_31_0_,
this_.skill AS skill18_31_0_,
this_.spatial_type AS spatial19_31_0_,
this_.tags_cached AS tags20_31_0_,
this_.images_cached AS images21_31_0_,
this_.ncomments AS ncommen22_31_0_,
this_.group_order AS group23_31_0_,
this_.author AS author24_31_0_,
this_.proper_a AS proper25_31_0_,
this_.duration AS duratio26_31_0_,
this_.isloop AS isloop27_31_0_,
this_.seo_country AS seo28_31_0_,
this_.seo_region AS seo29_31_0_,
this_.seo_place AS seo30_31_0_,
this_.source AS source31_31_0_,
this_.source_name AS source32_31_0_,
this_.api_key AS api33_31_0_,
this_.ratingui AS ratingu34_31_0_,
this_.nratings AS nrating35_31_0_,
this_.trailrank AS trailra36_31_0_,
this_.ncoords AS ncoords37_31_0_,
this_.egeom AS egeom38_31_0_,
this_.elevels AS elevels39_31_0_,
this_.elevations AS elevati40_31_0_,
this_.nphotoswpts AS nphotos41_31_0_,
this_.nfavourited AS nfavour42_31_0_,
this_.ncompanions AS ncompan43_31_0_,
this_.group_id AS group44_31_0_
FROM spatial_artifact this_
WHERE this_.group_id IS NULL
AND this_.propietari=7649
ORDER BY this_.id DESC LIMIT 20

*--PRIMARY SERVER *(EXPLAIN ANALYZE output)

"Limit (cost=0.43..22734.71 rows=20 width=604) (actual
time=1804.124..293469.085 rows=20 loops=1)"
" -> Index Scan Backward using "PK_SPATIAL_ARTIFACT" on
spatial_artifact this_ (cost=0.43..7776260.84 rows=6841 width=604) (actual
time=1804.121..293469.056 rows=20 loops=1)"
" Filter: ((group_id IS NULL) AND (propietari = 7649))"
" Rows Removed by Filter: 2848286"
"Total runtime: *293469.135 ms*"

*--STANDBY SERVER *(EXPLAIN ANALYZE output)

"Limit (cost=23533.73..23533.78 rows=20 width=604) (actual
time=2.566..2.569 rows=20 loops=1)"
" -> Sort (cost=23533.73..23550.83 rows=6841 width=604) (actual
time=2.566..2.567 rows=20 loops=1)"
" Sort Key: id"
" Sort Method: top-N heapsort Memory: 35kB"
" -> Index Scan using idx_own_spas on spatial_artifact this_
(cost=0.43..23351.70 rows=6841 width=604) (actual time=0.037..2.119
rows=618 loops=1)"
" Index Cond: (propietari = 7649)"
"Total runtime: *2.612 ms*"

I've run ANALYZE on my table and have reindexed the index idx_own_spas
on my primary server, but it hasn't helped.

Here is the postgresql config of my two servers:

*--PRIMARY SERVER *(postgresql.conf)
shared_buffers = 8GB
work_mem = 42MB
maintenance_work_mem = 2GB
effective_cache_size = 44GB

what are the values of random_page_cost and seq_page_cost?

Also what might help here is the number of rows and pages in the table -
select relpages,reltuples, relname from pg_class where relname in ('
idx_own_spas ','spatial_artifact','PK_SPATIAL_ARTIFACT');

*--**STANDBY** SERVER *(postgresql.conf)
shared_buffers = 800MB
work_mem = 20MB
maintenance_work_mem = 128MB
effective_cache_size = 1024MB

Could you shed some light into why this is happening? Thank you.

Cheers,

Kaixi

--
--
Best Regards
Sameer Kumar | DB Solution Architect
*ASHNIK PTE. LTD.*

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com