PostgreSQL 10.0 SELECT LIMIT performance problem
Hi!
We have some serious performance problem with SELECTS when add limit, for example, execute time without limit ~250msec (316 rows returned), when add limit 20, execute time 15 – 50secs.
We have select with subselect filter:
select a.id, a.jdata
from oss_alarms a
where
a.jdata->>'dn' in
(
select o.jdata->>'ossDn'
from oss_objects o, tvc_entity e
where e.jtype='object'
and o.jdata->>'sid'=e.jdata->>'siteId'
and tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%zaube%')
)
order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) desc
limit 20;
Select used to get data for user interface table view window. Table view has several filters what users can apply and subselect realize one of them. Subselect execution time always fine, but main select have some serous performance problems. When subselect replaced with static values, select execute time is fine.
Table oss_alarms very often updated but only new records, there is about 10`000 to 30`000 new records per day and, when they processed, there are no more changes.
Version string PostgreSQL 10.0 on powerpc64le-unknown-linux-gnu, compiled by gcc (GCC) 6.3.1 20170515 (Advance-Toolchain-at10.0) IBM AT 10 branch, based on subversion id 248065., 64-bit
I’m gathered some technical information about DB structure and execution (see attachment).
Mareks Kalnačs
Software Engineer
Software Development Department
T: +371 67628888
M: +371 26479242
@: Mareks.Kalnacs@datakom.lv<mailto:Mareks.Kalnacs@datakom.lv>
www.datakom.lv<http://www.datakom.lv>, SIA Datakom, Vienības gatve 109, Rīga, Latvija, LV-1058
[cid:image003.png@01D44A8B.E3896950]
________________________________
Šis elektroniskā pasta sūtījums un tajā esošie dokumenti ir paredzēts tikai norādītajam adresātam(-iem). Tas satur konfidenciālu informāciju un var saturēt arī priviliģētu informāciju. Tā satura pilnīga vai daļēja nesankcionēta izpaušana, izmantošana vai tālāka izplatīšana jebkādā veidā ir aizliegta. Ja šis elektroniskā pasta sūtījums saņemts kļūdas dēļ, lūdzam sazināties ar sūtītāju, nosūtot atbildes e-pasta vēstuli, un izdzēst šo vēstuli. Drošības apsvērumu dēļ SIA Datakom var veikt elektroniskā pasta sūtījumu un tiem pievienoto dokumentu satura monitoringu. Ja vien šajā vēstulē vai pievienotajos dokumentus nav nepārprotami teikts pretējais, šis elektroniskā pasta sūtījums vai tam pievienotie dokumenti nav uzskatāmi par SIA Datakom piedāvājumu slēgt darījumu, apņemšanos vai gatavību uzņemties saistības. Sūtītājs: SIA Datakom, vienotais reģistrācijas nr: 40103142605, Juridiskā adrese: Vienības gatve 109, Rīga, LV-1058, Latvija
This electronic transmission and attached documents is intended only for the addressee(s). It contains confidential information and may also contain privileged information. If you are not an intended recipient, you may not use, read, retransmit, disseminate or take any action in reliance upon it. If this electronic transmission is received in error, please contact the sender by sending a reply e-mail and delete this message. SIA Datakom reserves the right to monitor all e-mail communications through its networks. Unless this letter or the accompanying documents clearly stated otherwise, by this electronic transmission does not constitute a contract offer, a contract amendment or an acceptance of a contract offer on behalf of the SIA Datakom. Sender: SIA Datakom, registration number: 40103142605, Latvia. Registered office: Vienibas gatve 109, Riga, LV-1058, Latvia
________________________________
ср, 12 сент. 2018 г. в 11:39, Mareks Kalnačs <Mareks.Kalnacs@datakom.lv>:
We have some serious performance problem with SELECTS when add limit, for
example, execute time without limit ~250msec (316 rows returned), when add
limit 20, execute time 15 – 50secs.
Hi, Mareks.
As this is not actually a bug, it's better to use pgsql-performance or
pgsql-general for such questions next time.
We have select with subselect filter:
select a.id, a.jdata
from oss_alarms a
where
a.jdata->>'dn' in
(
select o.jdata->>'ossDn'
from oss_objects o, tvc_entity e
where e.jtype='object'
and
o.jdata->>'sid'=e.jdata->>'siteId'and
tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%zaube%'))
order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) desc
limit 20;
As execution plan without limit shows, your join conditions yield 316 rows,
explicit sort is fast for this amount
For the plan with the limit, planner prefers to use
`idx_oss_alarms_alarm_time` index, to avoid extra sort.
This leads to:
Rows Removed by Join Filter: 10717797
I.e. you're reading 10M rows via index scan and later throw them away, as
they do not match your join condition:
Join Filter: ((a.jdata ->> 'dn'::text) = (o.jdata ->>
'ossDn'::text))
It looks like a.jdata->'dn' and a.jdata->'alarmTime' are correlated,
although planner doesn't knows that.
PostgreSQL 10 has `CREATE STATISTICS`, but it works on table columns, not
expressions.
You can try disabling `idx_oss_alarms_alarm_time` by using expression in
`ORDER BY`:
order by (idx_oss_alarms_alarm_timetvc_convert_array_to_date(a.jdata ->
'alarmTime'))+INTERVAL '0' desc
But be warned — this can help in some situations and make things worse in
others.
I would recommend to move JOIN and ORDER BY columns out of JSON and make
them direct table columns.
--
Victor Yegorov
Attachments:
2018-09-12 10:31 GMT+02:00 Mareks Kalnačs <Mareks.Kalnacs@datakom.lv>:
Hi!
We have some serious performance problem with SELECTS when add limit, for
example, execute time without limit ~250msec (316 rows returned), when add
limit 20, execute time 15 – 50secs.We have select with subselect filter:
select a.id, a.jdata
from oss_alarms a
where
a.jdata->>'dn' in
(
select o.jdata->>'ossDn'
from oss_objects o, tvc_entity e
where e.jtype='object'
and
o.jdata->>'sid'=e.jdata->>'siteId'and
tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%zaube%'))
order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) desc
limit 20;
Select used to get data for user interface table view window. Table view
has several filters what users can apply and subselect realize one of them.
Subselect execution time always fine, but main select have some serous
performance problems. When subselect replaced with static values, select
execute time is fine.Table oss_alarms very often updated but only new records, there is about
10`000 to 30`000 new records per day and, when they processed, there are no
more changes.Version string PostgreSQL 10.0 on powerpc64le-unknown-linux-gnu,
compiled by gcc (GCC) 6.3.1 20170515 (Advance-Toolchain-at10.0) IBM AT 10
branch, based on subversion id 248065., 64-bitI’m gathered some technical information about DB structure and execution
(see attachment).
Sometimes LIMIT clause can confuse optimizator, when data are not uniform.
You can try OFFSET 0 trick:
Original query: SELECT * FROM t ORDER BY c LIMIT 10
transform to:
SELECT * FROM (original query without limit OFFSET 0) x LIMIT 10;
Show quoted text
*Mareks Kalnačs*
Software Engineer
Software Development Department
T: +371 67628888
M: +371 26479242
@: Mareks.Kalnacs@datakom.lv*www.datakom.lv <http://www.datakom.lv>*, SIA Datakom, Vienības gatve
109, Rīga, Latvija, LV-1058------------------------------
Šis elektroniskā pasta sūtījums un tajā esošie dokumenti ir paredzēts
tikai norādītajam adresātam(-iem). Tas satur konfidenciālu informāciju un
var saturēt arī priviliģētu informāciju. Tā satura pilnīga vai daļēja
nesankcionēta izpaušana, izmantošana vai tālāka izplatīšana jebkādā veidā
ir aizliegta. Ja šis elektroniskā pasta sūtījums saņemts kļūdas dēļ, lūdzam
sazināties ar sūtītāju, nosūtot atbildes e-pasta vēstuli, un izdzēst šo
vēstuli. Drošības apsvērumu dēļ SIA Datakom var veikt elektroniskā pasta
sūtījumu un tiem pievienoto dokumentu satura monitoringu. Ja vien šajā
vēstulē vai pievienotajos dokumentus nav nepārprotami teikts pretējais, šis
elektroniskā pasta sūtījums vai tam pievienotie dokumenti nav uzskatāmi par
SIA Datakom piedāvājumu slēgt darījumu, apņemšanos vai gatavību uzņemties
saistības. Sūtītājs: SIA Datakom, vienotais reģistrācijas nr: 40103142605,
Juridiskā adrese: Vienības gatve 109, Rīga, LV-1058, LatvijaThis electronic transmission and attached documents is intended only for
the addressee(s). It contains confidential information and may also contain
privileged information. If you are not an intended recipient, you may not
use, read, retransmit, disseminate or take any action in reliance upon it.
If this electronic transmission is received in error, please contact the
sender by sending a reply e-mail and delete this message. SIA Datakom
reserves the right to monitor all e-mail communications through its
networks. Unless this letter or the accompanying documents clearly stated
otherwise, by this electronic transmission does not constitute a contract
offer, a contract amendment or an acceptance of a contract offer on behalf
of the SIA Datakom. Sender: SIA Datakom, registration number: 40103142605,
Latvia. Registered office: Vienibas gatve 109, Riga, LV-1058, Latvia------------------------------
Attachments:
Hi Pavel!
He know how to trick SQL, but this not a solution. We have different filters with different distribution fields, for example, jdata->>'dn' distribution is 2.8%, but jdata->>’tech’ 25%, this means we must get count and based on count modify select. This is DB server job, not an application server job.
Best regards,
Mareks Kalnačs
Software Engineer
Software Development Department
T: +371 67628888
M: +371 26479242
@: Mareks.Kalnacs@datakom.lv<mailto:Mareks.Kalnacs@datakom.lv>
www.datakom.lv<http://www.datakom.lv>, SIA Datakom, Vienības gatve 109, Rīga, Latvija, LV-1058
[cid:image002.png@01D44AB3.AED25B90]
From: Pavel Stehule <pavel.stehule@gmail.com>
Sent: trešdiena, 2018. gada 12. septembris 15:43
To: Mareks Kalnačs <Mareks.Kalnacs@datakom.lv>
Cc: pgsql-bugs@postgresql.org; Māris Rucis <Maris.Rucis@datakom.lv>
Subject: Re: PostgreSQL 10.0 SELECT LIMIT performance problem
2018-09-12 10:31 GMT+02:00 Mareks Kalnačs <Mareks.Kalnacs@datakom.lv<mailto:Mareks.Kalnacs@datakom.lv>>:
Hi!
We have some serious performance problem with SELECTS when add limit, for example, execute time without limit ~250msec (316 rows returned), when add limit 20, execute time 15 – 50secs.
We have select with subselect filter:
select a.id<http://a.id>, a.jdata
from oss_alarms a
where
a.jdata->>'dn' in
(
select o.jdata->>'ossDn'
from oss_objects o, tvc_entity e
where e.jtype='object'
and o.jdata->>'sid'=e.jdata->>'siteId'
and tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%zaube%')
)
order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) desc
limit 20;
Select used to get data for user interface table view window. Table view has several filters what users can apply and subselect realize one of them. Subselect execution time always fine, but main select have some serous performance problems. When subselect replaced with static values, select execute time is fine.
Table oss_alarms very often updated but only new records, there is about 10`000 to 30`000 new records per day and, when they processed, there are no more changes.
Version string PostgreSQL 10.0 on powerpc64le-unknown-linux-gnu, compiled by gcc (GCC) 6.3.1 20170515 (Advance-Toolchain-at10.0) IBM AT 10 branch, based on subversion id 248065., 64-bit
I’m gathered some technical information about DB structure and execution (see attachment).
Sometimes LIMIT clause can confuse optimizator, when data are not uniform.
You can try OFFSET 0 trick:
Original query: SELECT * FROM t ORDER BY c LIMIT 10
transform to:
SELECT * FROM (original query without limit OFFSET 0) x LIMIT 10;
Mareks Kalnačs
Software Engineer
Software Development Department
T: +371 67628888
M: +371 26479242
@: Mareks.Kalnacs@datakom.lv<mailto:Mareks.Kalnacs@datakom.lv>
www.datakom.lv<http://www.datakom.lv>, SIA Datakom, Vienības gatve 109, Rīga, Latvija, LV-1058
[cid:image001.png@01D44AB1.2B8636F0]
________________________________
Šis elektroniskā pasta sūtījums un tajā esošie dokumenti ir paredzēts tikai norādītajam adresātam(-iem). Tas satur konfidenciālu informāciju un var saturēt arī priviliģētu informāciju. Tā satura pilnīga vai daļēja nesankcionēta izpaušana, izmantošana vai tālāka izplatīšana jebkādā veidā ir aizliegta. Ja šis elektroniskā pasta sūtījums saņemts kļūdas dēļ, lūdzam sazināties ar sūtītāju, nosūtot atbildes e-pasta vēstuli, un izdzēst šo vēstuli. Drošības apsvērumu dēļ SIA Datakom var veikt elektroniskā pasta sūtījumu un tiem pievienoto dokumentu satura monitoringu. Ja vien šajā vēstulē vai pievienotajos dokumentus nav nepārprotami teikts pretējais, šis elektroniskā pasta sūtījums vai tam pievienotie dokumenti nav uzskatāmi par SIA Datakom piedāvājumu slēgt darījumu, apņemšanos vai gatavību uzņemties saistības. Sūtītājs: SIA Datakom, vienotais reģistrācijas nr: 40103142605, Juridiskā adrese: Vienības gatve 109, Rīga, LV-1058, Latvija
This electronic transmission and attached documents is intended only for the addressee(s). It contains confidential information and may also contain privileged information. If you are not an intended recipient, you may not use, read, retransmit, disseminate or take any action in reliance upon it. If this electronic transmission is received in error, please contact the sender by sending a reply e-mail and delete this message. SIA Datakom reserves the right to monitor all e-mail communications through its networks. Unless this letter or the accompanying documents clearly stated otherwise, by this electronic transmission does not constitute a contract offer, a contract amendment or an acceptance of a contract offer on behalf of the SIA Datakom. Sender: SIA Datakom, registration number: 40103142605, Latvia. Registered office: Vienibas gatve 109, Riga, LV-1058, Latvia
________________________________
________________________________
Šis elektroniskā pasta sūtījums un tajā esošie dokumenti ir paredzēts tikai norādītajam adresātam(-iem). Tas satur konfidenciālu informāciju un var saturēt arī priviliģētu informāciju. Tā satura pilnīga vai daļēja nesankcionēta izpaušana, izmantošana vai tālāka izplatīšana jebkādā veidā ir aizliegta. Ja šis elektroniskā pasta sūtījums saņemts kļūdas dēļ, lūdzam sazināties ar sūtītāju, nosūtot atbildes e-pasta vēstuli, un izdzēst šo vēstuli. Drošības apsvērumu dēļ SIA Datakom var veikt elektroniskā pasta sūtījumu un tiem pievienoto dokumentu satura monitoringu. Ja vien šajā vēstulē vai pievienotajos dokumentus nav nepārprotami teikts pretējais, šis elektroniskā pasta sūtījums vai tam pievienotie dokumenti nav uzskatāmi par SIA Datakom piedāvājumu slēgt darījumu, apņemšanos vai gatavību uzņemties saistības. Sūtītājs: SIA Datakom, vienotais reģistrācijas nr: 40103142605, Juridiskā adrese: Vienības gatve 109, Rīga, LV-1058, Latvija
This electronic transmission and attached documents is intended only for the addressee(s). It contains confidential information and may also contain privileged information. If you are not an intended recipient, you may not use, read, retransmit, disseminate or take any action in reliance upon it. If this electronic transmission is received in error, please contact the sender by sending a reply e-mail and delete this message. SIA Datakom reserves the right to monitor all e-mail communications through its networks. Unless this letter or the accompanying documents clearly stated otherwise, by this electronic transmission does not constitute a contract offer, a contract amendment or an acceptance of a contract offer on behalf of the SIA Datakom. Sender: SIA Datakom, registration number: 40103142605, Latvia. Registered office: Vienibas gatve 109, Riga, LV-1058, Latvia
________________________________
Hi Victor!
I will disagree with you about bug. I understand that foreign keys will solve some issues and we already making changes add FK, but this only bypass problem, not solve it and in different conditions we got problem again. Main issue, optimizer not take in account index distribution. In current case main cause actually is subselect. I little bit played around with indexes and different select conditions and got very strange behavior. What we done, add new index for tvc_entity table (idx_tvc_entity_sid_u) and run full analyze for all tables and got problem in reverse, now optimizer thinks that he will get less rows from subselect.
Subselect:
select o.jdata->>'ossDn'
from oss_objects o, tvc_entity e
where e.jtype='object'
and e.jdata->>'siteId' IS NOT NULL
and o.jdata->>'sid'=e.jdata->>'siteId'
and tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%za%')
oss_objects.jdata->>’sid’ distribution is 4.2% , null value count 319, total rows 31693, but execution plan always think that real rows returned will be less than actual:
Gather (cost=1014.16..5280.19 rows=9 width=32) (actual time=1.304..18.982 rows=1291 loops=1)
-> Nested Loop (cost=14.16..4279.28 rows=5 width=32) (actual time=3.172..14.497 rows=646 loops=2)
-> Parallel Bitmap Heap Scan on tvcis.tvc_entity e (cost=13.87..1694.08 rows=149 width=208) (actual time=3.017..11.597 rows=36 loops=2)
-> Index Scan using idx_oss_objects_sid on tvcis.oss_objects o (cost=0.29..17.12 rows=23 width=951) (actual time=0.015..0.054 rows=18 loops=71)
If oss_objects.jdata->>’sid’ distribution is 4.2% then for every unique tvc_entity.jdata->>’siteId’ will be selected 31693 * 4.2% = 23 rows, this means join must return more rows than select from tvc_entity, what is real result, but optimized think in reverse.
We have another example, where we don’t understand how to solve it, and this may not a bug. In this example we have full text search index within one table, and, when we hit value with low hit count, select runs slow:
select a.id, a.jdata
from tvcis.oss_alarms a
where (
to_tsvector(
'simple'::regconfig,
a.ts_vector_fields
) @@ to_tsquery(
'simple',
' (decodedType5b6f0753ossAlarmsGenerator) '
)
)
order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) DESC
limit 20;
Bad case:
Sort (cost=8787.37..8799.58 rows=4885 width=809) (actual time=7.351..7.423 rows=586 loops=1)
Without limit:
Planning time: 0.301 ms
Execution time: 7.574 ms
With limit:
Planning time: 0.312 ms
Execution time: 4972.245 ms
Good case:
From count:
-> Parallel Bitmap Heap Scan on tvcis.oss_alarms a (cost=5080.23..455246.09 rows=248615 width=0) (actual time=1142.280..175706.097 rows=199195 loops=3)
Without limit:
Too many rows to test, get count instead
With limit:
Planning time: 0.370 ms
Execution time: 43.131 ms
But counts is a nightmare:
Planning time: 0.236 ms
Execution time: 176343.730 ms
Couple examples of explain plan attached.
Best regards,
Mareks Kalnačs
Software Engineer
Software Development Department
T: +371 67628888
M: +371 26479242
@: Mareks.Kalnacs@datakom.lv<mailto:Mareks.Kalnacs@datakom.lv>
www.datakom.lv<http://www.datakom.lv>, SIA Datakom, Vienības gatve 109, Rīga, Latvija, LV-1058
[cid:image002.png@01D44ACD.C28FEBB0]
From: Victor Yegorov <vyegorov@gmail.com>
Sent: trešdiena, 2018. gada 12. septembris 15:32
To: Mareks Kalnačs <Mareks.Kalnacs@datakom.lv>
Cc: pgsql-bugs@postgresql.org; Māris Rucis <Maris.Rucis@datakom.lv>
Subject: Re: PostgreSQL 10.0 SELECT LIMIT performance problem
ср, 12 сент. 2018 г. в 11:39, Mareks Kalnačs <Mareks.Kalnacs@datakom.lv<mailto:Mareks.Kalnacs@datakom.lv>>:
We have some serious performance problem with SELECTS when add limit, for example, execute time without limit ~250msec (316 rows returned), when add limit 20, execute time 15 – 50secs.
Hi, Mareks.
As this is not actually a bug, it's better to use pgsql-performance or pgsql-general for such questions next time.
We have select with subselect filter:
select a.id<http://a.id>, a.jdata
from oss_alarms a
where
a.jdata->>'dn' in
(
select o.jdata->>'ossDn'
from oss_objects o, tvc_entity e
where e.jtype='object'
and o.jdata->>'sid'=e.jdata->>'siteId'
and tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%zaube%')
)
order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) desc
limit 20;
As execution plan without limit shows, your join conditions yield 316 rows, explicit sort is fast for this amount
For the plan with the limit, planner prefers to use `idx_oss_alarms_alarm_time` index, to avoid extra sort.
This leads to:
Rows Removed by Join Filter: 10717797
I.e. you're reading 10M rows via index scan and later throw them away, as they do not match your join condition:
Join Filter: ((a.jdata ->> 'dn'::text) = (o.jdata ->> 'ossDn'::text))
It looks like a.jdata->'dn' and a.jdata->'alarmTime' are correlated, although planner doesn't knows that.
PostgreSQL 10 has `CREATE STATISTICS`, but it works on table columns, not expressions.
You can try disabling `idx_oss_alarms_alarm_time` by using expression in `ORDER BY`:
order by (idx_oss_alarms_alarm_timetvc_convert_array_to_date(a.jdata -> 'alarmTime'))+INTERVAL '0' desc
But be warned — this can help in some situations and make things worse in others.
I would recommend to move JOIN and ORDER BY columns out of JSON and make them direct table columns.
--
Victor Yegorov
________________________________
Šis elektroniskā pasta sūtījums un tajā esošie dokumenti ir paredzēts tikai norādītajam adresātam(-iem). Tas satur konfidenciālu informāciju un var saturēt arī priviliģētu informāciju. Tā satura pilnīga vai daļēja nesankcionēta izpaušana, izmantošana vai tālāka izplatīšana jebkādā veidā ir aizliegta. Ja šis elektroniskā pasta sūtījums saņemts kļūdas dēļ, lūdzam sazināties ar sūtītāju, nosūtot atbildes e-pasta vēstuli, un izdzēst šo vēstuli. Drošības apsvērumu dēļ SIA Datakom var veikt elektroniskā pasta sūtījumu un tiem pievienoto dokumentu satura monitoringu. Ja vien šajā vēstulē vai pievienotajos dokumentus nav nepārprotami teikts pretējais, šis elektroniskā pasta sūtījums vai tam pievienotie dokumenti nav uzskatāmi par SIA Datakom piedāvājumu slēgt darījumu, apņemšanos vai gatavību uzņemties saistības. Sūtītājs: SIA Datakom, vienotais reģistrācijas nr: 40103142605, Juridiskā adrese: Vienības gatve 109, Rīga, LV-1058, Latvija
This electronic transmission and attached documents is intended only for the addressee(s). It contains confidential information and may also contain privileged information. If you are not an intended recipient, you may not use, read, retransmit, disseminate or take any action in reliance upon it. If this electronic transmission is received in error, please contact the sender by sending a reply e-mail and delete this message. SIA Datakom reserves the right to monitor all e-mail communications through its networks. Unless this letter or the accompanying documents clearly stated otherwise, by this electronic transmission does not constitute a contract offer, a contract amendment or an acceptance of a contract offer on behalf of the SIA Datakom. Sender: SIA Datakom, registration number: 40103142605, Latvia. Registered office: Vienibas gatve 109, Riga, LV-1058, Latvia
________________________________
ср, 12 сент. 2018 г. в 19:21, Mareks Kalnačs <Mareks.Kalnacs@datakom.lv>:
I will disagree with you about bug. I understand that foreign keys will
solve some issues and we already making changes add FK, but this only
bypass problem, not solve it and in different conditions we got problem
again. Main issue, optimizer not take in account index distribution. In
current case main cause actually is subselect. I little bit played around
with indexes and different select conditions and got very strange behavior.
What we done, add new index for tvc_entity table (idx_tvc_entity_sid_u) and
run full analyze for all tables and got problem in reverse, now optimizer
thinks that he will get less rows from subselect.
The fact, that planner is not accurate on the estimates of JSON internal
keys is expected, PostgreSQL is not parsing JSON values when gathering
stats.
You cannot expect planner to be picky about all possible corner cases, it
would make planning time enormously huge.
That is the reason I outlined, that important keys should be extracted into
plain columns.
If you still consider this is a bug, please — send isolated reproducible
test case that demonstrates the bug.
--
Victor Yegorov
Attachments:
Victor Yegorov <vyegorov@gmail.com> writes:
The fact, that planner is not accurate on the estimates of JSON internal
keys is expected, PostgreSQL is not parsing JSON values when gathering
stats.
You cannot expect planner to be picky about all possible corner cases, it
would make planning time enormously huge.
Right. The fact that it doesn't make the right guesses without help
can't be considered to be a bug in all cases. These are engineering
tradeoffs we have to make.
That is the reason I outlined, that important keys should be extracted into
plain columns.
If that seems infeasible from an application standpoint, another
possibility is to make expression indexes on those important keys.
ANALYZE will gather stats on the values of indexed expressions, and then
perhaps the planner will have enough info to make better decisions.
In the other case mentioned, where the problem is a poor guess about
the selectivity of
where (
to_tsvector(
'simple'::regconfig,
a.ts_vector_fields
) @@ to_tsquery(
'simple',
' (decodedType5b6f0753ossAlarmsGenerator) '
)
)
it's the same problem: the planner has no stats that would let it
figure out the selectivity. It can't reasonably extract an
estimate on the fly --- if it did, you'd be complaining that
planning time was too long. The only way to get reasonable behavior
is to set things up so that ANALYZE will accumulate stats about
the values of "to_tsvector('simple',a.ts_vector_fields)". You
can arrange that either by extracting that into a column, or by
making an index on it.
regards, tom lane
On 9/12/18 7:04 PM, Tom Lane wrote:
Victor Yegorov <vyegorov@gmail.com> writes:
The fact, that planner is not accurate on the estimates of JSON internal
keys is expected, PostgreSQL is not parsing JSON values when gathering
stats.
You cannot expect planner to be picky about all possible corner cases, it
would make planning time enormously huge.Right. The fact that it doesn't make the right guesses without help
can't be considered to be a bug in all cases. These are engineering
tradeoffs we have to make.That is the reason I outlined, that important keys should be extracted into
plain columns.If that seems infeasible from an application standpoint, another
possibility is to make expression indexes on those important keys.
ANALYZE will gather stats on the values of indexed expressions, and then
perhaps the planner will have enough info to make better decisions.
Hi,
For what it can help, I wrote this article explaining that:
https://blog.anayrat.info/en/2017/11/26/postgresql---jsonb-and-statistics/
Regards,
Hi!
But we are using value indexes not a json index:
CREATE INDEX idx_oss_alarms_dn
ON oss_alarms
USING btree
((jdata ->> 'dn'::text) COLLATE pg_catalog."default");
CREATE INDEX idx_oss_objects_sid
ON oss_objects
USING btree
((jdata ->> 'sid'::text) COLLATE pg_catalog."default");
CREATE UNIQUE INDEX idx_tvc_entity_sid_u
ON tvc_entity
USING btree
((jdata ->> 'siteId'::text) COLLATE pg_catalog."default")
WHERE (jdata ->> 'siteId'::text) IS NOT NULL;
May be we don’t understand this index behavior?
Mareks
From: Victor Yegorov <vyegorov@gmail.com>
Sent: trešdiena, 2018. gada 12. septembris 19:43
To: Mareks Kalnačs <Mareks.Kalnacs@datakom.lv>
Cc: pgsql-bugs@postgresql.org; Māris Rucis <Maris.Rucis@datakom.lv>
Subject: Re: PostgreSQL 10.0 SELECT LIMIT performance problem
ср, 12 сент. 2018 г. в 19:21, Mareks Kalnačs <Mareks.Kalnacs@datakom.lv<mailto:Mareks.Kalnacs@datakom.lv>>:
I will disagree with you about bug. I understand that foreign keys will solve some issues and we already making changes add FK, but this only bypass problem, not solve it and in different conditions we got problem again. Main issue, optimizer not take in account index distribution. In current case main cause actually is subselect. I little bit played around with indexes and different select conditions and got very strange behavior. What we done, add new index for tvc_entity table (idx_tvc_entity_sid_u) and run full analyze for all tables and got problem in reverse, now optimizer thinks that he will get less rows from subselect.
The fact, that planner is not accurate on the estimates of JSON internal keys is expected, PostgreSQL is not parsing JSON values when gathering stats.
You cannot expect planner to be picky about all possible corner cases, it would make planning time enormously huge.
That is the reason I outlined, that important keys should be extracted into plain columns.
If you still consider this is a bug, please — send isolated reproducible test case that demonstrates the bug.
--
Victor Yegorov
________________________________
Šis elektroniskā pasta sūtījums un tajā esošie dokumenti ir paredzēts tikai norādītajam adresātam(-iem). Tas satur konfidenciālu informāciju un var saturēt arī priviliģētu informāciju. Tā satura pilnīga vai daļēja nesankcionēta izpaušana, izmantošana vai tālāka izplatīšana jebkādā veidā ir aizliegta. Ja šis elektroniskā pasta sūtījums saņemts kļūdas dēļ, lūdzam sazināties ar sūtītāju, nosūtot atbildes e-pasta vēstuli, un izdzēst šo vēstuli. Drošības apsvērumu dēļ SIA Datakom var veikt elektroniskā pasta sūtījumu un tiem pievienoto dokumentu satura monitoringu. Ja vien šajā vēstulē vai pievienotajos dokumentus nav nepārprotami teikts pretējais, šis elektroniskā pasta sūtījums vai tam pievienotie dokumenti nav uzskatāmi par SIA Datakom piedāvājumu slēgt darījumu, apņemšanos vai gatavību uzņemties saistības. Sūtītājs: SIA Datakom, vienotais reģistrācijas nr: 40103142605, Juridiskā adrese: Vienības gatve 109, Rīga, LV-1058, Latvija
This electronic transmission and attached documents is intended only for the addressee(s). It contains confidential information and may also contain privileged information. If you are not an intended recipient, you may not use, read, retransmit, disseminate or take any action in reliance upon it. If this electronic transmission is received in error, please contact the sender by sending a reply e-mail and delete this message. SIA Datakom reserves the right to monitor all e-mail communications through its networks. Unless this letter or the accompanying documents clearly stated otherwise, by this electronic transmission does not constitute a contract offer, a contract amendment or an acceptance of a contract offer on behalf of the SIA Datakom. Sender: SIA Datakom, registration number: 40103142605, Latvia. Registered office: Vienibas gatve 109, Riga, LV-1058, Latvia
________________________________
Hi!
My point is, why planned ignore table hits and, when joining tables, think that rows will be less than in reality:
Table1: Index Scan using idx_tvc_entity_sid_u on tvcis.tvc_entity e (cost=0.28..1856.72 rows=253 width=208) (actual time=0.091..13.944 rows=71 loops=1)
Table2: Index Scan using idx_oss_objects_sid on tvcis.oss_objects o (cost=0.29..16.84 rows=23 width=951) (actual time=0.009..0.033 rows=18 loops=71)
Join: Nested Loop (cost=0.57..6175.55 rows=9 width=951) (actual time=0.112..17.355 rows=1291 loops=1)
Why planned don’t take in account results from each table:
Join rows = Table1 rows * Table2 rows == 253 * 23 = 5819? In this case difference will be 22%, but planner got 9 rows what is 0.7% from actual result
As we see, actual rows almost equal with subquery row multiplication 71 * 18 ~= 1291
Mareks
-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: trešdiena, 2018. gada 12. septembris 20:04
To: Victor Yegorov <vyegorov@gmail.com>
Cc: Mareks Kalnačs <Mareks.Kalnacs@datakom.lv>; pgsql-bugs@postgresql.org; Māris Rucis <Maris.Rucis@datakom.lv>
Subject: Re: PostgreSQL 10.0 SELECT LIMIT performance problem
Victor Yegorov <vyegorov@gmail.com<mailto:vyegorov@gmail.com>> writes:
The fact, that planner is not accurate on the estimates of JSON
internal keys is expected, PostgreSQL is not parsing JSON values when
gathering stats.
You cannot expect planner to be picky about all possible corner cases,
it would make planning time enormously huge.
Right. The fact that it doesn't make the right guesses without help can't be considered to be a bug in all cases. These are engineering tradeoffs we have to make.
That is the reason I outlined, that important keys should be extracted
into plain columns.
If that seems infeasible from an application standpoint, another possibility is to make expression indexes on those important keys.
ANALYZE will gather stats on the values of indexed expressions, and then perhaps the planner will have enough info to make better decisions.
In the other case mentioned, where the problem is a poor guess about the selectivity of
where (
to_tsvector(
'simple'::regconfig,
a.ts_vector_fields
) @@ to_tsquery(
'simple',
' (decodedType5b6f0753ossAlarmsGenerator) '
)
)
it's the same problem: the planner has no stats that would let it figure out the selectivity. It can't reasonably extract an estimate on the fly --- if it did, you'd be complaining that planning time was too long. The only way to get reasonable behavior is to set things up so that ANALYZE will accumulate stats about the values of "to_tsvector('simple',a.ts_vector_fields)". You can arrange that either by extracting that into a column, or by making an index on it.
regards, tom lane
________________________________
Šis elektroniskā pasta sūtījums un tajā esošie dokumenti ir paredzēts tikai norādītajam adresātam(-iem). Tas satur konfidenciālu informāciju un var saturēt arī priviliģētu informāciju. Tā satura pilnīga vai daļēja nesankcionēta izpaušana, izmantošana vai tālāka izplatīšana jebkādā veidā ir aizliegta. Ja šis elektroniskā pasta sūtījums saņemts kļūdas dēļ, lūdzam sazināties ar sūtītāju, nosūtot atbildes e-pasta vēstuli, un izdzēst šo vēstuli. Drošības apsvērumu dēļ SIA Datakom var veikt elektroniskā pasta sūtījumu un tiem pievienoto dokumentu satura monitoringu. Ja vien šajā vēstulē vai pievienotajos dokumentus nav nepārprotami teikts pretējais, šis elektroniskā pasta sūtījums vai tam pievienotie dokumenti nav uzskatāmi par SIA Datakom piedāvājumu slēgt darījumu, apņemšanos vai gatavību uzņemties saistības. Sūtītājs: SIA Datakom, vienotais reģistrācijas nr: 40103142605, Juridiskā adrese: Vienības gatve 109, Rīga, LV-1058, Latvija
This electronic transmission and attached documents is intended only for the addressee(s). It contains confidential information and may also contain privileged information. If you are not an intended recipient, you may not use, read, retransmit, disseminate or take any action in reliance upon it. If this electronic transmission is received in error, please contact the sender by sending a reply e-mail and delete this message. SIA Datakom reserves the right to monitor all e-mail communications through its networks. Unless this letter or the accompanying documents clearly stated otherwise, by this electronic transmission does not constitute a contract offer, a contract amendment or an acceptance of a contract offer on behalf of the SIA Datakom. Sender: SIA Datakom, registration number: 40103142605, Latvia. Registered office: Vienibas gatve 109, Riga, LV-1058, Latvia
________________________________
=?utf-8?B?TWFyZWtzIEthbG5hxI1z?= <Mareks.Kalnacs@datakom.lv> writes:
But we are using value indexes not a json index:
CREATE INDEX idx_oss_alarms_dn
ON oss_alarms
USING btree
((jdata ->> 'dn'::text) COLLATE pg_catalog."default");
I think you're outsmarting yourself by including those COLLATE clauses.
They don't do anything, since they're just selecting the default behavior
--- but they're enough to make the planner not realize that stats
collected on the index expression would be applicable to a plain
reference to oss_alarms.jdata ->> 'dn'. In general you want the index
expression to be spelled exactly the same way that you refer to the
value in queries, else the system may not realize it's relevant.
regards, tom lane
Actually Postgres trying to outsmart himself, I do not add COLLATE ;)
Original trigger text
CREATE INDEX idx_oss_alarms_dn ON oss_alarms ((jdata ->> 'dn'));
Mareks
-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: ceturtdiena, 2018. gada 13. septembris 20:42
To: Mareks Kalnačs <Mareks.Kalnacs@datakom.lv>
Cc: Victor Yegorov <vyegorov@gmail.com>; pgsql-bugs@postgresql.org; Māris Rucis <Maris.Rucis@datakom.lv>; Pāvels Koržs <Pavels.Korzs@datakom.lv>
Subject: Re: PostgreSQL 10.0 SELECT LIMIT performance problem
=?utf-8?B?TWFyZWtzIEthbG5hxI1z?= <Mareks.Kalnacs@datakom.lv> writes:
But we are using value indexes not a json index:
CREATE INDEX idx_oss_alarms_dn
ON oss_alarms
USING btree
((jdata ->> 'dn'::text) COLLATE pg_catalog."default");
I think you're outsmarting yourself by including those COLLATE clauses.
They don't do anything, since they're just selecting the default behavior
--- but they're enough to make the planner not realize that stats collected on the index expression would be applicable to a plain reference to oss_alarms.jdata ->> 'dn'. In general you want the index expression to be spelled exactly the same way that you refer to the value in queries, else the system may not realize it's relevant.
regards, tom lane
________________________________
Šis elektroniskā pasta sūtījums un tajā esošie dokumenti ir paredzēts tikai norādītajam adresātam(-iem). Tas satur konfidenciālu informāciju un var saturēt arī priviliģētu informāciju. Tā satura pilnīga vai daļēja nesankcionēta izpaušana, izmantošana vai tālāka izplatīšana jebkādā veidā ir aizliegta. Ja šis elektroniskā pasta sūtījums saņemts kļūdas dēļ, lūdzam sazināties ar sūtītāju, nosūtot atbildes e-pasta vēstuli, un izdzēst šo vēstuli. Drošības apsvērumu dēļ SIA Datakom var veikt elektroniskā pasta sūtījumu un tiem pievienoto dokumentu satura monitoringu. Ja vien šajā vēstulē vai pievienotajos dokumentus nav nepārprotami teikts pretējais, šis elektroniskā pasta sūtījums vai tam pievienotie dokumenti nav uzskatāmi par SIA Datakom piedāvājumu slēgt darījumu, apņemšanos vai gatavību uzņemties saistības. Sūtītājs: SIA Datakom, vienotais reģistrācijas nr: 40103142605, Juridiskā adrese: Vienības gatve 109, Rīga, LV-1058, Latvija
This electronic transmission and attached documents is intended only for the addressee(s). It contains confidential information and may also contain privileged information. If you are not an intended recipient, you may not use, read, retransmit, disseminate or take any action in reliance upon it. If this electronic transmission is received in error, please contact the sender by sending a reply e-mail and delete this message. SIA Datakom reserves the right to monitor all e-mail communications through its networks. Unless this letter or the accompanying documents clearly stated otherwise, by this electronic transmission does not constitute a contract offer, a contract amendment or an acceptance of a contract offer on behalf of the SIA Datakom. Sender: SIA Datakom, registration number: 40103142605, Latvia. Registered office: Vienibas gatve 109, Riga, LV-1058, Latvia
________________________________