OPtimize the performance of a query
Hi,
I have two tables in the same database: geoname and test_table.
The geoname table contains many columns which are: name, feature_class,
feature_code, admin1, admin2,admin3, name and so on.
The second table 'test_table' contains only the columns: city, state.
There is no join between the two tables and I want to make a match between
the data contained in each of them because I need the result for a farther
process.
I wrote this query:
select g.name, t.city
from geoname as g, test_table as t
where (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')
and lower(g.country_code) like 'US'
and lower(g.admin1) like lower(t.state)
and (lower(g.name) like lower(t.city) or lower(g.name) like lower(t.city ||
'city'))
The table geoname contains 370260 rows and the table test_table contains
10270 rows.
The query took a long time to accomplish more than half an hour.Should I
add another column in the table test_table which contains the country_code
and make an inner join with the geoname table or should I use indexs to
accelerate the process?
Sorry I forget the lower command when I wrote the code, it is like this:
lower(g.country_code) like lower('US')
(lower(g.feature_class) like lowwer('P') or lower(g.feature_class) like
lower('L'))
2018-01-16 17:40 GMT+01:00 Martin Moore <martin.moore@avbrief.com>:
Show quoted text
Hi,
I have two tables in the same database: geoname and test_table.
The geoname table contains many columns which are: name, feature_class,
feature_code, admin1, admin2,admin3, name and so on.
The second table 'test_table' contains only the columns: city, state.
There is no join between the two tables and I want to make a match
between the data contained in each of them because I need the result for a
farther process.I wrote this query:
select g.name, t.city
from geoname as g, test_table as t
where (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')
and lower(g.country_code) like 'US'
and lower(g.admin1) like lower(t.state)
and (lower(g.name) like lower(t.city) or lower(g.name) like lower(t.city
|| 'city'))
The table geoname contains 370260 rows and the table test_table contains
10270 rows.
The query took a long time to accomplish more than half an hour.Should I
add another column in the table test_table which contains the country_code
and make an inner join with the geoname table or >should I use indexs to
accelerate the process?Indexes are your friends ☺
I’d certainly add indexes on lower(g.feature_class, g.country_code) and
lower(t.state)Note “and lower(g.country_code) like 'US'” will not return any results as
you’ve got the ‘US’ in UPPER…. Same as (lower(g.feature_class) like 'P' or
lower(g.feature_class) like 'L')Why are you using LIKE? Equals (=) is surely correct and probably faster?
Martin.
Import Notes
Reply to msg id not found: 7FFA315E-9968-4764-9795-D7504FA7DCE4@avbrief.com
Do you have any indecies?
https://www.postgresql.org/docs/current/static/indexes-expressional.html
might be helpful to you.
Also, EXPLAIN will help you understand how your query is being run and
where it can be improved.
https://www.postgresql.org/docs/current/static/using-explain.html
http://postgresguide.com/performance/explain.html
http://jimkeener.com/posts/explain-pg
Jim
On Tue, Jan 16, 2018 at 11:32 AM, hmidi slim <hmidi.slim2@gmail.com> wrote:
Show quoted text
Hi,
I have two tables in the same database: geoname and test_table.
The geoname table contains many columns which are: name, feature_class,
feature_code, admin1, admin2,admin3, name and so on.
The second table 'test_table' contains only the columns: city, state.
There is no join between the two tables and I want to make a match between
the data contained in each of them because I need the result for a farther
process.
I wrote this query:
select g.name, t.city
from geoname as g, test_table as t
where (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')
and lower(g.country_code) like 'US'
and lower(g.admin1) like lower(t.state)
and (lower(g.name) like lower(t.city) or lower(g.name) like lower(t.city
|| 'city'))The table geoname contains 370260 rows and the table test_table contains
10270 rows.
The query took a long time to accomplish more than half an hour.Should I
add another column in the table test_table which contains the country_code
and make an inner join with the geoname table or should I use indexs to
accelerate the process?
On Tue, Jan 16, 2018 at 11:46 AM, James Keener <jim@jimkeener.com> wrote:
Do you have any indecies? https://www.postgresql.org/
docs/current/static/indexes-expressional.html might be helpful to you.Also, EXPLAIN will help you understand how your query is being run and
where it can be improved.https://www.postgresql.org/docs/current/static/using-explain.html
http://postgresguide.com/performance/explain.html
http://jimkeener.com/posts/explain-pgJim
On Tue, Jan 16, 2018 at 11:32 AM, hmidi slim <hmidi.slim2@gmail.com>
wrote:Hi,
I have two tables in the same database: geoname and test_table.
The geoname table contains many columns which are: name, feature_class,
feature_code, admin1, admin2,admin3, name and so on.
The second table 'test_table' contains only the columns: city, state.
There is no join between the two tables and I want to make a match
between the data contained in each of them because I need the result for a
farther process.
I wrote this query:
select g.name, t.city
from geoname as g, test_table as t
where (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')
and lower(g.country_code) like 'US'
and lower(g.admin1) like lower(t.state)
and (lower(g.name) like lower(t.city) or lower(g.name) like lower(t.city
|| 'city'))The table geoname contains 370260 rows and the table test_table contains
10270 rows.
The query took a long time to accomplish more than half an hour.Should I
add another column in the table test_table which contains the country_code
and make an inner join with the geoname table or should I use indexs to
accelerate the process?
It would have been more helpful if you had included
the actual table structures for both tables.
However, I would start by creating separate indexes on
lower(feature_class)
lower(country_code)
lower(admin1)
lower(name)
lower(city)
That being said, you are better off forcing lowercase on all fields
BEFORE inserting into the table.
EG:
INSERT INTO test_table VALUES (lower(some_key), lower(name),
lower(feature_class), ....)
Then you would would not need to use lower() in the indexes or the query.
Please, in the future, always include your version of PostgreSQL and O/S
--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
Hi
2018-01-16 17:44 GMT+01:00 hmidi slim <hmidi.slim2@gmail.com>:
Sorry I forget the lower command when I wrote the code, it is like this:
lower(g.country_code) like lower('US')
(lower(g.feature_class) like lowwer('P') or lower(g.feature_class) like
lower('L'))
please, don't do top post.
Your query must be slow. When you use LIKE instead =. It is terrible
performance issue.
So don't use "LIKE" is first rule. Second - you can create functional
indexes
CREATE INDEX ON geoname ((lower(name)))
Regards
Pavel
Show quoted text
2018-01-16 17:40 GMT+01:00 Martin Moore <martin.moore@avbrief.com>:
Hi,
I have two tables in the same database: geoname and test_table.
The geoname table contains many columns which are: name, feature_class,
feature_code, admin1, admin2,admin3, name and so on.
The second table 'test_table' contains only the columns: city, state.
There is no join between the two tables and I want to make a match
between the data contained in each of them because I need the result for a
farther process.I wrote this query:
select g.name, t.city
from geoname as g, test_table as t
where (lower(g.feature_class) like 'P' or lower(g.feature_class) like
'L')
and lower(g.country_code) like 'US'
and lower(g.admin1) like lower(t.state)
and (lower(g.name) like lower(t.city) or lower(g.name) like
lower(t.city || 'city'))
The table geoname contains 370260 rows and the table test_table contains
10270 rows.
The query took a long time to accomplish more than half an hour.Should I
add another column in the table test_table which contains the country_code
and make an inner join with the geoname table or >should I use indexs to
accelerate the process?Indexes are your friends ☺
I’d certainly add indexes on lower(g.feature_class, g.country_code) and
lower(t.state)Note “and lower(g.country_code) like 'US'” will not return any results as
you’ve got the ‘US’ in UPPER…. Same as (lower(g.feature_class) like 'P' or
lower(g.feature_class) like 'L')Why are you using LIKE? Equals (=) is surely correct and probably faster?
Martin.
On 16 Jan 2018, at 17:32, hmidi slim <hmidi.slim2@gmail.com> wrote:
Hi,
I have two tables in the same database: geoname and test_table.
The geoname table contains many columns which are: name, feature_class, feature_code, admin1, admin2,admin3, name and so on.
The second table 'test_table' contains only the columns: city, state.
There is no join between the two tables and I want to make a match between the data contained in each of them because I need the result for a farther process.
I wrote this query:
select g.name, t.city
from geoname as g, test_table as t
where (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')
and lower(g.country_code) like 'US'
and lower(g.admin1) like lower(t.state)
and (lower(g.name) like lower(t.city) or lower(g.name) like lower(t.city || 'city'))
That query is equivalent to:
select g.name, t.city
from geoname as g, test_table as t
where false or lower(g.name) = lower(t.city || 'city'));
So those are probably not the results you want.
At the very least, if you're lower-casing column contents, don't compare those to an upper-cased constant ;)
Also, AND has precedence over OR, which is the other reason why my equivalent query is so much shorter.
And finally, LIKE is equivalent to = (equals) without any wildcards.
The table geoname contains 370260 rows and the table test_table contains 10270 rows.
The query took a long time to accomplish more than half an hour.Should I add another column in the table test_table which contains the country_code and make an inner join with the geoname table or should I use indexs to accelerate the process?
Some indices on lower(g.name) and lower(t.city) from your query would be useful, but in that case make sure you take the concatenation of 'city' out of the lower()-call in your query.
Just reading your latest addition - using lower() on constants is just a waste of cycles. It won't hurt your query much though.
Regards,
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
I changed the operator like and I'm using the operator = .I got the results
much faster but I still have another question about operator. For
difference should I use '<>' or 'is distinct from' with indexes?
2018-01-16 17:49 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
Show quoted text
Hi
2018-01-16 17:44 GMT+01:00 hmidi slim <hmidi.slim2@gmail.com>:
Sorry I forget the lower command when I wrote the code, it is like this:
lower(g.country_code) like lower('US')
(lower(g.feature_class) like lowwer('P') or lower(g.feature_class) like
lower('L'))please, don't do top post.
Your query must be slow. When you use LIKE instead =. It is terrible
performance issue.So don't use "LIKE" is first rule. Second - you can create functional
indexesCREATE INDEX ON geoname ((lower(name)))
Regards
Pavel
2018-01-16 17:40 GMT+01:00 Martin Moore <martin.moore@avbrief.com>:
Hi,
I have two tables in the same database: geoname and test_table.
The geoname table contains many columns which are: name, feature_class,
feature_code, admin1, admin2,admin3, name and so on.
The second table 'test_table' contains only the columns: city, state.
There is no join between the two tables and I want to make a match
between the data contained in each of them because I need the result for a
farther process.I wrote this query:
select g.name, t.city
from geoname as g, test_table as t
where (lower(g.feature_class) like 'P' or lower(g.feature_class) like
'L')
and lower(g.country_code) like 'US'
and lower(g.admin1) like lower(t.state)
and (lower(g.name) like lower(t.city) or lower(g.name) like
lower(t.city || 'city'))
The table geoname contains 370260 rows and the table test_table
contains 10270 rows.
The query took a long time to accomplish more than half an hour.Should
I add another column in the table test_table which contains the
country_code and make an inner join with the geoname table or >should I use
indexs to accelerate the process?Indexes are your friends ☺
I’d certainly add indexes on lower(g.feature_class, g.country_code) and
lower(t.state)Note “and lower(g.country_code) like 'US'” will not return any results
as you’ve got the ‘US’ in UPPER…. Same as (lower(g.feature_class) like 'P'
or lower(g.feature_class) like 'L')Why are you using LIKE? Equals (=) is surely correct and probably faster?
Martin.
Hi
2018-01-16 18:57 GMT+01:00 hmidi slim <hmidi.slim2@gmail.com>:
I changed the operator like and I'm using the operator = .I got the
results much faster but I still have another question about operator. For
difference should I use '<>' or 'is distinct from' with indexes?
https://en.wikipedia.org/wiki/Posting_style#Top-posting .. please, don't do
it.
IS DISTINCT FROM has sense if your data - or your queries has NULL. If not,
and it is probably your case, then <> should be preferred.
Regards
Pavel
Show quoted text
2018-01-16 17:49 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi
2018-01-16 17:44 GMT+01:00 hmidi slim <hmidi.slim2@gmail.com>:
Sorry I forget the lower command when I wrote the code, it is like this:
lower(g.country_code) like lower('US')
(lower(g.feature_class) like lowwer('P') or lower(g.feature_class) like
lower('L'))please, don't do top post.
Your query must be slow. When you use LIKE instead =. It is terrible
performance issue.So don't use "LIKE" is first rule. Second - you can create functional
indexesCREATE INDEX ON geoname ((lower(name)))
Regards
Pavel
2018-01-16 17:40 GMT+01:00 Martin Moore <martin.moore@avbrief.com>:
Hi,
I have two tables in the same database: geoname and test_table.
The geoname table contains many columns which are: name,
feature_class, feature_code, admin1, admin2,admin3, name and so on.
The second table 'test_table' contains only the columns: city, state.
There is no join between the two tables and I want to make a match
between the data contained in each of them because I need the result for a
farther process.I wrote this query:
select g.name, t.city
from geoname as g, test_table as t
where (lower(g.feature_class) like 'P' or lower(g.feature_class) like
'L')
and lower(g.country_code) like 'US'
and lower(g.admin1) like lower(t.state)
and (lower(g.name) like lower(t.city) or lower(g.name) like
lower(t.city || 'city'))
The table geoname contains 370260 rows and the table test_table
contains 10270 rows.
The query took a long time to accomplish more than half an hour.Should
I add another column in the table test_table which contains the
country_code and make an inner join with the geoname table or >should I use
indexs to accelerate the process?Indexes are your friends ☺
I’d certainly add indexes on lower(g.feature_class, g.country_code)
and lower(t.state)Note “and lower(g.country_code) like 'US'” will not return any results
as you’ve got the ‘US’ in UPPER…. Same as (lower(g.feature_class) like 'P'
or lower(g.feature_class) like 'L')Why are you using LIKE? Equals (=) is surely correct and probably
faster?Martin.
Thank you for your advices and thanks for all people who give me some best
practises and useful ideas.
2018-01-16 19:35 GMT+01:00 hmidi slim <hmidi.slim2@gmail.com>:
Thank you for your advices and thanks for all people who give me some best
practises and useful ideas.
you are welcome
Regards
Pavel
Hi Hmidi,
On 17/01/18 06:57, hmidi slim wrote:
I changed the operator like and I'm using the operator = .I got the
results much faster but I still have another question about operator.
For difference should I use '<>' or 'is distinct from' with indexes?2018-01-16 17:49 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com
<mailto:pavel.stehule@gmail.com>>:Hi
2018-01-16 17:44 GMT+01:00 hmidi slim <hmidi.slim2@gmail.com
<mailto:hmidi.slim2@gmail.com>>:Sorry I forget the lower command when I wrote the code, it is
like this: lower(g.country_code) like lower('US')
(lower(g.feature_class) like lowwer('P') or
lower(g.feature_class) like lower('L'))please, don't do top post.
[...]
Top posting is when you reply at the top of the email, rather than at
the bottom like this.
Bottom posting allows people to see the context before your reply. You
can trim excess, or no longer relevant, content - but note the bits that
you have omitted with '[...]'
Bottom posting is preferred.
Cheers,
Gavin