OPtimize the performance of a query

Started by hmidi slimabout 8 years ago11 messagesgeneral
Jump to latest
#1hmidi slim
hmidi.slim2@gmail.com

Hi,
I have two tables in the same database: geoname and test_table.
The geoname table contains many columns which are: name, feature_class,
feature_code, admin1, admin2,admin3, name and so on.
The second table 'test_table' contains only the columns: city, state.
There is no join between the two tables and I want to make a match between
the data contained in each of them because I need the result for a farther
process.
I wrote this query:
select g.name, t.city
from geoname as g, test_table as t
where (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')
and lower(g.country_code) like 'US'
and lower(g.admin1) like lower(t.state)
and (lower(g.name) like lower(t.city) or lower(g.name) like lower(t.city ||
'city'))

The table geoname contains 370260 rows and the table test_table contains
10270 rows.
The query took a long time to accomplish more than half an hour.Should I
add another column in the table test_table which contains the country_code
and make an inner join with the geoname table or should I use indexs to
accelerate the process?

#2hmidi slim
hmidi.slim2@gmail.com
In reply to: hmidi slim (#1)
Re: OPtimize the performance of a query

Sorry I forget the lower command when I wrote the code, it is like this:
lower(g.country_code) like lower('US')
(lower(g.feature_class) like lowwer('P') or lower(g.feature_class) like
lower('L'))

2018-01-16 17:40 GMT+01:00 Martin Moore <martin.moore@avbrief.com>:

Show quoted text

Hi,

I have two tables in the same database: geoname and test_table.

The geoname table contains many columns which are: name, feature_class,

feature_code, admin1, admin2,admin3, name and so on.

The second table 'test_table' contains only the columns: city, state.

There is no join between the two tables and I want to make a match

between the data contained in each of them because I need the result for a
farther process.

I wrote this query:

select g.name, t.city

from geoname as g, test_table as t

where (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')

and lower(g.country_code) like 'US'

and lower(g.admin1) like lower(t.state)

and (lower(g.name) like lower(t.city) or lower(g.name) like lower(t.city

|| 'city'))

The table geoname contains 370260 rows and the table test_table contains

10270 rows.

The query took a long time to accomplish more than half an hour.Should I

add another column in the table test_table which contains the country_code
and make an inner join with the geoname table or >should I use indexs to
accelerate the process?

Indexes are your friends ☺

I’d certainly add indexes on lower(g.feature_class, g.country_code) and
lower(t.state)

Note “and lower(g.country_code) like 'US'” will not return any results as
you’ve got the ‘US’ in UPPER…. Same as (lower(g.feature_class) like 'P' or
lower(g.feature_class) like 'L')

Why are you using LIKE? Equals (=) is surely correct and probably faster?

Martin.

#3James Keener
jim@jimkeener.com
In reply to: hmidi slim (#1)
Re: OPtimize the performance of a query

Do you have any indecies?
https://www.postgresql.org/docs/current/static/indexes-expressional.html
might be helpful to you.

Also, EXPLAIN will help you understand how your query is being run and
where it can be improved.

https://www.postgresql.org/docs/current/static/using-explain.html
http://postgresguide.com/performance/explain.html
http://jimkeener.com/posts/explain-pg

Jim

On Tue, Jan 16, 2018 at 11:32 AM, hmidi slim <hmidi.slim2@gmail.com> wrote:

Show quoted text

Hi,
I have two tables in the same database: geoname and test_table.
The geoname table contains many columns which are: name, feature_class,
feature_code, admin1, admin2,admin3, name and so on.
The second table 'test_table' contains only the columns: city, state.
There is no join between the two tables and I want to make a match between
the data contained in each of them because I need the result for a farther
process.
I wrote this query:
select g.name, t.city
from geoname as g, test_table as t
where (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')
and lower(g.country_code) like 'US'
and lower(g.admin1) like lower(t.state)
and (lower(g.name) like lower(t.city) or lower(g.name) like lower(t.city
|| 'city'))

The table geoname contains 370260 rows and the table test_table contains
10270 rows.
The query took a long time to accomplish more than half an hour.Should I
add another column in the table test_table which contains the country_code
and make an inner join with the geoname table or should I use indexs to
accelerate the process?

#4Melvin Davidson
melvin6925@gmail.com
In reply to: James Keener (#3)
Re: OPtimize the performance of a query

On Tue, Jan 16, 2018 at 11:46 AM, James Keener <jim@jimkeener.com> wrote:

Do you have any indecies? https://www.postgresql.org/
docs/current/static/indexes-expressional.html might be helpful to you.

Also, EXPLAIN will help you understand how your query is being run and
where it can be improved.

https://www.postgresql.org/docs/current/static/using-explain.html
http://postgresguide.com/performance/explain.html
http://jimkeener.com/posts/explain-pg

Jim

On Tue, Jan 16, 2018 at 11:32 AM, hmidi slim <hmidi.slim2@gmail.com>
wrote:

Hi,
I have two tables in the same database: geoname and test_table.
The geoname table contains many columns which are: name, feature_class,
feature_code, admin1, admin2,admin3, name and so on.
The second table 'test_table' contains only the columns: city, state.
There is no join between the two tables and I want to make a match
between the data contained in each of them because I need the result for a
farther process.
I wrote this query:
select g.name, t.city
from geoname as g, test_table as t
where (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')
and lower(g.country_code) like 'US'
and lower(g.admin1) like lower(t.state)
and (lower(g.name) like lower(t.city) or lower(g.name) like lower(t.city
|| 'city'))

The table geoname contains 370260 rows and the table test_table contains
10270 rows.
The query took a long time to accomplish more than half an hour.Should I
add another column in the table test_table which contains the country_code
and make an inner join with the geoname table or should I use indexs to
accelerate the process?

It would have been more helpful if you had included
the actual table structures for both tables.
However, I would start by creating separate indexes on
lower(feature_class)
lower(country_code)
lower(admin1)
lower(name)
lower(city)

That being said, you are better off forcing lowercase on all fields
BEFORE inserting into the table.
EG:

INSERT INTO test_table VALUES (lower(some_key), lower(name),
lower(feature_class), ....)

Then you would would not need to use lower() in the indexes or the query.

Please, in the future, always include your version of PostgreSQL and O/S

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#5Pavel Stehule
pavel.stehule@gmail.com
In reply to: hmidi slim (#2)
Re: OPtimize the performance of a query

Hi

2018-01-16 17:44 GMT+01:00 hmidi slim <hmidi.slim2@gmail.com>:

Sorry I forget the lower command when I wrote the code, it is like this:
lower(g.country_code) like lower('US')
(lower(g.feature_class) like lowwer('P') or lower(g.feature_class) like
lower('L'))

please, don't do top post.

Your query must be slow. When you use LIKE instead =. It is terrible
performance issue.

So don't use "LIKE" is first rule. Second - you can create functional
indexes

CREATE INDEX ON geoname ((lower(name)))

Regards

Pavel

Show quoted text

2018-01-16 17:40 GMT+01:00 Martin Moore <martin.moore@avbrief.com>:

Hi,

I have two tables in the same database: geoname and test_table.

The geoname table contains many columns which are: name, feature_class,

feature_code, admin1, admin2,admin3, name and so on.

The second table 'test_table' contains only the columns: city, state.

There is no join between the two tables and I want to make a match

between the data contained in each of them because I need the result for a
farther process.

I wrote this query:

select g.name, t.city

from geoname as g, test_table as t

where (lower(g.feature_class) like 'P' or lower(g.feature_class) like

'L')

and lower(g.country_code) like 'US'

and lower(g.admin1) like lower(t.state)

and (lower(g.name) like lower(t.city) or lower(g.name) like

lower(t.city || 'city'))

The table geoname contains 370260 rows and the table test_table contains

10270 rows.

The query took a long time to accomplish more than half an hour.Should I

add another column in the table test_table which contains the country_code
and make an inner join with the geoname table or >should I use indexs to
accelerate the process?

Indexes are your friends ☺

I’d certainly add indexes on lower(g.feature_class, g.country_code) and
lower(t.state)

Note “and lower(g.country_code) like 'US'” will not return any results as
you’ve got the ‘US’ in UPPER…. Same as (lower(g.feature_class) like 'P' or
lower(g.feature_class) like 'L')

Why are you using LIKE? Equals (=) is surely correct and probably faster?

Martin.

#6Alban Hertroys
haramrae@gmail.com
In reply to: hmidi slim (#1)
Re: OPtimize the performance of a query

On 16 Jan 2018, at 17:32, hmidi slim <hmidi.slim2@gmail.com> wrote:

Hi,
I have two tables in the same database: geoname and test_table.
The geoname table contains many columns which are: name, feature_class, feature_code, admin1, admin2,admin3, name and so on.
The second table 'test_table' contains only the columns: city, state.
There is no join between the two tables and I want to make a match between the data contained in each of them because I need the result for a farther process.
I wrote this query:
select g.name, t.city
from geoname as g, test_table as t
where (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')
and lower(g.country_code) like 'US'
and lower(g.admin1) like lower(t.state)
and (lower(g.name) like lower(t.city) or lower(g.name) like lower(t.city || 'city'))

That query is equivalent to:

select g.name, t.city
from geoname as g, test_table as t
where false or lower(g.name) = lower(t.city || 'city'));

So those are probably not the results you want.

At the very least, if you're lower-casing column contents, don't compare those to an upper-cased constant ;)
Also, AND has precedence over OR, which is the other reason why my equivalent query is so much shorter.
And finally, LIKE is equivalent to = (equals) without any wildcards.

The table geoname contains 370260 rows and the table test_table contains 10270 rows.
The query took a long time to accomplish more than half an hour.Should I add another column in the table test_table which contains the country_code and make an inner join with the geoname table or should I use indexs to accelerate the process?

Some indices on lower(g.name) and lower(t.city) from your query would be useful, but in that case make sure you take the concatenation of 'city' out of the lower()-call in your query.

Just reading your latest addition - using lower() on constants is just a waste of cycles. It won't hurt your query much though.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

#7hmidi slim
hmidi.slim2@gmail.com
In reply to: Pavel Stehule (#5)
Re: OPtimize the performance of a query

I changed the operator like and I'm using the operator = .I got the results
much faster but I still have another question about operator. For
difference should I use '<>' or 'is distinct from' with indexes?

2018-01-16 17:49 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:

Show quoted text

Hi

2018-01-16 17:44 GMT+01:00 hmidi slim <hmidi.slim2@gmail.com>:

Sorry I forget the lower command when I wrote the code, it is like this:
lower(g.country_code) like lower('US')
(lower(g.feature_class) like lowwer('P') or lower(g.feature_class) like
lower('L'))

please, don't do top post.

Your query must be slow. When you use LIKE instead =. It is terrible
performance issue.

So don't use "LIKE" is first rule. Second - you can create functional
indexes

CREATE INDEX ON geoname ((lower(name)))

Regards

Pavel

2018-01-16 17:40 GMT+01:00 Martin Moore <martin.moore@avbrief.com>:

Hi,

I have two tables in the same database: geoname and test_table.

The geoname table contains many columns which are: name, feature_class,

feature_code, admin1, admin2,admin3, name and so on.

The second table 'test_table' contains only the columns: city, state.

There is no join between the two tables and I want to make a match

between the data contained in each of them because I need the result for a
farther process.

I wrote this query:

select g.name, t.city

from geoname as g, test_table as t

where (lower(g.feature_class) like 'P' or lower(g.feature_class) like

'L')

and lower(g.country_code) like 'US'

and lower(g.admin1) like lower(t.state)

and (lower(g.name) like lower(t.city) or lower(g.name) like

lower(t.city || 'city'))

The table geoname contains 370260 rows and the table test_table

contains 10270 rows.

The query took a long time to accomplish more than half an hour.Should

I add another column in the table test_table which contains the
country_code and make an inner join with the geoname table or >should I use
indexs to accelerate the process?

Indexes are your friends ☺

I’d certainly add indexes on lower(g.feature_class, g.country_code) and
lower(t.state)

Note “and lower(g.country_code) like 'US'” will not return any results
as you’ve got the ‘US’ in UPPER…. Same as (lower(g.feature_class) like 'P'
or lower(g.feature_class) like 'L')

Why are you using LIKE? Equals (=) is surely correct and probably faster?

Martin.

#8Pavel Stehule
pavel.stehule@gmail.com
In reply to: hmidi slim (#7)
Re: OPtimize the performance of a query

Hi

2018-01-16 18:57 GMT+01:00 hmidi slim <hmidi.slim2@gmail.com>:

I changed the operator like and I'm using the operator = .I got the
results much faster but I still have another question about operator. For
difference should I use '<>' or 'is distinct from' with indexes?

https://en.wikipedia.org/wiki/Posting_style#Top-posting .. please, don't do
it.

IS DISTINCT FROM has sense if your data - or your queries has NULL. If not,
and it is probably your case, then <> should be preferred.

Regards

Pavel

Show quoted text

2018-01-16 17:49 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:

Hi

2018-01-16 17:44 GMT+01:00 hmidi slim <hmidi.slim2@gmail.com>:

Sorry I forget the lower command when I wrote the code, it is like this:
lower(g.country_code) like lower('US')
(lower(g.feature_class) like lowwer('P') or lower(g.feature_class) like
lower('L'))

please, don't do top post.

Your query must be slow. When you use LIKE instead =. It is terrible
performance issue.

So don't use "LIKE" is first rule. Second - you can create functional
indexes

CREATE INDEX ON geoname ((lower(name)))

Regards

Pavel

2018-01-16 17:40 GMT+01:00 Martin Moore <martin.moore@avbrief.com>:

Hi,

I have two tables in the same database: geoname and test_table.

The geoname table contains many columns which are: name,

feature_class, feature_code, admin1, admin2,admin3, name and so on.

The second table 'test_table' contains only the columns: city, state.

There is no join between the two tables and I want to make a match

between the data contained in each of them because I need the result for a
farther process.

I wrote this query:

select g.name, t.city

from geoname as g, test_table as t

where (lower(g.feature_class) like 'P' or lower(g.feature_class) like

'L')

and lower(g.country_code) like 'US'

and lower(g.admin1) like lower(t.state)

and (lower(g.name) like lower(t.city) or lower(g.name) like

lower(t.city || 'city'))

The table geoname contains 370260 rows and the table test_table

contains 10270 rows.

The query took a long time to accomplish more than half an hour.Should

I add another column in the table test_table which contains the
country_code and make an inner join with the geoname table or >should I use
indexs to accelerate the process?

Indexes are your friends ☺

I’d certainly add indexes on lower(g.feature_class, g.country_code)
and lower(t.state)

Note “and lower(g.country_code) like 'US'” will not return any results
as you’ve got the ‘US’ in UPPER…. Same as (lower(g.feature_class) like 'P'
or lower(g.feature_class) like 'L')

Why are you using LIKE? Equals (=) is surely correct and probably
faster?

Martin.

#9hmidi slim
hmidi.slim2@gmail.com
In reply to: Pavel Stehule (#8)
Re: OPtimize the performance of a query

Thank you for your advices and thanks for all people who give me some best
practises and useful ideas.

#10Pavel Stehule
pavel.stehule@gmail.com
In reply to: hmidi slim (#9)
Re: OPtimize the performance of a query

2018-01-16 19:35 GMT+01:00 hmidi slim <hmidi.slim2@gmail.com>:

Thank you for your advices and thanks for all people who give me some best
practises and useful ideas.

you are welcome

Regards

Pavel

#11Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: hmidi slim (#7)
Re: OPtimize the performance of a query

Hi Hmidi,

On 17/01/18 06:57, hmidi slim wrote:

I changed the operator like and I'm using the operator = .I got the
results much faster but I still have another question about operator.
For difference should I use '<>' or 'is distinct from' with indexes?

2018-01-16 17:49 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com
<mailto:pavel.stehule@gmail.com>>:

Hi

2018-01-16 17:44 GMT+01:00 hmidi slim <hmidi.slim2@gmail.com
<mailto:hmidi.slim2@gmail.com>>:

Sorry I forget the lower command when I wrote the code, it is
like this: lower(g.country_code) like lower('US')
(lower(g.feature_class) like lowwer('P') or
lower(g.feature_class) like lower('L'))

please, don't do top post.

[...]

Top posting is when you reply at the top of the email, rather than at
the bottom like this.

Bottom posting allows people to see the context before your reply.  You
can trim excess, or no longer relevant, content - but note the bits that
you have omitted with '[...]'

Bottom posting is preferred.

Cheers,
Gavin