How to speed up product code and subcode match

Started by Andrusalmost 3 years ago5 messagesgeneral
Jump to latest
#1Andrus
kobruleht2@hot.ee

Hi!

Price list of main products vordlusajuhinnak contains 3 prices for
product (column toode) and has 39433 products:

    create table vordlusajuhinnak( toode varchar(60), n2 numeric(8,2),
n3 numeric(8,2), n4 numeric(8,2) );

toode column in unique, may be primary key in table and contains upper
case letters, digits and - characters.

product table (toode) contains 733021 products:

    CREATE TABLE toode (
        grupp character(1),
        toode character(60) primary key,
        ... lot of other columns
      );

Both tables have pattern indexes to speed up queries:

    CREATE INDEX toode_toode_pattern_idx
        ON toode (toode bpchar_pattern_ops ASC NULLS LAST);
    -- This index is probably not used, should removed:
    CREATE INDEX vordlusajuhinnak_toode_pattern_idx ON
vordlusajuhinnak(toode bpchar_pattern_ops);

Product table as both main products and subproducts with sizes. Size is
separated by main product code always by / character:

    SHOE1-BLACK
    SHOE1-BLACK/38
    SHOE1-BLACK/41
    SHOE1-BLACK/42
    SHOE1-BLACK/43
    SHOE2/XXL
    SHOE2/L

Product codes contain upper characers only in this table also.

Trying to get prices for all products using

    create table peatoode as
    select toode.toode , n2, n3, n4
    from toode, vordlusajuhinnak
    where  toode.toode between vordlusajuhinnak.toode and
vordlusajuhinnak.toode||'/z'

Takes 4.65 hours. How to speed this query up?
Output from explain:

    > "Nested Loop  (cost=0.55..272273178.12 rows=3202240012 width=78)" "
    > ->  Seq Scan on vordlusajuhinnak  (cost=0.00..721.33 rows=39433
width=32)" "  ->  Index Only Scan using toode_pkey on toode
    > (cost=0.55..6092.62 rows=81207 width=60)" "        Index Cond: (toode
    > >= (vordlusajuhinnak.toode)::bpchar)" "        Filter:
((toode)::text <= ((vordlusajuhinnak.toode)::text || '/z'::text))"

Using  PostgreSQL 13.2, compiled by Visual C++ build 1900, 64-bit in
Windows server and psqlODBC driver.
Upgrading Postgres is possible, if this helps.

Tried also using like:

    WHERE toode.toode=vordlusajuhinnak.toode OR
      toode.toode LIKE vordlusajuhinnak.toode||'/%'

Posted also in

https://stackoverflow.com/questions/76311957/how-to-match-character-columns-with-separated-subcolumns

Andrus.

#2Andrus
kobruleht2@hot.ee
In reply to: Andrus (#1)
Re: How to speed up product code and subcode match

Hi!

I ran

analyze toode;

create index vordlusajuhinnak_toode_pattern_idx on
vordlusajuhinnak(toode bpchar_pattern_ops);

create index vordlusajuhinnak_toode_idx on vordlusajuhinnak(toode);
analyze vordlusajuhinnak;

Select runs now more than one hour. Output from explain

explain create table peatoode as
select toode.toode , n2, n3, n4
from toode, vordlusajuhinnak
WHERE toode.toode=vordlusajuhinnak.toode OR  toode.toode LIKE
vordlusajuhinnak.toode||'/%'

"Gather  (cost=1000.55..443361894.55 rows=143828910 width=78)"
"  Workers Planned: 2"
"  ->  Nested Loop  (cost=0.55..428978003.55 rows=59928712 width=78)"
"        Join Filter: ((toode.toode = (vordlusajuhinnak.toode)::bpchar)
OR (toode.toode ~~ ((vordlusajuhinnak.toode)::text || '/%'::text)))"
"        ->  Parallel Index Only Scan using toode_pkey on toode 
(cost=0.55..95017.93 rows=303869 width=60)"
"        ->  Seq Scan on vordlusajuhinnak  (cost=0.00..721.33 rows=39433
width=32)"

with

Set enable_nestloop to off;

explain output is:

"Gather  (cost=10000001000.55..10443361906.55 rows=143828910 width=78)"
"  Workers Planned: 2"
"  ->  Nested Loop  (cost=10000000000.55..10428978015.55 rows=59928712
width=78)"
"        Join Filter: ((toode.toode = (vordlusajuhinnak.toode)::bpchar)
OR (toode.toode ~~ ((vordlusajuhinnak.toode)::text || '/%'::text)))"
"        ->  Parallel Index Only Scan using toode_pkey on toode 
(cost=0.55..95029.93 rows=303869 width=60)"
"        ->  Seq Scan on vordlusajuhinnak  (cost=0.00..721.33 rows=39433
width=32)"

How to speed it up?

Andrus.

23.05.2023 14:32 Bzm@g kirjutas:

Show quoted text

Great,

However I think it is still way to slow.
Next step is to run analyze also for the other table  vordlusajuhinnak.

And make sure you have an index on vordlusajuhinnak.toode similar to
the index on toode.toode

--
Boris

Am 23.05.2023 um 12:56 schrieb Andrus <kobruleht2@hot.ee>:



Hi!

I ran analyze firma2.toode and changed where clause to use like:

create table peatoode as
select toode.toode , n2, n3, n4
from toode, vordlusajuhinnak
WHERE toode.toode=vordlusajuhinnak.toode OR  toode.toode LIKE
vordlusajuhinnak.toode||'/%'

In this case it took 37 minutes, returned 277966 rows.

Thank you for help.

Andrus.

23.05.2023 11:24 Bzm@g kirjutas:

Also your row count is way off I guess. Did you ever run analyze bigtable?

--
Boris

Am 23.05.2023 um 10:22 schriebbzm@2bz.de:

Hi there,

I guess the main problem is the nested loop.

As a quick recheck what happened if you run your query Without nested loop?

This is not a solution but a quickt test

In a Session

Set enable_nestedloop = off;
Explain Select your query ;
--
Boris

Am 23.05.2023 um 08:53 schrieb Andrus<kobruleht2@hot.ee>:

Hi!

Price list of main products vordlusajuhinnak contains 3 prices for product (column toode) and has 39433 products:

create table vordlusajuhinnak( toode varchar(60), n2 numeric(8,2), n3 numeric(8,2), n4 numeric(8,2) );

toode column in unique, may be primary key in table and contains upper case letters, digits and - characters.

product table (toode) contains 733021 products:

CREATE TABLE toode (
grupp character(1),
toode character(60) primary key,
... lot of other columns
);

Both tables have pattern indexes to speed up queries:

CREATE INDEX toode_toode_pattern_idx
ON toode (toode bpchar_pattern_ops ASC NULLS LAST);
-- This index is probably not used, should removed:
CREATE INDEX vordlusajuhinnak_toode_pattern_idx ON vordlusajuhinnak(toode bpchar_pattern_ops);

Product table as both main products and subproducts with sizes. Size is separated by main product code always by / character:

SHOE1-BLACK
SHOE1-BLACK/38
SHOE1-BLACK/41
SHOE1-BLACK/42
SHOE1-BLACK/43
SHOE2/XXL
SHOE2/L

Product codes contain upper characers only in this table also.

Trying to get prices for all products using

create table peatoode as
select toode.toode , n2, n3, n4
from toode, vordlusajuhinnak
where toode.toode between vordlusajuhinnak.toode and vordlusajuhinnak.toode||'/z'

Takes 4.65 hours. How to speed this query up?
Output from explain:

"Nested Loop (cost=0.55..272273178.12 rows=3202240012 width=78)""
-> Seq Scan on vordlusajuhinnak (cost=0.00..721.33 rows=39433 width=32)" " -> Index Only Scan using toode_pkey on toode
(cost=0.55..6092.62 rows=81207 width=60)" " Index Cond: (toode

= (vordlusajuhinnak.toode)::bpchar)" " Filter: ((toode)::text <= ((vordlusajuhinnak.toode)::text || '/z'::text))"

Using PostgreSQL 13.2, compiled by Visual C++ build 1900, 64-bit in Windows server and psqlODBC driver.
Upgrading Postgres is possible, if this helps.

Tried also using like:

WHERE toode.toode=vordlusajuhinnak.toode OR
toode.toode LIKE vordlusajuhinnak.toode||'/%'

Posted also in

https://stackoverflow.com/questions/76311957/how-to-match-character-columns-with-separated-subcolumns

Andrus.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrus (#1)
Re: How to speed up product code and subcode match

Andrus <kobruleht2@hot.ee> writes:

Product table as both main products and subproducts with sizes. Size is
separated by main product code always by / character:

    SHOE1-BLACK
    SHOE1-BLACK/38
    SHOE1-BLACK/41
    SHOE1-BLACK/42
    SHOE1-BLACK/43
    SHOE2/XXL
    SHOE2/L

You could probably have devised a worse data representation if
you really tried, but it would have taken some effort. Separate
the product code and size into two columns --- if there's somebody
who really wants to see them in the above format, give them a
view or generated column. Then instead of the impossible-to-optimize
queries you showed, you could do something like

select toode.toode , n2, n3, n4
from toode, vordlusajuhinnak
where toode.toode = vordlusajuhinnak.toode;

Anytime you're trying to join two tables on something that isn't
a plain equality condition (or ANDed conditions), you're in for
a world of hurt.

regards, tom lane

#4Andrus
kobruleht2@hot.ee
In reply to: Tom Lane (#3)
Re: How to speed up product code and subcode match

Hi!

Separate
the product code and size into two columns --- if there's somebody
who really wants to see them in the above format, give them a
view or generated column. Then instead of the impossible-to-optimize
queries you showed, you could do something like

select toode.toode , n2, n3, n4
from toode, vordlusajuhinnak
where toode.toode = vordlusajuhinnak.toode;

Can function index

create index on toode ( *split_part( toode, '/',1) *)

and query

select toode.toode , n2, n3, n4

from toode, vordlusajuhinnak

where *split_part( toode.toode, '/',1) *= vordlusajuhinnak.toode;

used and keeping existing table structure? Functional index should
produce same speed improvement as using separate column?

Andrus.

#5Andrus
kobruleht2@hot.ee
In reply to: Andrus (#4)
Re: How to speed up product code and subcode match

Hi!

Using index

create index on toode ( *split_part( toode, '/',1) *)

and query

select toode.toode , n2, n3, n4

from toode, vordlusajuhinnak

where *split_part( toode.toode, '/',1) *= vordlusajuhinnak.toode;

reduces run time to 5 minutes.

Andrus.

23.05.2023 17:26 Andrus kirjutas:

Show quoted text

Hi!

Separate
the product code and size into two columns --- if there's somebody
who really wants to see them in the above format, give them a
view or generated column. Then instead of the impossible-to-optimize
queries you showed, you could do something like

select toode.toode , n2, n3, n4
from toode, vordlusajuhinnak
where toode.toode = vordlusajuhinnak.toode;

Can function index

create index on toode ( *split_part( toode, '/',1) *)

and query

select toode.toode , n2, n3, n4

from toode, vordlusajuhinnak

where *split_part( toode.toode, '/',1) *= vordlusajuhinnak.toode;

used and keeping existing table structure? Functional index should
produce same speed improvement as using separate column?

Andrus.