(VERY) Slow Query - PostgreSQL 9.2

Started by drum.lucas@gmail.com · almost 10 years ago · 4 messages · general

#1 drum.lucas@gmail.com

Hi all,

I'm trying to get better performance out of the query below, but I just
don't know what else I can do...

Please have a look and let me know if you can help somehow. Also, if you
need some extra data, just ask me please.

* Note that the gorfs.inode_segments table is 1.7 TB in size

I have the following Query:

explain analyze
SELECT split_part(full_path, '/', 4)::INT AS account_id,
       split_part(full_path, '/', 6)::INT AS note_id,
       split_part(full_path, '/', 9)::TEXT AS variation,
       st_size,
       segment_index,
       reverse(split_part(reverse(full_path), '/', 1)) as file_name,
       i.st_ino,
       full_path,
       (i.st_size / 1000000::FLOAT)::NUMERIC(5,2) || 'MB' AS size_mb
FROM gorfs.inodes i
JOIN gorfs.inode_segments s
  ON i.st_ino = s.st_ino_target
WHERE i.checksum_md5 IS NOT NULL
  AND s.full_path ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+'
  AND i.st_size > 0;
split_part(s.full_path, '/', 4)::INT IN (
    SELECT account.id
    FROM public.ja_clients AS account
    WHERE
        NOT (
            ((account.last_sub_pay > EXTRACT('epoch' FROM
                (transaction_timestamp() - CAST('4 Months' AS INTERVAL))))
                AND (account.price_model > 0)) OR
            (account.regdate > EXTRACT('epoch' FROM
                (transaction_timestamp() - CAST('3 Month' AS INTERVAL)))) OR
            (((account.price_model = 0) AND (account.jobcredits > 0))
                AND (account.last_login > EXTRACT('epoch' FROM
                (transaction_timestamp() - CAST('4 Month' AS INTERVAL)))))
        ) LIMIT 100);

- Explain analyze link: http://explain.depesz.com/s/Oc6

The query is taking ages, and I can't get the problem solved.

These are the indexes I've already created on the inode_segments table:

Indexes:
"ix_account_id_from_full_path" "btree"
(("split_part"("full_path"::"text", '/'::"text", 4)::integer)) WHERE
"full_path"::"text" ~
'^/userfiles/account/[0-9]+/[a-z]+/[0-9]+'::"text"
"ix_inode_segments_ja_files_lookup" "btree" ((CASE
WHEN "full_path"::"text" ~ '/[^/]*\.[^/]*$'::"text" THEN
"upper"("regexp_replace"("full_path"::"text", '.*\.'::"text",
''::"text", 'g'::"text"))
ELSE NULL::"text"END)) WHERE
"gorfs"."is_kaminski_note_path"("full_path"::"text")
"ix_inode_segments_notes_clientids" "btree"
(("split_part"("full_path"::"text", '/'::"text", 4)::integer)) WHERE
"gorfs"."is_kaminski_note_path"("full_path"::"text")
"ix_inode_segments_notes_clientids2" "btree" ("full_path")
"ix_inode_segments_notes_fileids" "btree"
(("split_part"("full_path"::"text", '/'::"text", 8)::integer)) WHERE
"gorfs"."is_kaminski_note_path"("full_path"::"text")
"ix_inode_segments_notes_noteids" "btree"
((NULLIF("split_part"("full_path"::"text", '/'::"text", 6),
'unassigned'::"text")::integer)) WHERE
"gorfs"."is_kaminski_note_path"("full_path"::"text")

These are the indexes I've already created on the inodes table:

Indexes:
"ix_inodes_checksum_st_size" "btree" ("checksum_md5", "st_size")
WHERE "checksum_md5" IS NOT NULL

*Question:*

What else can I do to improve the performance of the query?

#2 Bill Moran
wmoran@potentialtech.com
In reply to: drum.lucas@gmail.com (#1)
Re: (VERY) Slow Query - PostgreSQL 9.2

On Tue, 3 May 2016 21:55:21 +1200
"drum.lucas@gmail.com" <drum.lucas@gmail.com> wrote:

> I'm trying to get better performance out of the query below, but I just
> don't know what else I can do...
>
> * Note that the gorfs.inode_segments table is 1.7 TB in size
>
> [query and index definitions snipped -- quoted in full in #1]
>
> What else can I do to improve the performance of the query?

From the explain, it looks like the biggest pain point is the
inode_segments table, specifically, this condition:
s.full_path ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+'
It's doing a full scan of every record in that table, which is
a large number, and that regex can't be cheap over that kind of
volume.

If you do:
SELECT count(*)
FROM inode_segments
WHERE full_path ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+';
how many tuples actually match that condition? If the number
is a large percentage of the total table, then I'm not sure
how to help you, but if the percentage is small, you might
be able to speed things up by adding an index:

CREATE INDEX is_fp_trunc_idx ON inode_segments(substring(full_path FROM 1 FOR 19));

Then adding this condition to the where clause:
substring(s.full_path FROM 1 FOR 19) = '/userfiles/account/'

There are other index combinations that may help as well,
depending on the nature of the values in that table, but,
in general, anything you can do to reduce the number of
records that have to be examined in that table is liable
to speed things up.
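Putting the index and the extra condition together, the reworked query
might look something like this (a sketch only; the index name and the
19-character prefix are illustrative and should match your actual data):

CREATE INDEX is_fp_trunc_idx
    ON gorfs.inode_segments (substring(full_path FROM 1 FOR 19));

SELECT ...              -- same select list as before
FROM gorfs.inodes i
JOIN gorfs.inode_segments s
  ON i.st_ino = s.st_ino_target
WHERE i.checksum_md5 IS NOT NULL
  AND i.st_size > 0
  AND substring(s.full_path FROM 1 FOR 19) = '/userfiles/account/'
  AND s.full_path ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+';

The equality condition matches the indexed expression exactly, so the
planner can use the index to narrow the candidate rows first, and the
regex only has to be evaluated against the survivors.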

--
Bill Moran

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3 Alban Hertroys
haramrae@gmail.com
In reply to: drum.lucas@gmail.com (#1)
Re: (VERY) Slow Query - PostgreSQL 9.2

On 03 May 2016, at 11:55, drum.lucas@gmail.com wrote:

Hi all,

I'm trying to get better performance out of the query below, but I just don't know what else I can do...

Please have a look and let me know if you can help somehow. Also, if you need some extra data, just ask me please.

* Note that the gorfs.inode_segments table is 1.7 TB in size

I have the following Query:

explain analyze

SELECT split_part(full_path, '/', 4)::INT AS account_id,
split_part(full_path, '/', 6)::INT AS note_id,
split_part(full_path, '/', 9)::TEXT AS variation,
st_size,
segment_index,
reverse(split_part(reverse(full_path), '/', 1)) as file_name,
i.st_ino,
full_path,
(i.st_size / 1000000::FLOAT)::NUMERIC(5,2) || 'MB' AS size_mb
FROM gorfs.inodes i
JOIN gorfs.inode_segments s
ON i.st_ino = s.st_ino_target
WHERE i.checksum_md5 IS NOT NULL
AND s.full_path ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+'
AND i.st_size > 0;

(Stripped the 1-and-a-half extra queries in there, but that incomplete one might be why you're waiting?)

• Explain analyze link: http://explain.depesz.com/s/Oc6
The query is taking ages, and I can't get the problem solved.

These are the index I've already created on the inode_segments table:

What else can I do to improve the Performance of the Query?

The first thing I notice in your query is that you're making use of hierarchically organised data without storing it hierarchically, namely that full_path field. The result of that is that both your table and your index contain a lot of redundant information.

Now I'm not so sure a hierarchical table + query are going to help get you much performance out of this (probably worth an experiment or two, mind that O/S's usually use inode trees for such things), but reducing the redundancy in the index would probably help:

create index inodes_accounts_idx on gorfs.inode_segments (substring (full_path from 20)) where full_path like '/userfiles/account/%';

and then use similar expressions in your query of course:

where full_path like '/userfiles/account/%' and substring(full_path from 20) ~ '^[0-9]+/[a-z]+/[0-9]+';

Good luck!
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.


#4 Mike Sofen
msofen@runbox.com
In reply to: drum.lucas@gmail.com (#1)
Re: (VERY) Slow Query - PostgreSQL 9.2

From: drum.lucas@gmail.com Sent: Tuesday, May 03, 2016 2:55 AM
> I'm trying to get better performance out of the query below, but I just
> don't know what else I can do...
>
> [query snipped -- quoted in full in #1]

There is one obvious solution: restructure your data. It is not in a "standard" relational form, yet you're querying it as if it were: you are turning your long full_path string into columns on the fly, and if performance is a concern, that parsing overhead has to be eliminated.

Your two choices would be to either restructure this table directly (requiring a change in app code that was filling it), or use it to fill a proper table that already has everything decomposed from the long full_path string via post-processing after the insert. A third consideration would be to archive off older/unneeded rows to a history table to reduce row counts. This is about proper structure.
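As a sketch of the second option (all table and column names here are
hypothetical), the decomposed values could live in a side table filled
by post-processing:

CREATE TABLE gorfs.inode_segment_paths (
    st_ino_target BIGINT,
    segment_index SMALLINT,
    account_id    INT,
    note_id       INT,
    variation     TEXT,
    file_name     TEXT
);

-- one-off backfill from the existing data
INSERT INTO gorfs.inode_segment_paths
SELECT s.st_ino_target,
       s.segment_index,
       split_part(s.full_path, '/', 4)::INT,
       split_part(s.full_path, '/', 6)::INT,
       split_part(s.full_path, '/', 9),
       reverse(split_part(reverse(s.full_path), '/', 1))
FROM gorfs.inode_segments s
WHERE s.full_path ~ '^/userfiles/account/[0-9]+/[a-z]+/[0-9]+';

With plain integer columns, account_id can carry an ordinary btree index
and the expensive string parsing happens once per row at insert time
instead of once per row on every query.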

Mike Sofen