Query much slower from php than psql or dbeaver

Started by Ekaterina Amezover 4 years ago4 messagesgeneral

ekaterina.amez@zunibal.com

over 4 years ago

Hi,

After receiving an Unknown Address error with *pgsql-php@postgresql.org* I've
discovered this mailing list is catalogued as Inactive, so I'm sending my
question to this list.

I've made a php cli script that downloads a file from FTP, loads it in a
table and compares against the same table in 2 different databases that are
connected to the main one with pgsql foreign data wrapper (so there are 3
DB involved via linked servers).

I've tested the query with psql and DBeaver and it takes only milliseconds:
it returns 39 records and now there's only 16000 records on the table but
I've tested it with <100K. When I've tested my php script the same query
takes 14 minutes to return (more or less). I've checked there are no
blocking processes while runnig this query, I've analyzed the table,
dropped and recreated, added indexes... and every time I run my script it
takes those 14 minutes.

The script opens the connection and after that makes all of the operations:
some checks, load the file with COPY sentence, move data with some
additional columns to final table, and after this and without dropping the
connection it runs this query. After this long query, script iterates over
the returned records to make some other queries for every record. And
finally there are 2 more queries, that makes the reverse search: the slow
query checks diferences or non-existing records from file data in the 2
linked databases, and the las 2 queries check non-existing records from
their tables in the file. Everything goes smooth except this long-lasting
query.

This is the query:
select t1.aaaa as maindb_aaaa, t1.bbb as maindb_bbb, t1.ccccc as
maindb_ccccc, t1.timestamp_create as maindb_create,
t1.timestamp_closed as maindb_close, t1.ddddddddd as maindb_ddddddddd,
null::text as db1_sth,
t2.eeeeeeee as db1_eeeeeeee, t2.ffffffff as db1_ffffffff, null::text as
db2_sth,
t3.eeeeeeee as db2_eeeeeeee, t3.ffffffff as db2_ffffffff
from table1 t1
left join database1_fdw.table2 t2 on t1.aaaa = t2.btatpd_aaaa
and t1.file = 'file_name.csv'
and t2.btatpd_fecha = '20220119120000'
and substring(t1.bbb from 1 for 3) in (<some_values>)
left join database2_fdw.table2 t3 on t1.aaaa = t3.btatpd_aaaa
and t1.fichero_origen = 'file_name.csv'
and t3.btatpd_fecha = '20220119120000'
and substring(t1.bbb from 1 for 3) in (<some_different_values)
where t1.ccccc = 'ACTIVE'
and t1.fichero_origen = 'file_name.csv'
and ((t2.eeeeeeee is null and t3.eeeeeeee is null)
or
(t2.eeeeeeee is not null and t1.ddddddddd <> t2.ffffffff)
or
(t3.eeeeeeee is not null and t1.ddddddddd <> t3.ffffffff)
)
order by t1.bbb nulls last;

I'm using PHP 5.3.3, with PG 9.6 over CentOS 6.8. I'm connecting to PG with
pg_connect and pg_query because here nobody is using PDO. And I have to
warn you that I'm not a PHP expert. Also I'm aware how old these versions
are, I'm trying to finish things to start with server migration.

This is the explain plan, in case it gives more info:
Sort (cost=5345.39..5385.35 rows=15986 width=212) (actual
time=112.315..112.318 rows=39 loops=1)
Output: t1.aaaa, t1.bbb, t1.ccccc, t1.timestamp_create,
t1.timestamp_closed, t1.ddddddddd, NULL::text, t2.eeeeeeee, t2.ffffffff,
NULL::text, t3.eeeeeeee, t3.ffffffff
Sort Key: t1.bbb
Sort Method: quicksort Memory: 29kB
Buffers: shared hit=1255
-> Hash Left Join (cost=237.54..2587.70 rows=15986 width=212) (actual
time=81.272..112.248 rows=39 loops=1)
Output: t1.aaaa, t1.bbb, t1.ccccc, t1.timestamp_create,
t1.timestamp_closed, t1.ddddddddd, NULL::text, t2.eeeeeeee, t2.ffffffff,
NULL::text, t3.eeeeeeee, t3.ffffffff
Hash Cond: ((t1.aaaa)::text = (t3.btatpd_aaaa)::text)
Join Filter: (((t1.file)::text = 'file_name.csv'::text) AND
("substring"((t1.bbb)::text, 1, 3) = ANY ('{<some_values>}'::text[])))
Rows Removed by Join Filter: 1
Filter: (((t2.eeeeeeee IS NULL) AND (t3.eeeeeeee IS NULL)) OR
((t2.eeeeeeee IS NOT NULL) AND (t1.ddddddddd <> t2.ffffffff)) OR
((t3.eeeeeeee IS NOT NULL) AND (t1.ddddddddd <> t3.ffffffff)))
Rows Removed by Filter: 15794
Buffers: shared hit=1252
-> Hash Left Join (cost=118.77..2408.88 rows=15988 width=149)
(actual time=71.890..101.261 rows=15820 loops=1)
Output: t1.aaaa, t1.bbb, t1.ccccc, t1.timestamp_create,
t1.timestamp_closed, t1.ddddddddd, t1.file, t2.eeeeeeee, t2.ffffffff
Hash Cond: ((t1.aaaa)::text = (t2.btatpd_aaaa)::text)
Join Filter: (((t1.file)::text = 'file_name.csv'::text) AND
("substring"((t1.bbb)::text, 1, 3) = ANY
('{<some_different_values>}'::text[])))
Buffers: shared hit=1252
-> Seq Scan on public.table1 t1 (cost=0.00..2230.06
rows=15988 width=103) (actual time=0.176..19.882 rows=15817 loops=1)
Output: t1.id, t1.aaaa, t1.bbb, t1.ccccc,
t1.timestamp_create, t1.timestamp_closed, t1.sbd_bundle_id, t1.ddddddddd,
t1.file, t1.fecha_carga
Filter: (((t1.file)::text = 'file_name.csv'::text) AND
((t1.ccccc)::text = 'ACTIVE'::text))
Rows Removed by Filter: 49387
Buffers: shared hit=1252
-> Hash (cost=118.73..118.73 rows=3 width=94) (actual
time=71.699..71.699 rows=14244 loops=1)
Output: t2.eeeeeeee, t2.ffffffff, t2.btatpd_aaaa
Buckets: 16384 (originally 1024) Batches: 1
(originally 1) Memory Usage: 1074kB
-> Foreign Scan on database1_fdw.table2 t2
(cost=100.00..118.73 rows=3 width=94) (actual time=3.961..67.271
rows=14244 loops=1)
Output: t2.eeeeeeee, t2.ffffffff, t2.btatpd_aaaa
Remote SQL: SELECT eeeeeeee, btatpd_aaaa,
ffffffff FROM public.table2 WHERE ((btatpd_fecha = '20220119120000'::text))
-> Hash (cost=118.73..118.73 rows=3 width=94) (actual
time=6.512..6.513 rows=3051 loops=1)
Output: t3.eeeeeeee, t3.ffffffff, t3.btatpd_aaaa
Buckets: 4096 (originally 1024) Batches: 1 (originally 1)
Memory Usage: 235kB
-> Foreign Scan on database2_fdw.table2 t3
(cost=100.00..118.73 rows=3 width=94) (actual time=1.494..5.857 rows=3051
loops=1)
Output: t3.eeeeeeee, t3.ffffffff, t3.btatpd_aaaa
Remote SQL: SELECT eeeeeeee, btatpd_aaaa, ffffffff FROM
public.table2 WHERE ((btatpd_fecha = '20220119120000'::text))
Planning time: 2.137 ms
Execution time: 123.077 ms

Anyone can tell me why is this happening and if is there a solution to this?

Thank you for your time.

*Ekaterina Amez González*

ekaterina.amez@zunibal.com

*ZUNIBAL* | Idorsolo 1, 48160-Derio, Spain
<https://maps.google.com/?q=Idorsolo+1,+48160-Derio,+Spain&entry=gmail&source=g>

Tel: +34 944 977 010 <+34%20944%2097%2070%2010> | Fax: +34 944 522 81|
www.zunibal.com

*Advertencia Legal. El contenido de este mensaje y de toda la documentación
anexa es confidencial y va dirigido únicamente al destinatario del mismo.
Si usted no fuera el destinatario le solicitamos que nos informe y no
comunique su contenido a terceros, procediendo a su destrucción. / Legal
advice. This message contains confidential information for the exclusive
use of the recipient. Any unauthorised disclosure, use or dissemination,
either whole or partial, is prohibited. If you are not the intended
recipient of the message, please notify the sender as soon as possible. En
cumplimiento de la Ley Orgánica 15/1999 de Protección de Datos de Carácter
Personal, le informamos que los Datos Personales de usted que están en
nuestra Base de Datos recabados con su consentimiento, forman parte de un
fichero automatizado registrado en la Agencia Española de Protección de
Datos. Estos datos sólo serán utilizados para realizar una correcta gestión
de nuestra relación comercial. Si lo desea podrá ejercitar en todo momento
los derechos de acceso, cancelación u oposición, remitiendo un correo
electrónico a esta dirección. / According to the Law 15/1999 of Personal
Data Protection, we inform your data are included in our data
base with your approval are registered in the spanish Agency of
Data Protection. **These data are only used for our usual business
relationship. If you want you are in the right of accessing or cancelling
them, just sending an e-mail to this address.*

*Antes de imprimir este mensaje, asegúrese de que es necesario hacerlo. /
Before printing this email, assess if it is really needed.*

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Ekaterina Amez (#1)

Re: Query much slower from php than psql or dbeaver

Ekaterina Amez <ekaterina.amez@zunibal.com> writes:

I've tested the query with psql and DBeaver and it takes only milliseconds:
it returns 39 records and now there's only 16000 records on the table but
I've tested it with <100K. When I've tested my php script the same query
takes 14 minutes to return (more or less).

A plausible theory is that it's not really the same query, but differs
in having some values presented as parameters not literal constants.
Depending on exactly how you do it, that can confuse the planner
leading to a poor execution plan and a long runtime.

I'd first try enabling log_statement on the server to see if the
query is really being presented exactly the same way. Another
thing worth trying is auto_explain, to capture the plans actually
being used.

regards, tom lane

Michael Lewis

mlewis@entrata.com

over 4 years ago

In reply to: Tom Lane (#2)

Re: Query much slower from php than psql or dbeaver

When dealing with foreign tables, I believe planning is not the same
because of access to statistics (maybe has improved since 9.6 though). I
just wonder... Would it be a viable option to create a materialized view
using the FDW but then use the PHP script against the local tables only?
Materialized views are not maintained automatically, but you have local
statistics and can create indexes. Just a thought in case the data is not
changing constantly and this might fit the need.

Also, it seems like perhaps the foreign queries might be more targeted if
some data was encouraged to be pre-computed. What would be the expected row
count from just table1?

Note- your explain plan doesn't mention "fichero_origen" field name. Is
that just a copy/paste error?

with cte_interesting_t1_rows_precomputed AS materialized(
select
t1.*,
substring(t1.bbb from 1 for 3) in (<some_values>) AND t1.file =
'file_name.csv' AS needs_t2,
substring(t1.bbb from 1 for 3) in (<some_different_values>) AS needs_t3
FROM
table1 t1
where t1.ccccc = 'ACTIVE'
and t1.fichero_origen = 'file_name.csv'
)

select t1.aaaa as maindb_aaaa, t1.bbb as maindb_bbb, t1.ccccc as
maindb_ccccc, t1.timestamp_create as maindb_create,
t1.timestamp_closed as maindb_close, t1.ddddddddd as maindb_ddddddddd,
null::text as db1_sth,
t2.eeeeeeee as db1_eeeeeeee, t2.ffffffff as db1_ffffffff, null::text as
db2_sth,
t3.eeeeeeee as db2_eeeeeeee, t3.ffffffff as db2_ffffffff
from cte_interesting_t1_rows_precomputed AS t1
left join database1_fdw.table2 AS t2 on t1.aaaa = t2.btatpd_aaaa and
t2.btatpd_fecha = '20220119120000' AND needs_t2
left join database2_fdw.table2 AS t3 on t1.aaaa = t3.btatpd_aaaa and
t3.btatpd_fecha = '20220119120000' AND needs_t3
where
(t2.eeeeeeee is null and t3.eeeeeeee is null)
or
(t2.eeeeeeee is not null and t1.ddddddddd <> t2.ffffffff)
or
(t3.eeeeeeee is not null and t1.ddddddddd <> t3.ffffffff)
order by t1.bbb nulls last;

Ekaterina Amez

ekaterina.amez@zunibal.com

over 4 years ago

In reply to: Michael Lewis (#3)

Re: Query much slower from php than psql or dbeaver

El vie, 21 ene 2022 a las 5:04, Michael Lewis (<mlewis@entrata.com>)
escribió:

When dealing with foreign tables, I believe planning is not the same
because of access to statistics (maybe has improved since 9.6 though). I
just wonder... Would it be a viable option to create a materialized view
using the FDW but then use the PHP script against the local tables only?
Materialized views are not maintained automatically, but you have local
statistics and can create indexes. Just a thought in case the data is not
changing constantly and this might fit the need.

Also, it seems like perhaps the foreign queries might be more targeted if
some data was encouraged to be pre-computed. What would be the expected row
count from just table1?

Note- your explain plan doesn't mention "fichero_origen" field name. Is
that just a copy/paste error?

It's a column renaming mistake.
t1.fichero_origen = t1.file (it's the column where incoming file name is
stored).

with cte_interesting_t1_rows_precomputed AS materialized(
select
t1.*,
substring(t1.bbb from 1 for 3) in (<some_values>) AND t1.file =
'file_name.csv' AS needs_t2,
substring(t1.bbb from 1 for 3) in (<some_different_values>) AS needs_t3
FROM
table1 t1
where t1.ccccc = 'ACTIVE'
and t1.fichero_origen = 'file_name.csv'
)

select t1.aaaa as maindb_aaaa, t1.bbb as maindb_bbb, t1.ccccc as
maindb_ccccc, t1.timestamp_create as maindb_create,
t1.timestamp_closed as maindb_close, t1.ddddddddd as maindb_ddddddddd,
null::text as db1_sth,
t2.eeeeeeee as db1_eeeeeeee, t2.ffffffff as db1_ffffffff, null::text
as db2_sth,
t3.eeeeeeee as db2_eeeeeeee, t3.ffffffff as db2_ffffffff
from cte_interesting_t1_rows_precomputed AS t1
left join database1_fdw.table2 AS t2 on t1.aaaa = t2.btatpd_aaaa and
t2.btatpd_fecha = '20220119120000' AND needs_t2
left join database2_fdw.table2 AS t3 on t1.aaaa = t3.btatpd_aaaa and
t3.btatpd_fecha = '20220119120000' AND needs_t3
where
(t2.eeeeeeee is null and t3.eeeeeeee is null)
or
(t2.eeeeeeee is not null and t1.ddddddddd <> t2.ffffffff)
or
(t3.eeeeeeee is not null and t1.ddddddddd <> t3.ffffffff)
order by t1.bbb nulls last;

I'm doing some tests after reading carefully your answer and Tom's, and
I'll post back.
Thank you both.