BUG #16148: Query on Large table hangs in ETL flows and gives out of memory when run in pgAdmin4

Started by PG Bug reporting form · over 6 years ago · 4 messages · pgsql-bugs
#1 PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 16148
Logged by: Scott Volkers
Email address: scottvolkers@gmail.com
PostgreSQL version: 9.5.15
Operating system: PostgreSQL 9.5.15 on x86_64-pc-linux-gnu, compiled
Description:

We have a large table and the error occurs with this where clause:
FROM "elliedb"."documentlog" WHERE dcmodifiedutc>(extract(epoch from
TIMESTAMP '2019-11-15 11:30:51')*1000)

When we reduce the scope to current time - 4 hours the query works within 44
seconds.
where dcmodifiedutc > '1575282651000'

Is this expected? Is this a version issue, being only 9.5? It seems the
timestamp conversion would be done once and applied to the filter, but it
seems to be ballooning the query result being aggregated for the WHERE
clause?
Thank you.

#2 Jeff Janes
jeff.janes@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #16148: Query on Large table hangs in ETL flows and gives out of memory when run in pgAdmin4

On Wed, Dec 4, 2019 at 9:20 AM PG Bug reporting form <noreply@postgresql.org>
wrote:


> We have a large table and the error occurs with this where clause:
> FROM "elliedb"."documentlog" WHERE dcmodifiedutc>(extract(epoch from
> TIMESTAMP '2019-11-15 11:30:51')*1000)
>
> When we reduce the scope to current time - 4 hours the query works within 44
> seconds.
> where dcmodifiedutc > '1575282651000'
>
> Is this expected? Is this a version issue being only 9.5?

From "now minus 4 hours" to now covers 100-fold less time than from
2019-11-15 11:30:51 until now does. Assuming your data is evenly
distributed over the past and doesn't have data from the future, then yes,
selecting 100 times more data is expected to take more time and more
memory. pgAdmin4 is not well suited to loading giant data sets into
memory. You can extract large data sets directly into files. This will not
depend on the version.

> It seems the timestamp conversion would be done once and applied to the
> filter, but it seems to ballooning the query result being aggregated for
> the where clause?

Is aggregation being used? You haven't shown any aggregation.

Cheers,

Jeff
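Jeff's 100-fold estimate can be checked arithmetically. A quick sketch (assuming, as both filters imply, that dcmodifiedutc stores UTC epoch milliseconds, and that the fast query's literal cutoff really was "now minus 4 hours" when it ran):

```python
from datetime import datetime, timezone

# Cutoff in the failing query: extract(epoch from TIMESTAMP '2019-11-15 11:30:51') * 1000.
# PostgreSQL evaluates extract(epoch from <timestamp without time zone>) as if the
# value were UTC, so the same constant can be reproduced here.
slow_cutoff_ms = int(datetime(2019, 11, 15, 11, 30, 51, tzinfo=timezone.utc).timestamp() * 1000)
print(slow_cutoff_ms)  # 1573817451000 -- a constant; the conversion is indeed done once

# Literal cutoff from the fast query, which was "now minus 4 hours" when it was run.
fast_cutoff_ms = 1575282651000
now_ms = fast_cutoff_ms + 4 * 3600 * 1000  # approximate time both queries were run

slow_window_hours = (now_ms - slow_cutoff_ms) / 3_600_000
print(round(slow_window_hours))      # 411 hours covered by the failing query's filter
print(round(slow_window_hours / 4))  # ~103: roughly 100x the 4-hour window
```

So the timestamp conversion is a one-time constant-folded computation; what differs between the two queries is simply that one cutoff admits about a hundred times more rows than the other.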

#3 Scott Volkers
scottvolkers@gmail.com
In reply to: Jeff Janes (#2)
Re: BUG #16148: Query on Large table hangs in ETL flows and gives out of memory when run in pgAdmin4

Hi Jeff,

I may not have explained this well.

The long and the short of it is this where clause
FROM "elliedb"."documentlog" WHERE dcmodifiedutc>(extract(epoch from
TIMESTAMP '2019-11-15 11:30:51')*1000)

causes an out of memory error in pgAdmin. The query will not run. I am
testing it there because the same query will not run in an Informatica ETL
task flow. It hangs our processes. No error returns in the Informatica
situation.

My reference to aggregation was a presumption about what the PostgreSQL
engine is doing to produce the result set.


--
Thanks,

Scott Volkers

#4 Jeff Janes
jeff.janes@gmail.com
In reply to: Scott Volkers (#3)
Re: BUG #16148: Query on Large table hangs in ETL flows and gives out of memory when run in pgAdmin4

On Thu, Dec 5, 2019 at 9:54 AM Scott Volkers <scottvolkers@gmail.com> wrote:

> Hi Jeff,
>
> I may not have explained this well.
>
> The long and the short of it is this where clause
> FROM "elliedb"."documentlog" WHERE dcmodifiedutc>(extract(epoch from
> TIMESTAMP '2019-11-15 11:30:51')*1000)
>
> causes an out of memory error in pgAdmin. The query will not run. I am
> testing it there because the same query will not run in Informatica ETL
> task flow. It hangs our processes.

WHERE clauses do not get executed. Whole statements do. We can't verify
or investigate a bug report based on non-executable fragments. I can just
make up a "select count(*)" to go in front, but when I do I observe no bug.

Based on the information provided, the most likely explanation is that you
are trying to read a huge result set into client memory, and you don't have
enough memory to do so. If you want to manipulate a huge result in
pgAdmin4 or Informatica without reading it into client memory, that is not
a bug in PostgreSQL; it is a question about pgAdmin4 or Informatica.

Cheers,

Jeff
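Two standard ways to pull a large result set without holding it all in client memory are a server-side cursor fetched in batches, and psql's \copy straight to a client-side file. A sketch against the table from the thread (the numeric cutoff is the epoch-millisecond equivalent of 2019-11-15 11:30:51 UTC; the batch size and file name are illustrative):

```sql
BEGIN;

-- Stream the result in batches through a server-side cursor instead of one big fetch.
DECLARE doc_cur CURSOR FOR
    SELECT * FROM "elliedb"."documentlog"
    WHERE dcmodifiedutc > 1573817451000;

FETCH 10000 FROM doc_cur;  -- repeat until it returns no rows
CLOSE doc_cur;
COMMIT;

-- Or, from psql, write the rows directly to a file on the client:
-- \copy (SELECT * FROM "elliedb"."documentlog" WHERE dcmodifiedutc > 1573817451000) TO 'documentlog.csv' CSV HEADER
```

Either approach keeps the client's memory use bounded regardless of how many rows the filter matches, which is what matters for tools like pgAdmin4 and Informatica.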