Why so long?

Started by Steve Clarkalmost 9 years ago6 messagesgeneral

steve.clark@netwolves.com

almost 9 years ago

Hello,

I am confused. I have a table that has an incrementing primary key id.

When I select max(id) from table is returns almost instantly but
when I select min(id) from table it takes longer than I want to wait.

Shouldn't postgresql be able to quickly find the minimum id value in the index?

pmacct=# explain select max(id) from netflow;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Result (cost=1.13..1.14 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.71..1.13 rows=1 width=8)
-> Index Only Scan Backward using netflow_pkey on netflow (cost=0.71..3799108784.10 rows=9123246080 width=8)
Index Cond: (id IS NOT NULL)
(5 rows)

pmacct=# explain select min(id) from netflow;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Result (cost=1.13..1.14 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.71..1.13 rows=1 width=8)
-> Index Only Scan using netflow_pkey on netflow (cost=0.71..3799108784.10 rows=9123246080 width=8)
Index Cond: (id IS NOT NULL)
(5 rows)

pmacct=# \timing
Timing is on.
pmacct=# select max(id) from netflow;
max
-------------
17547256873
(1 row)

Time: 0.626 ms

pmacct=# select min(id) from netflow;
^CCancel request sent
ERROR: canceling statement due to user request
Time: 339114.334 ms

Table "public.netflow"
Column | Type | Modifiers
----------------+-----------------------------+---------------------------------------------------------------------
id | bigint | not null default nextval('netflow_id_seq'::regclass)
agent_id | bigint |
bytes | bigint |
stamp_inserted | timestamp without time zone | not null default '0001-01-01 00:00:00'::timestamp without time zone
stamp_updated | timestamp without time zone |
packets | integer | default 0
port_src | integer | default 0
port_dst | integer | default 0
ip_proto | smallint | default 0
tos | smallint | default 0
ip_src | inet | not null default '0.0.0.0'::inet
ip_dst | inet | not null default '0.0.0.0'::inet
Indexes:
"netflow_pkey" PRIMARY KEY, btree (id)
"netflow_ts_key" btree (stamp_inserted)
"netflow_tsu_idx" btree (stamp_updated)
Triggers:
netflow_import AFTER INSERT OR UPDATE ON netflow FOR EACH STATEMENT EXECUTE PROCEDURE netflow_update()

--

steve.clark@netwolves.com

almost 9 years ago

In reply to: Steve Clark (#1)

Re: Why so long?

Should add this is version 9.4.10 of postgresql

On 04/19/2017 11:24 AM, Steve Clark wrote:

Hello,

I am confused. I have a table that has an incrementing primary key id.

When I select max(id) from table is returns almost instantly but
when I select min(id) from table it takes longer than I want to wait.

Shouldn't postgresql be able to quickly find the minimum id value in the index?

pmacct=# explain select max(id) from netflow;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Result (cost=1.13..1.14 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.71..1.13 rows=1 width=8)
-> Index Only Scan Backward using netflow_pkey on netflow (cost=0.71..3799108784.10 rows=9123246080 width=8)
Index Cond: (id IS NOT NULL)
(5 rows)

pmacct=# explain select min(id) from netflow;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Result (cost=1.13..1.14 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.71..1.13 rows=1 width=8)
-> Index Only Scan using netflow_pkey on netflow (cost=0.71..3799108784.10 rows=9123246080 width=8)
Index Cond: (id IS NOT NULL)
(5 rows)

pmacct=# \timing
Timing is on.
pmacct=# select max(id) from netflow;
max
-------------
17547256873
(1 row)

Time: 0.626 ms

pmacct=# select min(id) from netflow;
^CCancel request sent
ERROR: canceling statement due to user request
Time: 339114.334 ms

Table "public.netflow"
Column | Type | Modifiers
----------------+-----------------------------+---------------------------------------------------------------------
id | bigint | not null default nextval('netflow_id_seq'::regclass)
agent_id | bigint |
bytes | bigint |
stamp_inserted | timestamp without time zone | not null default '0001-01-01 00:00:00'::timestamp without time zone
stamp_updated | timestamp without time zone |
packets | integer | default 0
port_src | integer | default 0
port_dst | integer | default 0
ip_proto | smallint | default 0
tos | smallint | default 0
ip_src | inet | not null default '0.0.0.0'::inet
ip_dst | inet | not null default '0.0.0.0'::inet
Indexes:
"netflow_pkey" PRIMARY KEY, btree (id)
"netflow_ts_key" btree (stamp_inserted)
"netflow_tsu_idx" btree (stamp_updated)
Triggers:
netflow_import AFTER INSERT OR UPDATE ON netflow FOR EACH STATEMENT EXECUTE PROCEDURE netflow_update()

--

--
Stephen Clark
*NetWolves Managed Services, LLC.*
Director of Technology
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.clark@netwolves.com
http://www.netwolves.com

jeff.janes@gmail.com

almost 9 years ago

In reply to: Steve Clark (#1)

Re: Why so long?

On Wed, Apr 19, 2017 at 8:24 AM, Steve Clark <steve.clark@netwolves.com>
wrote:

Hello,

I am confused. I have a table that has an incrementing primary key id.

When I select max(id) from table is returns almost instantly but
when I select min(id) from table it takes longer than I want to wait.

Shouldn't postgresql be able to quickly find the minimum id value in the
index?

Not if the low end of the index is stuffed full of obsolete entries, which
haven't been cleaned up because it is not being vacuumed often enough.

Do you have autovacuum on? Have you manually vacuumed the table recently?

Cheers,

Jeff

steve.clark@netwolves.com

almost 9 years ago

In reply to: Jeff Janes (#3)

Re: Why so long?

On 04/19/2017 11:57 AM, Jeff Janes wrote:

On Wed, Apr 19, 2017 at 8:24 AM, Steve Clark <steve.clark@netwolves.com <mailto:steve.clark@netwolves.com>> wrote:

Hello,

I am confused. I have a table that has an incrementing primary key id.

When I select max(id) from table is returns almost instantly but
when I select min(id) from table it takes longer than I want to wait.

Shouldn't postgresql be able to quickly find the minimum id value in the index?

Not if the low end of the index is stuffed full of obsolete entries, which haven't been cleaned up because it is not being vacuumed often enough.

Do you have autovacuum on? Have you manually vacuumed the table recently?

Cheers,

Jeff

Hi Jeff,

Autovacuum is turned on.

schemaname | relname | last_vacuum | last_autovacuum | vacuum_count | autovacuum_count
------------+-----------------------+-------------+-------------------------------+--------------+------------------
public | netflow | | 2017-04-11 01:18:53.261221-04 | 0 | 1

It is a large table.
select pg_size_pretty(pg_relation_size('netflow'));
pg_size_pretty
----------------
1267 GB

select pg_size_pretty(pg_total_relation_size('netflow_pkey'));
pg_size_pretty
----------------
287 GB

Regards,
Steve

dgrowleyml@gmail.com

almost 9 years ago

In reply to: Steve Clark (#1)

Re: Why so long?

On 20 April 2017 at 03:24, Steve Clark <steve.clark@netwolves.com> wrote:

pmacct=# explain select min(id) from netflow;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Result (cost=1.13..1.14 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.71..1.13 rows=1 width=8)
-> Index Only Scan using netflow_pkey on netflow
(cost=0.71..3799108784.10 rows=9123246080 width=8)
Index Cond: (id IS NOT NULL)
(5 rows)

pmacct=# \timing
Timing is on.
pmacct=# select max(id) from netflow;
max
-------------
17547256873
(1 row)

Time: 0.626 ms

pmacct=# select min(id) from netflow;
^CCancel request sent
ERROR: canceling statement due to user request
Time: 339114.334 ms

It may help the diagnosis if you run an EXPLAIN (ANALYZE, BUFFERS)
version of this query to completion.

The index pages could simply be cold and coming from disk on a very
much I/O starved system.

More recently added pages are more likely to be cached.

You may also want to consider running the EXPLAIN (ANALYZE, BUFFERS)
after having SET track_io_timing = on;

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

rafia.sabih@enterprisedb.com

almost 9 years ago

In reply to: Steve Clark (#1)

Re: Why so long?

On Wed, Apr 19, 2017 at 8:54 PM, Steve Clark <steve.clark@netwolves.com>
wrote:

Hello,

I am confused. I have a table that has an incrementing primary key id.

When I select max(id) from table is returns almost instantly but
when I select min(id) from table it takes longer than I want to wait.

Shouldn't postgresql be able to quickly find the minimum id value in the
index?

pmacct=# explain select max(id) from netflow;
QUERY
PLAN
------------------------------------------------------------
--------------------------------------------------------------
Result (cost=1.13..1.14 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.71..1.13 rows=1 width=8)
-> Index Only Scan Backward using netflow_pkey on netflow
(cost=0.71..3799108784.10 rows=9123246080 width=8)
Index Cond: (id IS NOT NULL)
(5 rows)

pmacct=# explain select min(id) from netflow;
QUERY
PLAN
------------------------------------------------------------
-----------------------------------------------------
Result (cost=1.13..1.14 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.71..1.13 rows=1 width=8)
-> Index Only Scan using netflow_pkey on netflow
(cost=0.71..3799108784.10 rows=9123246080 width=8)
Index Cond: (id IS NOT NULL)
(5 rows)

pmacct=# \timing
Timing is on.
pmacct=# select max(id) from netflow;
max
-------------
17547256873
(1 row)

Time: 0.626 ms

pmacct=# select min(id) from netflow;
^CCancel request sent
ERROR: canceling statement due to user request
Time: 339114.334 ms

Table "public.netflow"
Column | Type
| Modifiers
----------------+-----------------------------+-------------
--------------------------------------------------------
id | bigint | not null default
nextval('netflow_id_seq'::regclass)
agent_id | bigint |
bytes | bigint |
stamp_inserted | timestamp without time zone | not null default
'0001-01-01 00:00:00'::timestamp without time zone
stamp_updated | timestamp without time zone |
packets | integer | default 0
port_src | integer | default 0
port_dst | integer | default 0
ip_proto | smallint | default 0
tos | smallint | default 0
ip_src | inet | not null default
'0.0.0.0'::inet
ip_dst | inet | not null default
'0.0.0.0'::inet
Indexes:
"netflow_pkey" PRIMARY KEY, btree (id)
"netflow_ts_key" btree (stamp_inserted)
"netflow_tsu_idx" btree (stamp_updated)
Triggers:
netflow_import AFTER INSERT OR UPDATE ON netflow FOR EACH STATEMENT
EXECUTE PROCEDURE netflow_update()

Is this some one-time event or you could reproduce it consistently?

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/