strange sql behavior

Started by Yu Nie · about 10 years ago · 17 messages · general
#1 Yu Nie
nieyu04@gmail.com

Hi there,

Recently I have been working with a large amount of taxi GIS data and have
encountered some weird performance issues. I am hoping someone in this
community can help me figure them out.

The taxi data were loaded in 5-minute blocks into a table. I have two
such tables: one stores a month of data, with about 700 million rows; the
other stores about 10 days of data, with about 300 million rows. The two
tables have exactly the same schema and indexes. There are two indexes:
one on taxiid (text), and the other on the timestamp (datetime). In order
to process the data, I need to get all points for a single taxi; to do
that, I use something like:
select * from table1 where taxiid = 'SZB00S41' order by time;
What puzzles me greatly is that this query runs consistently much faster
against the large table than against the small one, which seems to
contradict intuition. At the end of this message you will find EXPLAIN
(ANALYZE, BUFFERS) results for two particular queries for the same taxiid
(one for each table). You can see that it took much longer (more than 20
times) to get 20k rows from the small table than to get 44k rows from the
large table. Interestingly, the planner does expect about 1/3 of the work
for the small-table query - yet for some reason, it took much longer to
fetch the rows from the small table. Why is there such a huge performance
difference between two seemingly identical queries executed on two
different tables?

Is it because the data in the second table sits on some mysteriously
"broken" part of the disk? What else could explain such bizarre behavior?
Your help is greatly appreciated.

The above behavior is consistent across all queries. Another issue I
identified is that queries against the large table use the shared buffers
much more effectively. For example, if you query one taxiid and
immediately afterwards run the same query for another taxi whose id sorts
right behind the first, the shared-buffer hit count is quite high (and
the query runs much faster); this, however, never happens with the small
table.

Thanks a lot!

Best, Marco

Results for the small table: it took 141 seconds to finish. The estimated
total cost is 85256.31.

Sort  (cost=85201.05..85256.31 rows=22101 width=55) (actual time=141419.499..141420.025 rows=20288 loops=1)
  Sort Key: "time"
  Sort Method: quicksort  Memory: 3622kB
  Buffers: shared hit=92 read=19816
  ->  Bitmap Heap Scan on data2013_01w  (cost=515.86..83606.27 rows=22101 width=55) (actual time=50.762..141374.777 rows=20288 loops=1)
        Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)
        Heap Blocks: exact=19826
        Buffers: shared hit=92 read=19816
        ->  Bitmap Index Scan on data2013_01w_ixtaxiid  (cost=0.00..510.33 rows=22101 width=0) (actual time=26.053..26.053 rows=20288 loops=1)
              Index Cond: ((taxiid)::text = 'SZB00S41'::text)
              Buffers: shared hit=4 read=78
Planning time: 0.144 ms
Execution time: 141421.154 ms

Results for the large table: it took 5 seconds to finish. The estimated
total cost is 252077.10.
Sort  (cost=251913.32..252077.10 rows=65512 width=55) (actual time=5038.571..5039.765 rows=44204 loops=1)
  Sort Key: "time"
  Sort Method: quicksort  Memory: 7753kB
  Buffers: shared hit=2 read=7543
  ->  Bitmap Heap Scan on data2011_01  (cost=1520.29..246672.53 rows=65512 width=55) (actual time=36.935..5017.463 rows=44204 loops=1)
        Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)
        Heap Blocks: exact=7372
        Buffers: shared hit=2 read=7543
        ->  Bitmap Index Scan on data2011_01_ixtaxiid  (cost=0.00..1503.92 rows=65512 width=0) (actual time=35.792..35.792 rows=44204 loops=1)
              Index Cond: ((taxiid)::text = 'SZB00S41'::text)
              Buffers: shared hit=2 read=171
Planning time: 0.127 ms
Execution time: 5042.134 ms

#2 Melvin Davidson
melvin6925@gmail.com
In reply to: Yu Nie (#1)
Re: strange sql behavior

One thing to look at is the last time both tables were vacuumed/analyzed.

SELECT n.nspname,
       s.relname,
       c.reltuples::bigint,
       n_tup_ins,
       n_tup_upd,
       n_tup_del,
       date_trunc('second', last_vacuum) as last_vacuum,
       date_trunc('second', last_autovacuum) as last_autovacuum,
       date_trunc('second', last_analyze) as last_analyze,
       date_trunc('second', last_autoanalyze) as last_autoanalyze,
       round( current_setting('autovacuum_vacuum_threshold')::integer +
              current_setting('autovacuum_vacuum_scale_factor')::numeric * c.reltuples)
       AS av_threshold
  FROM pg_stat_all_tables s
  JOIN pg_class c ON c.oid = s.relid
  JOIN pg_namespace n ON (n.oid = c.relnamespace)
 WHERE s.relname NOT LIKE 'pg_%'
   AND s.relname NOT LIKE 'sql_%'
   AND s.relname IN ('your_small_table', 'your_large_table')
 ORDER by 1, 2;

Also, please confirm the indexes for both tables are using the same method
(btree?).
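One way to check that is pg_indexes; a sketch (the table names here are placeholders for your actual small and large tables):

```sql
-- Shows each index's definition, including its access method (btree, gin, ...).
SELECT tablename, indexname, indexdef
  FROM pg_indexes
 WHERE tablename IN ('your_small_table', 'your_large_table');
```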

On Mon, Feb 1, 2016 at 1:35 PM, Yu Nie <nieyu04@gmail.com> wrote:

Show quoted text

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#3 Yu Nie
nieyu04@gmail.com
In reply to: Melvin Davidson (#2)
Re: strange sql behavior

Hi Melvin,

Thanks a lot for your reply. I ran the query you suggested, and here are
the results:

Large table: "public";"data2011_01";784236885;0;0;"";"";"";"2016-01-19
17:31:08-06";156847423
Small table: "public";"data2013_01w";300786444;0;0;"";"";"2016-02-01
08:57:24-06";"2016-02-01 04:01:04-06";60157336

I ran ANALYZE on the small table once it was created and indexed. The
table has not been subject to any update/delete/insert operations since it
was created this morning.

Also, yes, the indexes on both tables were created using exactly the same
method: btree. (They were created by the same piece of code, through a
C++ API.)

Best, Marco

On Mon, Feb 1, 2016 at 1:00 PM, Melvin Davidson <melvin6925@gmail.com>
wrote:

Show quoted text


#4 Bill Moran
wmoran@potentialtech.com
In reply to: Yu Nie (#3)
Re: strange sql behavior

Came a little late to the thread; see my comments inline below:

On Mon, 1 Feb 2016 13:16:13 -0600
Yu Nie <nieyu04@gmail.com> wrote:

Thanks a lot for your reply. I ran the query you suggested and here are
the results

Large table: "public";"data2011_01";784236885;0;0;"";"";"";"2016-01-19
17:31:08-06";156847423
Small table: "public";"data2013_01w";300786444;0;0;"";"";"2016-02-01
08:57:24-06";"2016-02-01 04:01:04-06";60157336

You didn't do Melvin's query correctly. He specified 11 columns to
select, but you only show 10. Since you don't show the query you
actually ran, we don't know which of the numeric columns is missing.

More information inline below:

On Mon, Feb 1, 2016 at 1:00 PM, Melvin Davidson <melvin6925@gmail.com>
wrote:

Show quoted text

Results for the small table: it took 141 seconds to finish. The estimated
total cost is 85256.31.

Sort  (cost=85201.05..85256.31 rows=22101 width=55) (actual time=141419.499..141420.025 rows=20288 loops=1)
  Sort Key: "time"
  Sort Method: quicksort  Memory: 3622kB
  Buffers: shared hit=92 read=19816
  ->  Bitmap Heap Scan on data2013_01w  (cost=515.86..83606.27 rows=22101 width=55) (actual time=50.762..141374.777 rows=20288 loops=1)
        Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)
        Heap Blocks: exact=19826
        Buffers: shared hit=92 read=19816

^^
Note that despite this table being smaller, Postgres had to read 19816
blocks from disk, which is 2.5x more than the larger table.

        ->  Bitmap Index Scan on data2013_01w_ixtaxiid  (cost=0.00..510.33 rows=22101 width=0) (actual time=26.053..26.053 rows=20288 loops=1)
              Index Cond: ((taxiid)::text = 'SZB00S41'::text)
              Buffers: shared hit=4 read=78
Planning time: 0.144 ms
Execution time: 141421.154 ms

Results for the large table: it took 5 seconds to finish. The estimated
total cost is 252077.10.
Sort  (cost=251913.32..252077.10 rows=65512 width=55) (actual time=5038.571..5039.765 rows=44204 loops=1)
  Sort Key: "time"
  Sort Method: quicksort  Memory: 7753kB
  Buffers: shared hit=2 read=7543
  ->  Bitmap Heap Scan on data2011_01  (cost=1520.29..246672.53 rows=65512 width=55) (actual time=36.935..5017.463 rows=44204 loops=1)
        Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)
        Heap Blocks: exact=7372
        Buffers: shared hit=2 read=7543
        ->  Bitmap Index Scan on data2011_01_ixtaxiid  (cost=0.00..1503.92 rows=65512 width=0) (actual time=35.792..35.792 rows=44204 loops=1)

^^
Note that the larger table took LONGER to do the index work than the
smaller table, which probably means there's nothing wrong with your
disks or anything ... that's the behavior I would expect.

              Index Cond: ((taxiid)::text = 'SZB00S41'::text)
              Buffers: shared hit=2 read=171
Planning time: 0.127 ms
Execution time: 5042.134 ms

So, what I'm seeing is that Postgres is able to figure out _which_ rows
to fetch faster on the small table than on the large table, which is what
you would expect, since a smaller index should be faster than a larger one.

However, when it goes to actually fetch the row data, it takes
significantly longer on the small table, despite the fact that it's
only fetching 1/3 as many rows. It is, however, doing 2.5x as many
disk reads to get those rows: For the large table, it reads 61MB from
disk, but it reads 160MB to get all the data for the smaller table.
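(The arithmetic behind those rough figures, as a sanity check, assuming
the default 8 kB PostgreSQL block size:)

```python
# Convert the "read=N" buffer counts from the two plans into megabytes,
# assuming the default 8 kB (8192-byte) PostgreSQL block size.
BLOCK_SIZE = 8192  # bytes per block

def mb_read(blocks: int) -> float:
    """Megabytes fetched from disk for a given 'read=N' buffer count."""
    return blocks * BLOCK_SIZE / 1e6

print(f"small table: {mb_read(19816):.0f} MB")  # read=19816 -> ~162 MB
print(f"large table: {mb_read(7543):.0f} MB")   # read=7543  -> ~62 MB
```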

How the data was inserted into each table could lead to similar data
being clustered on common pages on the large table, while it's spread
across many more pages on the small table.
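If poor physical ordering is indeed the cause, the planner's statistics
should show it; a sketch of how to check (table and column names taken
from the plans above):

```sql
-- correlation near +/-1 means the heap is physically ordered by the
-- column, so one taxi's rows sit on few pages; near 0 means the rows
-- are scattered and nearly every fetch touches a different page.
SELECT tablename, attname, correlation
  FROM pg_stats
 WHERE tablename IN ('data2011_01', 'data2013_01w')
   AND attname = 'taxiid';
```

If the small table turns out to be poorly correlated on taxiid, CLUSTER
data2013_01w USING data2013_01w_ixtaxiid; would rewrite it in index order
(note it takes an exclusive lock while it runs).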

That still doesn't explain it all, though. 2.5x the disk
activity normally wouldn't equate to 28x the time required. Unless
your disks are horrifically slow? Does this server have a lot of
other activity against the disks? I.e., are other people running
queries that you would have to contend with? Is the server a VM
sharing storage with other VMs, or a combined-use server that
has to share disk access with (for example) a web or mail server
as well? Is the performance difference consistently ~28x?

Other things: what is shared_buffers set to? The queries would seem
to indicate that this server had less than 1 MB of those two tables
cached in memory at the time you ran those queries, which suggests
that either you've got shared_buffers set very low, or that
there are a lot of other tables that other queries are accessing at
the time you're running these. Perhaps installing pg_buffercache to
have a look at what's using your shared_buffers would be helpful.
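For example (a sketch along the lines of the pg_buffercache docs;
requires CREATE EXTENSION pg_buffercache):

```sql
-- Top 10 relations by number of 8 kB buffers currently cached.
SELECT c.relname,
       count(*) AS buffers,
       pg_size_pretty(count(*) * 8192::bigint) AS cached
  FROM pg_buffercache b
  JOIN pg_class c
    ON b.relfilenode = pg_relation_filenode(c.oid)
   AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                              WHERE datname = current_database()))
 GROUP BY c.relname
 ORDER BY buffers DESC
 LIMIT 10;
```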

--
Bill Moran

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5 Melvin Davidson
melvin6925@gmail.com
In reply to: Bill Moran (#4)
Re: strange sql behavior

Thanks Bill.

Also, it's very important to include the headers with the queries!

Marco, there is no top-secret information in what is requested, so please
do not edit the output.

Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone

-------- Original message --------
From: Bill Moran <wmoran@potentialtech.com>
Date: 2/1/2016 14:41 (GMT-05:00)
To: Yu Nie <nieyu04@gmail.com>
Cc: Melvin Davidson <melvin6925@gmail.com>, pgsql-general@postgresql.org
Subject: Re: [GENERAL] strange sql behavior

Show quoted text

#6 Yu Nie
nieyu04@gmail.com
In reply to: Melvin Davidson (#5)
Re: strange sql behavior

Thanks, Bill and Melvin!

Just some quick note/answers before I absorb all the information provided
by Bill.

1. I don't expect many users to be running queries against these tables,
especially the small one - I just created it this morning, and nobody
knows about it except me.

2. The settings in the config:

shared_buffers = 512MB # min 128kB
work_mem = 128MB # min 64kB

3. I am running Windows 7 with 24 GB RAM, and my PostgreSQL is 9.4.

4. Here is the query I ran:

SELECT n.nspname,
       s.relname,
       c.reltuples::bigint,
       n_tup_ins,
       n_tup_upd,
       n_tup_del,
       date_trunc('second', last_vacuum) as last_vacuum,
       date_trunc('second', last_autovacuum) as last_autovacuum,
       date_trunc('second', last_analyze) as last_analyze,
       date_trunc('second', last_autoanalyze) as last_autoanalyze,
       round( current_setting('autovacuum_vacuum_threshold')::integer +
              current_setting('autovacuum_vacuum_scale_factor')::numeric * c.reltuples)
       AS av_threshold
  FROM pg_stat_all_tables s
  JOIN pg_class c ON c.oid = s.relid
  JOIN pg_namespace n ON (n.oid = c.relnamespace)
 WHERE s.relname NOT LIKE 'pg_%'
   AND s.relname NOT LIKE 'sql_%'
   AND s.relname IN ('data2013_01w', 'data2011_01')
 ORDER by 1, 2;

I copied the result from PGAdmin directly; here it is again:

public;"data2011_01";784236864;784236885;0;0;"";"";"";"2016-01-19
17:31:08-06";156847423
public;"data2013_01w";300786432;300786444;0;0;"";"";"2016-02-01
08:57:24-06";"2016-02-01 04:01:04-06";60157336

On Mon, Feb 1, 2016 at 1:45 PM, melvin6925 <melvin6925@gmail.com> wrote:

Show quoted text


#7Melvin Davidson
melvin6925@gmail.com
In reply to: Yu Nie (#6)
Re: strange sql behavior

And were _your_ queries run on the same day at the same time within a few seconds of each other?
Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone

-------- Original message --------
From: Yu Nie <nieyu04@gmail.com>
Date: 2/1/2016 15:01 (GMT-05:00)
To: melvin6925 <melvin6925@gmail.com>
Cc: Bill Moran <wmoran@potentialtech.com>, pgsql-general@postgresql.org
Subject: Re: [GENERAL] strange sql behavior

Show quoted text

#8Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Yu Nie (#1)
Re: strange sql behavior

On 02/01/2016 10:35 AM, Yu Nie wrote:

Show quoted text

Looks to me the time is being taken up by the 'order by time' portion of
the query.

What happens if you run the queries without 'order by time'?

What is the history of the large table, has it been CLUSTERed in the
past for instance?

If I am following the table names correctly, the data in the large and
small tables are two years apart. Do you see the same issue when you run
the query on a table with a month of data and a table with 10 days of
data from the same month?


--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Yu Nie
nieyu04@gmail.com
In reply to: Adrian Klaver (#8)
Re: strange sql behavior

Hi Klaver,

Thanks.

1. I don't see that 'order by time' makes a difference - in fact, the
EXPLAIN ANALYZE output seems to indicate the sort is faster for the small
table because it uses less memory.

2. No, the large table has not been clustered. Both tables were created
exactly the same way, loading 5-minute blocks of GIS points of all taxis
through a "copy" command.
When all data are loaded, two indexes are created, one on the taxi id,
and the other on the time stamp.

3. I could not run the last test you suggested since I don't have the table
- it would take several hours to create a table with 10 days of data from
the same month.
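If the slow heap fetches are caused by one taxi's rows being scattered
across the small table, physically reordering it by the taxiid index may
help. A sketch, assuming the index name shown in the EXPLAIN output; note
that CLUSTER takes an exclusive lock and rewrites the whole table, so it
can take a long time on 300 million rows:

```sql
CLUSTER data2013_01w USING data2013_01w_ixtaxiid;
ANALYZE data2013_01w;
```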

On Mon, Feb 1, 2016 at 2:21 PM, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

Show quoted text


#10Yu Nie
nieyu04@gmail.com
In reply to: Yu Nie (#9)
Re: strange sql behavior

Melvin,

Please see attached for the requested results. I ran two queries for each
table, each with a different taxiid adjacent to the other. Note that for
the large table one query is much faster than the other because the shared
buffer cache was used for the second query; this does not work, however,
for the small table.

Many thanks for your willingness to help!

Best, Marco
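If the small table's pages are being evicted before a second query can
reuse them, one test is to load the index explicitly into cache with the
pg_prewarm contrib extension (available in 9.4); a sketch, using the index
name from this thread:

```sql
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Load the taxiid index into shared buffers; returns blocks read
SELECT pg_prewarm('data2013_01w_ixtaxiid');
-- SELECT pg_prewarm('data2013_01w');  -- the full heap likely exceeds
--                                        the 512MB shared_buffers setting
```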

On Mon, Feb 1, 2016 at 2:25 PM, melvin6925 <melvin6925@gmail.com> wrote:

Show quoted text

Fine. Please rerun both explains and my queries and send ALL output to a
file (F8 in PGADMIN Sql). Then attach the file.

Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
-------- Original message --------
From: Yu Nie <nieyu04@gmail.com>
Date: 2/1/2016 15:17 (GMT-05:00)
To: melvin6925 <melvin6925@gmail.com>
Subject: Re: [GENERAL] strange sql behavior

Yes, absolutely.

On Mon, Feb 1, 2016 at 2:12 PM, melvin6925 <melvin6925@gmail.com> wrote:

And were _your_ queries run on the same day at the same time within a few
seconds of each other?

Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
-------- Original message --------
From: Yu Nie <nieyu04@gmail.com>
Date: 2/1/2016 15:01 (GMT-05:00)
To: melvin6925 <melvin6925@gmail.com>
Cc: Bill Moran <wmoran@potentialtech.com>, pgsql-general@postgresql.org
Subject: Re: [GENERAL] strange sql behavior

Thanks, Bill and Melvin!

Just some quick note/answers before I absorb all the information provided
by Bill.

1. I don't expect many users running queries against the tables,
especially for the small table - since I just created it this morning, and
nobody know about it except myself.

2. The setting in the config:

shared_buffers = 512MB # min 128kB
work_mem = 128MB # min 64kB

3. I am running a Windows 7 with 24 GB RAM. and my postgresal is 9.4.

4. here is the query I ran:

SELECT n.nspname,
s.relname,
c.reltuples::bigint,
n_tup_ins,
n_tup_upd,
n_tup_del,
date_trunc('second', last_vacuum) as last_vacuum,
date_trunc('second', last_autovacuum) as last_autovacuum,
date_trunc('second', last_analyze) as last_analyze,
date_trunc('second', last_autoanalyze) as last_autoanalyze
,
round( current_setting('autovacuum_vacuum_threshold')::integer +
current_setting('autovacuum_vacuum_scale_factor')::numeric * C.reltuples)
AS av_threshold
FROM pg_stat_all_tables s
JOIN pg_class c ON c.oid = s.relid
JOIN pg_namespace n ON (n.oid = c.relnamespace)
WHERE s.relname NOT LIKE 'pg_%'
AND s.relname NOT LIKE 'sql_%'
AND s.relname IN ('data2013_01w', 'data2011_01')
ORDER by 1, 2;

I copied the result from PGAdmin directly, here it is again.:

public;"data2011_01";784236864;784236885;0;0;"";"";"";"2016-01-19
17:31:08-06";156847423
public;"data2013_01w";300786432;300786444;0;0;"";"";"2016-02-01
08:57:24-06";"2016-02-01 04:01:04-06";60157336

On Mon, Feb 1, 2016 at 1:45 PM, melvin6925 <melvin6925@gmail.com> wrote:

Thanks Bill.
Also, it's very important to include the headers with the queries!

Marco,
There is no top secret information that is requested, so please do not
edit the output.

Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
-------- Original message --------
From: Bill Moran <wmoran@potentialtech.com>
Date: 2/1/2016 14:41 (GMT-05:00)
To: Yu Nie <nieyu04@gmail.com>
Cc: Melvin Davidson <melvin6925@gmail.com>, pgsql-general@postgresql.org
Subject: Re: [GENERAL] strange sql behavior

Came a little late to the thread, see many comments inline below:

On Mon, 1 Feb 2016 13:16:13 -0600
Yu Nie <nieyu04@gmail.com> wrote:

Thanks a lot for your reply. I ran the query you suggested and here are
the results:

Large table: "public";"data2011_01";784236885;0;0;"";"";"";"2016-01-19
17:31:08-06";156847423
Small table: "public";"data2013_01w";300786444;0;0;"";"";"2016-02-01
08:57:24-06";"2016-02-01 04:01:04-06";60157336

You didn't do Melvin's query correctly. He specified 11 columns to
select, but you only show 10. Since you don't show the query you
actually ran, we don't know which of the numeric columns is missing.

More information inline below:

On Mon, Feb 1, 2016 at 1:00 PM, Melvin Davidson <melvin6925@gmail.com>
wrote:

One thing to look at is the last time both tables were vacuumed/analyzed.

SELECT n.nspname,
s.relname,
c.reltuples::bigint,
n_tup_ins,
n_tup_upd,
n_tup_del,
date_trunc('second', last_vacuum) as last_vacuum,
date_trunc('second', last_autovacuum) as last_autovacuum,
date_trunc('second', last_analyze) as last_analyze,
date_trunc('second', last_autoanalyze) as last_autoanalyze,
round(current_setting('autovacuum_vacuum_threshold')::integer +
current_setting('autovacuum_vacuum_scale_factor')::numeric * c.reltuples)
AS av_threshold
FROM pg_stat_all_tables s
JOIN pg_class c ON c.oid = s.relid
JOIN pg_namespace n ON (n.oid = c.relnamespace)
WHERE s.relname NOT LIKE 'pg_%'
AND s.relname NOT LIKE 'sql_%'
AND s.relname IN ('your_small_table', 'your_large_table')
ORDER by 1, 2;

Also, please confirm the indexes for both tables are using the same
method (btree?).

On Mon, Feb 1, 2016 at 1:35 PM, Yu Nie <nieyu04@gmail.com> wrote:

Hi there,

Recently I am working with a large amount of taxi GIS data and have
encountered some weird performance issues. I am hoping someone in this
community can help me figure it out.

The taxi data were loaded in 5 minute blocks into a table. I have two
separate such tables: one stores a month of data, with about 700 million
rows; the other stores about 10 days of data, with about 300 million
rows. The two tables have exactly the same schema and indexes. There are
two indexes: one on taxiid (text), and the other on the time stamp
(date time). In order to process the data, I need to get all points for
a single taxi; to do that, I use something like:
select * from table1 where taxiid = 'SZB00S41' order by time;
What puzzled me greatly is that this query runs consistently much faster
for the large table than for the small table, which seems to contradict
intuition. At the end of the message you may find explain (analyze,
buffers) results of two particular queries for the same taxiid (one for
each table). You can see that it took much longer (more than 20 times)
to get 20k rows from the small table than to get 44k rows from the large
table. Interestingly, it seems that the planner expects only about 1/3
of the work for the small table query - yet for some reason, it took
much longer to fetch the rows from the small table. Why is there such a
huge performance gap between two seemingly identical queries executed on
two different tables?

Is it because the data in the second table is on some mysteriously
"broken part" of the disk? What else could explain such bizarre
behavior? Your help is greatly appreciated.

The above behavior is consistent across all queries. Another issue I
identified is that for the large table, queries can use the shared
buffers more effectively. For example, if you query one taxiid and
immediately afterwards run the same query for another taxi whose id
ranks right behind the first id, then shared buffer hits would be quite
high (and the query would run much faster); this however never works for
the small table.

Thanks a lot!

Best, Marco

Results for the small table: it took 141 seconds to finish. The
estimated cost is 85256.31.

"Sort (cost=85201.05..85256.31 rows=22101 width=55) (actual
time=141419.499..141420.025 rows=20288 loops=1)"
" Sort Key: "time""
" Sort Method: quicksort Memory: 3622kB"
" Buffers: shared hit=92 read=19816"
" -> Bitmap Heap Scan on data2013_01w (cost=515.86..83606.27
rows=22101 width=55) (actual time=50.762..141374.777 rows=20288

loops=1)"

" Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Heap Blocks: exact=19826"
" Buffers: shared hit=92 read=19816"

^^
Note that despite this table being smaller, Postgres had to read 19816
blocks from disk, which is 2.5x more than the larger table.

" -> Bitmap Index Scan on data2013_01w_ixtaxiid
(cost=0.00..510.33 rows=22101 width=0) (actual time=26.053..26.053
rows=20288 loops=1)"
" Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Buffers: shared hit=4 read=78"
"Planning time: 0.144 ms"
"Execution time: 141421.154 ms"

Results for the large table: it took 5 seconds to finish. The estimated
cost is 252077.10.
"Sort (cost=251913.32..252077.10 rows=65512 width=55) (actual
time=5038.571..5039.765 rows=44204 loops=1)"
" Sort Key: "time""
" Sort Method: quicksort Memory: 7753kB"
" Buffers: shared hit=2 read=7543"
" -> Bitmap Heap Scan on data2011_01 (cost=1520.29..246672.53
rows=65512 width=55) (actual time=36.935..5017.463 rows=44204

loops=1)"

" Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Heap Blocks: exact=7372"
" Buffers: shared hit=2 read=7543"
" -> Bitmap Index Scan on data2011_01_ixtaxiid
(cost=0.00..1503.92 rows=65512 width=0) (actual time=35.792..35.792
rows=44204 loops=1)"

^^
Note that the larger table took LONGER to do the index work than the
smaller table, which probably means there's nothing wrong with your
disks or anything ... that's the behavior I would expect.

" Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Buffers: shared hit=2 read=171"
"Planning time: 0.127 ms"
"Execution time: 5042.134 ms"

So, what I'm seeing, is that Postgres is able to figure out _which_ rows
to fetch faster on the small table than the large table, which is what
you would expect, since a smaller index should be faster than a large
one.

However, when it goes to actually fetch the row data, it takes
significantly longer on the small table, despite the fact that it's
only fetching 1/3 as many rows. It is, however, doing 2.5x as many
disk reads to get those rows: For the large table, it reads 61MB from
disk, but it reads 160MB to get all the data for the smaller table.

How the data was inserted into each table could lead to similar data
being clustered on common pages on the large table, while it's spread
across many more pages on the small table.
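
One way to check this clustering hypothesis is the planner's
physical-ordering statistic in pg_stats. This is only a sketch (table and
column names are taken from the plans in this thread): a correlation near
+/-1 means rows for one taxiid tend to sit on adjacent heap pages, while a
value near 0 means they are scattered across the table.

```sql
-- compare how well each table's taxiid column tracks physical row order
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename IN ('data2011_01', 'data2013_01w')
  AND attname = 'taxiid';
```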

That still doesn't explain it all, though. 2.5x the disk
activity normally wouldn't equate to 28x the time required. Unless
your disks are horrifically slow? Does this server have a lot of
other activity against the disks? I.e. are other people running
queries that you would have to contend with, or is the server a VM
sharing storage with other VMs, or even a combined use server that
has to share disk access with (for example) a web or mail server
as well? Is the performance difference consistently ~28x?

Other things: what is shared_buffers set to? The queries would seem
to indicate that this server has less than 1M of those two tables
cached in memory at the time you ran those queries, which seems to
suggest that either you've got shared_buffers set very low, or that
there are a lot of other tables that other queries are accessing at
the time you're running these. Perhaps installing pg_buffercache to
have a look at what's using your shared_buffers would be helpful.
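
A pg_buffercache inspection along the lines Bill suggests might look
roughly like the following. This is a sketch, not a recipe: it assumes the
extension is available, that you can create it as a superuser, and that the
server uses the default 8 kB block size.

```sql
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- how many shared buffers each relation currently occupies
SELECT c.relname,
       count(*) AS buffers,
       pg_size_pretty(count(*) * 8192) AS buffered
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database
                            WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```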

--
Bill Moran

Attachments:

analyzestat.csv (text/csv)
largeTabel1.csv (text/csv)
largeTabel2.csv (text/csv)
smallTabel1.csv (text/csv)
smallTabel2.csv (text/csv)
#11Melvin Davidson
melvin6925@gmail.com
In reply to: Yu Nie (#10)
Re: strange sql behavior

At this point I can see no possible explanation why this would occur.

Just a thought though, is it possible that data2011_01 was clustered on the
index at some point but data2013_01w has not been clustered?
If you cluster data2013_01w on the index, does the performance change?
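
For reference, clustering the small table on its taxiid index would look
roughly like this (a sketch; the index name is taken from the explain
output above). Note that CLUSTER takes an exclusive lock and rewrites the
whole table, so on 300 million rows it will take a while.

```sql
-- rewrite the table in taxiid index order, then refresh statistics
CLUSTER data2013_01w USING data2013_01w_ixtaxiid;
ANALYZE data2013_01w;
```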

On Mon, Feb 1, 2016 at 4:03 PM, Yu Nie <nieyu04@gmail.com> wrote:

Melvin,

Please see attached for the requested results. I ran two queries (each
with a different taxiid, next to each other) for each table. Note that for
the large table one is much faster than the other because the shared cache
was used for the second query. This does not work, however, for the small
table.

Many thanks for your willingness to help!

Best, Marco

On Mon, Feb 1, 2016 at 2:25 PM, melvin6925 <melvin6925@gmail.com> wrote:

Fine. Please rerun both explains and my queries and send ALL output to a
file (F8 in PGADMIN Sql). Then attach the file.

Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
-------- Original message --------
From: Yu Nie <nieyu04@gmail.com>
Date: 2/1/2016 15:17 (GMT-05:00)
To: melvin6925 <melvin6925@gmail.com>
Subject: Re: [GENERAL] strange sql behavior

Yes, absolutely.

On Mon, Feb 1, 2016 at 2:12 PM, melvin6925 <melvin6925@gmail.com> wrote:

And were _your_ queries run on the same day at the same time within a
few seconds of each other?

Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#12Yu Nie
nieyu04@gmail.com
In reply to: Yu Nie (#10)
Re: strange sql behavior

Hi Melvin,

Many thanks for your help. I just deleted the small table, along with
another large unused table (with about 1.2 billion rows) in the same
database; that freed up about 100 GB of space on the disk. I am reloading
the small table again and it will take a few more hours to complete. Once
it is done, I will let you know if the problem persists, and will then
report the info you suggested. By the way, I also did not do disk
defragmentation after deleting the tables; not sure whether that helps...

Best, Marco

On Mon, Feb 1, 2016 at 8:50 PM, Melvin Davidson <melvin6925@gmail.com>
wrote:

OK, one more thing to try.

Please send the output of the following query to a file and attach so I
can review.

SELECT c.relname as idxname,
i.indisclustered,
i.indisvalid,
i.indcollation
FROM pg_index i
JOIN pg_class c ON i.indexrelid = c.oid
WHERE i.indrelid IN (SELECT oid
FROM pg_class
WHERE relkind = 'r'
AND relname = 'data2011_01'
OR relname = 'data2013_01w'
)
ORDER BY 1;

On Mon, Feb 1, 2016 at 5:51 PM, Yu Nie <nieyu04@gmail.com> wrote:

Yeah, it is really puzzling. :(

No, I have never attempted to cluster the large table - as I mentioned
before, the two tables were created in almost exactly the same way
(through a C++ API), except with different data files. I only use them for
queries, and have never done any update/insert/delete since the creation.

On Mon, Feb 1, 2016 at 4:41 PM, Melvin Davidson <melvin6925@gmail.com>
wrote:

At this point I can see no possible explanation why this would occur.

Just a thought though, is it possible that data2011_01 was clustered on
the index at some point but data2013_01w has not been clustered?
If you cluster data2013_01w on the index, does the performance change?

On Mon, Feb 1, 2016 at 4:03 PM, Yu Nie <nieyu04@gmail.com> wrote:

Melvin,

Please see attached for the requests results. I ran two queries (each
with a different taxiid that is next to each other) for each table. Note
that for the large table one is much faster than the other because the
shared cache was used for the second query. This does not work however for
the small table.

Many thanks for your willingness to help!

Best, Marco

On Mon, Feb 1, 2016 at 2:25 PM, melvin6925 <melvin6925@gmail.com>
wrote:

Fine. Please rerun both explains and my queries and send ALL output to
a file (F8 in PGADMIN Sql). Then attach the file.

Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
-------- Original message --------
From: Yu Nie <nieyu04@gmail.com>
Date: 2/1/2016 15:17 (GMT-05:00)
To: melvin6925 <melvin6925@gmail.com>
Subject: Re: [GENERAL] strange sql behavior

Yes, absolutely.

On Mon, Feb 1, 2016 at 2:12 PM, melvin6925 <melvin6925@gmail.com>
wrote:

And were _your_ queries run on the same day at the same time within a
few seconds of each other?

Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
-------- Original message --------
From: Yu Nie <nieyu04@gmail.com>
Date: 2/1/2016 15:01 (GMT-05:00)
To: melvin6925 <melvin6925@gmail.com>
Cc: Bill Moran <wmoran@potentialtech.com>,
pgsql-general@postgresql.org
Subject: Re: [GENERAL] strange sql behavior

Thanks, Bill and Melvin!

Just some quick note/answers before I absorb all the information
provided by Bill.

1. I don't expect many users running queries against the tables,
especially for the small table - since I just created it this morning, and
nobody know about it except myself.

2. The setting in the config:

shared_buffers = 512MB # min 128kB
work_mem = 128MB # min 64kB

3. I am running a Windows 7 with 24 GB RAM. and my postgresal is
9.4.

4. here is the query I ran:

SELECT n.nspname,
s.relname,
c.reltuples::bigint,
n_tup_ins,
n_tup_upd,
n_tup_del,
date_trunc('second', last_vacuum) as last_vacuum,
date_trunc('second', last_autovacuum) as last_autovacuum,
date_trunc('second', last_analyze) as last_analyze,
date_trunc('second', last_autoanalyze) as last_autoanalyze
,
round( current_setting('autovacuum_vacuum_threshold')::integer
+ current_setting('autovacuum_vacuum_scale_factor')::numeric * C.reltuples)
AS av_threshold
FROM pg_stat_all_tables s
JOIN pg_class c ON c.oid = s.relid
JOIN pg_namespace n ON (n.oid = c.relnamespace)
WHERE s.relname NOT LIKE 'pg_%'
AND s.relname NOT LIKE 'sql_%'
AND s.relname IN ('data2013_01w', 'data2011_01')
ORDER by 1, 2;

I copied the result from PGAdmin directly, here it is again.:

public;"data2011_01";784236864;784236885;0;0;"";"";"";"2016-01-19
17:31:08-06";156847423
public;"data2013_01w";300786432;300786444;0;0;"";"";"2016-02-01
08:57:24-06";"2016-02-01 04:01:04-06";60157336

On Mon, Feb 1, 2016 at 1:45 PM, melvin6925 <melvin6925@gmail.com>
wrote:

Thanks Bill.
Also, it's very important to include the headers with the queries!

Marco,
There is no top secret information that is requested, so please do
not edit the output.

Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
-------- Original message --------
From: Bill Moran <wmoran@potentialtech.com>
Date: 2/1/2016 14:41 (GMT-05:00)
To: Yu Nie <nieyu04@gmail.com>
Cc: Melvin Davidson <melvin6925@gmail.com>,
pgsql-general@postgresql.org
Subject: Re: [GENERAL] strange sql behavior

Came a little late to the thread, see many comments inline below:

On Mon, 1 Feb 2016 13:16:13 -0600
Yu Nie <nieyu04@gmail.com> wrote:

Thanks a lot for your reply. I ran the query you suggested and

here are

the results

Large table:

"public";"data2011_01";784236885;0;0;"";"";"";"2016-01-19

17:31:08-06";156847423
Small table:

"public";"data2013_01w";300786444;0;0;"";"";"2016-02-01

08:57:24-06";"2016-02-01 04:01:04-06";60157336

You didn't do Melvin's query correctly. He specified 11 columns to
select, but you only show 10. Since you don't show the query you
actually ran, we don't know which of the numeric columns is missing.

More information inline below:

On Mon, Feb 1, 2016 at 1:00 PM, Melvin Davidson <

melvin6925@gmail.com>

wrote:

One thing to look at is the last time both tables were

vacuumed/analyzed.

SELECT n.nspname,
s.relname,
c.reltuples::bigint,
n_tup_ins,
n_tup_upd,
n_tup_del,
date_trunc('second', last_vacuum) as last_vacuum,
date_trunc('second', last_autovacuum) as last_autovacuum,
date_trunc('second', last_analyze) as last_analyze,
date_trunc('second', last_autoanalyze) as last_autoanalyze
,
round(

current_setting('autovacuum_vacuum_threshold')::integer +

current_setting('autovacuum_vacuum_scale_factor')::numeric *

C.reltuples)

AS av_threshold
FROM pg_stat_all_tables s
JOIN pg_class c ON c.oid = s.relid
JOIN pg_namespace n ON (n.oid = c.relnamespace)
WHERE s.relname NOT LIKE 'pg_%'
AND s.relname NOT LIKE 'sql_%'
AND s.relname IN ' "your_small_table", "your_large_table"'
ORDER by 1, 2;

Also, please confirm the indexes for both tables are using the

same method

(btree?).

On Mon, Feb 1, 2016 at 1:35 PM, Yu Nie <nieyu04@gmail.com>

wrote:

Hi there,

Recently I am working with a large amount of taxis GIS data and

had

encountered some weird performance issues. I am hoping someone

in this

community can help me figure it out.

The taxi data were loaded in 5 minute block into a table. I

have two

separate such tables, one stores a month of data with about 700

million

rows, another stores about 10 days of data with about 300

million rows.

The two tables have the exactly same schema and indexes. There

are two

indexes: one on taxiid (text), and the other on the time stamp

(date

time). In order to process the data, I need to get all points

for a single

taxis; to do that, I use something like:
select * from table1 where taxiid = 'SZB00S41' order by time;
What puzzled me greatly is that this query runs consistently

much faster

for the large table than for the small table, which seems to

contradict

with intuition. At the end of message you may find explain

(analyze

buffer) results of two particular queries for the same taxiid

(one for each

table). You can see that it took much longer (more than 20

times) to get

20k rows from the small table than to get 44 k rows from the

large table.

Interestingly it seems that the planner does expect about 1/3

work for the

small table query - yet for some reason, it took much longer to

fetch the

rows from the small table. Why there is such a huge

performance between

the two seemingly identical queries executed on two different

tables?

Is is because the data on the second table is on some

mysteriously

"broken part" of the disk? what else could explain such a

bizarre

behavior? Your help is greatly appreciated.

The above behavior is consistent through all queries. Another

issue I

identified is that for the large table, the query can use the

shared buffer

more effectively. For example, after you query one taxiid and

immediately

following that query run the same query for another taxi whose

id ranks

right behind the first id, then shared hit buffers would be

quite high (and

the query would run much faster); this however never works for

the small

table.

Thanks a lot!

Best, Marco

Results for the small table: it took 141 seconds to finish.

The planning

time is 85256.31

"Sort (cost=85201.05..85256.31 rows=22101 width=55) (actual
time=141419.499..141420.025 rows=20288 loops=1)"
" Sort Key: "time""
" Sort Method: quicksort Memory: 3622kB"
" Buffers: shared hit=92 read=19816"
" -> Bitmap Heap Scan on data2013_01w (cost=515.86..83606.27
rows=22101 width=55) (actual time=50.762..141374.777 rows=20288

loops=1)"

" Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Heap Blocks: exact=19826"
" Buffers: shared hit=92 read=19816"

^^
Note that despite this table being smaller, Postgres had to read
19816
blocks from disk, which is 2.5x more than the larger table.

" -> Bitmap Index Scan on data2013_01w_ixtaxiid
(cost=0.00..510.33 rows=22101 width=0) (actual

time=26.053..26.053

rows=20288 loops=1)"
" Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Buffers: shared hit=4 read=78"
"Planning time: 0.144 ms"
"Execution time: 141421.154 ms"

Results for the large table: it took 5 seconds to finish. The

planning

time is 252077.10
"Sort (cost=251913.32..252077.10 rows=65512 width=55) (actual
time=5038.571..5039.765 rows=44204 loops=1)"
" Sort Key: "time""
" Sort Method: quicksort Memory: 7753kB"
" Buffers: shared hit=2 read=7543"
" -> Bitmap Heap Scan on data2011_01 (cost=1520.29..246672.53
rows=65512 width=55) (actual time=36.935..5017.463 rows=44204

loops=1)"

" Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Heap Blocks: exact=7372"
" Buffers: shared hit=2 read=7543"
" -> Bitmap Index Scan on data2011_01_ixtaxiid
(cost=0.00..1503.92 rows=65512 width=0) (actual

time=35.792..35.792

rows=44204 loops=1)"

^^
Note that the larger table took LONGER to do the index work than the
smaller table, which probably means there's nothing wrong with your
disks or anything ... that's the behavior I would expect.

" Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Buffers: shared hit=2 read=171"
"Planning time: 0.127 ms"
"Execution time: 5042.134 ms"

So, what I'm seeing, is that Postgres is able to figure out _which_
rows
to fetch faster on the small table than the large table, which is
what
you would expect, since a smaller index should be faster than a
large one.

However, when it goes to actually fetch the row data, it takes
significantly longer on the small table, despite the fact that it's
only fetching 1/3 as many rows. It is, however, doing 2.5x as many
disk reads to get those rows: For the large table, it reads 61MB from
disk, but it reads 160MB to get all the data for the smaller table.

How the data was inserted into each table could lead to similar data
being clustered on common pages on the large table, while it's spread
across many more pages on the small table.

That still doesn't explain it all, though. 2.5x the disk
activity normally wouldn't equate to 28x the time required. Unless
you're disks are horrifically slow? Does this server have a lot of
other activity against the disks? I.e. are other people running
queries that you would have to contend with, or is the server a VM
sharing storage with other VMs, or even a combined use server that
has to share disk access with (for example) a web or mail server
as well? Is the performance difference consistently ~28x?

Other things: what is shared_buffers set to? The queries would seem
to indicate that this server has less than 1M of those two tables
cached in memory at the time you ran those queries, which seems to
suggest that either you've got shared_buffers set very low, or that
there are a lot of other tables that other queries are accessing at
the time you're running these. Perhaps installing pg_buffercache to
have a look at what's using your shared_buffers would be helpful.

--
Bill Moran

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.


#13Yu Nie
nieyu04@gmail.com
In reply to: Yu Nie (#10)
Re: strange sql behavior

Hi Melvin,

I finished reloading the small table - but the performance difference does
not change at all. It seems that the query always needs to read a lot more
disk to fetch fewer rows from the small table.... Not sure why.
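If the small table really does spread its rows across more pages, the raw size numbers should show it; a quick sketch (table names as in this thread):

```sql
-- Heap size and average rows per 8 kB page for the two tables
SELECT relname,
       pg_size_pretty(pg_relation_size(oid)) AS heap_size,
       reltuples::bigint AS est_rows,
       round(reltuples / NULLIF(pg_relation_size(oid) / 8192, 0)) AS rows_per_page
FROM pg_class
WHERE relname IN ('data2011_01', 'data2013_01w');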

I ran the query you suggested - please see the results in attached.

I plan to create another "small table" using the data from the large table
and see how that works. I will report back when it is done.
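For that experiment, one way to build the new table so that rows for the same taxi land on adjacent pages - the table name and date cutoff here are placeholders:

```sql
-- Rebuild a ~10-day table physically ordered by (taxiid, time),
-- then recreate the two indexes and refresh statistics.
CREATE TABLE data_small_test AS
SELECT *
FROM data2011_01
WHERE "time" < '2011-01-11'
ORDER BY taxiid, "time";

CREATE INDEX ON data_small_test (taxiid);
CREATE INDEX ON data_small_test ("time");
ANALYZE data_small_test;
```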

Thanks again!

Best, Marco

On Mon, Feb 1, 2016 at 8:50 PM, Melvin Davidson <melvin6925@gmail.com>
wrote:

Show quoted text

OK, one more thing to try.

Please send the output of the following query to a file and attach so I
can review.

SELECT c.relname as idxname,
i.indisclustered,
i.indisvalid,
i.indcollation
FROM pg_index i
JOIN pg_class c ON i.indexrelid = c.oid
WHERE i.indrelid IN (SELECT oid
FROM pg_class
WHERE relkind = 'r'
AND relname = 'data2011_01'
OR relname = 'data2013_01w'
)
ORDER BY 1;

On Mon, Feb 1, 2016 at 5:51 PM, Yu Nie <nieyu04@gmail.com> wrote:

Yeah, it is really puzzling. :(

No, I have never attempted to cluster the large table - as I mentioned
before the two tables were created almost exactly the same way (through a
c++ api), except with different data files. I only use theme for query, and
had never done any update/insert/delete since the creation.

On Mon, Feb 1, 2016 at 4:41 PM, Melvin Davidson <melvin6925@gmail.com>
wrote:

At this point I can see no possible explanation why this would occur.

Just a thought though, is it possible that data2011_01 was clustered on
the index at some point but data2013_01w has not been clustered?
If you cluster data2013_01w on the index, does the performance change?

On Mon, Feb 1, 2016 at 4:03 PM, Yu Nie <nieyu04@gmail.com> wrote:

Melvin,

Please see attached for the requests results. I ran two queries (each
with a different taxiid that is next to each other) for each table. Note
that for the large table one is much faster than the other because the
shared cache was used for the second query. This does not work however for
the small table.

Many thanks for your willingness to help!

Best, Marco

On Mon, Feb 1, 2016 at 2:25 PM, melvin6925 <melvin6925@gmail.com>
wrote:

Fine. Please rerun both explains and my queries and send ALL output to
a file (F8 in PGADMIN Sql). Then attach the file.

Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
-------- Original message --------
From: Yu Nie <nieyu04@gmail.com>
Date: 2/1/2016 15:17 (GMT-05:00)
To: melvin6925 <melvin6925@gmail.com>
Subject: Re: [GENERAL] strange sql behavior

Yes, absolutely.

On Mon, Feb 1, 2016 at 2:12 PM, melvin6925 <melvin6925@gmail.com>
wrote:

And were _your_ queries run on the same day at the same time within a
few seconds of each other?

Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
-------- Original message --------
From: Yu Nie <nieyu04@gmail.com>
Date: 2/1/2016 15:01 (GMT-05:00)
To: melvin6925 <melvin6925@gmail.com>
Cc: Bill Moran <wmoran@potentialtech.com>,
pgsql-general@postgresql.org
Subject: Re: [GENERAL] strange sql behavior

Thanks, Bill and Melvin!

Just some quick note/answers before I absorb all the information
provided by Bill.

1. I don't expect many users running queries against the tables,
especially for the small table - since I just created it this morning, and
nobody know about it except myself.

2. The setting in the config:

shared_buffers = 512MB # min 128kB
work_mem = 128MB # min 64kB

3. I am running a Windows 7 with 24 GB RAM. and my postgresal is
9.4.

4. here is the query I ran:

SELECT n.nspname,
s.relname,
c.reltuples::bigint,
n_tup_ins,
n_tup_upd,
n_tup_del,
date_trunc('second', last_vacuum) as last_vacuum,
date_trunc('second', last_autovacuum) as last_autovacuum,
date_trunc('second', last_analyze) as last_analyze,
date_trunc('second', last_autoanalyze) as last_autoanalyze
,
round( current_setting('autovacuum_vacuum_threshold')::integer
+ current_setting('autovacuum_vacuum_scale_factor')::numeric * C.reltuples)
AS av_threshold
FROM pg_stat_all_tables s
JOIN pg_class c ON c.oid = s.relid
JOIN pg_namespace n ON (n.oid = c.relnamespace)
WHERE s.relname NOT LIKE 'pg_%'
AND s.relname NOT LIKE 'sql_%'
AND s.relname IN ('data2013_01w', 'data2011_01')
ORDER by 1, 2;

I copied the result from PGAdmin directly, here it is again.:

public;"data2011_01";784236864;784236885;0;0;"";"";"";"2016-01-19
17:31:08-06";156847423
public;"data2013_01w";300786432;300786444;0;0;"";"";"2016-02-01
08:57:24-06";"2016-02-01 04:01:04-06";60157336

On Mon, Feb 1, 2016 at 1:45 PM, melvin6925 <melvin6925@gmail.com>
wrote:

Thanks Bill.
Also, it's very important to include the headers with the queries!

Marco,
There is no top secret information that is requested, so please do
not edit the output.

Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
-------- Original message --------
From: Bill Moran <wmoran@potentialtech.com>
Date: 2/1/2016 14:41 (GMT-05:00)
To: Yu Nie <nieyu04@gmail.com>
Cc: Melvin Davidson <melvin6925@gmail.com>,
pgsql-general@postgresql.org
Subject: Re: [GENERAL] strange sql behavior

Came a little late to the thread, see many comments inline below:

On Mon, 1 Feb 2016 13:16:13 -0600
Yu Nie <nieyu04@gmail.com> wrote:

Thanks a lot for your reply. I ran the query you suggested and here are
the results

Large table:
"public";"data2011_01";784236885;0;0;"";"";"";"2016-01-19 17:31:08-06";156847423

Small table:
"public";"data2013_01w";300786444;0;0;"";"";"2016-02-01 08:57:24-06";"2016-02-01 04:01:04-06";60157336

You didn't do Melvin's query correctly. He specified 11 columns to
select, but you only show 10. Since you don't show the query you
actually ran, we don't know which of the numeric columns is missing.

More information inline below:

On Mon, Feb 1, 2016 at 1:00 PM, Melvin Davidson <melvin6925@gmail.com>
wrote:

One thing to look at is the last time both tables were vacuumed/analyzed.

SELECT n.nspname,
 s.relname,
 c.reltuples::bigint,
 n_tup_ins,
 n_tup_upd,
 n_tup_del,
 date_trunc('second', last_vacuum) as last_vacuum,
 date_trunc('second', last_autovacuum) as last_autovacuum,
 date_trunc('second', last_analyze) as last_analyze,
 date_trunc('second', last_autoanalyze) as last_autoanalyze,
 round( current_setting('autovacuum_vacuum_threshold')::integer
 + current_setting('autovacuum_vacuum_scale_factor')::numeric * C.reltuples)
 AS av_threshold
FROM pg_stat_all_tables s
JOIN pg_class c ON c.oid = s.relid
JOIN pg_namespace n ON (n.oid = c.relnamespace)
WHERE s.relname NOT LIKE 'pg_%'
AND s.relname NOT LIKE 'sql_%'
AND s.relname IN ('your_small_table', 'your_large_table')
ORDER by 1, 2;

Also, please confirm the indexes for both tables are using the same method
(btree?).

On Mon, Feb 1, 2016 at 1:35 PM, Yu Nie <nieyu04@gmail.com> wrote:

Hi there,

Recently I am working with a large amount of taxi GIS data and had
encountered some weird performance issues. I am hoping someone in this
community can help me figure it out.

The taxi data were loaded in 5-minute blocks into a table. I have two
separate such tables: one stores a month of data with about 700 million
rows, the other stores about 10 days of data with about 300 million rows.
The two tables have exactly the same schema and indexes. There are two
indexes: one on taxiid (text), and the other on the time stamp (date
time). In order to process the data, I need to get all points for a single
taxi; to do that, I use something like:
select * from table1 where taxiid = 'SZB00S41' order by time;
What puzzled me greatly is that this query runs consistently much faster
for the large table than for the small table, which seems to contradict
intuition. At the end of this message you may find explain (analyze,
buffers) results of two particular queries for the same taxiid (one for
each table). You can see that it took much longer (more than 20 times)
to get 20k rows from the small table than to get 44k rows from the large
table. Interestingly, it seems that the planner does expect about 1/3 the
work for the small table query - yet for some reason, it took much longer
to fetch the rows from the small table. Why is there such a huge
performance difference between the two seemingly identical queries
executed on two different tables?

Is it because the data in the second table is on some mysteriously
"broken part" of the disk? What else could explain such bizarre
behavior? Your help is greatly appreciated.

The above behavior is consistent through all queries. Another issue I
identified is that for the large table, the query can use the shared
buffer more effectively. For example, after you query one taxiid and
immediately following that query run the same query for another taxi
whose id ranks right behind the first id, then shared hit buffers would
be quite high (and the query would run much faster); this however never
works for the small table.

Thanks a lot!

Best, Marco

Results for the small table: it took 141 seconds to finish. The estimated
cost is 85256.31.

"Sort (cost=85201.05..85256.31 rows=22101 width=55) (actual
time=141419.499..141420.025 rows=20288 loops=1)"
" Sort Key: "time""
" Sort Method: quicksort Memory: 3622kB"
" Buffers: shared hit=92 read=19816"
" -> Bitmap Heap Scan on data2013_01w (cost=515.86..83606.27
rows=22101 width=55) (actual time=50.762..141374.777 rows=20288

loops=1)"

" Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Heap Blocks: exact=19826"
" Buffers: shared hit=92 read=19816"

^^
Note that despite this table being smaller, Postgres had to read 19816
blocks from disk, which is 2.5x more than the larger table.

" -> Bitmap Index Scan on data2013_01w_ixtaxiid
(cost=0.00..510.33 rows=22101 width=0) (actual

time=26.053..26.053

rows=20288 loops=1)"
" Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Buffers: shared hit=4 read=78"
"Planning time: 0.144 ms"
"Execution time: 141421.154 ms"

Results for the large table: it took 5 seconds to finish. The estimated
cost is 252077.10.
"Sort (cost=251913.32..252077.10 rows=65512 width=55) (actual
time=5038.571..5039.765 rows=44204 loops=1)"
" Sort Key: "time""
" Sort Method: quicksort Memory: 7753kB"
" Buffers: shared hit=2 read=7543"
" -> Bitmap Heap Scan on data2011_01 (cost=1520.29..246672.53
rows=65512 width=55) (actual time=36.935..5017.463 rows=44204

loops=1)"

" Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Heap Blocks: exact=7372"
" Buffers: shared hit=2 read=7543"
" -> Bitmap Index Scan on data2011_01_ixtaxiid
(cost=0.00..1503.92 rows=65512 width=0) (actual

time=35.792..35.792

rows=44204 loops=1)"

^^
Note that the larger table took LONGER to do the index work than the
smaller table, which probably means there's nothing wrong with your
disks or anything ... that's the behavior I would expect.

" Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Buffers: shared hit=2 read=171"
"Planning time: 0.127 ms"
"Execution time: 5042.134 ms"

So, what I'm seeing is that Postgres is able to figure out _which_ rows
to fetch faster on the small table than on the large table, which is what
you would expect, since a smaller index should be faster than a larger one.

However, when it goes to actually fetch the row data, it takes
significantly longer on the small table, despite the fact that it's
only fetching 1/3 as many rows. It is, however, doing 2.5x as many
disk reads to get those rows: For the large table, it reads 61MB from
disk, but it reads 160MB to get all the data for the smaller table.

How the data was inserted into each table could lead to similar data
being clustered on common pages on the large table, while it's spread
across many more pages on the small table.

That still doesn't explain it all, though. 2.5x the disk
activity normally wouldn't equate to 28x the time required. Unless
your disks are horrifically slow? Does this server have a lot of
other activity against the disks? I.e. are other people running
queries that you would have to contend with, or is the server a VM
sharing storage with other VMs, or even a combined-use server that
has to share disk access with (for example) a web or mail server
as well? Is the performance difference consistently ~28x?

Other things: what is shared_buffers set to? The queries would seem
to indicate that this server had less than 1MB of those two tables
cached in memory at the time you ran those queries, which seems to
suggest that either you've got shared_buffers set very low, or that
there are a lot of other tables that other queries are accessing at
the time you're running these. Perhaps installing pg_buffercache to
have a look at what's using your shared_buffers would be helpful.

--
Bill Moran

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.


Attachments:

melvin_1.csvtext/csv; charset=US-ASCII; name=melvin_1.csvDownload
#14Yu Nie
nieyu04@gmail.com
In reply to: Yu Nie (#10)
Re: strange sql behavior

One more observation: not sure if it helps with the diagnosis.

If I query based on a time interval, not on taxiid, the query is
generally faster - I believe it is due to the way the data were loaded.
Yet there is still a significant performance difference: the large table
seems to run faster than the small table. More importantly, this time the
planner proposes two very different plans for the two tables. Not sure why
a bitmap index scan was used for the small table...

See results below:

explain (analyze, buffers) select * from data2011_01 where time >=
'2011-01-16 12:00:00' and time < '2011-01-16 12:05:00' order by time;

For the small table (the analogous query with 2013-01-16 timestamps):
"Sort (cost=529024.61..529411.73 rows=154848 width=55) (actual
time=3761.329..3765.501 rows=149807 loops=1)"
" Sort Key: "time""
" Sort Method: quicksort Memory: 27211kB"
" Buffers: shared hit=39 read=4304"
" -> Bitmap Heap Scan on data2013_01w (cost=3287.76..515676.33
rows=154848 width=55) (actual time=29.028..3714.592 rows=149807 loops=1)"
" Recheck Cond: (("time" >= '2013-01-16 12:00:00'::timestamp without
time zone) AND ("time" < '2013-01-16 12:05:00'::timestamp without time
zone))"
" Heap Blocks: exact=3929"
" Buffers: shared hit=39 read=4304"
" -> Bitmap Index Scan on data2013_01w_ixtime (cost=0.00..3249.05
rows=154848 width=0) (actual time=17.931..17.931 rows=149807 loops=1)"
" Index Cond: (("time" >= '2013-01-16 12:00:00'::timestamp
without time zone) AND ("time" < '2013-01-16 12:05:00'::timestamp without
time zone))"
" Buffers: shared hit=1 read=413"
"Planning time: 0.203 ms"
"Execution time: 3779.368 ms"

For the large table:
"Index Scan using data2011_01_ixtime on data2011_01 (cost=0.57..78012.46
rows=110204 width=55) (actual time=43.765..709.985 rows=104617 loops=1)"
" Index Cond: (("time" >= '2011-01-16 12:00:00'::timestamp without time
zone) AND ("time" < '2011-01-16 12:05:00'::timestamp without time zone))"
" Buffers: shared hit=79619 read=1482"
"Planning time: 0.131 ms"
"Execution time: 713.446 ms"

On Mon, Feb 1, 2016 at 8:50 PM, Melvin Davidson <melvin6925@gmail.com>
wrote:

Show quoted text


#15Melvin Davidson
melvin6925@gmail.com
In reply to: Yu Nie (#14)
Re: strange sql behavior

Hmm, the last index query you attached verified nothing was clustered.

Just one more thing to look at. Please send the results of each of the
following to a file and attach.

EXPLAIN ANALYZE SELECT * FROM data2011_01 WHERE taxiid::text =
'SZB07P23'::text;

EXPLAIN ANALYZE SELECT * FROM data2013_01w WHERE taxiid::text =
'SZB07P23'::text;

On Tue, Feb 2, 2016 at 10:16 AM, Yu Nie <nieyu04@gmail.com> wrote:

One more observation: not sure if it helps with the diagnosis.

If I query based on time interval, not based on taxiid, the query is
generally faster - I believe it is due to the way the data were loaded.
Yet there is still significant performance difference: the large table
seems runs faster than the small table. More importantly, this time the
planner proposes two very different plans for the two tables. Not sure why
Bitmap index was used for the small table...

See results below:

explain (analyze, buffers) select * from data2011_01 where time >=
'2011-01-16 12:00:00' and time < '2011-01-16 12:05:00' order by time;

For small table.
"Sort (cost=529024.61..529411.73 rows=154848 width=55) (actual
time=3761.329..3765.501 rows=149807 loops=1)"
" Sort Key: "time""
" Sort Method: quicksort Memory: 27211kB"
" Buffers: shared hit=39 read=4304"
" -> Bitmap Heap Scan on data2013_01w (cost=3287.76..515676.33
rows=154848 width=55) (actual time=29.028..3714.592 rows=149807 loops=1)"
" Recheck Cond: (("time" >= '2013-01-16 12:00:00'::timestamp
without time zone) AND ("time" < '2013-01-16 12:05:00'::timestamp without
time zone))"
" Heap Blocks: exact=3929"
" Buffers: shared hit=39 read=4304"
" -> Bitmap Index Scan on data2013_01w_ixtime (cost=0.00..3249.05
rows=154848 width=0) (actual time=17.931..17.931 rows=149807 loops=1)"
" Index Cond: (("time" >= '2013-01-16 12:00:00'::timestamp
without time zone) AND ("time" < '2013-01-16 12:05:00'::timestamp without
time zone))"
" Buffers: shared hit=1 read=413"
"Planning time: 0.203 ms"
"Execution time: 3779.368 ms"

For large table.
"Index Scan using data2011_01_ixtime on data2011_01 (cost=0.57..78012.46
rows=110204 width=55) (actual time=43.765..709.985 rows=104617 loops=1)"
" Index Cond: (("time" >= '2011-01-16 12:00:00'::timestamp without time
zone) AND ("time" < '2011-01-16 12:05:00'::timestamp without time zone))"
" Buffers: shared hit=79619 read=1482"
"Planning time: 0.131 ms"
"Execution time: 713.446 ms"

On Mon, Feb 1, 2016 at 8:50 PM, Melvin Davidson <melvin6925@gmail.com>
wrote:

OK, one more thing to try.

Please send the output of the following query to a file and attach so I
can review.

SELECT c.relname as idxname,
i.indisclustered,
i.indisvalid,
i.indcollation
FROM pg_index i
JOIN pg_class c ON i.indexrelid = c.oid
WHERE i.indrelid IN (SELECT oid
FROM pg_class
WHERE relkind = 'r'
AND relname = 'data2011_01'
OR relname = 'data2013_01w'
)
ORDER BY 1;

On Mon, Feb 1, 2016 at 5:51 PM, Yu Nie <nieyu04@gmail.com> wrote:

Yeah, it is really puzzling. :(

No, I have never attempted to cluster the large table - as I mentioned
before the two tables were created almost exactly the same way (through a
c++ api), except with different data files. I only use theme for query, and
had never done any update/insert/delete since the creation.

On Mon, Feb 1, 2016 at 4:41 PM, Melvin Davidson <melvin6925@gmail.com>
wrote:

At this point I can see no possible explanation why this would occur.

Just a thought though, is it possible that data2011_01 was clustered on
the index at some point but data2013_01w has not been clustered?
If you cluster data2013_01w on the index, does the performance change?

On Mon, Feb 1, 2016 at 4:03 PM, Yu Nie <nieyu04@gmail.com> wrote:

Melvin,

Please see attached for the requests results. I ran two queries (each
with a different taxiid that is next to each other) for each table. Note
that for the large table one is much faster than the other because the
shared cache was used for the second query. This does not work however for
the small table.

Many thanks for your willingness to help!

Best, Marco

On Mon, Feb 1, 2016 at 2:25 PM, melvin6925 <melvin6925@gmail.com>
wrote:

Fine. Please rerun both explains and my queries and send ALL output
to a file (F8 in PGADMIN Sql). Then attach the file.

Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
-------- Original message --------
From: Yu Nie <nieyu04@gmail.com>
Date: 2/1/2016 15:17 (GMT-05:00)
To: melvin6925 <melvin6925@gmail.com>
Subject: Re: [GENERAL] strange sql behavior

Yes, absolutely.

On Mon, Feb 1, 2016 at 2:12 PM, melvin6925 <melvin6925@gmail.com>
wrote:

And were _your_ queries run on the same day at the same time within
a few seconds of each other?

Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
-------- Original message --------
From: Yu Nie <nieyu04@gmail.com>
Date: 2/1/2016 15:01 (GMT-05:00)
To: melvin6925 <melvin6925@gmail.com>
Cc: Bill Moran <wmoran@potentialtech.com>,
pgsql-general@postgresql.org
Subject: Re: [GENERAL] strange sql behavior

Thanks, Bill and Melvin!

Just some quick note/answers before I absorb all the information
provided by Bill.

1. I don't expect many users running queries against the tables,
especially the small table - since I just created it this morning,
nobody knows about it except myself.

2. The setting in the config:

shared_buffers = 512MB # min 128kB
work_mem = 128MB # min 64kB

3. I am running Windows 7 with 24 GB RAM, and my PostgreSQL is 9.4.

4. here is the query I ran:

SELECT n.nspname,
s.relname,
c.reltuples::bigint,
n_tup_ins,
n_tup_upd,
n_tup_del,
date_trunc('second', last_vacuum) as last_vacuum,
date_trunc('second', last_autovacuum) as last_autovacuum,
date_trunc('second', last_analyze) as last_analyze,
date_trunc('second', last_autoanalyze) as last_autoanalyze,
round(
current_setting('autovacuum_vacuum_threshold')::integer +
current_setting('autovacuum_vacuum_scale_factor')::numeric * C.reltuples)
AS av_threshold
FROM pg_stat_all_tables s
JOIN pg_class c ON c.oid = s.relid
JOIN pg_namespace n ON (n.oid = c.relnamespace)
WHERE s.relname NOT LIKE 'pg_%'
AND s.relname NOT LIKE 'sql_%'
AND s.relname IN ('data2013_01w', 'data2011_01')
ORDER by 1, 2;

I copied the result from PGAdmin directly, here it is again.:

public;"data2011_01";784236864;784236885;0;0;"";"";"";"2016-01-19
17:31:08-06";156847423
public;"data2013_01w";300786432;300786444;0;0;"";"";"2016-02-01
08:57:24-06";"2016-02-01 04:01:04-06";60157336

On Mon, Feb 1, 2016 at 1:45 PM, melvin6925 <melvin6925@gmail.com>
wrote:

Thanks Bill.
Also, it's very important to include the headers with the queries!

Marco,
There is no top secret information that is requested, so please do
not edit the output.

Sent via the Samsung Galaxy S® 6, an AT&T 4G LTE smartphone
-------- Original message --------
From: Bill Moran <wmoran@potentialtech.com>
Date: 2/1/2016 14:41 (GMT-05:00)
To: Yu Nie <nieyu04@gmail.com>
Cc: Melvin Davidson <melvin6925@gmail.com>,
pgsql-general@postgresql.org
Subject: Re: [GENERAL] strange sql behavior

Came a little late to the thread, see many comments inline below:

On Mon, 1 Feb 2016 13:16:13 -0600
Yu Nie <nieyu04@gmail.com> wrote:

Thanks a lot for your reply. I ran the query you suggested and here are
the results:

Large table:
"public";"data2011_01";784236885;0;0;"";"";"";"2016-01-19
17:31:08-06";156847423

Small table:
"public";"data2013_01w";300786444;0;0;"";"";"2016-02-01
08:57:24-06";"2016-02-01 04:01:04-06";60157336

You didn't do Melvin's query correctly. He specified 11 columns to
select, but you only show 10. Since you don't show the query you
actually ran, we don't know which of the numeric columns is missing.

More information inline below:

On Mon, Feb 1, 2016 at 1:00 PM, Melvin Davidson <melvin6925@gmail.com>
wrote:

One thing to look at is the last time both tables were
vacuumed/analyzed.

SELECT n.nspname,
s.relname,
c.reltuples::bigint,
n_tup_ins,
n_tup_upd,
n_tup_del,
date_trunc('second', last_vacuum) as last_vacuum,
date_trunc('second', last_autovacuum) as last_autovacuum,
date_trunc('second', last_analyze) as last_analyze,
date_trunc('second', last_autoanalyze) as last_autoanalyze,
round(
current_setting('autovacuum_vacuum_threshold')::integer +
current_setting('autovacuum_vacuum_scale_factor')::numeric * C.reltuples)
AS av_threshold
FROM pg_stat_all_tables s
JOIN pg_class c ON c.oid = s.relid
JOIN pg_namespace n ON (n.oid = c.relnamespace)
WHERE s.relname NOT LIKE 'pg_%'
AND s.relname NOT LIKE 'sql_%'
AND s.relname IN ('your_small_table', 'your_large_table')
ORDER by 1, 2;

Also, please confirm the indexes for both tables are using the same
method (btree?).

On Mon, Feb 1, 2016 at 1:35 PM, Yu Nie <nieyu04@gmail.com> wrote:

Hi there,

Recently I am working with a large amount of taxi GIS data and have
encountered some weird performance issues. I am hoping someone in this
community can help me figure it out.

The taxi data were loaded in 5 minute blocks into a table. I have two
separate such tables: one stores a month of data with about 700 million
rows, the other stores about 10 days of data with about 300 million
rows. The two tables have exactly the same schema and indexes. There
are two indexes: one on taxiid (text), and the other on the time stamp
(date time). In order to process the data, I need to get all points for
a single taxi; to do that, I use something like:
select * from table1 where taxiid = 'SZB00S41' order by time;
What puzzled me greatly is that this query runs consistently much
faster for the large table than for the small table, which seems to
contradict intuition. At the end of the message you may find explain
(analyze, buffers) results of two particular queries for the same
taxiid (one for each table). You can see that it took much longer (more
than 20 times) to get 20k rows from the small table than to get 44k
rows from the large table. Interestingly, the planner does expect about
1/3 of the work for the small table query - yet for some reason, it
took much longer to fetch the rows from the small table. Why is there
such a huge performance difference between the two seemingly identical
queries executed on two different tables?

Is it because the data in the second table is on some mysteriously
"broken" part of the disk? What else could explain such bizarre
behavior? Your help is greatly appreciated.

The above behavior is consistent across all queries. Another issue I
identified is that for the large table, the query can use the shared
buffers more effectively. For example, after you query one taxiid and
immediately following that query run the same query for another taxi
whose id ranks right behind the first id, then shared hit buffers will
be quite high (and the query will run much faster); this however never
works for the small table.

Thanks a lot!

Best, Marco

Results for the small table: it took 141 seconds to finish. The
estimated cost is 85256.31.

"Sort (cost=85201.05..85256.31 rows=22101 width=55) (actual
time=141419.499..141420.025 rows=20288 loops=1)"
" Sort Key: "time""
" Sort Method: quicksort Memory: 3622kB"
" Buffers: shared hit=92 read=19816"
" -> Bitmap Heap Scan on data2013_01w (cost=515.86..83606.27
rows=22101 width=55) (actual time=50.762..141374.777

rows=20288 loops=1)"

" Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Heap Blocks: exact=19826"
" Buffers: shared hit=92 read=19816"

^^
Note that despite this table being smaller, Postgres had to read 19816
blocks from disk, which is 2.5x more than the larger table.

" -> Bitmap Index Scan on data2013_01w_ixtaxiid
(cost=0.00..510.33 rows=22101 width=0) (actual

time=26.053..26.053

rows=20288 loops=1)"
" Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Buffers: shared hit=4 read=78"
"Planning time: 0.144 ms"
"Execution time: 141421.154 ms"

Results for the large table: it took 5 seconds to finish. The
estimated cost is 252077.10.
"Sort (cost=251913.32..252077.10 rows=65512 width=55) (actual
time=5038.571..5039.765 rows=44204 loops=1)"
" Sort Key: "time""
" Sort Method: quicksort Memory: 7753kB"
" Buffers: shared hit=2 read=7543"
" -> Bitmap Heap Scan on data2011_01

(cost=1520.29..246672.53

rows=65512 width=55) (actual time=36.935..5017.463 rows=44204

loops=1)"

" Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Heap Blocks: exact=7372"
" Buffers: shared hit=2 read=7543"
" -> Bitmap Index Scan on data2011_01_ixtaxiid
(cost=0.00..1503.92 rows=65512 width=0) (actual

time=35.792..35.792

rows=44204 loops=1)"

^^
Note that the larger table took LONGER to do the index work than the
smaller table, which probably means there's nothing wrong with your
disks or anything ... that's the behavior I would expect.

" Index Cond: ((taxiid)::text = 'SZB00S41'::text)"
" Buffers: shared hit=2 read=171"
"Planning time: 0.127 ms"
"Execution time: 5042.134 ms"

So, what I'm seeing, is that Postgres is able to figure out _which_
rows
to fetch faster on the small table than the large table, which is
what
you would expect, since a smaller index should be faster than a
large one.

However, when it goes to actually fetch the row data, it takes
significantly longer on the small table, despite the fact that it's
only fetching 1/3 as many rows. It is, however, doing 2.5x as many
disk reads to get those rows: for the large table, it reads 61MB from
disk, but it reads 160MB to get all the data for the smaller table.

How the data was inserted into each table could lead to similar data
being clustered on common pages on the large table, while it's spread
across many more pages on the small table.

That still doesn't explain it all, though. 2.5x the disk
activity normally wouldn't equate to 28x the time required. Unless
your disks are horrifically slow? Does this server have a lot of
other activity against the disks? I.e. are other people running
queries that you would have to contend with, or is the server a VM
sharing storage with other VMs, or even a combined use server that
has to share disk access with (for example) a web or mail server
as well? Is the performance difference consistently ~28x?

Other things: what is shared_buffers set to? The queries would seem
to indicate that this server has less than 1M of those two tables
cached in memory at the time you ran those queries, which seems to
suggest that either you've got shared_buffers set very low, or that
there are a lot of other tables that other queries are accessing at
the time you're running these. Perhaps installing pg_buffercache to
have a look at what's using your shared_buffers would be helpful.
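For reference, that inspection could look something like this (a
sketch, assuming the pg_buffercache extension has been installed with
CREATE EXTENSION pg_buffercache; the table names are the ones from this
thread):

```sql
-- Count how many 8 kB shared_buffers pages each table currently occupies.
SELECT c.relname,
       count(*) AS buffers,
       pg_size_pretty(count(*) * 8192) AS buffered
FROM pg_buffercache b
JOIN pg_class c ON pg_relation_filenode(c.oid) = b.relfilenode
WHERE c.relname IN ('data2011_01', 'data2013_01w')
GROUP BY c.relname
ORDER BY 2 DESC;
```

If both tables show only a few thousand buffers, most reads are going
to disk, which would be consistent with the plans above.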

--
Bill Moran

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#16John R Pierce
pierce@hogranch.com
In reply to: Melvin Davidson (#15)
Re: strange sql behavior

please stop top posting, and quoting 100s and 100s of lines of old
dreck. This list uses inline posting, and it's preferred to edit out
any unimportant junk from the quoted postings.

--
john r pierce, recycling bits in santa cruz

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#17Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Yu Nie (#1)
Re: strange sql behavior

On 2016-02-01 12:35:35 -0600, Yu Nie wrote:

Recently I am working with a large amount of taxis GIS data and had encountered
some weird performance issues.  I am hoping someone in this community can help
me figure it out.

The taxi data were loaded in 5 minute block into a table.  I have two separate
such tables, one stores a month of data with about 700 million rows, another
stores about 10 days of data with about 300 million rows.  The two tables have
the exactly same schema and indexes. There are two indexes: one on taxiid
(text), and the other on the time stamp (date time).  In order to process the
data, I need to get all points for a single taxis; to do that, I use something
like:
 select * from table1 where taxiid = 'SZB00S41' order by time;
What puzzled me greatly is that this query runs consistently much faster for
the large table than for the small table, which seems to contradict with
intuition.

[...]

Results for the small table: it took 141 seconds to finish. The
estimated cost is 85256.31.

"Sort  (cost=85201.05..85256.31 rows=22101 width=55) (actual time=
141419.499..141420.025 rows=20288 loops=1)"
"  Sort Key: "time""
"  Sort Method: quicksort  Memory: 3622kB"
"  Buffers: shared hit=92 read=19816"
"  ->  Bitmap Heap Scan on data2013_01w  (cost=515.86..83606.27 rows=22101
width=55) (actual time=50.762..141374.777 rows=20288 loops=1)"
"        Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
"        Heap Blocks: exact=19826"
"        Buffers: shared hit=92 read=19816"

^^^^^^^^^^
[...]

Results for the large table: it took 5 seconds to finish. The estimated
cost is 252077.10.
"Sort  (cost=251913.32..252077.10 rows=65512 width=55) (actual time=
5038.571..5039.765 rows=44204 loops=1)"
"  Sort Key: "time""
"  Sort Method: quicksort  Memory: 7753kB"
"  Buffers: shared hit=2 read=7543"
"  ->  Bitmap Heap Scan on data2011_01  (cost=1520.29..246672.53 rows=65512
width=55) (actual time=36.935..5017.463 rows=44204 loops=1)"
"        Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)"
"        Heap Blocks: exact=7372"
"        Buffers: shared hit=2 read=7543"

^^^^^^^^^
[....]

The obvious difference is that the query for the smaller table needs to
read about 2.5 times as many blocks (for 1/3 of the records) from the
disk. This suggests that the data for a single taxi is more localized in
the larger table. In addition, the average time per block on the smaller
table is about 7 ms, which is a typical random seek time for a disk. So
the blocks are probably randomly scattered through the table. For the
larger table, the average time is well below 1 ms, so there are probably
many consecutive blocks to read.

There are 2880 5 minute intervals in 10 days. You have about 22k records
per taxi, so there are about 7.6 records for each taxi per interval.
This is very close to the number of records per block in your second
query (65512/7372 = 8.9). I suspect that the records in your larger
table are sorted by taxiid within each interval.

You can almost certainly get a similar speedup by sorting each 5 minute
interval by taxi id before appending it to the table.
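A sketch of that load pattern (staging_5min is a made-up name for a
staging table holding one 5-minute batch, with the same schema as the
target):

```sql
-- Append each 5-minute batch sorted by taxiid, so rows for the same
-- taxi tend to land on adjacent heap pages. Note: INSERT ... SELECT
-- ... ORDER BY does not formally guarantee physical order, but in
-- practice rows are appended in the order the SELECT produces them.
INSERT INTO data2013_01w
SELECT * FROM staging_5min
ORDER BY taxiid;

TRUNCATE staging_5min;  -- ready for the next batch
```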

If querying by taxiid is typical and your table is static, you should
consider clustering the table by taxiid. If your table is updated every
5 minutes, you could partition it by day and cluster each partition as
soon as it is not written any more.
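For the static case, the one-time rewrite would be something like the
following (using the index name shown in the plans above; note that
CLUSTER takes an ACCESS EXCLUSIVE lock and rewrites the whole table, so
expect it to run for a while on 300 million rows):

```sql
CLUSTER data2013_01w USING data2013_01w_ixtaxiid;
ANALYZE data2013_01w;  -- refresh planner statistics after the rewrite
```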

hp

--
_ | Peter J. Holzer | I want to forget all about both belts and
|_|_) | | suspenders; instead, I want to buy pants
| | | hjp@hjp.at | that actually fit.
__/ | http://www.hjp.at/ | -- http://noncombatant.org/