Index scan prefetch?

Started by Konstantin Knizhnik almost 8 years ago, 4 messages
#1 Konstantin Knizhnik
k.knizhnik@postgrespro.ru

Hi, hackers.

I ran into the following case of bad performance with Postgres:
there is a huge append-only table with a serial key (ID)
which is continuously appended to by a multithreaded pgloader.

Ideally (if there is no parallel load), this table should be clustered
by the "ID" key: records stored in the heap in increasing ID order.
An index scan using the ID index should then be very efficient, because
records with subsequent IDs are expected to belong to the same heap page.

In practice I see a lot of random reads and bad performance (read speed
is about 3 MB/s, instead of 300 MB/s for a sequential read).
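To put those figures in perspective: at the default 8 kB block size, 3 MB/s corresponds to only a few hundred heap pages per second, i.e. essentially one random I/O per page. A quick back-of-the-envelope check (treating the reported speeds as MB/s):

```python
# Back-of-the-envelope check of the read speeds reported above,
# using the default 8 kB PostgreSQL block size.
PAGE_SIZE = 8 * 1024

random_read_bps = 3 * 1024 * 1024     # ~3 MB/s seen during the index scan
seq_read_bps = 300 * 1024 * 1024      # ~300 MB/s seen for sequential reads

pages_per_sec_random = random_read_bps // PAGE_SIZE
print(pages_per_sec_random)             # 384: roughly one random I/O per page
print(seq_read_bps // random_read_bps)  # 100: the sequential/random gap
```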

This is the typical page layout:

select ctid,"ID" from T limit 100;
ctid  |    ID
--------+-----------
 (0,1)  | 230827730
 (0,2)  | 230830833
 (0,3)  | 230840164
 (0,4)  | 230853499
 (0,5)  | 230862073
 (0,6)  | 230864443
 (0,7)  | 230870332
 (0,8)  | 230872992
 (0,9)  | 230873052
 (0,10) | 230875245
 (0,11) | 230877107
 (0,12) | 230878144
 (0,13) | 230878612
 (0,14) | 230885569
 (0,15) | 230885726
 (0,16) | 230886305
 (0,17) | 230886606
 (0,18) | 230892813
 (0,19) | 230895012
 (0,20) | 230895041
 (0,21) | 230895175
 (0,22) | 230896732
 (0,23) | 230897719
 (0,24) | 230897841
 (0,25) | 230898386
 (0,26) | 230899992
 (0,27) | 230900646
 (0,28) | 230901053
 (0,29) | 230901717
 (0,30) | 230904115
 (0,31) | 230908189

And this is the result of index scan:

select ctid,"ID" from T where "ID"  > 1669762074  limit 100;
   ctid    |     ID
-----------+------------
 (1890,22) | 1669762075
 (2035,20) | 1669762076
 (2104,19) | 1669762077
 (2518,27) | 1669762078
 (2530,15) | 1669762079
 (2552,32) | 1669762080
 (2578,30) | 1669762081
 (2587,19) | 1669762082
 (2676,17) | 1669762083
 (2701,15) | 1669762084
 (2707,20) | 1669762085
 (2804,22) | 1669762086
 (2837,28) | 1669762087
 (2902,17) | 1669762088
 (2972,28) | 1669762089
 (2992,36) | 1669762090
 (3014,27) | 1669762091
 (3038,24) | 1669762092
 (3063,22) | 1669762093
 (3116,33) | 1669762094
 (3131,28) | 1669762095
 (3139,19) | 1669762096
 (3167,17) | 1669762097
 (3170,28) | 1669762098
 (3202,21) | 1669762099
 (3213,35) | 1669762100
 (3220,21) | 1669762101
 (3236,21) | 1669762102
 (3282,21) | 1669762103
 (3314,22) | 1669762104
 (3366,26) | 1669762105

So records with subsequent IDs are not located sequentially, but the
locality is still good enough.
I think that prefetch could solve this problem.
But currently the effective_io_concurrency parameter is applicable only to
bitmap scans.
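For context, the prefetch that effective_io_concurrency enables for bitmap heap scans amounts to issuing advisory read-ahead hints (posix_fadvise with POSIX_FADV_WILLNEED, via PrefetchBuffer) for heap blocks shortly before they are read. A minimal sketch of that mechanism, using a temporary file in place of a relation segment and made-up block numbers:

```python
# Sketch of prefetch via posix_fadvise(POSIX_FADV_WILLNEED): an advisory
# hint that returns immediately so the kernel can start the I/O early.
# A temporary file stands in for a relation segment; block numbers are
# made up for illustration.
import os
import tempfile

BLCKSZ = 8192  # PostgreSQL's default block size

def prefetch_block(fd, blkno):
    # Advisory only; posix_fadvise is unavailable on some platforms.
    if hasattr(os, "posix_fadvise"):
        os.posix_fadvise(fd, blkno * BLCKSZ, BLCKSZ, os.POSIX_FADV_WILLNEED)

with tempfile.TemporaryFile() as f:
    f.write(b"\0" * BLCKSZ * 16)      # sixteen zero-filled "heap pages"
    f.flush()
    fd = f.fileno()
    for blkno in (3, 7, 11):          # blocks the scan will visit next
        prefetch_block(fd, blkno)     # hint the kernel ahead of time
    os.lseek(fd, 7 * BLCKSZ, os.SEEK_SET)
    page = os.read(fd, BLCKSZ)        # the actual read, hopefully now cached
print(len(page))                      # 8192
```

The question in this thread is simply whether plain index scans should issue the same hints that bitmap heap scans already do.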

Would it be useful to support it for index scans as well?
Or are there other ways to address this problem?

Thanks in advance,

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#2 Justin Pryzby
pryzby@telsasoft.com
In reply to: Konstantin Knizhnik (#1)
Re: Index scan prefetch?

On Mon, Mar 26, 2018 at 12:43:02PM +0300, Konstantin Knizhnik wrote:

> Hi, hackers.
>
> I was faced with the following bad performance use case with Postgres: there
> is a huge append-only table with serial key (ID)
> which is permanently appended using multithreaded pgloader.

I think this could be similar to the problem I reported here:
/messages/by-id/20160524173914.GA11880@telsasoft.com

The analysis at the time was that, due to "repeated keys", a btree index on the
timestamp column had non-consecutive heap TIDs (btree insertion picks a random
insertion point among equal keys to avoid superlinear lookup cost during such
insertions).

But our case also involved multiple child processes simultaneously inserting
into the same table, and I wonder whether "repeated keys" were more or less
unrelated to the problem. The behavior may simply be caused by simultaneous
insertions "clustered" around the same target: now() for us, nextval() for you.

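That interleaving effect can be sketched with a toy model: several backends draw IDs from one shared sequence while each fills its own heap page, so consecutive IDs get scattered across as many pages as there are writers (the writer count and rows-per-page below are made-up numbers):

```python
# Toy model: N backends draw IDs from one shared sequence while each
# fills its own heap page, so consecutive IDs land on different pages.
N_WRITERS = 4       # hypothetical number of concurrent loader processes
ROWS_PER_PAGE = 4   # hypothetical tuples per heap page (tiny, for clarity)

writers = [[] for _ in range(N_WRITERS)]
for i in range(1, N_WRITERS * ROWS_PER_PAGE + 1):  # nextval(): 1, 2, 3, ...
    writers[(i - 1) % N_WRITERS].append(i)         # each ID goes to "its" backend

# Physical heap order: each backend's page written out in turn.
heap = [tid for page in writers for tid in page]
print(heap)  # [1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15, 4, 8, 12, 16]
```

IDs 1 through 4 are consecutive in the index but sit on four different heap pages, which is the same ctid pattern as in the second query above.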
> But now effective_io_concurrency parameter is applicable only for bitmap
>
> ...
>
> Will it be useful to support it also for index scan?
> Or there are some other ways to address this problem?

Does your case perform well with a bitmap heap scan (I mean a bitmap scan of
the single index)? It seems to me that prefetch wouldn't help, as it would just
incur the same random cost you're already seeing; the solution may be to choose
another plan (bitmap) with sequential access to enable read-ahead.

Also: Claudio mentioned here that bitmap prefetch can cause the kernel to avoid
its own readahead, negatively affecting some queries:
/messages/by-id/8fb758a1-d7fa-4dcc-fb5b-07a992ae6a32@gmail.com

What's the pg_stats "correlation" for the table column whose index is being
scanned? How many tuples? Would you send explain(analyze,buffers) for the
problem query, and again with SET enable_bitmapscan=off;?
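For reference, pg_stats.correlation measures how closely a column's physical (heap) order matches its value order: +1 means perfectly ascending, near 0 means random placement. A rough analogue of the statistic, computed as a plain Pearson correlation between heap position and value rank:

```python
# Rough analogue of pg_stats.correlation: Pearson correlation between a
# column's physical (heap) position and its value rank. +1 means the
# table is perfectly clustered ascending; values near 0 mean random
# placement. This is a simplification of what ANALYZE computes.
def heap_correlation(values_in_heap_order):
    n = len(values_in_heap_order)
    order = sorted(values_in_heap_order)
    positions = range(n)
    ranks = [order.index(v) for v in values_in_heap_order]
    mp = sum(positions) / n
    mr = sum(ranks) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(positions, ranks))
    var_p = sum((p - mp) ** 2 for p in positions)
    var_r = sum((r - mr) ** 2 for r in ranks)
    return cov / (var_p * var_r) ** 0.5

print(heap_correlation([10, 20, 30, 40, 50]))  # 1.0: perfectly ordered
print(heap_correlation([30, 10, 50, 20, 40]))  # 0.3: mostly scrambled
```

The real statistic is kept per column; it can be read with `SELECT correlation FROM pg_stats WHERE tablename = '...' AND attname = '...'`.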

Justin

#3 Claudio Freire
klaussfreire@gmail.com
In reply to: Justin Pryzby (#2)
Re: Index scan prefetch?

On Mon, Mar 26, 2018 at 4:59 PM, Justin Pryzby <pryzby@telsasoft.com> wrote:

>> But now effective_io_concurrency parameter is applicable only for bitmap
>>
>> ...
>>
>> Will it be useful to support it also for index scan?
>> Or there are some other ways to address this problem?
>
> Does your case perform well with bitmap heap scan (I mean bitmap scan of the
> single index)? It seems to me that prefetch wouldn't help, as it would just
> incur the same random cost you're already seeing; the solution may be to choose
> another plan(bitmap) with sequential access to enable read-ahead,
>
> Also: Claudio mentioned here that bitmap prefetch can cause the kernel to avoid
> its own readahead, negatively affecting some queries:
> /messages/by-id/8fb758a1-d7fa-4dcc-fb5b-07a992ae6a32@gmail.com
>
> What's the pg_stats "correlation" for the table column with index being
> scanned? How many tuples? Would you send explain(analyze,buffers) for the
> problem query, and with SET enable_bitmapscan=off; ?

Also, check out this thread:

http://www.postgresql-archive.org/Prefetch-index-pages-for-B-Tree-index-scans-td5728926.html

#4 Konstantin Knizhnik
k.knizhnik@postgrespro.ru
In reply to: Justin Pryzby (#2)
Re: Index scan prefetch?

On Mar 26, 2018, at 10:59 PM, Justin Pryzby wrote:

> On Mon, Mar 26, 2018 at 12:43:02PM +0300, Konstantin Knizhnik wrote:
>
>> Hi, hackers.
>>
>> I was faced with the following bad performance use case with Postgres: there
>> is a huge append-only table with serial key (ID)
>> which is permanently appended using multithreaded pgloader.
>
> I think this could be similar to the problem I reported here:
> /messages/by-id/20160524173914.GA11880@telsasoft.com
>
> The analysis at the time was that, due to "repeated keys", a btree index on the
> timestamp column had non-consecutive heap TIDs (btree insertion uses random
> early insertion point to avoid superlinear lookup cost during such insertions).
>
> But, our case also involved multiple child processes simultaneously inserting
> into same table, and I wonder if "repeated keys" were more or less unrelated to
> the problem. The behavior is maybe caused easily by simultaneous insertions
> "clustered" around the same target: for us: now(), for you: nextval().
>
>> But now effective_io_concurrency parameter is applicable only for bitmap
>>
>> ...
>>
>> Will it be useful to support it also for index scan?
>> Or there are some other ways to address this problem?
>
> Does your case perform well with bitmap heap scan (I mean bitmap scan of the
> single index)? It seems to me that prefetch wouldn't help, as it would just
> incur the same random cost you're already seeing; the solution may be to choose
> another plan(bitmap) with sequential access to enable read-ahead,
>
> Also: Claudio mentioned here that bitmap prefetch can cause the kernel to avoid
> its own readahead, negatively affecting some queries:
> /messages/by-id/8fb758a1-d7fa-4dcc-fb5b-07a992ae6a32@gmail.com
>
> What's the pg_stats "correlation" for the table column with index being
> scanned? How many tuples? Would you send explain(analyze,buffers) for the
> problem query, and with SET enable_bitmapscan=off; ?

Thank you very much for your feedback.

Setting random_page_cost to 10 forces Postgres to use a bitmap scan:

paradoxdb=# explain select * from public.people_raw where ("ID" < 2068113880) and not (("ID" < 1669762074)) order by 2,1,3;
                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Sort  (cost=269989441.71..270965997.99 rows=390622510 width=351)
   Sort Key: "LastName", "FirstName", "MiddleName"
   ->  Index Scan using "People_pkey" on people_raw  (cost=0.58..88742525.10 rows=390622510 width=351)
         Index Cond: (("ID" < 2068113880) AND ("ID" >= 1669762074))
(4 rows)

paradoxdb=# set random_page_cost to 10;
SET
paradoxdb=# explain select * from public.people_raw where ("ID" < 2068113880) and not (("ID" < 1669762074)) order by 2,1,3;
                                            QUERY PLAN
-------------------------------------------------------------------------------------------------
 Sort  (cost=397271527.57..398248083.84 rows=390622510 width=351)
   Sort Key: "LastName", "FirstName", "MiddleName"
   ->  Bitmap Heap Scan on people_raw  (cost=14905651.30..108450832.96 rows=390622510 width=351)
         Recheck Cond: (("ID" < 2068113880) AND ("ID" >= 1669762074))
         ->  Bitmap Index Scan on "People_pkey"  (cost=0.00..14807995.68 rows=390622510 width=0)
               Index Cond: (("ID" < 2068113880) AND ("ID" >= 1669762074))
(6 rows)

Correlation of "ID" column is 0.978884.
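That correlation feeds directly into the planner's index-scan costing: cost_index() interpolates between the best-case (sequential) and worst-case (fully random) I/O cost using the squared correlation, so a value near 1 makes an index scan look almost sequential. A sketch of that interpolation, with an illustrative page count not taken from this table:

```python
# Sketch of the interpolation in costsize.c's cost_index(): the I/O cost
# is blended between the best case (all pages sequential) and the worst
# case (every fetch random) using the squared index correlation.
# The page count below is illustrative, not taken from people_raw.
seq_page_cost = 1.0
random_page_cost = 4.0       # the default; raised to 10 above

pages_fetched = 1_000_000    # hypothetical number of heap pages touched
correlation = 0.978884       # the value reported above

min_io = pages_fetched * seq_page_cost        # perfectly clustered table
max_io = pages_fetched * random_page_cost     # completely random fetches
csquared = correlation ** 2
io_cost = max_io + csquared * (min_io - max_io)
print(round(io_cost))        # 1125358: close to the sequential best case
```

With correlation around 0.98 the planner charges the index scan barely more than a sequential pass over the same pages, which is why it takes an unusually high random_page_cost to push it onto the bitmap plan.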

Sorry, it is a customer's system and I have limited access to it.
The next data load is expected this week, so we will check how much a bitmap scan improves performance.
Judging by the I/O traffic, I expect up to a 100x improvement.