Add free-behind capability for large sequential scans

Started by Amit Kumar Khareover 24 years ago5 messageshackers

skamit2000@yahoo.com

over 24 years ago

Hi All,

(1)I am Amit Kumar Khare, I am doing MCS from UIUC USA
offcampus from India.

(2) We have been asked to enhance postgreSQL in one of
our assignments. So I
have chosen to pick "Add free-behind capability for
large sequential scans"
from TODO list. Many thanks to Mr. Bruce Momjian who
helped me out and
suggested to make a patch for this problem.

(3)As explained to me by Mr. Bruce, the problem
description is that if say
cache size is 1 mb and a sequential scan is done
through a 2mb file over and
over again the cache becomes useless.Because by the
time the second read of
the table happens the first 1mb has been forced out of
the cache already.Thus
the idea is not to cache very large sequential scans,
but to cache index scans
small sequential scans.

(4)what I think the problem arises because of default
LRU page replacement
policy. So I think we have to make use of MRU or LRU-K
page replacement
policies.

(5)But I am not sure and I wish more input into the
problem description from
you all. I have started reading the buffer manager
code and I found that
freelist.c may be needed to be modified and may be
some other too since we
have to identify the large sequential scans.

Please help me out

Regards
Amit Kumar Khare

__________________________________________________
Do You Yahoo!?
Send FREE Valentine eCards with Yahoo! Greetings!
http://greetings.yahoo.com

Tom Lane

tgl@sss.pgh.pa.us

over 24 years ago

In reply to: Amit Kumar Khare (#1)

Re: Add free-behind capability for large sequential scans

Amit Kumar Khare <skamit2000@yahoo.com> writes:

(4)what I think the problem arises because of default LRU page
replacement policy. So I think we have to make use of MRU or LRU-K
page replacement policies.

(5)But I am not sure and I wish more input into the problem
description from you all. I have started reading the buffer manager
code and I found that freelist.c may be needed to be modified and may
be some other too since we have to identify the large sequential
scans.

I do not think it's a good idea for the buffer manager to directly try
to recognize sequential scans; any such attempt will fall down on the
fundamental problem that there may be more than one backend accessing
the same table concurrently. Plus it would introduce undesirable
coupling between the buffer manager and higher-level code. I like the
idea of using LRU-K or other advanced page replacement policies,
instead of plain LRU.

I did some experimentation with LRU-2 awhile back and didn't see any
measurable performance improvement in the small number of test cases
I tried. But I was not looking at the issue of cache-flushing caused
by large seqscans (the test cases I tried probably didn't do any
seqscans at all). It's quite possible that that's a sufficient reason
to adopt LRU-2 anyway.

regards, tom lane

Amit Kumar Khare

skamit2000@yahoo.com

over 24 years ago

In reply to: Tom Lane (#2)

Re: Add free-behind capability for large sequential scans

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

Amit Kumar Khare <skamit2000@yahoo.com> writes:

(4)what I think the problem arises because of

default LRU page

replacement policy. So I think we have to make use

of MRU or LRU-K

page replacement policies.

(5)But I am not sure and I wish more input into

the problem

description from you all. I have started reading

the buffer manager

code and I found that freelist.c may be needed to

be modified and may

be some other too since we have to identify the

large sequential

scans.

I do not think it's a good idea for the buffer
manager to directly try
to recognize sequential scans; any such attempt will
fall down on the
fundamental problem that there may be more than one
backend accessing
the same table concurrently. Plus it would
introduce undesirable
coupling between the buffer manager and higher-level
code. I like the
idea of using LRU-K or other advanced page
replacement policies,
instead of plain LRU.

Sir, What I have in my mind is some thing like what
Hong-Tai Chou and David J. DeWitt proposes in there
paper titled "An Evaluation of Buffer Manaagement
Strategies for Relational Database Systems", when talk
about there "DBMIN" algorithm.

The problem is same here(Please correct me if I am
wrong) what they talk of. They have recognized like
Professor Stonebraker certain Access patterns (like
clustered sequential, looping sequential etc.)in
Database Systems and recomend a "Composite page
replacement policy". But how the buffer manager will
know what policy has to be applied? They say "When a
file is opened, its associated locality set size and
REPLACEMENT POLICY are given to the buffer manager".

I am thinking of implementing similar strategy. Since
Planner/Optimizer hand over the execution plan to the
executor it can also pass the information regarding
page replacement. Then executor can pass this
information through
heapam->relcache->smgr -> bufmgr -> finally to
freelist.c (I may be wrong in the sequence, this
conclusion is from my first study of code. I know I
have to go through it over and over again to get hang
of it)

Hence as you said we can avoid the undesirable
coupling between the buffer manager and higher-level
code.

Sir, is this scheme feasible? if not then can you
guide me ?

Amit khare

I did some experimentation with LRU-2 awhile back
and didn't see any
measurable performance improvement in the small
number of test cases
I tried. But I was not looking at the issue of
cache-flushing caused
by large seqscans (the test cases I tried probably
didn't do any
seqscans at all). It's quite possible that that's a
sufficient reason
to adopt LRU-2 anyway.
regards, tom lane

---------------------------(end of
broadcast)---------------------------
TIP 2: you can get off all lists at once with the
unregister command
(send "unregister YourEmailAddressHere" to

majordomo@postgresql.org)

__________________________________________________
Do You Yahoo!?
Send FREE Valentine eCards with Yahoo! Greetings!
http://greetings.yahoo.com

Neil Padgett

npadgett@redhat.com

over 24 years ago

In reply to: Amit Kumar Khare (#3)

Re: Add free-behind capability for large sequential scans

On Wed, 2002-02-13 at 03:11, Amit Kumar Khare wrote:

[clip]

The problem is same here(Please correct me if I am
wrong) what they talk of. They have recognized like
Professor Stonebraker certain Access patterns (like
clustered sequential, looping sequential etc.)in
Database Systems and recomend a "Composite page
replacement policy". But how the buffer manager will
know what policy has to be applied? They say "When a
file is opened, its associated locality set size and
REPLACEMENT POLICY are given to the buffer manager".

This is indeed one possibility. However, the problem, as pointed out in
[1]: E.H. O'Neil, P.E. O'Neil, and G. Weikum. The LRU-K page replacement algorithm for database disk buffering. _In Proceeedings of the 1993 ACM Sigmod International Conference on Management of Data_, pages 297-306, 1993.
execution analysis is hard. The real problem is -- how do you handle the
interaction of multiple simultaneous queries with the buffer pool?

This problem led for a search for a new approach, which in turn led to
simpler algorithms, like LRU-K [1]E.H. O'Neil, P.E. O'Neil, and G. Weikum. The LRU-K page replacement algorithm for database disk buffering. _In Proceeedings of the 1993 ACM Sigmod International Conference on Management of Data_, pages 297-306, 1993. and 2Q [2]Theodore Johnson and Dennis Shasha. 2Q: A Low Overhead High Performace Buffer Amanagement Replacement algorithm. In _Proceedings of the 20th VLDB Conference_, pages 439-450, 1994.. I'd much rather we pursue
algorithms of this type.

Neil

[1]: E.H. O'Neil, P.E. O'Neil, and G. Weikum. The LRU-K page replacement algorithm for database disk buffering. _In Proceeedings of the 1993 ACM Sigmod International Conference on Management of Data_, pages 297-306, 1993.
algorithm for database disk buffering. _In Proceeedings of the 1993 ACM
Sigmod International Conference on Management of Data_, pages 297-306,
1993.

[2]: Theodore Johnson and Dennis Shasha. 2Q: A Low Overhead High Performace Buffer Amanagement Replacement algorithm. In _Proceedings of the 20th VLDB Conference_, pages 439-450, 1994.
Performace Buffer Amanagement Replacement algorithm. In _Proceedings of
the 20th VLDB Conference_, pages 439-450, 1994.

--
Neil Padgett
Red Hat Canada Ltd. E-Mail: npadgett@redhat.com
2323 Yonge Street, Suite #300,
Toronto, ON M4P 2C9

Neil Padgett

npadgett@redhat.com

over 24 years ago

In reply to: Neil Padgett (#4)

Re: Add free-behind capability for large sequential scans

On Wed, 2002-02-13 at 12:44, Neil Padgett wrote:

[1] E.H. O'Neil, P.E. O'Neil, and G. Weikum. The LRU-K page replacement
algorithm for database disk buffering. _In Proceeedings of the 1993 ACM
Sigmod International Conference on Management of Data_, pages 297-306,
1993.

[2] Theodore Johnson and Dennis Shasha. 2Q: A Low Overhead High
Performace Buffer Amanagement Replacement algorithm. In _Proceedings of
the 20th VLDB Conference_, pages 439-450, 1994.

I've received some mail expressing interest in reading these papers. So,
I've grabbed some electronic copies from CiteSeer, and I've made them
available at:

http://people.redhat.com/npadgett/buffering-papers/

Neil

--
Neil Padgett
Red Hat Canada Ltd. E-Mail: npadgett@redhat.com
2323 Yonge Street, Suite #300,
Toronto, ON M4P 2C9