ice-broker scan thread
I am considering adding an "ice-broker scan thread" to accelerate PostgreSQL
sequential scan IO speed. The basic idea of this thread is just like the
classic "read-ahead" method, but with one difference: it does not read the
data into the shared buffer pool directly; instead, it reads the data into
the file system cache, which makes the integration easy, and as far as I
know this approach is unique to PostgreSQL.
What happens in the original sequential scan:

for (;;)
{
    /*
     * A physical read may happen here, depending on the current
     * contents of the file system cache and on whether the kernel
     * is smart enough to recognize a sequential scan.
     */
    physically or logically read a page;
    process the page;
}
What happens in the sequential scan with the ice-broker:

for (;;)
{
    /* the ice-broker has most likely read the page in already */
    logically read a page (with high probability);
    process the page;
}
I wrote a program to simulate the PostgreSQL sequential scan with and
without the ice-broker. The results indicate this technique has the
following characteristics:
(1) The main factor determining the speedup is how much CPU time PostgreSQL
spends on each data page. If PG is fast enough, then no speedup occurs;
otherwise a 10% to 20% speedup can be expected, based on my tests.
(2) It uses more CPU - this is easy to understand, since it does more
work.
(3) The benefit also depends on other factors, like how smart your file
system is ...
Here are some test results on my machine:
---
$#uname -a
Linux josh.db 2.4.29-1 #2 Tue Jan 25 17:03:33 EST 2005 i686 unknown
$#cat /proc/meminfo | grep MemTotal
MemTotal: 1030988 kB
$#cat /proc/cpuinfo | grep CPU
model name : Intel(R) Pentium(R) 4 CPU 2.40GHz
$#./seqscan 10 $HOME/pginstall/bin/data/base/10794/18986 50
PostgreSQL sequential scan simulator configuration:
Memory size: 943718400
CPU cost per page: 50
Scan thread read unit size: 4
With scan threads off - duration: 56862.738 ms
With scan threads on - duration: 40611.101 ms
With scan threads off - duration: 46859.207 ms
With scan threads on - duration: 38598.234 ms
With scan threads off - duration: 56919.572 ms
With scan threads on - duration: 47023.606 ms
With scan threads off - duration: 52976.825 ms
With scan threads on - duration: 43056.506 ms
With scan threads off - duration: 54292.979 ms
With scan threads on - duration: 42946.526 ms
With scan threads off - duration: 51893.590 ms
With scan threads on - duration: 42137.684 ms
With scan threads off - duration: 46552.571 ms
With scan threads on - duration: 41892.628 ms
With scan threads off - duration: 45107.800 ms
With scan threads on - duration: 38329.785 ms
With scan threads off - duration: 47527.787 ms
With scan threads on - duration: 38293.581 ms
With scan threads off - duration: 48810.656 ms
With scan threads on - duration: 39018.500 ms
---
Notice that the cpu_cost=50 above might look too big (if you look into the
code) - but in a concurrent situation it is not that large. Also, on my
Windows box (PIII, 800 MHz), a cpu_cost of 5 is enough to show the 10%
benefit.
So in general it does help in some situations, but it is not rocket science,
since we can't predict the performance of the file system. It is fairly
easy to integrate, and we should add a GUC parameter to control it.
We need more tests; any comments and tests are welcome.
Regards,
Qingqing
---
/*
* seqscan.c
* PostgreSQL sequential scan simulator with helper scan thread
*
 * Note
 *   I wrote this simulator to see if there is any benefit for a sequential
 *   scan when a helper thread does read-ahead. The only thing you may want
 *   to change in the source file is MEMSZ; make it big enough to thrash
 *   your file system cache.
 *
 * Use the following command to compile:
 *   $ gcc -O2 -Wall -pthread seqscan.c -o seqscan -lm
 * To use it:
 *   $ ./seqscan <rounds> <datafile> <cpu_cost>
 * where <rounds> is how many times you want to run the test (note that each
 * round includes two disk-burn tests), <datafile> is the path to any file
 * (suggested size > 100 MB), and <cpu_cost> is the cost of processing each
 * page of the file. Try different cpu_cost values.
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <memory.h>
#include <errno.h>
#include <math.h>
#ifdef WIN32
#include <io.h>
#include <windows.h>
#define PG_BINARY O_BINARY
#else
#include <unistd.h>
#include <pthread.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/file.h>
#define PG_BINARY 0
#endif
typedef char bool;
#define true ((bool) 1)
#define false ((bool) 0)
#define BLCKSZ 8192
#define UNITSZ 4
#define MEMSZ (950*1024*1024)
char *data_file;
int cpu_cost;
volatile bool stop_scan;
char thread_buffer[BLCKSZ*UNITSZ];
static void
cleanup_cache(void)
{
    char *p;

    if (NULL == (p = (char *) malloc(MEMSZ)))
    {
        fprintf(stderr, "insufficient memory\n");
        exit(-1);
    }
    memset(p, 'a', MEMSZ);
    free(p);
}
#ifdef WIN32
bool enable_aio = false;

static const unsigned __int64 epoch = 116444736000000000L;

static int
gettimeofday(struct timeval *tp, struct timezone *tzp)
{
    FILETIME file_time;
    SYSTEMTIME system_time;
    ULARGE_INTEGER ularge;

    GetSystemTime(&system_time);
    SystemTimeToFileTime(&system_time, &file_time);
    ularge.LowPart = file_time.dwLowDateTime;
    ularge.HighPart = file_time.dwHighDateTime;
    tp->tv_sec = (long) ((ularge.QuadPart - epoch) / 10000000L);
    tp->tv_usec = (long) (system_time.wMilliseconds * 1000);
    return 0;
}

static void
sleep(int secs)
{
    SleepEx(secs * 1000, true);
}
static int
thread_open()
{
    HANDLE fd;
    SECURITY_ATTRIBUTES sa;

    sa.nLength = sizeof(sa);
    sa.bInheritHandle = TRUE;
    sa.lpSecurityDescriptor = NULL;
    fd = CreateFile(data_file,
                    GENERIC_READ,
                    FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                    &sa,
                    OPEN_EXISTING,
                    FILE_ATTRIBUTE_NORMAL | FILE_FLAG_SEQUENTIAL_SCAN
                        | (enable_aio ? FILE_FLAG_OVERLAPPED : 0),
                    NULL);
    if (fd == INVALID_HANDLE_VALUE)
    {
        int errCode;

        switch (errCode = GetLastError())
        {
            /* EMFILE, ENFILE should not occur from CreateFile. */
            case ERROR_PATH_NOT_FOUND:
            case ERROR_FILE_NOT_FOUND:
                errno = ENOENT;
                break;
            case ERROR_FILE_EXISTS:
                errno = EEXIST;
                break;
            case ERROR_ACCESS_DENIED:
                errno = EACCES;
                break;
            default:
                fprintf(stderr, "thread_open failed: %d\n", errCode);
                errno = EINVAL;
        }
        return -1;
    }
    return (int) fd;
}
static int
thread_read(int fd, int blkno, size_t nblk, char *buf)
{
    long offset = BLCKSZ * blkno;
    DWORD nbytes = 0;       /* ReadFile wants an LPDWORD; 0 if still pending */
    OVERLAPPED ol;

    memset(&ol, 0, sizeof(OVERLAPPED));
    ol.Offset = offset;
    ol.OffsetHigh = 0;
    if (ReadFile((HANDLE) fd, buf, BLCKSZ * nblk, &nbytes, &ol))
    {
        /* successfully done without delay */
    }
    else
    {
        int errCode;

        switch (errCode = GetLastError())
        {
            case ERROR_IO_PENDING:      /* overlapped read still in flight */
                break;
            case ERROR_HANDLE_EOF:
                break;
            default:
                /* unknown error occurred */
                fprintf(stderr, "asyncread failed: %d\n", errCode);
                exit(-1);
        }
    }
    return nbytes;
}
static void
thread_close(int fd)
{
    CloseHandle((HANDLE) fd);
}
#else   /* non-windows platforms */
static int
thread_open()
{
    int fd;

    fd = open(data_file, O_RDWR | PG_BINARY, 0600);
    if (fd < 0)
    {
        fprintf(stderr, "thread_open failed: %d\n", errno);
        exit(-1);
    }
    return fd;
}

static int
thread_read(int fd, int blkno, size_t nblk, char *buf)
{
    long offset = BLCKSZ * blkno;
    long nbytes;

    if (lseek(fd, offset, SEEK_SET) < 0)
    {
        fprintf(stderr, "thread_read lseek failed: %d\n", errno);
        exit(-1);
    }
    nbytes = read(fd, buf, BLCKSZ * nblk);
    if (nbytes <= 0)
    {
        fprintf(stderr, "thread_read failed: %d\n", errno);
        exit(-1);
    }
    return nbytes;
}

static void
thread_close(int fd)
{
    close(fd);
}
#endif
#ifdef WIN32
static DWORD WINAPI
scan_thread(LPVOID args)
#else
static void *
scan_thread(void *args)
#endif
{
    int i, fd;
    int start, end;

    start = 0;
    end = (size_t) args;
    fd = thread_open();
    for (i = start; i < end; i += UNITSZ)
    {
        thread_read(fd, i, UNITSZ, (char *) thread_buffer);
        /* check if I was asked to stop */
        if (stop_scan == true)
            break;
    }
    thread_close(fd);
    return 0;
}
static int
init_scan(bool with_threads, size_t *nblocks)
{
    int fd;

    /* open file for do_scan */
    fd = open(data_file, O_RDWR | PG_BINARY, 0600);
    if (fd < 0)
    {
        fprintf(stderr, "failed to open file %s\n", data_file);
        exit(-1);
    }
    *nblocks = lseek(fd, 0, SEEK_END) / BLCKSZ;
    if (*nblocks < 0)
    {
        fprintf(stderr, "failed to get file length %s\n", data_file);
        exit(-1);
    }
    if (with_threads)
    {
#ifndef WIN32
        pthread_t thread;
#endif
        /* create scan thread */
        stop_scan = false;
#ifdef WIN32
        if (NULL == CreateThread(NULL, 0,
                                 scan_thread, (void *) (*nblocks),
                                 0, NULL))
#else
        if (pthread_create(&thread, NULL,
                           scan_thread, (void *) (*nblocks)))
#endif
        {
            fprintf(stderr, "failed to start scan thread\n");
            exit(-1);
        }
    }
    return fd;
}
static void
do_scan(int fd, size_t nblocks)
{
    int i, j, k, nbytes;
    char buffer[BLCKSZ];

    for (i = 0; i < nblocks; i++)
    {
        nbytes = lseek(fd, i * BLCKSZ, SEEK_SET);
        nbytes = read(fd, buffer, BLCKSZ);
        if (nbytes != BLCKSZ)
        {
            fprintf(stderr, "do_scan read failed\n");
            exit(-1);
        }
        /* pretend to do some CPU-intensive analysis */
        for (k = 0; k < cpu_cost; k++)
        {
            for (j = (k * sizeof(int)) % BLCKSZ;
                 j < BLCKSZ / (5 * sizeof(int));
                 j += sizeof(int))
            {
                int x, y;

                x = ((int *) buffer)[j];
                x = (int) pow((double) x, (double) (x + 1));
                y = (int) sin((double) x * x);
                ((int *) buffer)[j] = x * y;
            }
        }
    }
}
static void
close_scan(int fd)
{
    stop_scan = true;
    close(fd);
}
int
main(int argc, char *argv[])
{
    int i, rounds, fd;
    size_t nblocks;

    if (argc != 4)
    {
        fprintf(stderr, "usage: seqscan <rounds> <datafile> <cpu_cost>\n");
        exit(-1);
    }
    rounds = atoi(argv[1]);
    data_file = argv[2];
    cpu_cost = atoi(argv[3]);
    fd = init_scan(false, &nblocks);
    close_scan(fd);
    fprintf(stdout, "PostgreSQL sequential scan simulator configuration:\n"
            "\tMemory size: %u\n"
            "\tCPU cost per page: %d\n"
            "\tScan thread read unit size: %d\n\n",
            MEMSZ, cpu_cost, UNITSZ);
    for (i = 0; i < 2 * rounds; i++)
    {
        struct timeval start_t, stop_t;
        long usecs;
        bool enable = i % 2 ? true : false;

        /* eliminate system cached data */
        cleanup_cache();
        sleep(2);

        /* do the scan task */
        gettimeofday(&start_t, NULL);
        fd = init_scan(enable, &nblocks);
        do_scan(fd, nblocks);
        close_scan(fd);
        gettimeofday(&stop_t, NULL);

        /* measure the time */
        if (stop_t.tv_usec < start_t.tv_usec)
        {
            stop_t.tv_sec--;
            stop_t.tv_usec += 1000000;
        }
        usecs = (long) (stop_t.tv_sec - start_t.tv_sec) * 1000000
            + (long) (stop_t.tv_usec - start_t.tv_usec);
        fprintf(stdout, "With scan threads %s - duration: %ld.%03ld ms\n",
                enable ? "on" : "off",
                usecs / 1000, usecs % 1000);
        sleep(2);
    }
    exit(0);
}
Qingqing Zhou wrote:
I am considering add an "ice-broker scan thread" to accelerate PostgreSQL
sequential scan IO speed. The basic idea of this thread is just like the
"read-ahead" method, but the difference is this one does not read the data
into shared buffer pool directly, instead, it reads the data into file
system cache, which makes the integration easy and this is unique to
PostgreSQL.
Interesting, and I wondered about this too. But for my taste the
demonstrated benefit really
isn't large enough to make it worthwhile.
BTW, I heard a long time ago that NTFS has quite fancy read-ahead, where
it attempts
to detect the application's access pattern including if it is reading
sequentially and even
if there is a 'stride' to the accesses when they're not contiguous. I
would imagine that
other filesystems attempt similar tricks. So one might expect a simple
linear prefetch
to not help much in the presence of such a filesystem.
Were you worried about the icebreaker thread getting too far ahead of
the scan ?
If it did it might page out the data you're about to read, I think. Of
course this could
be fixed by having the read-ahead thread periodically check the current
location being
read by the query thread and pausing if it's got too far ahead.
Anyway, the recent performance thread has been interesting to me because
in all my career
I've never seen a database that scanned scads of data from disk to
process a query.
Typically the problems I work on arrange to read the entire database
into memory.
I think I need to get out more... ;)
On Mon, 28 Nov 2005, Qingqing Zhou wrote:
I am considering add an "ice-broker scan thread" to accelerate PostgreSQL
sequential scan IO speed. The basic idea of this thread is just like the
"read-ahead" method, but the difference is this one does not read the data
into shared buffer pool directly, instead, it reads the data into file
system cache, which makes the integration easy and this is unique to
PostgreSQL.
MySQL, Oracle and others implement read-ahead threads to simulate async IO
'pre-fetching'. I've been experimenting with two ideas. The first is to
increase the readahead when we're doing sequential scans (see prototype
patch using posix fadvise attached). I've not got any hardware at the
moment which I can test this patch on but I am waiting on some dbt-3
results which should indicate whether fadvise is a good idea or a bad one.
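The fadvise patch itself is in the attachment and not reproduced in this
thread; purely as an illustration of the system call involved (the helper
name below is invented for the example), declaring a sequential access
pattern on a just-opened relation file might look like:

```c
/*
 * Sketch only: tell the kernel the whole file will be read sequentially,
 * so it may enlarge its readahead window. posix_fadvise() is only a
 * hint; a nonzero return can safely be ignored.
 */
#define _XOPEN_SOURCE 600
#include <fcntl.h>

static int
advise_sequential(int fd)
{
    /* offset 0, len 0 means "from the start to the end of the file" */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
}
```

In a backend this would presumably be issued once, right after open(),
before the scan loop starts reading blocks.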
The second idea is using posix async IO at key points within the system
to better parallelise CPU and IO work. The areas where I think we could use
async IO are: during sequential scans, use async IO to do pre-fetching of
blocks; inside WAL, begin flushing WAL buffers to disk before we commit;
and, inside the background writer/check point process, asynchronously
write out pages and, potentially, asynchronously build new checkpoint segments.
The motivation for using async IO is twofold: first, the results of this
paper[1] are compelling; second, modern OSs support async IO. I know that
Linux[2], Solaris[3], AIX and Windows all have async IO and I presume that
all their rivals have it as well.
The fundamental premise of the paper mentioned above is that if the
database is busy, IO should be busy. With our current block-at-a-time
processing, this isn't always the case. This is why Qingqing's read-ahead
thread makes sense. My reason for mailing is, however, that the async IO
results are more compelling than the read ahead thread.
I haven't had time to prototype whether we can easily implement async IO
but I am planning to work on it in December. The two main goals will be to
a) integrate and utilise async IO, at least within the executor context,
and b) build a primitive kind of scheduler so that we stop prefetching
when we know that there are a certain number of outstanding IOs for a
given device.
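Goal (b) above is just bookkeeping; a minimal sketch of such a scheduler
(all names and the per-device cap are made up for the example) could be as
simple as:

```c
/*
 * Sketch of a trivial prefetch "scheduler": refuse to issue a new
 * prefetch when a device already has max_outstanding IOs in flight.
 * The prefetch code would consult sched_may_prefetch() before issuing
 * an async read, and bracket each IO with started/done calls.
 */
#define MAX_DEVICES 8

typedef struct
{
    int outstanding[MAX_DEVICES];   /* in-flight IOs per device */
    int max_outstanding;            /* per-device cap */
} IoSched;

/* returns 1 if the caller may issue another prefetch on this device */
static int
sched_may_prefetch(IoSched *s, int device)
{
    return s->outstanding[device] < s->max_outstanding;
}

static void
sched_io_started(IoSched *s, int device)
{
    s->outstanding[device]++;
}

static void
sched_io_done(IoSched *s, int device)
{
    s->outstanding[device]--;
}
```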
Thanks,
Gavin
[1]: http://www.vldb2005.org/program/paper/wed/p1116-hall.pdf
[2]: http://lse.sourceforge.net/io/aionotes.txt
[3]: http://developers.sun.com/solaris/articles/event_completion.html - I'm fairly sure they have a posix AIO wrapper around these routines, but I cannot see it documented anywhere :-(
Attachments:
fadvise.diff (text/plain; charset=US-ASCII) [+75/-19]
Qingqing,
I am considering add an "ice-broker scan thread" to accelerate PostgreSQL
sequential scan IO speed. The basic idea of this thread is just like the
"read-ahead" method, but the difference is this one does not read the
data
into shared buffer pool directly, instead, it reads the data into file
system cache, which makes the integration easy and this is unique to
PostgreSQL.
You probably mean "ice-breaker" by the way :)
Chris
Gavin Sherry <swm@linuxworld.com.au> writes:
I haven't had time to prototype whether we can easily implement async IO
Just as with any suggestion to depend on threads, you are going to have
to show results that border on astounding to have any chance of getting
this in. Otherwise the portability issues are just going to make it not
worth the trouble.
regards, tom lane
Gavin Sherry wrote:
MySQL, Oracle and others implement read-ahead threads to simulate async IO
I always believed that Oracle used async file I/O. Not that I've seen their
code, but I'm fairly sure they funded the addition of kernel aio to Linux
a few years back.
But....Oracle comes from a time long ago when threads and decent
filesystems didn't exist, so some of the things they do may not be
appropriate
to add to a product that doesn't have them today.
Now...network async I/O...that'd be really useful in my world...
On Mon, 28 Nov 2005, David Boreham wrote:
Gavin Sherry wrote:
MySQL, Oracle and others implement read-ahead threads to simulate async IO
I always believed that Oracle used async file I/O. Not that I've seen their
code, but I'm fairly sure they funded the addition of kernel aio to Linux
a few years back.
That's right.
But....Oracle comes from a time long ago when threads and decent
filesystems didn't exist, so some of the things they do may not be
appropriate
to add to a product that doesn't have them today.
The paper I linked to seemed to suggest that they weren't using async IO
in 9.2 -- which is fairly old. I'm not sure why the authors didn't test
10g.
Gavin
Tom Lane wrote:
Gavin Sherry <swm@linuxworld.com.au> writes:
I haven't had time to prototype whether we can easily implement async IO
Just as with any suggestion to depend on threads, you are going to have
to show results that border on astounding to have any chance of getting
this in. Otherwise the portability issues are just going to make it not
worth the trouble.
Do these ideas require threads in principle? ISTM that there could be
(additional) process(es) waiting to perform pre-fetching or async io,
and we could use the usual IPC machinery to talk between them...
cheers
Mark
The paper I linked to seemed to suggest that they weren't using async IO
in 9.2 -- which is fairly old. I'm not sure why the authors didn't test
10g.
...<reads paper>... ok, interesting. Did they say that Oracle isn't
using aio? I can't see that. They say that Oracle has no more than one
outstanding I/O operation in flight per concurrent query,
and they appear to think that's a bad thing. I'm not seeing
that myself. Perhaps once I sleep on it, it'll become clear what they're
getting at.
One theory for lack of aio in Oracle as tested in that paper would be
that they
were testing on Linux. Since aio is relatively new in Linux I wouldn't
be surprised
if Oracle didn't actually use it until it's known to be widely deployed
in the field
and to have proven reliability. Perhaps we've reached that state around now,
and so Oracle may not yet have released an aio-capable Linux version of
their
RDBMS. Just a theory...someone from those tubular towers lurking here
could tell us for sure I guess...
On Mon, 28 Nov 2005, Tom Lane wrote:
Gavin Sherry <swm@linuxworld.com.au> writes:
I haven't had time to prototype whether we can easily implement async IO
Just as with any suggestion to depend on threads, you are going to have
to show results that border on astounding to have any chance of getting
this in. Otherwise the portability issues are just going to make it not
worth the trouble.
The architecture I am looking at would not rely on threads.
I didn't want to jump on list and wave my hands until I had something to
show, but since Qingqing is looking at the issue I thought I'd better raise
it.
Gavin
Gavin Sherry wrote:
The paper I linked to seemed to suggest that they weren't using async IO
in 9.2 -- which is fairly old. I'm not sure why the authors didn't test
10g.
There have been async io type parameters in Oracle's init.ora files since
(at least) 8i (disk_async_io=true IIRC) - on Solaris anyway. Whether
this enabled real or simulated async io is probably a good question - I
recall during testing turning it off and seeing kio()? or similar type
calls become write()/read() in truss output.
regards
Mark
On Mon, 28 Nov 2005, Mark Kirkwood wrote:
Do these ideas require threads in principle? ISTM that there could be
(additional) process(es) waiting to perform pre-fetching or async io,
and we could use the usual IPC machinary to talk between them...
Right. I use threads because it is easy to write a simulation program :-)
Regards,
Qingqing
FYI, I've personally used Oracle 9.2.0.4's async IO on Linux and have seen
several installations which make use of it also.
On 11/28/05, Gavin Sherry <swm@linuxworld.com.au> wrote:
On Mon, 28 Nov 2005, Tom Lane wrote:
Gavin Sherry <swm@linuxworld.com.au> writes:
I haven't had time to prototype whether we can easily implement async
IO
Just as with any suggestion to depend on threads, you are going to have
to show results that border on astounding to have any chance of getting
this in. Otherwise the portability issues are just going to make it not
worth the trouble.
The architecture I am looking at would not rely on threads.
I didn't want to jump on list and wave my hands until I had something to
show, but since Qingqing is looking at the issue I thought I better raise
it.
Gavin
On Mon, 28 Nov 2005, Gavin Sherry wrote:
MySQL, Oracle and others implement read-ahead threads to simulate async IO
'pre-fetching'.
Based on my tests on Windows (using the attached program with
enable_aio=true), it seems aio doesn't help as a separate thread - but maybe
that's because my usage is wrong ...
Regards,
Qingqing
On Mon, 28 Nov 2005, Gavin Sherry wrote:
I didn't want to jump on list and wave my hands until I had something to
show, but since Qingqing is looking at the issue I thought I better raise
it.
Don't worry :-) I separated the logic into a standalone program so that
more people can help on this issue.
Regards,
Qingqing
"David Boreham" <david_list@boreham.org> wrote
BTW, I heard a long time ago that NTFS has quite fancy read-ahead, where
it attempts to detect the application's access pattern including if it is
reading sequentially and even if there is a 'stride' to the accesses when
they're not contiguous. I would imagine that other filesystems attempt
similar tricks. So one might expect a simple linear prefectch
to not help much in the presence of such a filesystem.
So we need more tests. I understand how smart current file systems are, and
it seems the benefit depends on the interval at which you send the next
file block read request (decided by the cpu_cost parameter in my program).
I imagine that on a multi-way machine with a strong IO device, the
ice-breaker could do much better ...
Were you worried about the icebreaker thread getting too far ahead of the
scan ? If it did it might page out the data you're about to read, I think.
Of course this could be fixed by having the read ahead thread perodically
check the current location being read by the query thread and pausing if
it's got too far ahead.
Right.
Regards,
Qingqing
Qingqing Zhou wrote:
On Mon, 28 Nov 2005, Gavin Sherry wrote:
MySQL, Oracle and others implement read-ahead threads to simulate async IO
'pre-fetching'.
Due to my tests on Windows (using the attached program and change
enable_aio=true), seems aio doesn't help as a separate thread - but maybe
because my usage is wrong ...
I don't think your NT overlapped I/O code is quite right. At least
I think it will issue reads at a high rate without waiting for any of them
to complete. Beyond some point that has to give the kernel gut-rot.
But anyway, I wouldn't expect the use of aio to make any
significant difference in an already threaded test program.
The point of aio is to allow
I/O concurrency _without_ the use of threads or multiple processes.
You could re-write your program to have a single thread but use aio.
In that case it should show the same read ahead benefit that you see
with the thread.
On Mon, 28 Nov 2005, Qingqing Zhou wrote:
On Mon, 28 Nov 2005, Gavin Sherry wrote:
MySQL, Oracle and others implement read-ahead threads to simulate async IO
'pre-fetching'.
Due to my tests on Windows (using the attached program and change
enable_aio=true), seems aio doesn't help as a separate thread - but maybe
because my usage is wrong ...
Right, I would imagine that it's very close. I intend to use kernel based
async IO so that we can have the prefetch effect of your sample program
without the need for threads.
Thanks,
Gavin
"David Boreham" <david_list@boreham.org> wrote
I don't think your NT overlapped I/O code is quite right. At least
I think it will issue reads at a high rate without waiting for any of them
to complete. Beyond some point that has to give the kernel gut-rot.
[also with reply to Gavin] Looked up "gut-rot" in the dictionary, got it ...
Uh, this behavior is intended - I try to push enough requests to the kernel
in a short time so that it understands that I am doing a sequential scan,
and so will pull the data from disk into the file system cache more
efficiently. Some file systems may have a "free-behind" mechanism, but our
main thread (which really processes the query) should be fast enough to get
there before the data vanishes.
You could re-write your program to have a single thread but use aio.
In that case it should show the same read ahead benefit that you see
with the thread.
I guess this is also Gavin's point - I understand there will be two
different methodologies to handle "read-ahead". If no other thread/process
is involved, then the main thread is responsible for grabbing a free buffer
page from the buffer pool and asking the kernel to put the data there, by
sync IO (what current PostgreSQL does) or async IOs. And that's what I want
to avoid. I'd like to use a dedicated thread/process to "break the ice"
only, i.e., pull data from disk into the file system cache, so that the
main thread will only issue *logical* reads.
Regards,
Qingqing
On Tue, Nov 29, 2005 at 02:53:36PM +1100, Gavin Sherry wrote:
The second idea is using posix async IO at key points within the system
to better parallelise CPU and IO work. There areas I think we could use
async IO are: during sequential scans, use async IO to do pre-fetching of
blocks; inside WAL, begin flushing WAL buffers to disk before we commit;
and, inside the background writer/check point process, asynchronously
write out pages and, potentially, asynchronously build new checkpoint segments.
I actually worked on this and got it to the stage where it wouldn't
crash anymore. It basically added a command to bufmgr.c called
PrefetchBuffer() which would initiate a request but not block. I then
hooked a few strategic places to call this. In particular during an
index scan, it would prefetch the next index block and the next few
data blocks and then return them in order as they came in.
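Martijn's PrefetchBuffer code isn't shown here; purely as an illustration
of "initiate a request but do not block" without threads or AIO, Linux also
offers posix_fadvise with POSIX_FADV_WILLNEED - the helper name below is
invented, and BLCKSZ matches the simulator:

```c
/*
 * Sketch: hint the kernel to start reading block blkno into the page
 * cache now. The call returns immediately; a later read() of the same
 * range will then often be a cache hit instead of a physical read.
 */
#define _XOPEN_SOURCE 600
#include <fcntl.h>

#define BLCKSZ 8192

static int
prefetch_block(int fd, long blkno)
{
    return posix_fadvise(fd, (off_t) blkno * BLCKSZ, BLCKSZ,
                         POSIX_FADV_WILLNEED);
}
```

Unlike AIO, this gives no completion notification, but it also needs no
per-request bookkeeping or library support.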
Unfortunately I can't really test it at its full potential because it
uses glibc's default POSIX AIO, which is *lame*: no more than one
outstanding request per fd, which for PostgreSQL is crappy. There was
some evidence that in an index scan of a highly uncorrelated index
it did make a small difference, but I never got around to testing it
fully. But bitmap scans already hugely reduce the cost of uncorrelated
indexes.
It doesn't pass regression because index_getmulti doesn't do backward
scans. Everything else works though.
If anyone is interested in the code I can send it to them. The results
on my system just weren't good enough to justify a lot more effort.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.