Asynchronous I/O Support

Started by Raja Agrawal, over 19 years ago. 30 messages. List: hackers
#1Raja Agrawal
raja.agrawal@gmail.com

PostgreSQL 8.1 doesn't seem to support asynchronous I/O. Has its design
been thought of already?

We tried this with a simple example, an index nested-loop join:
Fetch the outer tuples into an array, and then send all the
corresponding inner-tuple fetch requests asynchronously. Hence, while
the I/O is done for the inner relation, the next outer-tuple array can be
populated and other join operations can happen. This is the maximum
overlap we could think of (with minimal changes).

[The current implementation does synchronous I/O: it fetches an outer
tuple, then requests the corresponding inner tuple (and waits until it
gets it), does the processing, gets another inner/outer tuple, and so on.]

We have made appropriate changes in nodeNestloop.c but are unable to
track down how it issues the IO and gets the tuple in the slot.

Help! How do we issue an async I/O (given that kernel 2.6 supports AIO)?
And which would be best: a callback scheme, or synchronous I/O on top of AIO?
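
One way to issue several reads without waiting on each one in turn is the POSIX AIO interface (`aio_read`, `aio_suspend`, `aio_error`, `aio_return`). The sketch below only illustrates that interface and is not PostgreSQL code: `read_blocks_async`, `BLOCK_SIZE` and `MAX_BATCH` are made-up names, and the 8192-byte block size is an assumption. Whether the requests actually overlap depends on the platform's AIO implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define BLOCK_SIZE 8192   /* assumed block size, for illustration only */
#define MAX_BATCH  16     /* illustrative cap on outstanding requests */

/*
 * Read n blocks (n <= MAX_BATCH) at the given block numbers from fd into
 * bufs, submitting every request before waiting on any of them.
 * Returns 0 on success, -1 if any request fails or comes back short.
 */
int read_blocks_async(int fd, const off_t *blocknos, int n,
                      char bufs[][BLOCK_SIZE])
{
    struct aiocb cbs[MAX_BATCH];
    int i, rc = 0, submitted = 0;

    if (n < 0 || n > MAX_BATCH)
        return -1;

    /* Phase 1: submit all requests without waiting. */
    for (i = 0; i < n; i++) {
        memset(&cbs[i], 0, sizeof(cbs[i]));
        cbs[i].aio_fildes = fd;
        cbs[i].aio_buf = bufs[i];
        cbs[i].aio_nbytes = BLOCK_SIZE;
        cbs[i].aio_offset = blocknos[i] * (off_t) BLOCK_SIZE;
        if (aio_read(&cbs[i]) != 0) {
            rc = -1;
            break;
        }
        submitted++;
    }

    /* The caller's other work (e.g. filling the next batch) would go here. */

    /* Phase 2: wait for each submitted request and collect its result. */
    for (i = 0; i < submitted; i++) {
        const struct aiocb *const list[1] = { &cbs[i] };
        int err;
        ssize_t got;

        while (aio_error(&cbs[i]) == EINPROGRESS)
            aio_suspend(list, 1, NULL);

        err = aio_error(&cbs[i]);
        got = aio_return(&cbs[i]);   /* must be called once per request */
        if (err != 0 || got != BLOCK_SIZE)
            rc = -1;
    }
    return rc;
}
```

A caller would fill `blocknos` with the inner-relation blocks it expects to need, and do its remaining work between the two phases. On older glibc versions this needs linking with `-lrt`.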

Also, as Graefe's paper suggests, a producer-consumer (thread-based)
design is the best way to do this. But how would threading be
implemented (in case it's possible at all)?

Sincere regards,
Raja Agrawal

#2Martijn van Oosterhout
kleptog@svana.org
In reply to: Raja Agrawal (#1)
Re: Asynchronous I/O Support

On Sun, Oct 15, 2006 at 04:16:07AM +0530, Raja Agrawal wrote:

PostgreSQL 8.1 doesn't seem to support asynchronous I/O. Has its design
been thought of already?

Sure, I even implemented it once. Didn't get any faster. At that point
I realised that my kernel didn't actually support async I/O, and the
glibc emulation sucks for anything other than network I/O, so I gave
up.

Maybe one of these days I should work out if my current system supports
it, and give it another go...

Have enough systems actually got to the point of actually supporting
async I/O that it's worth implementing?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

From each according to his ability. To each according to his ability to litigate.

#3Luke Lonergan
llonergan@greenplum.com
In reply to: Martijn van Oosterhout (#2)
Re: Asynchronous I/O Support

Martijn,

On 10/15/06 10:56 AM, "Martijn van Oosterhout" <kleptog@svana.org> wrote:

Have enough systems actually got to the point of actually supporting
async I/O that it's worth implementing?

I think there are enough high end applications / systems that need it at
this point.

The killer use-case we've identified is for the scattered I/O associated
with index + heap scans in Postgres. If we can issue ~5-15 I/Os in advance
when the TIDs are widely separated it has the potential to increase the I/O
speed by the number of disks in the tablespace being scanned. At this
point, that pattern will only use one disk.

- Luke

#4Neil Conway
neilc@samurai.com
In reply to: Martijn van Oosterhout (#2)
Re: Asynchronous I/O Support

On Sun, 2006-10-15 at 19:56 +0200, Martijn van Oosterhout wrote:

Sure, I even implemented it once. Didn't get any faster.

Did you just do something akin to s/read/aio_read/ etc., or something
more ambitious? I think that really taking advantage of the ability to
have multiple I/O requests outstanding would take some leg work.

Maybe one of these days I should work out if my current system supports
it, and give it another go...

At least according to [1], kernel AIO on Linux still doesn't work for
buffered (i.e. non-O_DIRECT) files. There have been patches available
for quite some time that implement this, but I'm not sure when they are
likely to get into the mainline kernel.

-Neil

[1]: http://lse.sourceforge.net/io/aio.html

#5Martijn van Oosterhout
kleptog@svana.org
In reply to: Neil Conway (#4)
Re: Asynchronous I/O Support

On Sun, Oct 15, 2006 at 02:26:12PM -0400, Neil Conway wrote:

On Sun, 2006-10-15 at 19:56 +0200, Martijn van Oosterhout wrote:

Sure, I even implemented it once. Didn't get any faster.

Did you just do something akin to s/read/aio_read/ etc., or something
more ambitious? I think that really taking advantage of the ability to
have multiple I/O requests outstanding would take some leg work.

Sure. Basically, at certain strategic points in the code there were
extra ReadAsyncBuffer() commands (the IndexScan node and the b-tree
scan code). This command was allowed to do nothing, but if there were
not too many outstanding requests and a buffer was available, it would
allocate a buffer and initiate an AIO request for it.

IIRC there was a table of outstanding requests (I think I originally
allowed up to 32) and when a normal ReadBuffer() found the block had
already been requested, it "waited" on that block.

The principle was that the index-scan node would read a page full of
tids, submit a ReadAsyncBuffer() on each one, and then proceed as
normal. Fairly unintrusive patch all up. ifdeffing it out is safe, and
#defineing ReadAsyncBuffer() away causes the compiler to optimise the
loop away altogether.
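
The bookkeeping described above (a bounded table of outstanding requests, a prefetch call that is allowed to do nothing, and a normal read that checks the table first) can be sketched roughly as follows. This is not the original patch: `PrefetchTable`, `prefetch_request` and `prefetch_claim` are hypothetical names, and the actual buffer allocation, I/O submission and waiting are left as comments.

```c
#include <stdbool.h>

#define MAX_OUTSTANDING 32   /* the limit recalled in the message above */

typedef struct {
    long blocks[MAX_OUTSTANDING];  /* block numbers with a request in flight */
    int  count;
} PrefetchTable;

/*
 * Hypothetical stand-in for ReadAsyncBuffer(): it may decline to do
 * anything. Returns true if the block is (now) tracked as requested.
 */
bool prefetch_request(PrefetchTable *t, long blockno)
{
    for (int i = 0; i < t->count; i++)
        if (t->blocks[i] == blockno)
            return true;               /* already requested */

    if (t->count >= MAX_OUTSTANDING)
        return false;                  /* too many outstanding: do nothing */

    /* A real implementation would allocate a buffer and start the I/O here. */
    t->blocks[t->count++] = blockno;
    return true;
}

/*
 * Hypothetical hook for the normal read path: returns true if the block
 * had already been requested, and drops it from the table.
 */
bool prefetch_claim(PrefetchTable *t, long blockno)
{
    for (int i = 0; i < t->count; i++) {
        if (t->blocks[i] == blockno) {
            /* A real implementation would wait for the I/O to finish here. */
            t->blocks[i] = t->blocks[--t->count];
            return true;
        }
    }
    return false;   /* not prefetched: caller does an ordinary synchronous read */
}
```

Because `prefetch_request` may return without doing anything, a caller never depends on it; compiling it out changes performance but not results, which matches the "safe to ifdef out" property mentioned above.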

The POSIX AIO layer sucks somewhat so it was tricky but it did work.
The hardest part is really how to decide if a buffer currently in the
buffer cache is worth more than an asynchronously loaded buffer that may
not be used.

I posted the results to -hackers some time ago, so you can always look
them up.

At least according to [1], kernel AIO on Linux still doesn't work for
buffered (i.e. non-O_DIRECT) files. There have been patches available
for quite some time that implement this, but I'm not sure when they are
likely to get into the mainline kernel.

You can also do it by spawning off threads to do the requests. The
glibc emulation uses threads, but only allows one outstanding request
per file, which makes it useless for our purposes...
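
As a rough illustration of the thread-based alternative, the sketch below hands a single `pread` to a background thread so the caller can keep working and collect the result later. It assumes POSIX threads; `ThreadedRead`, `threaded_read_start` and `threaded_read_wait` are hypothetical names. One thread per request is a simplification chosen for brevity, and a pool of worker threads fed from a queue would be a more realistic design.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct {
    int       fd;
    off_t     offset;
    void     *buf;
    size_t    len;
    ssize_t   result;   /* filled in by the worker thread */
    pthread_t thread;
} ThreadedRead;

/* Worker: performs the blocking read on behalf of the caller. */
static void *threaded_read_worker(void *arg)
{
    ThreadedRead *r = arg;

    /* pread takes an explicit offset, so concurrent reads on one fd
     * do not disturb each other's file position. */
    r->result = pread(r->fd, r->buf, r->len, r->offset);
    return NULL;
}

/* Start the read in a background thread; returns 0 on success. */
int threaded_read_start(ThreadedRead *r)
{
    r->result = -1;
    return pthread_create(&r->thread, NULL, threaded_read_worker, r);
}

/* Block until the read finishes; returns bytes read, or -1 on error. */
ssize_t threaded_read_wait(ThreadedRead *r)
{
    if (pthread_join(r->thread, NULL) != 0)
        return -1;
    return r->result;
}
```

Several `ThreadedRead` requests can be started against the same file descriptor before any of them is waited on. On older glibc versions this needs compiling with `-pthread`.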

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

From each according to his ability. To each according to his ability to litigate.

#6Merlin Moncure
mmoncure@gmail.com
In reply to: Luke Lonergan (#3)
Re: Asynchronous I/O Support

On 10/15/06, Luke Lonergan <llonergan@greenplum.com> wrote:

Martijn,
The killer use-case we've identified is for the scattered I/O associated
with index + heap scans in Postgres. If we can issue ~5-15 I/Os in advance
when the TIDs are widely separated it has the potential to increase the I/O
speed by the number of disks in the tablespace being scanned. At this
point, that pattern will only use one disk.

did you have a chance to look at posix_fadvise?

merlin
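
For readers unfamiliar with `posix_fadvise`: with the `POSIX_FADV_WILLNEED` advice it lets a process tell the kernel that a byte range will be needed soon, without waiting for the data. The sketch below is an illustration under assumptions (the 8192-byte `BLOCK_SIZE` and the function name `fetch_blocks_with_hints` are made up here). The call is advisory, so how much read-ahead actually happens is up to the operating system.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192   /* assumed block size, for illustration only */

/*
 * Hint that every listed block will be needed soon, then read the blocks
 * in order with ordinary synchronous reads.
 * Returns the number of blocks that were read in full.
 */
int fetch_blocks_with_hints(int fd, const off_t *blocknos, int n,
                            char bufs[][BLOCK_SIZE])
{
    int i, ok = 0;

    /* Pass 1: announce all the scattered blocks up front. */
    for (i = 0; i < n; i++) {
        /* Advisory only: a failure here is not treated as fatal. */
        (void) posix_fadvise(fd, blocknos[i] * (off_t) BLOCK_SIZE,
                             BLOCK_SIZE, POSIX_FADV_WILLNEED);
    }

    /* Pass 2: the reads themselves are unchanged. */
    for (i = 0; i < n; i++) {
        ssize_t got = pread(fd, bufs[i], BLOCK_SIZE,
                            blocknos[i] * (off_t) BLOCK_SIZE);
        if (got == BLOCK_SIZE)
            ok++;
    }
    return ok;
}
```

The read path stays synchronous, so the only change relative to plain `pread` calls is the extra pass of hints.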

#7Florian Weimer
fweimer@bfk.de
In reply to: Neil Conway (#4)
Re: Asynchronous I/O Support

* Neil Conway:

[1] http://lse.sourceforge.net/io/aio.html

Last Modified Mon, 07 Jun 2004 12:00:09 GMT

But you are right -- it seems that io_submit still blocks without
O_DIRECT. *sigh*

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Durlacher Allee 47 tel: +49-721-96201-1
D-76131 Karlsruhe fax: +49-721-96201-99

#8Raja Agrawal
raja.agrawal@gmail.com
In reply to: Florian Weimer (#7)
Re: Asynchronous I/O Support

Have a look at this:
[2]: http://www-128.ibm.com/developerworks/linux/library/l-async/

This gives a good description of AIO.

I'm doing some testing. Will notify, if I get any positive results.

Please let me know if you get any ideas after reading [2].

Regards,
Raja

On 10/17/06, Florian Weimer <fweimer@bfk.de> wrote:

* Neil Conway:

[1] http://lse.sourceforge.net/io/aio.html

Last Modified Mon, 07 Jun 2004 12:00:09 GMT

But you are right -- it seems that io_submit still blocks without
O_DIRECT. *sigh*

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Durlacher Allee 47 tel: +49-721-96201-1
D-76131 Karlsruhe fax: +49-721-96201-99

#9NikhilS
nikkhils@gmail.com
In reply to: Luke Lonergan (#3)
Re: Asynchronous I/O Support

Hi,

"bgwriter doing asynchronous I/O for the dirty buffers that it is supposed to
sync"
Another decent use-case?

Regards,
Nikhils
EnterpriseDB http://www.enterprisedb.com

On 10/15/06, Luke Lonergan <llonergan@greenplum.com> wrote:

Martijn,

On 10/15/06 10:56 AM, "Martijn van Oosterhout" <kleptog@svana.org> wrote:

Have enough systems actually got to the point of actually supporting
async I/O that it's worth implementing?

I think there are enough high end applications / systems that need it at
this point.

The killer use-case we've identified is for the scattered I/O associated
with index + heap scans in Postgres. If we can issue ~5-15 I/Os in advance
when the TIDs are widely separated it has the potential to increase the I/O
speed by the number of disks in the tablespace being scanned. At this
point, that pattern will only use one disk.

- Luke

--
All the world's a stage, and most of us are desperately unrehearsed.

#10Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: NikhilS (#9)
Re: Asynchronous I/O Support

NikhilS wrote:

Hi,

"bgwriter doing asynchronous I/O for the dirty buffers that it is
supposed to sync"
Another decent use-case?

Regards,
Nikhils
EnterpriseDB http://www.enterprisedb.com

On 10/15/06, Luke Lonergan <llonergan@greenplum.com> wrote:

Martijn,

On 10/15/06 10:56 AM, "Martijn van Oosterhout" <kleptog@svana.org> wrote:

Have enough systems actually got to the point of actually supporting
async I/O that it's worth implementing?

I think there are enough high end applications / systems that need it at
this point.

The killer use-case we've identified is for the scattered I/O associated
with index + heap scans in Postgres. If we can issue ~5-15 I/Os in advance
when the TIDs are widely separated it has the potential to increase the I/O
speed by the number of disks in the tablespace being scanned. At this
point, that pattern will only use one disk.

Is it worth considering using readv(2) instead?

Cheers

Mark

#11Martijn van Oosterhout
kleptog@svana.org
In reply to: Mark Kirkwood (#10)
Re: Asynchronous I/O Support

On Wed, Oct 18, 2006 at 08:04:29PM +1300, Mark Kirkwood wrote:

"bgwriter doing asynchronous I/O for the dirty buffers that it is
supposed to sync"
Another decent use-case?

Good idea, but async i/o is generally poorly supported.

Is it worth considering using readv(2) instead?

Err, readv allows you to split a single consecutive read into multiple
buffers. It doesn't help at all for reads on widely separated areas of a file.
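
To make the distinction concrete, here is a minimal sketch of what `readv` does: one system call, starting at one file offset, with the contiguous bytes scattered across several memory buffers. The function name `read_into_two_buffers` is made up for this illustration.

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Read one contiguous run of bytes starting at the file's current offset,
 * scattering it into two separate buffers.
 * Returns the total number of bytes read, or -1 on error.
 */
ssize_t read_into_two_buffers(int fd, void *first, size_t first_len,
                              void *second, size_t second_len)
{
    struct iovec iov[2];

    iov[0].iov_base = first;
    iov[0].iov_len  = first_len;
    iov[1].iov_base = second;
    iov[1].iov_len  = second_len;

    /* The bytes fill iov[0] first and then continue into iov[1]. */
    return readv(fd, iov, 2);
}
```

The buffers may live anywhere in memory, but the file range is always a single consecutive one, so blocks at scattered file offsets still need separate requests.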

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

From each according to his ability. To each according to his ability to litigate.

#12Bruno Wolff III
bruno@wolff.to
In reply to: Neil Conway (#4)
Re: Asynchronous I/O Support

On Sun, Oct 15, 2006 at 14:26:12 -0400,
Neil Conway <neilc@samurai.com> wrote:

At least according to [1], kernel AIO on Linux still doesn't work for
buffered (i.e. non-O_DIRECT) files. There have been patches available
for quite some time that implement this, but I'm not sure when they are
likely to get into the mainline kernel.

-Neil

[1] http://lse.sourceforge.net/io/aio.html

An improvement is going into 2.6.19 to handle asynchronous vector reads
and writes. This was covered by Linux Weekly News a couple of weeks ago:
http://lwn.net/Articles/201682/

#13NikhilS
nikkhils@gmail.com
In reply to: Martijn van Oosterhout (#11)
Re: Asynchronous I/O Support

Hi,

On 10/18/06, Martijn van Oosterhout <kleptog@svana.org> wrote:

On Wed, Oct 18, 2006 at 08:04:29PM +1300, Mark Kirkwood wrote:

"bgwriter doing asynchronous I/O for the dirty buffers that it is
supposed to sync"
Another decent use-case?

Good idea, but async i/o is generally poorly supported.

Async i/o is stably supported on most *nix (apart from Linux 2.6.*) plus
Windows.
Guess it would be still worth it, since one fine day 2.6.* will start
supporting it properly too.

Regards,
Nikhils

Is it worth considering using readv(2) instead?

Err, readv allows you to split a single consecutive read into multiple
buffers. It doesn't help at all for reads on widely separated areas of a file.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

From each according to his ability. To each according to his ability to litigate.

--
All the world's a stage, and most of us are desperately unrehearsed.

#14Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Bruno Wolff III (#12)
Re: Asynchronous I/O Support

At least according to [1], kernel AIO on Linux still doesn't work for
buffered (i.e. non-O_DIRECT) files. There have been patches available
for quite some time that implement this, but I'm not sure when they
are likely to get into the mainline kernel.

-Neil

[1] http://lse.sourceforge.net/io/aio.html

An improvement is going into 2.6.19 to handle asynchronous
vector reads and writes. This was covered by Linux Weekly
News a couple of weeks ago:
http://lwn.net/Articles/201682/

That is orthogonal. We don't really need vector I/O so much, since we
rely on OS readahead. We want async I/O to tell the OS earlier that we
will need these random pages, and continue our work in the meantime.
For random I/O it is really important to tell the OS and disk subsystem
about many pages in parallel, so that it can optimize head movements and
keep more than one disk busy at a time.

Andreas

#15Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Zeugswetter Andreas SB SD (#14)
Re: Asynchronous I/O Support

Zeugswetter Andreas ADI SD wrote:

An improvement is going into 2.6.19 to handle asynchronous
vector reads and writes. This was covered by Linux Weekly
News a couple of weeks ago:
http://lwn.net/Articles/201682/

That is orthogonal. We don't really need vector io so much, since we
rely on OS readahead. We want async IO to tell the OS earlier, that we
will need these random pages, and continue our work in the meantime.

Of course, you can use asynchronous vector write with a single entry in
the vector if you want to perform an asynchronous write.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#16Mark Mielke
mark@mark.mielke.cc
In reply to: NikhilS (#13)
Re: [SPAM?] Re: Asynchronous I/O Support

On Fri, Oct 20, 2006 at 11:13:33AM +0530, NikhilS wrote:

Good idea, but async i/o is generally poorly supported.

Async i/o is stably supported on most *nix (apart from Linux 2.6.*) plus
Windows.
Guess it would be still worth it, since one fine day 2.6.* will start
supporting it properly too.

Only if it can be shown that async I/O actually results in an improvement.

Currently, it's speculation, with the one trial implementation showing
little to no improvement. Support is a big word in the face of this
initial evidence... :-)

It's possible that the PostgreSQL design limits the effectiveness of
such things. It's possible that PostgreSQL, having been optimized to not
use features such as these, has found a way of operating better,
contrary to those who believe that async I/O, threads, and so on, are
faster. It's possible that async I/O is supported, but poorly implemented
on most systems.

Take into account that async I/O doesn't guarantee parallel I/O. The
concept of async I/O is that an application can proceed to work on other
items while waiting for scheduled work in the background. This can be
achieved with a background system thread (GLIBC?). There is no requirement
that it actually process the requests in parallel. In fact, any system that
did process the requests in parallel would be easier to bring to a halt.
For example, for the many systems that do not use RAID, we would potentially
end up with scattered reads across the disk all running in parallel, with
no priority on the reads, which could mean that data we do not yet need
is returned first, causing PostgreSQL to be unable to move forwards. If
the process is CPU bound at all, this could be an overall loss.

Point being, async I/O isn't a magic bullet. There is no evidence that it
would improve the situation on any platform.

One would need to consider the PostgreSQL architecture, determine where
the bottleneck actually is, and understand why it is a bottleneck fully,
before one could decide how to fix it. So, what is the bottleneck? Is
PostgreSQL unable to max out the I/O bandwidth? Where? Why?

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

#17Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Mark Mielke (#16)
Re: [SPAM?] Re: Asynchronous I/O Support

Good idea, but async i/o is generally poorly supported.

Only if it can be shown that async I/O actually results in an
improvement.

sure.

fix it. So, what is the bottleneck? Is PostgreSQL unable to
max out the I/O bandwidth? Where? Why?

Yup, that would be the scenario where it helps (provided that you have
a smart disk or a disk array and an intelligent OS aio implementation).
It would be used to fetch the data pages pointed at from an index leaf,
or the next level index pages.
We measured the I/O bandwidth difference on Windows with EMC as being
nearly proportional to the number of parallel outstanding requests, up to
at least 16-32.

Andreas

#18Mark Mielke
mark@mark.mielke.cc
In reply to: Zeugswetter Andreas SB SD (#17)
Re: [SPAM?] Re: Asynchronous I/O Support

On Fri, Oct 20, 2006 at 05:37:48PM +0200, Zeugswetter Andreas ADI SD wrote:

Yup, that would be the scenario where it helps (provided that you have
a smart disk or a disk array and an intelligent OS aio implementation).
It would be used to fetch the data pages pointed at from an index leaf,
or the next level index pages.
We measured the IO bandwidth difference on Windows with EMC as being
nearly proportional to parallel outstanding requests up to at least

Measured it using what? I was under the impression only one
proof-of-implementation existed, and that the scenarios and
configuration of the person who wrote it, did not show significant
improvement.

You have PostgreSQL on Windows with EMC with async I/O support to
test with?

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

#19Martijn van Oosterhout
kleptog@svana.org
In reply to: Mark Mielke (#16)
Re: [SPAM?] Re: Asynchronous I/O Support

On Fri, Oct 20, 2006 at 10:05:01AM -0400, mark@mark.mielke.cc wrote:

Only if it can be shown that async I/O actually results in an improvement.

Currently, it's speculation, with the one trial implementation showing
little to no improvement. Support is a big word in the face of this
initial evidence... :-)

Yeah, the single test so far, on a system that didn't support
asynchronous I/O, doesn't prove anything. It would help if there was a
reasonable system that did support async I/O so it could be tested
properly.

Point being, async I/O isn't a magic bullet. There is no evidence that it
would improve the situation on any platform.

I think it's likely to help with index scans. Prefetching index leaf
pages could be good, I think, as would prefetching pages from a
(bitmap) index scan.

It won't help much on very simple queries, but where it should shine is
a merge join across two index scans. Currently postgresql would do
something like:

Loop
    Fetch left tuple for join
        Fetch btree leaf
        Fetch tuple off disk
    Fetch right tuples for join
        Fetch btree leaf
        Fetch tuple off disk

Currently it fetches a block from one file, then a block from the other,
back and forth. With async I/O you could read from both files and the
indexes simultaneously, thus in theory leading to better I/O
throughput.

One would need to consider the PostgreSQL architecture, determine where
the bottleneck actually is, and understand why it is a bottleneck fully,
before one could decide how to fix it. So, what is the bottleneck? Is
PostgreSQL unable to max out the I/O bandwidth? Where? Why?

For systems where postgresql is unable to saturate the i/o bandwidth,
this is the proposed solution. Are there others?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

From each according to his ability. To each according to his ability to litigate.

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martijn van Oosterhout (#19)
Re: [SPAM?] Re: Asynchronous I/O Support

On Fri, Oct 20, 2006 at 10:05:01AM -0400, mark@mark.mielke.cc wrote:

One would need to consider the PostgreSQL architecture, determine where
the bottleneck actually is, and understand why it is a bottleneck fully,
before one could decide how to fix it. So, what is the bottleneck?

I think Mark's point is not being taken sufficiently to heart in this
thread.

It's not difficult at all to think of reasons why attempted read-ahead
could be a net loss. One that's bothering me right at the moment is
that each such request would require a visit to the shared buffer
manager to see if we already have the desired page in buffers. (Unless
you think it'd be cheaper to force the kernel to uselessly read the
page...) Then another visit when we actually need the page. That means
that readahead will double the contention for the buffer manager locks,
which is likely to put us right back into the context swap storm problem
that we've spent the last couple of releases working out of.

So far I've seen no evidence that async I/O would help us, only a lot
of wishful thinking.

regards, tom lane

#21Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#20)
#22Martijn van Oosterhout
kleptog@svana.org
In reply to: Merlin Moncure (#21)
#23Merlin Moncure
mmoncure@gmail.com
In reply to: Martijn van Oosterhout (#22)
#24Bruce Momjian
bruce@momjian.us
In reply to: Martijn van Oosterhout (#22)
#25Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Martijn van Oosterhout (#22)
#26Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Merlin Moncure (#21)
#27Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Mark Mielke (#18)
#28Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Zeugswetter Andreas SB SD (#25)
#29Martijn van Oosterhout
kleptog@svana.org
In reply to: Ron Mayer (#28)
#30NikhilS
nikkhils@gmail.com
In reply to: Martijn van Oosterhout (#29)