pg_fallocate
Hi,
I'd like to add the fallocate() system call to improve sequential read/write
performance. fallocate() differs from posix_fallocate(), which reserves
contiguous disk space by zero-filling. fallocate() reserves contiguous disk
space with less overhead than posix_fallocate().
It will be needed for sorted checkpoints and a faster VACUUM command in the
near future.
For more detailed information, please see the Linux manual.
I am going sightseeing in Dublin with Ishii-san now :-)
Regards,
--
Mitsumasa KONDO
NTT Open Source Software
Attachments:
pg_fallocate_v0.patch (application/octet-stream, +30 -1)
On Thu, Oct 31, 2013 at 9:16 AM, Mitsumasa KONDO
<kondo.mitsumasa@gmail.com> wrote:
I'd like to add the fallocate() system call to improve sequential read/write
performance. fallocate() differs from posix_fallocate(), which reserves
contiguous disk space by zero-filling. fallocate() reserves contiguous disk
space with less overhead than posix_fallocate().
It will be needed for sorted checkpoints and a faster VACUUM command in the
near future.
For more detailed information, please see the Linux manual.
I am going sightseeing in Dublin with Ishii-san now :-)
Our last attempts to improve performance in this area died in a fire
when it turned out that code that should have been an improvement fell
down over inexplicable ext4 behavior. I think, therefore, that
extensive benchmarking of this or any other proposed approach is
absolutely essential.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 10/31/13, 9:16 AM, Mitsumasa KONDO wrote:
I'd like to add the fallocate() system call to improve sequential read/write
performance. fallocate() differs from posix_fallocate(), which reserves
contiguous disk space by zero-filling. fallocate() reserves contiguous disk
space with less overhead than posix_fallocate().
Your patch seems to be missing a bit that defines HAVE_FALLOCATE,
probably something in configure.in.
On Thu, Oct 31, 2013 at 01:16:44PM +0000, Mitsumasa KONDO wrote:
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -383,6 +383,21 @@ pg_flush_data(int fd, off_t offset, off_t amount)
 	return 0;
 }

+/*
+ * pg_fallocate --- advise OS that the data pre-allocate continus file segments
+ * in physical disk.
+ *
+ * Not all platforms have fallocate. Some platforms only have posix_fallocate,
+ * but it ped zero fill to get pre-allocate file segmnets. It is not good
+ * peformance when extend new segmnets, so we don't use posix_fallocate.
+ */
+int
+pg_fallocate(File file, int flags, off_t offset, off_t nbytes)
+{
+#if defined(HAVE_FALLOCATE)
+	return fallocate(VfdCache[file].fd, flags, offset, nbytes);
+#endif
+}
You should set errno to ENOSYS and return -1 if HAVE_FALLOCATE isn't
defined.
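
As a rough illustration of the fallback described above, the wrapper could look something like the sketch below. This is not the posted patch: the name pg_fallocate_sketch is hypothetical, it takes a raw file descriptor instead of a File so that it compiles outside the backend, and HAVE_FALLOCATE is assumed to be set by a configure check that the patch does not yet contain.

```c
/* glibc only declares fallocate() in <fcntl.h> when _GNU_SOURCE is set */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for pg_fallocate(). It uses a plain fd rather than
 * a PostgreSQL File; HAVE_FALLOCATE is assumed to come from configure.
 */
int
pg_fallocate_sketch(int fd, int flags, off_t offset, off_t nbytes)
{
#if defined(HAVE_FALLOCATE)
	return fallocate(fd, flags, offset, nbytes);
#else
	/* Silence unused-parameter warnings on platforms without fallocate() */
	(void) fd;
	(void) flags;
	(void) offset;
	(void) nbytes;

	/* Report "not implemented" the same way the system call would */
	errno = ENOSYS;
	return -1;
#endif
}
```

With this shape the function returns a value on every path, and callers can tell "unsupported" apart from a real allocation failure by inspecting errno.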
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -24,6 +24,7 @@
 #include <unistd.h>
 #include <fcntl.h>
 #include <sys/file.h>
+#include <linux/falloc.h>
This would have to be wrapped in #ifdef HAVE_FALLOCATE or
HAVE_LINUX_FALLOC_H; if you want to create a wrapper around fallocate() you
should add PG defines for the flags, too. Otherwise it's probably easier to
just call fallocate() directly inside an #ifdef block as you did in xlog.c.
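
If the wrapper route is taken, the guarded include and the PG-level flag could be sketched roughly as follows. The names PG_FALLOC_KEEP_SIZE and HAVE_LINUX_FALLOC_H are assumptions for illustration; neither appears in the posted patch, and the latter would need a matching configure check.

```c
/* HAVE_LINUX_FALLOC_H is assumed to be defined by configure when available */
#ifdef HAVE_LINUX_FALLOC_H
#include <linux/falloc.h>
#endif

/*
 * Hypothetical PostgreSQL-level flag name, so that callers such as md.c
 * never reference the Linux-only macro directly.
 */
#ifdef FALLOC_FL_KEEP_SIZE
#define PG_FALLOC_KEEP_SIZE FALLOC_FL_KEEP_SIZE
#else
/* Placeholder on platforms without the header; the call is unavailable there */
#define PG_FALLOC_KEEP_SIZE 0x01
#endif
```

Callers would then pass PG_FALLOC_KEEP_SIZE to the wrapper, and md.c would compile unchanged on platforms that lack <linux/falloc.h>.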
@@ -510,6 +511,10 @@ mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
 	 * if bufmgr.c had to dump another buffer of the same file to make room
 	 * for the new page's buffer.
 	 */
+
+	if(forknum == 1)
+		pg_fallocate(v->mdfd_vfd, FALLOC_FL_KEEP_SIZE, 0, RELSEG_SIZE);
+
The return value should be checked; if it's -1 and errno is something other
than ENOSYS or EOPNOTSUPP, the disk space allocation failed and you must
return an error.
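
The check described above might be sketched like this. Both function names are hypothetical, and the sketch only classifies the result; inside the backend, the failure case would presumably be reported through the usual error machinery instead of a plain -1.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: true when a failed preallocation only means the
 * platform or filesystem does not support fallocate().
 */
bool
fallocate_errno_is_ignorable(int saved_errno)
{
	return saved_errno == ENOSYS || saved_errno == EOPNOTSUPP;
}

/*
 * Hypothetical helper: map the fallocate() result to 0 (carry on) or
 * -1 (the allocation really failed and the caller must report an error).
 */
int
check_fallocate_result(int rc, int saved_errno)
{
	if (rc == 0)
		return 0;				/* space was reserved */
	if (fallocate_errno_is_ignorable(saved_errno))
		return 0;				/* unsupported: extend the file the old way */
	return -1;					/* e.g. ENOSPC: must not be ignored */
}
```

The caller would save errno immediately after the pg_fallocate() call and pass it in, so that later library calls cannot overwrite it before the check.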
/ Oskari