Bottlenecks with large number of relation segment files

Started by Amit Langote over 12 years ago, 15 messages
#1 Amit Langote
amitlangote09@gmail.com

Hello,

I am looking into the effect of having a large number of relation files under
$PGDATA/base/ (for example, in cases where I choose a smaller segment size
using --with-segsize). Consider a case where I am working with a large
database containing large relations, for example a database similar in size
to what "pgbench -i -s 3500" would produce.

Could the routines in fd.c become a bottleneck with a large number of
concurrent connections to the above database, say something like "pgbench
-j 8 -c 128"? Is there any other place I should be paying attention
to?

--
Amit Langote


#2 KONDO Mitsumasa
kondo.mitsumasa@lab.ntt.co.jp
In reply to: Amit Langote (#1)
Re: [GENERAL] Bottlenecks with large number of relation segment files

Hi Amit,

(2013/08/05 15:23), Amit Langote wrote:

Could the routines in fd.c become a bottleneck with a large number of
concurrent connections to the above database, say something like "pgbench
-j 8 -c 128"? Is there any other place I should be paying attention
to?

What kind of file system did you use?

When a file is opened, the ext3 or ext4 file system seems to search the
directory sequentially for the file's inode.
Also, PostgreSQL limits FDs to 1000 per process, which seems too small.
To raise it, change "max_files_per_process = 1000;" in
src/backend/storage/file/fd.c; rewriting that line changes the FD limit per
process. I have already created a fix patch for this problem in
postgresql.conf and will submit it to the next CF.
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


#3 Amit Langote
amitlangote09@gmail.com
In reply to: KONDO Mitsumasa (#2)
Re: [GENERAL] Bottlenecks with large number of relation segment files

On Mon, Aug 5, 2013 at 5:01 PM, KONDO Mitsumasa
<kondo.mitsumasa@lab.ntt.co.jp> wrote:

Hi Amit,

(2013/08/05 15:23), Amit Langote wrote:

Could the routines in fd.c become a bottleneck with a large number of
concurrent connections to the above database, say something like "pgbench
-j 8 -c 128"? Is there any other place I should be paying attention
to?

What kind of file system did you use?

When a file is opened, the ext3 or ext4 file system seems to search the
directory sequentially for the file's inode.
Also, PostgreSQL limits FDs to 1000 per process, which seems too small.
To raise it, change "max_files_per_process = 1000;" in
src/backend/storage/file/fd.c; rewriting that line changes the FD limit per
process. I have already created a fix patch for this problem in
postgresql.conf and will submit it to the next CF.

Thank you for replying, Kondo-san.
The file system is ext4.
So, within the limits of max_files_per_process, the routines of fd.c
should not become a bottleneck?

--
Amit Langote


#4 John R Pierce
pierce@hogranch.com
In reply to: KONDO Mitsumasa (#2)
Re: Bottlenecks with large number of relation segment files

On 8/5/2013 1:01 AM, KONDO Mitsumasa wrote:

When a file is opened, the ext3 or ext4 file system seems to search the
directory sequentially for the file's inode.

No, ext3/4 uses H-tree structures to search directories longer than one
block quite efficiently.

--
john r pierce 37N 122W
somewhere on the middle of the left coast


#5 KONDO Mitsumasa
kondo.mitsumasa@lab.ntt.co.jp
In reply to: Amit Langote (#3)
Re: Bottlenecks with large number of relation segment files

(2013/08/05 17:14), Amit Langote wrote:

So, within the limits of max_files_per_process, the routines of fd.c
should not become a bottleneck?

It may not become a bottleneck.
One FD consumes 160 bytes on a 64-bit system; see the Linux manual page for
"epoll".

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


#6 Andres Freund
andres@2ndquadrant.com
In reply to: KONDO Mitsumasa (#5)
Re: Bottlenecks with large number of relation segment files

On 2013-08-05 18:40:10 +0900, KONDO Mitsumasa wrote:

(2013/08/05 17:14), Amit Langote wrote:

So, within the limits of max_files_per_process, the routines of fd.c
should not become a bottleneck?

It may not become a bottleneck.
One FD consumes 160 bytes on a 64-bit system; see the Linux manual page for
"epoll".

That limit is about max_user_watches, not the general cost of an fd.
AFAIR they take up a good bit more than that. Also, there are global
limits to the number of file handles that can be simultaneously open on a
system.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#7 Florian Weimer
fweimer@redhat.com
In reply to: John R Pierce (#4)
Re: Bottlenecks with large number of relation segment files

On 08/05/2013 10:42 AM, John R Pierce wrote:

On 8/5/2013 1:01 AM, KONDO Mitsumasa wrote:

When a file is opened, the ext3 or ext4 file system seems to search the
directory sequentially for the file's inode.

No, ext3/4 uses H-tree structures to search directories longer than one
block quite efficiently.

And the Linux dentry cache is rather aggressive, so most of the time,
only the in-memory hash table will be consulted. (The dentry cache only
gets flushed on severe memory pressure.)

--
Florian Weimer / Red Hat Product Security Team


#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#6)
Re: [GENERAL] Bottlenecks with large number of relation segment files

Andres Freund <andres@2ndquadrant.com> writes:

... Also, there are global limits to the number of file handles that can
be simultaneously open on a system.

Yeah. Raising max_files_per_process puts you at serious risk that
everything else on the box will start falling over for lack of available
FD slots. (PG itself tends to cope pretty well, since fd.c knows it can
drop some other open file when it gets EMFILE.) We more often have to
tell people to lower that limit than to raise it.
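
(To make that EMFILE behaviour concrete, here is a minimal sketch of the
general pattern; it is a simplified illustration only, not PostgreSQL's
actual fd.c code, which keeps an LRU ring of "virtual" file descriptors and
transparently reopens evicted files later.)

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define CACHE_SIZE 16

static int fd_cache[CACHE_SIZE];    /* crude stand-in for fd.c's LRU pool */
static int fd_cache_used = 0;

/* Close one cached descriptor to free a slot; returns 0 if nothing to close. */
static int release_one_fd(void)
{
    if (fd_cache_used == 0)
        return 0;
    close(fd_cache[--fd_cache_used]);
    return 1;
}

/* open() that retries after EMFILE/ENFILE by evicting a cached descriptor. */
static int open_with_retry(const char *path, int flags)
{
    for (;;)
    {
        int fd = open(path, flags);

        if (fd >= 0)
        {
            if (fd_cache_used < CACHE_SIZE)
                fd_cache[fd_cache_used++] = fd;     /* remember it so it can be evicted later */
            return fd;
        }
        if ((errno != EMFILE && errno != ENFILE) || !release_one_fd())
            return -1;                              /* other error, or nothing left to evict */
        /* freed a descriptor slot, retry the open */
    }
}

int main(void)
{
    for (int i = 0; i < 4; i++)
        printf("open #%d -> fd %d\n", i, open_with_retry("/etc/hostname", O_RDONLY));
    return 0;
}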

regards, tom lane


#9 KONDO Mitsumasa
kondo.mitsumasa@lab.ntt.co.jp
In reply to: Andres Freund (#6)
Re: [GENERAL] Bottlenecks with large number of relation segment files

(2013/08/05 19:28), Andres Freund wrote:

On 2013-08-05 18:40:10 +0900, KONDO Mitsumasa wrote:

(2013/08/05 17:14), Amit Langote wrote:

So, within the limits of max_files_per_process, the routines of fd.c
should not become a bottleneck?

It may not become a bottleneck.
One FD consumes 160 bytes on a 64-bit system; see the Linux manual page for
"epoll".

That limit is about max_user_watches, not the general cost of an fd.
AFAIR they take up a good bit more than that.

Oh, my mistake... I went back and read about FDs in the Linux manual page
for "proc".
It seems that the FDs a process holds can be seen under /proc/[pid]/fd/.
Each appears as a symbolic link and consumes 64 bytes of memory per FD.
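
(For example, a minimal sketch, assuming Linux, that lists a process's own
descriptors through /proc/self/fd; each entry shows up as a symbolic link
pointing at the open file.)

#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    DIR           *dir = opendir("/proc/self/fd");
    struct dirent *de;
    char           link[PATH_MAX], target[PATH_MAX];

    if (dir == NULL)
        return 1;
    while ((de = readdir(dir)) != NULL)
    {
        ssize_t len;

        if (de->d_name[0] == '.')
            continue;
        snprintf(link, sizeof(link), "/proc/self/fd/%s", de->d_name);
        len = readlink(link, target, sizeof(target) - 1);
        if (len < 0)
            continue;
        target[len] = '\0';
        printf("fd %s -> %s\n", de->d_name, target);    /* each fd is a symlink to its file */
    }
    closedir(dir);
    return 0;
}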

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


#10 KONDO Mitsumasa
kondo.mitsumasa@lab.ntt.co.jp
In reply to: Tom Lane (#8)
Re: [GENERAL] Bottlenecks with large number of relation segment files

(2013/08/05 21:23), Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

... Also, there are global limits to the number of file handles that can
be simultaneously open on a system.

Yeah. Raising max_files_per_process puts you at serious risk that
everything else on the box will start falling over for lack of available
FD slots.

Is that really so? When I use Hadoop-like NoSQL storage, I set a large number of FDs.
In fact, the Hadoop wiki says the following:

http://wiki.apache.org/hadoop/TooManyOpenFiles

Too Many Open Files

You can see this on Linux machines in client-side applications, server code or even in test runs.
It is caused by per-process limits on the number of files that a single user/process can have open, which was introduced in the 2.6.27 kernel. The default value, 128, was chosen because "that should be enough".

In Hadoop, it isn't.

~

ulimit -n 8192
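
(For reference, a small sketch, assuming Linux/POSIX, that prints the
per-process limit which "ulimit -n" controls; this is the limit the wiki
text is talking about, and it is separate from PostgreSQL's own
max_files_per_process setting.)

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
    {
        perror("getrlimit");
        return 1;
    }
    /* rlim_cur is what "ulimit -n" shows; rlim_max is the ceiling it can be raised to */
    printf("soft limit on open files: %llu\n", (unsigned long long) rl.rlim_cur);
    printf("hard limit on open files: %llu\n", (unsigned long long) rl.rlim_max);
    return 0;
}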

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


#11 KONDO Mitsumasa
kondo.mitsumasa@lab.ntt.co.jp
In reply to: Florian Weimer (#7)
Re: Bottlenecks with large number of relation segment files

(2013/08/05 20:38), Florian Weimer wrote:

On 08/05/2013 10:42 AM, John R Pierce wrote:

On 8/5/2013 1:01 AM, KONDO Mitsumasa wrote:

When a file is opened, the ext3 or ext4 file system seems to search the
directory sequentially for the file's inode.

No, ext3/4 uses H-tree structures to search directories longer than one
block quite efficiently.

And the Linux dentry cache is rather aggressive, so most of the time, only the
in-memory hash table will be consulted. (The dentry cache only gets flushed on
severe memory pressure.)

Is that really so? When I put a large number of files in the same directory
and open one, it is very, very slow, but opening the directory itself is not.
So I think it is only the directory search, not the search for a file within
the same directory. And I had heard about this before.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


#12 Andres Freund
andres@2ndquadrant.com
In reply to: KONDO Mitsumasa (#10)
Re: [GENERAL] Bottlenecks with large number of relation segment files

On 2013-08-06 19:19:41 +0900, KONDO Mitsumasa wrote:

(2013/08/05 21:23), Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

... Also, there are global limits to the number of file handles that can
be simultaneously open on a system.

Yeah. Raising max_files_per_process puts you at serious risk that
everything else on the box will start falling over for lack of available
FD slots.

Is that really so? When I use Hadoop-like NoSQL storage, I set a large number of FDs.
In fact, the Hadoop wiki says the following:

http://wiki.apache.org/hadoop/TooManyOpenFiles

Too Many Open Files

You can see this on Linux machines in client-side applications, server code or even in test runs.
It is caused by per-process limits on the number of files that a single user/process can have open, which was introduced in the 2.6.27 kernel. The default value, 128, was chosen because "that should be enough".

The first paragraph (the one you're quoting, with the 128) is talking about
epoll, which we don't use. The second paragraph indeed talks about the
maximum number of fds. Of *one* process.
Postgres uses a *process*-based model. So, max_files_per_process is about
the number of fds in a single backend. You need to multiply it by
max_connections + a bunch to get the overall number of FDs.
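(Concretely: with the default max_files_per_process = 1000 and the
"pgbench -c 128" case from the start of the thread, the worst case is
roughly 128 * 1000 = 128,000 descriptors across the backends, plus the
postmaster and auxiliary processes, all of which has to fit under the
system-wide limit, e.g. fs.file-max on Linux.)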

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#13 KONDO Mitsumasa
kondo.mitsumasa@lab.ntt.co.jp
In reply to: Andres Freund (#12)
Re: [HACKERS] Bottlenecks with large number of relation segment files

(2013/08/06 19:33), Andres Freund wrote:

On 2013-08-06 19:19:41 +0900, KONDO Mitsumasa wrote:

(2013/08/05 21:23), Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

... Also, there are global limits to the number of file handles that can
be simultaneously open on a system.

Yeah. Raising max_files_per_process puts you at serious risk that
everything else on the box will start falling over for lack of available
FD slots.

Is that really so? When I use Hadoop-like NoSQL storage, I set a large number of FDs.
In fact, the Hadoop wiki says the following:

http://wiki.apache.org/hadoop/TooManyOpenFiles

Too Many Open Files

You can see this on Linux machines in client-side applications, server code or even in test runs.
It is caused by per-process limits on the number of files that a single user/process can have open, which was introduced in the 2.6.27 kernel. The default value, 128, was chosen because "that should be enough".

The first paragraph (the one you're quoting, with the 128) is talking about
epoll, which we don't use. The second paragraph indeed talks about the
maximum number of fds. Of *one* process.

Yes, I had already understood it that way.

Postgres uses a *process*-based model. So, max_files_per_process is about
the number of fds in a single backend. You need to multiply it by
max_connections + a bunch to get the overall number of FDs.

Yes, that too. I think max_files_per_process seems too small. In NoSQL
systems, a large number of FDs is recommended. However, I do not know whether
the default is really enough for PostgreSQL. If we use PostgreSQL with big
data, we might need to raise max_files_per_process, I think.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


#14 Florian Weimer
fweimer@redhat.com
In reply to: KONDO Mitsumasa (#11)
Re: Bottlenecks with large number of relation segment files

On 08/06/2013 12:28 PM, KONDO Mitsumasa wrote:

(2013/08/05 20:38), Florian Weimer wrote:

On 08/05/2013 10:42 AM, John R Pierce wrote:

On 8/5/2013 1:01 AM, KONDO Mitsumasa wrote:

When a file is opened, the ext3 or ext4 file system seems to search the
directory sequentially for the file's inode.

No, ext3/4 uses H-tree structures to search directories longer than one
block quite efficiently.

And the Linux dentry cache is rather aggressive, so most of the time, only
the in-memory hash table will be consulted. (The dentry cache only gets
flushed on severe memory pressure.)

Is that really so? When I put a large number of files in the same directory
and open one, it is very, very slow, but opening the directory itself is not.

The first file name resolution is slow, but subsequent resolutions
typically happen from the dentry cache. (The cache is not populated
when the directory is opened.)
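
(A minimal sketch, assuming Linux, that makes this visible: it times two
consecutive stat() calls on the same path. After dropping the caches, e.g.
"echo 3 > /proc/sys/vm/drop_caches" as root, the first call is typically much
slower than the second, dentry-cached one.)

#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Time a single stat() call in nanoseconds. */
static long long stat_ns(const char *path)
{
    struct timespec t0, t1;
    struct stat     st;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    stat(path, &st);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) * 1000000000LL + (t1.tv_nsec - t0.tv_nsec);
}

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "/etc/hostname";

    printf("first  stat(): %lld ns\n", stat_ns(path));  /* may miss the dentry cache */
    printf("second stat(): %lld ns\n", stat_ns(path));  /* normally served from the cache */
    return 0;
}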

--
Florian Weimer / Red Hat Product Security Team


#15 KONDO Mitsumasa
kondo.mitsumasa@lab.ntt.co.jp
In reply to: Florian Weimer (#14)
Re: Bottlenecks with large number of relation segment files

(2013/08/06 20:19), Florian Weimer wrote:

The first file name resolution is slow, but subsequent resolutions typically
happen from the dentry cache. (The cache is not populated when the directory is
opened.)

I see. Now I understand why the ext file systems are slow when we put a
large number of files in one directory.
Thank you for your advice!

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
