Can not create more than 32766 databases in ufs file system.

Started by fulan Peng over 16 years ago · 15 messages
#1 fulan Peng
fulanpeng@gmail.com

Hi, pgsql-committers!

I cannot create more than 32766 databases with FreeBSD in one installation,
not "as many as you like" as the documentation says.
I found that the problem is that the directory pgsql/data/base cannot hold
more than 32766 subdirectories.
I suggest making 32766 subdirectories in the base directory, say base/0,
base/1, ..., base/32765, and then putting each database's directory inside
one of them.
This way, we could have 32766 x 32766 databases. That is closer to
"as many as you like".
The ZFS file system is not a solution for me. It is snail-slow.
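
For anyone who wants to reproduce the limit outside PostgreSQL, here is a
minimal standalone test (purely illustrative, not PostgreSQL code). On UFS
and ext3 a directory's link count is limited, and every subdirectory's ".."
entry adds a link to its parent, so mkdir() eventually fails with EMLINK at
roughly 32k subdirectories:

/* Create numbered subdirectories until the filesystem refuses. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

int
main(void)
{
    char        path[64];
    long        i;

    mkdir("testbase", 0700);        /* ignore failure if it already exists */

    for (i = 0;; i++)
    {
        snprintf(path, sizeof(path), "testbase/%ld", i);
        if (mkdir(path, 0700) != 0)
        {
            /* On UFS/ext3 this is typically EMLINK near 32k entries. */
            printf("mkdir() failed after %ld subdirectories: %s\n",
                   i, strerror(errno));
            return 0;
        }
    }
}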

#2 Mark Mielke
mark@mark.mielke.cc
In reply to: fulan Peng (#1)
Re: Can not create more than 32766 databases in ufs file system.

Not sure that this really belongs on pgsql-committers - maybe pgsql-hackers?

No matter what scheme PostgreSQL uses for storing the data, there can be
underlying file system limitations. One solution, for example, would be
to use a file system that does not have a limitation of 32k
subdirectories. Although ext3 and UFS have this limit, ext4 has removed it.

There are many ways PostgreSQL could work around this problem - your
suggestion of using sub-directories being one of them - but what happens
if this causes performance degradation for existing users, due to the
extra file system lookups required on every access?

Another solution would be to store everything in the same file.

In any case, I think this would be a significant architecture change for
something that sounds like a bad idea. I would expect having 32k
databases to have significant performance degradations in other ways. In
particular, I am thinking about having to open a file descriptor for
each of these files. What sort of database architecture requires 32k
databases or tables for the same PostgreSQL instance? Have you
considered having an additional field for your primary key and combining
several tables into one?

Cheers,
mark

On 09/12/2009 02:49 PM, fulan Peng wrote:

> Hi, pgsql-committers!
>
> I cannot create more than 32766 databases with FreeBSD in one installation,
> not "as many as you like" as the documentation says.
> I found that the problem is that the directory pgsql/data/base cannot hold
> more than 32766 subdirectories.
> I suggest making 32766 subdirectories in the base directory, say base/0,
> base/1, ..., base/32765, and then putting each database's directory inside
> one of them.
> This way, we could have 32766 x 32766 databases. That is closer to
> "as many as you like".
> The ZFS file system is not a solution for me. It is snail-slow.

--
Mark Mielke <mark@mielke.cc>

#3 Stephen Frost
sfrost@snowman.net
In reply to: Mark Mielke (#2)
Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

(removed -committers)

Mark,

* Mark Mielke (mark@mark.mielke.cc) wrote:

> No matter what scheme PostgreSQL uses for storing the data, there can be
> underlying file system limitations.

This is true, but there's a reason we only create 1GB files too. I
wouldn't be against a scheme such as described to minimize the impact to
PG of these limitations.

> There are many ways PostgreSQL could work around this problem - your
> suggestion of using sub-directories being one of them - but what happens
> if this causes performance degradation for existing users, due to the
> extra file system lookups required on every access?

Ehhh, it's likely to be cached.. Sounds like a stretch to me that this
would actually be a performance hit. If it turns out to really be one,
we could just wait to move to subdirectories until some threshold (eg-
30k) is hit.

> Another solution would be to store everything in the same file.

eh?

> In any case, I think this would be a significant architecture change for
> something that sounds like a bad idea. I would expect having 32k
> databases to have significant performance degradations in other ways.

Actually, I think some of the changes to remove flatfiles might improve
our performance with large numbers of databases. I also don't see how
this would be a significant architecture change at all. If there are
still issues that make having lots of databases slow, we might want to
look into fixing those issues rather than saying "well, just don't do
that".

> In particular, I am thinking about having to open a file descriptor for
> each of these files. What sort of database architecture requires 32k
> databases or tables for the same PostgreSQL instance? Have you
> considered having an additional field for your primary key and combining
> several tables into one?

I've got a ton of instances that have >32K tables. My approach is
generally to keep the number of databases low, while having lots of
schemas, but there are distinct downsides to that (specifically related
to hiding information... something a lot of people care about, but
thankfully I don't have to).

thanks,

Stephen

#4 Stephen Frost
sfrost@snowman.net
In reply to: Stephen Frost (#3)
Re: Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

* Stephen Frost (sfrost@snowman.net) wrote:

> Ehhh, it's likely to be cached.. Sounds like a stretch to me that this
> would actually be a performance hit. If it turns out to really be one,
> we could just wait to move to subdirectories until some threshold (eg-
> 30k) is hit.

Thinking this through a bit more, I realized that I didn't explain my
thought here very well. My idea would be:
do everything as we do now, until we hit a threshold (perhaps an easy
one would be '10000').
Once we hit the threshold, create a subdirectory first and then create the
new database directory inside it, e.g.:

0/
1/
2/
3/
4/
[...]
9999/
00001/0/
00001/1/
00001/2/
00001/3/
00001/[...]
00001/9999/
00002/0/
00002/1/
00002/2/
00002/[...]
00002/9999/
00003/0/
00003/1/
[...]
09999/0/
09999/1/
09999/[...]
09999/9999/

This would allow for 220M+ databases. I'm not sure how bad it'd be to
introduce another field to pg_database which provides the directory (as
it'd now be distinct from the oid..) or if that might require a lot of
changes. Not sure how easy it'd be to implement something to address
this problem while we continue to tie the directory name to the oid.
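
To make that concrete, a rough sketch of the mapping I have in mind (the
helper name and the 10000-per-directory constant are just placeholders, and
as noted the per-database directory number would no longer be the OID
itself):

#include <stdio.h>

#define DBS_PER_SUBDIR  10000

/*
 * The first 10000 database directories stay directly under base/, exactly
 * as today; after that, each group of 10000 goes under a five-digit
 * parent directory.
 */
static void
database_path(char *buf, size_t buflen, unsigned long dirnum)
{
    if (dirnum < DBS_PER_SUBDIR)
        snprintf(buf, buflen, "base/%lu", dirnum);
    else
        snprintf(buf, buflen, "base/%05lu/%lu",
                 dirnum / DBS_PER_SUBDIR, dirnum % DBS_PER_SUBDIR);
}

int
main(void)
{
    char        path[64];

    database_path(path, sizeof(path), 42);      /* -> base/42          */
    puts(path);
    database_path(path, sizeof(path), 12345);   /* -> base/00001/2345  */
    puts(path);
    return 0;
}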

Thanks,

Stephen

#5 Mark Mielke
mark@mark.mielke.cc
In reply to: Stephen Frost (#3)
Re: Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

On 09/12/2009 03:33 PM, Stephen Frost wrote:

> * Mark Mielke (mark@mark.mielke.cc) wrote:
>
>> No matter what scheme PostgreSQL uses for storing the data, there can be
>> underlying file system limitations.
>
> This is true, but there's a reason we only create 1GB files too. I
> wouldn't be against a scheme such as described to minimize the impact to
> PG of these limitations.

Ok - but "minimizing the impact" does not necessarily mean "keep doing
what we are doing, but work around the issue." One interpretation is that
PostgreSQL is trying to use too many sub-directories in the same
containing directory. I would argue that the real problem is that
PostgreSQL requires so many sub-directories in the first place.

There are many database designs that do not require one file per
database. Berkeley DB JE, for instance, treats each "database" as one
root in a larger tree. The entire database is stored in one set of
files, where the files are created due to database volume, not database
quantity. Tables can be thought of similarly.

>> There are many ways PostgreSQL could work around this problem - your
>> suggestion of using sub-directories being one of them - but what happens
>> if this causes performance degradation for existing users, due to the
>> extra file system lookups required on every access?
>
> Ehhh, it's likely to be cached.. Sounds like a stretch to me that this
> would actually be a performance hit. If it turns out to really be one,
> we could just wait to move to subdirectories until some threshold (eg-
> 30k) is hit.

Cached does not eliminate the cost. It just means it doesn't have to go
to disk. It still needs to traverse an additional level of the VFS tree.
Sure, this is designed to be cheap - but it leaves the real cost out of
consideration: that of having so many subdirectories in the first place.

>> Another solution would be to store everything in the same file.
>
> eh?

There is no technical requirement for PostgreSQL to separate data in
databases or tables on subdirectory or file boundaries. Nothing wrong
with having one or more large files that contain everything. PostgreSQL
doesn't happen to do this today - but it's bothered me at times that it
has so many files in the database directory - even very small tables
require their own files.

>> In any case, I think this would be a significant architecture change for
>> something that sounds like a bad idea. I would expect having 32k
>> databases to have significant performance degradations in other ways.
>
> Actually, I think some of the changes to remove flatfiles might improve
> our performance with large numbers of databases. I also don't see how
> this would be a significant architecture change at all. If there are
> still issues that make having lots of databases slow, we might want to
> look into fixing those issues rather than saying "well, just don't do
> that".

I guess I'm not seeing how using 32k tables is a sensible model. So yes,
things can be done to reduce the cost - but it seems like something is
wrong if this is truly a requirement. There are alternative models of
storage that would not require 32k tables, that likely perform better.
Although, I don't know your requirements, so perhaps I am missing something.

>> In particular, I am thinking about having to open a file descriptor for
>> each of these files. What sort of database architecture requires 32k
>> databases or tables for the same PostgreSQL instance? Have you
>> considered having an additional field for your primary key and combining
>> several tables into one?
>
> I've got a ton of instances that have >32K tables. My approach is
> generally to keep the number of databases low, while having lots of
> schemas, but there are distinct downsides to that (specifically related
> to hiding information... something a lot of people care about, but
> thankfully I don't have to).

Do you agree with me that having 32k open file descriptors (or worse,
open on demand file descriptors that need to be re-opened many times) is
a problem?

Looking at PostgreSQL today - I don't think it's designed to scale to
this. Looking at SQL today, I think I would find it difficult to justify
creating a solution that requires this capability.

Honestly - it seems a bit insane. Sorry. :-) Maybe I'm just naive...

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>

#6 Mark Mielke
mark@mark.mielke.cc
In reply to: Stephen Frost (#4)
Re: Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

On 09/12/2009 03:48 PM, Stephen Frost wrote:

> This would allow for 220M+ databases. I'm not sure how bad it'd be to
> introduce another field to pg_database which provides the directory (as
> it'd now be distinct from the oid..) or if that might require a lot of
> changes. Not sure how easy it'd be to implement something to address
> this problem while we continue to tie the directory name to the oid.

Other than bragging rights - what part of this would be a GOOD thing? :-)

My God - I thought 32k databases in the same directory was insane.
220M+???????

Hehehe...

If you can patch PostgreSQL to support such extremes without hurting my
performance - I'll shut up and leave you be. :-)

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#3)
Re: Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

Stephen Frost <sfrost@snowman.net> writes:

> * Mark Mielke (mark@mark.mielke.cc) wrote:
>
>> No matter what scheme PostgreSQL uses for storing the data, there can be
>> underlying file system limitations.
>
> This is true, but there's a reason we only create 1GB files too. I
> wouldn't be against a scheme such as described to minimize the impact to
> PG of these limitations.

There are plenty of filesystems available that do not have this silly
limitation, so I don't see a reason for us to work around it. If the
OP is on a platform that only offers UFS and ZFS, and he doesn't like
either of those, maybe he should find another platform.

regards, tom lane

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Mielke (#6)
Re: Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

Mark Mielke <mark@mark.mielke.cc> writes:

> My God - I thought 32k databases in the same directory was insane.
> 220M+???????

Considering that the system catalogs alone occupy about 5MB per
database, that would require an impressive amount of storage...

In practice I think users would be complaining about our choice
to instantiate the catalogs per-database a lot sooner than they'd
hit the subdirectory-count limit.

BTW, there is another avenue that the OP could look into if he
really wants this many databases on a UFS filesystem: split them up
into multiple tablespaces.

regards, tom lane

#9 Stephen Frost
sfrost@snowman.net
In reply to: Mark Mielke (#5)
Re: Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

* Mark Mielke (mark@mark.mielke.cc) wrote:

> There is no technical requirement for PostgreSQL to separate data in
> databases or tables on subdirectory or file boundaries. Nothing wrong
> with having one or more large files that contain everything.

Uhh, except where you run into system limitations on file size (eg- a 2G
max file size..). You'll note PG creates files up to 1G and then splits
them into separate files. It's not done just because it's fun.

> I guess I'm not seeing how using 32k tables is a sensible model.

For one thing, there's partitioning. For another, there's a large user
base. 32K tables is, to be honest, not all that many, especially for
some of these databases which reach into the multi-TB range..

> So yes, things can be done to reduce the cost - but it seems like
> something is wrong if this is truly a requirement.

I have no idea what you've been working with, but I hardly think it
makes sense for PG to consider over 32k tables as not worth supporting.

> There are alternative models of storage that would not require 32k
> tables, that likely perform better.

Eh? You would advocate combining tables for no reason other than you
think it's bad to have a lot?

> Do you agree with me that having 32k open file descriptors (or worse,
> open on demand file descriptors that need to be re-opened many times) is
> a problem?

Nope.

> Looking at PostgreSQL today - I don't think it's designed to scale to
> this. Looking at SQL today, I think I would find it difficult to justify
> creating a solution that requires this capability.

Actually, I find that PG handles it pretty well. And we used to be an
Oracle shop.

Thanks,

Stephen

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#9)
Re: Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

Stephen Frost <sfrost@snowman.net> writes:

> * Mark Mielke (mark@mark.mielke.cc) wrote:
>
>> I guess I'm not seeing how using 32k tables is a sensible model.
>
> For one thing, there's partitioning. For another, there's a large user
> base. 32K tables is, to be honest, not all that many, especially for
> some of these databases which reach into the multi-TB range..

I believe the filesystem limit the OP is hitting is on the number of
*subdirectories* per directory, not on the number of plain files.
If we had a hard limit at 32K tables many people would have hit it
before now.

So the question I would ask goes more like "do you really need 32K
databases in one installation? Have you considered using schemas
instead?" Databases are, by design, pretty heavyweight objects.

regards, tom lane

#11 Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#10)
Re: Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

> I believe the filesystem limit the OP is hitting is on the number of
> *subdirectories* per directory, not on the number of plain files.

Right, I'm not entirely sure how we got onto the question of number of
tables.

> So the question I would ask goes more like "do you really need 32K
> databases in one installation? Have you considered using schemas
> instead?" Databases are, by design, pretty heavyweight objects.

I agree, but at the same time, we might: a) update our documentation to
indicate it depends on the filesystem, and b) consider how we might
work around this limit (and if we feel the effort to be worth it).

Thanks,

Stephen

#12 Mark Mielke
mark@mark.mielke.cc
In reply to: Stephen Frost (#9)
Re: Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

On 09/12/2009 04:17 PM, Stephen Frost wrote:

> * Mark Mielke (mark@mark.mielke.cc) wrote:
>
>> There is no technical requirement for PostgreSQL to separate data in
>> databases or tables on subdirectory or file boundaries. Nothing wrong
>> with having one or more large files that contain everything.
>
> Uhh, except where you run into system limitations on file size (eg- a 2G
> max file size..). You'll note PG creates files up to 1G and then splits
> them into separate files. It's not done just because it's fun.

This becomes a bit of a side thread - but note that I carefully didn't
say "exactly one file". I said "one or more large files that contain
everything". That is, if we have 3 databases each of size 50 Mbytes,
there is no technical reason why this cannot be stored within a single
150 Mbyte data file. Sure, if it goes beyond 2G, we can break it into a
set of files, and treat each file as a 2G "block" in a virtual larger
storage pool. VMWare has this for storing virtual drives.

If we assume that 32k *databases* is reasonable for a single instance,
for 32k databases to *require* 32k immediate sub-directories is the real
problem. This can be solved either by: 1) Adding additional depth to the
directory height to work around this limit (what the OP and you are
proposing), or 2) Storing multiple databases within the same files or
sub-directories. If you really must have this amount of scalability, I
am suggesting that you consider all of the resources required to access
32k worth of sub-directories in the file systems, specifically including
file descriptors, inodes, the backing bitmaps or extent mappings that
allocate from the file system free space, the rather inefficient
directory layouts of many file systems (many file systems still do
LINEAR searches for filenames, making file lookups linearly slower as
the directory becomes larger), and the kernel memory caches that track
all of these little details. The POSIX guarantees required are certainly
more heavyweight than the requirements that PostgreSQL has, and I am
certain it is possible to create a targeted solution to this problem
that is simpler and faster. For only a few databases and a few files,
the effort isn't worth it. But, if supporting 32k+ *databases*, or even
32k+ tables and indexes is a major requirement, and a major design
target, then PostgreSQL should do this stuff itself.

Modern file systems don't have the 2G problem. ext2/ext3 with 4-Kbyte
blocks (the default) support files up to 2 TB. This also matches the
practical limit on addressing a single physical disk, at least on the
platforms I am familiar with. The requirement to stay under 2G for a
single file is a bit outdated.
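
To illustrate the "set of files as one virtual pool" idea (this is not how
PostgreSQL stores data; the names and the 2G segment size are purely for
the example), the bookkeeping needed is little more than offset arithmetic:

#include <stdint.h>
#include <stdio.h>

#define SEGMENT_BYTES   (2ULL * 1024 * 1024 * 1024)     /* 2G per file */

typedef struct
{
    uint64_t    segno;          /* which pool.N file to open */
    uint64_t    offset;         /* byte offset within that file */
} PoolLocation;

/* Map a logical byte offset in the pool to a physical segment and offset. */
static PoolLocation
pool_locate(uint64_t logical_offset)
{
    PoolLocation loc;

    loc.segno = logical_offset / SEGMENT_BYTES;
    loc.offset = logical_offset % SEGMENT_BYTES;
    return loc;
}

int
main(void)
{
    /* Byte 5,000,000,000 of the pool lands in the third segment file. */
    PoolLocation loc = pool_locate(5000000000ULL);

    printf("pool.%llu at offset %llu\n",
           (unsigned long long) loc.segno,
           (unsigned long long) loc.offset);
    return 0;
}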

>> I guess I'm not seeing how using 32k tables is a sensible model.
>
> For one thing, there's partitioning. For another, there's a large user
> base. 32K tables is, to be honest, not all that many, especially for
> some of these databases which reach into the multi-TB range..

Talking philosophically - the need to use table-based partitioning to
achieve acceptable performance or storage requirements is somewhat of a
hacky workaround. It's effectively moving the database query logic back
into the application space, where the application must know which tables
contain which data. The inherited tables and automatic constraint-based
query planning helps out, but it's still an elaborate hack. It's
exposing data that the application should not need to care about, and
then making it possible to hide some of it again. Table partitioning
should be far more automatic. I don't want to break my theoretical table
containing every call made on my network into per-hour tables, each with
a constraint for the time range it includes data for. I want to create a
table, with a timestamp column, fill it with billions of records,
provide a few hints, and the database engine should be smart enough to
partition the table such that my queries "just work".

Back to reality - maybe things have not reached this level of maturity
yet, and people with practical requirements today, have found that they
need to use very complex manual partitioning schemes that chew up
thousands of tables.

>> So yes, things can be done to reduce the cost - but it seems like
>> something is wrong if this is truly a requirement.
>
> I have no idea what you've been working with, but I hardly think it
> makes sense for PG to consider over 32k tables as not worth supporting.

I don't advocate any limits. However, I also don't advocate designing
PostgreSQL specifically for the case of 32k tables. If you want to use
32k tables, then you better have a file system that supports 32k+ files
in a single directory, and a kernel that is able to work efficiently
when postgres has thousands or more file descriptors open and in use at
the same time. The system *supports* 32k tables, but if you look at the
design, you'll see that it is not optimal for 32k tables. Even with the
changes to reduce the use of flat files and such, it's still not a
design that makes 32k tables optimal. If you want to create 32k tables -
I don't want to stop you.

But, I'm not going to agree that this is optimal or that excessive effort
should be put into PostgreSQL to make it optimal. :-)

Specifically with regard to my comments about creating a truly scalable
system that would support 32k databases - I don't think it's worth it. I
know it is possible, but I don't want to see effort put into it.

>> There are alternative models of storage that would not require 32k
>> tables, that likely perform better.
>
> Eh? You would advocate combining tables for no reason other than you
> think it's bad to have a lot?

"likely perform better" is not "no reason".

>> Do you agree with me that having 32k open file descriptors (or worse,
>> open on demand file descriptors that need to be re-opened many times) is
>> a problem?
>
> Nope.

If you don't think it represents a scalability problem, then I think it
puts your ability to judge the situation in question. :-)

>> Looking at PostgreSQL today - I don't think it's designed to scale to
>> this. Looking at SQL today, I think I would find it difficult to justify
>> creating a solution that requires this capability.
>
> Actually, I find that PG handles it pretty well. And we used to be an
> Oracle shop.

Everything is relative. Throw enough RAM and CPU at the problem, and
things can appear acceptable and scalable. Hit a limit, and ... oops.

You do remember we're in a thread talking about how the # of databases
is limited on some systems, right? :-)

"Pretty well" means "good enough for you". It means nothing more or less.

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>

#13 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#10)
Re: Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

Tom Lane wrote:

> So the question I would ask goes more like "do you really need 32K
> databases in one installation? Have you considered using schemas
> instead?" Databases are, by design, pretty heavyweight objects.

That's a fair question. OTOH, devising a scheme to get around it would
not be terribly difficult, would it? I can imagine a scheme where the
subdir for a database was lo/hi for some division of the database oid. I
guess it could make matters ugly for pg_migrator, though.
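
Something like this, purely as an illustration (the helper and the 32768
divisor are made up; any split that keeps each directory under the limit
would do):

#include <stdio.h>

/* Split the database OID into a "hi" parent directory and a "lo" entry. */
static void
db_subdir_path(char *buf, size_t buflen, unsigned int dboid)
{
    snprintf(buf, buflen, "base/%u/%u", dboid / 32768, dboid % 32768);
}

int
main(void)
{
    char        path[64];

    db_subdir_path(path, sizeof(path), 16384);      /* -> base/0/16384 */
    puts(path);
    db_subdir_path(path, sizeof(path), 100000);     /* -> base/3/1696  */
    puts(path);
    return 0;
}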

cheers

andrew

#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#13)
Re: Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

Andrew Dunstan <andrew@dunslane.net> writes:

> Tom Lane wrote:
>
>> So the question I would ask goes more like "do you really need 32K
>> databases in one installation? Have you considered using schemas
>> instead?" Databases are, by design, pretty heavyweight objects.
>
> That's a fair question. OTOH, devising a scheme to get around it would
> not be terribly difficult, would it? I can imagine a scheme where the
> subdir for a database was lo/hi for some division of the database oid. I
> guess it could make matters ugly for pg_migrator, though.

As I said earlier, the number-of-subdirectories issue is not the
important thing. The OP was already up to 160GB worth of system
catalogs before his filesystem wimped out, and would be needing
terabytes if he wanted to go significantly past the filesystem limit.
So there is no point in devising some clever workaround for the
limitations of one filesystem unless you want to reconsider our system
catalog representation --- and that will carry actual user-visible
functional costs; it's not just a cute hack somewhere in the guts of
the system.
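
To spell out that arithmetic, using the ~5MB-per-database figure from
upthread (the million-database case is just an example of going well past
the limit):

\[
32766 \times 5\ \mathrm{MB} \approx 160\ \mathrm{GB},
\qquad
10^{6} \times 5\ \mathrm{MB} \approx 5\ \mathrm{TB}.
\]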

regards, tom lane

#15 Josh Berkus
josh@agliodbs.com
In reply to: Stephen Frost (#11)
Re: Re: [COMMITTERS] Can not create more than 32766 databases in ufs file system.

>> So the question I would ask goes more like "do you really need 32K
>> databases in one installation? Have you considered using schemas
>> instead?" Databases are, by design, pretty heavyweight objects.
>
> I agree, but at the same time, we might: a) update our documentation to
> indicate it depends on the filesystem, and b) consider how we might
> work around this limit (and if we feel the effort to be worth it).

I don't feel it's worth the effort.

I can think of lots of hosted application configurations where one might
need 33K tables. Note that PostgreSQL *already* handles this better
than Oracle or MySQL do -- I know at least one case where our ability to
handle large numbers of tables was a reason for migration from Oracle to
PostgreSQL.

However, I can think of no legitimate reason to need 33K active
databases in a single instance. I think someone has confused databases
with schemas ... or even with tables. FileMaker developer, maybe? Or
maybe it's 10 active databases and 32.99K archived ones ... in which case
they should be dumped to compressed backup and dropped.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com