MySQL file system

Started by Ned Lillyabout 25 years ago26 messagesgeneral
Jump to latest
#1Ned Lilly
ned@greatbridge.com

Anyone heard about this?

http://no.spam.ee/~tonu/mysqlfs.html

--
----------------------------------------------------
Ned Lilly e: ned@greatbridge.com
Vice President w: www.greatbridge.com
Evangelism / Hacker Relations v: 757.233.5523
Great Bridge, LLC f: 757.233.5555

#2Brett W. McCoy
bmccoy@chapelperilous.net
In reply to: Ned Lilly (#1)
Re: MySQL file system

On Tue, 16 Jan 2001, Ned Lilly wrote:

Anyone heard about this?

http://no.spam.ee/~tonu/mysqlfs.html

That only confirms the criticisms that some have leveled at MySQL of being
an SQL interface to a filesystem and not a true relational database
management system.

-- Brett
http://www.chapelperilous.net/~bmccoy/
---------------------------------------------------------------------------
Boys, you have ALL been selected to LEAVE th' PLANET in 15 minutes!!

#3Joseph Shraibman
jks@selectacast.net
In reply to: Ned Lilly (#1)
Re: MySQL file system

Karl DeBisschop wrote:

Ned Lilly wrote:

Anyone heard about this?

http://no.spam.ee/~tonu/mysqlfs.html

I was bout to copy about 3000 RFCs onto my hard drive last night.
Bowsing them online was just too painfully slow.

Then I realized that although I had the disk space for the total file
size, they were mostly smaller than a block on my disk, so I was going
to loose alot of space.

My first thought was, well postgress would stor that as one file - that
might work. Then I thought "I'm trying to be FAST here -- any time spent
on this is a distraction. Too bad there is no such thing as
PostgreSQL/FS"

So what is needed it to be able to mount a .zip or a .tar file as a file
system with the loopback device. Mounting a database on the filesystem
sounds pretty useless to me.

--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio. http://www.targabot.com

#4Alfred Perlstein
bright@wintelcom.net
In reply to: Joseph Shraibman (#3)
Re: MySQL file system

* Joseph Shraibman <jks@selectacast.net> [010116 19:29] wrote:

Karl DeBisschop wrote:

Ned Lilly wrote:

Anyone heard about this?

http://no.spam.ee/~tonu/mysqlfs.html

I was bout to copy about 3000 RFCs onto my hard drive last night.
Bowsing them online was just too painfully slow.

Then I realized that although I had the disk space for the total file
size, they were mostly smaller than a block on my disk, so I was going
to loose alot of space.

My first thought was, well postgress would stor that as one file - that
might work. Then I thought "I'm trying to be FAST here -- any time spent
on this is a distraction. Too bad there is no such thing as
PostgreSQL/FS"

So what is needed it to be able to mount a .zip or a .tar file as a file
system with the loopback device. Mounting a database on the filesystem
sounds pretty useless to me.

Raw disk access allows:

1) removing the double buffering problem associated with read/write + shm
2) your own filesystem semantics and explicit block flushing versus whole
file flushing (fsync)
3) removing layers of abstraction, you loose the FS abstraction and allow
the database abstraction to take over which shortens the code path
and gives complete control over the data to the database.

However it limits:

1) doing backups, you must either have a utility that can read the
'dbfs' or use 'dd' to copy the entire disk.
(this is ok, we have pg_dump)
2) taking advantage of built in filesystem advantages like snapshots,
logging and other advanced features.
(only things i can think of is snapshots to make an immediate backup
but then again we have pg_dump as well as filesystems that can grow)
3) portability, not all OS's support character devices, the ones that
don't can't safely guarantee data write ordering.
4) development time, as the database now has a filesystem module
to maintain.

I'm sure there's a few points that I missed but there are some
advantages, Oracle does or did have a "raw disk" mode.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

#5Bruce Momjian
bruce@momjian.us
In reply to: Alfred Perlstein (#4)
Re: MySQL file system

So what is needed it to be able to mount a .zip or a .tar file as a file
system with the loopback device. Mounting a database on the filesystem
sounds pretty useless to me.

Raw disk access allows:

1) removing the double buffering problem associated with read/write + shm
2) your own filesystem semantics and explicit block flushing versus whole
file flushing (fsync)
3) removing layers of abstraction, you loose the FS abstraction and allow
the database abstraction to take over which shortens the code path
and gives complete control over the data to the database.

However it limits:

1) doing backups, you must either have a utility that can read the
'dbfs' or use 'dd' to copy the entire disk.
(this is ok, we have pg_dump)
2) taking advantage of built in filesystem advantages like snapshots,
logging and other advanced features.
(only things i can think of is snapshots to make an immediate backup
but then again we have pg_dump as well as filesystems that can grow)
3) portability, not all OS's support character devices, the ones that
don't can't safely guarantee data write ordering.
4) development time, as the database now has a filesystem module
to maintain.

I'm sure there's a few points that I missed but there are some
advantages, Oracle does or did have a "raw disk" mode.

I think the conclusion we came up to in the past is that raw database
access was important in the 80's with older file systems, but modern
ones don't need raw access, and most database vendors are moving away
from recommanding raw access.

[ FYI, Linux ext2 is not a modern file system.]

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#6Karl DeBisschop
karl@debisschop.net
In reply to: Bruce Momjian (#5)
Re: MySQL file system

Bruce Momjian wrote:

So what is needed it to be able to mount a .zip or a .tar file as a file
system with the loopback device. Mounting a database on the filesystem
sounds pretty useless to me.

Which can be done with cramfs on Linux. Still could be a cute trick with PostgreSQL that could be somewhat handy from time to time.

<snip>

I'm sure there's a few points that I missed but there are some
advantages, Oracle does or did have a "raw disk" mode.

I think the conclusion we came up to in the past is that raw database
access was important in the 80's with older file systems, but modern
ones don't need raw access, and most database vendors are moving away
from recommanding raw access.

I think this was a different sense of 'raw access' - it meant 'allow the user to see a database as if it were a filesystem' if I read the post correctly. Not the same issue, I think.

--
Karl DeBisschop kdebisschop@alert.infoplease.com
Learning Network/Information Please http://www.infoplease.com
Netsaint Plugin Developer kdebisschop@users.sourceforge.net

#7J.H.M. Dassen (Ray)
jdassen@cistron-office.nl
In reply to: Joseph Shraibman (#3)
Re: MySQL file system

On Tue, Jan 16, 2001 at 22:21:19 -0500, Joseph Shraibman wrote:

So what is needed it to be able to mount a .zip or a .tar file as a file
system with the loopback device.

For Linux, there used to be a "userfs" that allowed this kind of thing.
AFAIK development stopped quite some time ago, as there were numerous
problems with deadlocks.

Mounting a database on the filesystem sounds pretty useless to me.

Depends on the database, I guess. There is a succesful configuration
management system that uses a database as its backend and exports
trees/branches/versions via NFS. This combines the ease of use of a
filesystem (rather than "projects" in some IDE) with the reliability
(transactions) provided by a DBMS.

Ray
--
"Never trust a poll you haven't rigged yourself."

#8Marc SCHAEFER
schaefer@alphanet.ch
In reply to: J.H.M. Dassen (Ray) (#7)
Re: Re: MySQL file system

On Wed, 17 Jan 2001, J.H.M. Dassen (Ray) wrote:

So what is needed it to be able to mount a .zip or a .tar file as a file
system with the loopback device.

For Linux, there used to be a "userfs" that allowed this kind of thing.
AFAIK development stopped quite some time ago, as there were numerous
problems with deadlocks.

I am working (since a long time now), on a migration filesystem where you
can extend a filesystem with data on tapes, NFS, etc, using a stacking
filesystem (FIST). Migration can be scheduled or automatic, when the fs
is full. It only migrates/unmigrates complete files.

http://www-internal.alphanet.ch/~schaefer/mfs.html

It's a bit your poor man's HSM. It is supposed it will be one day merged
with filed (http://www-internal.alphanet.ch/~schaefer/filed.html).

That's Linux-only, however, and still very gamma.

#9Michael Ansley
Michael.Ansley@intec-telecom-systems.com
In reply to: Marc SCHAEFER (#8)
RE: MySQL file system

Um, I think that file servers may have use for it (DBFS), as they will want
to store documents and media files in a database with various plug-ins
available to be able to get into files, and more importantly, to be able to
find them quickly when required. How do I find all documents with the
keyword 'telecommunications'? I could do a grep -r, or I could have it all
stored in a database with plug-ins having extracted the keywords on insert,
and run an SQL query (there are better examples, this is just a simple one).
I think the file system idea is a good one, *in a niche area*, and I think
that Postgres is probably better suited to doing it than mySQL, given some
of the things that Postgres can do, and the ORDB descriptor.

In Ned's case, I agree that a .tar.gz mount (well it's not actually a mount,
simply a smart file browser) is probably the best solution. It's the way I
do it, and I'm reasonably happy with it.

Cheers...

MikeA

-----Original Message-----
From: Joseph Shraibman [mailto:jks@selectacast.net]
Sent: 17 January 2001 03:21
To: Karl DeBisschop
Cc: Ned Lilly; PostgreSQL General
Subject: Re: [GENERAL] MySQL file system

Karl DeBisschop wrote:

Ned Lilly wrote:

Anyone heard about this?

http://no.spam.ee/~tonu/mysqlfs.html

I was bout to copy about 3000 RFCs onto my hard drive last night.
Bowsing them online was just too painfully slow.

Then I realized that although I had the disk space for the total file
size, they were mostly smaller than a block on my disk, so I was going
to loose alot of space.

My first thought was, well postgress would stor that as one file - that
might work. Then I thought "I'm trying to be FAST here -- any time spent
on this is a distraction. Too bad there is no such thing as
PostgreSQL/FS"

So what is needed it to be able to mount a .zip or a .tar file as a file
system with the loopback device. Mounting a database on the filesystem
sounds pretty useless to me.

--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio. http://www.targabot.com

**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
Nick West - Global Infrastructure Manager.

This footnote also confirms that this email message has been swept by
MIMEsweeper for the presence of computer viruses.

www.mimesweeper.com
**********************************************************************

#10Robert D. Nelson
RDNELSON@co.centre.pa.us
In reply to: Michael Ansley (#9)
RE: MySQL file system

Then I realized that although I had the disk space for the total file
size, they were mostly smaller than a block on my disk, so I was going
to loose alot of space.

The solution to this is sub-allocation, which Novell has achieved. NOT a DB
filesystem.

It's not earth shattering, and sure there are other solutions. But it's
cute and could be useful. Add some bugs and a few security compromises,
and it almost looks like M$.

Heh, except the stuff from Microsoft is likely more useful ;)

Rob Nelson
rdnelson@co.centre.pa.us

#11Robert D. Nelson
RDNELSON@co.centre.pa.us
In reply to: Robert D. Nelson (#10)
RE: MySQL file system

Raw disk access allows:

If I'm correct, mysql is providing a filesystem, not a way to access raw
disk, like Oracle does. Huge difference there - with a filesystem, you have
overhead of FS *and* SQL at the same time.

Rob Nelson
rdnelson@co.centre.pa.us

#12Fausto Guzzetti
Fausto.Guzzetti@irpi.pg.cnr.it
In reply to: Robert D. Nelson (#10)
Re: MySQL file system

Hello,

I have what I think is a trivial problem. But I wasn't able to find a solution
in the FAQ or in the docs I have.

It has to do with case sensitivity. I am working with an application of
PostrreSQL (7.02) and PHP4.

My table name is (for example) avi_f_amm and contains a field named Number
(capital N)

The query select * from avi_f_amm works fine.
The query select * from avi_f_amm where Number gt 1 does not because number
(small cap n) is to a field in the table.
In other words Postgresql does not recognize "Number" and thinks it is "number"

Any idea of what is going on or where should I look in the documentation?

Fausto

--
Fausto Guzzetti
CNR - IRPI Perugia
via della Madonna Alta, 126
06128 PERUGIA (Italy)
Tel. +39 075.505.4943
Fax. +39 075.505.1325
E.mail F.Guzzetti@irpi.pg.cnr.it

#13Henshall, Stuart - WCP
SHenshall@westcountrypublications.co.uk
In reply to: Fausto Guzzetti (#12)
RE: MySQL file system

I believe "Number" should work, refer to the postgres Manual:
http://www.postgresql.org/docs/user/x524.htm
- Stuart

Show quoted text

-----Original Message-----
From: Fausto Guzzetti [SMTP:Fausto.Guzzetti@irpi.pg.cnr.it]
Sent: Wednesday, January 17, 2001 1:21 PM
To: PostgreSQL General
Subject: Re: MySQL file system

Hello,

I have what I think is a trivial problem. But I wasn't able to find a
solution
in the FAQ or in the docs I have.

It has to do with case sensitivity. I am working with an application of
PostrreSQL (7.02) and PHP4.

My table name is (for example) avi_f_amm and contains a field named Number
(capital N)

The query select * from avi_f_amm works fine.
The query select * from avi_f_amm where Number gt 1 does not because
number
(small cap n) is to a field in the table.
In other words Postgresql does not recognize "Number" and thinks it is
"number"

Any idea of what is going on or where should I look in the documentation?

Fausto

--
Fausto Guzzetti
CNR - IRPI Perugia
via della Madonna Alta, 126
06128 PERUGIA (Italy)
Tel. +39 075.505.4943
Fax. +39 075.505.1325
E.mail F.Guzzetti@irpi.pg.cnr.it

#14Prasanth Kumar
kumar1@home.com
In reply to: Robert D. Nelson (#10)
Re: MySQL file system

<snip>

Then I realized that although I had the disk space for the total file
size, they were mostly smaller than a block on my disk, so I was going
to loose alot of space.

The solution to this is sub-allocation, which Novell has achieved. NOT a

DB

filesystem.

It's not earth shattering, and sure there are other solutions. But it's
cute and could be useful. Add some bugs and a few security compromises,
and it almost looks like M$.

Heh, except the stuff from Microsoft is likely more useful ;)

Rob Nelson
rdnelson@co.centre.pa.us

Doesn't ReiserFS support sub-allocation? I have never used it but remember
that it was supposed to be efficient in storage of small files.

#15Alfred Perlstein
bright@wintelcom.net
In reply to: Robert D. Nelson (#11)
Re: MySQL file system

* Robert D. Nelson <RDNELSON@co.centre.pa.us> [010117 05:17] wrote:

Raw disk access allows:

If I'm correct, mysql is providing a filesystem, not a way to access raw
disk, like Oracle does. Huge difference there - with a filesystem, you have
overhead of FS *and* SQL at the same time.

Oh, so it's sort of like /proc for mysql?

What a terrible waste of time and resources. :(

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

#16Adam Lang
aalang@rutgersinsurance.com
In reply to: Robert D. Nelson (#11)
Re: MySQL file system

I wasn't following the thread too closely, but database for a filesystem has
been done. BeOS uses a database for a filesystem as well as AS/400 and
Mainframes.

Adam Lang
Systems Engineer
Rutgers Casualty Insurance Company
http://www.rutgersinsurance.com
----- Original Message -----
From: "Alfred Perlstein" <bright@wintelcom.net>
To: "Robert D. Nelson" <RDNELSON@co.centre.pa.us>
Cc: "Joseph Shraibman" <jks@selectacast.net>; "Karl DeBisschop"
<karl@debisschop.net>; "Ned Lilly" <ned@greatbridge.com>; "PostgreSQL
General" <pgsql-general@postgresql.org>
Sent: Wednesday, January 17, 2001 12:23 PM
Subject: Re: [GENERAL] MySQL file system

* Robert D. Nelson <RDNELSON@co.centre.pa.us> [010117 05:17] wrote:

Raw disk access allows:

If I'm correct, mysql is providing a filesystem, not a way to access raw
disk, like Oracle does. Huge difference there - with a filesystem, you

have

Show quoted text

overhead of FS *and* SQL at the same time.

Oh, so it's sort of like /proc for mysql?

What a terrible waste of time and resources. :(

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

#17Patrick Welche
prlw1@newn.cam.ac.uk
In reply to: Fausto Guzzetti (#12)
Re: Re: MySQL file system

On Wed, Jan 17, 2001 at 02:21:01PM +0100, Fausto Guzzetti wrote:

Hello,

I have what I think is a trivial problem. But I wasn't able to find a solution
in the FAQ or in the docs I have.

It has to do with case sensitivity. I am working with an application of
PostrreSQL (7.02) and PHP4.

My table name is (for example) avi_f_amm and contains a field named Number
(capital N)

The query select * from avi_f_amm works fine.
The query select * from avi_f_amm where Number gt 1 does not because number
(small cap n) is to a field in the table.
In other words Postgresql does not recognize "Number" and thinks it is "number"

Any idea of what is going on or where should I look in the documentation?

Try:

select * from avi_f_amm where "Number" > 1;

Cheers,

Patrick

#18Gordan Bobic
gordan@freeuk.com
In reply to: Marc SCHAEFER (#8)
Tuning queries and distinct behaviour

Is there a way to tune queries?

I'm doing queries that join around 5-6 tables. All join fields are indexed
either in hash (where tables are small enough and join is done on "="), or
btree (big tables, not joined on "="). The tables have between several
hundred and several tens of millions of records. The problem is that this
tends to take a _VERY_ long time. I know that I'm asking for a bit much on
such a huge task, but if I break the queries down manually into 10 simper
ones that I could run manually with temporary tables, each of those would
take a few seconds at most. The optimizer occasionally decides to do
sequential scans, and this is probably what is killing the performance. Is
there any way to give the optimizer hints? I suspect that it would help in
a vast majority of cases if it looked ad the where clauses in views and
selects on views and tried cutting down the working set through that, and
then pruning down the rest as it went along. It just seems a bit strange
that doing subqueries with temporary tables should be so much faster.

Doing VACUUM ANALYZE often helps, but not always. Is there any way to give
the optimizer hints on how to speed up selects on views that do big joins,
both inner and outer?

Another question - I have to do a join on the "<" operation. Something
like:

SELECT * FROM Table1, Table2 WHERE Table1.Field1 < Table2.Field2 ORDER BY
Table1.Field1 DESC, Table2.Field2 DESC;

This will give me a very large set of records. However, I only really need
a few of those records. I only want the highest Field1, Field2 combination
records for some other field in Table1. Effectively, something like:

CREATE VIEW SomeView AS SELECT * FROM Table1, Table2 WHERE Table1.Field1 <
Table2.Field2 ORDER BY Table1.Field1 DESC, Table2.Field2 DESC;

and then doing:

SELECT DISTINCT ON (Table1.Field3) * FROM SomeView. I would hope that
DISTINCT would pick the first record returned for each of the different
occurences in SomeView.

Unfortunately it doesn't. By having a quick scan at the returned records,
it doesn't seem to pick them in any particular order. Is this the case? And
is there an SQL standard that says which records should DISTINCT return
(first, last, or arbitrary)? And is there a way to enforce this behaviour,
just in case the default behaviour changes in the future?

Alternatively, can anyone think of a solution to this problem?

Thanks.

Gordan

#19Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Gordan Bobic (#18)
Re: Tuning queries and distinct behaviour

On Thu, 18 Jan 2001, Gordan Bobic wrote:

Is there a way to tune queries?

I'm doing queries that join around 5-6 tables. All join fields are indexed
either in hash (where tables are small enough and join is done on "="), or
btree (big tables, not joined on "="). The tables have between several
hundred and several tens of millions of records. The problem is that this
tends to take a _VERY_ long time. I know that I'm asking for a bit much on
such a huge task, but if I break the queries down manually into 10 simper
ones that I could run manually with temporary tables, each of those would
take a few seconds at most. The optimizer occasionally decides to do
sequential scans, and this is probably what is killing the performance. Is
there any way to give the optimizer hints? I suspect that it would help in
a vast majority of cases if it looked ad the where clauses in views and
selects on views and tried cutting down the working set through that, and
then pruning down the rest as it went along. It just seems a bit strange
that doing subqueries with temporary tables should be so much faster.

Is it deciding that a lot of rows will match when it does the sequential
scans? I guess a copy of the schema and query would help.

Doing VACUUM ANALYZE often helps, but not always. Is there any way to give
the optimizer hints on how to speed up selects on views that do big joins,
both inner and outer?

Another question - I have to do a join on the "<" operation. Something
like:

SELECT * FROM Table1, Table2 WHERE Table1.Field1 < Table2.Field2 ORDER BY
Table1.Field1 DESC, Table2.Field2 DESC;

This will give me a very large set of records. However, I only really need
a few of those records. I only want the highest Field1, Field2 combination
records for some other field in Table1. Effectively, something like:

CREATE VIEW SomeView AS SELECT * FROM Table1, Table2 WHERE Table1.Field1 <
Table2.Field2 ORDER BY Table1.Field1 DESC, Table2.Field2 DESC;

and then doing:

SELECT DISTINCT ON (Table1.Field3) * FROM SomeView. I would hope that
DISTINCT would pick the first record returned for each of the different
occurences in SomeView.

Unfortunately it doesn't. By having a quick scan at the returned records,
it doesn't seem to pick them in any particular order. Is this the case? And
is there an SQL standard that says which records should DISTINCT return
(first, last, or arbitrary)? And is there a way to enforce this behaviour,
just in case the default behaviour changes in the future?

DISTINCT ON is a non-standard "feature" and yes, without additional help,
it will not pick a particular row (well, it's probably the first one
it comes across). I believe if you use order by you can get the
first/last by the metric you ordered by. So if you did Field3, Field1
DESC, Field2 DESC in the order by it should give you the highest Field1
valued row, and then higest Field2 for ties.

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gordan Bobic (#18)
Re: Tuning queries and distinct behaviour

"Gordan Bobic" <gordan@freeuk.com> writes:

It just seems a bit strange
that doing subqueries with temporary tables should be so much faster.

Hard to respond to this complaint without a specific example.

SELECT DISTINCT ON (Table1.Field3) * FROM SomeView. I would hope that
DISTINCT would pick the first record returned for each of the different
occurences in SomeView.

It does, but kindly notice that you have not constrained what the first
record returned is. See the example of SELECT DISTINCT ON usage in the
SELECT reference page,
http://www.postgresql.org/users-lounge/docs/7.0/postgres/sql-select.htm

regards, tom lane

#21Gordan Bobic
gordan@freeuk.com
In reply to: Marc SCHAEFER (#8)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gordan Bobic (#21)
#23Mike Hoskins
mikehoskins@yahoo.com
In reply to: Adam Lang (#16)
#24Mike Hoskins
mikehoskins@yahoo.com
In reply to: Adam Lang (#16)
#25Paul M Foster
paulf@quillandmouse.com
In reply to: Mike Hoskins (#23)
#26Lincoln Yeoh
lyeoh@pop.jaring.my
In reply to: Mike Hoskins (#24)