sorting big tables :(

Started by Michael Richards in May 1998; 26 messages
#1Michael Richards
miker@scifair.acadiau.ca

I think soon people are going to start calling me Mr. Big...Tables...

I have a big table. 40M rows.
On the disk, its size is:
2,090,369,024 bytes. So 2 gigs. On a 9 gig drive I can't sort this table.
How should one decide based on table size how much room is needed?

Also, this simple table, consisting of only 2 int4 values per row, is
exactly the same size as a table with the same number of rows consisting
of only one int2. There seems to be too much overhead here. I realise
there are extra things that have to be saved, but I am not getting the
size/performance I had hoped for... I am starting to think this segment
of the database would be better implemented without a dbms, because it
is not expected to change at all...

-Mike

#2Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Michael Richards (#1)
Re: [HACKERS] sorting big tables :(

I think soon people are going to start calling me Mr. Big...Tables...

I have a big table. 40M rows.
On the disk, its size is:
2,090,369,024 bytes. So 2 gigs. On a 9 gig drive I can't sort this table.
How should one decide based on table size how much room is needed?

Also, this simple table, consisting of only 2 int4 values per row, is
exactly the same size as a table with the same number of rows consisting
of only one int2. There seems to be too much overhead here. I realise
there are extra things that have to be saved, but I am not getting the
size/performance I had hoped for... I am starting to think this segment
of the database would be better implemented without a dbms, because it
is not expected to change at all...

It is taking so much disk space because it is using a TAPE sorting
method: breaking the file into tape chunks, sorting in pieces, then
merging.

Can you try increasing your postgres -S parameter to some huge amount like 32MB
and see if that helps? It should.

i.e.

postmaster -i -B 400 $DEBUG -o '-F -S 1024' "$@" >server.log 2>&1

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#3Michael Richards
miker@scifair.acadiau.ca
In reply to: Bruce Momjian (#2)
Re: [HACKERS] sorting big tables :(

On Fri, 15 May 1998, Bruce Momjian wrote:

I have a big table. 40M rows.
On the disk, its size is:
2,090,369,024 bytes. So 2 gigs. On a 9 gig drive I can't sort this table.
How should one decide based on table size how much room is needed?

It is taking so much disk space because it is using a TAPE sorting
method: breaking the file into tape chunks, sorting in pieces, then merging.

The files grow until I have 6 files of almost a gig each. At that point, I
start running out of space...
This TAPE sorting method: is it a simple merge sort? Do you know of a
way this could be done using constant space and no more complexity in
the algorithm? Even if it is a little slower, the DBMS could decide,
based on the table size, whether it should use the tape sort or another
one... Bubble sort would not be my first choice tho :)

-Mike

#4Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Michael Richards (#3)
Re: [HACKERS] sorting big tables :(

On Fri, 15 May 1998, Bruce Momjian wrote:

I have a big table. 40M rows.
On the disk, its size is:
2,090,369,024 bytes. So 2 gigs. On a 9 gig drive I can't sort this table.
How should one decide based on table size how much room is needed?

It is taking so much disk space because it is using a TAPE sorting
method: breaking the file into tape chunks, sorting in pieces, then merging.

The files grow until I have 6 files of almost a gig each. At that point, I
start running out of space...
This TAPE sorting method: is it a simple merge sort? Do you know of a
way this could be done using constant space and no more complexity in
the algorithm? Even if it is a little slower, the DBMS could decide,
based on the table size, whether it should use the tape sort or another
one... Bubble sort would not be my first choice tho :)

Tape sort is a standard Knuth sort. It basically sorts in pieces
and merges. If you don't do this, the access pattern gets very poor
as you page fault all over the file, and the cache becomes useless.

There is something optimal about having seven sort files. Not sure what
to suggest. No one has complained about this before.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#5Noname
dg@illustra.com
In reply to: Bruce Momjian (#4)
Re: [HACKERS] sorting big tables :(

On Fri, 15 May 1998, Bruce Momjian wrote:

I have a big table. 40M rows.
On the disk, its size is:
2,090,369,024 bytes. So 2 gigs. On a 9 gig drive I can't sort this table.
How should one decide based on table size how much room is needed?

It is taking so much disk space because it is using a TAPE sorting
method: breaking the file into tape chunks, sorting in pieces, then merging.

The files grow until I have 6 files of almost a gig each. At that point, I
start running out of space...
This TAPE sorting method: is it a simple merge sort? Do you know of a
way this could be done using constant space and no more complexity in
the algorithm? Even if it is a little slower, the DBMS could decide,
based on the table size, whether it should use the tape sort or another
one... Bubble sort would not be my first choice tho :)

Tape sort is a standard Knuth sort. It basically sorts in pieces
and merges. If you don't do this, the access pattern gets very poor
as you page fault all over the file, and the cache becomes useless.

There is something optimal about having seven sort files. Not sure what
to suggest. No one has complained about this before.

I think this is a bug. There is no reason to use more than a little bit over
three times the input size for a sort. This is: input file, run files, output
file. If we are not able to sort a 2 gig table on a 9 gig partition we need
to fix it. I suspect we have a bug in the implementation, but perhaps we
need to look at our choice of algorithm. In any case this problem should go
on the todo list.

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
"Of course, someone who knows more about this will correct me if I'm wrong,
and someone who knows less will correct me if I'm right."
--David Palmer (palmer@tybalt.caltech.edu)

#6The Hermit Hacker
scrappy@hub.org
In reply to: Noname (#5)
Re: [HACKERS] sorting big tables :(

On Sun, 17 May 1998, David Gould wrote:

I think this is a bug. There is no reason to use more than a little bit over
three times the input size for a sort. This is: input file, run files, output
file. If we are not able to sort a 2 gig table on a 9 gig partition we need
to fix it. I suspect we have a bug in the implementation, but perhaps we
need to look at our choice of algorithm. In any case this problem should go
on the todo list.

Have to agree here...

Michael...if you were to dump that table into a text file, how big
would it turn out to be? Much smaller than 2gig, no? Then perform a Unix
sort on that, how long would that take? Then reload the data...

Needing more than 7gig to sort a 2gig table sounds slightly off to
me as well :(

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

#7Michael Richards
miker@scifair.acadiau.ca
In reply to: Bruce Momjian (#4)
Re: [HACKERS] sorting big tables :(

On Sun, 17 May 1998, Bruce Momjian wrote:

I have a big table. 40M rows.
On the disk, its size is:
2,090,369,024 bytes. So 2 gigs. On a 9 gig drive I can't sort this table.
How should one decide based on table size how much room is needed?

Tape sort is a standard Knuth sort. It basically sorts in pieces
and merges. If you don't do this, the access pattern gets very poor
as you page fault all over the file, and the cache becomes useless.

Right. I wasn't reading the right chapter. Internal sorting is much
different from external sorting. Internal sorting suggests the use of a
Quicksort algorithm.
Marc and I discussed this over lunch. If I did a select * into, would it
not make more sense to sort the results into the resulting table rather
than into pieces and then copy into a table? From my limited knowledge,
I think this should save about 8/7 N of the space.
On this issue, I think there must be a lot more overhead than necessary.
The table consists of only
int4, int4, int2
I read that as 10 bytes of actual data per row.
Instead, 2 gigs / 40M rows is about
50 bytes per record.
What is there other than the oid (4 bytes?)?

-Mike

#8Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Michael Richards (#7)
Re: [HACKERS] sorting big tables :(

On Sun, 17 May 1998, Bruce Momjian wrote:

I have a big table. 40M rows.
On the disk, its size is:
2,090,369,024 bytes. So 2 gigs. On a 9 gig drive I can't sort this table.
How should one decide based on table size how much room is needed?

Tape sort is a standard Knuth sort. It basically sorts in pieces
and merges. If you don't do this, the access pattern gets very poor
as you page fault all over the file, and the cache becomes useless.

Right. I wasn't reading the right chapter. Internal sorting is much
different from external sorting. Internal sorting suggests the use of a
Quicksort algorithm.
Marc and I discussed this over lunch. If I did a select * into, would it
not make more sense to sort the results into the resulting table rather
than into pieces and then copy into a table? From my limited knowledge,
I think this should save about 8/7 N of the space.
On this issue, I think there must be a lot more overhead than necessary.

Not sure if the internal tape is the same structure as a real table, but
I doubt it. I seem to remember there is less overhead.

The table consists of only
int4, int4, int2
I read that as 10 bytes of actual data per row.
Instead, 2 gigs / 40M rows is about
50 bytes per record.
What is there other than the oid (4 bytes?)?

Internal stuff so it looks like a real table, even though it is a
result, I think.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#9The Hermit Hacker
scrappy@hub.org
In reply to: Bruce Momjian (#8)
Re: [HACKERS] sorting big tables :(

On Tue, 19 May 1998, Bruce Momjian wrote:

On Sun, 17 May 1998, Bruce Momjian wrote:

I have a big table. 40M rows.
On the disk, its size is:
2,090,369,024 bytes. So 2 gigs. On a 9 gig drive I can't sort this table.
How should one decide based on table size how much room is needed?

Tape sort is a standard Knuth sort. It basically sorts in pieces
and merges. If you don't do this, the access pattern gets very poor
as you page fault all over the file, and the cache becomes useless.

Right. I wasn't reading the right chapter. Internal sorting is much
different from external sorting. Internal sorting suggests the use of a
Quicksort algorithm.
Marc and I discussed this over lunch. If I did a select * into, would it
not make more sense to sort the results into the resulting table rather
than into pieces and then copy into a table? From my limited knowledge,
I think this should save about 8/7 N of the space.
On this issue, I think there must be a lot more overhead than necessary.

Not sure if the internal tape is the same structure as a real table, but
I doubt it. I seem to remember there is less overhead.

The table consists of only
int4, int4, int2
I read that as 10 bytes of actual data per row.
Instead, 2 gigs / 40M rows is about
50 bytes per record.
What is there other than the oid (4 bytes?)?

Internal stuff so it looks like a real table, even though it is a
result, I think.

Okay...I get to jump in here with both feet and arms flailing :)

Michael and I had lunch today and talked about this, and I asked him to
send an email in to the list about it...unfortunately, he didn't translate
our chat very well for here :)

This whole thing makes absolutely no sense to me: why does it take 2.5
times more space to *sort* the table than the table takes itself?

He starts with a 2gig table, and it runs out of disk space on a 9gig file
system...

Now, looking at question 3.26 in the FAQ, we have:

  40 bytes: each row header (approximate)
+ 10 bytes: two int4 fields + one int2 field
+  4 bytes: pointer on page to tuple
----------
  54 bytes per row

The data page size in PostgreSQL is 8192 (8k) bytes, so:

8192 bytes per page
-------------------  =  151 rows per database page (rounded down)
 54 bytes per row

40000000 data rows
------------------  =  264901 database pages
151 rows per page

264901 database pages * 8192 bytes per page = 2,170,068,992 or ~2 gig,
which is in the same ballpark as the actual 2,090,369,024 bytes.

Now, as a text file, this would amount to, what...~50MB?

So, if I were to do a 'copy out' to a text file, a Unix sort and then a
'copy in', I would use up *less* disk space (by several orders of
magnitude) than doing the sort inside of PostgreSQL?

Why?

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org

#10Michal Mosiewicz
mimo@interdata.com.pl
In reply to: The Hermit Hacker (#9)
Re: [HACKERS] sorting big tables :(

The Hermit Hacker wrote:

Now, as a text file, this would amount to, what...~50MB?

40M of records to produce a 50MB text file? How would you sort such a
*compressed* file? ;-)

So, if I were to do a 'copy out' to a text file, a Unix sort and then a
'copy in', I would use up *less* disk space (by several orders of
magnitude) than doing the sort inside of PostgreSQL?

Well, I think it might be optimised slightly. Am I right that postgres
uses heap files (i.e. they look like tables) during sorting? While this
is a merge sort, those files don't have to be table-like files.
Certainly, they could use variable-length records without pages (aren't
they used sequentially anyway?). Moreover, we could consider packing the
tape files before writing them out if necessary. Of course that will
cost some performance, but it's better to lose some performance than to
be unable to sort at all.

Last question... what's the purpose of such a big sort? If somebody
gets 40M sorted records as the result of some query, what is he going to
do with them? Spend the next few years reading the listing? I mean,
isn't it better to query the database for only the necessary information
and then sort that?

Mike

--
WWW: http://www.lodz.pdi.net/~mimo tel: Int. Acc. Code + 48 42 148340
add: Michal Mosiewicz * Bugaj 66 m.54 * 95-200 Pabianice * POLAND

#11The Hermit Hacker
scrappy@hub.org
In reply to: Michal Mosiewicz (#10)
Re: [HACKERS] sorting big tables :(

On Wed, 20 May 1998, Michal Mosiewicz wrote:

The Hermit Hacker wrote:

Now, as a text file, this would amount to, what...~50MB?

40M of records to produce a 50MB text file? How would you sort such a
*compressed* file? ;-)

My math off? 40M rows at 11bytes each (2xint4+int2+\n?) oops...ya, just
off by a factor of ten...still, 500MB is a quarter of the size of the 2gig
file we started with...

So, if I were to do a 'copy out' to a text file, a Unix sort and then a
'copy in', I would use up *less* disk space (by several orders of
magnitude) than doing the sort inside of PostgreSQL?

Well, I think it might be optimised slightly. Am I right that postgres
uses heap files (i.e. they look like tables) during sorting? While this
is a merge sort, those files don't have to be table-like files.
Certainly, they could use variable-length records without pages (aren't
they used sequentially anyway?). Moreover, we could consider packing the
tape files before writing them out if necessary. Of course that will
cost some performance, but it's better to lose some performance than to
be unable to sort at all.

Last question... what's the purpose of such a big sort? If somebody
gets 40M sorted records as the result of some query, what is he going to
do with them? Spend the next few years reading the listing? I mean,
isn't it better to query the database for only the necessary information
and then sort that?

this I don't know...I never even really thought about that,
actually...Michael? :) Only you can answer that one.

#12Bruce Momjian
maillist@candle.pha.pa.us
In reply to: The Hermit Hacker (#11)
Re: [HACKERS] sorting big tables :(

On Wed, 20 May 1998, Michal Mosiewicz wrote:

The Hermit Hacker wrote:

Now, as a text file, this would amount to, what...~50MB?

40M of records to produce a 50MB text file? How would you sort such a
*compressed* file? ;-)

My math off? 40M rows at 11bytes each (2xint4+int2+\n?) oops...ya, just
off by a factor of ten...still, 500MB is a quarter of the size of the 2gig
file we started with...

Actually, my description of the use of tape files was somewhat off.
The file is sorted by putting several batches in each tape file, then
reading those batches to make another tape file with bigger batches,
until there is one tape file and one big sorted batch. Also, if the
data is already sorted, it can do it in one pass, without making all
those small batches, because of the way the data structure sorts them
in memory. Only Knuth can do the description justice, but suffice it
to say that the data can appear in up to two places at once.

This is the first time I remember someone complaining about it.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#13Bruce Momjian
maillist@candle.pha.pa.us
In reply to: The Hermit Hacker (#11)
Re: [HACKERS] sorting big tables :(

Well, I think it might be optimised slightly. Am I right that postgres
uses heap files (i.e. they look like tables) during sorting? While this
is a merge sort, those files don't have to be table-like files.
Certainly, they could use variable-length records without pages (aren't
they used sequentially anyway?). Moreover, we could consider packing the
tape files before writing them out if necessary. Of course that will
cost some performance, but it's better to lose some performance than to
be unable to sort at all.

Last question... what's the purpose of such a big sort? If somebody
gets 40M sorted records as the result of some query, what is he going to
do with them? Spend the next few years reading the listing? I mean,
isn't it better to query the database for only the necessary information
and then sort that?

this I don't know...I never even really thought about that,
actually...Michael? :) Only you can answer that one.

I have an idea. Can he run CLUSTER on the data? If so, the sort will
not use small batches, and the disk space during sort will be reduced.
However, I think CLUSTER will NEVER finish on such a file, unless it is
already pretty well sorted.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#14The Hermit Hacker
scrappy@hub.org
In reply to: Bruce Momjian (#13)
Re: [HACKERS] sorting big tables :(

On Wed, 20 May 1998, Bruce Momjian wrote:

Well, I think it might be optimised slightly. Am I right that postgres
uses heap files (i.e. they look like tables) during sorting? While this
is a merge sort, those files don't have to be table-like files.
Certainly, they could use variable-length records without pages (aren't
they used sequentially anyway?). Moreover, we could consider packing the
tape files before writing them out if necessary. Of course that will
cost some performance, but it's better to lose some performance than to
be unable to sort at all.

Last question... what's the purpose of such a big sort? If somebody
gets 40M sorted records as the result of some query, what is he going to
do with them? Spend the next few years reading the listing? I mean,
isn't it better to query the database for only the necessary information
and then sort that?

this I don't know...I never even really thought about that,
actually...Michael? :) Only you can answer that one.

I have an idea. Can he run CLUSTER on the data? If so, the sort will
not use small batches, and the disk space during sort will be reduced.
However, I think CLUSTER will NEVER finish on such a file, unless it is
already pretty well sorted.

Okay...then we *do* have a table size limit problem? Tables that
just get too large to be manageable? Maybe this is one area we should be
looking at as far as performance is concerned?

One thing that just pop'd to mind, concerning the above CLUSTER
command...what would it take to have *auto-clustering*? Maybe provide a
means of marking a field in a table for this purpose?

One of the things that the Unix FS does is auto-defragmenting, at
least the UFS one does. Whenever the system is idle (from my
understanding), the kernel uses that time to clean up the file systems, to
reduce the file system fragmentation.

This is by no means SQL92, but it would be a neat
"extension"...let me specify a "CLUSTER on" field. Then, as I'm entering
data into the database, periodically check for fragmentation of the data
and clean up accordingly. If done by the system, reasonably often, it
shouldn't take up *too* much time, as most of the data should already be
in order...

That would have the side-benefit of speeding up the "ORDER by" on
that field also...

#15Bruce Momjian
maillist@candle.pha.pa.us
In reply to: The Hermit Hacker (#14)
Re: [HACKERS] sorting big tables :(

I have an idea. Can he run CLUSTER on the data? If so, the sort will
not use small batches, and the disk space during sort will be reduced.
However, I think CLUSTER will NEVER finish on such a file, unless it is
already pretty well sorted.

Okay...then we *do* have a table size limit problem? Tables that
just get too large to be manageable? Maybe this is one area we should be
looking at as far as performance is concerned?

Well, cluster moves one row at a time, so if the table is very
fragmented, the code is slow because it is seeking all over the table.
See the cluster manual page for an alternative solution that uses ORDER
BY.

One thing that just pop'd to mind, concerning the above CLUSTER
command...what would it take to have *auto-clustering*? Maybe provide a
means of marking a field in a table for this purpose?

Hard to do. That's what we have indexes for.

One of the things that the Unix FS does is auto-defragmenting, at
least the UFS one does. Whenever the system is idle (from my
understanding), the kernel uses that time to clean up the file systems, to
reduce the file system fragmentation.

This is by no means SQL92, but it would be a neat
"extension"...let me specify a "CLUSTER on" field. Then, as I'm entering
data into the database, periodically check for fragmentation of the data
and clean up accordingly. If done by the system, reasonably often, it
shouldn't take up *too* much time, as most of the data should already be
in order...

That would have the side-benefit of speeding up the "ORDER by" on
that field also...

We could actually have a CLUSTER ALL command that does this. No one has
implemented it yet.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#16Tom
tom@sdf.com
In reply to: The Hermit Hacker (#14)
Re: [HACKERS] sorting big tables :(

On Wed, 20 May 1998, The Hermit Hacker wrote:

One of the things that the Unix FS does is auto-defragmenting, at
least the UFS one does. Whenever the system is idle (from my
understanding), the kernel uses that time to clean up the file systems, to
reduce the file system fragmentation.

No, that doesn't happen. The only way to eliminate fragmentation is a
dump/newfs/restore cycle. UFS does do fragmentation avoidance (which is
the reason UFS filesystems have a 10% reserve).

Tom

#17The Hermit Hacker
scrappy@hub.org
In reply to: Tom (#16)
Re: [HACKERS] sorting big tables :(

On Wed, 20 May 1998, Tom wrote:

On Wed, 20 May 1998, The Hermit Hacker wrote:

One of the things that the Unix FS does is auto-defragmenting, at
least the UFS one does. Whenever the system is idle (from my
understanding), the kernel uses that time to clean up the file systems, to
reduce the file system fragmentation.

No, that doesn't happen. The only way to eliminate fragmentation is a
dump/newfs/restore cycle. UFS does do fragmentation avoidance (which is
the reason UFS filesystems have a 10% reserve).

Okay, then we have two different understandings of this. My
understanding was that the 10% reserve gave the OS a 'temp area' in which
to move blocks to/from so that it could defrag on the fly...

Am CC'ng this into freebsd-hackers@freebsd.org for a "third
opinion"...am willing to admit I'm wrong *grin*

#18Bruce Momjian
maillist@candle.pha.pa.us
In reply to: The Hermit Hacker (#17)
Re: [HACKERS] sorting big tables :(

Okay, then we have two different understandings of this. My
understanding was that the 10% reserve gave the OS a 'temp area' in which
to move blocks to/from so that it could defrag on the fly...

Am CC'ng this into freebsd-hackers@freebsd.org for a "third
opinion"...am willing to admit I'm wrong *grin*

You are wrong.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#19The Hermit Hacker
scrappy@hub.org
In reply to: Bruce Momjian (#18)
Re: [HACKERS] sorting big tables :(

On Wed, 20 May 1998, Bruce Momjian wrote:

Okay, then we have two different understandings of this. My
understanding was that the 10% reserve gave the OS a 'temp area' in which
to move blocks to/from so that it could defrag on the fly...

Am CC'ng this into freebsd-hackers@freebsd.org for a "third
opinion"...am willing to admit I'm wrong *grin*

You are wrong.

I just love short answers *roll eyes*

#20Bruce Momjian
maillist@candle.pha.pa.us
In reply to: The Hermit Hacker (#19)
Re: [HACKERS] sorting big tables :(

On Wed, 20 May 1998, Bruce Momjian wrote:

Okay, then we have two different understandings of this. My
understanding was that the 10% reserve gave the OS a 'temp area' in which
to move blocks to/from so that it could defrag on the fly...

Am CC'ng this into freebsd-hackers@freebsd.org for a "third
opinion"...am willing to admit I'm wrong *grin*

You are wrong.

I just love short answers *roll eyes*

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)
#21Eivind Eklund
eivind@yes.no
In reply to: The Hermit Hacker (#17)
Re: [HACKERS] sorting big tables :(

On Wed, May 20, 1998 at 01:17:34PM -0400, The Hermit Hacker wrote:

On Wed, 20 May 1998, Tom wrote:

No, that doesn't happen. The only way to eliminate fragmentation is a
dump/newfs/restore cycle. UFS does do fragmentation avoidance (which is
the reason UFS filesystems have a 10% reserve).

Okay, then we have two different understandings of this. My
understanding was that the 10% reserve gave the OS a 'temp area' in which
to move blocks to/from so that it could defrag on the fly...

No. What is done is (quite correctly) fragmentation avoidance. Big
files are even sometimes fragmented on purpose, to allow small files
that are written later to avoid being fragmented.

Eivind.

#22Stupor Genius
stuporg@erols.com
In reply to: Michal Mosiewicz (#10)
RE: [HACKERS] sorting big tables :(

Last question... what's the purpose of such a big sort? If somebody
gets 40M sorted records as the result of some query, what is he going to
do with them? Spend the next few years reading the listing? I mean,
isn't it better to query the database for only the necessary information
and then sort that?

Not all query results are for human eyes. I'd venture to say that more
queries are fed into report generators for formatting than are looked at
directly from psql.

A sort is required in some cases even when it is not explicitly requested.

For example, a GROUP BY clause. You _could_ get the data back ungrouped,
but then you'd have to pipe it to another application or script to do the
sorting and then the grouping (or perhaps group on the fly). But then
perhaps that app/script will eat all the memory or disk space and you'd
be in the same pickle as before.

darrenk

#23Oleg Broytmann
phd@comus.ru
In reply to: The Hermit Hacker (#17)
Re: [HACKERS] sorting big tables :(

Hello!

On Wed, 20 May 1998, The Hermit Hacker wrote:

No, that doesn't happen. The only way to eliminate fragmentation is a
dump/newfs/restore cycle. UFS does do fragmentation avoidance (which is
the reason UFS filesystems have a 10% reserve).

Okay, then we have two different understandings of this. My
understanding was that the 10% reserve gave the OS a 'temp area' in which
to move blocks to/from so that it could defrag on the fly...

No, you are wrong. This 10% is a reserve for emergency situations -
when root brings the system down to single-user mode to do system
maintenance.

Oleg.
----
Oleg Broytmann http://members.tripod.com/~phd2/ phd2@earthling.net
Programmers don't die, they just GOSUB without RETURN.

#24The Hermit Hacker
scrappy@hub.org
In reply to: Oleg Broytmann (#23)
Re: [HACKERS] sorting big tables :(

On Thu, 21 May 1998, Oleg Broytmann wrote:

Hello!

On Wed, 20 May 1998, The Hermit Hacker wrote:

No, that doesn't happen. The only way to eliminate fragmentation is a
dump/newfs/restore cycle. UFS does do fragmentation avoidance (which is
the reason UFS filesystems have a 10% reserve).

Okay, then we have two different understandings of this. My
understanding was that the 10% reserve gave the OS a 'temp area' in which
to move blocks to/from so that it could defrag on the fly...

No, you are wrong. This 10% is a reserve for emergency situations -
when root brings the system down to single-user mode to do system
maintenance.

Actually, in this one you are only partly right. Only root has
*access* to using that extra 10%, but, as I've been corrected by several
ppl, including a couple on the FreeBSD list, that 10% is meant to
*prevent/reduce* fragmentation.

#25Andreas Zeugswetter
andreas.zeugswetter@telecom.at
In reply to: The Hermit Hacker (#24)
Re: [HACKERS] sorting big tables :(

I have an idea. Can he run CLUSTER on the data? If so, the sort will
not use small batches, and the disk space during sort will be reduced.

I think a real winner would be to use an existing index. This is what others do
to eliminate a sort completely. Of course the optimizer has to choose what is
cheaper on a per-query basis (index access or sort of the result set):
result set small --> use sort
result set large --> use available index

Andreas

#26Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Andreas Zeugswetter (#25)
Re: [HACKERS] sorting big tables :(

I have an idea. Can he run CLUSTER on the data? If so, the sort will
not use small batches, and the disk space during sort will be reduced.

I think a real winner would be to use an existing index. This is what others do
to eliminate a sort completely. Of course the optimizer has to choose what is
cheaper on a per-query basis (index access or sort of the result set):
result set small --> use sort
result set large --> use available index

Keep in mind an index is going to be seeking all over the table, making
the cache of limited use. Sometimes, when doing a join, the optimizer
chooses a sequential scan rather than using an index for this reason, and
the sequential scan is faster.

-- 
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  +  If your life is a hard drive,     |  (610) 353-9879(w)
  +  Christ can be your backup.        |  (610) 853-3000(h)