tables > 1 gig
Config.h has this. Does this need to be updated because we can't vacuum
multi-segment relations? I have changed it to 7F000000:
/*
* RELSEG_SIZE is the maximum number of blocks allowed in one disk file.
* Thus, the maximum size of a single file is RELSEG_SIZE * BLCKSZ;
* relations bigger than that are divided into multiple files.
*
* CAUTION: RELSEG_SIZE * BLCKSZ must be less than your OS' limit on file
* size. This is typically 2Gb or 4Gb in a 32-bit operating system. By
* default, we make the limit 1Gb to avoid any possible integer-overflow
* problems within the OS. A limit smaller than necessary only means we
* divide a large relation into more chunks than necessary, so it seems
* best to err in the direction of a small limit. (Besides, a power-of-2
* value saves a few cycles in md.c.)
*
* CAUTION: you had best do an initdb if you change either BLCKSZ or
* RELSEG_SIZE.
*/
#define RELSEG_SIZE (0x40000000 / BLCKSZ)
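For reference, a standalone sketch of the arithmetic involved (assuming the default BLCKSZ of 8192; illustrative C, not part of config.h or of this thread): the 0x40000000 limit gives 131072 blocks per segment (exactly 1 GB), while 0x7F000000 would give 260096 blocks (just under 2 GB, and no longer a power of 2).

#include <stdio.h>

#define BLCKSZ 8192

int main(void)
{
    int relseg_1gb = 0x40000000 / BLCKSZ;   /* current default */
    int relseg_7f  = 0x7F000000 / BLCKSZ;   /* proposed value */

    /* maximum bytes in one segment file = RELSEG_SIZE * BLCKSZ */
    printf("0x40000000: %d blocks, %ld bytes per segment\n",
           relseg_1gb, (long) relseg_1gb * BLCKSZ);
    printf("0x7F000000: %d blocks, %ld bytes per segment\n",
           relseg_7f, (long) relseg_7f * BLCKSZ);
    return 0;
}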
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes:
Config.h has this. Does this need to be updated because we can't vacuum
multi-segment relations? I have changed it to 7F000000:
Why? I thought we'd fixed the mdtruncate issue.
regards, tom lane
Bruce Momjian <maillist@candle.pha.pa.us> writes:
Config.h has this. Does this need to be updated because we can't vacuum
multi-segment relations? I have changed it to 7F000000:
Why? I thought we'd fixed the mdtruncate issue.
regards, tom lane
I am told we did not by Hiroshi. It was news to me too.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes:
Config.h has this. Does this need to be updated because we can't vacuum
multi-segment relations? I have changed it to 7F000000:
Why? I thought we'd fixed the mdtruncate issue.
I am told we did not by Hiroshi. It was news to me too.
Then we'd better fix the underlying problem. We can't change
RELSEG_SIZE for a minor release, unless you want to give up the
principle of not forcing initdb at minor releases.
regards, tom lane
Bruce Momjian <maillist@candle.pha.pa.us> writes:
Config.h has this. Does this need to be updated because we can't vacuum
multi-segment relations? I have changed it to 7F000000:
Why? I thought we'd fixed the mdtruncate issue.
I am told we did not by Hiroshi. It was news to me too.
Then we'd better fix the underlying problem. We can't change
RELSEG_SIZE for a minor release, unless you want to give up the
principle of not forcing initdb at minor releases.
Why can't we increase it?
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes:
Then we'd better fix the underlying problem. We can't change
RELSEG_SIZE for a minor release, unless you want to give up the
principle of not forcing initdb at minor releases.
Why can't we increase it?
Consider a 1.5-gig table. 6.5 will store it as one gig in file "table",
one-half gig in file "table.1". Now recompile with larger RELSEG_SIZE.
The file manager will now expect to find all blocks of the relation in
file "table", and will never go to "table.1" at all. Presto, you lost
a bunch of data.
Bottom line is just as it says in the config.h comments: you can't
change either BLCKSZ or RELSEG_SIZE without doing initdb.
regards, tom lane
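To make the failure mode concrete, here is a minimal sketch (not the actual md.c code, just the same arithmetic) of how a block number maps to a segment file and an offset within it; enlarge RELSEG_SIZE and every block that used to live in "table.1" maps back into segment 0 of "table", where the file manager will never find it:

#include <stdio.h>

#define BLCKSZ      8192
#define RELSEG_SIZE (0x40000000 / BLCKSZ)

/* Illustrative only: which segment file ("table", "table.1", ...) a
 * given block number lives in, and at what byte offset. */
static void locate_block(int blkno, int *segno, long *offset)
{
    *segno  = blkno / RELSEG_SIZE;
    *offset = (long) (blkno % RELSEG_SIZE) * BLCKSZ;
}

int main(void)
{
    int  segno;
    long offset;

    /* a block that falls in "table.1" for a 1.5-gig relation */
    locate_block(150000, &segno, &offset);
    printf("block 150000 -> segment %d, offset %ld\n", segno, offset);
    return 0;
}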
Bruce Momjian <maillist@candle.pha.pa.us> writes:
Then we'd better fix the underlying problem. We can't change
RELSEG_SIZE for a minor release, unless you want to give up the
principle of not forcing initdb at minor releases.
Why can't we increase it?
Consider a 1.5-gig table. 6.5 will store it as one gig in file "table",
one-half gig in file "table.1". Now recompile with larger RELSEG_SIZE.
The file manager will now expect to find all blocks of the relation in
file "table", and will never go to "table.1" at all. Presto, you lost
a bunch of data.
Bottom line is just as it says in the config.h comments: you can't
change either BLCKSZ or RELSEG_SIZE without doing initdb.
OK. I will reverse it out. I never thought that far ahead. Not sure
how we can fix this easily, nor do I understand why more people aren't
complaining about not being able to vacuum tables that are 1.5 gigs that
they used to be able to vacuum.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes:
... nor do I understand why more people aren't
complaining about not being able to vacuum tables that are 1.5 gigs that
they used to be able to vacuum.
Most likely, not very many people with tables that big have adopted 6.5
yet ... if I were running a big site, I'd probably wait for 6.5.1 on
general principles ;-)
I think what we ought to do is finish working out how to make mdtruncate
safe for concurrent backends, and then do it. That's the right
long-term answer anyway.
regards, tom lane
Bruce Momjian <maillist@candle.pha.pa.us> writes:
... nor do I understand why more people aren't
complaining about not being able to vacuum tables that are 1.5 gigs that
they used to be able to vacuum.
Most likely, not very many people with tables that big have adopted 6.5
yet ... if I were running a big site, I'd probably wait for 6.5.1 on
general principles ;-)
I think what we ought to do is finish working out how to make mdtruncate
safe for concurrent backends, and then do it. That's the right
long-term answer anyway.
Problem is, no one knows how right now. I liked unlinking every
segment, but was told by Hiroshi that causes a problem with concurrent
access and vacuum because the old backends still think it is there.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes:
Config.h has this. Does this need to be updated because we can't vacuum
multi-segment relations? I have changed it to 7F000000:
Why? I thought we'd fixed the mdtruncate issue.
I am told we did not by Hiroshi. It was news to me too.
Then we'd better fix the underlying problem. We can't change
RELSEG_SIZE for a minor release, unless you want to give up the
principle of not forcing initdb at minor releases.
No initdb for minor releases!
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes:
I think what we ought to do is finish working out how to make mdtruncate
safe for concurrent backends, and then do it. That's the right
long-term answer anyway.
Problem is, no one knows how right now. I liked unlinking every
segment, but was told by Hiroshi that causes a problem with concurrent
access and vacuum because the old backends still think it is there.
I haven't been paying much attention, but I imagine that what's really
going on here is that once vacuum has collected all the still-good
tuples at the front of the relation, it doesn't bother to go through
the remaining blocks of the relation and mark everything dead therein?
It just truncates the file after the last block that it put tuples into,
right?
If this procedure works correctly for vacuuming a simple one-segment
table, then it would seem that truncation of all the later segments to
zero length should work correctly.
You could truncate to zero length *and* then unlink the files if you
had a mind to do that, but I can see why unlink without truncate would
not work reliably.
regards, tom lane
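A rough sketch of the scheme Tom describes, using plain POSIX truncate() and a hypothetical segment-naming helper rather than the real smgr/vfd machinery (the actual change is the mdtruncate() patch that appears later in this thread):

#include <stdio.h>
#include <unistd.h>     /* truncate() */

#define BLCKSZ      8192
#define RELSEG_SIZE (0x40000000 / BLCKSZ)

/* Hypothetical helper: cut segment file 'segno' of relation 'relname'
 * ("table", "table.1", "table.2", ...) down to 'bytes' bytes. */
static int truncate_segment(const char *relname, int segno, long bytes)
{
    char path[1024];

    if (segno == 0)
        snprintf(path, sizeof(path), "%s", relname);
    else
        snprintf(path, sizeof(path), "%s.%d", relname, segno);
    return truncate(path, bytes);
}

/* Shrink a relation from old_nblocks to new_nblocks: segments past the
 * new end are truncated to zero length but the files are left in place,
 * so other backends' open descriptors stay valid. */
static int shrink_relation(const char *relname, int new_nblocks, int old_nblocks)
{
    int newseg = new_nblocks / RELSEG_SIZE;
    int oldseg = old_nblocks / RELSEG_SIZE;
    int segno;

    for (segno = oldseg; segno > newseg; segno--)
        if (truncate_segment(relname, segno, 0) < 0)
            return -1;
    return truncate_segment(relname, newseg,
                            (long) (new_nblocks - newseg * RELSEG_SIZE) * BLCKSZ);
}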
Bruce Momjian <maillist@candle.pha.pa.us> writes:
I think what we ought to do is finish working out how to make mdtruncate
safe for concurrent backends, and then do it. That's the right
long-term answer anyway.
Problem is, no one knows how right now. I liked unlinking every
segment, but was told by Hiroshi that causes a problem with concurrent
access and vacuum because the old backends still think it is there.
I haven't been paying much attention, but I imagine that what's really
going on here is that once vacuum has collected all the still-good
tuples at the front of the relation, it doesn't bother to go through
the remaining blocks of the relation and mark everything dead therein?
It just truncates the file after the last block that it put tuples into,
right?
If this procedure works correctly for vacuuming a simple one-segment
table, then it would seem that truncation of all the later segments to
zero length should work correctly.
You could truncate to zero length *and* then unlink the files if you
had a mind to do that, but I can see why unlink without truncate would
not work reliably.
That seems like the issue. The more complex problem is that when the
relation loses a segment via vacuum, things go strange on the other
backends. Hiroshi seems to have a good testbed for this, and I thought
it was fixed, so I didn't notice.
Unlinking allows other backends to keep their open segments of the
tables, but that causes some problems with backends opening segments
that they think still exist but can no longer be opened.
Truncating segments causes problems because backends are still accessing
their own copies of the tables, and truncate modifies what is seen
through their open file descriptors.
We basically have two methods, and both have problems under certain
circumstances. I wonder if we could unlink the files, but then create
zero-length segments in place of the ones we unlink. If people think that may
fix the problems, it is easy to do that, and we can do it atomically
using the rename() system call. Create the zero-length file under a
temp name, then rename it to the segment file name. That may do the
trick of allowing existing file descriptors to stay active, while having
segments in place for those that need to see them.
Comments?
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
I haven't been paying much attention, but I imagine that what's really
going on here is that once vacuum has collected all the still-good
tuples at the front of the relation, it doesn't bother to go through
the remaining blocks of the relation and mark everything dead therein?
It just truncates the file after the last block that it put tuples into,
right?
If this procedure works correctly for vacuuming a simple one-segment
table, then it would seem that truncation of all the later segments to
zero length should work correctly.
Not sure about that. When we truncate a single-segment file, the table is
being destroyed, so we invalidate it in the catalog cache and tell other
backends. Also, we have a problem with DROP TABLE in a transaction
while others are using it, as described in a bug report a few days ago,
so I don't think we have that 100% right either.
You could truncate to zero length *and* then unlink the files if you
had a mind to do that, but I can see why unlink without truncate would
not work reliably.
That is interesting. I never thought of that. Hiroshi, can you test
that idea? If it is the non-existence of the file that other backends
are checking for, my earlier idea of rename() with a truncated file kept
in place may be better.
Also, I see why we are not getting more bug reports. They only get this
when the table loses a segment, so it is OK to vacuum large tables as
long as the table doesn't lose a segment during the vacuum.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
I haven't been paying much attention, but I imagine that what's really
going on here is that once vacuum has collected all the still-good
tuples at the front of the relation, it doesn't bother to go through
the remaining blocks of the relation and mark everything dead therein?
It just truncates the file after the last block that it put tuples into,
right?
If this procedure works correctly for vacuuming a simple one-segment
table, then it would seem that truncation of all the later segments to
zero length should work correctly.
Not sure about that. When we truncate a single-segment file, the table is
being destroyed, so we invalidate it in the catalog cache and tell other
backends. Also, we have a problem with DROP TABLE in a transaction
while others are using it, as described in a bug report a few days ago,
so I don't think we have that 100% right either.
That is interesting. I never thought of that. Hiroshi, can you test
that idea? If it is the non-existence of the file that other backends
are checking for, my earlier idea of rename() with a truncated file kept
in place may be better.
Also, I see why we are not getting more bug reports. They only get this
when the table loses a segment, so it is OK to vacuum large tables as
long as the table doesn't lose a segment during the vacuum.
OK, this is 100% wrong. We truncate from vacuum any time the table size
changes, and vacuum of large tables will fail even if not removing a
segment. I forgot vacuum does this to reduce on-disk table size.
I wonder if truncating a file to reduce its size will cause other table
readers to have problems. I thought vacuum had an exclusive lock on the
table during vacuum, and if so, why are other backends having troubles?
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, June 18, 1999 12:54 AM
To: Bruce Momjian
Cc: PostgreSQL-development; Inoue@tpf.co.jp
Subject: Re: [HACKERS] tables > 1 gig
Bruce Momjian <maillist@candle.pha.pa.us> writes:
I think what we ought to do is finish working out how to make mdtruncate
safe for concurrent backends, and then do it. That's the right
long-term answer anyway.
Problem is, no one knows how right now. I liked unlinking every
segment, but was told by Hiroshi that causes a problem with concurrent
access and vacuum because the old backends still think it is there.
I haven't been paying much attention, but I imagine that what's really
going on here is that once vacuum has collected all the still-good
tuples at the front of the relation, it doesn't bother to go through
the remaining blocks of the relation and mark everything dead therein?
It just truncates the file after the last block that it put tuples into,
right?
If this procedure works correctly for vacuuming a simple one-segment
table, then it would seem that truncation of all the later segments to
zero length should work correctly.
You could truncate to zero length *and* then unlink the files if you
had a mind to do that, but I can see why unlink without truncate would
not work reliably.
Unlinking unused segments after truncating to zero length may cause
results such as this:
Existing backends write to the truncated file to extend the relation,
while new backends create a new segment file to extend the relation.
Comments?
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
I haven't been paying much attention, but I imagine that what's really
going on here is that once vacuum has collected all the still-good
tuples at the front of the relation, it doesn't bother to go through
the remaining blocks of the relation and mark everything dead therein?
It just truncates the file after the last block that it put tuples into,
right?
If this procedure works correctly for vacuuming a simple one-segment
table, then it would seem that truncation of all the later segments to
zero length should work correctly.
Not sure about that. When we truncate a single-segment file, the table is
being destroyed, so we invalidate it in the catalog cache and tell other
backends. Also, we have a problem with DROP TABLE in a transaction
while others are using it, as described in a bug report a few days ago,
so I don't think we have that 100% right either.
The problem is that (virtual) file descriptors, relcache entries, etc.
are local to each process. I don't know of a reliable way to tell other
processes just in time that the target resources should be invalidated.
That is interesting. I never thought of that. Hiroshi, can you test
that idea? If it is the non-existence of the file that other backends
are checking for, my earlier idea of rename() with a truncated file kept
in place may be better.
Also, I see why we are not getting more bug reports. They only get this
when the table loses a segment, so it is OK to vacuum large tables as
long as the table doesn't lose a segment during the vacuum.
OK, this is 100% wrong. We truncate from vacuum any time the table size
changes, and vacuum of large tables will fail even if not removing a
segment. I forgot vacuum does this to reduce on-disk table size.
I wonder if truncating a file to reduce its size will cause other table
readers to have problems.
The current implementation has a hidden bug.
Once the size of a segment reached RELSEG_SIZE, mdnblocks()
would not check the real size of the segment any more.
I'm not sure other such bugs don't exist.
It's one of the reasons why I don't recommend applying my trial patch
to mdtruncate().
I thought vacuum had an exclusive lock on the
table during vacuum, and if so, why are other backends having troubles?
We did not see any errors from unlinking segmented relations when
commands were executed sequentially.
Vacuum calls RelationInvalidateHeapTuple() for the pg_class tuple,
and other backends can recognize that the relcache entry must be
invalidated while executing StartTransaction() or
CommandCounterIncrement().
Even though the target relation is locked exclusively by vacuum, other
backends can still StartTransaction(), CommandCounterIncrement(),
parse, analyze, rewrite, optimize, start the executor stage and open relations.
We cannot rely on the exclusive lock so much.
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
Unlinking unused segments after truncating to zero length may cause
results such as this:
Existing backends write to the truncated file to extend the relation,
while new backends create a new segment file to extend the relation.
How about my idea of creating a truncated file, then renaming it to the
table file? That keeps the table open for other open file descriptors,
but puts a zero-length file in place in an atomic manner.
The fact is that the current code is really bad, so I ask that you do your
best, and let's get it in there for people to review and improve if
necessary.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
-----Original Message-----
From: Bruce Momjian [mailto:maillist@candle.pha.pa.us]
Sent: Friday, June 18, 1999 11:31 AM
To: Hiroshi Inoue
Cc: PostgreSQL-development
Subject: Re: [HACKERS] tables > 1 gig
Unlinking unused segments after truncating to zero length may cause
results such as this:
Existing backends write to the truncated file to extend the relation,
while new backends create a new segment file to extend the relation.
How about my idea of creating a truncated file, then renaming it to the
table file? That keeps the table open for other open file descriptors,
but puts a zero-length file in place in an atomic manner.
Sorry, I couldn't understand what you mean.
What is different from truncating the existing files to zero length?
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
Unlinking unused segments after truncating to zero length may cause
results such as this:
Existing backends write to the truncated file to extend the relation,
while new backends create a new segment file to extend the relation.
How about my idea of creating a truncated file, then renaming it to the
table file? That keeps the table open for other open file descriptors,
but puts a zero-length file in place in an atomic manner.
Sorry, I couldn't understand what you mean.
What is different from truncating the existing files to zero length?
Glad to explain. Here is the pseudocode:
create temp file, make it zero length, call it 'zz'
rename(zz,tablename)
What this does is create a zero-length file, and the rename unlinks
the tablename file and puts the zero-length file in its place.
rename() is atomic, so there is no time that the table file does not
exist.
It allows backends that have the table open via a descriptor to keep the
table unchanged, while new backends see a zero-length file.
Does this help?
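A minimal C sketch of that pseudocode, using plain mkstemp()/rename() rather than anything from the backend (just to make the proposal concrete):

#include <stdio.h>      /* rename() */
#include <stdlib.h>     /* mkstemp() */
#include <unistd.h>     /* close(), unlink() */

/* Atomically replace 'tablename' with an empty file.  Backends that
 * already have the old file open keep reading the old inode; backends
 * that open the name afterwards see a zero-length file. */
static int replace_with_empty(const char *tablename)
{
    char tmp[] = "zz.XXXXXX";
    int  fd = mkstemp(tmp);             /* the zero-length temp file */

    if (fd < 0)
        return -1;
    close(fd);
    if (rename(tmp, tablename) < 0)     /* atomic swap into place */
    {
        unlink(tmp);
        return -1;
    }
    return 0;
}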
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
How about my idea of creating a truncated file, then renaming it to the
table file? That keeps the table open for other open file descriptors,
but puts a zero-length file in place in an atomic manner.
Sorry, I couldn't understand what you mean.
What is different from truncating the existing files to zero length?
Glad to explain. Here is the pseudocode:
create temp file, make it zero length, call it 'zz'
rename(zz,tablename)
What this does is create a zero-length file, and the rename unlinks
the tablename file and puts the zero-length file in its place.
rename() is atomic, so there is no time that the table file does not
exist.
Let
i1 be the inode of zz
i2 be the inode of tablename
before rename().
Does this mean
new backends read/write inode i1 and
backends that already have the table open read/write inode i2?
If so, it seems wrong.
All backends should see the same data.
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
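Hiroshi's objection is easy to demonstrate outside PostgreSQL; in this standalone sketch, a descriptor opened before the rename() keeps seeing the old inode (i2), while a fresh open of the same name sees the new, empty one (i1):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int  oldfd, newfd;
    char buf[16];

    /* i2: the existing "tablename", with some data in it */
    oldfd = open("tablename", O_CREAT | O_RDWR | O_TRUNC, 0600);
    write(oldfd, "old data", 8);

    /* i1: the empty file that gets renamed over it */
    close(open("zz", O_CREAT | O_WRONLY | O_TRUNC, 0600));
    rename("zz", "tablename");

    /* a descriptor opened before the rename still sees the old inode ... */
    lseek(oldfd, 0, SEEK_SET);
    printf("old fd reads %ld bytes\n", (long) read(oldfd, buf, sizeof(buf)));

    /* ... while a fresh open of the same name sees the new, empty inode */
    newfd = open("tablename", O_RDONLY);
    printf("new fd reads %ld bytes\n", (long) read(newfd, buf, sizeof(buf)));

    close(oldfd);
    close(newfd);
    return 0;
}

It prints 8 bytes for the old descriptor and 0 for the new one, which is exactly the split view being objected to here.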
What this does is create a zero-length file, and the rename unlinks
the tablename file and puts the zero-length file in its place.
rename() is atomic, so there is no time that the table file does not
exist.
Let
i1 be the inode of zz
i2 be the inode of tablename
before rename().
Does this mean
new backends read/write inode i1 and
backends that already have the table open read/write inode i2?
If so, it seems wrong.
All backends should see the same data.
Yes, I can see your point. It would show them different views of the
table.
So, as you were saying, we have no way of invalidating file descriptors
of other backends for secondary segments. Why does truncating the file
not work? Any ideas?
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
What this does is create a zero-length file, and the rename unlinks
the tablename file and puts the zero-length file in its place.
rename() is atomic, so there is no time that the table file does not
exist.
Let
i1 be the inode of zz
i2 be the inode of tablename
before rename().
Does this mean
new backends read/write inode i1 and
backends that already have the table open read/write inode i2?
If so, it seems wrong.
All backends should see the same data.
Yes, I can see your point. It would show them different views of the
table.
So, as you were saying, we have no way of invalidating file descriptors
of other backends for secondary segments.
It seems DROP TABLE has a similar problem.
Has it already been solved?
Why does truncating the file
not work? Any ideas?
I have gotten no bug reports for my trial implementation.
AFAIK, only Ole Gjerde has tested my patch.
Is it sufficient?
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
Yes, I can see your point. It would show them different views of the
table.
So, as you were saying, we have no way of invalidating file descriptors
of other backends for secondary segments.
It seems DROP TABLE has a similar problem.
Has it already been solved?
Not solved. Someone reported it recently.
Why does truncating the file
not work? Any ideas?
I have gotten no bug reports for my trial implementation.
AFAIK, only Ole Gjerde has tested my patch.
Is it sufficient?
Yes. We need something, and maybe after we add it, people can do
testing and find any problems. It is better to apply it than to leave
it as it currently exists, no?
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Yes, I can see your point. It would show them different views of the
table.
So, as you were saying, we have no way of invalidating file descriptors
of other backends for secondary segments.
Why does truncating the file
not work? Any ideas?
I have gotten no bug reports for my trial implementation.
AFAIK, only Ole Gjerde has tested my patch.
Is it sufficient?
Yes. We need something, and maybe after we add it, people can do
testing and find any problems. It is better to apply it than to leave
it as it currently exists, no?
OK, here is my patch for PostgreSQL 6.5-release.
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
*** storage/smgr/md.c.orig Fri Jun 11 12:20:06 1999
--- storage/smgr/md.c Fri Jun 18 15:10:54 1999
***************
*** 674,684 ****
segno = 0;
for (;;)
{
! if (v->mdfd_lstbcnt == RELSEG_SIZE
! || (nblocks = _mdnblocks(v->mdfd_vfd, BLCKSZ)) == RELSEG_SIZE)
{
-
- v->mdfd_lstbcnt = RELSEG_SIZE;
segno++;
if (v->mdfd_chain == (MdfdVec *) NULL)
--- 674,685 ----
segno = 0;
for (;;)
{
! nblocks = _mdnblocks(v->mdfd_vfd, BLCKSZ);
! if (nblocks > RELSEG_SIZE)
! elog(FATAL, "segment too big in mdnblocks!");
! v->mdfd_lstbcnt = nblocks;
! if (nblocks == RELSEG_SIZE)
{
segno++;
if (v->mdfd_chain == (MdfdVec *) NULL)
***************
*** 711,732 ****
MdfdVec *v;
#ifndef LET_OS_MANAGE_FILESIZE
! int curnblk;
curnblk = mdnblocks(reln);
! if (curnblk / RELSEG_SIZE > 0)
! {
! elog(NOTICE, "Can't truncate multi-segments relation %s",
! reln->rd_rel->relname.data);
! return curnblk;
! }
#endif
fd = RelationGetFile(reln);
v = &Md_fdvec[fd];
if (FileTruncate(v->mdfd_vfd, nblocks * BLCKSZ) < 0)
return -1;
return nblocks;
--- 712,766 ----
MdfdVec *v;
#ifndef LET_OS_MANAGE_FILESIZE
! int curnblk,
! i,
! oldsegno,
! newsegno,
! lastsegblocks;
! MdfdVec **varray;
curnblk = mdnblocks(reln);
! if (nblocks > curnblk)
! return -1;
! oldsegno = curnblk / RELSEG_SIZE;
! newsegno = nblocks / RELSEG_SIZE;
!
#endif
fd = RelationGetFile(reln);
v = &Md_fdvec[fd];
+ #ifndef LET_OS_MANAGE_FILESIZE
+ varray = (MdfdVec **)palloc((oldsegno + 1) * sizeof(MdfdVec *));
+ for (i = 0; i <= oldsegno; i++)
+ {
+ if (!v)
+ elog(ERROR,"segment isn't open in mdtruncate!");
+ varray[i] = v;
+ v = v->mdfd_chain;
+ }
+ for (i = oldsegno; i > newsegno; i--)
+ {
+ v = varray[i];
+ if (FileTruncate(v->mdfd_vfd, 0) < 0)
+ {
+ pfree(varray);
+ return -1;
+ }
+ v->mdfd_lstbcnt = 0;
+ }
+ /* Calculate the # of blocks in the last segment */
+ lastsegblocks = nblocks - (newsegno * RELSEG_SIZE);
+ v = varray[i];
+ pfree(varray);
+ if (FileTruncate(v->mdfd_vfd, lastsegblocks * BLCKSZ) < 0)
+ return -1;
+ v->mdfd_lstbcnt = lastsegblocks;
+ #else
if (FileTruncate(v->mdfd_vfd, nblocks * BLCKSZ) < 0)
return -1;
+ v->mdfd_lstbcnt = nblocks;
+ #endif
return nblocks;
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
Unlinking unused segments after truncating to zero length may cause
results such as this:
Existing backends write to the truncated file to extend the relation,
while new backends create a new segment file to extend the relation.
Ooh, good point. So, unless we want to invent some way for the process
that's running vacuum to force other backends to close their FDs for
segment files, the *only* correct solution is to truncate to zero length
but leave the files in place.
I still don't quite see why there is such a big problem, however, unless
you're asserting that vacuum is broken for single-segment tables too.
Surely vacuum acquires a lock over the whole table, not just a segment?
regards, tom lane
Thank you. Applied.
Yes, I can see your point. It would show them different views of the
table.
So, as you were saying, we have no way of invalidating file descriptors
of other backends for secondary segments.
Why does truncating the file
not work? Any ideas?
I have gotten no bug reports for my trial implementation.
AFAIK, only Ole Gjerde has tested my patch.
Is it sufficient?
Yes. We need something, and maybe after we add it, people can do
testing and find any problems. It is better to apply it than to leave
it as it currently exists, no?
OK, here is my patch for PostgreSQL 6.5-release.
Regards.
Hiroshi Inoue
Inoue@tpf.co.jp
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Fri, 18 Jun 1999, Bruce Momjian wrote:
[snip - mdtruncate patch]
While talking about this whole issue, there is one piece missing.
Currently there is no way to dump a database/table over 2 GB.
When it hits the 2GB OS limit, it just silently stops and gives no
indication that it didn't finish.
It's not a problem for me yet, but I'm getting very close. I have one
database with 3 tables over 2GB (in postgres space), but they still come
out under 2GB after a dump. I can't do a pg_dump on the whole database
however, which would be very nice.
I suppose it wouldn't be overly hard to have pg_dump/pg_dumpall do
something similar to what postgres does with segments. I haven't looked
at it yet however, so I can't say for sure.
Comments?
Ole Gjerde
Ole Gjerde wrote:
On Fri, 18 Jun 1999, Bruce Momjian wrote:
[snip - mdtruncate patch]
While talking about this whole issue, there is one piece missing.
Currently there is no way to dump a database/table over 2 GB.
When it hits the 2GB OS limit, it just silently stops and gives no
indication that it didn't finish.
It's not a problem for me yet, but I'm getting very close. I have one
database with 3 tables over 2GB (in postgres space), but they still come
out under 2GB after a dump. I can't do a pg_dump on the whole database
however, which would be very nice.
I suppose it wouldn't be overly hard to have pg_dump/pg_dumpall do
something similar to what postgres does with segments. I haven't looked
at it yet however, so I can't say for sure.
Comments?
As pg_dump writes to stdout, you can just use standard *nix tools:
1. use compressed dumps
pg_dump really_big_db | gzip > really_big_db.dump.gz
reload with
gunzip -c really_big_db.dump.gz | psql newdb
or
cat really_big_db.dump.gz | gunzip | psql newdb
2. use split
pg_dump really_big_db | split -b 1m - really_big_db.dump.
reload with
cat really_big_db.dump.* | psql newdb
-----------------------
Hannu
Bruce Momjian <maillist@candle.pha.pa.us> writes:
Then we'd better fix the underlying problem. We can't change
RELSEG_SIZE for a minor release, unless you want to give up the
principle of not forcing initdb at minor releases.
Why can't we increase it?
Consider a 1.5-gig table. 6.5 will store it as one gig in file "table",
one-half gig in file "table.1". Now recompile with larger RELSEG_SIZE.
The file manager will now expect to find all blocks of the relation in
file "table", and will never go to "table.1" at all. Presto, you lost
a bunch of data.
Bottom line is just as it says in the config.h comments: you can't
change either BLCKSZ or RELSEG_SIZE without doing initdb.
Sorry for backing up so far on this thread...
Would it be possible to make BLCKSZ and/or RELSEG_SIZE (the latter
perhaps the most important, and perhaps the easiest?) a configurable
parameter which is read out of a global variable for each database? If
so, we could later think about moving it, along with things like
"default character set", to pg_database as per-db information, and
make it an option on CREATE DATABASE. That kind of thing might make
in-place upgrades easier too.
- Thomas
--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
Would it be possible to make BLCKSZ and/or RELSEG_SIZE (the latter
perhaps the most important, and perhaps the easiest?) a configurable
parameter which is read out of a global variable for each database?
Doable, perhaps, but I'm not quite sure why it's worth the trouble...
there doesn't seem to be that much value in running different DBs with
different values inside a single installation. Tweaking BLCKSZ, in
particular, will become fairly uninteresting once we solve the tuples-
bigger-than-a-block problem.
regards, tom lane
I suppose it wouldn't be overly hard to have pg_dump/pg_dumpall do
something similar to what postgres does with segments. I haven't looked
at it yet however, so I can't say for sure.
I would not integrate such functionality into pg_dump, since it is not
necessary.
A good thing though would be a little HOWTO on splitting and/or compressing
pg_dump output.
The principle is:
backup:
mkfifo tapepipe
( gzip --fast -c < tapepipe | split -b512m - database.dump.gz. ) &
pg_dump -f tapepipe regression
rm tapepipe
restore:
createdb regression
cat database.dump.gz.* | gzip -cd | psql regression
Instead of gzip you could use a faster compressor like lzop, but you get the
idea :-)
Andreas
A good thing though would be a little HOWTO on splitting and/or
compressing pg_dump output.
I already snarfed the description from Hannu Krosing and have put it
into manage-ag.sgml (which I've not yet committed to the web site;
will do so soon).
- Thomas
--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California