Is mdextend really safe?
Earlier we saw some bug reports from someone who had a buffer flush fail due to
ENOSPC. We asserted then that that should never happen because when we extend
the relation we write out the new blocks, so any ENOSPC errors ought to happen at
that point, not when a buffer is flushed.
However, looking at mdextend, it only writes out the requested block. Any blocks
between the end of the table and the requested block are *not* written out. We
count on the OS to implicitly fill those blocks with zeros.
On Unix that creates a sparse file where the intervening blocks are not
allocated. When we later write out those blocks the filesystem then has to
allocate space for them. IIRC the bug reports were from Windows. I'm not sure
what NTFS's behaviour with sparse files is.
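
To make the sparse-file concern concrete, here is a minimal sketch (not the
actual mdextend code) that writes one block beyond the current end of file and
then compares the file's logical size with the space actually allocated. The
helper names write_block_at and file_usage are hypothetical, BLCKSZ is assumed
to be the default 8192, and st_blocks is assumed to be in 512-byte units, which
holds on Linux but is implementation-defined in POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192             /* assumed: PostgreSQL's default block size */

/*
 * Write a single zeroed block at block number blkno. Nothing is written to
 * the blocks between the current end of file and blkno; the OS is left to
 * present them as zeros. Returns 0 on success, -1 on error or short write.
 */
int
write_block_at(int fd, long blkno)
{
    char    buf[BLCKSZ];
    off_t   offset;
    ssize_t written;

    assert(blkno >= 0);
    memset(buf, 0, sizeof(buf));
    offset = (off_t) blkno * BLCKSZ;
    written = pwrite(fd, buf, sizeof(buf), offset);
    return (written == (ssize_t) sizeof(buf)) ? 0 : -1;
}

/*
 * Report the logical file size and the bytes actually allocated on disk.
 * Assumption: st_blocks counts 512-byte units. Returns 0 on success.
 */
int
file_usage(int fd, off_t *logical, off_t *allocated)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    *logical = st.st_size;
    *allocated = (off_t) st.st_blocks * 512;
    return 0;
}
```

Calling write_block_at(fd, 100) on an empty file yields a logical size of
101 * BLCKSZ bytes. On a filesystem that supports sparse files, the allocated
figure would typically be far smaller, and the missing space only gets
allocated (and can fail with ENOSPC) when the intervening blocks are written
later.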
Now this only matters if we ever call mdextend on a block which isn't the
block immediately following the end of file. Is that true?
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
* Gregory Stark:
> On Unix that creates a sparse file where the intervening blocks are
> not allocated. When we later write out those blocks the filesystem
> then has to allocate space for them.
This seems to happen relatively rarely. Creating temporary holes like
this usually results in heavily fragmented files on the file systems I
use, and I don't see this with PostgreSQL. (It's one of my gripes
with Berkeley DB.)
However, I looked at the code recently and couldn't figure out *why*
PostgreSQL's observed behavior is this way. 8-(
--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
Gregory Stark wrote:
> On Unix that creates a sparse file where the intervening blocks are not
> allocated. When we later write out those blocks the filesystem then has to
> allocate space for them. IIRC the bug reports were from Windows. I'm not sure
> what NTFS's behaviour with sparse files is.
NTFS has a sparse file feature, but how it works ...
> Now this only matters if we ever call mdextend on a block which isn't the
> block immediately following the end of file. Is that true?
I think that it could happen only during WAL replay, but at the
end everything should be OK. Look at ReadBuffer_common; it contains the
following code:
    /* Substitute proper block number if caller asked for P_NEW */
    if (isExtend)
        blockNum = smgrnblocks(smgr, forkNum);
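
As a rough illustration of why that substitution keeps extension contiguous,
here is a toy model, not PostgreSQL code. ToyRelation, TOY_P_NEW,
toy_resolve_block, and toy_extend are all hypothetical names; the only idea
taken from the snippet above is that a request for a new block is replaced by
the relation's current block count.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ToyBlockNumber;

/* Hypothetical sentinel meaning "give me a new block" */
#define TOY_P_NEW ((ToyBlockNumber) 0xFFFFFFFF)

/* Toy stand-in for a relation: we only track how many blocks it has */
typedef struct ToyRelation
{
    ToyBlockNumber nblocks;
} ToyRelation;

/*
 * Substitute the proper block number if the caller asked for a new block.
 * The current block count is exactly the first block past end of file.
 */
ToyBlockNumber
toy_resolve_block(const ToyRelation *rel, ToyBlockNumber requested)
{
    if (requested == TOY_P_NEW)
        return rel->nblocks;
    return requested;
}

/*
 * Extend the toy relation by one block and return the new block's number.
 * Because the number always comes from nblocks, no gap can be left behind.
 */
ToyBlockNumber
toy_extend(ToyRelation *rel)
{
    ToyBlockNumber blk = toy_resolve_block(rel, TOY_P_NEW);

    assert(blk == rel->nblocks);
    rel->nblocks = blk + 1;
    return blk;
}
```

In this model every extension lands on the block immediately following the end
of file, so no hole is created. Whether every real caller of mdextend goes
through such a path (for example during WAL replay) is the open question here.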
Zdenek
Gregory Stark wrote:
> Now this only matters if we ever call mdextend on a block which isn't the
> block immediately following the end of file. Is that true?
I don't think so.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com