pg_total_relation_size() and CHECKPOINT

Started by Zubkovsky, Sergeyalmost 18 years ago29 messages
#1Zubkovsky, Sergey
Sergey.Zubkovsky@transas.com

Hello,

I've detected that a result value of pg_total_relation_size() for an
actively updated table might be significantly differ from a result that
is returned after explicit CHECKPOINT command execution.

select version();

"PostgreSQL 8.2.6 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC)
3.4.2 (mingw-special)"

I understand the reasons of such behavior: cache buffers must be flushed
in order to be sure that pg_total_relation_size() result will be like we
expect.

Is this right?

If so I think it would be very useful to add a description of this
peculiarity to the PG documentation.

Thanks in advance,

Zubkovsky Sergey

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zubkovsky, Sergey (#1)
Re: pg_total_relation_size() and CHECKPOINT

"Zubkovsky, Sergey" <Sergey.Zubkovsky@transas.com> writes:

I've detected that a result value of pg_total_relation_size() for an
actively updated table might be significantly differ from a result that
is returned after explicit CHECKPOINT command execution.

Uh, can you show a specific example of what you mean?

I understand the reasons of such behavior: cache buffers must be flushed
in order to be sure that pg_total_relation_size() result will be like we
expect.

I wouldn't think so. The space for a page is allocated immediately when
needed --- its *contents* might not be up to date, but that shouldn't
affect pg_total_relation_size.

regards, tom lane

#3Zubkovsky, Sergey
Sergey.Zubkovsky@transas.com
In reply to: Tom Lane (#2)
Re: pg_total_relation_size() and CHECKPOINT

Hi,

Here is my example.

We are creating 2 tables:

create table t1 ( a int, b varchar( 30 ) );

create table t1_arh ( c text );

and filling 't1':

insert into t1 select generate_series(1, 100000 ), generate_series(1,
100000 );

The "arch_table_sp" user-function will be used for extracting data from
't1' and archiving it to 't1_arh'.

CREATE OR REPLACE FUNCTION "arch_table_sp" ( tblName name, arcTblName
name )

RETURNS void

AS $BODY$

DECLARE

fn text;

chunk CONSTANT bigint := 512*1024;

off bigint := 0;

rdBytes bigint;

buf text;

BEGIN

SELECT setting INTO STRICT fn FROM pg_settings WHERE name =
'data_directory';

fn := fn || '/tbldata.txt';

PERFORM pg_file_unlink( fn );

EXECUTE 'COPY ( SELECT * FROM ' || quote_ident( tblName ) || ' ) TO
' || quote_literal( fn );

EXECUTE '

CREATE OR REPLACE FUNCTION "__InsertChunk__sp" ( data text )

RETURNS void AS

$_$

INSERT INTO ' || quote_ident( arcTblName ) || ' ( c ) VALUES ( $1 );

$_$

LANGUAGE sql;';

LOOP

buf := pg_file_read( fn, off, chunk );

rdBytes := length( buf );

IF ( rdBytes > 0 ) THEN

PERFORM "__InsertChunk__sp"( buf );

off := off + rdBytes;

END IF;

EXIT WHEN ( rdBytes <> chunk );

END LOOP;

PERFORM pg_file_unlink( fn );

END;

$BODY$ LANGUAGE plpgsql;

Now we are executing the following statements in one transaction:

select "arch_table_sp"( 't1', 't1_arh' );

select pg_total_relation_size( 't1_arh' );

The result is 417792 (in the general case it may be another value, for
example, I received 303104, 573440 and etc).

If we are executing these statements in separate transactions with a
couple of seconds between them than we have received another value:

truncate table t1_arh;

select "arch_table_sp"( 't1', 't1_arh' );

select pg_total_relation_size( 't1_arh' );

The result is 688128!

With explicit CHECKPOINT we will have one more value:

truncate table t1_arh;

select "arch_table_sp"( 't1', 't1_arh' );

CHECKPOINT;

select pg_total_relation_size( 't1_arh' );

The result is 696320!

It would be interesting why we have such results...

It's obviously that CHECKPOINT is not a good decision.

Can you suggest some other approach instead of explicit CHECKPOINT?

Sorry for my English.

I hope this example is quite clear.

Thanks in advance,

Zubkovsky Sergey

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Thursday, March 13, 2008 11:21 PM
To: Zubkovsky, Sergey
Cc: pgsql-docs@postgresql.org
Subject: Re: [DOCS] pg_total_relation_size() and CHECKPOINT

"Zubkovsky, Sergey" <Sergey.Zubkovsky@transas.com> writes:

I've detected that a result value of pg_total_relation_size() for an

actively updated table might be significantly differ from a result

that

is returned after explicit CHECKPOINT command execution.

Uh, can you show a specific example of what you mean?

I understand the reasons of such behavior: cache buffers must be

flushed

in order to be sure that pg_total_relation_size() result will be like

we

expect.

I wouldn't think so. The space for a page is allocated immediately when

needed --- its *contents* might not be up to date, but that shouldn't

affect pg_total_relation_size.

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zubkovsky, Sergey (#3)
Re: pg_total_relation_size() and CHECKPOINT

[ moved to -hackers --- see original thread here
http://archives.postgresql.org/pgsql-docs/2008-03/msg00039.php
]

"Zubkovsky, Sergey" <Sergey.Zubkovsky@transas.com> writes:

Here is my example.

Hmm ... on my Fedora machine I get the same result (704512) in
all these cases, which is what I'd expect. (The exact value
could vary across platforms, of course.)

You said you were using the MinGW build --- maybe MinGW's version
of stat(2) isn't trustworthy?

regards, tom lane

#5Zubkovsky, Sergey
Sergey.Zubkovsky@transas.com
In reply to: Tom Lane (#4)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

The previous results were received on PG 8.3 version:

"PostgreSQL 8.3.0, compiled by Visual C++ build 1400"

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, March 14, 2008 7:19 PM
To: Zubkovsky, Sergey
Cc: pgsql-docs@postgresql.org; pgsql-hackers@postgresql.org
Subject: Re: [DOCS] pg_total_relation_size() and CHECKPOINT

[ moved to -hackers --- see original thread here
http://archives.postgresql.org/pgsql-docs/2008-03/msg00039.php
]

"Zubkovsky, Sergey" <Sergey.Zubkovsky@transas.com> writes:

Here is my example.

Hmm ... on my Fedora machine I get the same result (704512) in
all these cases, which is what I'd expect. (The exact value
could vary across platforms, of course.)

You said you were using the MinGW build --- maybe MinGW's version
of stat(2) isn't trustworthy?

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zubkovsky, Sergey (#5)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

"Zubkovsky, Sergey" <Sergey.Zubkovsky@transas.com> writes:

The previous results were received on PG 8.3 version:
"PostgreSQL 8.3.0, compiled by Visual C++ build 1400"

Hmm. I find the whole thing fairly worrisome, because what it suggests
is that Windows isn't actually allocating file space during smgrextend,
which would mean that we'd be prone to running out of disk space at
unfortunate times --- like during a checkpoint, after we've already
promised the client the data is committed.

Can any Windows hackers look into this and find out what's really
happening?

regards, tom lane

#7Gregory Stark
stark@enterprisedb.com
In reply to: Tom Lane (#6)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

"Zubkovsky, Sergey" <Sergey.Zubkovsky@transas.com> writes:

The previous results were received on PG 8.3 version:
"PostgreSQL 8.3.0, compiled by Visual C++ build 1400"

Hmm. I find the whole thing fairly worrisome, because what it suggests
is that Windows isn't actually allocating file space during smgrextend,
which would mean that we'd be prone to running out of disk space at
unfortunate times --- like during a checkpoint, after we've already
promised the client the data is committed.

Surely we can't lose after the fsync? Losing at commit rather than at the time
of insert might still be poor, but how could we lose after we've promised the
data is committed?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gregory Stark (#7)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

Gregory Stark <stark@enterprisedb.com> writes:

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

Hmm. I find the whole thing fairly worrisome, because what it suggests
is that Windows isn't actually allocating file space during smgrextend,
which would mean that we'd be prone to running out of disk space at
unfortunate times --- like during a checkpoint, after we've already
promised the client the data is committed.

Surely we can't lose after the fsync? Losing at commit rather than at
the time of insert might still be poor, but how could we lose after
we've promised the data is committed?

What I'm afraid of is write() returning ENOSPC for a write to a disk
block we thought we had allocated previously. If such a situation is
persistent we'd be unable to flush dirty data from shared buffers and
thus never be able to complete a checkpoint. We'd never *get* to the
fsync, so whether the data is safe after fsync is moot.

The way it is supposed to work is that ENOSPC ought to happen during
smgrextend, that is before we've put any data into a shared buffer
corresponding to a new page of the file. With that, we will never be
able to commit a transaction that requires disk space we don't have.

The real question here is whether Windows' stat() is telling the truth
about how much filesystem space has actually been allocated to a file.
It seems entirely possible that it's not; but if it is, then I think we
have a problem.

regards, tom lane

#9Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#8)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

Tom Lane wrote:

The real question here is whether Windows' stat() is telling the truth
about how much filesystem space has actually been allocated to a file.
It seems entirely possible that it's not; but if it is, then I think we
have a problem.

Has this been examined by a Windows hacker?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#10Andrew Dunstan
andrew@dunslane.net
In reply to: Alvaro Herrera (#9)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

Alvaro Herrera wrote:

Tom Lane wrote:

The real question here is whether Windows' stat() is telling the truth
about how much filesystem space has actually been allocated to a file.
It seems entirely possible that it's not; but if it is, then I think we
have a problem.

Has this been examined by a Windows hacker?

If someone can suggest a test program I'll be happy to run it.

cheers

andrew

#11Zubkovsky, Sergey
Sergey.Zubkovsky@transas.com
In reply to: Andrew Dunstan (#10)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

Can anybody tell me how filesystem space is allocated and point me to
the sources if it's possible?
I have some experience with programming for Windows and I'll try to
investigate this problem.

-----Original Message-----
From: Andrew Dunstan [mailto:andrew@dunslane.net]
Sent: Wednesday, March 26, 2008 4:56 PM
To: Alvaro Herrera
Cc: Tom Lane; Gregory Stark; Zubkovsky, Sergey;
pgsql-hackers@postgresql.org; Magnus Hagander
Subject: Re: [HACKERS] [DOCS] pg_total_relation_size() and CHECKPOINT

Alvaro Herrera wrote:

Tom Lane wrote:

The real question here is whether Windows' stat() is telling the

truth

about how much filesystem space has actually been allocated to a

file.

It seems entirely possible that it's not; but if it is, then I think

we

have a problem.

Has this been examined by a Windows hacker?

If someone can suggest a test program I'll be happy to run it.

cheers

andrew

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#10)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

Andrew Dunstan <andrew@dunslane.net> writes:

Alvaro Herrera wrote:

Tom Lane wrote:

The real question here is whether Windows' stat() is telling the truth
about how much filesystem space has actually been allocated to a file.
It seems entirely possible that it's not; but if it is, then I think we
have a problem.

Has this been examined by a Windows hacker?

If someone can suggest a test program I'll be happy to run it.

One thing that would be good is just to see who else can reproduce
the original observation:
http://archives.postgresql.org/pgsql-docs/2008-03/msg00041.php

It might occur only on some versions of Windows, for instance.

regards, tom lane

#13Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#12)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Alvaro Herrera wrote:

Tom Lane wrote:

The real question here is whether Windows' stat() is telling the truth
about how much filesystem space has actually been allocated to a file.
It seems entirely possible that it's not; but if it is, then I think we
have a problem.

Has this been examined by a Windows hacker?

If someone can suggest a test program I'll be happy to run it.

One thing that would be good is just to see who else can reproduce
the original observation:
http://archives.postgresql.org/pgsql-docs/2008-03/msg00041.php

It might occur only on some versions of Windows, for instance.

I have reproduced it in XP-Pro/SP2 running in a VMWare machine on an FC6
host.

cheers

andrew

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#13)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

The real question here is whether Windows' stat() is telling the truth
about how much filesystem space has actually been allocated to a file.

One thing that would be good is just to see who else can reproduce
the original observation:
http://archives.postgresql.org/pgsql-docs/2008-03/msg00041.php

I have reproduced it in XP-Pro/SP2 running in a VMWare machine on an FC6
host.

OK, so the next question is do we really have an issue, or is this just
an observational artifact? What I'd try is deliberately running the
machine out of disk space with a long series of inserts, and then see
whether subsequent checkpoint attempts fail due to ENOSPC errors while
trying to write out dirty buffers.

To avoid conflating this effect with anything else, it'd be best if you
could put the DB on its own small partition, and *not* put pg_xlog
there.

regards, tom lane

#15Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#14)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

The real question here is whether Windows' stat() is telling the truth
about how much filesystem space has actually been allocated to a file.

One thing that would be good is just to see who else can reproduce
the original observation:
http://archives.postgresql.org/pgsql-docs/2008-03/msg00041.php

I have reproduced it in XP-Pro/SP2 running in a VMWare machine on an FC6
host.

OK, so the next question is do we really have an issue, or is this just
an observational artifact? What I'd try is deliberately running the
machine out of disk space with a long series of inserts, and then see
whether subsequent checkpoint attempts fail due to ENOSPC errors while
trying to write out dirty buffers.

To avoid conflating this effect with anything else, it'd be best if you
could put the DB on its own small partition, and *not* put pg_xlog
there.

I'm working on this (thank goodness for junctions). Maybe we shopuld
look at providing a config setting for pg_xlog.

cheers

andrew

#16Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#14)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

The real question here is whether Windows' stat() is telling the truth
about how much filesystem space has actually been allocated to a file.

One thing that would be good is just to see who else can reproduce
the original observation:
http://archives.postgresql.org/pgsql-docs/2008-03/msg00041.php

I have reproduced it in XP-Pro/SP2 running in a VMWare machine on an FC6
host.

OK, so the next question is do we really have an issue, or is this just
an observational artifact? What I'd try is deliberately running the
machine out of disk space with a long series of inserts, and then see
whether subsequent checkpoint attempts fail due to ENOSPC errors while
trying to write out dirty buffers.

To avoid conflating this effect with anything else, it'd be best if you
could put the DB on its own small partition, and *not* put pg_xlog
there.

OK, a very large insert failed as expected. Checkpoint succeeded. Then
vacuum recovered the space.

I suspect that the size reported by stat() is a little delayed here, but
the file system is keeping proper track of it, so the lseek that tries
to extend the file fails at the right spot.

cheers

andrew

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#16)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

Andrew Dunstan <andrew@dunslane.net> writes:

I suspect that the size reported by stat() is a little delayed here, but
the file system is keeping proper track of it, so the lseek that tries
to extend the file fails at the right spot.

Hmm. If it really works that way, one would hope Microsoft would've
documented that someplace. Can anyone find a statement that Windows'
stat() is not current?

regards, tom lane

#18Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#17)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

On Thu, 27 Mar 2008 00:13:42 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

I suspect that the size reported by stat() is a little delayed
here, but the file system is keeping proper track of it, so the
lseek that tries to extend the file fails at the right spot.

Hmm. If it really works that way, one would hope Microsoft would've
documented that someplace. Can anyone find a statement that Windows'
stat() is not current?

I'm not in a position to test it myself now (doing training, and then
I'll be off to pg-east...), but it'd be interesting to see if it acts
the same way with GetFileSize(), or if it's just stat()...

/Magnus

#19Alvaro Herrera
alvherre@commandprompt.com
In reply to: Andrew Dunstan (#15)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

Andrew Dunstan wrote:

I'm working on this (thank goodness for junctions). Maybe we shopuld
look at providing a config setting for pg_xlog.

I hope you mean an initdb switch -- otherwise it is way too easy to
misuse.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#20Zubkovsky, Sergey
Sergey.Zubkovsky@transas.com
In reply to: Tom Lane (#17)
1 attachment(s)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

Maybe this helps:

"It is not an error to set a file pointer to a position beyond the end
of the file. The size of the file does not increase until you call the
SetEndOfFile, WriteFile, or WriteFileEx function. A write operation
increases the size of the file to the file pointer position plus the
size of the buffer written, which results in the intervening bytes
uninitialized."

http://msdn2.microsoft.com/en-us/library/aa365541(VS.85).aspx

According to Windows' lseek implementation (attached) SetEndOfFile()
isn't called for this case.

Thanks,
Sergey Zubkovsky

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Thursday, March 27, 2008 7:14 AM
To: Andrew Dunstan
Cc: Alvaro Herrera; Gregory Stark; Zubkovsky, Sergey;
pgsql-hackers@postgresql.org; Magnus Hagander
Subject: Re: [HACKERS] [DOCS] pg_total_relation_size() and CHECKPOINT

Andrew Dunstan <andrew@dunslane.net> writes:

I suspect that the size reported by stat() is a little delayed here,

but

the file system is keeping proper track of it, so the lseek that tries

to extend the file fails at the right spot.

Hmm. If it really works that way, one would hope Microsoft would've
documented that someplace. Can anyone find a statement that Windows'
stat() is not current?

regards, tom lane

Attachments:

lseek.capplication/octet-stream; name=lseek.cDownload
/***
*lseek.c - change file position
*
*       Copyright (c) Microsoft Corporation. All rights reserved.
*
*Purpose:
*       defines _lseek() - move the file pointer
*
*******************************************************************************/

#include <cruntime.h>
#include <oscalls.h>
#include <mtdll.h>
#include <io.h>
#include <internal.h>
#include <stdlib.h>
#include <errno.h>
#include <msdos.h>
#include <stdio.h>

/***
*long _lseek(fh,pos,mthd) - move the file pointer
*
*Purpose:
*       Moves the file pointer associated with fh to a new position.
*       The new position is pos bytes (pos may be negative) away
*       from the origin specified by mthd.
*
*       If mthd == SEEK_SET, the origin in the beginning of file
*       If mthd == SEEK_CUR, the origin is the current file pointer position
*       If mthd == SEEK_END, the origin is the end of the file
*
*       Multi-thread:
*       _lseek()    = locks/unlocks the file
*       _lseek_nolock() = does NOT lock/unlock the file (it is assumed that
*                     the caller has the aquired the file lock,if needed).
*
*Entry:
*       int fh - file handle to move file pointer on
*       long pos - position to move to, relative to origin
*       int mthd - specifies the origin pos is relative to (see above)
*
*Exit:
*       returns the offset, in bytes, of the new position from the beginning
*       of the file.
*       returns -1L (and sets errno) if fails.
*       Note that seeking beyond the end of the file is not an error.
*       (although seeking before the beginning is.)
*
*Exceptions:
*
*******************************************************************************/


/* define locking/validating lseek */
long __cdecl _lseek (
        int fh,
        long pos,
        int mthd
        )
{
        int r;

        /* validate fh */
        _CHECK_FH_CLEAR_OSSERR_RETURN( fh, EBADF, -1 );
        _VALIDATE_CLEAR_OSSERR_RETURN((fh >= 0 && (unsigned)fh < (unsigned)_nhandle), EBADF, -1);
        _VALIDATE_CLEAR_OSSERR_RETURN((_osfile(fh) & FOPEN), EBADF, -1);

        _lock_fh(fh);                   /* lock file handle */

        __try {
                if ( _osfile(fh) & FOPEN )
                        r = _lseek_nolock(fh, pos, mthd);   /* seek */
                else {
                        errno = EBADF;
                        _doserrno = 0;
                        r = -1;
                        _ASSERTE(("Invalid file descriptor. File possibly closed by a different thread",0));
                }
        }
        __finally {
                _unlock_fh(fh);         /* unlock file handle */
        }

        return r;
}

/* define core _lseek -- doesn't lock or validate fh */
long __cdecl _lseek_nolock (
        int fh,
        long pos,
        int mthd
        )
{
        ULONG newpos;                   /* new file position */
        ULONG dosretval;                /* o.s. return value */
        HANDLE osHandle;        /* o.s. handle value */


        /* tell o.s. to seek */

#if SEEK_SET != FILE_BEGIN || SEEK_CUR != FILE_CURRENT || SEEK_END != FILE_END
    #error Xenix and Win32 seek constants not compatible
#endif  /* SEEK_SET != FILE_BEGIN || SEEK_CUR != FILE_CURRENT || SEEK_END != FILE_END */
        if ((osHandle = (HANDLE)_get_osfhandle(fh)) == (HANDLE)-1)
        {
            errno = EBADF;
            _ASSERTE(("Invalid file descriptor",0));
            return -1;
        }

        if ((newpos = SetFilePointer(osHandle, pos, NULL, mthd)) == -1)
                dosretval = GetLastError();
        else
                dosretval = 0;

        if (dosretval) {
                /* o.s. error */
                _dosmaperr(dosretval);
                return -1;
        }

        _osfile(fh) &= ~FEOFLAG;        /* clear the ctrl-z flag on the file */
        return newpos;                  /* return */
}
#21Andrew Dunstan
andrew@dunslane.net
In reply to: Zubkovsky, Sergey (#20)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

Zubkovsky, Sergey wrote:

Maybe this helps:

"It is not an error to set a file pointer to a position beyond the end
of the file. The size of the file does not increase until you call the
SetEndOfFile, WriteFile, or WriteFileEx function. A write operation
increases the size of the file to the file pointer position plus the
size of the buffer written, which results in the intervening bytes
uninitialized."

http://msdn2.microsoft.com/en-us/library/aa365541(VS.85).aspx

According to Windows' lseek implementation (attached) SetEndOfFile()
isn't called for this case.

Yes, but we immediately follow the lseek bye a write(). See
src/backend/storage/smgr/md.c:mdextend() .

cheers

andrew

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#19)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

Alvaro Herrera <alvherre@commandprompt.com> writes:

Andrew Dunstan wrote:

I'm working on this (thank goodness for junctions). Maybe we shopuld
look at providing a config setting for pg_xlog.

I hope you mean an initdb switch -- otherwise it is way too easy to
misuse.

There's one already ..

regards, tom lane

#23Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#22)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

Tom Lane wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Andrew Dunstan wrote:

I'm working on this (thank goodness for junctions). Maybe we shopuld
look at providing a config setting for pg_xlog.

I hope you mean an initdb switch -- otherwise it is way too easy to
misuse.

There's one already ..

heh, the things that creep up on you when you're not looking ...

cheers

andrew

#24Zubkovsky, Sergey
Sergey.Zubkovsky@transas.com
In reply to: Andrew Dunstan (#21)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

It seems I've found the cause and the workaround of the problem.
MSVC's stat() is implemented by using FindNextFile().
MSDN contains the following suspicious paragraph аbout FindNextFile():

"In rare cases, file attribute information on NTFS file systems may not be current at the time you call this function. To obtain the current NTFS file system file attributes, call GetFileInformationByHandle."

Since we generally cannot open an examined file, we need another way.

In the prepared custom build of PG 8.3.1 the native MSVC's stat() was rewrote by adding GetFileAttributesEx() to correct stat's st_size value.
I had seen that a result of MSVC's stat() and a result of GetFileAttributesEx() may be differ by the file size values at least.

The most important thing is the test in the original post
( http://archives.postgresql.org/pgsql-docs/2008-03/msg00041.php )
doesn't reproduce any inconsistence now.
All work fine.

This was tested on my WinXP SP2 platform but I suppose it will work on any NT-based OS.

Thanks,
Sergey Zubkovsky

-----Original Message-----
From: Andrew Dunstan [mailto:andrew@dunslane.net]
Sent: Thursday, March 27, 2008 3:54 PM
To: Zubkovsky, Sergey
Cc: Tom Lane; Alvaro Herrera; Gregory Stark; pgsql-hackers@postgresql.org; Magnus Hagander
Subject: Re: [HACKERS] [DOCS] pg_total_relation_size() and CHECKPOINT

Zubkovsky, Sergey wrote:

Maybe this helps:

"It is not an error to set a file pointer to a position beyond the end
of the file. The size of the file does not increase until you call the
SetEndOfFile, WriteFile, or WriteFileEx function. A write operation
increases the size of the file to the file pointer position plus the
size of the buffer written, which results in the intervening bytes
uninitialized."

http://msdn2.microsoft.com/en-us/library/aa365541(VS.85).aspx

According to Windows' lseek implementation (attached) SetEndOfFile()
isn't called for this case.

Yes, but we immediately follow the lseek bye a write(). See
src/backend/storage/smgr/md.c:mdextend() .

cheers

andrew

#25Zubkovsky, Sergey
Sergey.Zubkovsky@transas.com
In reply to: Zubkovsky, Sergey (#24)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

We are not alone ;-)

http://sourceforge.net/project/shownotes.php?group_id=129038&amp;release_id=557649

Version 1.02
2007-01-25
* Fix the stat function (on Windows) to use GetFileAttributesEx insead of FindFirstFile

-----Original Message-----
From: Zubkovsky, Sergey
Sent: Friday, March 28, 2008 6:43 PM
To: Andrew Dunstan
Cc: Tom Lane; Alvaro Herrera; Gregory Stark; pgsql-hackers@postgresql.org; Magnus Hagander
Subject: RE: [HACKERS] [DOCS] pg_total_relation_size() and CHECKPOINT

It seems I've found the cause and the workaround of the problem.
MSVC's stat() is implemented by using FindNextFile().
MSDN contains the following suspicious paragraph аbout FindNextFile():

"In rare cases, file attribute information on NTFS file systems may not be current at the time you call this function. To obtain the current NTFS file system file attributes, call GetFileInformationByHandle."

Since we generally cannot open an examined file, we need another way.

In the prepared custom build of PG 8.3.1 the native MSVC's stat() was rewrote by adding GetFileAttributesEx() to correct stat's st_size value.
I had seen that a result of MSVC's stat() and a result of GetFileAttributesEx() may be differ by the file size values at least.

The most important thing is the test in the original post
( http://archives.postgresql.org/pgsql-docs/2008-03/msg00041.php )
doesn't reproduce any inconsistence now.
All work fine.

This was tested on my WinXP SP2 platform but I suppose it will work on any NT-based OS.

Thanks,
Sergey Zubkovsky

-----Original Message-----
From: Andrew Dunstan [mailto:andrew@dunslane.net]
Sent: Thursday, March 27, 2008 3:54 PM
To: Zubkovsky, Sergey
Cc: Tom Lane; Alvaro Herrera; Gregory Stark; pgsql-hackers@postgresql.org; Magnus Hagander
Subject: Re: [HACKERS] [DOCS] pg_total_relation_size() and CHECKPOINT

Zubkovsky, Sergey wrote:

Maybe this helps:

"It is not an error to set a file pointer to a position beyond the end
of the file. The size of the file does not increase until you call the
SetEndOfFile, WriteFile, or WriteFileEx function. A write operation
increases the size of the file to the file pointer position plus the
size of the buffer written, which results in the intervening bytes
uninitialized."

http://msdn2.microsoft.com/en-us/library/aa365541(VS.85).aspx

According to Windows' lseek implementation (attached) SetEndOfFile()
isn't called for this case.

Yes, but we immediately follow the lseek bye a write(). See
src/backend/storage/smgr/md.c:mdextend() .

cheers

andrew

#26Andrew Dunstan
andrew@dunslane.net
In reply to: Zubkovsky, Sergey (#25)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

Zubkovsky, Sergey wrote:

In the prepared custom build of PG 8.3.1 the native MSVC's stat() was rewrote by adding GetFileAttributesEx() to correct stat's st_size value.
I had seen that a result of MSVC's stat() and a result of GetFileAttributesEx() may be differ by the file size values at least.

The most important thing is the test in the original post
( http://archives.postgresql.org/pgsql-docs/2008-03/msg00041.php )
doesn't reproduce any inconsistence now.
All work fine.

This was tested on my WinXP SP2 platform but I suppose it will work on any NT-based OS.

If you have a patch, please send it to the -patches list.

cheers

andrew

#27Zubkovsky, Sergey
Sergey.Zubkovsky@transas.com
In reply to: Andrew Dunstan (#26)
3 attachment(s)
Re: [HACKERS] [DOCS] pg_total_relation_size() and CHECKPOINT

Hi,

Here is the patch.
Check it, please.

Locations of the added files:
src\include\port\win32_msvc\sys\stat.inl
src\port\stat_pg_fixed.c

Thank you.

-----Original Message-----
From: Andrew Dunstan [mailto:andrew@dunslane.net]
Sent: Saturday, March 29, 2008 4:18 PM
To: Zubkovsky, Sergey
Cc: Tom Lane; Alvaro Herrera; Gregory Stark;
pgsql-hackers@postgresql.org; Magnus Hagander
Subject: Re: [HACKERS] [DOCS] pg_total_relation_size() and CHECKPOINT

Zubkovsky, Sergey wrote:

In the prepared custom build of PG 8.3.1 the native MSVC's stat() was

rewrote by adding GetFileAttributesEx() to correct stat's st_size value.

I had seen that a result of MSVC's stat() and a result of

GetFileAttributesEx() may be differ by the file size values at least.

The most important thing is the test in the original post
( http://archives.postgresql.org/pgsql-docs/2008-03/msg00041.php )
doesn't reproduce any inconsistence now.
All work fine.

This was tested on my WinXP SP2 platform but I suppose it will work on

any NT-based OS.

If you have a patch, please send it to the -patches list.

cheers

andrew

Attachments:

Mkvcbuild.pm.patchapplication/octet-stream; name=Mkvcbuild.pm.patchDownload
Index: tools/msvc/Mkvcbuild.pm
===================================================================
RCS file: /home/zsm/pgcvsrep/pgsql/src/tools/msvc/Mkvcbuild.pm,v
retrieving revision 1.27
diff -U3 -r1.27 Mkvcbuild.pm
--- tools/msvc/Mkvcbuild.pm	16 Apr 2008 14:19:56 -0000	1.27
+++ tools/msvc/Mkvcbuild.pm	28 Apr 2008 22:46:56 -0000
@@ -47,7 +47,7 @@
       unsetenv.c getaddrinfo.c gettimeofday.c kill.c open.c rand.c
       snprintf.c strlcat.c strlcpy.c copydir.c dirmod.c exec.c noblock.c path.c pipe.c
       pgsleep.c pgstrcasecmp.c qsort.c qsort_arg.c sprompt.c thread.c
-      getopt.c getopt_long.c dirent.c rint.c win32error.c);
+      getopt.c getopt_long.c dirent.c rint.c win32error.c stat_pg_fixed.c);
 
     $libpgport = $solution->AddProject('libpgport','lib','misc');
     $libpgport->AddDefine('FRONTEND');
stat_pg_fixed.capplication/octet-stream; name=stat_pg_fixed.cDownload
stat.inlapplication/octet-stream; name=stat.inlDownload
#28Magnus Hagander
magnus@hagander.net
In reply to: Zubkovsky, Sergey (#27)
Re: [HACKERS] [DOCS] pg_total_relation_size() and CHECKPOINT

I already applied a different patch for this quite some time ago. So
what's in HEAD (and current in 8.3 and 8.2) should be fixed already.

//Magnus

Zubkovsky, Sergey wrote:

Show quoted text

Hi,

Here is the patch.
Check it, please.

Locations of the added files:
src\include\port\win32_msvc\sys\stat.inl
src\port\stat_pg_fixed.c

Thank you.

-----Original Message-----
From: Andrew Dunstan [mailto:andrew@dunslane.net]
Sent: Saturday, March 29, 2008 4:18 PM
To: Zubkovsky, Sergey
Cc: Tom Lane; Alvaro Herrera; Gregory Stark;
pgsql-hackers@postgresql.org; Magnus Hagander
Subject: Re: [HACKERS] [DOCS] pg_total_relation_size() and CHECKPOINT

Zubkovsky, Sergey wrote:

In the prepared custom build of PG 8.3.1 the native MSVC's stat() was

rewrote by adding GetFileAttributesEx() to correct stat's st_size value.

I had seen that a result of MSVC's stat() and a result of

GetFileAttributesEx() may be differ by the file size values at least.

The most important thing is the test in the original post
( http://archives.postgresql.org/pgsql-docs/2008-03/msg00041.php )
doesn't reproduce any inconsistence now.
All work fine.

This was tested on my WinXP SP2 platform but I suppose it will work on

any NT-based OS.

If you have a patch, please send it to the -patches list.

cheers

andrew

#29Zubkovsky, Sergey
Sergey.Zubkovsky@transas.com
In reply to: Magnus Hagander (#28)
Re: [DOCS] pg_total_relation_size() and CHECKPOINT

Magnus,

Please, take a look at my implementation of stat().
It has at least two advantages: it's faster and doesn't have a bug with
local-to-UTC time conversion that native msvc's stat() has.
Maybe it will be useful.

Thank you.

-----Original Message-----
From: Magnus Hagander [mailto:magnus@hagander.net]
Sent: Tuesday, April 29, 2008 4:54 PM
To: Zubkovsky, Sergey
Cc: Andrew Dunstan; Tom Lane; Alvaro Herrera; Gregory Stark;
pgsql-patches@postgresql.org
Subject: Re: [HACKERS] [DOCS] pg_total_relation_size() and CHECKPOINT

I already applied a different patch for this quite some time ago. So
what's in HEAD (and current in 8.3 and 8.2) should be fixed already.

//Magnus

Zubkovsky, Sergey wrote:

Hi,

Here is the patch.
Check it, please.

Locations of the added files:
src\include\port\win32_msvc\sys\stat.inl
src\port\stat_pg_fixed.c

Thank you.

-----Original Message-----
From: Andrew Dunstan [mailto:andrew@dunslane.net]
Sent: Saturday, March 29, 2008 4:18 PM
To: Zubkovsky, Sergey
Cc: Tom Lane; Alvaro Herrera; Gregory Stark;
pgsql-hackers@postgresql.org; Magnus Hagander
Subject: Re: [HACKERS] [DOCS] pg_total_relation_size() and CHECKPOINT

Zubkovsky, Sergey wrote:

In the prepared custom build of PG 8.3.1 the native MSVC's stat() was

rewrote by adding GetFileAttributesEx() to correct stat's st_size

value.

I had seen that a result of MSVC's stat() and a result of

GetFileAttributesEx() may be differ by the file size values at least.

The most important thing is the test in the original post
( http://archives.postgresql.org/pgsql-docs/2008-03/msg00041.php )
doesn't reproduce any inconsistence now.
All work fine.

This was tested on my WinXP SP2 platform but I suppose it will work

on

Show quoted text

any NT-based OS.

If you have a patch, please send it to the -patches list.

cheers

andrew