Kludge in pg_standby.c

Started by Gregory Starkabout 18 years ago7 messages
#1Gregory Stark
stark@enterprisedb.com

There's a suspicious ifdef in pg_standby for WIN32 which smells like a kludge
added to work around a Windows problem which makes it work but at great
expense:

#ifdef WIN32
/*
* Windows reports that the file has the right number of bytes
* even though the file is still being copied and cannot be
* opened by pg_standby yet. So we wait for sleeptime secs
* before attempting to restore. If that is not enough, we
* will rely on the retry/holdoff mechanism.
*/
pg_usleep(sleeptime * 1000000L);
#endif

This happens before we return *any* WAL file to be processed. That means it
slows down the processing of any file by 1s. On a server which has fallen
behind this means it can't process files as quickly as it can copy them, it's
limited to at most 1/s.

I think it wouldn't be hard to do this properly. We can try to open the file,
handle the expected Windows error by sleeping for 1s and repeating until we
can successfully open it. Something like (untested):

bool success = false;
int fd, tries = 10;
while (--tries) {
fd = open(WALFilePath, O_RDONLY);
if (fd >= 0) {
close(fd);
success = true;
break;
} else if (errno == EWINDOWSBLOWS) {
usleep(1000000);
} else {
perror("pg_standby open:");
exit(2);
}
}
if (!success) {
fprintf(stderr, "pg_standby: couldn't open file \"%s\" due to \"%s\",
WALFilePath, strerror(EWINDOWSBLOWS));
exit(2);
}

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!

#2Bruce Momjian
bruce@momjian.us
In reply to: Gregory Stark (#1)
Re: Kludge in pg_standby.c

Magnus, have you looked at this yet?

---------------------------------------------------------------------------

Gregory Stark wrote:

There's a suspicious ifdef in pg_standby for WIN32 which smells like a kludge
added to work around a Windows problem which makes it work but at great
expense:

#ifdef WIN32
/*
* Windows reports that the file has the right number of bytes
* even though the file is still being copied and cannot be
* opened by pg_standby yet. So we wait for sleeptime secs
* before attempting to restore. If that is not enough, we
* will rely on the retry/holdoff mechanism.
*/
pg_usleep(sleeptime * 1000000L);
#endif

This happens before we return *any* WAL file to be processed. That means it
slows down the processing of any file by 1s. On a server which has fallen
behind this means it can't process files as quickly as it can copy them, it's
limited to at most 1/s.

I think it wouldn't be hard to do this properly. We can try to open the file,
handle the expected Windows error by sleeping for 1s and repeating until we
can successfully open it. Something like (untested):

bool success = false;
int fd, tries = 10;
while (--tries) {
fd = open(WALFilePath, O_RDONLY);
if (fd >= 0) {
close(fd);
success = true;
break;
} else if (errno == EWINDOWSBLOWS) {
usleep(1000000);
} else {
perror("pg_standby open:");
exit(2);
}
}
if (!success) {
fprintf(stderr, "pg_standby: couldn't open file \"%s\" due to \"%s\",
WALFilePath, strerror(EWINDOWSBLOWS));
exit(2);
}

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#3Magnus Hagander
magnus@hagander.net
In reply to: Bruce Momjian (#2)
Re: Kludge in pg_standby.c

no. I have been without a working win32 build environment for a while. Working on fixing that right now..

/Magnus

Show quoted text

------- Original Message -------
From: Bruce Momjian <bruce@momjian.us>
To: Gregory Stark <stark@enterprisedb.com>, Magnus Hagander <magnus@hagander.net>
Sent: 08-04-07, 1:54:37
Subject: Re: [HACKERS] Kludge in pg_standby.c

Magnus, have you looked at this yet?

---------------------------------------------------------------------------

Gregory Stark wrote:

There's a suspicious ifdef in pg_standby for WIN32 which smells like a kludge
added to work around a Windows problem which makes it work but at great
expense:

#ifdef WIN32
/*
* Windows reports that the file has the right number of bytes
* even though the file is still being copied and cannot be
* opened by pg_standby yet. So we wait for sleeptime secs
* before attempting to restore. If that is not enough, we
* will rely on the retry/holdoff mechanism.
*/
pg_usleep(sleeptime * 1000000L);
#endif

This happens before we return *any* WAL file to be processed. That means it
slows down the processing of any file by 1s. On a server which has fallen
behind this means it can't process files as quickly as it can copy them, it's
limited to at most 1/s.

I think it wouldn't be hard to do this properly. We can try to open the file,
handle the expected Windows error by sleeping for 1s and repeating until we
can successfully open it. Something like (untested):

bool success = false;
int fd, tries = 10;
while (--tries) {
fd = open(WALFilePath, O_RDONLY);
if (fd >= 0) {
close(fd);
success = true;
break;
} else if (errno == EWINDOWSBLOWS) {
usleep(1000000);
} else {
perror("pg_standby open:");
exit(2);
}
}
if (!success) {
fprintf(stderr, "pg_standby: couldn't open file \"%s\" due to \"%s\",
WALFilePath, strerror(EWINDOWSBLOWS));
exit(2);
}

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#4Bruce Momjian
bruce@momjian.us
In reply to: Gregory Stark (#1)
Re: Kludge in pg_standby.c

I have moved this to the next commit-fest.

---------------------------------------------------------------------------

Gregory Stark wrote:

There's a suspicious ifdef in pg_standby for WIN32 which smells like a kludge
added to work around a Windows problem which makes it work but at great
expense:

#ifdef WIN32
/*
* Windows reports that the file has the right number of bytes
* even though the file is still being copied and cannot be
* opened by pg_standby yet. So we wait for sleeptime secs
* before attempting to restore. If that is not enough, we
* will rely on the retry/holdoff mechanism.
*/
pg_usleep(sleeptime * 1000000L);
#endif

This happens before we return *any* WAL file to be processed. That means it
slows down the processing of any file by 1s. On a server which has fallen
behind this means it can't process files as quickly as it can copy them, it's
limited to at most 1/s.

I think it wouldn't be hard to do this properly. We can try to open the file,
handle the expected Windows error by sleeping for 1s and repeating until we
can successfully open it. Something like (untested):

bool success = false;
int fd, tries = 10;
while (--tries) {
fd = open(WALFilePath, O_RDONLY);
if (fd >= 0) {
close(fd);
success = true;
break;
} else if (errno == EWINDOWSBLOWS) {
usleep(1000000);
} else {
perror("pg_standby open:");
exit(2);
}
}
if (!success) {
fprintf(stderr, "pg_standby: couldn't open file \"%s\" due to \"%s\",
WALFilePath, strerror(EWINDOWSBLOWS));
exit(2);
}

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#5Alvaro Herrera
alvherre@commandprompt.com
In reply to: Bruce Momjian (#2)
Re: Kludge in pg_standby.c

Gregory Stark wrote:

There's a suspicious ifdef in pg_standby for WIN32 which smells like a kludge
added to work around a Windows problem which makes it work but at great
expense:

#ifdef WIN32
/*
* Windows reports that the file has the right number of bytes
* even though the file is still being copied and cannot be
* opened by pg_standby yet. So we wait for sleeptime secs
* before attempting to restore. If that is not enough, we
* will rely on the retry/holdoff mechanism.
*/
pg_usleep(sleeptime * 1000000L);
#endif

FWIW, it seems that this may be fixed with Magnus' patch to change
stat() on Win32. Is there anyone with a working warm standby PITR setup
on Win32 that could test it?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#6Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#5)
Re: Kludge in pg_standby.c

Alvaro Herrera wrote:

Gregory Stark wrote:

There's a suspicious ifdef in pg_standby for WIN32 which smells like a kludge
added to work around a Windows problem which makes it work but at great
expense:

#ifdef WIN32
/*
* Windows reports that the file has the right number of bytes
* even though the file is still being copied and cannot be
* opened by pg_standby yet. So we wait for sleeptime secs
* before attempting to restore. If that is not enough, we
* will rely on the retry/holdoff mechanism.
*/
pg_usleep(sleeptime * 1000000L);
#endif

FWIW, it seems that this may be fixed with Magnus' patch to change
stat() on Win32. Is there anyone with a working warm standby PITR setup
on Win32 that could test it?

Is this fixed and this block of code can be removed?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#7Simon Riggs
simon@2ndquadrant.com
In reply to: Bruce Momjian (#6)
Re: Kludge in pg_standby.c

On Mon, 2008-06-23 at 17:38 -0400, Bruce Momjian wrote:

Is this fixed and this block of code can be removed?

There'll be some action for the next CommitFest.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support