Re: [ADMIN] ERROR: could not read block
[Apologies for the delayed response; fighting through a backlog.]
I checked with our DBAs, and they are willing to test it.
By the way, they found that they were getting this on a pg_dump,
too. We will test both failure cases. If the test goes OK, we would
be happy to leave it in production with this patch.
-Kevin
"Qingqing Zhou" <zhouqq@cs.toronto.edu> >>>
"Tom Lane" <tgl@sss.pgh.pa.us> wrote
Would a simple retry loop actually help? It's not clear to me how
persistent such a failure would be.
[with reply to all followup threads] Yeah, this is the key, and we
definitely have no 100% guarantee that several retries will solve the
problem, just as in the pg_unlink/pg_rename situation. But shall we do
something now? If Kevin could help with testing (you may have to revert
the registry changes :-( ), I would like to send a patch in the retry style.
Regards,
Qingqing
On Wed, 30 Nov 2005, Kevin Grittner wrote:
I checked with our DBAs, and they are willing to test it.
Thanks, that's very nice!
By the way, they found that they were getting this on a pg_dump,
too. We will test both failure cases. If the test goes OK, we would
be happy to leave it in production with this patch.
I can believe that pg_dump faces a similar situation, i.e., running out
of kernel buffers. But it seems pg_dump supports a "-Z 0..9" option, which
uses some external I/O functions from zlib. That part may not be easy to
"retry".
Magnus, do you want to work on this? If not, I will give it a try.
Regards,
Qingqing
Due to its size, in the Windows environment we can't dump this
database in any format except plain text, so the zlib issues don't
apply here.
-Kevin
Qingqing Zhou <zhouqq@cs.toronto.edu> >>>
By the way, they found that they were getting this on a pg_dump,
too. We will test both failure cases. If the test goes OK, we would
be happy to leave it in production with this patch.
I can believe that pg_dump faces a similar situation, i.e., running out
of kernel buffers. But it seems pg_dump supports a "-Z 0..9" option, which
uses some external I/O functions from zlib. That part may not be easy to
"retry".
I came up with a patch to fix the server-side problem. The basic idea is to
convert ERROR_NO_SYSTEM_RESOURCES to EINTR and retry until the call either
succeeds or fails with a different error. I also tweaked the FileRead() logic
for "returnCode <= 0" a little by separating it into "< 0" and "== 0"
cases. This is because when a read hits EOF, read() does not set errno,
which could cause an infinite loop if a previous read() was interrupted.
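For Windows, I wait briefly before each retry (the patch calls
pg_usleep(1000), and since pg_usleep() takes microseconds that is one
millisecond). This should be OK since the problem is very rare. If the error
is permanent, you can always SIGINT the process, since the waiting is done
by pg_usleep().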
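For readers who want the retry idea in isolation, here is a minimal POSIX-only sketch of the read side. It is not the patch itself: the function name read_with_retry and the MAX_READ_RETRIES cap are hypothetical (the patch retries without a limit), and the Windows-specific translation of ERROR_NO_SYSTEM_RESOURCES to EINTR is left out. It only shows why the "< 0" and "== 0" cases are handled separately.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical cap on retries; the patch in this thread has no such limit. */
#define MAX_READ_RETRIES 10

/*
 * Read up to 'amount' bytes from 'fd', retrying when the call was
 * interrupted. Returns the byte count (> 0), 0 at EOF, or -1 with errno set.
 */
static ssize_t
read_with_retry(int fd, void *buffer, size_t amount)
{
    int attempts = 0;

    assert(buffer != NULL);

    for (;;)
    {
        ssize_t rc;

        errno = 0;
        rc = read(fd, buffer, amount);

        if (rc > 0)
            return rc;      /* got data */

        if (rc == 0)
            return 0;       /* EOF: errno is not consulted here, so a stale
                             * EINTR from an earlier call cannot cause a loop */

        /* rc < 0: retry only for EINTR, and only a bounded number of times */
        if (errno == EINTR && ++attempts <= MAX_READ_RETRIES)
            continue;       /* a short sleep could go here, as in the patch */

        return -1;          /* any other error is reported to the caller */
    }
}
```

A caller treats the three outcomes differently: a positive count advances the file position, 0 means EOF, and -1 means errno holds the reason for the failure.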
Regards,
Qingqing
---
Index: src/backend/storage/file/fd.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/file/fd.c,v
retrieving revision 1.122
diff -c -r1.122 fd.c
*** src/backend/storage/file/fd.c 22 Nov 2005 18:17:20 -0000 1.122
--- src/backend/storage/file/fd.c 1 Dec 2005 01:09:59 -0000
***************
*** 1009,1019 ****
      if (returnCode < 0)
          return returnCode;
!     returnCode = read(VfdCache[file].fd, buffer, amount);
!     if (returnCode > 0)
!         VfdCache[file].seekPos += returnCode;
!     else
!         VfdCache[file].seekPos = FileUnknownPos;
      return returnCode;
  }
--- 1009,1052 ----
      if (returnCode < 0)
          return returnCode;
!     for (;;)
!     {
!         returnCode = read(VfdCache[file].fd, buffer, amount);
!
!         if (returnCode > 0)
!             VfdCache[file].seekPos += returnCode;
!         else if (returnCode == 0)
!             VfdCache[file].seekPos = FileUnknownPos;
!         else
!         {
! #ifdef WIN32
!             DWORD error = GetLastError();
!
!             switch (error)
!             {
!                 /*
!                  * Since we are using buffered IO now, so windows may run
!                  * out of kernel buffer and return a "Insufficient system
!                  * resources" error. Retry to solve it.
!                  */
!                 case ERROR_NO_SYSTEM_RESOURCES:
!                     pg_usleep(1000);
!                     errno = EINTR;
!                     break;
!                 default:
!                     _dosmaperr(error);
!                     Assert(errno != EINTR);
!             }
! #endif
!             /* Ok if interrupted and retry */
!             if (errno == EINTR)
!                 continue;
!
!             VfdCache[file].seekPos = FileUnknownPos;
!         }
!
!         break;
!     }
      return returnCode;
  }
***************
*** 1033,1049 ****
      if (returnCode < 0)
          return returnCode;
!     errno = 0;
!     returnCode = write(VfdCache[file].fd, buffer, amount);
!     /* if write didn't set errno, assume problem is no disk space */
!     if (returnCode != amount && errno == 0)
!         errno = ENOSPC;
!     if (returnCode > 0)
!         VfdCache[file].seekPos += returnCode;
!     else
!         VfdCache[file].seekPos = FileUnknownPos;
      return returnCode;
  }
--- 1066,1108 ----
      if (returnCode < 0)
          return returnCode;
!     for (;;)
!     {
!         errno = 0;
!         returnCode = write(VfdCache[file].fd, buffer, amount);
!         /* if write didn't set errno, assume problem is no disk space */
!         if (returnCode != amount && errno == 0)
!             errno = ENOSPC;
!         if (returnCode > 0)
!             VfdCache[file].seekPos += returnCode;
!         else
!         {
! #ifdef WIN32
!             DWORD error = GetLastError();
!
!             switch (error)
!             {
!                 /* see comments in FileRead() */
!                 case ERROR_NO_SYSTEM_RESOURCES:
!                     pg_usleep(1000);
!                     errno = EINTR;
!                     break;
!                 default:
!                     _dosmaperr(error);
!                     Assert(errno != EINTR);
!             }
! #endif
!             /* Ok if interrupted and retry */
!             if (errno == EINTR)
!                 continue;
!
!             VfdCache[file].seekPos = FileUnknownPos;
!         }
!
!         break;
!     }
      return returnCode;
  }
Qingqing Zhou <zhouqq@cs.toronto.edu> writes:
! default:
! _dosmaperr(error);
! Assert(errno != EINTR);
What's the point of that ... didn't it already happen inside read()?
regards, tom lane
On Thu, 1 Dec 2005, Tom Lane wrote:
Qingqing Zhou <zhouqq@cs.toronto.edu> writes:
! default:
! _dosmaperr(error);
! Assert(errno != EINTR);
What's the point of that ... didn't it already happen inside read()?
Recall that we have had some reports that read() failed to convert some
Windows error numbers to a meaningful errno. For example, the
ERROR_SHARING_VIOLATION error was converted to EINVAL. So we do the
conversion ourselves here, and we get better diagnostic information if this
error is reported again.
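To illustrate the idea of doing the mapping ourselves, here is a rough, portable sketch of an explicit Win32-to-errno table together with the "retry or not" decision. It is not PostgreSQL's _dosmaperr(): the names sketch_map_win32_error and sketch_should_retry, the SK_ prefixed constants, and the particular errno choices in the table are assumptions made for illustration. The numeric values are the documented Win32 codes, redefined locally so the example compiles without windows.h.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Win32 error codes, redefined here so the sketch builds on any platform. */
#define SK_ERROR_FILE_NOT_FOUND       2UL
#define SK_ERROR_ACCESS_DENIED        5UL
#define SK_ERROR_SHARING_VIOLATION    32UL
#define SK_ERROR_NO_SYSTEM_RESOURCES  1450UL

/* Illustrative mapping table; the errno choices are assumptions. */
static const struct
{
    unsigned long winerr;
    int           doserr;
} sketch_errmap[] = {
    {SK_ERROR_FILE_NOT_FOUND, ENOENT},
    {SK_ERROR_ACCESS_DENIED, EACCES},
    {SK_ERROR_SHARING_VIOLATION, EACCES}
};

/* Set errno from a Win32 error code, falling back to EINVAL if unknown. */
static int
sketch_map_win32_error(unsigned long winerr)
{
    size_t i;

    for (i = 0; i < sizeof(sketch_errmap) / sizeof(sketch_errmap[0]); i++)
    {
        if (sketch_errmap[i].winerr == winerr)
        {
            errno = sketch_errmap[i].doserr;
            return errno;
        }
    }
    errno = EINVAL;
    return errno;
}

/*
 * Decide whether an I/O call should be retried. Only the "insufficient
 * system resources" case is turned into EINTR; everything else is mapped
 * explicitly and must never look like an interrupted call.
 */
static int
sketch_should_retry(unsigned long winerr)
{
    if (winerr == SK_ERROR_NO_SYSTEM_RESOURCES)
    {
        errno = EINTR;
        return 1;
    }
    sketch_map_win32_error(winerr);
    assert(errno != EINTR);     /* mirrors the Assert() in the patch */
    return 0;
}
```

With an explicit table like this, a sharing violation shows up as a distinct errno instead of a generic EINVAL, which is the diagnostic benefit described above.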
Regards,
Qingqing