Winsock error 10035 while trying to upgrade from 8.0 to 8.2

Started by Cyril VELTER almost 19 years ago, 13 messages, general
#1 Cyril VELTER
cyril.velter@metadys.com

I'm trying to upgrade a pretty big database (60 GB) from postgres 8.0 to
postgres 8.2 on Windows 2000 Server (both versions running on the same machine
on different ports). During the migration process, I always get an error at
some point (never the same one):

LOG: could not receive data from client: Unknown winsock error 10035

which is followed by

LOG: incomplete message from client
ERROR: unexpected EOF on a client connexion
FATAL: invalid frontend message type 53 psql -U postgres -p 5433

When I move the 8.2 postgres instance to a Windows XP Pro machine, the
migration succeeds.

I've searched Google but didn't find anything related to postgres.

cyril

Source database : postgres 8.0.9
Destination database : postgres 8.2.4
OS : Windows 2000 Server SP4
The migration is run from Linux using the 8.2.4 binaries (because piping
pg_dump output on Windows stops at the first Ctrl-Z) with
"pg_dump -h YYY XXX | psql -h YYY -p 5433 XXX"

#2 Magnus Hagander
magnus@hagander.net
In reply to: Cyril VELTER (#1)
Re: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

Cyril VELTER wrote:

I'm trying to upgrade a pretty big database (60G) from postgres 8.0 to
postgres 8.2 on windows 2000 Server (both version running on the same machine
on different ports). During the migration process, I always get an error at
some point (never the same) :

Interesting. 10035 is "A non-blocking socket operation could not be
completed immediately".
Question: does this error come from the 8.0 or the 8.2 server?

Also, do you use SSL?

//Magnus
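For readers unfamiliar with the code: Winsock error 10035 is WSAEWOULDBLOCK, the Windows counterpart of POSIX EWOULDBLOCK/EAGAIN. The sketch below is not PostgreSQL or Winsock code. It assumes a POSIX system and uses a hypothetical helper, empty_recv_would_block(), to show the condition in isolation: recv() on a non-blocking socket with no pending data fails immediately with that error instead of waiting.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo (POSIX analogue, not Winsock): returns 1 if recv() on
 * an empty, non-blocking socket fails with EAGAIN/EWOULDBLOCK (the POSIX
 * counterpart of WSAEWOULDBLOCK, 10035), 0 if it does something else,
 * and -1 if the socket pair could not be set up.
 */
int empty_recv_would_block(void)
{
    int  fds[2];
    char buf[16];
    int  result = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    /* Switch the receiving end to non-blocking mode. */
    int flags = fcntl(fds[0], F_GETFL, 0);
    if (flags != -1 && fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) != -1)
    {
        /* Nothing was written to fds[1], so no data is available yet. */
        ssize_t n = recv(fds[0], buf, sizeof buf, 0);
        result = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) ? 1 : 0;
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

On Windows the equivalent check would compare WSAGetLastError() against WSAEWOULDBLOCK after a failed WSARecv(), which is what the code later in this thread does.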

#3 Cyril VELTER
cyril.velter@metadys.com
In reply to: Magnus Hagander (#2)
[Re] Re: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

magnus@hagander.net wrote:

Interesting. 10035 is "A non-blocking socket operation could not be
completed immediately".
Question: does this error come from the 8.0 or the 8.2 server?

It comes from the 8.2 server message log

Also, do you use SSL?

No, I'm not. It's not even compiled into the server or the pg_dump binary.

The server is built on Windows using MSYS, simply with
./configure && make all && make install

I've been able to reproduce the problem 6 times (at random points in the
process, but it never completes successfully). Is there any test I can do to
help investigate the problem?

cyril

#4 Magnus Hagander
magnus@hagander.net
In reply to: Cyril VELTER (#3)
Re: [Re] Re: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

Cyril VELTER wrote:

No, I'm not. It's not even compiled into the server or the pg_dump binary.

The server is built on Windows using MSYS, simply with
./configure && make all && make install

I've been able to reproduce the problem 6 times (at random points in the
process, but it never completes successfully). Is there any test I can do to
help investigate the problem?

Sorry I haven't gotten back to you for a while.

Yeah, if you can attach a debugger to the backend (assuming you have a
predictable backend it happens to - but if you're loading, you are using
a single session, I assume?), add a breakpoint around the area of the
problem, and get a backtrace from exactly where it shows up, that would
help.

//Magnus

#5 Cyril VELTER
cyril.velter@metadys.com
In reply to: Magnus Hagander (#4)
[Re] Re: [Re] Re: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

Magnus Hagander wrote:

Sorry I haven't gotten back to you for a while.

Yeah, if you can attach a debugger to the backend (assuming you have a
predictable backend it happens to - but if you're loading, you are using
a single session, I assume?), add a breakpoint around the area of the
problem, and get a backtrace from exactly where it shows up, that would
help.

Thanks for your reply. I'll try to do this. I've installed gdb on the
problematic machine and recompiled postgres with debug symbols (configure
--enable-debug).

I'm not very familiar with gdb. Could you give me some directions on setting
the breakpoint? After running gdb on the postgres.exe file, I'm not able to set
the breakpoint ("b socket.c:574" gives me an error).

Searching the source files, it seems the error message is generated in
port/win32/socket.c line 594.

Thanks,

cyril

#6 Magnus Hagander
magnus@hagander.net
In reply to: Cyril VELTER (#5)
Re: [Re] Re: [Re] Re: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

Cyril VELTER wrote:

Thanks for your reply. I'll try to do this. I've installed gdb on the
problematic machine and recompiled postgres with debug symbols (configure
--enable-debug).

I'm not very familiar with gdb. Could you give me some directions on setting
the breakpoint? After running gdb on the postgres.exe file, I'm not able to set
the breakpoint ("b socket.c:574" gives me an error).

Hmm, I keep forgetting that. There is some serious black magic required
to get gdb to even approach a working state on win32. I'm too used to
working with the MSVC build now. I've never actually got it working
myself, but I know others have. Hopefully someone can speak up here? :-)

Searching the source files, it seems the error message is generated in
port/win32/socket.c line 594.

Right, but the important thing is which code path leads to that function
when the error is generated. That's why a backtrace would help.

Looking at the code, the problem is probably somewhere in
pgwin32_recv(). Now, it really shouldn't end up doing what you're
seeing, but obviously it is.

Perhaps we just need to have it retry if it gets the WSAEWOULDBLOCK?
Thoughts?

//Magnus
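The retry idea above can be illustrated with a rough POSIX analogue, assuming poll() stands in for pgwin32_waitforsinglesocket() and recv() for WSARecv(). The name recv_retry_on_wouldblock() is hypothetical and this is not the actual PostgreSQL code: it waits for readability, attempts the read, and if the read still reports "would block", goes back to waiting instead of returning an error. Like the simplest form of the suggestion, it has no retry limit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch: block until the socket reports readable, then try
 * recv(); if recv() still says "would block", wait again rather than fail.
 * Returns bytes read, 0 on orderly shutdown, or -1 on a genuine error.
 */
ssize_t recv_retry_on_wouldblock(int fd, void *buf, size_t len)
{
    for (;;)
    {
        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };

        /* Wait indefinitely for data, EOF or an error condition. */
        if (poll(&pfd, 1, -1) == -1)
        {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: wait again */
            return -1;
        }

        ssize_t n = recv(fd, buf, len, 0);
        if (n >= 0)
            return n;           /* data, or 0 when the peer closed */
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;          /* a real error: give up */

        /* Readiness was reported but nothing could be read: loop back. */
    }
}
```

A caller would use it on a connected descriptor in place of a bare recv(). Because the loop is unbounded, a socket that keeps signalling readiness without ever delivering data would make it spin, which is the risk the later messages worry about.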

#7 Cyril VELTER
cyril.velter@metadys.com
In reply to: Magnus Hagander (#6)
[Re] Re: [Re] Re: [Re] Re: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

Magnus Hagander wrote:

Hmm, I keep forgetting that. There is some serious black magic required
to get gdb to even approach a working state on win32. I'm too used to
working with the MSVC build now. I've never actually got it working
myself, but I know others have. Hopefully someone can speak up here? :-)

I don't have msvc available.

Searching the source files, it seems the error message is generated in
port/win32/socket.c line 594.

Right, but the important thing is which code path leads to that function
when the error is generated. That's why a backtrace would help.

Yes, I understand that.

Looking at the code, the problem is probably somewhere in
pgwin32_recv(). Now, it really shouldn't end up doing what you're
seeing, but obviously it is.

After looking at the code of pgwin32_recv(), I don't understand why
pgwin32_waitforsinglesocket() is called with the FD_ACCEPT argument.

Perhaps we just need to have it retry if it gets the WSAEWOULDBLOCK?
Thoughts?

I've modified pgwin32_recv() to do that: it repeats the
pgwin32_waitforsinglesocket() / WSARecv() pair while the error is
WSAEWOULDBLOCK, instead of raising the error. I have an upgrade running right
now (I will have the result within the next few hours).

cyril

#8 Cyril VELTER
cyril.velter@metadys.com
In reply to: Magnus Hagander (#6)
Re: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

Replying to myself: the upgrade is not finished yet, but I can confirm that
there are cases where pgwin32_waitforsinglesocket() returns and WSARecv() then
immediately fails. I've modified the end of pgwin32_recv():

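/* No error, zero bytes (win2000+) or error+WSAEWOULDBLOCK (<=nt4) */

for (;;)
{
    /* Wait for the socket to be signalled; give up if the wait itself fails */
    if (pgwin32_waitforsinglesocket(s, FD_READ | FD_CLOSE | FD_ACCEPT,
                                    INFINITE) == 0)
        return -1;

    r = WSARecv(s, &wbuf, 1, &b, &flags, NULL, NULL);
    if (r == SOCKET_ERROR)
    {
        printf("SOCKERROR");    /* debug marker: the wait returned but WSARecv failed */
        if (WSAGetLastError() != WSAEWOULDBLOCK)
        {
            /* A real error: map it to errno and fail */
            TranslateSocketError();
            return -1;
        }
        /* WSAEWOULDBLOCK: loop back and wait again */
    }
    else
    {
        return b;               /* success: number of bytes received */
    }
}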

The printf("SOCKERROR") line has been hit two times.

Any thoughts?

Once this upgrade is finished, I will try again with FD_ACCEPT removed from
the pgwin32_waitforsinglesocket() call.

cyril

#9 Magnus Hagander
magnus@hagander.net
In reply to: Cyril VELTER (#8)
Re: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

Cyril VELTER wrote:

The printf("SOCKERROR") line has been hit two times.

Hmm. That really isn't supposed to happen, but it seems it does. Does it work
when you add that loop, though? Does it spit out the message and work, or
spit out the message and still fail?

I'm also a bit worried about it getting caught in a tight loop if the
error codes are wrong, but probably it just goes back into waitfor.. and
blocks the second time. Otherwise, you'd see screenfuls of that message.

Can you determine if it was hit two times right after each other, or if
there was time between them?

//Magnus
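The timing question is easier to answer if each debug hit carries a timestamp instead of a bare printf("SOCKERROR"). A minimal sketch, assuming a POSIX C library (clock_gettime(), localtime_r()) rather than the MSYS/MinGW build used in this thread; format_debug_hit() is a hypothetical helper, not part of PostgreSQL:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: format one debug line of the form
 *   "YYYY-MM-DD HH:MM:SS.mmm SOCKERROR (error 10035)\n"
 * into buf, so that separate hits can be told apart in the log.
 * Returns snprintf()'s result, or -1 if the clock could not be read.
 */
int format_debug_hit(char *buf, size_t len, const char *tag, int err)
{
    struct timespec ts;
    struct tm       local;
    char            stamp[32];

    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return -1;
    localtime_r(&ts.tv_sec, &local);
    strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", &local);

    /* Milliseconds are enough to tell "back to back" from "minutes apart". */
    return snprintf(buf, len, "%s.%03ld %s (error %d)\n",
                    stamp, ts.tv_nsec / 1000000L, tag, err);
}
```

In a loop like the one above, the debug line could be built with the WSAGetLastError() value as err and written to stderr. Whether stderr actually reaches the server log depends on how the server's logging is configured, so treat that part as an assumption.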

#10 Cyril VELTER
cyril.velter@metadys.com
In reply to: Magnus Hagander (#9)
[Re] Re: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

Magnus Hagander wrote:

Hmm. That really isn't supposed to happen, but it seems it does. Does it work
when you add that loop, though? Does it spit out the message and work, or
spit out the message and still fail?

OK, I have the results of my tests:

With the previous code, the message "SOCKERROR" is printed 5 times during the
whole process (a 100 GB dump import with psql). There is one group of three and
one group of two, but I don't have timestamps and am not sure whether they were
printed in the same loop or not. The import finally succeeds.

In the second test, I removed FD_ACCEPT. I still get the message once, so the
problem still happens. The import is also successful.

I'm also a bit worried about it getting caught in a tight loop if the
error codes are wrong, but probably it just goes back into waitfor.. and
blocks the second time. Otherwise, you'd see screenfuls of that message.

Can you determine if it was hit two times right after each other, or if
there was time between them?

For the first test I don't know the amount of time between them (I have two
groups separated in the logs by other messages).

What do you think? Could it be a bug in my Windows server installation? (This
machine hasn't been updated for some time; perhaps I should update it and see
if the problem is still there. In the long run, I plan to upgrade to
Windows 2003.)

cyril

#11 Magnus Hagander
magnus@hagander.net
In reply to: Cyril VELTER (#10)
Re: [Re] Re: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

Cyril VELTER wrote:

OK, I have the results of my tests:

With the previous code, the message "SOCKERROR" is printed 5 times during the
whole process (a 100 GB dump import with psql). There is one group of three and
one group of two, but I don't have timestamps and am not sure whether they were
printed in the same loop or not. The import finally succeeds.

Ok.

In the second test, I removed FD_ACCEPT. I still get the message once, so the
problem still happens. The import is also successful.

Ok. So FD_ACCEPT is not the fix. Good, I didn't think it would be.

I'm also a bit worried about it getting caught in a tight loop if the
error codes are wrong, but probably it just goes back into waitfor.. and
blocks the second time. Otherwise, you'd see screenfuls of that message.

Can you determine if it was hit two times right after each other, or if
there was time between them?

For the first test I don't know the amount of time between them (I have two
groups separated in the logs by other messages).

Ok. I'm thinking of just sticking a minimal wait in there to protect
against absolute runaway, but that should be enough I think.

What do you think? Could it be a bug in my Windows server installation? (This
machine hasn't been updated for some time; perhaps I should update it and see
if the problem is still there. In the long run, I plan to upgrade to
Windows 2003.)

I don't *think* it should be a bug in your version; it doesn't look
like one. But if you're not on the latest service pack, that's certainly
possible. Please update to the latest service pack plus the updates from
Windows Update / WSUS, and let me know if the problem persists.

Meanwhile, I'll try to cook up a patch.

//Magnus

#12 Cyril VELTER
cyril.velter@metadys.com
In reply to: Magnus Hagander (#11)
[Re] Re: [Re] Re: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

From: magnus@hagander.net

I don't *think* it should be a bug in your version; it doesn't look
like one. But if you're not on the latest service pack, that's certainly
possible. Please update to the latest service pack plus the updates from
Windows Update / WSUS, and let me know if the problem persists.

I AM on the latest service pack (on 2k it would be VERY OLD otherwise), but I
only run Windows Update about once a year. I'll schedule an update in the next
few weeks and keep you informed about the results.

Meanwhile, I'll try to cook up a patch.

Thanks for your help.

cyril

#13 Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#11)
Re: [Re] Re: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

On Tue, May 29, 2007 at 11:25:30PM +0200, Magnus Hagander wrote:

Meanwhile, I'll try to cook up a patch.

I have applied a patch for this to HEAD and 8.2. It includes a small wait
so we don't hit it too hard, and a limit of 5 retries before we simply give
up, so we don't end up in an infinite loop.

//Magnus
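The applied fix, as described above (a small wait between attempts and at most 5 retries), can be sketched in POSIX terms. This is an illustration under stated assumptions, not the committed patch: recv_bounded_retry() is a hypothetical name, poll() and recv() stand in for pgwin32_waitforsinglesocket() and WSARecv(), and the 10 ms pause is an assumed value because the message does not say how long the wait is.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define RECV_MAX_RETRIES    5           /* retry limit from the message above */
#define RECV_RETRY_PAUSE_NS 10000000L   /* 10 ms: an assumed value */

/*
 * Hypothetical bounded-retry variant: wait for readability, try recv(),
 * and on a spurious "would block" pause briefly before trying again.
 * After RECV_MAX_RETRIES attempts, fail with errno = EWOULDBLOCK
 * instead of looping forever.
 */
ssize_t recv_bounded_retry(int fd, void *buf, size_t len)
{
    for (int attempt = 0; attempt < RECV_MAX_RETRIES; attempt++)
    {
        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };

        if (poll(&pfd, 1, -1) == -1)
        {
            if (errno == EINTR)
                continue;       /* an interrupted wait counts as an attempt */
            return -1;
        }

        ssize_t n = recv(fd, buf, len, 0);
        if (n >= 0)
            return n;           /* data, or 0 on orderly shutdown */
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;          /* genuine error */

        /* Readiness was reported but nothing could be read: back off. */
        struct timespec backoff = { 0, RECV_RETRY_PAUSE_NS };
        nanosleep(&backoff, NULL);
    }

    errno = EWOULDBLOCK;        /* retries exhausted */
    return -1;
}
```

The cap turns a potential tight loop into, at worst, five short waits followed by an error the caller can see, which addresses the runaway concern raised earlier in the thread while still absorbing the occasional spurious wakeup observed during the import.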