psql \copy hanging

Started by Arnaud L.over 6 years ago19 messagesgeneral
Jump to latest
#1Arnaud L.
arnaud.listes@codata.eu

Hi list

On Windows, I have an unattended script that exports data overnight by
issuing psql commands like :
psql -f mycommands.sql
whith mycommands.sql containing a bunch of \copy lines like :
\copy (SELECT * FROM someview) TO
'\\windowsserver\windowsshare\somefile.csv'

When I run this script by hand, everything is fine but when it runs
overnight it hangs and allways on the same line.
There is no lock in the database, and the connection is marked as active
in the backend and running the statement "COPY (SELECT * FROM myview) TO
STDOUT".
So where can things be hanging ?
My first guess (appart from db lock) was that the target file was locked
by the filesystem. But wouldn't psql know about it and fail with an error ?

Thanks for your help !

--
Arnaud

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Arnaud L. (#1)
Re: psql \copy hanging

On 8/22/19 1:47 AM, Arnaud L. wrote:

Hi list

On Windows, I have an unattended script that exports data overnight by
issuing psql commands like :
psql -f mycommands.sql
whith mycommands.sql containing a bunch of \copy lines like :
\copy (SELECT * FROM someview) TO
'\\windowsserver\windowsshare\somefile.csv'

When I run this script by hand, everything is fine but when it runs
overnight it hangs and allways on the same line.
There is no lock in the database, and the connection is marked as active
in the backend and running the statement "COPY (SELECT * FROM myview) TO
STDOUT".
So where can things be hanging ?

The first thing I think of when seeing this sort of behavior on Windows
is anti-virus software. Doing you have any running on the machine?

You say it always hangs on the same line. Is that line doing something
different from the others? Say writing to a different location.

My first guess (appart from db lock) was that the target file was locked
by the filesystem. But wouldn't psql know about it and fail with an error ?

Thanks for your help !

--
Arnaud

--
Adrian Klaver
adrian.klaver@aklaver.com

#3Arnaud L.
arnaud.listes@codata.eu
In reply to: Adrian Klaver (#2)
Re: psql \copy hanging

Le 22/08/2019 à 16:00, Adrian Klaver a écrit :

The first thing I think of when seeing this sort of behavior on Windows
is anti-virus software. Doing you have any running on the machine?

You say it always hangs on the same line. Is that line doing something
different from the others? Say writing to a different location.

Thanks for your insights Adrian.

There is no AV software on the machine as it is a Server version of
Windows. (I know some people run AV on Windows Server, but I don't)

All the lines write to the same location which is a network share, but
the hanging one is clearly the longest query to execute and the output
file is one of the biggest one (though not the biggest).

Note that the share points to a local folder. The fact the psql writes
to a share is only for robustness so that this script could be executed
from any machine.

Cheers

--
Arnaud

#4Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Arnaud L. (#3)
Re: psql \copy hanging

On 8/22/19 7:07 AM, Arnaud L. wrote:

Le 22/08/2019 à 16:00, Adrian Klaver a écrit :

The first thing I think of when seeing this sort of behavior on Windows
is anti-virus software. Doing you have any running on the machine?

You say it always hangs on the same line. Is that line doing something
different from the others? Say writing to a different location.

Thanks for your insights Adrian.

There is no AV software on the machine as it is a Server version of
Windows. (I know some people run AV on Windows Server, but I don't)

All the lines write to the same location which is a network share, but
the hanging one is clearly the longest query to execute and the output
file is one of the biggest one (though not the biggest).

The only thing I can think of at the moment is to put the offending line
in a separate script and see what happens.

Note that the share points to a local folder. The fact the psql writes
to a share is only for robustness so that this script could be executed
from any machine.

Cheers

--
Arnaud

--
Adrian Klaver
adrian.klaver@aklaver.com

#5Arnaud L.
arnaud.listes@codata.eu
In reply to: Adrian Klaver (#4)
Re: psql \copy hanging

Le 22/08/2019 à 16:21, Adrian Klaver a écrit :

The only thing I can think of at the moment is to put the offending line
in a separate script and see what happens.

Then I think I'll first try to switch the line orders. I'll execute this
action last and see if it changes anything. If it doesn't, I'll break
the script into smaller parts.

Thanks for you help !

--
Arnaud

#6Jerry Sievers
gsievers19@comcast.net
In reply to: Arnaud L. (#1)
Re: psql \copy hanging

"Arnaud L." <arnaud.listes@codata.eu> writes:

Hi list

On Windows, I have an unattended script that exports data overnight by
issuing psql commands like :
psql -f mycommands.sql
whith mycommands.sql containing a bunch of \copy lines like :
\copy (SELECT * FROM someview) TO
'\\windowsserver\windowsshare\somefile.csv'

When I run this script by hand, everything is fine but when it runs
overnight it hangs and allways on the same line.

Is the overnight run being done as the same DB user you are testing
with?

If not, then perhaps the automated user deviates in resource settings
and/or permissions, in turn resulting in sluggish performance caused by
thrashing and/or inferior query plan.

There is no lock in the database, and the connection is marked as
active in the backend and running the statement "COPY (SELECT * FROM
myview) TO STDOUT".
So where can things be hanging ?
My first guess (appart from db lock) was that the target file was
locked by the filesystem. But wouldn't psql know about it and fail
with an error ?

Thanks for your help !

--
Arnaud

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net

#7Arnaud L.
arnaud.listes@codata.eu
In reply to: Jerry Sievers (#6)
Re: psql \copy hanging

Le 22/08/2019 à 20:00, Jerry Sievers a écrit :

Is the overnight run being done as the same DB user you are testing
with?

If not, then perhaps the automated user deviates in resource settings
and/or permissions, in turn resulting in sluggish performance caused by
thrashing and/or inferior query plan.

Hi Jerry,
yes, they both run under the same user. I've checked my other scheduled
tasks and I don't see what could interfere, but then this is quite
complex so I could easily overlook something.
Unfortunately, tonight's run failed for a totally different reason...

Thanks for your help

Cheers
--
Arnaud

#8Arnaud L.
arnaud.listes@codata.eu
In reply to: Arnaud L. (#5)
Re: psql \copy hanging

Le 22/08/2019 à 16:23, Arnaud L. a écrit :

Le 22/08/2019 à 16:21, Adrian Klaver a écrit :

The only thing I can think of at the moment is to put the offending line
in a separate script and see what happens.

Then I think I'll first try to switch the line orders. I'll execute this
action last and see if it changes anything. If it doesn't, I'll break
the script into smaller parts.

It ran fine over the week-end so I'll consider this as temporary glitch.
It's always dissatisfying not to know what has happened, but I'm not
going to try too hard to reproduce this.

Thanks for helping and sorry for the noise !

Cheers
--
Arnaud

#9Arnaud L.
arnaud.listes@codata.eu
In reply to: Arnaud L. (#1)
Re: psql \copy hanging

Le 22/08/2019 à 10:47, Arnaud L. a écrit :

On Windows, I have an unattended script that exports data overnight by
issuing psql commands like :
psql -f mycommands.sql
whith mycommands.sql containing a bunch of \copy lines like :
\copy (SELECT * FROM someview) TO
'\\windowsserver\windowsshare\somefile.csv'

When I run this script by hand, everything is fine but when it runs
overnight it hangs and allways on the same line.
There is no lock in the database, and the connection is marked as active
in the backend and running the statement "COPY (SELECT * FROM myview) TO
STDOUT".
So where can things be hanging ?
My first guess (appart from db lock) was that the target file was locked
by the filesystem. But wouldn't psql know about it and fail with an error ?

OK, so it happended again... I'm really puzzled.
There is no lock on the FS that I can find (openfiles reports 0 lock on
this file).
The client psql process is sitting idle, doing nothing.
In PostgreSQL, the connection is active with the query being simple copy
(select * from view) to stdout.
In pg_stat_activity wait_event_type IS NULL, state is active,
query_start is 2019-08-27 00:18:20.471957+02 and state_change is
2019-08-27 00:18:20.47196+02...
Nothing in postgresql LOG around that time.
Later I have : 2019-08-27 01:00:04 CEST LOG: could not receive data
from client: An existing connection was forcibly closed by the remote host.
But I realy doubt this is related...

Any other idea ? I'll change the lines order for tonight's run, but that
is not what I'd call a solution...

Thanks a lot !

--
Arnaud

#10Luca Ferrari
fluca1978@gmail.com
In reply to: Arnaud L. (#9)
Re: psql \copy hanging

On Tue, Aug 27, 2019 at 9:54 AM Arnaud L. <arnaud.listes@codata.eu> wrote:

Any other idea ? I'll change the lines order for tonight's run, but that
is not what I'd call a solution...

Does it hangs against the same line content or the same line number?
Are you able to run the script automatically during working hours (to
avoid firewalling or upgrades running in parrallel to your nightly
script execution)?
Any chance something is querying the same data and a lock blocks the
transaction (pg_locks)?

Just throwing on the table some desperate ideas....

Luca

#11Arnaud L.
arnaud.listes@codata.eu
In reply to: Luca Ferrari (#10)
Re: psql \copy hanging

Le 27/08/2019 à 10:00, Luca Ferrari a écrit :

On Tue, Aug 27, 2019 at 9:54 AM Arnaud L. <arnaud.listes@codata.eu> wrote:

Any other idea ? I'll change the lines order for tonight's run, but that
is not what I'd call a solution...

Does it hangs against the same line content or the same line number?
Are you able to run the script automatically during working hours (to
avoid firewalling or upgrades running in parrallel to your nightly
script execution)?
Any chance something is querying the same data and a lock blocks the
transaction (pg_locks)?

Hi Luca.
I can run the script just fine during working hours.
I checked pg_locks, and this pid is the only process requesting locks.
It has around 100 of them since it is a view querying multiple tables,
but I see nothing blocking. All locks are granted.
I have a csv output of pg_locks so I can post this if asked.

It hangs against the same line in the sql script, all lines being "\copy
(select ....) to 'file on unc share'".
This line is simply the longest running query because the view inside
the select outputs almost 1M rows and does some subqueries.

Also when the script is hung, output has not started (file size is 0).

Just throwing on the table some desperate ideas....

Desperate ideas are very welcome !

Cheers
--
Arnaud

#12Luca Ferrari
fluca1978@gmail.com
In reply to: Arnaud L. (#11)
Re: psql \copy hanging

On Tue, Aug 27, 2019 at 10:48 AM Arnaud L. <arnaud.listes@codata.eu> wrote:

I can run the script just fine during working hours.

I meant thru your scheduler (cron or something).

It hangs against the same line in the sql script, all lines being "\copy
(select ....) to 'file on unc share'".

This is a new detail to me: what if you output to a local file and
move it after on the share?

This line is simply the longest running query because the view inside
the select outputs almost 1M rows and does some subqueries.

I still don't get: is the line content the longest or the highest
numbered in the output?

Also when the script is hung, output has not started (file size is 0).

This makes me think there is some problem with the share, not the
script. Again: test against a local file that you move (rsync?) to the
share after.

Luca

#13Arnaud L.
arnaud.listes@codata.eu
In reply to: Luca Ferrari (#12)
Re: psql \copy hanging

Le 27/08/2019 à 10:57, Luca Ferrari a écrit :

On Tue, Aug 27, 2019 at 10:48 AM Arnaud L. <arnaud.listes@codata.eu> wrote:

I can run the script just fine during working hours.

I meant thru your scheduler (cron or something).

Yes, it runs fine too. There is indeed something happening on the server
at this specific time but I don't know what.

It hangs against the same line in the sql script, all lines being "\copy
(select ....) to 'file on unc share'".

This is a new detail to me: what if you output to a local file and
move it after on the share?

I could do this but it would make the script a lot more complicated.
There are a lot of views that I \copy directly to this share, and this
is the only one that poses any problem.

I still don't get: is the line content the longest or the highest
numbered in the output?

Sorry, I don't undertand either. Your question was "Does it hangs
against the same line content or the same line number?".
I run an sql script that contains a bunch of \copy commands. It is
always the same \copy that is hanging.

Also when the script is hung, output has not started (file size is 0).

This makes me think there is some problem with the share, not the
script. Again: test against a local file that you move (rsync?) to the
share after.

Everything is windows based, so rsync is not really an option. There are
alternatives of course.
I could write the file locally, but for robustness I'm trying not to
rely on the local FS so that this script could be run from any machine
on the network without modifications.

Cheers
--
Arnaud

#14Luca Ferrari
fluca1978@gmail.com
In reply to: Arnaud L. (#13)
Re: psql \copy hanging

On Tue, Aug 27, 2019 at 12:34 PM Arnaud L. <arnaud.listes@codata.eu> wrote:

I could do this but it would make the script a lot more complicated.
There are a lot of views that I \copy directly to this share, and this
is the only one that poses any problem.

I would however give it a try. I would schedule a nightly execution
with a local path to see if that completes.

Sorry, I don't undertand either. Your question was "Does it hangs
against the same line content or the same line number?".
I run an sql script that contains a bunch of \copy commands. It is
always the same \copy that is hanging.

Now I got it: your script is hanging against a specific \copy command,
while I was thinking it was hanging once it was dumping a specific
table line from your copy.

I could write the file locally, but for robustness I'm trying not to
rely on the local FS so that this script could be run from any machine
on the network without modifications.

My personal experience with shares and network that are not so much
stable is that they can block, and that is why I'm suggesting to try
on the local filesystem to see if that is a share-related problem or a
resource problem. Also scheduling the script at a different time could
help, because it could be at that time the machine (and it could mean
the sharing machine) is busy at the point it does not respond.

Luca

#15Arnaud L.
arnaud.listes@codata.eu
In reply to: Luca Ferrari (#14)
Re: psql \copy hanging

Le 27/08/2019 à 13:13, Luca Ferrari a écrit :

My personal experience with shares and network that are not so much
stable is that they can block, and that is why I'm suggesting to try
on the local filesystem to see if that is a share-related problem or a
resource problem. Also scheduling the script at a different time could
help, because it could be at that time the machine (and it could mean
the sharing machine) is busy at the point it does not respond.

I'm quite confident that changing the schedule would help (since the
script runs fine during work hours), but the time-table is already quite
busy.
I move the offending line at the end of the script, so it will run some
minutes later, maybe this will be enough.
I'll also try to write it directly on a local folder to rule this out.

Cheers
--
Arnaud

#16Shaozhong SHI
shishaozhong@gmail.com
In reply to: Arnaud L. (#15)
Re: psql \copy hanging

Interesting topic.

Is there any easy way to debug, and get correct error messages?

Regards,

Shao

On Tue, 27 Aug 2019 at 12:17, Arnaud L. <arnaud.listes@codata.eu> wrote:

Show quoted text

Le 27/08/2019 à 13:13, Luca Ferrari a écrit :

My personal experience with shares and network that are not so much
stable is that they can block, and that is why I'm suggesting to try
on the local filesystem to see if that is a share-related problem or a
resource problem. Also scheduling the script at a different time could
help, because it could be at that time the machine (and it could mean
the sharing machine) is busy at the point it does not respond.

I'm quite confident that changing the schedule would help (since the
script runs fine during work hours), but the time-table is already quite
busy.
I move the offending line at the end of the script, so it will run some
minutes later, maybe this will be enough.
I'll also try to write it directly on a local folder to rule this out.

Cheers
--
Arnaud

#17Arnaud L.
arnaud.listes@codata.eu
In reply to: Arnaud L. (#15)
Re: psql \copy hanging

Le 27/08/2019 à 13:17, Arnaud L. a écrit :

I move the offending line at the end of the script, so it will run some
minutes later, maybe this will be enough.

OK, so this was enough for last night's schedule to run without problem.
I still don't get it so I'm not satisfied with this solution, but at
least it works.
I'll keep the list informed if something new arises.

Thanks to everyone for helping !

Cheers
--
Arnaud

#18Luca Ferrari
fluca1978@gmail.com
In reply to: Arnaud L. (#17)
Re: psql \copy hanging

On Wed, Aug 28, 2019 at 9:09 AM Arnaud L. <arnaud.listes@codata.eu> wrote:

OK, so this was enough for last night's schedule to run without problem.
I still don't get it so I'm not satisfied with this solution, but at
least it works.
I'll keep the list informed if something new arises.

I don't want to be pedantic, but I would have tried with a single
change at a time.
And my bet is: the local file would do the trick (i.e., it is a weird
share problem).

If you are going to invest some time, you could also try to write a
small file on the share just before the copy starts, so that you are
guaranteed the share is working. Something like:

echo $(date) >> $SHARE/log.txt
psql 'copy ...'
echo 'done' >> $SHARE/log.txt

Luca

#19Arnaud L.
arnaud.listes@codata.eu
In reply to: Luca Ferrari (#18)
Re: psql \copy hanging

Le 28/08/2019 à 09:43, Luca Ferrari a écrit :

I don't want to be pedantic, but I would have tried with a single
change at a time.
And my bet is: the local file would do the trick (i.e., it is a weird
share problem).

You're not don't worry. This process is quite important in our workflow
(not critical), so I can't really afford to make tests for weeks.

Moving the offending line to the end of the script is what I'd consider
"single change". To me it makes something clear : the problem occurs
only at the very specific time this command is running.

If you are going to invest some time, you could also try to write a
small file on the share just before the copy starts, so that you are
guaranteed the share is working. Something like:

echo $(date) >> $SHARE/log.txt
psql 'copy ...'
echo 'done' >> $SHARE/log.txt

Well, I do know that the share is working just before this command runs
because ALL the commands in the script write to this specific share.
The script is basically outputting many views results in a single share,
something like :
\copy (select * from view1) TO '\\myserver\myshare\file1.csv'
\copy (select * from view2) TO '\\myserver\myshare\file2.csv'
...
\copy (select * from view99) TO '\\myserver\myshare\file99.csv'
So the \copy command right before (say view15 for instance) is writing
to this same share just milliseconds before the the problematic command
(view16) tries to do the same.
Since this particular view takes some time to execute, there is some
time between the moment the file gets created on the share (which
happens immediately when the \copy command runs if I get it right), and
the moment psql receives content and starts writing to it (my tests
suggest 1min to 1min30s). Either psql doesn't receive anything
(possible, since the connection is marked as active but it does not look
as if it's doing anything at all), or there has been some timeout.
It could have been tcp keepalive, but in Windows the default is 2h I
believe and postgresql uses system default if nothing is specified in
conf (which is my case).

So with all this in mind I'd rather think I have a problem with either
psql's \copy or with my query on the server side. But I'm not rulling
anything out of course.

One other thing I could try is using COPY TO STDOUT \g. From what I
understand in the documentation this would not be 100% similar to what
\copy is doing.
Anyway, if it works with the current setup I won't dig into it much deeper.

Cheers
--
Arnaud