Bug in log generation?
I have to fix my log files because their content is not properly formatted. I'm
using version 14.4, but this also happened when I was using version 11. It
happens only when the statement is huge, either because it is a long sequence
of updates in a WITH or DO statement, or because I'm updating a bytea
field. This kind of statement occurs in the log dozens of times a day, but just
2 or 3 are wrongly formatted.
One example is a WITH statement with dozens of inserts and updates in it:
2022-08-06 10:05:10.822
-03,"user_0566","db",1956432,"10.158.0.17:56098",62ee66ee.1dda50,1,"SELECT",2022-08-06
10:04:46 -03,266/383801,229907216,LOG,00000,"duration: 1735.107 ms execute
PRSTMTST155939968154/PORTALST155939968154: /*Cobranca.ImportaCobranca*/
WITH Receber1 AS (UPDATE fin_Re>
NNLiquidado1 AS (UPDATE fin_ReceberNossoNumero SET Status = 7 WHERE
NossoNumero = any($${90062164}$$)),
--statement continues, with some more dozens of update/inserts
NNDesconsiderado48 AS (UPDATE fin_recebernossonumero SET status = 9 WHERE
receber_id = 104201 AND status = 1 AND nossonumero <> 90086321),
NNExcluir48 AS (UPDATE fin_recebernossonumero SET status = 4 WHERE
receber_id = 104201 AND status = any($IN${2,3}$IN$) AND nossonumero <>
90086321 RETURNING recebernossonumero_id),
Baixa48 AS (INSERT INTO fin_ReceberBaixa(Historico, Desconto, Valor,
Lancamento) VALUES ($vs$Paga2022-08-06 10:07:07.505
-03,"user_0566","db",1956432,"10.158.0.17:56098",62ee66ee.1dda50,2,"idle",2022-08-06
10:04:46 -03,266/0,0,FATAL,57P01,"terminating connection due to
administrator command",,,,,,,,,"2022062701adriana.aguiar","client
backend",,0
2022-08-06 10:07:07.507
-03,"user_0328","db",1957035,"10.158.0.17:57194",62ee6730.1ddcab,1,"idle",2022-08-06
10:05:52 -03,410/0,0,FATAL,57P01,"terminating connection due to
administrator command",,,,,,,,,"2022062701tmk06.madureira","client
backend",,0
If you search for "$vs$Paga2022-08-06 10:07:07.505" you'll see that
"$vs$Paga" is still part of the first statement, while "2022-08-06 10:07:07.505"
is the start of the next log entry; some characters of the previous
statement are missing.
Another example is a single update with a large bytea field in it:
2022-08-06 15:57:46.955
-03,"user_0591","db",2103042,"10.158.0.17:43662",62eeb9aa.201702,1,"INSERT",2022-08-06
15:57:46 -03,49/639939,230013197,LOG,00000,"duration: 11.012 ms execute
PRSTMTST1612483842/PORTALST1612483842: WITH upsert AS (
UPDATE sys_var SET varBlob = $1 WHERE name = $2 RETURNING *) INSERT
INTO sys_var (Name, varBlob) SELECT $3 , $4 WHERE NOT EXISTS (SELECT *
FROM upsert)","parameters: $1 =
'\x3c3f786d6 --field content continues
$2 = '/users/39818/XMLConfig', $3 = '/users/39818/XMLConfig', $4 =
'\x3c3f786d6 --field content continues
e4445583e2d313c2f47524f555045445f494e4445583e32022-08-06 15:58:42.436
-03,"user_0591","db",2103042,"10.158.0.17:43662",62eeb9aa.201702,2,"idle",2022-08-06
15:57:46 -03,49/0,0,FATAL,57P01,"terminating connection due to
administrator command",,,,,,,,,"","client backend",,-4199849316459484872
2022-08-06 15:58:42.436
-03,"user_0143","db",2103112,"10.158.0.17:43794",62eeb9bf.201748,1,"idle",2022-08-06
15:58:07 -03,44/0,0,FATAL,57P01,"terminating connection due to
administrator command",,,,,,,,,"2022062701joyceb.l@hotmail.com","client
backend",,0
Here "4445583e32022-08-06 15:58:42.436", where bytea content "4445583e" was
being displayed and the next statement started with "32022-08-06
15:58:42.436".
Obviously, because the previous line is not terminated correctly, I cannot
import the log files properly, so I have to edit them by hand before
importing.
thanks
Marcos
Marcos Pegoraro <marcos@f10.com.br> writes:
I have to fix my log files because their content is not properly formatted.
What mechanism are you using to store the log? If syslog is involved,
it's reputed to drop data under load.
regards, tom lane
it's csvlog only
Best regards,
On Sun, Aug 7, 2022 at 11:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Marcos Pegoraro <marcos@f10.com.br> writes:
I have to fix my log files because their content is not properly formatted.
What mechanism are you using to store the log? If syslog is involved,
it's reputed to drop data under load.
regards, tom lane
Hi,
On 2022-08-07 11:56:44 -0300, Marcos Pegoraro wrote:
it's csvlog only
How are you running postgres? If the logger process runs into trouble it might
write to stderr.
Is there a chance your huge statements would make you run out of space?
Greetings,
Andres Freund
How are you running postgres? If the logger process runs into trouble it might
write to stderr.
Is there a chance your huge statements would make you run out of space?
Well, I don't think it is an out-of-space problem, because logging
doesn't stop; it just splits that message. As you can see, the next
message is logged properly. And the statement is not that huge; these
statements are no more than 10 or 20 kB. And, as I said, these statements
occur dozens of times a day, but only once or twice are they logged incorrectly.
One additional piece of info: that split message has an out-of-order log time. At
that time the log file was getting 2 or 3 entries per second, yet that message
was logged 1 or 2 minutes later. It seems the statement occurs now but is
stored a minute or two later.
thanks
Marcos
On 2022-08-08 Mo 07:34, Marcos Pegoraro wrote:
How are you running postgres? If the logger process runs into trouble it might
write to stderr.
Is there a chance your huge statements would make you run out of space?
Well, I don't think it is an out-of-space problem, because logging
doesn't stop; it just splits that message. As you can see, the
next message is logged properly. And the statement is not that huge;
these statements are no more than 10 or 20 kB. And, as I said, these
statements occur dozens of times a day, but only once or twice are
they logged incorrectly.
One additional piece of info: that split message has an out-of-order log
time. At that time the log file was getting 2 or 3 entries per second, yet
that message was logged 1 or 2 minutes later. It seems the statement occurs
now but is stored a minute or two later.
It looks like a failure of the log chunking protocol, with long messages
being improperly interleaved. I don't think we've had reports of such a
failure since commit c17e863bc7 back in 2012, but maybe my memory is
failing.
What platform is this on? Is it possible that on some platform the chunk
size we're using is not doing an atomic write?
syslogger.h says:
#ifdef PIPE_BUF
/* Are there any systems with PIPE_BUF > 64K? Unlikely, but ... */
#if PIPE_BUF > 65536
#define PIPE_CHUNK_SIZE 65536
#else
#define PIPE_CHUNK_SIZE ((int) PIPE_BUF)
#endif
#else /* not defined */
/* POSIX says the value of PIPE_BUF must be at least 512, so use that */
#define PIPE_CHUNK_SIZE 512
#endif
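
To make that concrete, here is a simplified, purely illustrative sketch (not
the actual syslogger code, and it omits the per-chunk protocol header) of how
a long message gets split into PIPE_CHUNK_SIZE-sized writes. The scheme relies
on each write() of at most PIPE_BUF bytes being atomic; if that guarantee were
ever violated, chunks from concurrent backends could interleave mid-message,
which is exactly what the log excerpts above look like.

#include <stddef.h>
#include <unistd.h>

/* Stand-in for the value chosen in syslogger.h above. */
#define PIPE_CHUNK_SIZE 512

/*
 * Illustrative only: write a long message to the syslogger pipe in
 * chunks of at most PIPE_CHUNK_SIZE bytes.  As long as each chunk is
 * no larger than PIPE_BUF, POSIX guarantees the write is atomic, so
 * chunks from different backends cannot interleave within a chunk.
 * The real code also prepends a small header to every chunk so that
 * the syslogger can reassemble the pieces per sending process.
 */
static void
write_message_in_chunks(int pipe_fd, const char *msg, size_t len)
{
    while (len > 0)
    {
        size_t  chunk = (len > PIPE_CHUNK_SIZE) ? PIPE_CHUNK_SIZE : len;

        /* error handling omitted; see the discussion of write() failures below */
        (void) write(pipe_fd, msg, chunk);

        msg += chunk;
        len -= chunk;
    }
}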
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes:
What platform is this on? Is it possible that on some platform the chunk
size we're using is not doing an atomic write?
Another idea is that some of the write() calls are failing --- elog.c
doesn't check for that. Eyeing the POSIX spec for write(), I wonder
if somehow the pipe has gotten set into O_NONBLOCK mode and we're
not retrying EAGAIN failures.
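
For illustration only, here is a minimal sketch of the kind of retry loop
being suggested, assuming a plain pipe file descriptor; this is not the actual
elog.c code, and the helper name is made up:

#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep writing until the whole buffer has been
 * sent, retrying transient EAGAIN/EWOULDBLOCK/EINTR failures instead
 * of silently losing the rest of the message.
 */
static ssize_t
write_all(int fd, const char *buf, size_t len)
{
    size_t  written = 0;

    while (written < len)
    {
        ssize_t rc = write(fd, buf + written, len - written);

        if (rc < 0)
        {
            if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                continue;       /* transient failure: try again */
            return -1;          /* hard error: let the caller decide */
        }
        written += (size_t) rc;
    }
    return (ssize_t) written;
}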
regards, tom lane
What platform is this on? Is it possible that on some platform the chunk
size we're using is not doing an atomic write?
Until last year we were on Ubuntu 16.04 and Postgres 11 with the latest minor
update.
This January we changed to Ubuntu 20.04 and Postgres 14, now updated to
14.4.
But the problem occurred on both the old and the new OS and Postgres versions.
Right now I opened the current log file; there are 20 or 30 of these
statements and all of them are fine. Maybe the problem comes back tomorrow,
maybe this afternoon.
thanks
Marcos
On 2022-08-08 Mo 11:07, Marcos Pegoraro wrote:
What platform is this on? Is it possible that on some platform the chunk
size we're using is not doing an atomic write?
Until last year we were on Ubuntu 16.04 and Postgres 11 with the latest
minor update.
This January we changed to Ubuntu 20.04 and Postgres 14, now updated
to 14.4.
But the problem occurred on both the old and the new OS and Postgres versions.
Right now I opened the current log file; there are 20 or 30 of these
statements and all of them are fine. Maybe the problem comes back tomorrow,
maybe this afternoon.
OK, we really need a repeatable test if possible. Perhaps a pgbench run
with lots of concurrent runs of some very long query would do the trick.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Hi,
On 2022-08-08 10:32:22 -0400, Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
What platform is this on? Is it possible that on some platform the chunk
size we're using is not doing an atomic write?
Another idea is that some of the write() calls are failing --- elog.c
doesn't check for that.
I was suspicious of those as well. It might be a good idea to at least write
such failures to stderr, otherwise it's just about impossible to debug. Not
that stderr will always point anywhere useful...
I can imagine that a system under heavy memory pressure might fail writing, if
there are a lot of writes in a row or such.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2022-08-08 10:32:22 -0400, Tom Lane wrote:
Another idea is that some of the write() calls are failing --- elog.c
doesn't check for that.
I was suspicious of those as well. It might be a good idea to at least write
such failures to stderr, otherwise it's just about impossible to debug. Not
that stderr will always point anywhere useful...
Uh ... what we are talking about is a failure to write to stderr.
It's not likely that adding more output will help.
Having said that, looping on EAGAIN seems like a reasonably harmless
change. Whether it will help here is really hard to say, though.
regards, tom lane
Hi,
On 2022-08-08 12:19:22 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2022-08-08 10:32:22 -0400, Tom Lane wrote:
Another idea is that some of the write() calls are failing --- elog.c
doesn't check for that.
I was suspicious of those as well. It might be a good idea to at least write
such failures to stderr, otherwise it's just about impossible to debug. Not
that stderr will always point anywhere useful...
Uh ... what we are talking about is a failure to write to stderr.
It's not likely that adding more output will help.
I forgot that we don't preserve the original stderr in some other fd, likely
because the logger itself still has it open and can use write_stderr().
Greetings,
Andres Freund
OK, we really need a repeatable test if possible. Perhaps a pgbench run
with lots of concurrent runs of some very long query would do the trick.
OK, I can do it, but ... strangely, that error usually occurs at random
times, sometimes at 08:00, sometimes at 19:00, even though the server is busier
between 10:00 and 16:00. If I cron some of those queries to run every second,
is that enough? What exactly do you expect to see in the log files?