log : bad file dscriptor????

Started by Ati Rosseletabout 17 years ago5 messages
#1Ati Rosselet
ati.rosselet@gmail.com

I'm still getting a lot of these entries in my eventlog whenever I have a
reasonably large amount of logging:

Event Type: Error
Event Source: PostgreSQL
Event Category: None
Event ID: 0
Date: 10/22/2008
Time: 9:36:28 AM
User: N/A
Computer: ----------
Description:
could not write to log file: Bad file descriptor

production server 8.3.3 (tried with 8.3.4 - same thing), win 2003 server.
I'd register this as a formal bug, but I can't figure out how to reproduce
it.. and
apparently noone else has seen this? Or is noone else running postgres on
win2003??? (survey says??)

The timing of each appears to be when log is rolled over (size limited to
10MB) and postgres attempts to write a log entry at the same time:

Any ideas? help? How to proceed???
Cheers
Ati

My log settings from postgresql.conf:

#-----------------------------
-------------------------------------------------
# ERROR REPORTING AND LOGGING
#------------------------------------------------------------------------------
# - Where to Log -
log_destination = 'stderr' # Valid values are combinations of
# This is used when logging to stderr:
logging_collector = on # Enable capturing of stderr and csvlog

# These are only used if logging_collector is on:
#log_directory = 'pg_log' # directory where log files are written,
# can be absolute or relative to PGDATA
#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern,
# can include strftime() escapes
#log_truncate_on_rotation = off # If on, an existing log file of the
# same name as the new log file will be
# truncated rather than appended to.
# But such truncation only occurs on
# time-driven rotation, not on restarts
# or size-driven rotation. Default is
# off, meaning append to existing files
# in all cases.
#log_rotation_age = 1d # Automatic rotation of logfiles will
# happen after that time. 0 to disable.
log_rotation_size = 10MB # Automatic rotation of logfiles will
# happen after that much log output.
# 0 to disable.

# These are relevant when logging to syslog:
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'

# - When to Log -

#client_min_messages = notice # values in order of decreasing detail:

#log_min_messages = notice # values in order of decreasing detail:

#log_error_verbosity = default # terse, default, or verbose messages

#log_min_error_statement = error # values in order of decreasing detail:
#log_min_duration_statement = -1 # -1 is disabled, 0 logs all statements
# and their durations, > 0 logs only
# statements running at least this time.

#silent_mode = off # DO NOT USE without syslog or
# logging_collector
# (change requires restart)
# - What to Log -

#debug_print_parse = off
#debug_print_rewritten = off
#debug_print_plan = off
#debug_pretty_print = off
#log_checkpoints = off
log_connections = on
log_disconnections = on
#log_duration = off
#log_hostname = off
#log_line_prefix = '%t ' # special values:

log_line_prefix = '%t [%h:%u@%d] '
#log_lock_waits = off # log lock waits >= deadlock_timeout
log_statement = 'all' # none, ddl, mod, all
#log_temp_files = -1 # log temporary files equal or larger
# than specified size;
# -1 disables, 0 logs all temp files
#log_timezone = unknown # actually, defaults to TZ environment

#2Gurjeet Singh
singh.gurjeet@gmail.com
In reply to: Ati Rosselet (#1)
3 attachment(s)
Re: log : bad file dscriptor????
--- Sorry about the previous mail; that did not have the proper subject line
(for no  fault of mine)

Hi All,

Acting on a customer's report I analyzed this bug, and have found a fix
for it. It is not a critical error, but it definitely is a bug, and can have
security implications.

This error is raised from syslogger.c, and since this sub-process is not
responsible for any backend communication, the backend (or the non-backend)
that raised this error has already successfully done its work of
communicating the message back to the application, if any.

The security implication is that, that the actual error which was
supposed to be also logged in the server log file, is lost; and instead we
have this error message in the WindowsErrorLog/Syslog.

Issue:
====

On Windows, the write to log file is done by a thread (whose main
function is pipeThread() ), and since it works completely independent of the
SysLoggerMain() ( which is responsible for calling logfile_rotate()
periodically, which in turn changes the global variable syslogFile) this is
causing a race condition due to an error in the way we are using the related
critical section.

Here's the flow in my opinion which is causing this error:

main => SysLoggerMain() and thread => pipeThread()

main : calls logfile_rotate()
thread: calls process_pipe_input()/flush_pipe_input()
-> calls write_syslogger_file()
thread: reads in the current value of global variable
syslogFile.
main : enters critical section sysfileSection, and
assigns a new value to the global variable
syslogFile
thread: blocked by the same critical section
main : leaves critical section
thread: enters the critical section and attempts to
write to the old value of syslogFile.
OS : throws an error, as that handle is already
closed by main

The solution is to read the global variable inside the critical section, in
write_syslogger_file().

How to reproduce:
=================

Apply the syslogger_race_bug_reproduce.patch patch.

Set the following GUCs in postgresql.conf:

logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_size = 1kB
log_min_messages = notice

Restart the DB for these changes to take effect.

Create the following plpgsql function:

create or replace function raise_notices() returns void as $$
begin
loop
raise notice 'dummy';
end loop;
end;
$$ language plpgsql;

Execute: select raise_notices();

Keep an eye on Windows error log (refresh often;
screenshot<http://img5.imageshack.us/img5/9477/windowserrorlogq.jpg&gt;
).

Attachments:
=========

syslogger_race_bug_reproduce.patch :
To reproduce the bug.

syslogger_race_bug_reproduce_temp_fix.patch :
To see the effect of final patch, but with other support code.

syslogger_race_bugfix.patch :
Final patch, ready for application to HEAD.

Best regards,

On Sat, Oct 25, 2008 at 8:43 PM, Ati Rosselet <ati.rosselet@gmail.com>wrote:

I'm still getting a lot of these entries in my eventlog whenever I have a
reasonably large amount of logging:

Event Type: Error
Event Source: PostgreSQL
Event Category: None
Event ID: 0
Date: 10/22/2008
Time: 9:36:28 AM
User: N/A
Computer: ----------
Description:
could not write to log file: Bad file descriptor

production server 8.3.3 (tried with 8.3.4 - same thing), win 2003 server.
I'd register this as a formal bug, but I can't figure out how to reproduce
it.. and
apparently noone else has seen this? Or is noone else running postgres on
win2003??? (survey says??)

The timing of each appears to be when log is rolled over (size limited to
10MB) and postgres attempts to write a log entry at the same time:

Any ideas? help? How to proceed???
Cheers
Ati

My log settings from postgresql.conf:

#-----------------------------
-------------------------------------------------
# ERROR REPORTING AND LOGGING

#------------------------------------------------------------------------------
# - Where to Log -
log_destination = 'stderr' # Valid values are combinations of
# This is used when logging to stderr:
logging_collector = on # Enable capturing of stderr and csvlog

# These are only used if logging_collector is on:
#log_directory = 'pg_log' # directory where log files are written,
# can be absolute or relative to PGDATA
#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name
pattern,
# can include strftime() escapes
#log_truncate_on_rotation = off # If on, an existing log file of the
# same name as the new log file will be
# truncated rather than appended to.
# But such truncation only occurs on
# time-driven rotation, not on restarts
# or size-driven rotation. Default is
# off, meaning append to existing files
# in all cases.
#log_rotation_age = 1d # Automatic rotation of logfiles will
# happen after that time. 0 to disable.
log_rotation_size = 10MB # Automatic rotation of logfiles will
# happen after that much log output.
# 0 to disable.

# These are relevant when logging to syslog:
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'

# - When to Log -

#client_min_messages = notice # values in order of decreasing
detail:

#log_min_messages = notice # values in order of decreasing detail:

#log_error_verbosity = default # terse, default, or verbose messages

#log_min_error_statement = error # values in order of decreasing detail:
#log_min_duration_statement = -1 # -1 is disabled, 0 logs all statements
# and their durations, > 0 logs only
# statements running at least this time.

#silent_mode = off # DO NOT USE without syslog or
# logging_collector
# (change requires restart)
# - What to Log -

#debug_print_parse = off
#debug_print_rewritten = off
#debug_print_plan = off
#debug_pretty_print = off
#log_checkpoints = off
log_connections = on
log_disconnections = on
#log_duration = off
#log_hostname = off
#log_line_prefix = '%t ' # special values:

log_line_prefix = '%t [%h:%u@%d] '
#log_lock_waits = off # log lock waits >= deadlock_timeout
log_statement = 'all' # none, ddl, mod, all
#log_temp_files = -1 # log temporary files equal or larger
# than specified size;
# -1 disables, 0 logs all temp files
#log_timezone = unknown # actually, defaults to TZ environment

--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device

Attachments:

syslogger_race_bug_reproduce.patchapplication/octet-stream; name=syslogger_race_bug_reproduce.patchDownload
*** a/src/backend/port/win32/security.c
--- b/src/backend/port/win32/security.c
***************
*** 123,128 **** pgwin32_is_service(void)
--- 123,130 ----
    SID_IDENTIFIER_AUTHORITY NtAuthority = {SECURITY_NT_AUTHORITY};
    UINT        x;
    
+   return 1;
+ 
    /* Only check the first time */
    if (_is_service != -1)
        return _is_service;
*** a/src/backend/postmaster/syslogger.c
--- b/src/backend/postmaster/syslogger.c
***************
*** 909,914 **** write_syslogger_file(const char *buffer, int count, int destination)
--- 909,916 ---- 
  
    logfile = destination == LOG_DESTINATION_CSVLOG ? csvlogFile : syslogFile;
        
+   pg_usleep( 1000000 );
+ 
  #ifndef WIN32
    rc = fwrite(buffer, 1, count, logfile);
  #else
syslogger_race_bug_reproduce_temp_fix.patchapplication/octet-stream; name=syslogger_race_bug_reproduce_temp_fix.patchDownload
*** a/src/backend/port/win32/security.c
--- b/src/backend/port/win32/security.c
***************
*** 123,128 **** pgwin32_is_service(void)
--- 123,130 ----
    SID_IDENTIFIER_AUTHORITY NtAuthority = {SECURITY_NT_AUTHORITY};
    UINT        x;
    
+   return 1;
+ 
    /* Only check the first time */
    if (_is_service != -1)
        return _is_service;
*** a/src/backend/postmaster/syslogger.c
--- b/src/backend/postmaster/syslogger.c
***************
*** 907,920 **** write_syslogger_file(const char *buffer, int count, int destination)
    if (destination == LOG_DESTINATION_CSVLOG && csvlogFile == NULL)
        open_csvlogfile();
    
    logfile = destination == LOG_DESTINATION_CSVLOG ? csvlogFile : syslogFile;
    
! #ifndef WIN32
!   rc = fwrite(buffer, 1, count, logfile);
! #else
!   EnterCriticalSection(&sysfileSection);
    rc = fwrite(buffer, 1, count, logfile);
!   LeaveCriticalSection(&sysfileSection);
  #endif
  
    /* can't use ereport here because of possible recursion */
--- 907,922 ----
    if (destination == LOG_DESTINATION_CSVLOG && csvlogFile == NULL)
        open_csvlogfile();
  
+ #ifdef WIN32
+         EnterCriticalSection(&sysfileSection);
+ #endif
    logfile = destination == LOG_DESTINATION_CSVLOG ? csvlogFile : syslogFile;

!   pg_usleep( 1000000 );
!
    rc = fwrite(buffer, 1, count, logfile);
! #ifdef WIN32
!         LeaveCriticalSection(&sysfileSection);
  #endif

    /* can't use ereport here because of possible recursion */

syslogger_race_bugfix.patchapplication/octet-stream; name=syslogger_race_bugfix.patchDownload
*** a/src/backend/postmaster/syslogger.c
--- b/src/backend/postmaster/syslogger.c
***************
*** 907,920 **** write_syslogger_file(const char *buffer, int count, int destination)
    if (destination == LOG_DESTINATION_CSVLOG && csvlogFile == NULL)
        open_csvlogfile();

    logfile = destination == LOG_DESTINATION_CSVLOG ? csvlogFile : syslogFile;

- #ifndef WIN32
    rc = fwrite(buffer, 1, count, logfile);
! #else
!   EnterCriticalSection(&sysfileSection);
!   rc = fwrite(buffer, 1, count, logfile);
!   LeaveCriticalSection(&sysfileSection);
  #endif

    /* can't use ereport here because of possible recursion */
--- 907,920 ----
    if (destination == LOG_DESTINATION_CSVLOG && csvlogFile == NULL)
        open_csvlogfile();

+ #ifdef WIN32
+         EnterCriticalSection(&sysfileSection);
+ #endif
    logfile = destination == LOG_DESTINATION_CSVLOG ? csvlogFile : syslogFile;

    rc = fwrite(buffer, 1, count, logfile);
! #ifdef WIN32
!         LeaveCriticalSection(&sysfileSection);
  #endif

    /* can't use ereport here because of possible recursion */
#3Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Gurjeet Singh (#2)
Re: log : bad file dscriptor????

Gurjeet Singh wrote:

On Windows, the write to log file is done by a thread (whose main
function is pipeThread() ), and since it works completely independent of the
SysLoggerMain() ( which is responsible for calling logfile_rotate()
periodically, which in turn changes the global variable syslogFile) this is
causing a race condition due to an error in the way we are using the related
critical section.

Thanks, committed. I hope this helps with Ati's problems.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#4Gurjeet Singh
singh.gurjeet@gmail.com
In reply to: Heikki Linnakangas (#3)
Re: log : bad file dscriptor????

On Wed, Mar 18, 2009 at 3:47 AM, Heikki Linnakangas <
heikki.linnakangas@enterprisedb.com> wrote:

Gurjeet Singh wrote:

On Windows, the write to log file is done by a thread (whose main
function is pipeThread() ), and since it works completely independent of
the
SysLoggerMain() ( which is responsible for calling logfile_rotate()
periodically, which in turn changes the global variable syslogFile) this
is
causing a race condition due to an error in the way we are using the
related
critical section.

Thanks, committed. I hope this helps with Ati's problems.

It seems this commit never made it to the release notes. A customer came
asking for the fix to this very problem, and although we know that the issue
has been fixed, we could not refer him to the release notes. All we could
suggest was to do the minor upgrade.

I could not find any more reports of this bug after the fix was committed.
So what would it take to make release notes say that it has been fixed?

Best regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.enterprisedb.com

singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gurjeet Singh (#4)
Re: log : bad file dscriptor????

Gurjeet Singh <singh.gurjeet@gmail.com> writes:

It seems this commit never made it to the release notes. A customer came
asking for the fix to this very problem, and although we know that the issue
has been fixed, we could not refer him to the release notes. All we could
suggest was to do the minor upgrade.

We sometimes deem fixes too small/obscure to be worth documenting in the
release notes, but this one should have been documented. I'm not sure
if this was my misjudgment or Bruce's, but in any case apologies for the
omission. If you need to convince somebody that the patch was in fact
applied, you could always point them at the commit log archives:
http://archives.postgresql.org/pgsql-committers/2009-03/msg00125.php

regards, tom lane