BUG #13143: Cannot stop and restart a streaming server with a replication slot
The following bug has been logged on the website:
Bug reference: 13143
Logged by: Patrice Drolet
Email address: pdrolet@infodata.ca
PostgreSQL version: 9.4.1
Operating system: Windows 2008r2
Description:
I have experienced this many times. The master streams to the slave for days
without a problem (using a replication slot). If I stop the master, it refuses
to restart and I get this error in the log:
2015-04-24 04:47:12 EDT LOG: database system was shut down at
2015-04-24 04:44:37 EDT
2015-04-24 04:47:12 EDT PANIC: could not fsync file
"pg_replslot/node_win2012sec/state": Bad file descriptor
2015-04-24 04:47:12 EDT LOG: startup process (PID 23180) exited with
exit code 3
2015-04-24 04:47:12 EDT LOG: aborting startup due to startup process
failure
To restart the server, I have to manually delete the folder in pg_replslot.
But then I need to rebuild the slave, which is not very practical for a
multi-gigabyte database.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Hi,
On 2015-04-24 10:10:06 +0000, pdrolet@infodata.ca wrote:
The following bug has been logged on the website:
Bug reference: 13143
Logged by: Patrice Drolet
Email address: pdrolet@infodata.ca
PostgreSQL version: 9.4.1
Operating system: Windows 2008r2
Description:
I have experienced this many times. The master streams to the slave for days
without a problem (using a replication slot). If I stop the master, it refuses
to restart and I get this error in the log:
2015-04-24 04:47:12 EDT LOG: database system was shut down at
2015-04-24 04:44:37 EDT
2015-04-24 04:47:12 EDT PANIC: could not fsync file
"pg_replslot/node_win2012sec/state": Bad file descriptor
2015-04-24 04:47:12 EDT LOG: startup process (PID 23180) exited with
exit code 3
2015-04-24 04:47:12 EDT LOG: aborting startup due to startup process
failure
To restart the server, I have to manually delete the folder in pg_replslot.
But then I need to rebuild the slave, which is not very practical for a
multi-gigabyte database.
Obviously that's not how it's supposed to be. Unfortunately, I don't have
access to a Windows system, much less a French one.
Could you:
1) Describe your exact setup?
2) Check that it's unrelated to any anti-virus software running?
3) Configure 'log_error_verbosity = verbose'? Then we'll get line
numbers, which will help narrow down what's happening.
4) Try to debug it by installing Sysinternals' sysmon and
recording exactly what is done with that file?
Regards,
Andres
--
Hi,
Here is the log with verbose:
2015-04-25 14:25:59 EDT LOG: 00000: database system was shut down at 2015-04-25 14:25:39 EDT
2015-04-25 14:25:59 EDT LOCATION: StartupXLOG, src\backend\access\transam\xlog.c:6011
2015-04-25 14:25:59 EDT PANIC: XX000: could not fsync file "pg_replslot/node_win2008sec/state": Bad file descriptor
2015-04-25 14:25:59 EDT LOCATION: RestoreSlotFromDisk, src\backend\replication\slot.c:1115
2015-04-25 14:25:59 EDT LOG: 00000: startup process (PID 2696) was terminated by exception 0xC0000409
2015-04-25 14:25:59 EDT HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.
2015-04-25 14:25:59 EDT LOCATION: LogChildExit, src\backend\postmaster\postmaster.c:3336
2015-04-25 14:25:59 EDT LOG: 00000: aborting startup due to startup process failure
2015-04-25 14:25:59 EDT LOCATION: reaper, src\backend\postmaster\postmaster.c:2604
As I said, this is streaming replication between two 64-bit Windows servers running PostgreSQL 9.4.1.
Here is my postgresql.conf:
—————————————————
wal_level = hot_standby
max_wal_senders = 3
checkpoint_segments = 16
wal_keep_segments = 32
#------------------------------------------------------------------------------
# FILE LOCATIONS
#------------------------------------------------------------------------------
# The default values of these variables are driven from the -D command-line
# option or PGDATA environment variable, represented here as ConfigDir.
#data_directory = 'ConfigDir' # use data in another directory
# (change requires restart)
#hba_file = 'ConfigDir/pg_hba.conf' # host-based authentication file
# (change requires restart)
#ident_file = 'ConfigDir/pg_ident.conf' # ident configuration file
# (change requires restart)
# If external_pid_file is not explicitly set, no extra PID file is written.
#external_pid_file = '' # write an extra PID file
# (change requires restart)
#------------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#------------------------------------------------------------------------------
# - Connection Settings -
listen_addresses = '*' # what IP address(es) to listen on;
# comma-separated list of addresses;
# defaults to 'localhost'; use '*' for all
# (change requires restart)
port = 5434 # (change requires restart)
max_connections = 100 # (change requires restart)
# Note: Increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction).
#superuser_reserved_connections = 3 # (change requires restart)
#unix_socket_directories = '' # comma-separated list of directories
# (change requires restart)
#unix_socket_group = '' # (change requires restart)
#unix_socket_permissions = 0777 # begin with 0 to use octal notation
# (change requires restart)
#bonjour = off # advertise server via Bonjour
# (change requires restart)
#bonjour_name = '' # defaults to the computer name
# (change requires restart)
# - Security and Authentication -
#authentication_timeout = 1min # 1s-600s
#ssl = off # (change requires restart)
#ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL' # allowed SSL ciphers
# (change requires restart)
#ssl_prefer_server_ciphers = on # (change requires restart)
#ssl_ecdh_curve = 'prime256v1' # (change requires restart)
#ssl_renegotiation_limit = 512MB # amount of data between renegotiations
#ssl_cert_file = 'server.crt' # (change requires restart)
#ssl_key_file = 'server.key' # (change requires restart)
#ssl_ca_file = '' # (change requires restart)
#ssl_crl_file = '' # (change requires restart)
#password_encryption = on
#db_user_namespace = off
# GSSAPI using Kerberos
#krb_server_keyfile = ''
#krb_caseins_users = off
# - TCP Keepalives -
# see "man 7 tcp" for details
#tcp_keepalives_idle = 0 # TCP_KEEPIDLE, in seconds;
# 0 selects the system default
#tcp_keepalives_interval = 0 # TCP_KEEPINTVL, in seconds;
# 0 selects the system default
#tcp_keepalives_count = 0 # TCP_KEEPCNT;
# 0 selects the system default
#------------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#------------------------------------------------------------------------------
# - Memory -
shared_buffers = 3072MB # min 128kB
# (change requires restart)
#huge_pages = try # on, off, or try
# (change requires restart)
temp_buffers = 8MB # min 800kB
#max_prepared_transactions = 0 # zero disables the feature
# (change requires restart)
# Note: Increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
# It is not advisable to set max_prepared_transactions nonzero unless you
# actively intend to use prepared transactions.
work_mem = 256MB # LIDI 4MB *** min 64kB
maintenance_work_mem = 256MB # min 1MB
#autovacuum_work_mem = -1 # min 1MB, or -1 to use maintenance_work_mem
#max_stack_depth = 2MB # min 100kB
dynamic_shared_memory_type = windows # the default is the first option
# supported by the operating system:
# posix
# sysv
# windows
# mmap
# use none to disable dynamic shared memory
# - Disk -
#temp_file_limit = -1 # limits per-session temp file space
# in kB, or -1 for no limit
# - Kernel Resource Usage -
#max_files_per_process = 1000 # min 25
# (change requires restart)
#shared_preload_libraries = '' # (change requires restart)
# - Cost-Based Vacuum Delay -
#vacuum_cost_delay = 0 # 0-100 milliseconds
#vacuum_cost_page_hit = 1 # 0-10000 credits
#vacuum_cost_page_miss = 10 # 0-10000 credits
#vacuum_cost_page_dirty = 20 # 0-10000 credits
#vacuum_cost_limit = 200 # 1-10000 credits
# - Background Writer -
#bgwriter_delay = 200ms # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100 # 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0 # 0-10.0 multipler on buffers scanned/round
# - Asynchronous Behavior -
#effective_io_concurrency = 1 # 1-1000; 0 disables prefetching
#max_worker_processes = 8
#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------
# - Settings -
#wal_level = minimal # minimal, archive, hot_standby, or logical
# (change requires restart)
#fsync = on # turns forced synchronization on or off
#synchronous_commit = on # synchronization level;
# off, local, remote_write, or on
#wal_sync_method = fsync # the default is the first option
# supported by the operating system:
# open_datasync
# fdatasync (default on Linux)
# fsync
# fsync_writethrough
# open_sync
#full_page_writes = on # recover from partial page writes
#wal_log_hints = off # also do full page writes of non-critical updates
# (change requires restart)
#wal_buffers = -1 # min 32kB, -1 sets based on shared_buffers
# (change requires restart)
#wal_writer_delay = 200ms # 1-10000 milliseconds
#commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5 # range 1-1000
# - Checkpoints -
checkpoint_segments = 90 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 5min # range 30s-1h
checkpoint_completion_target = 0.8 # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s # 0 disables
# - Archiving -
#archive_mode = off # allows archiving to be done
# (change requires restart)
#archive_command = '' # command to use to archive a logfile segment
# placeholders: %p = path of file to archive
# %f = file name only
# e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
#archive_timeout = 0 # force a logfile segment switch after this
# number of seconds; 0 disables
#------------------------------------------------------------------------------
# REPLICATION
#------------------------------------------------------------------------------
# - Sending Server(s) -
# Set these on the master and on any standby that will send replication data.
#max_wal_senders = 0 # max number of walsender processes
# (change requires restart)
#wal_keep_segments = 0 # in logfile segments, 16MB each; 0 disables
#wal_sender_timeout = 60s # in milliseconds; 0 disables
max_replication_slots = 1 # max number of replication slots
# (change requires restart)
# - Master Server -
# These settings are ignored on a standby server.
#synchronous_standby_names = '' # standby servers that provide sync rep
# comma-separated list of application_name
# from standby(s); '*' = all
#vacuum_defer_cleanup_age = 0 # number of xacts by which cleanup is delayed
# - Standby Servers -
# These settings are ignored on a master server.
#hot_standby = off # "on" allows queries during recovery
# (change requires restart)
#max_standby_archive_delay = 30s # max delay before canceling queries
# when reading WAL from archive;
# -1 allows indefinite delay
#max_standby_streaming_delay = 30s # max delay before canceling queries
# when reading streaming WAL;
# -1 allows indefinite delay
#wal_receiver_status_interval = 10s # send replies at least this often
# 0 disables
#hot_standby_feedback = off # send info from standby to prevent
# query conflicts
#wal_receiver_timeout = 60s # time that receiver waits for
# communication from master
# in milliseconds; 0 disables
#------------------------------------------------------------------------------
# QUERY TUNING
#------------------------------------------------------------------------------
# - Planner Method Configuration -
#enable_bitmapscan = on
#enable_hashagg = on
#enable_hashjoin = on
#enable_indexscan = on
#enable_indexonlyscan = on
#enable_material = on
#enable_mergejoin = on
#enable_nestloop = on
#enable_seqscan = on
#enable_sort = on
#enable_tidscan = on
# - Planner Cost Constants -
#seq_page_cost = 1.0 # measured on an arbitrary scale
random_page_cost = 2.0 # same scale as above
#cpu_tuple_cost = 0.01 # same scale as above
#cpu_index_tuple_cost = 0.005 # same scale as above
#cpu_operator_cost = 0.0025 # same scale as above
effective_cache_size = 6GB
# - Genetic Query Optimizer -
#geqo = on
geqo_threshold = 16
geqo_effort = 2 # range 1-10
#geqo_pool_size = 0 # selects default based on effort
#geqo_generations = 0 # selects default based on effort
#geqo_selection_bias = 2.0 # range 1.5-2.0
#geqo_seed = 0.0 # range 0.0-1.0
# - Other Planner Options -
#default_statistics_target = 100 # range 1-10000
#constraint_exclusion = partition # on, off, or partition
#cursor_tuple_fraction = 0.1 # range 0.0-1.0
#from_collapse_limit = 8
#join_collapse_limit = 8 # 1 disables collapsing of explicit
# JOIN clauses
#------------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#------------------------------------------------------------------------------
# - Where to Log -
log_destination = 'stderr' # Valid values are combinations of
# stderr, csvlog, syslog, and eventlog,
# depending on platform. csvlog
# requires logging_collector to be on.
# This is used when logging to stderr:
logging_collector = on # Enable capturing of stderr and csvlog
# into log files. Required to be on for
# csvlogs.
# (change requires restart)
# These are only used if logging_collector is on:
#log_directory = 'pg_log' # directory where log files are written,
# can be absolute or relative to PGDATA
#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern,
# can include strftime() escapes
#log_file_mode = 0600 # creation mode for log files,
# begin with 0 to use octal notation
#log_truncate_on_rotation = off # If on, an existing log file with the
# same name as the new log file will be
# truncated rather than appended to.
# But such truncation only occurs on
# time-driven rotation, not on restarts
# or size-driven rotation. Default is
# off, meaning append to existing files
# in all cases.
#log_rotation_age = 1d # Automatic rotation of logfiles will
# happen after that time. 0 disables.
#log_rotation_size = 10MB # Automatic rotation of logfiles will
# happen after that much log output.
# 0 disables.
# These are relevant when logging to syslog:
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'
# This is only relevant when logging to eventlog (win32):
#event_source = 'PostgreSQL'
# - When to Log -
#client_min_messages = notice # values in order of decreasing detail:
# debug5
# debug4
# debug3
# debug2
# debug1
# log
# notice
# warning
# error
#log_min_messages = warning # values in order of decreasing detail:
# debug5
# debug4
# debug3
# debug2
# debug1
# info
# notice
# warning
# error
# log
# fatal
# panic
#log_min_error_statement = error # values in order of decreasing detail:
# debug5
# debug4
# debug3
# debug2
# debug1
# info
# notice
# warning
# error
# log
# fatal
# panic (effectively off)
#log_min_duration_statement = 150 # -1 is disabled, 0 logs all statements
# and their durations, > 0 logs only
# statements running at least this number
# of milliseconds
# - What to Log -
#debug_print_parse = off
#debug_print_rewritten = off
#debug_print_plan = off
#debug_pretty_print = on
#log_checkpoints = off
log_connections = off
log_disconnections = off
log_duration = off
log_error_verbosity = verbose # terse, default, or verbose messages
#log_hostname = off
log_line_prefix = '%t ' # special values:
# %a = application name
# %u = user name
# %d = database name
# %r = remote host and port
# %h = remote host
# %p = process ID
# %t = timestamp without milliseconds
# %m = timestamp with milliseconds
# %i = command tag
# %e = SQL state
# %c = session ID
# %l = session line number
# %s = session start timestamp
# %v = virtual transaction ID
# %x = transaction ID (0 if none)
# %q = stop here in non-session
# processes
# %% = '%'
# e.g. '<%u%%%d> '
#log_lock_waits = off # log lock waits >= deadlock_timeout
#log_statement = 'none' # none, ddl, mod, all
#log_temp_files = -1 # log temporary files equal or larger
# than the specified size in kilobytes;
# -1 disables, 0 logs all temp files
log_timezone = 'US/Eastern'
#------------------------------------------------------------------------------
# RUNTIME STATISTICS
#------------------------------------------------------------------------------
# - Query/Index Statistics Collector -
#track_activities = on
track_counts = on
#track_io_timing = off
#track_functions = none # none, pl, all
#track_activity_query_size = 1024 # (change requires restart)
#update_process_title = on
#stats_temp_directory = 'pg_stat_tmp'
# - Statistics Monitoring -
#log_parser_stats = off
#log_planner_stats = off
#log_executor_stats = off
#log_statement_stats = off
#------------------------------------------------------------------------------
# AUTOVACUUM PARAMETERS
#------------------------------------------------------------------------------
autovacuum = on # Enable autovacuum subprocess? 'on'
# requires track_counts to also be on.
#log_autovacuum_min_duration = -1 # -1 disables, 0 logs all actions and
# their durations, > 0 logs only
# actions running at least this number
# of milliseconds.
#autovacuum_max_workers = 3 # max number of autovacuum subprocesses
# (change requires restart)
#autovacuum_naptime = 1min # time between autovacuum runs
#autovacuum_vacuum_threshold = 50 # min number of row updates before
# vacuum
#autovacuum_analyze_threshold = 50 # min number of row updates before
# analyze
#autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum
#autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze
#autovacuum_freeze_max_age = 200000000 # maximum XID age before forced vacuum
# (change requires restart)
#autovacuum_multixact_freeze_max_age = 400000000 # maximum multixact age
# before forced vacuum
# (change requires restart)
autovacuum_vacuum_cost_delay = 50ms # default vacuum cost delay for
# autovacuum, in milliseconds;
# -1 means use vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for
# autovacuum, -1 means use
# vacuum_cost_limit
#------------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#------------------------------------------------------------------------------
# - Statement Behavior -
#search_path = '"$user",public' # schema names
#default_tablespace = '' # a tablespace name, '' uses the default
#temp_tablespaces = '' # a list of tablespace names, '' uses
# only default tablespace
#check_function_bodies = on
#default_transaction_isolation = 'read committed'
#default_transaction_read_only = off
#default_transaction_deferrable = off
#session_replication_role = 'origin'
#statement_timeout = 0 # in milliseconds, 0 is disabled
#lock_timeout = 0 # in milliseconds, 0 is disabled
#vacuum_freeze_min_age = 50000000
#vacuum_freeze_table_age = 150000000
#vacuum_multixact_freeze_min_age = 5000000
#vacuum_multixact_freeze_table_age = 150000000
#bytea_output = 'hex' # hex, escape
#xmlbinary = 'base64'
#xmloption = 'content'
# - Locale and Formatting -
datestyle = 'iso, ymd'
#intervalstyle = 'postgres'
timezone = 'US/Eastern'
#timezone_abbreviations = 'Default' # Select the set of available time zone
# abbreviations. Currently, there are
# Default
# Australia (historical usage)
# India
# You can create your own file in
# share/timezonesets/.
#extra_float_digits = 0 # min -15, max 3
#client_encoding = sql_ascii # actually, defaults to database
# encoding
# These settings are initialized by initdb, but they can be changed.
lc_messages = 'French_Canada.1252' # locale for system error message
# strings
lc_monetary = 'French_Canada.1252' # locale for monetary formatting
lc_numeric = 'French_Canada.1252' # locale for number formatting
lc_time = 'French_Canada.1252' # locale for time formatting
# default configuration for text search
default_text_search_config = 'pg_catalog.french'
# - Other Defaults -
#dynamic_library_path = '$libdir'
#local_preload_libraries = ''
#session_preload_libraries = ''
#------------------------------------------------------------------------------
# LOCK MANAGEMENT
#------------------------------------------------------------------------------
#deadlock_timeout = 1s
#max_locks_per_transaction = 64 # min 10
# (change requires restart)
# Note: Each lock table slot uses ~270 bytes of shared memory, and there are
# max_locks_per_transaction * (max_connections + max_prepared_transactions)
# lock table slots.
#max_pred_locks_per_transaction = 64 # min 10
# (change requires restart)
#------------------------------------------------------------------------------
# VERSION/PLATFORM COMPATIBILITY
#------------------------------------------------------------------------------
# - Previous PostgreSQL Versions -
#array_nulls = on
#backslash_quote = safe_encoding # on, off, or safe_encoding
#default_with_oids = off
#escape_string_warning = on
#lo_compat_privileges = off
#quote_all_identifiers = off
#sql_inheritance = on
#standard_conforming_strings = on
#synchronize_seqscans = on
# - Other Platforms and Clients -
#transform_null_equals = off
#------------------------------------------------------------------------------
# ERROR HANDLING
#------------------------------------------------------------------------------
#exit_on_error = off # terminate session on any error?
#restart_after_crash = on # reinitialize after backend crash?
#------------------------------------------------------------------------------
# CONFIG FILE INCLUDES
#------------------------------------------------------------------------------
# These options allow settings to be loaded from files other than the
# default postgresql.conf.
#include_dir = 'conf.d' # include files ending in '.conf' from
# directory 'conf.d'
#include_if_exists = 'exists.conf' # include file only if it exists
#include = 'special.conf' # include file
#------------------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#------------------------------------------------------------------------------
# Add settings for extensions here
--
Andres Freund wrote:
Obviously that's not how it's supposed to be. Unfortunately, I don't have
access to a Windows system, much less a French one.
I think this is failing in the fsync_fname() call in slot.c line 1045
(REL9_4_STABLE). Notice it's in a critical section (hence PANIC) and
isdir=false. This happens just after the rename() from tmppath to path;
maybe the file is "busy" and could not be renamed? Anyway the rename
itself didn't fail, and the file (under the new name) could be opened by
fd.c, otherwise the error would say "could not open" instead of "could
not fsync".
There are many other callers of rename() and none of them seem to have
special cases for WIN32 specifically; they all assume it works. (Some
of them are in turn special cases related to link/unlink).
The vast majority of callers of fsync_fname() are related to logical
decoding, so it seems fair game to assume that that code is missing a
trick or two.
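For readers unfamiliar with the pattern discussed above, here is a minimal sketch of it in POSIX C: write the new contents to a temporary path, fsync, rename() over the final path, then reopen the file under its new name and fsync again. The function name write_state_durably and both path arguments are hypothetical. This is not the code in slot.c, and it leaves out the fsync of the containing directory that a complete version would also want. The second open uses O_RDWR, on the assumption that some platforms may refuse to fsync a descriptor opened read-only.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL source: write "data" to tmppath,
 * flush it, rename it to path, then flush the file again under its final
 * name. Returns 0 on success, -1 on failure (errno describes the step
 * that failed).
 */
int
write_state_durably(const char *tmppath, const char *path,
                    const void *data, size_t len)
{
    int fd;

    assert(tmppath != NULL && path != NULL && data != NULL);

    /* Step 1: write and flush the temporary file. */
    fd = open(tmppath, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        unlink(tmppath);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* Step 2: move the temporary file over the final name. */
    if (rename(tmppath, path) != 0)
        return -1;

    /*
     * Step 3: reopen under the final name and flush again. O_RDWR is an
     * assumption made for portability, see the note above.
     */
    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A caller would pass two paths in the same directory, for example a temporary name and the final "state" name, so that the rename stays on one filesystem.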
2) Check that it's unrelated to any anti-virus software running?
It seems likely that something like this is related.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
On 2015-04-27 11:44:47 -0300, Alvaro Herrera wrote:
I think this is failing in the fsync_fname() call in slot.c line 1045
(REL9_4_STABLE).
Patrice has since replied with log_error_verbosity=verbose logs, but
that reply is probably still stuck in moderation:
2015-04-25 14:25:59 EDT LOG: 00000: database system was shut down at 2015-04-25 14:25:39 EDT
2015-04-25 14:25:59 EDT LOCATION: StartupXLOG, src\backend\access\transam\xlog.c:6011
2015-04-25 14:25:59 EDT PANIC: XX000: could not fsync file "pg_replslot/node_win2008sec/state": Bad file descriptor
2015-04-25 14:25:59 EDT LOCATION: RestoreSlotFromDisk, src\backend\replication\slot.c:1115
2015-04-25 14:25:59 EDT LOG: 00000: startup process (PID 2696) was terminated by exception 0xC0000409
2015-04-25 14:25:59 EDT HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.
2015-04-25 14:25:59 EDT LOCATION: LogChildExit, src\backend\postmaster\postmaster.c:3336
2015-04-25 14:25:59 EDT LOG: 00000: aborting startup due to startup process failure
2015-04-25 14:25:59 EDT LOCATION: reaper, src\backend\postmaster\postmaster.c:2604
So it looks to me like it's a straight pg_fsync() failing. Given that
the open apparently succeeded, I'm unsure how that could be. The error
message appears to be an EBADF.
Hm. I wonder if it's maybe that the file is opened with O_RDONLY? The
OSs I have access to don't care - for good reason imo, fsync isn't a
write - but it's not inconceivable that Windows might. I very dimly
remember that that was a problem before at some point. Yep:
http://archives.postgresql.org/message-id/10494.1266903446%40sss.pgh.pa.us
So that's easy enough to fix.
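The idea in the paragraph above, opening the file read-write before calling fsync so that platforms which reject fsync on a read-only descriptor are satisfied, can be sketched in a few lines of POSIX C. This is only an illustration under that assumption: sync_existing_file is a hypothetical name, and the snippet is not the change that was made to PostgreSQL.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL source: flush an existing regular
 * file to disk. The file is opened O_RDWR rather than O_RDONLY because,
 * as suspected in the thread, fsync on a read-only descriptor may fail
 * on some platforms. Returns 0 on success, -1 on failure with errno set.
 */
int
sync_existing_file(const char *path)
{
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    if (fsync(fd) != 0)
    {
        int saved_errno = errno;    /* close() may overwrite errno */

        close(fd);
        errno = saved_errno;
        return -1;
    }
    return close(fd);
}
```

On the POSIX systems mentioned in the thread, the same call with O_RDONLY would normally succeed as well, which would explain why the problem only showed up on Windows.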
Greetings,
Andres Freund
--
Andres Freund wrote:
On 2015-04-27 11:44:47 -0300, Alvaro Herrera wrote:
I think this is failing in the fsync_fname() call in slot.c line 1045
(REL9_4_STABLE).
Patrice has since replied with log_error_verbosity=verbose logs, but
that reply is probably still stuck in moderation:
Ah, sorry about that. Approved.
Hm. I wonder if it's maybe that the file is opened with O_RDONLY? The
OSs I have access to don't care - for good reason imo, fsync isn't a
write - but it's not inconceivable that windows might.
Ah, fsync_fname() explicitly defends against this.
So that's easy enough fixed.
Nice.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Hi,
On Windows, it is really easy to reproduce this problem: create a replication slot, then try to restart the server. The server will refuse to start.
I checked the file; it was not read-only. Nothing I did to the file or the directory allowed me to restart PostgreSQL.
I set fsync=off and could then start PostgreSQL, but this is not a good permanent solution!
--
Hi,
On 2015-04-27 12:50:32 -0400, Patrice Drolet wrote:
On Windows, it is really easy to reproduce this problem: create a replication slot, then try to restart the server. The server will refuse to start.
I checked the file; it was not read-only. Nothing I did to the file or the directory allowed me to restart PostgreSQL.
I set fsync=off and could then start PostgreSQL, but this is not a good permanent solution!
I've pushed a fix for this. It'll be included in the next 9.4 minor
release.
Thanks for the report!
Andres Freund
--
Hi,
When will that be?
Do you want me to test it in my environment?
Thanks,
Patrice Drolet
--
Hi,
On 2015-04-30 09:48:52 -0400, Patrice Drolet wrote:
On 2015-04-27 at 18:22, Andres Freund <andres@anarazel.de> wrote:
I've pushed a fix for this. It'll be included in the next 9.4 minor
release.
When will it be?
I don't know exactly. My *guess* is that it's a couple weeks away.
Do you want me to test it in my environment?
That would be good, but it would require compiling postgres
yourself. Unfortunately, we don't have autogenerated installers except
for releases.
Greetings,
Andres Freund