Recovery Assistance
Hi,
I have a Atlassian Confluence Wiki that depends on postgres, but I haven't
much experience with postgres other than for this purpose.
A few days ago, the hard disk filled, so all services stopped working.
When the admin realised this he increased the disk size (its in a cloud, so
that was easy to do) however I think the way it shutdown left Postgres in
an inconsistent state for some reason.
Now when I start it up I get this message in the log over and over again:
"FATAL: the database system is starting up"
I have a backup, which I have successfully recovered, but it is 24 hours
old, the next backup was the cause of the disk filling. So I'm using this
as exercise in learning a bit more about postgres.
I did some research and found a number of options. I took a file level
backup with the service not running then tried 2 things. I haven't found
much else on what to do though.
*Attempt 1 - Reset Log *
It sounded like this shouldn't be my first option (it wasn't) but it did
sound like what I needed to do.
I ran this command
./pg_resetxlog /var/lib/postgresql/9.3/main -f
It worked a treat, the database did startup ok.
However when I tried to dump the DB:
root@atlassian:/home/myuser# sudo -u postgres pg_dump confluencedb >
$now-confluencedb.sql
pg_dump: Dumping the contents of table "bodycontent" failed: PQgetResult()
failed.
pg_dump: Error message from server: ERROR: unexpected chunk size 104
(expected 1996) in chunk 3 of 22 for toast value 48862 in pg_toast_20028
pg_dump: The command was: COPY public.bodycontent (bodycontentid, body,
contentid, bodytypeid) TO stdout;
The dump failed, so I assume this did leave my database in an inconsistent
state.
*Attempt 2 - startup manually and let it try recovery*
I restored my file level backup and started again.
This time I tried to startup manually on the command line to see the output
(I'd done it as a service startup a number of times to nearly the same
effect too)
postgres@atlassian:/usr/lib/postgresql/9.3/bin$ ./pg_ctl -D
/etc/postgresql/9.3/main start
server starting
postgres@atlassian:/usr/lib/postgresql/9.3/bin$ 2017-01-27 20:36:08 AEDT
LOG: database system was interrupted while in recovery at 2017-01-27
20:13:26 AEDT
2017-01-27 20:36:08 AEDT HINT: This probably means that some data is
corrupted and you will have to use the last backup for recovery.
2017-01-27 20:36:08 AEDT LOG: database system was not properly shut down;
automatic recovery in progress
2017-01-27 20:36:08 AEDT LOG: redo starts at 5/528B4558
2017-01-27 20:36:18 AEDT LOG: redo done at 5/A3FFF9E8
2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log time
2017-01-24 02:08:00.023064+11
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
I've left it that way for 12 hours, and its still not allowing connections.
I assume its doing some kind of consistency check?
Does anyone have any suggestions on what I should be doing to try and
restore this database?
- The amount of change is minimal in the DB after 6pm it should be
basically no change overnight.
- The log above seems to suggest it has completed a redo ok, is that right?
- The last completed transaction time 2017-01-24 02:08:00.023064+11 the log
mentions would be fine to use, so loosing even a few hours before that
would be more than adequate.
I'm just not clear what the database is doing now, or how I should be
trying to recover it.
Any help anyone can suggest would be great! I've given myself this weekend
to attempt to recover more than the last backup, after that I need to
restore the service for my team to use and suck up the lost last day of
updates.
Thanks,
Brian
On 01/27/2017 01:31 PM, Brian Mills wrote:
Hi,
I have a Atlassian Confluence Wiki that depends on postgres, but I
haven't much experience with postgres other than for this purpose.A few days ago, the hard disk filled, so all services stopped working.
When the admin realised this he increased the disk size (its in a cloud,
so that was easy to do) however I think the way it shutdown left
Postgres in an inconsistent state for some reason.
Now when I start it up I get this message in the log over and over again:
"FATAL: the database system is starting up"I have a backup, which I have successfully recovered, but it is 24 hours
old, the next backup was the cause of the disk filling. So I'm using
this as exercise in learning a bit more about postgres.I did some research and found a number of options. I took a file level
backup with the service not running then tried 2 things. I haven't found
much else on what to do though.*Attempt 1 - Reset Log *
It sounded like this shouldn't be my first option (it wasn't) but it did
sound like what I needed to do.
I ran this command
./pg_resetxlog /var/lib/postgresql/9.3/main -f
It worked a treat, the database did startup ok.
However when I tried to dump the DB:
root@atlassian:/home/myuser# sudo -u postgres pg_dump confluencedb >
$now-confluencedb.sql
pg_dump: Dumping the contents of table "bodycontent" failed:
PQgetResult() failed.
pg_dump: Error message from server: ERROR: unexpected chunk size 104
(expected 1996) in chunk 3 of 22 for toast value 48862 in pg_toast_20028
pg_dump: The command was: COPY public.bodycontent (bodycontentid, body,
contentid, bodytypeid) TO stdout;
The dump failed, so I assume this did leave my database in an
inconsistent state.*Attempt 2 - startup manually and let it try recovery*
I restored my file level backup and started again.
This time I tried to startup manually on the command line to see the
output (I'd done it as a service startup a number of times to nearly the
same effect too)postgres@atlassian:/usr/lib/postgresql/9.3/bin$ ./pg_ctl -D
/etc/postgresql/9.3/main start
server starting
postgres@atlassian:/usr/lib/postgresql/9.3/bin$ 2017-01-27 20:36:08 AEDT
LOG: database system was interrupted while in recovery at 2017-01-27
20:13:26 AEDT
2017-01-27 20:36:08 AEDT HINT: This probably means that some data is
corrupted and you will have to use the last backup for recovery.
2017-01-27 20:36:08 AEDT LOG: database system was not properly shut
down; automatic recovery in progress
2017-01-27 20:36:08 AEDT LOG: redo starts at 5/528B4558
2017-01-27 20:36:18 AEDT LOG: redo done at 5/A3FFF9E8
2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log
time 2017-01-24 02:08:00.023064+11
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
What does ps ax | grep post show?
Is the cluster set up to log to a file, if so what does it show?
Does the system log show anything relevant?
I've left it that way for 12 hours, and its still not allowing connections.
I assume its doing some kind of consistency check?
What does it say when you attempt a connection?
Does anyone have any suggestions on what I should be doing to try and
restore this database?- The amount of change is minimal in the DB after 6pm it should be
basically no change overnight.
- The log above seems to suggest it has completed a redo ok, is that right?
- The last completed transaction time 2017-01-24 02:08:00.023064+11 the
log mentions would be fine to use, so loosing even a few hours before
that would be more than adequate.I'm just not clear what the database is doing now, or how I should be
trying to recover it.Any help anyone can suggest would be great! I've given myself this
weekend to attempt to recover more than the last backup, after that I
need to restore the service for my team to use and suck up the lost last
day of updates.Thanks,
Brian
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
First of all, Thank you for your time to assist me learning. I really
appreciate it.
root# ps ax | grep post
1364 ? Ss 0:00 /usr/lib/postfix/master
5198 pts/3 S 0:00 su postgres
5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D
/etc/postgresql/9.3/main
5222 ? Ss 0:10 postgres: startup process recovering
0000000100000005000000A3
7930 pts/4 S+ 0:00 grep --color=auto post
Its a single machine postgres database server, so I'm assuming there is no
cluster log. If there is one, where would I look for it?
The only log in /var/log/postgres is postgresql-9.3-main.log
which shows (tail):
2017-01-27 20:27:01 AEDT LOG: database system was shut down at 2017-01-27
20:26:29 AEDT
2017-01-27 20:27:01 AEDT LOG: MultiXact member wraparound protections are
now enabled
2017-01-27 20:27:01 AEDT LOG: autovacuum launcher started
2017-01-27 20:27:01 AEDT LOG: database system is ready to accept
connections
2017-01-27 20:27:02 AEDT LOG: incomplete startup packet
2017-01-27 20:28:54 AEDT ERROR: unexpected chunk size 104 (expected 1996)
in chunk 3 of 22 for toast value 48862 in pg_toast_20028
2017-01-27 20:28:54 AEDT STATEMENT: COPY public.bodycontent
(bodycontentid, body, contentid, bodytypeid) TO stdout;
2017-01-27 20:30:13 AEDT LOG: received fast shutdown request
2017-01-27 20:30:13 AEDT LOG: aborting any active transactions
2017-01-27 20:30:13 AEDT LOG: autovacuum launcher shutting down
2017-01-27 20:30:13 AEDT LOG: shutting down
2017-01-27 20:30:13 AEDT LOG: database system is shut down
I ran the screen utility so I could leave the db started using the pg_ctl
command. The later logs in that session have not progressed, its last entry
is still
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
Which is still later datetime than the /var/log/postgres... log.
Connection attempt shows the same log
postgres@atlassian:/home/myuser$ psql
psql: FATAL: the database system is starting up
Nothing in the syslog that seems connected.
*Brian Mills*
CTO
*Mob: *0410660003
*Melbourne* 03 9012 3460 or 03 8376 6327 *|* *Sydney* 02 8064 3600 *|*
*Brisbane* 07 3173 1570
Level 1 *|* 600 Chapel Street *|* South Yarra*|* VIC *|* 3141 *|*
Australia
<https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking>
<https://www.linkedin.com/company/trybooking-com>
On 28 January 2017 at 12:05, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
Show quoted text
On 01/27/2017 01:31 PM, Brian Mills wrote:
Hi,
I have a Atlassian Confluence Wiki that depends on postgres, but I
haven't much experience with postgres other than for this purpose.A few days ago, the hard disk filled, so all services stopped working.
When the admin realised this he increased the disk size (its in a cloud,
so that was easy to do) however I think the way it shutdown left
Postgres in an inconsistent state for some reason.
Now when I start it up I get this message in the log over and over again:
"FATAL: the database system is starting up"I have a backup, which I have successfully recovered, but it is 24 hours
old, the next backup was the cause of the disk filling. So I'm using
this as exercise in learning a bit more about postgres.I did some research and found a number of options. I took a file level
backup with the service not running then tried 2 things. I haven't found
much else on what to do though.*Attempt 1 - Reset Log *
It sounded like this shouldn't be my first option (it wasn't) but it did
sound like what I needed to do.
I ran this command
./pg_resetxlog /var/lib/postgresql/9.3/main -f
It worked a treat, the database did startup ok.
However when I tried to dump the DB:
root@atlassian:/home/myuser# sudo -u postgres pg_dump confluencedb >
$now-confluencedb.sql
pg_dump: Dumping the contents of table "bodycontent" failed:
PQgetResult() failed.
pg_dump: Error message from server: ERROR: unexpected chunk size 104
(expected 1996) in chunk 3 of 22 for toast value 48862 in pg_toast_20028
pg_dump: The command was: COPY public.bodycontent (bodycontentid, body,
contentid, bodytypeid) TO stdout;
The dump failed, so I assume this did leave my database in an
inconsistent state.*Attempt 2 - startup manually and let it try recovery*
I restored my file level backup and started again.
This time I tried to startup manually on the command line to see the
output (I'd done it as a service startup a number of times to nearly the
same effect too)postgres@atlassian:/usr/lib/postgresql/9.3/bin$ ./pg_ctl -D
/etc/postgresql/9.3/main start
server starting
postgres@atlassian:/usr/lib/postgresql/9.3/bin$ 2017-01-27 20:36:08 AEDT
LOG: database system was interrupted while in recovery at 2017-01-27
20:13:26 AEDT
2017-01-27 20:36:08 AEDT HINT: This probably means that some data is
corrupted and you will have to use the last backup for recovery.
2017-01-27 20:36:08 AEDT LOG: database system was not properly shut
down; automatic recovery in progress
2017-01-27 20:36:08 AEDT LOG: redo starts at 5/528B4558
2017-01-27 20:36:18 AEDT LOG: redo done at 5/A3FFF9E8
2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log
time 2017-01-24 02:08:00.023064+11
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
2017-01-27 23:00:01 AEDT FATAL: the database system is starting upWhat does ps ax | grep post show?
Is the cluster set up to log to a file, if so what does it show?
Does the system log show anything relevant?
I've left it that way for 12 hours, and its still not allowing
connections.
I assume its doing some kind of consistency check?What does it say when you attempt a connection?
Does anyone have any suggestions on what I should be doing to try and
restore this database?- The amount of change is minimal in the DB after 6pm it should be
basically no change overnight.
- The log above seems to suggest it has completed a redo ok, is that
right?
- The last completed transaction time 2017-01-24 02:08:00.023064+11 the
log mentions would be fine to use, so loosing even a few hours before
that would be more than adequate.I'm just not clear what the database is doing now, or how I should be
trying to recover it.Any help anyone can suggest would be great! I've given myself this
weekend to attempt to recover more than the last backup, after that I
need to restore the service for my team to use and suck up the lost last
day of updates.Thanks,
Brian--
Adrian Klaver
adrian.klaver@aklaver.com
On 01/27/2017 05:40 PM, Brian Mills wrote:
First of all, Thank you for your time to assist me learning. I really
appreciate it.root# ps ax | grep post
1364 ? Ss 0:00 /usr/lib/postfix/master
5198 pts/3 S 0:00 su postgres
5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D
/etc/postgresql/9.3/main
5222 ? Ss 0:10 postgres: startup process recovering
0000000100000005000000A3
7930 pts/4 S+ 0:00 grep --color=auto post
So if you check back does the recovering XXXXXXXX part change?
If so Postgres is walking through the WAL files as it should.
Its a single machine postgres database server, so I'm assuming there is
no cluster log. If there is one, where would I look for it?
The only log in /var/log/postgres is postgresql-9.3-main.log
That would be it. A single Postgres instance has multiple databases in
it, by default it starts with template0, template1 and postgres
databases. Then add whatever databases you created and you have a
cluster of databases.
which shows (tail):
2017-01-27 20:27:01 AEDT LOG: database system was shut down at
2017-01-27 20:26:29 AEDT
2017-01-27 20:27:01 AEDT LOG: MultiXact member wraparound protections
are now enabled
2017-01-27 20:27:01 AEDT LOG: autovacuum launcher started
2017-01-27 20:27:01 AEDT LOG: database system is ready to accept
connections
2017-01-27 20:27:02 AEDT LOG: incomplete startup packet
2017-01-27 20:28:54 AEDT ERROR: unexpected chunk size 104 (expected
1996) in chunk 3 of 22 for toast value 48862 in pg_toast_20028
2017-01-27 20:28:54 AEDT STATEMENT: COPY public.bodycontent
(bodycontentid, body, contentid, bodytypeid) TO stdout;
2017-01-27 20:30:13 AEDT LOG: received fast shutdown request
2017-01-27 20:30:13 AEDT LOG: aborting any active transactions
2017-01-27 20:30:13 AEDT LOG: autovacuum launcher shutting down
2017-01-27 20:30:13 AEDT LOG: shutting down
2017-01-27 20:30:13 AEDT LOG: database system is shut down
That would be your Attempt 1 log.
I ran the screen utility so I could leave the db started using
the pg_ctl command. The later logs in that session have not progressed,
its last entry is still
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
Which is still later datetime than the /var/log/postgres... log.
So it is just logging to stdout and not to the log file.
Connection attempt shows the same log
postgres@atlassian:/home/myuser$ psql
psql: FATAL: the database system is starting upNothing in the syslog that seems connected.
*Brian Mills*
CTO*Mob: *0410660003 <tel:0410660003>
*Melbourne* 03 9012 3460 <tel:03%209012%203460> or 03 8376 6327
<tel:03%208376%206327> *|* * **Sydney* 02 8064 3600
<tel:02%208064%203600> *|* *Brisbane* 07 3173 1570 <tel:07%203173%201570>
Level 1 *|* 600 Chapel Street *|* South
Yarra*|* VIC *|* 3141 *|* Australia<https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking> <https://www.linkedin.com/company/trybooking-com>
On 28 January 2017 at 12:05, Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:On 01/27/2017 01:31 PM, Brian Mills wrote:
Hi,
I have a Atlassian Confluence Wiki that depends on postgres, but I
haven't much experience with postgres other than for this purpose.A few days ago, the hard disk filled, so all services stopped
working.
When the admin realised this he increased the disk size (its in
a cloud,
so that was easy to do) however I think the way it shutdown left
Postgres in an inconsistent state for some reason.
Now when I start it up I get this message in the log over and
over again:
"FATAL: the database system is starting up"I have a backup, which I have successfully recovered, but it is
24 hours
old, the next backup was the cause of the disk filling. So I'm using
this as exercise in learning a bit more about postgres.I did some research and found a number of options. I took a file
level
backup with the service not running then tried 2 things. I
haven't found
much else on what to do though.*Attempt 1 - Reset Log *
It sounded like this shouldn't be my first option (it wasn't)
but it did
sound like what I needed to do.
I ran this command
./pg_resetxlog /var/lib/postgresql/9.3/main -f
It worked a treat, the database did startup ok.
However when I tried to dump the DB:
root@atlassian:/home/myuser# sudo -u postgres pg_dump confluencedb >
$now-confluencedb.sql
pg_dump: Dumping the contents of table "bodycontent" failed:
PQgetResult() failed.
pg_dump: Error message from server: ERROR: unexpected chunk
size 104
(expected 1996) in chunk 3 of 22 for toast value 48862 in
pg_toast_20028
pg_dump: The command was: COPY public.bodycontent
(bodycontentid, body,
contentid, bodytypeid) TO stdout;
The dump failed, so I assume this did leave my database in an
inconsistent state.*Attempt 2 - startup manually and let it try recovery*
I restored my file level backup and started again.
This time I tried to startup manually on the command line to see the
output (I'd done it as a service startup a number of times to
nearly the
same effect too)postgres@atlassian:/usr/lib/postgresql/9.3/bin$ ./pg_ctl -D
/etc/postgresql/9.3/main start
server starting
postgres@atlassian:/usr/lib/postgresql/9.3/bin$ 2017-01-27
20:36:08 AEDT
LOG: database system was interrupted while in recovery at
2017-01-27
20:13:26 AEDT
2017-01-27 20:36:08 AEDT HINT: This probably means that some
data is
corrupted and you will have to use the last backup for recovery.
2017-01-27 20:36:08 AEDT LOG: database system was not properly shut
down; automatic recovery in progress
2017-01-27 20:36:08 AEDT LOG: redo starts at 5/528B4558
2017-01-27 20:36:18 AEDT LOG: redo done at 5/A3FFF9E8
2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log
time 2017-01-24 02:08:00.023064+11
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
2017-01-27 23:00:01 AEDT FATAL: the database system is starting upWhat does ps ax | grep post show?
Is the cluster set up to log to a file, if so what does it show?
Does the system log show anything relevant?
I've left it that way for 12 hours, and its still not allowing
connections.
I assume its doing some kind of consistency check?What does it say when you attempt a connection?
Does anyone have any suggestions on what I should be doing to
try and
restore this database?- The amount of change is minimal in the DB after 6pm it should be
basically no change overnight.
- The log above seems to suggest it has completed a redo ok, is
that right?
- The last completed transaction time 2017-01-24
02:08:00.023064+11 the
log mentions would be fine to use, so loosing even a few hours
before
that would be more than adequate.I'm just not clear what the database is doing now, or how I
should be
trying to recover it.Any help anyone can suggest would be great! I've given myself this
weekend to attempt to recover more than the last backup, after
that I
need to restore the service for my team to use and suck up the
lost last
day of updates.Thanks,
Brian--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Hi,
No, it hasn't changed since the first time I looked at it.
root@atlassian:/home/tbadmin# ps ax | grep post
1364 ? Ss 0:00 /usr/lib/postfix/master
5198 pts/3 S 0:00 su postgres
5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D
/etc/postgresql/9.3/main
5222 ? Ss 0:10 postgres: startup process recovering
0000000100000005000000A3
11161 pts/4 S+ 0:00 grep --color=auto post
The /var/log/postgres logs haven't got anything new in them.
Also the screen session hasn't got anything new in them, except it adds a
line every time I attempt to connect to it using psql. The last attempt
below was my connection attempt for this email, a bit over 8 hours after my
last connection attempt.
Screen session:
2017-01-28 12:38:35 AEDT FATAL: the database system is starting up
2017-01-28 23:00:01 AEDT FATAL: the database system is starting up
2017-01-28 23:00:01 AEDT FATAL: the database system is starting up
2017-01-29 07:14:00 AEDT FATAL: the database system is starting up
I also still can't connect.
postgres@atlassian:/home/tbadmin$ psql
psql: FATAL: the database system is starting up
*Brian Mills*
CTO
*Mob: *0410660003
*Melbourne* 03 9012 3460 or 03 8376 6327 *|* *Sydney* 02 8064 3600 *|*
*Brisbane* 07 3173 1570
Level 1 *|* 600 Chapel Street *|* South Yarra*|* VIC *|* 3141 *|*
Australia
<https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking>
<https://www.linkedin.com/company/trybooking-com>
On 28 January 2017 at 16:08, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
Show quoted text
On 01/27/2017 05:40 PM, Brian Mills wrote:
First of all, Thank you for your time to assist me learning. I really
appreciate it.root# ps ax | grep post
1364 ? Ss 0:00 /usr/lib/postfix/master
5198 pts/3 S 0:00 su postgres
5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D
/etc/postgresql/9.3/main
5222 ? Ss 0:10 postgres: startup process recovering
0000000100000005000000A3
7930 pts/4 S+ 0:00 grep --color=auto postSo if you check back does the recovering XXXXXXXX part change?
If so Postgres is walking through the WAL files as it should.
Its a single machine postgres database server, so I'm assuming there is
no cluster log. If there is one, where would I look for it?
The only log in /var/log/postgres is postgresql-9.3-main.logThat would be it. A single Postgres instance has multiple databases in it,
by default it starts with template0, template1 and postgres databases. Then
add whatever databases you created and you have a cluster of databases.which shows (tail):
2017-01-27 20:27:01 AEDT LOG: database system was shut down at
2017-01-27 20:26:29 AEDT
2017-01-27 20:27:01 AEDT LOG: MultiXact member wraparound protections
are now enabled
2017-01-27 20:27:01 AEDT LOG: autovacuum launcher started
2017-01-27 20:27:01 AEDT LOG: database system is ready to accept
connections
2017-01-27 20:27:02 AEDT LOG: incomplete startup packet
2017-01-27 20:28:54 AEDT ERROR: unexpected chunk size 104 (expected
1996) in chunk 3 of 22 for toast value 48862 in pg_toast_20028
2017-01-27 20:28:54 AEDT STATEMENT: COPY public.bodycontent
(bodycontentid, body, contentid, bodytypeid) TO stdout;
2017-01-27 20:30:13 AEDT LOG: received fast shutdown request
2017-01-27 20:30:13 AEDT LOG: aborting any active transactions
2017-01-27 20:30:13 AEDT LOG: autovacuum launcher shutting down
2017-01-27 20:30:13 AEDT LOG: shutting down
2017-01-27 20:30:13 AEDT LOG: database system is shut downThat would be your Attempt 1 log.
I ran the screen utility so I could leave the db started using
the pg_ctl command. The later logs in that session have not progressed,
its last entry is still
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
Which is still later datetime than the /var/log/postgres... log.So it is just logging to stdout and not to the log file.
Connection attempt shows the same log
postgres@atlassian:/home/myuser$ psql
psql: FATAL: the database system is starting upNothing in the syslog that seems connected.
*Brian Mills*
CTO*Mob: *0410660003 <tel:0410660003>
*Melbourne* 03 9012 3460 <tel:03%209012%203460> or 03 8376 6327
<tel:03%208376%206327> *|* * **Sydney* 02 8064 3600
<tel:02%208064%203600> *|* *Brisbane* 07 3173 1570
<tel:07%203173%201570>
Level 1 *|* 600 Chapel Street *|* South
Yarra*|* VIC *|* 3141 *|* Australia<https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking> <
https://www.linkedin.com/company/trybooking-com>On 28 January 2017 at 12:05, Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:On 01/27/2017 01:31 PM, Brian Mills wrote:
Hi,
I have a Atlassian Confluence Wiki that depends on postgres, but I
haven't much experience with postgres other than for this purpose.A few days ago, the hard disk filled, so all services stopped
working.
When the admin realised this he increased the disk size (its in
a cloud,
so that was easy to do) however I think the way it shutdown left
Postgres in an inconsistent state for some reason.
Now when I start it up I get this message in the log over and
over again:
"FATAL: the database system is starting up"I have a backup, which I have successfully recovered, but it is
24 hours
old, the next backup was the cause of the disk filling. So I'm
using
this as exercise in learning a bit more about postgres.I did some research and found a number of options. I took a file
level
backup with the service not running then tried 2 things. I
haven't found
much else on what to do though.*Attempt 1 - Reset Log *
It sounded like this shouldn't be my first option (it wasn't)
but it did
sound like what I needed to do.
I ran this command
./pg_resetxlog /var/lib/postgresql/9.3/main -f
It worked a treat, the database did startup ok.
However when I tried to dump the DB:
root@atlassian:/home/myuser# sudo -u postgres pg_dump
confluencedb >
$now-confluencedb.sql
pg_dump: Dumping the contents of table "bodycontent" failed:
PQgetResult() failed.
pg_dump: Error message from server: ERROR: unexpected chunk
size 104
(expected 1996) in chunk 3 of 22 for toast value 48862 in
pg_toast_20028
pg_dump: The command was: COPY public.bodycontent
(bodycontentid, body,
contentid, bodytypeid) TO stdout;
The dump failed, so I assume this did leave my database in an
inconsistent state.*Attempt 2 - startup manually and let it try recovery*
I restored my file level backup and started again.
This time I tried to startup manually on the command line to see
the
output (I'd done it as a service startup a number of times to
nearly the
same effect too)postgres@atlassian:/usr/lib/postgresql/9.3/bin$ ./pg_ctl -D
/etc/postgresql/9.3/main start
server starting
postgres@atlassian:/usr/lib/postgresql/9.3/bin$ 2017-01-27
20:36:08 AEDT
LOG: database system was interrupted while in recovery at
2017-01-27
20:13:26 AEDT
2017-01-27 20:36:08 AEDT HINT: This probably means that some
data is
corrupted and you will have to use the last backup for recovery.
2017-01-27 20:36:08 AEDT LOG: database system was not properly
shut
down; automatic recovery in progress
2017-01-27 20:36:08 AEDT LOG: redo starts at 5/528B4558
2017-01-27 20:36:18 AEDT LOG: redo done at 5/A3FFF9E8
2017-01-27 20:36:18 AEDT LOG: last completed transaction was at
log
time 2017-01-24 02:08:00.023064+11
2017-01-27 23:00:01 AEDT FATAL: the database system is starting
up
2017-01-27 23:00:01 AEDT FATAL: the database system is starting
upWhat does ps ax | grep post show?
Is the cluster set up to log to a file, if so what does it show?
Does the system log show anything relevant?
I've left it that way for 12 hours, and its still not allowing
connections.
I assume its doing some kind of consistency check?What does it say when you attempt a connection?
Does anyone have any suggestions on what I should be doing to
try and
restore this database?- The amount of change is minimal in the DB after 6pm it should be
basically no change overnight.
- The log above seems to suggest it has completed a redo ok, is
that right?
- The last completed transaction time 2017-01-24
02:08:00.023064+11 the
log mentions would be fine to use, so loosing even a few hours
before
that would be more than adequate.I'm just not clear what the database is doing now, or how I
should be
trying to recover it.Any help anyone can suggest would be great! I've given myself this
weekend to attempt to recover more than the last backup, after
that I
need to restore the service for my team to use and suck up the
lost last
day of updates.Thanks,
Brian--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>--
Adrian Klaver
adrian.klaver@aklaver.com
Hello Brian,
On Sun, 2017-01-29 at 07:16 +1100, Brian Mills wrote:
Hi,
No, it hasn't changed since the first time I looked at it.
root@atlassian:/home/tbadmin# ps ax | grep post
1364 ? Ss 0:00 /usr/lib/postfix/master
5198 pts/3 S 0:00 su postgres
5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D
/etc/postgresql/9.3/main
5222 ? Ss 0:10 postgres: startup process recovering
0000000100000005000000A3
11161 pts/4 S+ 0:00 grep --color=auto post
Does this WAL file exist "0000000100000005000000A3"?
Cheers,
Rob
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Yes, its the last one in the directory, pg_xlog directory
...more files...
-rw------- 1 postgres postgres 16777216 Jan 21 10:05
0000000100000005000000A1
-rw------- 1 postgres postgres 16777216 Jan 22 21:29
0000000100000005000000A2
-rw------- 1 postgres postgres 16777216 Jan 24 02:08
0000000100000005000000A3
*Brian Mills*
CTO
*Mob: *0410660003
*Melbourne* 03 9012 3460 or 03 8376 6327 *|* *Sydney* 02 8064 3600 *|*
*Brisbane* 07 3173 1570
Level 1 *|* 600 Chapel Street *|* South Yarra*|* VIC *|* 3141 *|*
Australia
<https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking>
<https://www.linkedin.com/company/trybooking-com>
On 29 January 2017 at 08:18, rob stone <floriparob@gmail.com> wrote:
Show quoted text
Hello Brian,
On Sun, 2017-01-29 at 07:16 +1100, Brian Mills wrote:Hi,
No, it hasn't changed since the first time I looked at it.
root@atlassian:/home/tbadmin# ps ax | grep post
1364 ? Ss 0:00 /usr/lib/postfix/master
5198 pts/3 S 0:00 su postgres
5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D
/etc/postgresql/9.3/main
5222 ? Ss 0:10 postgres: startup process recovering
0000000100000005000000A3
11161 pts/4 S+ 0:00 grep --color=auto postDoes this WAL file exist "0000000100000005000000A3"?
Cheers,
Rob
On 01/28/2017 01:55 PM, Brian Mills wrote:
Yes, its the last one in the directory, pg_xlog directory
...more files...
-rw------- 1 postgres postgres 16777216 Jan 21 10:05
0000000100000005000000A1
-rw------- 1 postgres postgres 16777216 Jan 22 21:29
0000000100000005000000A2
-rw------- 1 postgres postgres 16777216 Jan 24 02:08
0000000100000005000000A3
Best guess is the last WAL is not complete.
From your original post:
"Attempt 2 - startup manually and let it try recovery
I restored my file level backup and started again. "
How was the file level backup done?
*Brian Mills*
CTO*Mob: *0410660003 <tel:0410660003>
*Melbourne* 03 9012 3460 <tel:03%209012%203460> or 03 8376 6327
<tel:03%208376%206327> *|* * **Sydney* 02 8064 3600
<tel:02%208064%203600> *|* *Brisbane* 07 3173 1570 <tel:07%203173%201570>
Level 1 *|* 600 Chapel Street *|* South
Yarra*|* VIC *|* 3141 *|* Australia<https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking> <https://www.linkedin.com/company/trybooking-com>
On 29 January 2017 at 08:18, rob stone <floriparob@gmail.com
<mailto:floriparob@gmail.com>> wrote:Hello Brian,
On Sun, 2017-01-29 at 07:16 +1100, Brian Mills wrote:Hi,
No, it hasn't changed since the first time I looked at it.
root@atlassian:/home/tbadmin# ps ax | grep post
1364 ? Ss 0:00 /usr/lib/postfix/master
5198 pts/3 S 0:00 su postgres
5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D
/etc/postgresql/9.3/main
5222 ? Ss 0:10 postgres: startup process recovering
0000000100000005000000A3
11161 pts/4 S+ 0:00 grep --color=auto postDoes this WAL file exist "0000000100000005000000A3"?
Cheers,
Rob
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
I have a consistent sql dump from 24 hour previous.
The file level backup was done with rsync -a of full data directory after
the issue occurred so could reset as I learned.
Brian
On Sun, 29 Jan 2017 at 9:18 am, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
On 01/28/2017 01:55 PM, Brian Mills wrote:
Yes, its the last one in the directory, pg_xlog directory
...more files...
-rw------- 1 postgres postgres 16777216 Jan 21 10:05
0000000100000005000000A1
-rw------- 1 postgres postgres 16777216 Jan 22 21:29
0000000100000005000000A2
-rw------- 1 postgres postgres 16777216 Jan 24 02:08
0000000100000005000000A3Best guess is the last WAL is not complete.
From your original post:
"Attempt 2 - startup manually and let it try recoveryI restored my file level backup and started again. "
How was the file level backup done?
*Brian Mills*
CTO*Mob: *0410660003 <tel:0410660003>
*Melbourne* 03 9012 3460 <tel:03%209012%203460> or 03 8376 6327
<tel:03%208376%206327> *|* * **Sydney* 02 8064 3600
<tel:02%208064%203600> *|* *Brisbane* 07 3173 1570<tel:07%203173%201570>
Level 1 *|* 600 Chapel Street *|* South
Yarra*|* VIC *|* 3141 *|* Australia<https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking>
<https://www.linkedin.com/company/trybooking-com>
On 29 January 2017 at 08:18, rob stone <floriparob@gmail.com
<mailto:floriparob@gmail.com>> wrote:Hello Brian,
On Sun, 2017-01-29 at 07:16 +1100, Brian Mills wrote:Hi,
No, it hasn't changed since the first time I looked at it.
root@atlassian:/home/tbadmin# ps ax | grep post
1364 ? Ss 0:00 /usr/lib/postfix/master
5198 pts/3 S 0:00 su postgres
5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D
/etc/postgresql/9.3/main
5222 ? Ss 0:10 postgres: startup process recovering
0000000100000005000000A3
11161 pts/4 S+ 0:00 grep --color=auto postDoes this WAL file exist "0000000100000005000000A3"?
Cheers,
Rob--
Adrian Klaver
adrian.klaver@aklaver.com
--
*Brian Mills*
CTO
*Mob: *0410660003 <javascript:void(0);>
*Melbourne* 03 9012 3460 <javascript:void(0);> or 03 8376 6327
<javascript:void(0);> *|* *Sydney* 02 8064 3600 <javascript:void(0);> *|*
*Brisbane* 07 3173 1570 <javascript:void(0);>
Level 1 *|* 600 Chapel Street *|* South Yarra*|* VIC *|* 3141 *|*
Australia
<https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking>
<https://www.linkedin.com/company/trybooking-com>
I presume this is a binary log file for the database.
Am I able to recover to a point in time using this log file?
What I would do in SQL Server would be recover to a point in time, say a
bit before the last completed transaction time the log mentions, then take
a backup. Is that possible in postgres?
The log mentions this:
2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log time
2017-01-24 02:08:00.023064+11
(which is moments before, or possibly as the disk filled up doing a db
backup dump)
*Brian Mills*
CTO
*Mob: *0410660003
*Melbourne* 03 9012 3460 or 03 8376 6327 *|* *Sydney* 02 8064 3600 *|*
*Brisbane* 07 3173 1570
Level 1 *|* 600 Chapel Street *|* South Yarra*|* VIC *|* 3141 *|*
Australia
<https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking>
<https://www.linkedin.com/company/trybooking-com>
On 29 January 2017 at 12:58, Brian Mills <brian@trybooking.com> wrote:
Show quoted text
I have a consistent sql dump from 24 hour previous.
The file level backup was done with rsync -a of full data directory after
the issue occurred so could reset as I learned.Brian
On Sun, 29 Jan 2017 at 9:18 am, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:On 01/28/2017 01:55 PM, Brian Mills wrote:
Yes, its the last one in the directory, pg_xlog directory
...more files...
-rw------- 1 postgres postgres 16777216 Jan 21 10:05
0000000100000005000000A1
-rw------- 1 postgres postgres 16777216 Jan 22 21:29
0000000100000005000000A2
-rw------- 1 postgres postgres 16777216 Jan 24 02:08
0000000100000005000000A3Best guess is the last WAL is not complete.
From your original post:
"Attempt 2 - startup manually and let it try recoveryI restored my file level backup and started again. "
How was the file level backup done?
*Brian Mills*
CTO*Mob: *0410660003 <0410%20660%20003> <tel:0410660003 <0410%20660%20003>
*Melbourne* 03 9012 3460 <(03)%209012%203460> <tel:03%209012%203460>
or 03 8376 6327 <(03)%208376%206327>
<tel:03%208376%206327> *|* * **Sydney* 02 8064 3600
<(02)%208064%203600>
<tel:02%208064%203600> *|* *Brisbane* 07 3173 1570
<(07)%203173%201570> <tel:07%203173%201570>
Level 1 *|* 600 Chapel Street *|* South
Yarra*|* VIC *|* 3141 *|* Australia<https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking>
<https://www.linkedin.com/company/trybooking-com>
On 29 January 2017 at 08:18, rob stone <floriparob@gmail.com
<mailto:floriparob@gmail.com>> wrote:Hello Brian,
On Sun, 2017-01-29 at 07:16 +1100, Brian Mills wrote:Hi,
No, it hasn't changed since the first time I looked at it.
root@atlassian:/home/tbadmin# ps ax | grep post
1364 ? Ss 0:00 /usr/lib/postfix/master
5198 pts/3 S 0:00 su postgres
5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres-D
/etc/postgresql/9.3/main
5222 ? Ss 0:10 postgres: startup process recovering
0000000100000005000000A3
11161 pts/4 S+ 0:00 grep --color=auto postDoes this WAL file exist "0000000100000005000000A3"?
Cheers,
Rob--
Adrian Klaver
adrian.klaver@aklaver.com--
*Brian Mills*
CTO*Mob: *0410660003
*Melbourne* 03 9012 3460 or 03 8376 6327 *|* *Sydney* 02 8064 3600 *|*
*Brisbane* 07 3173 1570
Level 1 *|* 600 Chapel Street *|* South Yarra*|* VIC *|* 3141 *|*
Australia<https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking>
<https://www.linkedin.com/company/trybooking-com>
On 01/28/2017 11:23 PM, Brian Mills wrote:
I presume this is a binary log file for the database.
Am I able to recover to a point in time using this log file?
What I would do in SQL Server would be recover to a point in time, say a
bit before the last completed transaction time the log mentions, then
take a backup. Is that possible in postgres?
Yes, though I am not sure you have the setup to do it. I would suggest
reading the below to see how much of it applies:
https://www.postgresql.org/docs/9.3/static/continuous-archiving.html
In particular:
24.3.4. Recovering Using a Continuous Archive Backup
https://www.postgresql.org/docs/9.3/static/recovery-target-settings.html
The log mentions this:
2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log
time 2017-01-24 02:08:00.023064+11(which is moments before, or possibly as the disk filled up doing a db
backup dump)*Brian Mills*
CTO*Mob: *0410660003 <tel:0410660003>
*Melbourne* 03 9012 3460 <tel:03%209012%203460> or 03 8376 6327
<tel:03%208376%206327> *|* * **Sydney* 02 8064 3600
<tel:02%208064%203600> *|* *Brisbane* 07 3173 1570 <tel:07%203173%201570>
Level 1 *|* 600 Chapel Street *|* South
Yarra*|* VIC *|* 3141 *|* Australia<https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking> <https://www.linkedin.com/company/trybooking-com>
On 29 January 2017 at 12:58, Brian Mills <brian@trybooking.com
<mailto:brian@trybooking.com>> wrote:I have a consistent sql dump from 24 hour previous.
The file level backup was done with rsync -a of full data directory
after the issue occurred so could reset as I learned.Brian
On Sun, 29 Jan 2017 at 9:18 am, Adrian Klaver
<adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:On 01/28/2017 01:55 PM, Brian Mills wrote:
Yes, its the last one in the directory, pg_xlog directory
...more files...
-rw------- 1 postgres postgres 16777216 Jan 21 10:05
0000000100000005000000A1
-rw------- 1 postgres postgres 16777216 Jan 22 21:29
0000000100000005000000A2
-rw------- 1 postgres postgres 16777216 Jan 24 02:08
0000000100000005000000A3Best guess is the last WAL is not complete.
From your original post:
"Attempt 2 - startup manually and let it try recoveryI restored my file level backup and started again. "
How was the file level backup done?
*Brian Mills*
CTO*Mob: *0410660003 <tel:0410%20660%20003> <tel:0410660003
<tel:0410%20660%20003>>
*Melbourne* 03 9012 3460 <tel:(03)%209012%203460>
<tel:03%209012%203460> or 03 8376 6327 <tel:(03)%208376%206327>
<tel:03%208376%206327> *|* * **Sydney* 02 8064 3600
<tel:(02)%208064%203600>
<tel:02%208064%203600> *|* *Brisbane* 07 3173 1570
<tel:(07)%203173%201570> <tel:07%203173%201570>
Level 1 *|* 600 Chapel Street *|* South
Yarra*|* VIC *|* 3141 *|* Australia<https://www.facebook.com/TryBooking/>>
<https://twitter.com/trybooking
<https://twitter.com/trybooking>>
<https://www.linkedin.com/company/trybooking-com
<https://www.linkedin.com/company/trybooking-com>>On 29 January 2017 at 08:18, rob stone <floriparob@gmail.com
<mailto:floriparob@gmail.com>
<mailto:floriparob@gmail.com <mailto:floriparob@gmail.com>>>
wrote:
Hello Brian,
On Sun, 2017-01-29 at 07:16 +1100, Brian Mills wrote:Hi,
No, it hasn't changed since the first time I looked at it.
root@atlassian:/home/tbadmin# ps ax | grep post
1364 ? Ss 0:00 /usr/lib/postfix/master
5198 pts/3 S 0:00 su postgres
5221 pts/3 S 0:00/usr/lib/postgresql/9.3/bin/postgres -D
/etc/postgresql/9.3/main
5222 ? Ss 0:10 postgres: startup processrecovering
0000000100000005000000A3
11161 pts/4 S+ 0:00 grep --color=auto postDoes this WAL file exist "0000000100000005000000A3"?
Cheers,
Rob--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>--
*Brian Mills*
CTO*Mob: *0410660003
*Melbourne* 03 9012 3460 or 03 8376 6327 *|* * **Sydney* 02 8064
3600 *|* *Brisbane* 07 3173 1570
Level 1 *|* 600 Chapel Street *|* South
Yarra*|* VIC *|* 3141 *|* Australia<https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking> <https://www.linkedin.com/company/trybooking-com>
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 01/28/2017 11:23 PM, Brian Mills wrote:
I presume this is a binary log file for the database.
Am I able to recover to a point in time using this log file?
What I would do in SQL Server would be recover to a point in time, say a
bit before the last completed transaction time the log mentions, then
take a backup. Is that possible in postgres?
Had another thought. If I remember correctly you are using this as an
exercise in Postgres recovery. If that is indeed the case you might try:
1) Stop the Postgres instance you have running now.
2) Move the WAL file that Postgres is currently stalled on,
0000000100000005000000A3, out of pg_xlog.
3) Restart the Postgres instance.
My guess it it will not bring it back to the exact point you want, but
close. You can get a general idea by running(before and after removing
the WAL), as the postgres user:
pg_controldata -D /etc/postgresql/9.3/main
The log mentions this:
2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log
time 2017-01-24 02:08:00.023064+11(which is moments before, or possibly as the disk filled up doing a db
backup dump)*Brian Mills*
CTO
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
OK. I think I'm on to something here, I first reset back to my file level
backup.
I created a recovery.conf file in the root of the data directory like this:
---------------------
restore_command = 'cp /mnt/archive/%f %p'
recovery_target_time = '2017-01-24 02:08:00.023064+11'
recovery_target_inclusive = 'true'
pause_at_recovery_target = 'false'
---------------------
Note, the archive directory had no files in it, I left the WAL files in the
pg_xlog directory.
Then I started up the database again:
postgres@atlassian:~/9.3/main$ /usr/lib/postgresql/9.3/bin/pg_ctl -D
/etc/postgresql/9.3/main start
server starting
postgres@atlassian:~/9.3/main$ 2017-01-30 10:07:28 AEDT LOG: database
system was interrupted while in recovery at 2017-01-27 20:13:26 AEDT
2017-01-30 10:07:28 AEDT HINT: This probably means that some data is
corrupted and you will have to use the last backup for recovery.
2017-01-30 10:07:28 AEDT LOG: starting point-in-time recovery to
2017-01-24 02:08:00.023064+11
2017-01-30 10:07:28 AEDT LOG: database system was not properly shut down;
automatic recovery in progress
2017-01-30 10:07:28 AEDT WARNING: WAL was generated with
wal_level=minimal, data may be missing
2017-01-30 10:07:28 AEDT HINT: This happens if you temporarily set
wal_level=minimal without taking a new base backup.
2017-01-30 10:07:28 AEDT LOG: redo starts at 5/528B4558
2017-01-30 10:07:40 AEDT LOG: consistent recovery state reached at
5/A3FFFA30
cp: cannot stat ‘/mnt/archive/0000000100000005000000A3’: No such file or
directory
cp: cannot stat ‘/mnt/archive/0000000100000005000000A4’: No such file or
directory
2017-01-30 10:07:40 AEDT LOG: redo done at 5/A3FFF9E8
2017-01-30 10:07:40 AEDT LOG: last completed transaction was at log time
2017-01-24 02:08:00.023064+11
cp: cannot stat ‘/mnt/archive/0000000100000005000000A3’: No such file or
directory
cp: cannot stat ‘/mnt/archive/00000002.history’: No such file or directory
2017-01-30 10:07:40 AEDT LOG: selected new timeline ID: 2
cp: cannot stat ‘/mnt/archive/00000001.history’: No such file or directory
2017-01-30 10:07:40 AEDT LOG: archive recovery complete
2017-01-30 10:08:55 AEDT FATAL: the database system is starting up
2017-01-30 10:08:57 AEDT FATAL: the database system is starting up
This time it looks like it has actually finished the startup and recovery.
However I think I might have something wrong about the process.
Any thoughts on the above log?
*Brian Mills*
CTO
*Mob: *0410660003
*Melbourne* 03 9012 3460 or 03 8376 6327 *|* *Sydney* 02 8064 3600 *|*
*Brisbane* 07 3173 1570
Level 1 *|* 600 Chapel Street *|* South Yarra*|* VIC *|* 3141 *|*
Australia
<https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking>
<https://www.linkedin.com/company/trybooking-com>
On 30 January 2017 at 04:49, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
Show quoted text
On 01/28/2017 11:23 PM, Brian Mills wrote:
I presume this is a binary log file for the database.
Am I able to recover to a point in time using this log file?
What I would do in SQL Server would be recover to a point in time, say a
bit before the last completed transaction time the log mentions, then
take a backup. Is that possible in postgres?Had another thought. If I remember correctly you are using this as an
exercise in Postgres recovery. If that is indeed the case you might try:1) Stop the Postgres instance you have running now.
2) Move the WAL file that Postgres is currently stalled on,
0000000100000005000000A3, out of pg_xlog.3) Restart the Postgres instance.
My guess it it will not bring it back to the exact point you want, but
close. You can get a general idea by running(before and after removing the
WAL), as the postgres user:pg_controldata -D /etc/postgresql/9.3/main
The log mentions this:
2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log
time 2017-01-24 02:08:00.023064+11(which is moments before, or possibly as the disk filled up doing a db
backup dump)*Brian Mills*
CTO--
Adrian Klaver
adrian.klaver@aklaver.com
On 01/29/2017 03:12 PM, Brian Mills wrote:
OK. I think I'm on to something here, I first reset back to my file
level backup.
I created a recovery.conf file in the root of the data directory like this:
---------------------
restore_command = 'cp /mnt/archive/%f %p'
recovery_target_time = '2017-01-24 02:08:00.023064+11'
recovery_target_inclusive = 'true'
pause_at_recovery_target = 'false'
---------------------
Note, the archive directory had no files in it, I left the WAL files in
the pg_xlog directory.Then I started up the database again:
postgres@atlassian:~/9.3/main$ /usr/lib/postgresql/9.3/bin/pg_ctl -D
/etc/postgresql/9.3/main start
server starting
postgres@atlassian:~/9.3/main$ 2017-01-30 10:07:28 AEDT LOG: database
system was interrupted while in recovery at 2017-01-27 20:13:26 AEDT
2017-01-30 10:07:28 AEDT HINT: This probably means that some data is
corrupted and you will have to use the last backup for recovery.
2017-01-30 10:07:28 AEDT LOG: starting point-in-time recovery to
2017-01-24 02:08:00.023064+11
2017-01-30 10:07:28 AEDT LOG: database system was not properly shut
down; automatic recovery in progress
2017-01-30 10:07:28 AEDT WARNING: WAL was generated with
wal_level=minimal, data may be missing
This would be a problem:
https://www.postgresql.org/docs/9.3/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-SETTINGS
"But minimal WAL does not contain enough information to reconstruct the
data from a base backup and the WAL logs, so either archive or
hot_standby level must be used to enable WAL archiving (archive_mode)
and streaming replication. "
2017-01-30 10:07:28 AEDT HINT: This happens if you temporarily set
wal_level=minimal without taking a new base backup.
2017-01-30 10:07:28 AEDT LOG: redo starts at 5/528B4558
2017-01-30 10:07:40 AEDT LOG: consistent recovery state reached at
5/A3FFFA30
cp: cannot stat ‘/mnt/archive/0000000100000005000000A3’: No such file or
directory
cp: cannot stat ‘/mnt/archive/0000000100000005000000A4’: No such file or
directory
2017-01-30 10:07:40 AEDT LOG: redo done at 5/A3FFF9E8
2017-01-30 10:07:40 AEDT LOG: last completed transaction was at log
time 2017-01-24 02:08:00.023064+11
cp: cannot stat ‘/mnt/archive/0000000100000005000000A3’: No such file or
directory
cp: cannot stat ‘/mnt/archive/00000002.history’: No such file or directory
2017-01-30 10:07:40 AEDT LOG: selected new timeline ID: 2
cp: cannot stat ‘/mnt/archive/00000001.history’: No such file or directory
2017-01-30 10:07:40 AEDT LOG: archive recovery complete
2017-01-30 10:08:55 AEDT FATAL: the database system is starting up
2017-01-30 10:08:57 AEDT FATAL: the database system is starting upThis time it looks like it has actually finished the startup and
recovery. However I think I might have something wrong about the process.
Any thoughts on the above log?
See above.
*Brian Mills*
CTO*Mob: *0410660003 <tel:0410660003>
*Melbourne* 03 9012 3460 <tel:03%209012%203460> or 03 8376 6327
<tel:03%208376%206327> *|* * **Sydney* 02 8064 3600
<tel:02%208064%203600> *|* *Brisbane* 07 3173 1570 <tel:07%203173%201570>
Level 1 *|* 600 Chapel Street *|* South
Yarra*|* VIC *|* 3141 *|* Australia<https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking> <https://www.linkedin.com/company/trybooking-com>
On 30 January 2017 at 04:49, Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:On 01/28/2017 11:23 PM, Brian Mills wrote:
I presume this is a binary log file for the database.
Am I able to recover to a point in time using this log file?
What I would do in SQL Server would be recover to a point in
time, say a
bit before the last completed transaction time the log mentions,
then
take a backup. Is that possible in postgres?Had another thought. If I remember correctly you are using this as
an exercise in Postgres recovery. If that is indeed the case you
might try:1) Stop the Postgres instance you have running now.
2) Move the WAL file that Postgres is currently stalled on,
0000000100000005000000A3, out of pg_xlog.3) Restart the Postgres instance.
My guess it it will not bring it back to the exact point you want,
but close. You can get a general idea by running(before and after
removing the WAL), as the postgres user:pg_controldata -D /etc/postgresql/9.3/main
The log mentions this:
2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log
time 2017-01-24 02:08:00.023064+11(which is moments before, or possibly as the disk filled up
doing a db
backup dump)*Brian Mills*
CTO--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
--
Adrian Klaver
adrian.klaver@aklaver.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general