Re: Losing data

Started by Joshua D. Drakealmost 18 years ago15 messagesgeneral
Jump to latest
#1Joshua D. Drake
jd@commandprompt.com

On Thu, 2008-06-19 at 16:55 +0100, Garry Saddington wrote:

I have had a serious loss of data and wondered if anyone could shed any light
on what may have happened.
My users have been writing reports on students. No error messages have been
produced and when called back up the reports seem to be present at the time
of writing. However, next day they have disappeared, and they do not appear
in a pg_dump. They seem to have been kept in memory and never written to
disk.
We are using Zope and connecting to Postgres through psycopg on Centos 5.
I suspect a hard disk failure but any other ideas would be welcome.
Would these reports be in the WAL?

If it was hardware related you would know, quickly. This sounds a great
deal more like an application level interaction. Perhaps your zope
application caches things for a while before committing to disk?

Are you familiar with the actual database itself? Have you checked the
database logs for errors?

Sincerely,

Joshua D. rae

#2Garry Saddington
garry@schoolteachers.co.uk
In reply to: Joshua D. Drake (#1)

On Thursday 19 June 2008 16:55, Joshua D. Drake wrote:

On Thu, 2008-06-19 at 16:55 +0100, Garry Saddington wrote:

I have had a serious loss of data and wondered if anyone could shed any
light on what may have happened.
My users have been writing reports on students. No error messages have
been produced and when called back up the reports seem to be present at
the time of writing. However, next day they have disappeared, and they do
not appear in a pg_dump. They seem to have been kept in memory and never
written to disk.
We are using Zope and connecting to Postgres through psycopg on Centos 5.
I suspect a hard disk failure but any other ideas would be welcome.
Would these reports be in the WAL?

If it was hardware related you would know, quickly. This sounds a great
deal more like an application level interaction. Perhaps your zope
application caches things for a while before committing to disk?

Yes I thought of this but once the report is sent to the DB a separate query
is run to get all of that teacher's reports and these are then displayed on a
new page. They all appear here but then disappear later. Zope has transaction
machinery that rolls everything back on an error, so Postgres must have
indicated a successful write somehow. I read in a Postgres manual that the
hard disk may report to the OS that a write has occured when it actually has
not, is this possible? Oh, and the problem has been intermittant. Another
thing that happened this morning is that Postgres had today as 18/06/2008
when in fact it was 19/06/2008 and the OS reported this correctly. Restarting
postgres sorted it, could this be the problem?
Regards
Garry

#3Bill Moran
wmoran@collaborativefusion.com
In reply to: Garry Saddington (#2)

In response to Garry Saddington <garry@schoolteachers.co.uk>:

On Thursday 19 June 2008 16:55, Joshua D. Drake wrote:

On Thu, 2008-06-19 at 16:55 +0100, Garry Saddington wrote:

I have had a serious loss of data and wondered if anyone could shed any
light on what may have happened.
My users have been writing reports on students. No error messages have
been produced and when called back up the reports seem to be present at
the time of writing. However, next day they have disappeared, and they do
not appear in a pg_dump. They seem to have been kept in memory and never
written to disk.
We are using Zope and connecting to Postgres through psycopg on Centos 5.
I suspect a hard disk failure but any other ideas would be welcome.
Would these reports be in the WAL?

If it was hardware related you would know, quickly. This sounds a great
deal more like an application level interaction. Perhaps your zope
application caches things for a while before committing to disk?

Yes I thought of this but once the report is sent to the DB a separate query
is run to get all of that teacher's reports and these are then displayed on a
new page. They all appear here but then disappear later. Zope has transaction
machinery that rolls everything back on an error, so Postgres must have
indicated a successful write somehow. I read in a Postgres manual that the
hard disk may report to the OS that a write has occured when it actually has
not, is this possible?

No. If that happens you end up with corrupt disks. The chance of that
going unnoticed by the OS is pretty slim.

Oh, and the problem has been intermittant. Another
thing that happened this morning is that Postgres had today as 18/06/2008
when in fact it was 19/06/2008 and the OS reported this correctly. Restarting
postgres sorted it, could this be the problem?

Sounds to me like there's something seriously wrong with you OS or your
PostgreSQL install. What version of PostgreSQL is this? What OS?

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

#4Alan Hodgson
ahodgson@simkin.ca
In reply to: Garry Saddington (#2)

On Thursday 19 June 2008, Garry Saddington <garry@schoolteachers.co.uk>
wrote:

I read in a
Postgres manual that the hard disk may report to the OS that a write has
occured when it actually has not, is this possible?

Yeah. But unless the power suddenly turned off that wouldn't cause data
loss.

Oh, and the problem
has been intermittant. Another thing that happened this morning is that
Postgres had today as 18/06/2008 when in fact it was 19/06/2008 and the
OS reported this correctly. Restarting postgres sorted it, could this be
the problem?

I strongly suspect the problem is between the keyboard and the chair.

In any case, however, if PostgreSQL reported the transaction complete and
the machine didn't experience any hardware problems (like sudden power or
disk failure), I would certainly not suspect PostgreSQL as the source of
the problem.

--
Alan

#5Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Alan Hodgson (#4)

-------------- Original message ----------------------
From: Garry Saddington <garry@schoolteachers.co.uk>

On Thursday 19 June 2008 16:55, Joshua D. Drake wrote:

On Thu, 2008-06-19 at 16:55 +0100, Garry Saddington wrote:

I have had a serious loss of data and wondered if anyone could shed any
light on what may have happened.
My users have been writing reports on students. No error messages have
been produced and when called back up the reports seem to be present at
the time of writing. However, next day they have disappeared, and they do
not appear in a pg_dump. They seem to have been kept in memory and never
written to disk.
We are using Zope and connecting to Postgres through psycopg on Centos 5.
I suspect a hard disk failure but any other ideas would be welcome.
Would these reports be in the WAL?

If it was hardware related you would know, quickly. This sounds a great
deal more like an application level interaction. Perhaps your zope
application caches things for a while before committing to disk?

Yes I thought of this but once the report is sent to the DB a separate query
is run to get all of that teacher's reports and these are then displayed on a
new page. They all appear here but then disappear later. Zope has transaction
machinery that rolls everything back on an error, so Postgres must have
indicated a successful write somehow. I read in a Postgres manual that the
hard disk may report to the OS that a write has occured when it actually has
not, is this possible? Oh, and the problem has been intermittant. Another
thing that happened this morning is that Postgres had today as 18/06/2008
when in fact it was 19/06/2008 and the OS reported this correctly. Restarting
postgres sorted it, could this be the problem?
Regards
Garry

Seems like a transaction with no commit. Basically along as the session is active the data is there but once the session is closed the data does not persist.

--
Adrian Klaver
aklaver@comcast.net

#6Garry Saddington
garry@schoolteachers.co.uk
In reply to: Bill Moran (#3)

On Thursday 19 June 2008 18:10, Bill Moran wrote:

In response to Garry Saddington <garry@schoolteachers.co.uk>:

On Thursday 19 June 2008 16:55, Joshua D. Drake wrote:

On Thu, 2008-06-19 at 16:55 +0100, Garry Saddington wrote:

I have had a serious loss of data and wondered if anyone could shed
any light on what may have happened.
My users have been writing reports on students. No error messages
have been produced and when called back up the reports seem to be
present at the time of writing. However, next day they have
disappeared, and they do not appear in a pg_dump. They seem to have
been kept in memory and never written to disk.
We are using Zope and connecting to Postgres through psycopg on
Centos 5. I suspect a hard disk failure but any other ideas would be
welcome. Would these reports be in the WAL?

If it was hardware related you would know, quickly. This sounds a great
deal more like an application level interaction. Perhaps your zope
application caches things for a while before committing to disk?

Yes I thought of this but once the report is sent to the DB a separate
query is run to get all of that teacher's reports and these are then
displayed on a new page. They all appear here but then disappear later.
Zope has transaction machinery that rolls everything back on an error, so
Postgres must have indicated a successful write somehow. I read in a
Postgres manual that the hard disk may report to the OS that a write has
occured when it actually has not, is this possible?

No. If that happens you end up with corrupt disks. The chance of that
going unnoticed by the OS is pretty slim.

Oh, and the problem has been intermittant. Another
thing that happened this morning is that Postgres had today as 18/06/2008
when in fact it was 19/06/2008 and the OS reported this correctly.
Restarting postgres sorted it, could this be the problem?

Sounds to me like there's something seriously wrong with you OS or your
PostgreSQL install. What version of PostgreSQL is this? What OS?

--

Centos 5 with the Posgres that comes with it - 8.1
regards
garry

#7Garry Saddington
garry@schoolteachers.co.uk
In reply to: Alan Hodgson (#4)

On Thursday 19 June 2008 18:15, Alan Hodgson wrote:

On Thursday 19 June 2008, Garry Saddington <garry@schoolteachers.co.uk>

wrote:

I read in a
Postgres manual that the hard disk may report to the OS that a write has
occured when it actually has not, is this possible?

Yeah. But unless the power suddenly turned off that wouldn't cause data
loss.

Oh, and the problem
has been intermittant. Another thing that happened this morning is that
Postgres had today as 18/06/2008 when in fact it was 19/06/2008 and the
OS reported this correctly. Restarting postgres sorted it, could this be
the problem?

I strongly suspect the problem is between the keyboard and the chair.

I'd love to agree, but I have seen this first hand as a user!

In any case, however, if PostgreSQL reported the transaction complete and
the machine didn't experience any hardware problems (like sudden power or
disk failure), I would certainly not suspect PostgreSQL as the source of
the problem.

What has happened to the reports then? I have used this combination of Zope
and Postgres for 5 years with no problems like this before and we have
written one complete set of reports on this server in the past 6 weeks. The
problem seems to have started last friday, when reports started to go
missing.
regards
Garry

#8Joshua D. Drake
jd@commandprompt.com
In reply to: Adrian Klaver (#5)

On Thu, 2008-06-19 at 19:12 +0100, Garry Saddington wrote:

On Thursday 19 June 2008 18:52, Adrian Klaver wrote:

-------------- Original message ----------------------
From: Garry Saddington <garry@schoolteachers.co.uk>

Seems like a transaction with no commit. Basically along as the session is
active the data is there but once the session is closed the data does not
persist.

Makes sense but what is to blame?

If this is indeed the problem, it is likely a lack of an explicit commit
to your connect object within zope. In pysocpg2 unless you set your
isolation level to something different it automatically creates a new
transaction for your session so if you don't do conn.commit() everything
you do will be rolled back.

Joshua D. Drake

Show quoted text

Regards
Garry

#9Garry Saddington
garry@schoolteachers.co.uk
In reply to: Adrian Klaver (#5)

On Thursday 19 June 2008 18:52, Adrian Klaver wrote:

-------------- Original message ----------------------
From: Garry Saddington <garry@schoolteachers.co.uk>

On Thursday 19 June 2008 16:55, Joshua D. Drake wrote:

On Thu, 2008-06-19 at 16:55 +0100, Garry Saddington wrote:

I have had a serious loss of data and wondered if anyone could shed
any light on what may have happened.
My users have been writing reports on students. No error messages
have been produced and when called back up the reports seem to be
present at the time of writing. However, next day they have
disappeared, and they do not appear in a pg_dump. They seem to have
been kept in memory and never written to disk.
We are using Zope and connecting to Postgres through psycopg on
Centos 5. I suspect a hard disk failure but any other ideas would be
welcome. Would these reports be in the WAL?

If it was hardware related you would know, quickly. This sounds a great
deal more like an application level interaction. Perhaps your zope
application caches things for a while before committing to disk?

Yes I thought of this but once the report is sent to the DB a separate
query is run to get all of that teacher's reports and these are then
displayed on a new page. They all appear here but then disappear later.
Zope has transaction machinery that rolls everything back on an error, so
Postgres must have indicated a successful write somehow. I read in a
Postgres manual that the hard disk may report to the OS that a write has
occured when it actually has not, is this possible? Oh, and the problem
has been intermittant. Another thing that happened this morning is that
Postgres had today as 18/06/2008 when in fact it was 19/06/2008 and the
OS reported this correctly. Restarting postgres sorted it, could this be
the problem?
Regards
Garry

Seems like a transaction with no commit. Basically along as the session is
active the data is there but once the session is closed the data does not
persist.

Makes sense but what is to blame?
Regards
Garry

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Garry Saddington (#6)

Garry Saddington <garry@schoolteachers.co.uk> writes:

Oh, and the problem has been intermittant. Another
thing that happened this morning is that Postgres had today as 18/06/2008
when in fact it was 19/06/2008 and the OS reported this correctly.

Two theories about that one:

1. Postgres' timezone setting is far away from where you are.

2. You were in a transaction that had been open since yesterday.
now() and related functions show time of transaction start.

#2 is interesting since forgetting to commit an open transaction might
also explain the "data loss" complaint.

regards, tom lane

#11Shane Ambler
pgsql@Sheeky.Biz
In reply to: Adrian Klaver (#5)

Adrian Klaver wrote:

Yes I thought of this but once the report is sent to the DB a
separate query is run to get all of that teacher's reports and
these are then displayed on a new page. They all appear here but
then disappear later. Zope has transaction machinery that rolls
everything back on an error, so Postgres must have indicated a
successful write somehow. I read in a Postgres manual that the
hard disk may report to the OS that a write has occured when it
actually has not, is this possible? Oh, and the problem has been
intermittant. Another thing that happened this morning is that
Postgres had today as 18/06/2008 when in fact it was 19/06/2008 and
the OS reported this correctly. Restarting postgres sorted it,
could this be the problem? Regards Garry

Seems like a transaction with no commit. Basically along as the
session is active the data is there but once the session is closed
the data does not persist.

This is my thinking.

Once they are entered and then get listed in the new page can another
user see the reports?

Do they log out and then back in and the reports are gone?

If so then maybe the commit step is getting skipped for some reason.

At what stage are things committed? After entry or at user logoff?
Is there some user data updated on logout that gets an error and rolls back?

--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz

#12Tino Wildenhain
tino@wildenhain.de
In reply to: Garry Saddington (#2)

Hi,

Garry Saddington wrote:

On Thursday 19 June 2008 16:55, Joshua D. Drake wrote:

On Thu, 2008-06-19 at 16:55 +0100, Garry Saddington wrote:

...

Yes I thought of this but once the report is sent to the DB a separate query
is run to get all of that teacher's reports and these are then displayed on a
new page. They all appear here but then disappear later. Zope has transaction
machinery that rolls everything back on an error, so Postgres must have

are you running the report within the same request that made the write?
In this case you would be in the same transaction (and see uncommitted
data) if then an exeption is raised after you display the report
the transaction would be rolled back. You should check your zope logs.

Regards
Tino

#13Robert Treat
xzilla@users.sourceforge.net
In reply to: Garry Saddington (#7)

On Thursday 19 June 2008 14:06:38 Garry Saddington wrote:

In any case, however, if PostgreSQL reported the transaction complete and
the machine didn't experience any hardware problems (like sudden power or
disk failure), I would certainly not suspect PostgreSQL as the source of
the problem.

What has happened to the reports then? I have used this combination of Zope
and Postgres for 5 years with no problems like this before and we have
written one complete set of reports on this server in the past 6 weeks. The
problem seems to have started last friday, when reports started to go
missing.

Out of curiosity, what is your vacuum strategy?

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Treat (#13)

Robert Treat <xzilla@users.sourceforge.net> writes:

On Thursday 19 June 2008 14:06:38 Garry Saddington wrote:

The problem seems to have started last friday, when reports started to go
missing.

Out of curiosity, what is your vacuum strategy?

If you're thinking "transaction ID wraparound", I believe we can rule
that out, because Garry says he's running PG 8.1. 8.1 is not terribly
proactive about preventing wraparound (no forced autovacuums), but it
will squawk loudly about impending wraparound and shut down before it
hits the wall.

My bet is that something was changed in the client-side software last
week that has broken its commit-issuing behavior.

regards, tom lane

#15Garry Saddington
garry@schoolteachers.co.uk
In reply to: Robert Treat (#13)

On Friday 20 June 2008 05:26, Robert Treat wrote:

On Thursday 19 June 2008 14:06:38 Garry Saddington wrote:

In any case, however, if PostgreSQL reported the transaction complete
and the machine didn't experience any hardware problems (like sudden
power or disk failure), I would certainly not suspect PostgreSQL as the
source of the problem.

What has happened to the reports then? I have used this combination of
Zope and Postgres for 5 years with no problems like this before and we
have written one complete set of reports on this server in the past 6
weeks. The problem seems to have started last friday, when reports
started to go missing.

Out of curiosity, what is your vacuum strategy?

We back up and vacuum at the same time each day.
garry