Losing data

Started by Garry Saddingtonalmost 18 years ago8 messagesgeneral
Jump to latest
#1Garry Saddington
garry@schoolteachers.co.uk

I have had a serious loss of data and wondered if anyone could shed any light
on what may have happened.
My users have been writing reports on students. No error messages have been
produced and when called back up the reports seem to be present at the time
of writing. However, next day they have disappeared, and they do not appear
in a pg_dump. They seem to have been kept in memory and never written to
disk.
We are using Zope and connecting to Postgres through psycopg on Centos 5.
I suspect a hard disk failure but any other ideas would be welcome.
Would these reports be in the WAL?
regards
garry

#2Scott Marlowe
scott.marlowe@gmail.com
In reply to: Garry Saddington (#1)
Re: Losing data

On Thu, Jun 19, 2008 at 9:55 AM, Garry Saddington
<garry@schoolteachers.co.uk> wrote:

I have had a serious loss of data and wondered if anyone could shed any light
on what may have happened.
My users have been writing reports on students. No error messages have been
produced and when called back up the reports seem to be present at the time
of writing. However, next day they have disappeared, and they do not appear
in a pg_dump. They seem to have been kept in memory and never written to
disk.
We are using Zope and connecting to Postgres through psycopg on Centos 5.
I suspect a hard disk failure but any other ideas would be welcome.
Would these reports be in the WAL?
regards

Just a guess, but do you have some kind of search engine pointing at
the application? Could it be hitting a "delete" link maybe?

#3Geoffrey
lists@serioustechnology.com
In reply to: Garry Saddington (#1)
Re: Losing data

Garry Saddington wrote:

On Thursday 19 June 2008 18:15, Alan Hodgson wrote:

On Thursday 19 June 2008, Garry Saddington <garry@schoolteachers.co.uk>

wrote:

I read in a
Postgres manual that the hard disk may report to the OS that a write has
occured when it actually has not, is this possible?

Yeah. But unless the power suddenly turned off that wouldn't cause data
loss.

Oh, and the problem
has been intermittant. Another thing that happened this morning is that
Postgres had today as 18/06/2008 when in fact it was 19/06/2008 and the
OS reported this correctly. Restarting postgres sorted it, could this be
the problem?

I strongly suspect the problem is between the keyboard and the chair.

I'd love to agree, but I have seen this first hand as a user!

In any case, however, if PostgreSQL reported the transaction complete and
the machine didn't experience any hardware problems (like sudden power or
disk failure), I would certainly not suspect PostgreSQL as the source of
the problem.

What has happened to the reports then? I have used this combination of Zope
and Postgres for 5 years with no problems like this before and we have
written one complete set of reports on this server in the past 6 weeks. The
problem seems to have started last friday, when reports started to go
missing.

What has changed prior to Friday?

--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
- Benjamin Franklin

#4Joshua D. Drake
jd@commandprompt.com
In reply to: Garry Saddington (#1)
Re: Losing data

On Thu, 2008-06-19 at 19:06 +0100, Garry Saddington wrote:

On Thursday 19 June 2008 18:15, Alan Hodgson wrote:

On Thursday 19 June 2008, Garry Saddington <garry@schoolteachers.co.uk>

Although I appreciate that this is a funky problem, the problem doesn't
yet exist and we are operating in a diagnostic vacuum.

Can you duplicate the problem now?

Do you have any logs from PostgreSQL that describe the session that
either previously or currently (based on your tests) the problem?

Do those logs actually show ERROR?

What about your apachelog or zope logs?

Have you turned up the logging in all three components to ensure that
the problem actually does exist?

Sincerely,

Joshua D. Drake

Show quoted text

regards
Garry

#5Garry Saddington
garry@schoolteachers.co.uk
In reply to: Scott Marlowe (#2)
Re: Losing data

On Thursday 19 June 2008 18:09, Scott Marlowe wrote:

On Thu, Jun 19, 2008 at 9:55 AM, Garry Saddington

<garry@schoolteachers.co.uk> wrote:

I have had a serious loss of data and wondered if anyone could shed any
light on what may have happened.
My users have been writing reports on students. No error messages have
been produced and when called back up the reports seem to be present at
the time of writing. However, next day they have disappeared, and they do
not appear in a pg_dump. They seem to have been kept in memory and never
written to disk.
We are using Zope and connecting to Postgres through psycopg on Centos 5.
I suspect a hard disk failure but any other ideas would be welcome.
Would these reports be in the WAL?
regards

Just a guess, but do you have some kind of search engine pointing at
the application? Could it be hitting a "delete" link maybe?

Don't think so, once a teacher has entered a report there is no way that they
can delete it.
Regards
Garry

#6Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Garry Saddington (#5)
Re: Losing data

-------------- Original message ----------------------
From: Garry Saddington <garry@schoolteachers.co.uk>

On Thursday 19 June 2008 18:52, Adrian Klaver wrote:

-------------- Original message ----------------------
From: Garry Saddington <garry@schoolteachers.co.uk>

On Thursday 19 June 2008 16:55, Joshua D. Drake wrote:

On Thu, 2008-06-19 at 16:55 +0100, Garry Saddington wrote:

I have had a serious loss of data and wondered if anyone could shed
any light on what may have happened.
My users have been writing reports on students. No error messages
have been produced and when called back up the reports seem to be
present at the time of writing. However, next day they have
disappeared, and they do not appear in a pg_dump. They seem to have
been kept in memory and never written to disk.
We are using Zope and connecting to Postgres through psycopg on
Centos 5. I suspect a hard disk failure but any other ideas would be
welcome. Would these reports be in the WAL?

If it was hardware related you would know, quickly. This sounds a great
deal more like an application level interaction. Perhaps your zope
application caches things for a while before committing to disk?

Yes I thought of this but once the report is sent to the DB a separate
query is run to get all of that teacher's reports and these are then
displayed on a new page. They all appear here but then disappear later.
Zope has transaction machinery that rolls everything back on an error, so
Postgres must have indicated a successful write somehow. I read in a
Postgres manual that the hard disk may report to the OS that a write has
occured when it actually has not, is this possible? Oh, and the problem
has been intermittant. Another thing that happened this morning is that
Postgres had today as 18/06/2008 when in fact it was 19/06/2008 and the
OS reported this correctly. Restarting postgres sorted it, could this be
the problem?
Regards
Garry

Seems like a transaction with no commit. Basically along as the session is
active the data is there but once the session is closed the data does not
persist.

Makes sense but what is to blame?

Therein lies the rub. As Josh mentioned in another post you will need turn up the logging for the various components and rerun the suspect procedure. I find the Postgres logs to most informative as they show what is actually hitting the database. Once I see the actual SQL statement I work backwards to see how it got there and in that form.

Regards
Garry

--
Adrian Klaver
aklaver@comcast.net

#7Garry Saddington
garry@schoolteachers.co.uk
In reply to: Joshua D. Drake (#4)
Re: Losing data

On Thursday 19 June 2008 19:03, Joshua D. Drake wrote:

On Thu, 2008-06-19 at 19:06 +0100, Garry Saddington wrote:

On Thursday 19 June 2008 18:15, Alan Hodgson wrote:

On Thursday 19 June 2008, Garry Saddington <garry@schoolteachers.co.uk>

Although I appreciate that this is a funky problem, the problem doesn't
yet exist and we are operating in a diagnostic vacuum.

Can you duplicate the problem now?

The problem is not consistent, some reports are lsot others not.

Do you have any logs from PostgreSQL that describe the session that
either previously or currently (based on your tests) the problem?

These are at school so I will look tomorrow

Do those logs actually show ERROR?

There are no errors relating to reports

What about your apachelog or zope logs?

Again tomorrow

Have you turned up the logging in all three components to ensure that
the problem actually does exist?

Will do this tomorrow
regards and thanks for your efforts.

PS
Could it be somethiing to do with libpq.so.4/5 required by psycopg and will
the WAL hold data to be replayed?

#8Joshua D. Drake
jd@commandprompt.com
In reply to: Garry Saddington (#7)
Re: Losing data

On Thu, 2008-06-19 at 20:29 +0100, Garry Saddington wrote:

On Thursday 19 June 2008 19:03, Joshua D. Drake wrote:

On Thu, 2008-06-19 at 19:06 +0100, Garry Saddington wrote:

On Thursday 19 June 2008 18:15, Alan Hodgson wrote:

On Thursday 19 June 2008, Garry Saddington

<garry@schoolteachers.co.uk>

PS
Could it be somethiing to do with libpq.so.4/5 required by psycopg and will

I think it certainly has something to do with psycopg and the way it
handles commits. Especially based on the description of what you say
above where some work and some don't. My guess (as I mentioned
previously) is there is a bit of code in your report generator that
doesn't call commit correctly.

the WAL hold data to be replayed?

Only if you are pushing the logs off for standby operations and only if
you commit.

Sincerely,

Joshua D. Drake

Show quoted text