strange behaviour (bug)

Started by Kovacs Zoltan, over 25 years ago, 10 messages
#1 Kovacs Zoltan
kovacsz@pc10.radnoti-szeged.sulinet.hu

Hi,

I experience a strange error with 7.0.2. I cannot get any results with
certain queries. For example, a foo table is defined with a few columns,
it has a

id_string varchar(100)

column, too. I fill this table, and it contains e.g. a row with
'something' in the column id_string. I then run the following query:

select * from foo where id_string = 'something';

I get no result.

select * from foo where id_string like '%something';

I get the row. Strange. Then, if I try to check the result:

select substr(id_string,1,1) from foo where id_string like '%something';

now I get 's' as expected... After dumping the database and restoring
it, the problem no longer appears... for a while... I cannot give
an exact report, but usually this bug occurs when I stop the database
and start it again.

Has anybody experienced such behaviour?

TIA, Zoltan

Kovács, Zoltán
kovacsz@pc10.radnoti-szeged.sulinet.hu
http://www.math.u-szeged.hu/~kovzol
ftp://pc10.radnoti-szeged.sulinet.hu/home/kovacsz

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kovacs Zoltan (#1)
Re: strange behaviour (bug)

Kovacs Zoltan <kovacsz@pc10.radnoti-szeged.sulinet.hu> writes:

> now I will get 's' as expected... Dumping the database out and bringing it
> back the problem doesn't appear anymore... for a while... I cannot give
> an exact report, but usually this bug occurs when I stop the database
> and I start it again.

Hmm. Is it possible that when you restart the postmaster, you are
accidentally starting it with a different environment --- in particular,
different LOCALE or LC_xxx settings --- than it had before?

If there is an index on id_string then

select * from foo where id_string = 'something';

would try to use the index, and so could get messed up by a change
in LOCALE; the index would now appear to be out of order according to
the new LOCALE value.
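
To illustrate the failure mode described above, here is a minimal Python sketch. It is a toy model, not PostgreSQL code. It assumes two hypothetical orderings: `dictionary_key` stands in for a natural-language locale that compares case-insensitively first, and `c_key` stands in for the plain code point order of the "C" locale. A sorted list plays the role of the index and `index_lookup` is an ordinary binary search.

```python
def dictionary_key(s):
    # Hypothetical stand-in for a natural-language locale: ignore case first.
    return (s.lower(), s)


def c_key(s):
    # Stand-in for the "C" locale: plain code point order.
    return s


def index_lookup(index, target, key):
    """Binary search that trusts `index` to be sorted according to `key`."""
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if key(index[mid]) < key(target):
            lo = mid + 1
        else:
            hi = mid
    return lo < len(index) and index[lo] == target


rows = ["Whiskey", "alpha", "Tango", "something", "Victor", "bravo", "Uniform"]

# The index is built while one ordering is in effect.
index = sorted(rows, key=dictionary_key)

# Same ordering at lookup time: the row is found.
found_same_locale = index_lookup(index, "something", dictionary_key)

# Different ordering at lookup time: the search goes the wrong way.
found_after_change = index_lookup(index, "something", c_key)

# A sequential scan (like the LIKE '%something' query) does not use the index.
found_by_scan = any(r.endswith("something") for r in rows)

# Rebuilding the index under the new ordering (like dump and reload) fixes it.
found_after_rebuild = index_lookup(sorted(rows, key=c_key), "something", c_key)
```

In this model the equality lookup fails only when the ordering used for the search differs from the one the index was built with, which matches the reported symptoms: the indexed query returns nothing, the scan finds the row, and a reload makes the problem disappear for a while.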

We really ought to fix things so that all the LOCALE settings are saved
by "initdb" and then re-established during postmaster start, rather than
relying on the user always to start the postmaster with the same
environment. People have been burnt by this before :-(
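
The idea of saving the settings once and re-establishing them at every start can be sketched in Python as follows. This is only an illustration of the proposal, not how initdb or the postmaster work: the file name `locale_settings.json`, the functions `save_locale` and `restore_locale`, and the choice of categories are all assumptions made for the example.

```python
import json
import locale
import os

# Hypothetical file name; the thread does not say where the settings would live.
SETTINGS_FILE = "locale_settings.json"

# Categories that affect string ordering and character classification.
CATEGORIES = {"LC_COLLATE": locale.LC_COLLATE, "LC_CTYPE": locale.LC_CTYPE}


def save_locale(datadir):
    """Record the current locale settings (the 'initdb' step)."""
    settings = {name: locale.setlocale(cat, None) for name, cat in CATEGORIES.items()}
    with open(os.path.join(datadir, SETTINGS_FILE), "w") as f:
        json.dump(settings, f)
    return settings


def restore_locale(datadir):
    """Re-establish the recorded settings (the 'postmaster start' step)."""
    with open(os.path.join(datadir, SETTINGS_FILE)) as f:
        settings = json.load(f)
    for name, value in settings.items():
        # Ignore whatever the environment says and use the saved value.
        locale.setlocale(CATEGORIES[name], value)
    return settings
```

With something along these lines, the ordering in effect no longer depends on the environment of whoever happens to restart the server.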

regards, tom lane

#3 Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Tom Lane (#2)
RE: strange behaviour (bug)

-----Original Message-----
From: Tom Lane

> Kovacs Zoltan <kovacsz@pc10.radnoti-szeged.sulinet.hu> writes:
>
> > now I will get 's' as expected... Dumping the database out and bringing it
> > back the problem doesn't appear anymore... for a while... I cannot give
> > an exact report, but usually this bug occurs when I stop the database
> > and I start it again.
>
> Hmm. Is it possible that when you restart the postmaster, you are
> accidentally starting it with a different environment --- in particular,
> different LOCALE or LC_xxx settings --- than it had before?
>
> If there is an index on id_string then
>
> select * from foo where id_string = 'something';
>
> would try to use the index, and so could get messed up by a change
> in LOCALE; the index would now appear to be out of order according to
> the new LOCALE value.

There could be another cause.
If a B-tree page A was split into page A (changed) and a page B, but
the transaction was rolled back, pages A and B would not be written to
disk, and the following could occur, for example:
1) The changed non-leaf page of A and B may be written to disk later.
2) An index entry may be inserted into page B and committed later.

I don't know how often those could occur.

Regards.

Hiroshi Inoue

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#3)
Re: strange behaviour (bug)

"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:

> If a B-tree page A was splitted to the page A(changed) and a page B but
> the transaction was rolled back,the pages A,B would not be written to
> disc and the followings could occur for example.

Yes. I have been thinking that it's a mistake not to write changed
pages to disk at transaction abort, because that just makes for a longer
window where a system crash might leave you with corrupted indexes.
I don't think fsync is really essential, but leaving the pages unwritten
in shared memory is bad. (For example, if we next shut down the
postmaster, then the pages will NEVER get written.)

Skipping the update is a bit silly anyway; we aren't really that
concerned about optimizing performance of abort, are we?
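
The scenario under discussion can be made concrete with a toy model in Python. This is a sketch under strong simplifying assumptions and not PostgreSQL's buffer manager: `ToyBufferPool`, `lookup`, and `run_scenario` are hypothetical names, pages are plain lists of keys, and the parent page is a list of (lowest key, child page) pairs.

```python
class ToyBufferPool:
    """Toy model of shared buffers over a 'disk'; not PostgreSQL's bufmgr."""

    def __init__(self, disk):
        self.disk = disk      # page name -> contents as stored on disk
        self.buffers = {}     # page name -> contents changed in shared memory

    def modify(self, page, contents):
        self.buffers[page] = contents

    def write(self, page):
        self.disk[page] = self.buffers[page]

    def abort(self, pages, write_on_abort):
        # The policy under discussion: write changed pages at abort, or skip them.
        if write_on_abort:
            for page in pages:
                self.write(page)

    def shutdown(self):
        # Anything that exists only in shared memory is lost.
        self.buffers.clear()


def lookup(disk, key):
    """Descend from the parent to the last child whose lowest key is <= key."""
    target = None
    for low, page in disk["parent"]:
        if key >= low:
            target = page
    return target is not None and key in disk.get(target, [])


def run_scenario(write_on_abort):
    disk = {"parent": [(1, "A")], "A": [1, 2, 3, 4]}
    pool = ToyBufferPool(disk)
    # A transaction splits A into A (changed) and a new page B ...
    pool.modify("A", [1, 2])
    pool.modify("B", [3, 4])
    pool.modify("parent", [(1, "A"), (3, "B")])
    # ... and is then rolled back.
    pool.abort(["A", "B"], write_on_abort)
    # The changed parent page is written later; then the server shuts down.
    pool.write("parent")
    pool.shutdown()
    return lookup(disk, 3)
```

In this model `run_scenario(False)` can no longer find key 3 through the index, even though it is still present in the old copy of A on disk, while `run_scenario(True)` finds it.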

regards, tom lane

#5 Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Tom Lane (#4)
RE: strange behaviour (bug)

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>
> > If a B-tree page A was splitted to the page A(changed) and a page B but
> > the transaction was rolled back,the pages A,B would not be written to
> > disc and the followings could occur for example.
>
> Yes. I have been thinking that it's a mistake not to write changed
> pages to disk at transaction abort, because that just makes for a longer
> window where a system crash might leave you with corrupted indexes.
> I don't think fsync is really essential, but leaving the pages unwritten
> in shared memory is bad. (For example, if we next shut down the
> postmaster, then the pages will NEVER get written.)
>
> Skipping the update is a bit silly anyway; we aren't really that
> concerned about optimizing performance of abort, are we?

WAL would probably solve this by actually rolling back the
contents of both disk and shared buffers.
However, if another 7.0.x is to be released, we had better change
bufmgr, IMHO.

Regards.

Hiroshi Inoue

#6 Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Hiroshi Inoue (#5)
RE: strange behaviour (bug)

> Probably WAL would solve this phenomenon by rolling
> back the content of disc and shared buffer in reality.
> However if 7.0.x would be released we had better change
> bufmgr IMHO.

I'm going to handle btree splits, but currently there is no way
to roll one back - we unlock the split pages after the parent
is locked, and a concurrent backend may update one or both of the
siblings before we get our locks back.
We either have to continue with the split, or we could leave the parent
unchanged and handle "my bits moved..." (i.e. continue the split in another
xaction if we find no parent for a page) ... or we could hold
locks on all split pages until some parent is updated without a
split, but I wouldn't do this.

Vadim

#7 Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Mikheev, Vadim (#6)
RE: strange behaviour (bug)

-----Original Message-----
From: Mikheev, Vadim [mailto:vmikheev@SECTORBASE.COM]

> > Probably WAL would solve this phenomenon by rolling
> > back the content of disc and shared buffer in reality.
> > However if 7.0.x would be released we had better change
> > bufmgr IMHO.
>
> I'm going to handle btree split but currently there is no way
> to rollback it - we unlock splitted pages after parent
> is locked and concurrent backend may update one/both of
> siblings before we get our locks back.
> We have to continue with split or could leave parent unchanged
> and handle "my bits moved..." (ie continue split in another
> xaction if we found no parent for a page) ... or we could hold
> locks on all splitted pages till some parent updated without
> split, but I wouldn't do this.

It seems to me that btree split operations must always be
rolled forward, even in case of abort/crash. Do you have
other ideas?

Regards.

Hiroshi Inoue

#8 Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Hiroshi Inoue (#7)
RE: strange behaviour (bug)

> > I'm going to handle btree split but currently there is no way
> > to rollback it - we unlock splitted pages after parent
> > is locked and concurrent backend may update one/both of
> > siblings before we get our locks back.
> > We have to continue with split or could leave parent unchanged
> > and handle "my bits moved..." (ie continue split in another
> > xaction if we found no parent for a page) ... or we could hold
> > locks on all splitted pages till some parent updated without
> > split, but I wouldn't do this.
>
> It seems to me that btree split operations must always be
> rolled forward even in case of abort/crash. DO you have
> other ideas ?

Yes, they should, but that is hard to implement, especially for the abort case.
So, for the moment, I would proceed with handling "my bits moved...":
there is no reason to elog(FATAL) here - we can try to insert the missing
pointers into the parent page(s). WAL will guarantee that btitems moved to the
right sibling will not be lost (level consistency), and missing some pointers
at the parent level is acceptable - scans will still work.
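
Why a scan can still work when the parent lacks a pointer to a newly split-off page can be sketched as follows. This is a toy illustration in Python, assuming that each page keeps a link to its right sibling and that a search moves right when the key is not on the page it was routed to. `Leaf` and `find` are hypothetical names, not nbtree functions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int] = field(default_factory=list)
    right: Optional["Leaf"] = None   # link to the right sibling, if any


def find(parent, key):
    """parent is a list of (lowest key, leaf) pairs sorted by lowest key."""
    # Route through the parent to the last child whose lowest key is <= key.
    leaf = parent[0][1]
    for low, child in parent:
        if key >= low:
            leaf = child
    # Follow right links in case a split moved the key to a sibling
    # that the parent does not know about yet.
    while leaf is not None:
        if key in leaf.keys:
            return True
        if leaf.keys and key < leaf.keys[-1]:
            return False   # the key would have been on this page
        leaf = leaf.right
    return False


# Page a was split into a and b, but the pointer to b never reached the parent.
b = Leaf(keys=[3, 4])
a = Leaf(keys=[1, 2], right=b)
parent = [(1, a)]

found_moved_key = find(parent, 3)
found_absent_key = find(parent, 5)
```

Here `found_moved_key` is true because the search reaches page b through the sibling link, so the missing parent pointer costs an extra step rather than a lost row.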

Vadim

#9 Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Mikheev, Vadim (#8)
RE: strange behaviour (bug)

-----Original Message-----
From: Mikheev, Vadim [mailto:vmikheev@SECTORBASE.COM]

> > > I'm going to handle btree split but currently there is no way
> > > to rollback it - we unlock splitted pages after parent
> > > is locked and concurrent backend may update one/both of
> > > siblings before we get our locks back.
> > > We have to continue with split or could leave parent unchanged
> > > and handle "my bits moved..." (ie continue split in another
> > > xaction if we found no parent for a page) ... or we could hold
> > > locks on all splitted pages till some parent updated without
> > > split, but I wouldn't do this.
> >
> > It seems to me that btree split operations must always be
> > rolled forward even in case of abort/crash. DO you have
> > other ideas ?
>
> Yes, it should, but hard to implement, especially for abort case.
> So, for the moment, I would proceed with handling "my bits moved...":
> no reason to elog(FATAL) here - we can try to insert missed pointers
> into parent page(s). WAL will guarantee that btitems moved to right
> sibling will not be lost (level consistency), and missing some pointers
> in parent level is acceptable - scans will work.

I looked into your XLOG stuff a little.
It seems that XLogFileOpen() isn't implemented yet.
Would/should XLogFileOpen() guarantee to open a Relation
properly at any time?

Regards.

Hiroshi Inoue

#10 Mikheev, Vadim
vmikheev@SECTORBASE.COM
In reply to: Hiroshi Inoue (#9)
RE: strange behaviour (bug)

> I looked into your XLOG stuff a little.
> It seems that XLogFileOpen() isn't implemented yet.
> Would/should XLogFileOpen() guarantee to open a Relation
> properly at any time ?

If each relation has a unique file name, then there will be no
problem. If a relation was dropped, then after a crash redo will try
to open a file that probably no longer exists. XLogFileOpen will return NULL
in this case (redo will do nothing) and remember this fact (i.e. "file
deletion is expected").
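
The redo behaviour described above can be sketched in Python. This is an illustration only: `open_relation_file` is a hypothetical stand-in for XLogFileOpen (which, per this thread, was not implemented yet), and the record format of (file name, offset, data) is an assumption made for the example.

```python
import os


def open_relation_file(datadir, name, missing):
    """Return an open file, or None if the file does not exist.

    Hypothetical stand-in for XLogFileOpen: a missing file is not an error,
    it is remembered as a file whose deletion is expected.
    """
    path = os.path.join(datadir, name)
    if not os.path.exists(path):
        missing.add(name)
        return None
    return open(path, "r+b")


def redo(datadir, records):
    """Replay (file name, offset, data) records, skipping missing files."""
    missing = set()
    for name, offset, data in records:
        f = open_relation_file(datadir, name, missing)
        if f is None:
            continue   # relation was dropped: redo does nothing for this record
        with f:
            f.seek(offset)
            f.write(data)
    return missing
```

The returned set plays the role of the remembered "file deletion is expected" fact: recovery could later check that each of those files really was dropped.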

Vadim