strange behaviour (bug)

Started by Kovacs Zoltan over 25 years ago, 10 messages

kovacsz@pc10.radnoti-szeged.sulinet.hu

over 25 years ago

Hi,

I experience a strange error with 7.0.2. I cannot get any results with
certain queries. For example, a foo table is defined with a few columns,
it has a

id_string varchar(100)

column, too. I fill this table; it contains e.g. a row with
'something' in the column id_string. I then run the following query:

select * from foo where id_string = 'something';

I get no result.

select * from foo where id_string like '%something';

I get the row. Strange. Then, if I try to check the result:

select substr(id_string,1,1) from foo where id_string like '%something';

now I get 's' as expected... After dumping the database and restoring it,
the problem doesn't appear anymore... for a while... I cannot give
an exact report, but usually this bug occurs when I stop the database
and start it again.

Has anybody experienced such behaviour?

TIA, Zoltan

Kov\'acs, Zolt\'an
kovacsz@pc10.radnoti-szeged.sulinet.hu
http://www.math.u-szeged.hu/~kovzol
ftp://pc10.radnoti-szeged.sulinet.hu/home/kovacsz

Tom Lane

tgl@sss.pgh.pa.us

over 25 years ago

In reply to: Kovacs Zoltan (#1)

Re: strange behaviour (bug)

Kovacs Zoltan <kovacsz@pc10.radnoti-szeged.sulinet.hu> writes:

now I get 's' as expected... After dumping the database and restoring it,
the problem doesn't appear anymore... for a while... I cannot give
an exact report, but usually this bug occurs when I stop the database
and start it again.

Hmm. Is it possible that when you restart the postmaster, you are
accidentally starting it with a different environment --- in particular,
different LOCALE or LC_xxx settings --- than it had before?

If there is an index on id_string then

select * from foo where id_string = 'something';

would try to use the index, and so could get messed up by a change
in LOCALE; the index would now appear to be out of order according to
the new LOCALE value.
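
The effect can be reproduced in miniature. The following is a minimal sketch in Python (not PostgreSQL's own code), assuming two made-up collations: `collate_ci` stands in for a dictionary-style locale and `collate_c` for byte-wise "C" ordering. The "index" is just a sorted list searched with binary search. A lookup that assumes the wrong ordering misses a key that is present, while a full scan (which is what a pattern with a leading wildcard such as '%something' needs anyway) still finds it.

```python
import bisect

def collate_c(s):
    # Byte-wise ordering, as in the "C" locale: uppercase sorts before lowercase.
    return s.encode("ascii")

def collate_ci(s):
    # Hypothetical dictionary-style ordering: case is ignored first,
    # the raw string only breaks ties.
    return (s.lower(), s)

def index_lookup(index, key, collate):
    """Binary search that assumes `index` is sorted according to `collate`."""
    keys = [collate(k) for k in index]
    pos = bisect.bisect_left(keys, collate(key))
    return pos < len(index) and index[pos] == key

values = ["Zebra", "apple", "something", "Mango", "banana"]

# The index is built while the dictionary-style ordering is in effect.
index = sorted(values, key=collate_ci)

found_same = index_lookup(index, "something", collate_ci)    # True
found_changed = index_lookup(index, "something", collate_c)  # False: wrong ordering assumed
found_seqscan = "something" in index                          # True: a full scan ignores order
```

The data never changed between the two lookups; only the comparison rule did, which is enough to send the binary search down the wrong half of the list.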

We really ought to fix things so that all the LOCALE settings are saved
by "initdb" and then re-established during postmaster start, rather than
relying on the user always to start the postmaster with the same
environment. People have been burnt by this before :-(
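
One way to picture that proposal is the sketch below. It is an illustration only, written in Python, and the file name `locale_settings.json`, the variable list, and both function names are assumptions made for the example, not anything "initdb" or the postmaster actually does. The settings are recorded once next to the data, and at start-up the saved values override whatever environment the server was launched with.

```python
import json
import os
import tempfile

# Hypothetical file name and variable list, chosen only for this sketch.
SETTINGS_FILE = "locale_settings.json"
LOCALE_VARS = ("LANG", "LC_ALL", "LC_COLLATE", "LC_CTYPE")

def save_locale_settings(data_dir, environ):
    """At "initdb" time: record the locale variables next to the data."""
    settings = {name: environ.get(name, "") for name in LOCALE_VARS}
    with open(os.path.join(data_dir, SETTINGS_FILE), "w") as f:
        json.dump(settings, f)
    return settings

def restore_locale_settings(data_dir, environ):
    """At server start: return an environment with the saved values
    re-established, plus the names of the variables that had drifted."""
    with open(os.path.join(data_dir, SETTINGS_FILE)) as f:
        saved = json.load(f)
    drifted = sorted(n for n, v in saved.items() if environ.get(n, "") != v)
    env = dict(environ)
    for name, value in saved.items():
        if value:
            env[name] = value       # force the value used at initialisation
        else:
            env.pop(name, None)     # variable was unset at initialisation
    return env, drifted

data_dir = tempfile.mkdtemp()
save_locale_settings(data_dir, {"LANG": "hu_HU", "LC_COLLATE": "hu_HU"})

# Later, the server is started from a shell with a different environment.
restored_env, drifted = restore_locale_settings(
    data_dir, {"LANG": "C", "PATH": "/usr/bin"}
)
```

With this arrangement the ordering used to build the indexes no longer depends on who starts the server, and `drifted` could be logged as a warning.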

regards, tom lane

Hiroshi Inoue

Inoue@tpf.co.jp

over 25 years ago

In reply to: Tom Lane (#2)

RE: strange behaviour (bug)

-----Original Message-----
From: Tom Lane

Kovacs Zoltan <kovacsz@pc10.radnoti-szeged.sulinet.hu> writes:

now I get 's' as expected... After dumping the database and restoring it,
the problem doesn't appear anymore... for a while... I cannot give
an exact report, but usually this bug occurs when I stop the database
and start it again.

Hmm. Is it possible that when you restart the postmaster, you are
accidentally starting it with a different environment --- in particular,
different LOCALE or LC_xxx settings --- than it had before?

If there is an index on id_string then

select * from foo where id_string = 'something';

would try to use the index, and so could get messed up by a change
in LOCALE; the index would now appear to be out of order according to
the new LOCALE value.

There could be another cause.
If a B-tree page A was split into page A (changed) and a page B but
the transaction was rolled back, pages A and B would not be written to
disc, and the following could occur, for example:
1) The changed non-leaf page of A and B may be written to disc later.
2) An index entry may be inserted into the page B and committed later.

I don't know how often those could occur.

Regards.

Hiroshi Inoue

Tom Lane

tgl@sss.pgh.pa.us

over 25 years ago

In reply to: Hiroshi Inoue (#3)

Re: strange behaviour (bug)

"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:

If a B-tree page A was split into page A (changed) and a page B but
the transaction was rolled back, pages A and B would not be written to
disc, and the following could occur, for example.

Yes. I have been thinking that it's a mistake not to write changed
pages to disk at transaction abort, because that just makes for a longer
window where a system crash might leave you with corrupted indexes.
I don't think fsync is really essential, but leaving the pages unwritten
in shared memory is bad. (For example, if we next shut down the
postmaster, then the pages will NEVER get written.)
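
The scenario can be walked through with a toy model. This is a minimal sketch in Python under strong simplifying assumptions: `ToyBufferPool`, its methods, and the page layout (a key list plus a right-sibling name) are all invented for the illustration and do not mirror the real bufmgr. Transaction 1 splits page A and aborts; transaction 2 then inserts key 35 into the new page B and commits; finally the server shuts down, so only the "disk" dictionary survives.

```python
import copy

class ToyBufferPool:
    """Invented stand-in for a shared buffer pool; not the real bufmgr."""

    def __init__(self, disk):
        self.disk = disk      # page name -> page contents "on disk"
        self.shared = {}      # page name -> page contents in shared memory
        self.dirty = set()    # pages changed by the current transaction

    def read(self, name):
        if name not in self.shared:
            self.shared[name] = copy.deepcopy(self.disk[name])
        return self.shared[name]

    def write(self, name, page):
        self.shared[name] = page
        self.dirty.add(name)

    def _flush(self):
        for name in self.dirty:
            self.disk[name] = copy.deepcopy(self.shared[name])
        self.dirty.clear()

    def commit(self):
        self._flush()

    def abort(self, flush):
        if flush:
            self._flush()     # the behaviour argued for above
        self.dirty.clear()    # otherwise the pages stay in memory, unwritten

def run_scenario(flush_on_abort):
    disk = {"A": {"keys": [10, 20, 30, 40], "right": None}}
    pool = ToyBufferPool(disk)

    # Transaction 1 splits A into A (changed) and a new page B, then aborts.
    pool.read("A")
    pool.write("A", {"keys": [10, 20], "right": "B"})
    pool.write("B", {"keys": [30, 40], "right": None})
    pool.abort(flush=flush_on_abort)

    # Transaction 2 finds B in shared memory, inserts key 35 and commits.
    page_b = copy.deepcopy(pool.read("B"))
    page_b["keys"] = sorted(page_b["keys"] + [35])
    pool.write("B", page_b)
    pool.commit()

    # Shutdown: shared memory is gone, only `disk` remains.
    return disk

def keys_reachable(disk, start="A"):
    """Follow right-sibling links on disk, as a scan after restart would."""
    found, name = [], start
    while name is not None:
        found.extend(disk[name]["keys"])
        name = disk[name]["right"]
    return found

lost = keys_reachable(run_scenario(flush_on_abort=False))
kept = keys_reachable(run_scenario(flush_on_abort=True))
```

In the first run the committed key 35 sits in page B on disk, but the on-disk copy of A still has no link to B, so the key is unreachable after restart. Writing the changed pages at abort keeps the chain intact.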

Skipping the update is a bit silly anyway; we aren't really that
concerned about optimizing performance of abort, are we?

regards, tom lane

Hiroshi Inoue

Inoue@tpf.co.jp

over 25 years ago

In reply to: Tom Lane (#4)

RE: strange behaviour (bug)

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:

If a B-tree page A was split into page A (changed) and a page B but
the transaction was rolled back, pages A and B would not be written to
disc, and the following could occur, for example.

Yes. I have been thinking that it's a mistake not to write changed
pages to disk at transaction abort, because that just makes for a longer
window where a system crash might leave you with corrupted indexes.
I don't think fsync is really essential, but leaving the pages unwritten
in shared memory is bad. (For example, if we next shut down the
postmaster, then the pages will NEVER get written.)

Skipping the update is a bit silly anyway; we aren't really that
concerned about optimizing performance of abort, are we?

WAL would probably solve this problem by really rolling
back the contents of both disc and shared buffers.
However, if another 7.0.x is to be released, we had better change
bufmgr, IMHO.

Regards.

Hiroshi Inoue

Mikheev, Vadim

vmikheev@SECTORBASE.COM

over 25 years ago

In reply to: Hiroshi Inoue (#5)

RE: strange behaviour (bug)

WAL would probably solve this problem by really rolling
back the contents of both disc and shared buffers.
However, if another 7.0.x is to be released, we had better change
bufmgr, IMHO.

I'm going to handle btree split, but currently there is no way
to roll it back - we unlock the split pages after the parent
is locked, and a concurrent backend may update one or both of the
siblings before we get our locks back.
We have to continue with the split, or we could leave the parent unchanged
and handle "my bits moved..." (i.e. continue the split in another
xaction if we found no parent for a page) ... or we could hold
locks on all split pages until some parent is updated without a
split, but I wouldn't do this.

Vadim


Hiroshi Inoue

Inoue@tpf.co.jp

over 25 years ago

In reply to: Mikheev, Vadim (#6)

RE: strange behaviour (bug)

-----Original Message-----
From: Mikheev, Vadim [mailto:vmikheev@SECTORBASE.COM]

WAL would probably solve this problem by really rolling
back the contents of both disc and shared buffers.
However, if another 7.0.x is to be released, we had better change
bufmgr, IMHO.

I'm going to handle btree split, but currently there is no way
to roll it back - we unlock the split pages after the parent
is locked, and a concurrent backend may update one or both of the
siblings before we get our locks back.
We have to continue with the split, or we could leave the parent unchanged
and handle "my bits moved..." (i.e. continue the split in another
xaction if we found no parent for a page) ... or we could hold
locks on all split pages until some parent is updated without a
split, but I wouldn't do this.

It seems to me that btree split operations must always be
rolled forward, even in case of abort/crash. Do you have
other ideas?

Regards.

Hiroshi Inoue

Mikheev, Vadim

vmikheev@SECTORBASE.COM

over 25 years ago

In reply to: Hiroshi Inoue (#7)

RE: strange behaviour (bug)

I'm going to handle btree split, but currently there is no way
to roll it back - we unlock the split pages after the parent
is locked, and a concurrent backend may update one or both of the
siblings before we get our locks back.
We have to continue with the split, or we could leave the parent unchanged
and handle "my bits moved..." (i.e. continue the split in another
xaction if we found no parent for a page) ... or we could hold
locks on all split pages until some parent is updated without a
split, but I wouldn't do this.

It seems to me that btree split operations must always be
rolled forward, even in case of abort/crash. Do you have
other ideas?

Yes, they should, but that is hard to implement, especially for the abort case.
So, for the moment, I would proceed with handling "my bits moved...":
there is no reason to elog(FATAL) here - we can try to insert the missing pointers
into the parent page(s). WAL will guarantee that btitems moved to the right
sibling will not be lost (level consistency), and missing some pointers
at the parent level is acceptable - scans will work.
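
Why scans keep working with a pointer missing from the parent can be shown with a small model. The sketch below is in Python and assumes a two-level tree whose leaves carry a high key and a right-sibling link, in the style of Lehman and Yao; the `Leaf` class and the `descend` and `search` functions are invented for the illustration and are not nbtree code. The parent only knows about the left leaf, yet a search for a key that moved to the right sibling still succeeds by following the link.

```python
class Leaf:
    """Toy leaf page: sorted keys, an upper bound, and a right-sibling link."""

    def __init__(self, keys, high_key=None, right=None):
        self.keys = keys
        self.high_key = high_key   # None means "no upper bound" (rightmost page)
        self.right = right

def descend(parent, key):
    """Pick the last child whose lower bound is <= key."""
    chosen = parent[0][1]
    for low, leaf in parent:
        if key >= low:
            chosen = leaf
    return chosen

def search(parent, key):
    """Return (found, hops), where hops counts moves to a right sibling."""
    leaf = descend(parent, key)
    hops = 0
    # If the key lies beyond this page's high key, the page has been split
    # and the key, if present, lives somewhere to the right.
    while leaf.high_key is not None and key > leaf.high_key and leaf.right is not None:
        leaf = leaf.right
        hops += 1
    return key in leaf.keys, hops

# State after an interrupted split: the keys 30 and 40 moved to a new right
# sibling, but the parent never received a pointer to it.
right_leaf = Leaf(keys=[30, 40])
left_leaf = Leaf(keys=[10, 20], high_key=20, right=right_leaf)
parent = [(float("-inf"), left_leaf)]

found_moved = search(parent, 40)    # (True, 1): reached via the right link
found_local = search(parent, 10)    # (True, 0)

# Later repair: insert the missing pointer into the parent.
parent.append((30, right_leaf))
found_after_fix = search(parent, 40)   # (True, 0): direct descent again
```

The missing parent entry costs one extra hop, not a wrong answer, which is why repairing it lazily is tolerable in this model.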

Vadim


Hiroshi Inoue

Inoue@tpf.co.jp

over 25 years ago

In reply to: Mikheev, Vadim (#8)

RE: strange behaviour (bug)

-----Original Message-----
From: Mikheev, Vadim [mailto:vmikheev@SECTORBASE.COM]

I'm going to handle btree split, but currently there is no way
to roll it back - we unlock the split pages after the parent
is locked, and a concurrent backend may update one or both of the
siblings before we get our locks back.
We have to continue with the split, or we could leave the parent unchanged
and handle "my bits moved..." (i.e. continue the split in another
xaction if we found no parent for a page) ... or we could hold
locks on all split pages until some parent is updated without a
split, but I wouldn't do this.

It seems to me that btree split operations must always be
rolled forward, even in case of abort/crash. Do you have
other ideas?

Yes, they should, but that is hard to implement, especially for the abort case.
So, for the moment, I would proceed with handling "my bits moved...":
there is no reason to elog(FATAL) here - we can try to insert the missing pointers
into the parent page(s). WAL will guarantee that btitems moved to the right
sibling will not be lost (level consistency), and missing some pointers
at the parent level is acceptable - scans will work.

I looked into your XLOG stuff a little.
It seems that XLogFileOpen() isn't implemented yet.
Would/should XLogFileOpen() guarantee to open a Relation
properly at any time?

Regards.

Hiroshi Inoue


Mikheev, Vadim

vmikheev@SECTORBASE.COM

over 25 years ago

In reply to: Hiroshi Inoue (#9)

RE: strange behaviour (bug)

I looked into your XLOG stuff a little.
It seems that XLogFileOpen() isn't implemented yet.
Would/should XLogFileOpen() guarantee to open a Relation
properly at any time?

If each relation has a unique file name then there will be no
problem. If a relation was dropped, then after a crash redo will try
to open a file that probably no longer exists. XLogFileOpen will return NULL in this case
(redo will do nothing) and remember this fact (i.e. "file deletion is
expected").
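
The redo behaviour described here might look roughly like the sketch below. It is written in Python purely as an illustration: `xlog_file_open` is a hypothetical stand-in for the planned XLogFileOpen(), and the record format (path, offset, data) and file names are assumptions made for the example. A record whose file no longer exists is skipped, and the path is remembered as an expected deletion.

```python
import os
import tempfile

def xlog_file_open(path, expected_deletions):
    """Hypothetical stand-in for XLogFileOpen(): returns None if the file is gone."""
    try:
        return open(path, "r+b")
    except FileNotFoundError:
        # Remember that a later "file deleted" record is expected for this path.
        expected_deletions.add(path)
        return None

def redo(records, expected_deletions):
    """Replay (path, offset, data) records; skip those for missing files."""
    applied = 0
    for path, offset, data in records:
        f = xlog_file_open(path, expected_deletions)
        if f is None:
            continue          # relation was dropped: redo does nothing
        with f:
            f.seek(offset)
            f.write(data)
        applied += 1
    return applied

work_dir = tempfile.mkdtemp()
live_path = os.path.join(work_dir, "rel_live")
dropped_path = os.path.join(work_dir, "rel_dropped")   # never created

with open(live_path, "wb") as f:
    f.write(b"\x00" * 8)

expected_deletions = set()
applied = redo(
    [(dropped_path, 0, b"zz"), (live_path, 2, b"xy")],
    expected_deletions,
)

with open(live_path, "rb") as f:
    live_contents = f.read()
```

This only works if a file name can never be reused by a different relation, which is the "unique file name" condition stated above.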

Vadim
