Addled index

Started by Oleg Alexeevabout 13 years ago10 messagesgeneral

oalexeev@gmail.com

about 13 years ago

We've faced with strange index problem.

At some moment index became bad and queries does not return any data.

For example, there are two tables - A (id, name) and B (id, name, a_id).
B.a_id is foreign key to A. Both name columns in tables contains identical
values for A.id = B.a_id. A.name column has unique constraint and
additional index by it.

So, in some moment results for queries like [select id from A where name =
'petya'] became empty (row with 'petya' name exist in A).

But query [select a_id from B where name = 'petya'] returns A.id and
[select * from A where id = <found id>] returns row.

This problem can be solved by index recreation only.

How can we avoid such situation?

--
Oleg V Alexeev
E:oalexeev@gmail.com

Steve Crawford

scrawford@pinpointresearch.com

about 13 years ago

In reply to: Oleg Alexeev (#1)

Re: Addled index

On 03/15/2013 11:29 AM, Oleg Alexeev wrote:

We've faced with strange index problem.

At some moment index became bad and queries does not return any data.

For example, there are two tables - A (id, name) and B (id, name,
a_id). B.a_id is foreign key to A. Both name columns in tables
contains identical values for A.id = B.a_id. A.name column has unique
constraint and additional index by it.

So, in some moment results for queries like [select id from A where
name = 'petya'] became empty (row with 'petya' name exist in A).

But query [select a_id from B where name = 'petya'] returns A.id and
[select * from A where id = <found id>] returns row.

This problem can be solved by index recreation only.

How can we avoid such situation?

What version??

Cheers,
Steve

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Oleg Alexeev

oalexeev@gmail.com

about 13 years ago

In reply to: Steve Crawford (#2)

Re: Addled index

On 16 March 2013 01:21, Steve Crawford <scrawford@pinpointresearch.com>wrote:

On 03/15/2013 11:29 AM, Oleg Alexeev wrote:

We've faced with strange index problem.

At some moment index became bad and queries does not return any data.

For example, there are two tables - A (id, name) and B (id, name, a_id).
B.a_id is foreign key to A. Both name columns in tables contains identical
values for A.id = B.a_id. A.name column has unique constraint and
additional index by it.

So, in some moment results for queries like [select id from A where name
= 'petya'] became empty (row with 'petya' name exist in A).

But query [select a_id from B where name = 'petya'] returns A.id and
[select * from A where id = <found id>] returns row.

This problem can be solved by index recreation only.

How can we avoid such situation?

What version??

The first one fail was on 9.1.? (table with at least 10 000 000 rows with
20% every day modifications)

Two day ago was another one fail on 9.2.3. (table with 120 000 rows with
less than 0.5% every day modifications)

--
Oleg V Alexeev
E:oalexeev@gmail.com

Alban Hertroys

haramrae@gmail.com

about 13 years ago

In reply to: Oleg Alexeev (#3)

Re: Addled index

On Mar 16, 2013, at 9:33, Oleg Alexeev <oalexeev@gmail.com> wrote:

On 16 March 2013 01:21, Steve Crawford <scrawford@pinpointresearch.com> wrote:
On 03/15/2013 11:29 AM, Oleg Alexeev wrote:
We've faced with strange index problem.

At some moment index became bad and queries does not return any data.

For example, there are two tables - A (id, name) and B (id, name, a_id). B.a_id is foreign key to A. Both name columns in tables contains identical values for A.id = B.a_id. A.name column has unique constraint and additional index by it.

So, in some moment results for queries like [select id from A where name = 'petya'] became empty (row with 'petya' name exist in A).

But query [select a_id from B where name = 'petya'] returns A.id and [select * from A where id = <found id>] returns row.

This problem can be solved by index recreation only.

How can we avoid such situation?

What version??

The first one fail was on 9.1.? (table with at least 10 000 000 rows with 20% every day modifications)

Two day ago was another one fail on 9.2.3. (table with 120 000 rows with less than 0.5% every day modifications)

Perhaps the name you're not finding is spelled differently than what you're typing, due to collation?

If there's actually something wrong with the database; it looks a bit like your tables and your indexes get out of sync somehow, which normally wouldn't be possible. I'm mostly guessing, but perhaps one of the below has something to do with it:
Maybe you turned fsync off?
What type of index is that? A standard btree or one of the newer types?
Are those tables and indexes perhaps on some kind of virtual storage or on a file-system that might be rolling back file-system transactions? It this database perhaps a replicated node?

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

Tom Lane

tgl@sss.pgh.pa.us

about 13 years ago

In reply to: Alban Hertroys (#4)

Re: Addled index

Alban Hertroys <haramrae@gmail.com> writes:

If there's actually something wrong with the database; it looks a bit like your tables and your indexes get out of sync somehow, which normally wouldn't be possible. I'm mostly guessing, but perhaps one of the below has something to do with it:
Maybe you turned fsync off?
What type of index is that? A standard btree or one of the newer types?
Are those tables and indexes perhaps on some kind of virtual storage or on a file-system that might be rolling back file-system transactions? It this database perhaps a replicated node?

More generally: since we're not hearing this type of complaint from
other people, there must be something pretty unusual about your
installation. You've provided no information that would suggest what,
though. Aside from Alban's questions, some other things come to mind:

* is that a plain text column, or some other data type?
* what collation/ctype is your database using?
* what nondefault parameter settings are you using?
* where did you get the Postgres executables from? Some distro (whose)?
If they're self-built, what compiler and configuration settings did
you use?
* what platform is this? I would not rule out kernel bugs or flaky
hardware.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Oleg Alexeev

oalexeev@gmail.com

about 13 years ago

In reply to: Alban Hertroys (#4)

Re: Addled index

On 16 March 2013 14:32, Alban Hertroys <haramrae@gmail.com> wrote:

On Mar 16, 2013, at 9:33, Oleg Alexeev <oalexeev@gmail.com> wrote:

On 16 March 2013 01:21, Steve Crawford <scrawford@pinpointresearch.com>wrote:

On 03/15/2013 11:29 AM, Oleg Alexeev wrote:

We've faced with strange index problem.

At some moment index became bad and queries does not return any data.

For example, there are two tables - A (id, name) and B (id, name, a_id).
B.a_id is foreign key to A. Both name columns in tables contains identical
values for A.id = B.a_id. A.name column has unique constraint and
additional index by it.

So, in some moment results for queries like [select id from A where name
= 'petya'] became empty (row with 'petya' name exist in A).

But query [select a_id from B where name = 'petya'] returns A.id and
[select * from A where id = <found id>] returns row.

This problem can be solved by index recreation only.

How can we avoid such situation?

What version??

The first one fail was on 9.1.? (table with at least 10 000 000 rows with
20% every day modifications)

Two day ago was another one fail on 9.2.3. (table with 120 000 rows with
less than 0.5% every day modifications)

Perhaps the name you're not finding is spelled differently than what
you're typing, due to collation?

If there's actually something wrong with the database; it looks a bit like
your tables and your indexes get out of sync somehow, which normally
wouldn't be possible. I'm mostly guessing, but perhaps one of the below has
something to do with it:
Maybe you turned fsync off?
What type of index is that? A standard btree or one of the newer types?
Are those tables and indexes perhaps on some kind of virtual storage or on
a file-system that might be rolling back file-system transactions? It this
database perhaps a replicated node?

Oh, it is not about short experiments. :)

Both failed queries are part of 24/7 application. And one of the query
results became empty for existing key at some moment. We've recreated index
and same query returned to the normal work.

fsync is in off state, yes

Both failed indexes are btree type.

Database located on software md raid 1 based on two SSD disks array. Ext4
filesystem. Database is master node.

--
Oleg V Alexeev
E:oalexeev@gmail.com

Oleg Alexeev

oalexeev@gmail.com

about 13 years ago

In reply to: Tom Lane (#5)

Re: Addled index

On 16 March 2013 19:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alban Hertroys <haramrae@gmail.com> writes:

If there's actually something wrong with the database; it looks a bit

like your tables and your indexes get out of sync somehow, which normally
wouldn't be possible. I'm mostly guessing, but perhaps one of the below has
something to do with it:

Maybe you turned fsync off?
What type of index is that? A standard btree or one of the newer types?
Are those tables and indexes perhaps on some kind of virtual storage or

on a file-system that might be rolling back file-system transactions? It
this database perhaps a replicated node?

More generally: since we're not hearing this type of complaint from
other people, there must be something pretty unusual about your
installation. You've provided no information that would suggest what,
though. Aside from Alban's questions, some other things come to mind:

* is that a plain text column, or some other data type?
* what collation/ctype is your database using?
* what nondefault parameter settings are you using?
* where did you get the Postgres executables from? Some distro (whose)?
If they're self-built, what compiler and configuration settings did
you use?
* what platform is this? I would not rule out kernel bugs or flaky
hardware.

regards, tom lane

* it is varchar columns, 256 and 32 symbols length
* encoding, collation and ctype: UTF8, en_US.utf8, en_US.utf8
* autovacuum, fsync off, full_page_writes = on, wal_writer_delay = 500ms,
commit_delay = 100, commit_siblings = 10, checkpoint_timeout = 20min,
checkpoint_completion_target = 0.7
* postgres 9.2.3 installed via yum repository for version 9.2
* 64 bit Centos 6, installed and updated from yum repository

--
Oleg V Alexeev
E:oalexeev@gmail.com

Tom Lane

tgl@sss.pgh.pa.us

about 13 years ago

In reply to: Oleg Alexeev (#7)

Re: Addled index

Oleg Alexeev <oalexeev@gmail.com> writes:

* it is varchar columns, 256 and 32 symbols length
* encoding, collation and ctype: UTF8, en_US.utf8, en_US.utf8
* autovacuum, fsync off, full_page_writes = on, wal_writer_delay = 500ms,
commit_delay = 100, commit_siblings = 10, checkpoint_timeout = 20min,
checkpoint_completion_target = 0.7
* postgres 9.2.3 installed via yum repository for version 9.2
* 64 bit Centos 6, installed and updated from yum repository

fsync off? Have you had any power failures or other system crashes?
ext4 is *way* more prone than ext3 was to corrupt data when fsync is
disabled, because it caches and reorders writes much more aggressively.

Database located on software md raid 1 based on two SSD disks array. Ext4
filesystem. Database is master node.

Meh. I quote from the RHEL6 documentation (Storage Administration
Guide, Chapter 20: Solid-State Disk Deployment Guidelines):

Red Hat also warns that software RAID levels 1, 4, 5, and 6 are not
recommended for use on SSDs.

https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/newmds-ssdtuning.html

The part of the docs I'm looking at only asserts that performance is
bad, but considering that it's a deprecated combination, it may well be
that there are data-loss bugs in there. I'd certainly suggest making
sure you are on a *recent* kernel. If that doesn't help, reconsider
your filesystem choices.

(Disclaimer: I work for Red Hat, but not in the filesystem group,
so I don't necessarily know what I'm talking about. But I have the
feeling you have chosen a configuration that's pretty bleeding-edge
for RHEL6.)

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Grzegorz Jaśkiewicz

gryzman@gmail.com

about 13 years ago

In reply to: Tom Lane (#8)

Re: Addled index

On 17 Mar 2013, at 04:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Oleg Alexeev <oalexeev@gmail.com> writes:

* it is varchar columns, 256 and 32 symbols length
* encoding, collation and ctype: UTF8, en_US.utf8, en_US.utf8
* autovacuum, fsync off, full_page_writes = on, wal_writer_delay = 500ms,
commit_delay = 100, commit_siblings = 10, checkpoint_timeout = 20min,
checkpoint_completion_target = 0.7
* postgres 9.2.3 installed via yum repository for version 9.2
* 64 bit Centos 6, installed and updated from yum repository

fsync off? Have you had any power failures or other system crashes?
ext4 is *way* more prone than ext3 was to corrupt data when fsync is
disabled, because it caches and reorders writes much more aggressively.

Database located on software md raid 1 based on two SSD disks array. Ext4
filesystem. Database is master node.

Meh. I quote from the RHEL6 documentation (Storage Administration
Guide, Chapter 20: Solid-State Disk Deployment Guidelines):

Red Hat also warns that software RAID levels 1, 4, 5, and 6 are not
recommended for use on SSDs.

https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/newmds-ssdtuning.html

The part of the docs I'm looking at only asserts that performance is
bad, but considering that it's a deprecated combination, it may well be
that there are data-loss bugs in there. I'd certainly suggest making
sure you are on a *recent* kernel. If that doesn't help, reconsider
your filesystem choices.

Yeah, I don't think I'd consider using software raid for SSDs any time a good idea

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#10

Oleg Alexeev

oalexeev@gmail.com

about 13 years ago

In reply to: Tom Lane (#8)

Re: Addled index

On 17 March 2013 08:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Oleg Alexeev <oalexeev@gmail.com> writes:

* it is varchar columns, 256 and 32 symbols length
* encoding, collation and ctype: UTF8, en_US.utf8, en_US.utf8
* autovacuum, fsync off, full_page_writes = on, wal_writer_delay = 500ms,
commit_delay = 100, commit_siblings = 10, checkpoint_timeout = 20min,
checkpoint_completion_target = 0.7
* postgres 9.2.3 installed via yum repository for version 9.2
* 64 bit Centos 6, installed and updated from yum repository

fsync off? Have you had any power failures or other system crashes?
ext4 is *way* more prone than ext3 was to corrupt data when fsync is
disabled, because it caches and reorders writes much more aggressively.

Database located on software md raid 1 based on two SSD disks array. Ext4
filesystem. Database is master node.

Meh. I quote from the RHEL6 documentation (Storage Administration
Guide, Chapter 20: Solid-State Disk Deployment Guidelines):

Red Hat also warns that software RAID levels 1, 4, 5, and 6 are not
recommended for use on SSDs.

https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/newmds-ssdtuning.html

The part of the docs I'm looking at only asserts that performance is
bad, but considering that it's a deprecated combination, it may well be
that there are data-loss bugs in there. I'd certainly suggest making
sure you are on a *recent* kernel. If that doesn't help, reconsider
your filesystem choices.

(Disclaimer: I work for Red Hat, but not in the filesystem group,
so I don't necessarily know what I'm talking about. But I have the
feeling you have chosen a configuration that's pretty bleeding-edge
for RHEL6.)

regards, tom lane

I think fsync=off was really bad idea.

--
Oleg V Alexeev
E:oalexeev@gmail.com