Possible index issue on 9.5 slave

Started by Ian Barwickover 11 years ago8 messages
#1Ian Barwick
ian@2ndquadrant.com

Hi

I've just run into an index issue on 9.5 HEAD on a slave (master and slave
both compiled from 66802246e22d51858cd543877fcfddf24e6812f2); details
below (I have only found one index on the slave where the issue occurs so far).
The setup is admittedly slightly unusual; master is OS X 10.7.5, slave is
CentOS on a Virtualbox guest VM on the same system. The issue only occurs
with this combination of master and slave; I haven't been able to reproduce
it with master and slave running natively on OS X, or with a Linux guest VM
on a Linux machine. I have reproduced it several times on the OS X/Linux guest VM
combination.

I can't dig any further into this at the moment but can happily provide further
details etc.

Master
======

$ uname -a
Darwin nara.local 11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64

tgg_current=> SELECT version();
version
----------------------------------------------------------------------------------------------------------
PostgreSQL 9.5devel on x86_64-apple-darwin11.4.2, compiled by gcc (MacPorts gcc48 4.8.2_2) 4.8.2, 64-bit
(1 row)

tgg_current=> select user_id, login from tgg_user where login ='admin';
user_id | login
---------+-------
1 | admin
(1 row)

Slave
=====

$ uname -a
Linux localhost.localdomain 2.6.32-358.el6.x86_64 #1 SMP Fri Feb 22 00:31:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

tgg_current=> select version();
version
-----------------------------------------------------------------------------------------------------------------
PostgreSQL 9.5devel on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3), 64-bit
(1 row)

tgg_current=> select user_id,login from tgg_user where login ='admin';
user_id | login
---------+-------
(0 rows)

tgg_current=> explain select user_id,login from tgg_user where login ='admin';
QUERY PLAN
------------------------------------------------------------------------------------
Index Scan using tgg_user_login_key on tgg_user (cost=0.28..8.30 rows=1 width=15)
Index Cond: ((login)::text = 'admin'::text)
Planning time: 0.105 ms
(3 rows)

tgg_current=> set enable_bitmapscan=off;
SET
tgg_current=> set enable_indexscan =off;
SET
tgg_current=> select user_id,login from tgg_user where login ='admin';
user_id | login
---------+-------
1 | admin
(1 row)

tgg_current=> \d tgg_user_login_key
Index "epp.tgg_user_login_key"
Column | Type | Definition
--------+-----------------------+------------
login | character varying(32) | login
unique, btree, for table "epp.tgg_user"

Regards

Ian Barwick

--
Ian Barwick http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Peter Geoghegan
pg@heroku.com
In reply to: Ian Barwick (#1)
Re: Possible index issue on 9.5 slave

On Wed, Jun 18, 2014 at 6:54 PM, Ian Barwick <ian@2ndquadrant.com> wrote:

I've just run into an index issue on 9.5 HEAD on a slave (master and slave
both compiled from 66802246e22d51858cd543877fcfddf24e6812f2); details
below (I have only found one index on the slave where the issue occurs so far).

Would you mind running my btreecheck tool on both systems? That might
shed some light on this. You can get it from:
/messages/by-id/CAM3SWZRtV+xmRWLWq6c-x7czvwavFdwFi4St1zz4dDgFH4yN4g@mail.gmail.com
.

I suggest running bt_parent_index_verify() and bt_leftright_verify()
on all indexes on both systems. It shouldn't take too long.

Thanks
--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Ian Barwick
ian@2ndquadrant.com
In reply to: Peter Geoghegan (#2)
Re: Possible index issue on 9.5 slave

On 19/06/14 11:58, Peter Geoghegan wrote:

On Wed, Jun 18, 2014 at 6:54 PM, Ian Barwick <ian@2ndquadrant.com> wrote:

I've just run into an index issue on 9.5 HEAD on a slave (master and slave
both compiled from 66802246e22d51858cd543877fcfddf24e6812f2); details
below (I have only found one index on the slave where the issue occurs so far).

Would you mind running my btreecheck tool on both systems? That might
shed some light on this. You can get it from:
/messages/by-id/CAM3SWZRtV+xmRWLWq6c-x7czvwavFdwFi4St1zz4dDgFH4yN4g@mail.gmail.com
.

I suggest running bt_parent_index_verify() and bt_leftright_verify()
on all indexes on both systems. It shouldn't take too long.

Interesting, I'll take a look later.

Thanks

Ian Barwick

--
Ian Barwick http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Peter Geoghegan
pg@heroku.com
In reply to: Ian Barwick (#3)
Re: Possible index issue on 9.5 slave

On Wed, Jun 18, 2014 at 8:09 PM, Ian Barwick <ian@2ndquadrant.com> wrote:

Interesting, I'll take a look later.

I'm pretty suspicious of incompatibilities that may exist between the
two sets of OS collations involved here. We aren't very clear on the
extent to which what you're doing is supported, but it's certainly the
case that bttextcmp()/varstr_cmp()/strcoll() return values must be
immutable between the two systems. Still, it should be possible to
determine if that's the problem using btreecheck.

Do you get perfectly consistent answers between the two when you ORDER BY login?

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Ian Barwick
ian@2ndquadrant.com
In reply to: Peter Geoghegan (#4)
Re: Possible index issue on 9.5 slave

On 19/06/14 12:30, Peter Geoghegan wrote:

On Wed, Jun 18, 2014 at 8:09 PM, Ian Barwick <ian@2ndquadrant.com> wrote:

Interesting, I'll take a look later.

I'm pretty suspicious of incompatibilities that may exist between the
two sets of OS collations involved here. We aren't very clear on the
extent to which what you're doing is supported, but it's certainly the
case that bttextcmp()/varstr_cmp()/strcoll() return values must be
immutable between the two systems. Still, it should be possible to
determine if that's the problem using btreecheck.

Do you get perfectly consistent answers between the two when you ORDER BY login?

Hmm, nope, different sort order.

Regards

Ian Barwick

--
Ian Barwick http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#4)
Re: Possible index issue on 9.5 slave

Peter Geoghegan <pg@heroku.com> writes:

On Wed, Jun 18, 2014 at 8:09 PM, Ian Barwick <ian@2ndquadrant.com> wrote:

Interesting, I'll take a look later.

I'm pretty suspicious of incompatibilities that may exist between the
two sets of OS collations involved here. We aren't very clear on the
extent to which what you're doing is supported, but it's certainly the
case that bttextcmp()/varstr_cmp()/strcoll() return values must be
immutable between the two systems.

Oooh, I'll bet that's exactly it. Is the database using UTF8 encoding and
a non-C locale? It's well known that OS X's UTF8 locales sort nothing at
all like the supposedly equivalent locales on other systems.

Still, it should be possible to
determine if that's the problem using btreecheck.

Does btreecheck attempt to verify that the sort ordering of the index
matches the comparison behavior of the datatype? That would (in general)
require calling user-defined code, which seems like probably a pretty
bad idea for the purposes btreecheck is being advertised for.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Peter Geoghegan
pg@heroku.com
In reply to: Tom Lane (#6)
Re: Possible index issue on 9.5 slave

On Wed, Jun 18, 2014 at 8:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Still, it should be possible to
determine if that's the problem using btreecheck.

Does btreecheck attempt to verify that the sort ordering of the index
matches the comparison behavior of the datatype? That would (in general)
require calling user-defined code, which seems like probably a pretty
bad idea for the purposes btreecheck is being advertised for.

Yes, it does, but I see no alternative for a general-purpose tool, and
the fact that it is general purpose is of considerable value. I have
more or less invented my own weird index scans.

I assume you're referring to the field-verification of indexes use
case, which is not an immediate goal of btreecheck, even though it's
an important goal that there has already been some discussion of.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Ian Barwick
ian@2ndquadrant.com
In reply to: Tom Lane (#6)
Re: Possible index issue on 9.5 slave

On 19/06/14 12:35, Tom Lane wrote:

Peter Geoghegan <pg@heroku.com> writes:

On Wed, Jun 18, 2014 at 8:09 PM, Ian Barwick <ian@2ndquadrant.com> wrote:

Interesting, I'll take a look later.

I'm pretty suspicious of incompatibilities that may exist between the
two sets of OS collations involved here. We aren't very clear on the
extent to which what you're doing is supported, but it's certainly the
case that bttextcmp()/varstr_cmp()/strcoll() return values must be
immutable between the two systems.

Oooh, I'll bet that's exactly it. Is the database using UTF8 encoding and
a non-C locale?

Yup, that is indeed the case.

It's well known that OS X's UTF8 locales sort nothing at
all like the supposedly equivalent locales on other systems.

True, that. A different sort order wouldn't have surprised me,
but the failure to return an extant row had me thinking there
was something awry with the laptop causing file corruption (it's
getting on in years and has been bashed about a bit).

Regards

Ian Barwick

--
Ian Barwick http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers