sorting/grouping/(non-)unique indexes bug

Started by Alexey Borzovalmost 24 years ago4 messagesgeneral
Jump to latest
#1Alexey Borzov
borz_off@rdw.ru

Greetings!

Before I present the following, I must say that 'simple' index
corruption is highly improbable; the server did not suffer hard
reboots.

rdw=# \d reg_user
Table "reg_user"
Column | Type | Modifiers
------------------+------------------------+------------------------------------------------
user_id | integer | not null default nextval('reg_user_seq'::text)
user_email | character varying(50) | not null
user_passwd | character varying(32) | not null
user_active | boolean | not null default 't'
user_allow_pauth | boolean | default 'f'
user_full_name | character varying(100) |
user_pseudonym | character varying(100) |
user_who | character(1) |
Primary key: reg_user_pkey
Unique keys: reg_user_email_key

rdw=# \d reg_user_email_key
Index "reg_user_email_key"
Column | Type
------------+-----------------------
user_email | character varying(50)
unique btree

rdw=# create table broken as select user_email from reg_user;
SELECT

rdw=# -- one may think that emails are unique...
rdw=# select count(user_email) from broken;
count
--------
212400
(1 row)

rdw=# select count(distinct user_email) from broken;
count
--------
212397
(1 row)

rdw=# -- look closely at the numbers
rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc;
user_email | count
----------------------------------------------------+-------
[duplicated email] | 2
[some email] | 1
[some email] | 1
[some email] | 1

rdw=# delete from broken where user_email = '[duplicated email]';
DELETE 2

rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc;
user_email | count
----------------------------------------------------+-------
[another email] | 2
[some email] | 1
[some email] | 1

rdw=# delete from broken where user_email = '[another email]';
DELETE 2

rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc;
user_email | count
----------------------------------------------------+-------
[some email] | 1
[some email] | 1
[some email] | 1

rdw=# --finally, no more duplicates!
rdw=# create unique index broken_email_key on broken (user_email);
ERROR: Cannot create unique index. Table contains non-unique values

rdw=# select version();
version
-------------------------------------------------------------
PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96
(1 row)

$psql -l
List of databases
Name | Owner | Encoding
-----------+----------+-----------
...
rdw | alex | WIN
...

The server is run under ru_RU.CP1251 locale, initdb was done under the
same locale.

As you probably guessed, reg_user table contains actual email
addresses of our site's registered users. I can provide them for
testing (without passwords and stuff, of course) if this is going to
help.

--
Yours, Alexey V. Borzov, Webmaster of RDW.ru

#2Jean-Luc Lachance
jllachan@nsd.ca
In reply to: Alexey Borzov (#1)
Re: sorting/grouping/(non-)unique indexes bug

Unbelievable, but you may have a couple of record with null user_email.

Try:

select * from broken where user_email is null;
select * from reg_user where user_email is null;

Alexey Borzov wrote:

Show quoted text

Greetings!

Before I present the following, I must say that 'simple' index
corruption is highly improbable; the server did not suffer hard
reboots.

rdw=# \d reg_user
Table "reg_user"
Column | Type | Modifiers
------------------+------------------------+------------------------------------------------
user_id | integer | not null default nextval('reg_user_seq'::text)
user_email | character varying(50) | not null
user_passwd | character varying(32) | not null
user_active | boolean | not null default 't'
user_allow_pauth | boolean | default 'f'
user_full_name | character varying(100) |
user_pseudonym | character varying(100) |
user_who | character(1) |
Primary key: reg_user_pkey
Unique keys: reg_user_email_key

rdw=# \d reg_user_email_key
Index "reg_user_email_key"
Column | Type
------------+-----------------------
user_email | character varying(50)
unique btree

rdw=# create table broken as select user_email from reg_user;
SELECT

rdw=# -- one may think that emails are unique...
rdw=# select count(user_email) from broken;
count
--------
212400
(1 row)

rdw=# select count(distinct user_email) from broken;
count
--------
212397
(1 row)

rdw=# -- look closely at the numbers
rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc;
user_email | count
----------------------------------------------------+-------
[duplicated email] | 2
[some email] | 1
[some email] | 1
[some email] | 1

rdw=# delete from broken where user_email = '[duplicated email]';
DELETE 2

rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc;
user_email | count
----------------------------------------------------+-------
[another email] | 2
[some email] | 1
[some email] | 1

rdw=# delete from broken where user_email = '[another email]';
DELETE 2

rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc;
user_email | count
----------------------------------------------------+-------
[some email] | 1
[some email] | 1
[some email] | 1

rdw=# --finally, no more duplicates!
rdw=# create unique index broken_email_key on broken (user_email);
ERROR: Cannot create unique index. Table contains non-unique values

rdw=# select version();
version
-------------------------------------------------------------
PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96
(1 row)

$psql -l
List of databases
Name | Owner | Encoding
-----------+----------+-----------
...
rdw | alex | WIN
...

The server is run under ru_RU.CP1251 locale, initdb was done under the
same locale.

As you probably guessed, reg_user table contains actual email
addresses of our site's registered users. I can provide them for
testing (without passwords and stuff, of course) if this is going to
help.

--
Yours, Alexey V. Borzov, Webmaster of RDW.ru

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

#3Alexey Borzov
borz_off@rdw.ru
In reply to: Jean-Luc Lachance (#2)
Re: sorting/grouping/(non-)unique indexes bug

Hello Jean-Luc,

Friday, June 07, 2002, 8:26:55 PM, you wrote:

JLL> Unbelievable, but you may have a couple of record with null user_email.

JLL> Try:

JLL> select * from broken where user_email is null;
JLL> select * from reg_user where user_email is null;

Nope, there are none. Which is expected, 'cause user_email was
declared not null:

rdw=# select * from reg_user where user_email is null;
user_id | user_email | user_passwd | user_active | user_allow_pauth | user_full_name | user_pseudonym | user_who
---------+------------+-------------+-------------+------------------+----------------+----------------+----------
(0 rows)

rdw=# select * from broken where user_email is null;
user_email
------------
(0 rows)

And the locale is probably OK, too. I tried dumping the table in
question and passing it through sort/uniq. It worked OK, unlike GROUP BY in
Postgres.

JLL> Alexey Borzov wrote:

Greetings!

Before I present the following, I must say that 'simple' index
corruption is highly improbable; the server did not suffer hard
reboots.

[skip]

--
Yours, Alexey V. Borzov, webmaster of RDW.ru

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexey Borzov (#1)
Re: sorting/grouping/(non-)unique indexes bug

Alexey Borzov <borz_off@rdw.ru> writes:

rdw=# --finally, no more duplicates!
rdw=# create unique index broken_email_key on broken (user_email);
ERROR: Cannot create unique index. Table contains non-unique values

Seems pretty broken :-(

rdw=# select version();
version
-------------------------------------------------------------
PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96

Uh, what is the platform exactly?

As you probably guessed, reg_user table contains actual email
addresses of our site's registered users. I can provide them for
testing (without passwords and stuff, of course) if this is going to
help.

If you could send me the email address data (off-list), I could try
to reproduce the problem here.

regards, tom lane