Possible savepoint bug
As you can see, we have duplicates within the table (heap) of a primary
key value. The index itself only references one of these tuples.
Nearly all data inserted into this table is wrapped in a subtransaction,
and a single tuple is created per subtransaction. About 20% of entries
are duplicates, so we catch the UNIQUE VIOLATION and roll back to the
savepoint.
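The retry pattern described above can be sketched as follows. This is a minimal illustration, not the reporter's actual code: it uses Python with SQLite (which also supports SAVEPOINT/ROLLBACK TO) as a stand-in for PostgreSQL subtransactions, and the table and column names are borrowed from the report purely for flavor.

```python
import sqlite3

def insert_with_savepoint(conn, account_instance_id, keyword_id):
    """Insert one row inside a subtransaction; on a unique violation,
    roll back only to the savepoint and keep the outer transaction alive."""
    conn.execute("SAVEPOINT ins")
    try:
        conn.execute(
            "INSERT INTO supply (account_instance_id, keyword_id) VALUES (?, ?)",
            (account_instance_id, keyword_id),
        )
        conn.execute("RELEASE SAVEPOINT ins")
        return True
    except sqlite3.IntegrityError:
        # Duplicate primary key: undo just this subtransaction and skip it.
        conn.execute("ROLLBACK TO SAVEPOINT ins")
        conn.execute("RELEASE SAVEPOINT ins")
        return False

# Manual transaction control so BEGIN/COMMIT/SAVEPOINT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE supply (account_instance_id INTEGER, keyword_id INTEGER, "
    "PRIMARY KEY (account_instance_id, keyword_id))"
)
conn.execute("BEGIN")
results = [insert_with_savepoint(conn, 11916, 1985374) for _ in range(2)]
conn.execute("COMMIT")
```

With a correct unique index, the second insert fails and is skipped, leaving exactly one row; the bug reported here is that the heap ended up with duplicates anyway.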
I did keep a copy of the table. Compressed, it is about 24MB.
After trying everything below, I also gave it a run with VACUUM FULL. It
did not change the output.
ssdb=# select version();
version
---------------------------------------------------------------------------
PostgreSQL 8.0.3 on sparc-sun-solaris2.9, compiled by GCC gcc (GCC)
3.4.2
(1 row)
ssdb=# \d feature_keyword_supply_google
Table "public.feature_keyword_supply_google"
                    Column                    |         Type          | Modifiers
----------------------------------------------+-----------------------+-----------
 account_instance_id                          | integer               | not null
 keyword_id                                   | integer               | not null
 feature_keyword_supply_google_score          | natural_number        | not null
 feature_keyword_supply_google_last_collected | ss_timestamp_recorded | not null
Indexes:
"feature_keyword_supply_google_pkey" PRIMARY KEY, btree
(account_instance_id, keyword_id)
Foreign-key constraints:
"feature_keyword_supply_google_account_instance_id_fkey" FOREIGN KEY
(account_instance_id, keyword_id) REFERENCES
feature_keyword(account_instance_id, keyword_id) ON UPDATE CASCADE ON
DELETE CASCADE
ssdb=# set enable_indexscan TO off;
SET
ssdb=# select ctid, xmin, cmin, xmax, account_instance_id = 11916,
keyword_id = 1985374 from feature_keyword_supply_google where keyword_id
= 1985374 and account_instance_id = 11916;
ctid | xmin | cmin | xmax | ?column? | ?column?
-----------+-----------+------+-----------+----------+----------
(4277,60) | 506766160 | 3593 | 744608069 | t | t
(4277,72) | 397750949 | 4828 | 506766160 | t | t
(2 rows)
ssdb=# set enable_indexscan TO on;
SET
ssdb=# select ctid, xmin, cmin, xmax, account_instance_id = 11916,
keyword_id = 1985374 from feature_keyword_supply_google where keyword_id
= 1985374 and account_instance_id = 11916;
ctid | xmin | cmin | xmax | ?column? | ?column?
-----------+-----------+------+-----------+----------+----------
(4277,60) | 506766160 | 3593 | 744608069 | t | t
(1 row)
ssdb=# vacuum feature_keyword_supply_google;
VACUUM
ssdb=# set enable_indexscan = off;
SET
ssdb=# select ctid, xmin, cmin, xmax, account_instance_id = 11916,
keyword_id = 1985374 from feature_keyword_supply_google where keyword_id
= 1985374 and account_instance_id = 11916;
ctid | xmin | cmin | xmax | ?column? | ?column?
-----------+-----------+------+-----------+----------+----------
(4277,60) | 506766160 | 3593 | 744608069 | t | t
(4277,72) | 397750949 | 4828 | 506766160 | t | t
(2 rows)
[root@DB1 rbt]# pg_controldata --version
pg_controldata (PostgreSQL) 8.0.3
[root@DB1 rbt]# pg_controldata /var/opt/pgsql/data
pg_control version number: 74
Catalog version number: 200411041
Database system identifier: 4769850195962887551
Database cluster state: in production
pg_control last modified: Wed Nov 09 10:10:26 2005
Current log file ID: 860
Next log file segment: 170
Latest checkpoint location: 35C/A14C8D60
Prior checkpoint location: 35C/908D6470
Latest checkpoint's REDO location: 35C/A1440E20
Latest checkpoint's UNDO location: 0/0
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 745383235
Latest checkpoint's NextOID: 30513944
Time of latest checkpoint: Wed Nov 09 09:42:53 2005
Database block size: 8192
Blocks per segment of large relation: 131072
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum number of function arguments: 32
Date/time type storage: floating-point numbers
Maximum length of locale name: 128
LC_COLLATE: en_US
LC_CTYPE: en_US
--
Oh, and the duplication is not isolated, but I only went through the one
case when checking the indexes.
ssdb=# select ctid, xmin, cmin, xmax, account_instance_id, keyword_id
from feature_keyword_supply_google where (account_instance_id,
keyword_id) in (select account_instance_id, keyword_id from
feature_keyword_supply_google group by account_instance_id, keyword_id
having count(*) > 1) order by account_instance_id, keyword_id;
    ctid    |   xmin    | cmin  |   xmax    | account_instance_id | keyword_id
------------+-----------+-------+-----------+---------------------+------------
 (5454,81)  | 338184867 |  2259 | 485608742 |                1215 |    1646046
 (3396,123) | 485608742 |  2480 |         0 |                1215 |    1646046
 (3396,128) | 485608853 |  2552 |         0 |                1215 |    1646058
 (5454,83)  | 338185099 |  2335 | 485608853 |                1215 |    1646058
 (3396,126) | 485608832 |  2516 |         0 |                1215 |    1646076
 (5454,82)  | 338184954 |  2297 | 485608832 |                1215 |    1646076
 (3396,130) | 485608909 |  2588 |         0 |                1215 |    1646092
 (5454,85)  | 338185239 |  2420 | 485608909 |                1215 |    1646092
 (3396,132) | 485608931 |  2624 |         0 |                1215 |    1646097
 (5454,86)  | 338185486 |  2458 | 485608931 |                1215 |    1646097
 (3414,82)  | 309534239 |  9967 | 620905091 |                1603 |    1431827
 (3997,116) | 620905091 |  6859 |         0 |                1603 |    1431827
 (4718,58)  | 721916095 |  1046 |         0 |                2034 |      53759
 (6580,86)  | 357774736 |  4616 | 721916095 |                2034 |      53759
 (183,47)   | 499027939 |  7824 | 621991044 |                3673 |      41599
 (183,14)   | 621991044 |  7154 |         0 |                3673 |      41599
 (1836,46)  | 389868188 | 18424 | 628568217 |                4163 |     622560
 (9202,69)  | 628568217 | 13409 |         0 |                4163 |     622560
 (52,89)    | 340438230 | 10823 |         0 |                4634 |      32949
 (52,38)    | 95782780  |    37 | 340438230 |                4634 |      32949
 (594,7)    | 113806807 | 11714 | 506856848 |                5218 |     255688
 (594,43)   | 506856848 |  4489 |         0 |                5218 |     255688
 (2589,115) | 206806182 | 14750 | 206854773 |                6594 |      36790
 (2599,115) | 206854773 | 18022 |         0 |                6594 |      36790
 (2585,134) | 206798169 | 10492 | 206845937 |                6594 |     110879
 (2598,24)  | 206845937 | 12819 |         0 |                6594 |     110879
 (5589,81)  | 439221415 |  1636 | 552554533 |                8127 |    2207941
 (8626,114) | 552554533 | 13066 | 743644204 |                8127 |    2207941
 (1662,71)  | 487505211 |  6370 | 487566654 |                8527 |      68364
 (1662,54)  | 487566654 |  6367 |         0 |                8527 |      68364
 (615,115)  | 487515981 |  7604 |         0 |                8527 |     696634
 (609,69)   | 487395439 |  6666 | 487515981 |                8527 |     696634
 (5771,9)   | 340367613 |   300 | 508552100 |               10003 |     660208
 (5771,28)  | 508552100 |  6390 | 650701855 |               10003 |     660208
 (7827,7)   | 278400268 | 14702 | 587980438 |               10604 |     148263
 (7827,94)  | 587980438 |  9543 |         0 |               10604 |     148263
 (8267,128) | 283872022 | 16451 | 547602059 |               10781 |    1158532
 (3479,119) | 547602059 | 15092 |         0 |               10781 |    1158532
 (8267,55)  | 283853649 | 13761 | 547597049 |               10781 |    1274445
 (2794,66)  | 547597049 | 12576 |         0 |               10781 |    1274445
 (3198,86)  | 485581600 |  7681 |         0 |               10892 |      99247
 (3198,51)  | 365165699 | 14941 | 485581600 |               10892 |      99247
 (757,116)  | 123605388 | 13927 | 377061887 |               11888 |     330758
 (757,19)   | 377061887 | 12522 |         0 |               11888 |     330758
 (4277,76)  | 397751291 |  4921 | 506766356 |               11916 |    1985203
 (4277,66)  | 506766356 |  3649 |         0 |               11916 |    1985203
 (4277,82)  | 506766470 |  3733 |         0 |               11916 |    1985208
 (4277,79)  | 397751662 |  5017 | 506766470 |               11916 |    1985208
 (4277,81)  | 506766418 |  3705 |         0 |               11916 |    1985226
 (4277,78)  | 397751549 |  4985 | 506766418 |               11916 |    1985226
 (4277,73)  | 397751023 |  4860 | 506766292 |               11916 |    1985354
 (4277,64)  | 506766292 |  3621 |         0 |               11916 |    1985354
 (4277,60)  | 506766160 |  3593 | 744608069 |               11916 |    1985374
 (4277,72)  | 397750949 |  4828 | 506766160 |               11916 |    1985374
 (5007,42)  | 397790263 | 10647 | 506787452 |               11916 |    1985394
 (3767,68)  | 506787452 |  7682 |         0 |               11916 |    1985394
 (3421,94)  | 573110542 |  6390 |         0 |               12399 |    2922605
 (2815,48)  | 573017088 |  5568 | 573110542 |               12399 |    2922605
 (1975,38)  | 413415231 | 10327 | 573873717 |               12883 |     443196
 (1975,24)  | 573873717 |  7400 |         0 |               12883 |     443196
 (7842,106) | 412053873 | 11664 |         0 |               13111 |     769855
 (7842,43)  | 300750744 |  4849 | 412053873 |               13111 |     769855
 (7469,79)  | 512363410 |  4282 | 512399373 |               13174 |    1317593
 (7469,99)  | 512399373 |  4262 |         0 |               13174 |    1317593
 (7454,28)  | 396487100 |  4877 | 396555126 |               13235 |    1199747
 (7454,17)  | 274011632 |  2793 | 396487100 |               13235 |    1199747
 (9781,43)  | 674282871 |  8976 |         0 |               13269 |     259255
 (7736,122) | 277309368 |  8612 | 674282871 |               13269 |     259255
 (3350,12)  | 606411328 |  2401 | 744153965 |               13272 |     428208
 (3350,65)  | 464622255 |  2128 | 606411328 |               13272 |     428208
 (8438,121) | 711745644 |  1275 |         0 |               13607 |    1559247
 (8438,27)  | 530172971 |  4847 | 711745644 |               13607 |    1559247
 (2004,64)  | 389876221 |  9583 | 546064672 |               14335 |     696146
 (2004,41)  | 546064672 |  8668 |         0 |               14335 |     696146
 (904,73)   | 553271853 | 13376 |         0 |               14404 |    2208853
 (251,19)   | 447353031 | 13395 | 553271853 |               14404 |    2208853
 (8654,23)  | 553245993 |  9327 |         0 |               14404 |    2208932
 (60,60)    | 447334230 |  9340 | 553245993 |               14404 |    2208932
 (9442,71)  | 644636546 |  7353 |         0 |               16423 |     404048
 (3092,59)  | 644579575 |  6363 | 644636546 |               16423 |     404048
(80 rows)
On Wed, 2005-11-09 at 10:27 -0500, Rod Taylor wrote:
As you can see, we have duplicates within the table (heap) of a primary
key value. The index itself only references one of these tuples.
[...]
Rod Taylor <pg@rbt.ca> writes:
As you can see, we have duplicates within the table (heap) of a primary
key value. The index itself only references one of these tuples.
Can you put together a test case to reproduce this? It doesn't have to
fail every time, as long as it fails once in awhile ...
regards, tom lane
On Wed, 2005-11-09 at 14:20 -0500, Tom Lane wrote:
Rod Taylor <pg@rbt.ca> writes:
As you can see, we have duplicates within the table (heap) of a primary
key value. The index itself only references one of these tuples.
Can you put together a test case to reproduce this? It doesn't have to
fail every time, as long as it fails once in awhile ...
I'll see what I can put together.
I see I have a similar issue on another structure which involves the same type of process.
On Wed, 2005-11-09 at 14:20 -0500, Tom Lane wrote:
Rod Taylor <pg@rbt.ca> writes:
As you can see, we have duplicates within the table (heap) of a primary
key value. The index itself only references one of these tuples.
Can you put together a test case to reproduce this? It doesn't have to
fail every time, as long as it fails once in awhile ...
Seems not. I've done millions of iterations of the same type of
functionality that happens with these structures and haven't produced a
single case. These are fairly low-usage structures, so I think I've done
about 3 months' worth of work, which in production had 20 bad tuples. I
tried playing with various delays, vacuum schedules, and numbers of
parallel processes.
Whatever is happening is from interaction not contained within the
structures showing the symptoms.
I'll watch it a bit closer now that I know the problem exists to see if
I can find a pattern.
--
Rod Taylor schrieb:
On Wed, 2005-11-09 at 14:20 -0500, Tom Lane wrote:
Rod Taylor <pg@rbt.ca> writes:
As you can see, we have duplicates within the table (heap) of a primary
key value. The index itself only references one of these tuples.
Can you put together a test case to reproduce this? It doesn't have to
fail every time, as long as it fails once in awhile ...
Seems not. I've done millions of iterations of the same type of
functionality that happens with these structures and haven't produced a
single case. These are fairly low usage structures, so I think I've done
about 3 months worth of work, which in production had 20 bad tuples. I
tried playing with various delays, vacuum schedules, and number of
parallel processes.
I am seeing a similar unique index bug here...
This is PostgreSQL 8.1.1 on RHEL 3, Intel Xeon (i686).
We don't use SAVEPOINTs and we don't use autovacuum. It's quite unlikely
that the problem is directly related to VACUUM since that is only run
via cron during night hours.
The symptoms are duplicate entries in a unique index.
billing=> \d properties
Table "billing.properties"
Column | Type | Modifiers
----------+-------------------+-----------
language | character(2) | not null
key_name | character varying | not null
value | character varying | not null
Indexes:
"pk_properties" PRIMARY KEY, btree ("language", key_name)
Check constraints:
"tc_properties_key_name" CHECK (key_name::text ~
'^[a-zA-Z][a-zA-Z0-9_.]+$'::text)
"tc_properties_language" CHECK ("language" = 'de'::bpchar OR
"language" = 'en'::bpchar)
billing=> reindex table properties;
ERROR: could not create unique index
DETAIL: Table contains duplicated values.
billing=> select ctid,xmin,xmax,cmin,cmax,language,key_name from
properties where key_name = 'enum.server_task_log.status.keys';
ctid | xmin | xmax | cmin | cmax | language | key_name
---------+--------+------+------+------+----------+----------------------------------
 (31,64) | 505433 |    0 |    5 |    0 | de       | enum.server_task_log.status.keys
 (31,57) | 505261 |    0 |    7 |    0 | de       | enum.server_task_log.status.keys
 (31,56) | 505261 |    0 |    5 |    0 | en       | enum.server_task_log.status.keys
(3 rows)
The state is the effect of only UPDATEs of the rows after a SELECT ...
FOR UPDATE in the same transaction. It happened twice just now, but I
deleted the other rows... the table should still contain the data. I
have disabled scheduled vacuums for now.
I could send the index and table files off-list. This is the only
affected table right now. It is not updated frequently but is rather
static. I upgraded to 8.1.1 around Dec 21; there should have been near
zero updates since then until today.
Perhaps it's a problem with multi-column unique indexes?
Best Regards,
Michael Paesold
Michael Paesold <mpaesold@gmx.at> writes:
I am seeing a similar unique index bug here...
This is PostgreSQL 8.1.1 on RHEL 3, Intel Xeon (i686).
It looks like the problem is that index entries are being inserted out
of order. I find this pair that should be in the other order:
Item 124 -- Length: 44 Offset: 2360 (0x0938) Flags: USED
Block Id: 3 linp Index: 55 Size: 44
Has Nulls: 0 Has Varwidths: 16384
0938: 00000300 37002c40 06000000 64650000 ....7.,@....de..
0948: 1a000000 656e756d 2e626f6f 6b696e67 ....enum.booking
0958: 2e747970 652e6b65 79730000 .type.keys..
Item 125 -- Length: 40 Offset: 2320 (0x0910) Flags: USED
Block Id: 3 linp Index: 48 Size: 40
Has Nulls: 0 Has Varwidths: 16384
0910: 00000300 30002840 06000000 64650000 ....0.(@....de..
0920: 17000000 656e756d 2e626f6f 6b696e67 ....enum.booking
0930: 2e747970 652e4c00 .type.L.
and likewise here:
Item 60 -- Length: 52 Offset: 5060 (0x13c4) Flags: USED
Block Id: 4 linp Index: 38 Size: 52
Has Nulls: 0 Has Varwidths: 16384
13c4: 00000400 26003440 06000000 64650000 ....&.4@....de..
13d4: 24000000 656e756d 2e736572 7665725f $...enum.server_
13e4: 7461736b 5f6c6f67 2e737461 7475732e task_log.status.
13f4: 6b657973 keys
Item 61 -- Length: 56 Offset: 5004 (0x138c) Flags: USED
Block Id: 4 linp Index: 37 Size: 56
Has Nulls: 0 Has Varwidths: 16384
138c: 00000400 25003840 06000000 64650000 ....%.8@....de..
139c: 27000000 656e756d 2e736572 7665725f '...enum.server_
13ac: 7461736b 5f6c6f67 2e737461 7475732e task_log.status.
13bc: 52554e4e 494e4700 RUNNING.
All four of the referenced tuples were inserted by XMIN 986, CMIN 0,
which I assume was probably a COPY command. So the breakage occurred
long before the update operations.
Did you create the index before or after loading the initial data?
If you have the original dump that was loaded, it'd be interesting
to see if re-loading it reproduces the corrupt index.
regards, tom lane
I wrote:
Michael Paesold <mpaesold@gmx.at> writes:
I am seeing a similar unique index bug here...
This is PostgreSQL 8.1.1 on RHEL 3, Intel Xeon (i686).
It looks like the problem is that index entries are being inserted out
of order.
After further investigation, it seems that the original sort order of
the index was not C-locale, but something else --- I can reproduce the
current index ordering except for a small number of new-ish tuples if
I sort the data in en_US.
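The ordering discrepancy Tom describes can be illustrated with the two key strings from the corrupted index. This is only a rough sketch: bytewise string comparison stands in for the C locale, and a case-folded sort key approximates en_US collation (the real en_US locale may not be installed everywhere, so the `locale` module is deliberately avoided here).

```python
# Two adjacent keys from the index, per the page dump above.
keys = [
    "enum.server_task_log.status.keys",
    "enum.server_task_log.status.RUNNING",
]

# C locale: plain bytewise comparison; 'R' (0x52) sorts before 'k' (0x6b).
c_order = sorted(keys)

# en_US-like: roughly case-insensitive; 'keys' sorts before 'running'.
en_us_like = sorted(keys, key=str.lower)
```

An index built under one ordering and later searched or extended under the other yields exactly the kind of out-of-order entries shown in the page dump.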
We go out of our way to prevent the backend's locale from changing after
initdb. Did you do something to override that?
Another theory is that this is a manifestation of the known problem with
plperl sometimes changing the backend's locale setting. Is it possible
that the index was created in a session that had previously run some
plperl functions?
regards, tom lane
Tom Lane wrote:
I wrote:
Michael Paesold <mpaesold@gmx.at> writes:
I am seeing a similar unique index bug here...
This is PostgreSQL 8.1.1 on RHEL 3, Intel Xeon (i686).
It looks like the problem is that index entries are being inserted out
of order.
After further investigation, it seems that the original sort order of
the index was not C-locale, but something else --- I can reproduce the
current index ordering except for a small number of new-ish tuples if
I sort the data in en_US.
We go out of our way to prevent the backend's locale from changing after
initdb. Did you do something to override that?
No, I am sure I did not do anything to change the locale intentionally. The
cluster was initialized with "initdb --no-locale"... (and this is what it
still is).
Another theory is that this is a manifestation of the known problem with
plperl sometimes changing the backend's locale setting. Is it possible
that the index was created in a session that had previously run some
plperl functions?
This is a theory. The whole database was loaded using pg_restore, I still
have the original dump so I will have a look at the dump now. The database
actually contains some plperl functions.
Restoring to a file, I find some facts that are perhaps relevant:
*) SET check_function_bodies = false;
So at least the syntax-checking function should not be called.
*) Old plperl call handler:
The dump from 7.4.x created the public.plperl_call_handler() function,
which I only dropped after the full dump was loaded.
CREATE FUNCTION plperl_call_handler() RETURNS language_handler
AS '$libdir/plperl', 'plperl_call_handler'
LANGUAGE c;
ALTER FUNCTION public.plperl_call_handler() OWNER TO postgres;
CREATE TRUSTED PROCEDURAL LANGUAGE plperl HANDLER plperl_call_handler;
*) There is a single plperl function that is only used in a view. (Btw.
this view is totally unrelated to the given table and should never be used
in the same backend session.)
From the points above, I don't think the plperl function should have been
called during load. Perhaps I am mistaken and plperl did really override
the locale setting.
Looking at the environment set for the "postgres" unix user, which is used
to run Postgres, I see that LANG is set to the default value of
en_US.UTF-8. So it seems possible that setting LANG to C here could fix
the problem.
This still doesn't explain why the initial sort order is wrong here.
The creation order in the dump is:
CREATE TABLE... (without indexes)
COPY ...
ALTER TABLE ONLY properties ADD CONSTRAINT pk_properties...
Please tell me if you need further information.
Best Regards,
Michael Paesold
Michael Paesold <mpaesold@gmx.at> writes:
This is a theory. The whole database was loaded using pg_restore, I still
have the original dump so I will have a look at the dump now. The database
actually contains some plperl functions.
OK, I think I have reproduced the problem. initdb in C locale, then
start postmaster with LANG=en_US.UTF-8 in its environment. Then:
z1=# create language plperl;
CREATE LANGUAGE
z1=# select 'enum.server_task_log.status.RUNNING'::varchar < 'enum.server_task_log.status.keys'::varchar;
?column?
----------
t -- correct result for C locale
(1 row)
z1=# \c z1
You are now connected to database "z1".
z1=# SET check_function_bodies = false;
SET
z1=# create or replace function perlf() returns text as $$
z1$# return 'foo';
z1$# $$ language plperl;
CREATE FUNCTION
z1=# select 'enum.server_task_log.status.RUNNING'::varchar < 'enum.server_task_log.status.keys'::varchar;
?column?
----------
f -- WRONG result for C locale
(1 row)
So the mere act of defining a plperl function, even with
check_function_bodies = false, is sufficient to send control through
that bit of libperl code that does setlocale(LC_ALL, ""). Ugh.
This is much worse than I thought.
The reason I had not seen it before is that lc_collate_is_c caches its
result, which means that if you do any text/varchar comparisons before
first invoking libperl, you won't see any misbehavior (at least not when
you started in C locale). The reconnect in the middle of the above test
sequence is essential to reproduce the failure.
We were talking last week about forcing the LANG/LC_ environment
variables to match our desired settings within the postmaster.
I think this example raises the priority of doing that by several
notches :-(
In the meantime, Michael, I'd suggest modifying your postmaster start
script to force LANG=C, and then reindexing all indexes you have on
text/varchar/char columns. That should get you out of the immediate
problem and prevent it from recurring before we have a fix. (The
system catalogs should be OK because they use "name" which is not
locale-sensitive.)
regards, tom lane
I wrote:
So the mere act of defining a plperl function, even with
check_function_bodies = false, is sufficient to send control through
that bit of libperl code that does setlocale(LC_ALL, ""). Ugh.
This is much worse than I thought.
It seems one ingredient in this is that the plperl function validator
fails to honor check_function_bodies, and hence is calling libperl
anyway. I wonder if that explains the sudden rise in incidents in 8.1?
regards, tom lane
"Andrew Dunstan" <andrew@dunslane.net> writes:
It should certainly be fixed, but surely at best this would only delay
seeing the ugly locale effect - as soon as you call a perl function you'll
be back in the same boat regardless.
Right, I was not suggesting that fixing the validator would avoid the
bug. It just surprised me that libperl was getting called in the
restore-a-dump scenario.
BTW, I was just about to go change the validator, but if you want to
do it go ahead...
regards, tom lane
Tom Lane said:
I wrote:
So the mere act of defining a plperl function, even with
check_function_bodies = false, is sufficient to send control through
that bit of libperl code that does setlocale(LC_ALL, ""). Ugh.
This is much worse than I thought.
It seems one ingredient in this is that the plperl function validator
fails to honor check_function_bodies, and hence is calling libperl
anyway. I wonder if that explains the sudden rise in incidents in 8.1?
That's probably because I was unaware of its existence.
It should certainly be fixed, but surely at best this would only delay
seeing the ugly locale effect - as soon as you call a perl function you'll
be back in the same boat regardless.
cheers
andrew
Tom Lane said:
"Andrew Dunstan" <andrew@dunslane.net> writes:
It should certainly be fixed, but surely at best this would only delay
seeing the ugly locale effect - as soon as you call a perl function
you'll be back in the same boat regardless.
Right, I was not suggesting that fixing the validator would avoid the
bug. It just surprised me that libperl was getting called in the
restore-a-dump scenario.
BTW, I was just about to go change the validator, but if you want to do
it go ahead...
no, please do.
cheers
andrew
Tom Lane wrote:
Michael Paesold <mpaesold@gmx.at> writes:
This is a theory. The whole database was loaded using pg_restore, I still
have the original dump so I will have a look at the dump now. The database
actually contains some plperl functions.
OK, I think I have reproduced the problem. initdb in C locale, then
start postmaster with LANG=en_US.UTF-8 in its environment. Then:
I had reproduced the problem here with a stripped down dump file from my
backup, but your test case is much simpler, as usual. :-)
In the meantime, Michael, I'd suggest modifying your postmaster start
script to force LANG=C, and then reindexing all indexes you have on
text/varchar/char columns. That should get you out of the immediate
problem and prevent it from recurring before we have a fix.
I had already reindexed all tables in a clean session and have now added
"export LANG=C" to the profile of the postgres unix account. I cannot
reproduce the bug after doing so.
Thank you for your quick help debugging the problem.
Best Regards,
Michael Paesold
Tom Lane wrote:
Michael Paesold <mpaesold@gmx.at> writes:
This is a theory. The whole database was loaded using pg_restore, I still
have the original dump so I will have a look at the dump now. The database
actually contains some plperl functions.
OK, I think I have reproduced the problem. initdb in C locale, then
start postmaster with LANG=en_US.UTF-8 in its environment. Then:
z1=# create language plperl;
CREATE LANGUAGE
z1=# select 'enum.server_task_log.status.RUNNING'::varchar < 'enum.server_task_log.status.keys'::varchar;
?column?
----------
t -- correct result for C locale
(1 row)
z1=# \c z1
You are now connected to database "z1".
z1=# SET check_function_bodies = false;
SET
z1=# create or replace function perlf() returns text as $$
z1$# return 'foo';
z1$# $$ language plperl;
CREATE FUNCTION
z1=# select 'enum.server_task_log.status.RUNNING'::varchar < 'enum.server_task_log.status.keys'::varchar;
?column?
----------
f -- WRONG result for C locale
(1 row)
Unfortunately we have not fixed this on Windows. I have confirmed the
effect on 8.1.1, and I still see this effect on HEAD. We have fixed the
check_function_bodies bit, but if that is on, or if I call a plperl
func, I get the bad result shown above.
The log message from the commit that was supposed to fix this says:
Arrange to set the LC_XXX environment variables to match our locale
setup. This protects against undesired changes in locale behavior
if someone carelessly does setlocale(LC_ALL, "") (and we know who
you are, perl guys).
However, to the best of my knowledge, Windows does NOT consult the
environment when setlocale is called. ISTM we probably need to call
setlocale ourselves on Windows with the desired values.
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
However, to the best of my knowledge, Windows does NOT consult the
environment when setlocale is called. ISTM we probably need to call
setlocale ourselves on Windows with the desired values.
We already do, during process startup. The question becomes where the
heck is Perl looking for the locale information, when on Windows?
regards, tom lane
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
However, to the best of my knowledge, Windows does NOT consult the
environment when setlocale is called. ISTM we probably need to call
setlocale ourselves on Windows with the desired values.
We already do, during process startup. The question becomes where the
heck is Perl looking for the locale information, when on Windows?
The Windows docs at
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclib/html/_crt_setlocale.2c_._wsetlocale.asp
say:
setlocale( LC_ALL, "" );
Sets the locale to the default, which is the user-default ANSI code
page obtained from the operating system.
Does libperl call this only at interpreter startup? If so, maybe we
should save out the settings and then restore them after the
interpreter has started.
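The save-and-restore idea sketched in this paragraph looks roughly like the following. This is only an illustration of the shape of the fix, using Python's locale module; it is not PostgreSQL's or perl's actual startup code, and the function names are invented.

```python
import locale

def run_with_locale_protection(start_interpreter):
    """Capture the locale before handing control to code that may call
    setlocale(LC_ALL, ""), then put the saved setting back afterwards."""
    saved = locale.setlocale(locale.LC_ALL)        # query current setting
    try:
        start_interpreter()                        # may clobber the locale
    finally:
        locale.setlocale(locale.LC_ALL, saved)     # restore what we saved

def clobbering_startup():
    """Stand-in for an interpreter that adopts the environment's locale."""
    try:
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        pass                                       # environment locale unavailable

locale.setlocale(locale.LC_ALL, "C")
run_with_locale_protection(clobbering_startup)
restored = locale.setlocale(locale.LC_ALL)         # back to "C"
```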
cheers
andrew
Andrew Dunstan wrote:
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
However, to the best of my knowledge, Windows does NOT consult the
environment when set_locale is called. ISTM we probably need to call
set_locale ourselves on Windows with the desired values.We already do, during process startup. The question becomes where the
heck is Perl looking for the locale information, when on Windows?
The Windows docs at
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclib/html/_crt_setlocale.2c_._wsetlocale.asp
say:
setlocale( LC_ALL, "" );
Sets the locale to the default, which is the user-default ANSI code
page obtained from the operating system.
Does libperl call this only at interpreter startup? If so, maybe we
should save out the settings and then restore them after the
interpreter has started.
After some analysis of perl's locale.c, I came up with the attached
patch, which seems to cure the previously observed problem on my Windows
box.
The questions are:
a) is this an acceptable solution, and
b) should we do this for all the LC_* settings (on Windows at least)?
Especially, should we do it for LC_CTYPE?
cheers
andrew