Using index for "like 'ABC%'" type query

Started by lecover 22 years ago40 messages

lec

englim@pc.jaring.my

over 22 years ago

For current version of postgresql, is it possible for a query like below
to utilize an index?

select item_name from item where item_code like 'ABC%';

Earlier versions of postgresql, eg, 6.5.3 can utilize the index.

Thanks,
Thomas.

Jeffrey Melloy

jmelloy@visualdistortion.org

over 22 years ago

In reply to: lec (#1)

Re: Using index for "like 'ABC%'" type query

There is a module called "tsearch" in the contrib directory which works
quite well.
On Thursday, April 17, 2003, at 09:56 PM, lec wrote:

Show quoted text

For current version of postgresql, is it possible for a query like
below to utilize an index?

select item_name from item where item_code like 'ABC%';

Earlier versions of postgresql, eg, 6.5.3 can utilize the index.

Thanks,
Thomas.

---------------------------(end of
broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Bruce Momjian

pgman@candle.pha.pa.us

over 22 years ago

In reply to: Jeffrey Melloy (#2)

Re: Using index for "like 'ABC%'" type query

See the FAQ about index usage.

---------------------------------------------------------------------------

Jeffrey Melloy wrote:

There is a module called "tsearch" in the contrib directory which works
quite well.
On Thursday, April 17, 2003, at 09:56 PM, lec wrote:

For current version of postgresql, is it possible for a query like
below to utilize an index?

select item_name from item where item_code like 'ABC%';

Earlier versions of postgresql, eg, 6.5.3 can utilize the index.

Thanks,
Thomas.

---------------------------(end of
broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Andrew Sullivan

andrew@libertyrms.info

over 22 years ago

In reply to: lec (#1)

Re: Using index for "like 'ABC%'" type query

On Fri, Apr 18, 2003 at 10:56:52AM +0800, lec wrote:

For current version of postgresql, is it possible for a query like below
to utilize an index?

select item_name from item where item_code like 'ABC%';

Yes. But locale is enabled by default now, and if you use anything
other than C (and other-than-C is increasingly the default on the
supportedplatforms), it won't work. Just make sure that your locale
is C when you do initdb.

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110

Tom Lane

tgl@sss.pgh.pa.us

over 22 years ago

In reply to: Andrew Sullivan (#4)

default locale considered harmful? (was Re: Using index for "like 'ABC%'" type query)

Andrew Sullivan <andrew@libertyrms.info> writes:

[ can't LIKE use an index? ]

Yes. But locale is enabled by default now, and if you use anything
other than C (and other-than-C is increasingly the default on the
supported platforms), it won't work. Just make sure that your locale
is C when you do initdb.

I recall someone floating a proposal that initdb should by default
initialize the database in C locale, not whatever-it-finds-in-the-
environment. To get a non-C locale you'd have to give an explicit
command-line switch --- essentially, reversing the sense of the present
"initdb --no-locale" option.

I'm beginning to think that would be a good idea, given the increasing
prevalence of en_US as a platform locale setting. Comments?

regards, tom lane

Bruce Momjian

pgman@candle.pha.pa.us

over 22 years ago

In reply to: Tom Lane (#5)

Re: [HACKERS] default locale considered harmful? (was Re:

Tom Lane wrote:

Andrew Sullivan <andrew@libertyrms.info> writes:

[ can't LIKE use an index? ]

Yes. But locale is enabled by default now, and if you use anything
other than C (and other-than-C is increasingly the default on the
supported platforms), it won't work. Just make sure that your locale
is C when you do initdb.

I recall someone floating a proposal that initdb should by default
initialize the database in C locale, not whatever-it-finds-in-the-
environment. To get a non-C locale you'd have to give an explicit
command-line switch --- essentially, reversing the sense of the present
"initdb --no-locale" option.

I'm beginning to think that would be a good idea, given the increasing
prevalence of en_US as a platform locale setting. Comments?

Agreed, or we could special-case en_US to be C locale.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Dennis Gearon

gearond@cvc.net

over 22 years ago

In reply to: Tom Lane (#5)

Re: default locale considered harmful? (was Re: Using

I whole heartedly agree. It would make it easier to do multilangugages in one db, I think.

When I get done with my current project, about a year from now, I'm going to dig in deep to UTF-8 on postgress and see if I cn write a collation function in C to be used in ORDER BY, statements.

Tom Lane wrote:

Show quoted text

Andrew Sullivan <andrew@libertyrms.info> writes:

[ can't LIKE use an index? ]

Yes. But locale is enabled by default now, and if you use anything
other than C (and other-than-C is increasingly the default on the
supported platforms), it won't work. Just make sure that your locale
is C when you do initdb.

I recall someone floating a proposal that initdb should by default
initialize the database in C locale, not whatever-it-finds-in-the-
environment. To get a non-C locale you'd have to give an explicit
command-line switch --- essentially, reversing the sense of the present
"initdb --no-locale" option.

I'm beginning to think that would be a good idea, given the increasing
prevalence of en_US as a platform locale setting. Comments?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Lincoln Yeoh

lyeoh@pop.jaring.my

over 22 years ago

In reply to: Tom Lane (#5)

Re: default locale considered harmful? (was Re:

Yes. I'd prefer C locale by default.

Usage of other locales seems to incur many issues. Those who know about
locale and want it are usually prepared to turn it on and use it given some
documentation. Those who don't should get C locale.

Link.

At 10:48 AM 4/18/2003 -0400, Tom Lane wrote:

Show quoted text

I recall someone floating a proposal that initdb should by default
initialize the database in C locale, not whatever-it-finds-in-the-
environment. To get a non-C locale you'd have to give an explicit
command-line switch --- essentially, reversing the sense of the present
"initdb --no-locale" option.

I'm beginning to think that would be a good idea, given the increasing
prevalence of en_US as a platform locale setting. Comments?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Darko Prenosil

darko.prenosil@finteh.hr

over 22 years ago

In reply to: lec (#1)

Re: Using index for "like 'ABC%'" type query

On Friday 18 April 2003 02:56, lec wrote:

For current version of postgresql, is it possible for a query like below
to utilize an index?

select item_name from item where item_code like 'ABC%';

Earlier versions of postgresql, eg, 6.5.3 can utilize the index.

I think some other DB projects use something that is called
"BEGINS WITH" operator. Some time ago I tried to write such operator for
postgres. Unfortunatelly at that time I had no experience with postgres
internals, so the functions are written in PL/PSQL !!! They are working all
right, but the whole thing should be rewriten in C. If You are interested,
I'll send You the "code". For us using non "C" collations, I do not know for
other way. Of course if "C" collation satisfy Your needs, You do not need to
do such "hacks".

Regards !

#10

Tatsuo Ishii

t-ishii@sra.co.jp

over 22 years ago

In reply to: Tom Lane (#5)

Re: default locale considered harmful?

Andrew Sullivan <andrew@libertyrms.info> writes:

[ can't LIKE use an index? ]

Yes. But locale is enabled by default now, and if you use anything
other than C (and other-than-C is increasingly the default on the
supported platforms), it won't work. Just make sure that your locale
is C when you do initdb.

I recall someone floating a proposal that initdb should by default
initialize the database in C locale, not whatever-it-finds-in-the-
environment. To get a non-C locale you'd have to give an explicit
command-line switch --- essentially, reversing the sense of the present
"initdb --no-locale" option.

I'm beginning to think that would be a good idea, given the increasing
prevalence of en_US as a platform locale setting. Comments?

I agree with that initdb should by default initialize the database in
C locale. I have found on a local list too many users in Japan are
suffered by the locale problem and I'm getting tired of saying "you
should not forget to explicitly specify --no-locale siwtch".
--
Tatsuo Ishii

#11

Peter Eisentraut

peter_e@gmx.net

over 22 years ago

In reply to: Tom Lane (#5)

Re: default locale considered harmful? (was Re: [GENERAL]

Tom Lane writes:

I recall someone floating a proposal that initdb should by default
initialize the database in C locale, not whatever-it-finds-in-the-
environment. To get a non-C locale you'd have to give an explicit
command-line switch --- essentially, reversing the sense of the present
"initdb --no-locale" option.

If you're concerned about speed, let's think about fixing the real
problems, not about disabling the feature altogether. A while ago I
proposed an easy solution that made LIKE use an index based on strxfrm
order instead. It was rejected on the grounds that it would prevent a
future enhancement of the LIKE mechanism to use the locale-enabled
collation order, but no one seems to be seriously interested in
implementing that. I still have the patch; we can reconsider it if you
like.

(Btw., LIKE using the locale-enabled collation sequence is hardly going to
work, because most locales compare strings backwards from the end to the
start in the second pass, so something like LIKE 'foo%' can easily give
inconsistent results, since you don't know what the end of the string
really is. It's better to think of pattern matching as
character-by-character matching.)

--
Peter Eisentraut peter_e@gmx.net

#12

Andrew Sullivan

andrew@libertyrms.info

over 22 years ago

In reply to: Tom Lane (#5)

Re: default locale considered harmful? (was Re: Using index for "like 'ABC%'" type query)

On Fri, Apr 18, 2003 at 10:48:21AM -0400, Tom Lane wrote:

prevalence of en_US as a platform locale setting. Comments?

I tend to agree, especially since there's no real fix after you've
initdb'd.

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110

#13

Noname

dalgoda@ix.netcom.com

over 22 years ago

In reply to: Andrew Sullivan (#4)

Re: default locale considered harmful? (was Re: Using index for "like 'ABC%'" type query)

In article <3954.1050677301@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm beginning to think that would be a good idea, given the increasing
prevalence of en_US as a platform locale setting. Comments?

And who's brilliant idea was that anyway? Grrr. 'file' should NOT sort
next to 'FILE'.

mrc [more than slightly annoyed and off topic]
--
Mike Castle dalgoda@ix.netcom.com www.netcom.com/~dalgoda/
We are all of us living in the shadow of Manhattan. -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc

#14

Peter Eisentraut

peter_e@gmx.net

over 22 years ago

In reply to: Noname (#13)

Re: default locale considered harmful? (was Re: Using

Mike Castle writes:

And who's brilliant idea was that anyway? Grrr. 'file' should NOT sort
next to 'FILE'.

Any reputable dictionary in the world would disagree with you.

--
Peter Eisentraut peter_e@gmx.net

#15

Olleg Samojlov

olleg@telecom.mipt.ru

over 22 years ago

In reply to: Tom Lane (#5)

Re: default locale considered harmful? (was Re: [GENERAL] Using

Tom Lane wrote:

I'm beginning to think that would be a good idea, given the increasing
prevalence of en_US as a platform locale setting. Comments?

Americans will be agree with you. :-)

IMHO:

LC_CTYPE -\
LC_COLLATE - Ideally make client depending for international databases.
But it seems impossible to use indexes. May be add parameter to CREATE
INDEX, which locale (locales) use for sorting text data?

LC_NUMERIC client depending
LC_TIME client depending, default ISO
LC_MONETARY not usefull, lacking monetary type
LC_MESSAGES client depending for notice (client messages), server
depending for console or syslog messages, agree with default C

Summary: Exchange information with local services (stdout, syslog,
files, etc) must be according local locale. With client - according
client locale.

--
Olleg

#16

Bruce Momjian

pgman@candle.pha.pa.us

over 22 years ago

In reply to: Peter Eisentraut (#11)

Re: default locale considered harmful? (was Re: [GENERAL]

Has the single-byte LIKE penalty been eliminated, so we don't need to
consider using C as the default locale for initdb, right?

If fixed, how was it done?

---------------------------------------------------------------------------

Peter Eisentraut wrote:

Tom Lane writes:

I recall someone floating a proposal that initdb should by default
initialize the database in C locale, not whatever-it-finds-in-the-
environment. To get a non-C locale you'd have to give an explicit
command-line switch --- essentially, reversing the sense of the present
"initdb --no-locale" option.

If you're concerned about speed, let's think about fixing the real
problems, not about disabling the feature altogether. A while ago I
proposed an easy solution that made LIKE use an index based on strxfrm
order instead. It was rejected on the grounds that it would prevent a
future enhancement of the LIKE mechanism to use the locale-enabled
collation order, but no one seems to be seriously interested in
implementing that. I still have the patch; we can reconsider it if you
like.

(Btw., LIKE using the locale-enabled collation sequence is hardly going to
work, because most locales compare strings backwards from the end to the
start in the second pass, so something like LIKE 'foo%' can easily give
inconsistent results, since you don't know what the end of the string
really is. It's better to think of pattern matching as
character-by-character matching.)

--
Peter Eisentraut peter_e@gmx.net

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#17

Tom Lane

tgl@sss.pgh.pa.us

over 22 years ago

In reply to: Bruce Momjian (#16)

Re: default locale considered harmful? (was Re: [GENERAL]

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Has the single-byte LIKE penalty been eliminated, so we don't need to
consider using C as the default locale for initdb, right?

I'm still of the opinion that we should make C the default locale.
But I'm not sure where the consensus is, so I've not made the change.

If fixed, how was it done?

Peter has provided a hack whereby one can create a LIKE-supporting index
in a non-C locale. But a *default* index in a non-C locale is still not
going to support LIKE ... and the hacked index will not support ordinary
comparison or ordering operators. So I think there's still a lot left
to be desired here.

regards, tom lane

#18

Peter Eisentraut

peter_e@gmx.net

over 22 years ago

In reply to: Tom Lane (#17)

Re: default locale considered harmful? (was Re: [GENERAL]

Tom Lane writes:

Peter has provided a hack whereby one can create a LIKE-supporting index
in a non-C locale. But a *default* index in a non-C locale is still not
going to support LIKE ... and the hacked index will not support ordinary
comparison or ordering operators. So I think there's still a lot left
to be desired here.

I don't understand why you call this a hack. Pattern matching and string
comparison simply work differently, so the proper solution is to use
different operator classes. After all, that's what operator classes exist
for. What is left to be desired?

--
Peter Eisentraut peter_e@gmx.net

#19

Tom Lane

tgl@sss.pgh.pa.us

over 22 years ago

In reply to: Peter Eisentraut (#18)

Re: default locale considered harmful? (was Re: [GENERAL]

Peter Eisentraut <peter_e@gmx.net> writes:

I don't understand why you call this a hack. Pattern matching and string
comparison simply work differently, so the proper solution is to use
different operator classes. After all, that's what operator classes exist
for. What is left to be desired?

I think that a more general solution would be the ability to select a
locale (and hence a sort order) per-column, as the SQL spec envisions.
Then you'd just select C locale for columns you wanted to do pattern
matching for.

Admittedly, you'd still need the opclass-based approach for cases where
you wanted both pattern matching and a non-C-locale sort order ... but
I doubt that constitutes the majority of cases.

I guess my main concern is that we should not feel that this approach
takes the heat off us to support multiple locales. As a solution to the
narrow problem of LIKE performance, it's okay --- but it's not getting
us any nearer to a solution to the general locale problem.

regards, tom lane

#20

Bruce Momjian

pgman@candle.pha.pa.us

over 22 years ago

In reply to: Tom Lane (#17)

Re: default locale considered harmful? (was Re: [GENERAL]

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Has the single-byte LIKE penalty been eliminated, so we don't need to
consider using C as the default locale for initdb, right?

I'm still of the opinion that we should make C the default locale.
But I'm not sure where the consensus is, so I've not made the change.

If fixed, how was it done?

Peter has provided a hack whereby one can create a LIKE-supporting index
in a non-C locale. But a *default* index in a non-C locale is still not
going to support LIKE ... and the hacked index will not support ordinary
comparison or ordering operators. So I think there's still a lot left
to be desired here.

So, my understanding is that you would create something such as:

CREATE INDEX iix ON tab (LIKE col)

and that does LIKE lookups and knows how to do col LIKE 'abc%', but it
can't be used for >= or ORDER BY, but it can be used for equality tests?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#21

Tom Lane

tgl@sss.pgh.pa.us

over 22 years ago

In reply to: Bruce Momjian (#20)

Re: default locale considered harmful? (was Re: [GENERAL]

Bruce Momjian <pgman@candle.pha.pa.us> writes:

So, my understanding is that you would create something such as:
CREATE INDEX iix ON tab (LIKE col)
and that does LIKE lookups and knows how to do col LIKE 'abc%', but it
can't be used for >= or ORDER BY, but it can be used for equality tests?

Hm. Right at the moment, it wouldn't be used for equality tests unless
you spelled equality as "a ~=~ b". I wonder whether that's necessary
though; couldn't we dispense with that operator and use ordinary
equality as the BTEqual member of these opclasses? Are there any
locales that claim that not-physically-identical strings are equal?

regards, tom lane

#22

Bruce Momjian

pgman@candle.pha.pa.us

over 22 years ago

In reply to: Tom Lane (#21)

Re: default locale considered harmful? (was Re: [GENERAL]

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

So, my understanding is that you would create something such as:
CREATE INDEX iix ON tab (LIKE col)
and that does LIKE lookups and knows how to do col LIKE 'abc%', but it
can't be used for >= or ORDER BY, but it can be used for equality tests?

Hm. Right at the moment, it wouldn't be used for equality tests unless
you spelled equality as "a ~=~ b". I wonder whether that's necessary
though; couldn't we dispense with that operator and use ordinary
equality as the BTEqual member of these opclasses? Are there any
locales that claim that not-physically-identical strings are equal?

Let me see if I understand.

Our default indexes will be able to do =, >, <, ORDER BY, and the
special index will be able to do LIKE, ORDER BY, and maybe equals. Do I
have that correct?

Looking at CVS, I see the warning about non-C locales has been removed.
Should we instead mention the new LIKE index method?

# (Be sure to maintain the correspondence with locale_is_like_safe() in selfuncs.c.)
if test x`pg_getlocale COLLATE` != xC && test x`pg_getlocale COLLATE` != xPOSIX; then
echo "This locale setting will prevent the use of indexes for pattern matching"
echo "operations. If that is a concern, rerun $CMDNAME with the collation order"
echo "set to \"C\". For more information see the Administrator's Guide."
fi

Doing LIKE with single-byte encodings would be easy because it would be
only 256 compares to find the min/max char values, but that doesn't work
with multi-byte encodings, right?

This LIKE/encoding problem is a tricky one because it gives poor
performance with little warning to users.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#23

Peter Eisentraut

peter_e@gmx.net

over 22 years ago

In reply to: Tom Lane (#19)

Re: default locale considered harmful? (was Re: [GENERAL]

Tom Lane writes:

I think that a more general solution would be the ability to select a
locale (and hence a sort order) per-column, as the SQL spec envisions.

It is a general solution, but not for this problem. The problem was to
make all locales equally suitable for certain optimizations, not to make
locales available in more places. I won't pretend to anyone that this
little change will bring us anywhere closer to a solution for that other
problem.

Then you'd just select C locale for columns you wanted to do pattern
matching for.

That's wrong, for a number of reasons:

First of all, I don't agree at all that cases where you want both pattern
matching and collation are rare; in fact, I rarely see a case where you
don't want both. Designing a system on that assumption is not sound,
because all operations should be equally possible in all situations.

Second, we will eventually want pattern matching operations to be locale
aware. Case-sensitive matching needs this, because case mappings depend
on the locale. The character class features of POSIX regexps also need
this. So you cannot make locales and well-performing pattern matching
mutually exclusive.

Third, keep in mind that datums with different locales cannot be combined
liberally. So systems built the way you propose become crippled in ways
that will be hard to understand and justify.

Finally, the locale of a datum should be a property that describes that
language of the stored data and that can be used for that specific purpose
without concerns and tradeoffs with the internal doings of the
optimization engine.

--
Peter Eisentraut peter_e@gmx.net

#24

Peter Eisentraut

peter_e@gmx.net

over 22 years ago

In reply to: Tom Lane (#21)

Re: default locale considered harmful? (was Re: [GENERAL]

Tom Lane writes:

Are there any locales that claim that not-physically-identical strings
are equal?

In Unicode there are plenty such combinations.

--
Peter Eisentraut peter_e@gmx.net

#25

Peter Eisentraut

peter_e@gmx.net

over 22 years ago

In reply to: Bruce Momjian (#22)

Re: default locale considered harmful? (was Re: [GENERAL]

Bruce Momjian writes:

Our default indexes will be able to do =, >, <, ORDER BY, and the
special index will be able to do LIKE, ORDER BY, and maybe equals. Do I
have that correct?

The default operator class supports comparisons (=, >, <, etc.) and ORDER
BY based on those operators. The other operator class supports pattern
matching operations (LIKE, SIMILAR, POSIX regexps).

Looking at CVS, I see the warning about non-C locales has been removed.
Should we instead mention the new LIKE index method?

I don't see a need. The old warning was mainly because once you
initdb'ed, you were basically stuck with your choice. Now we have plenty
of options to query and adjust things later.

Doing LIKE with single-byte encodings would be easy because it would be
only 256 compares to find the min/max char values, but that doesn't work
with multi-byte encodings, right?

This has nothing to do with encodings.

--
Peter Eisentraut peter_e@gmx.net

#26

Bruce Momjian

pgman@candle.pha.pa.us

over 22 years ago

In reply to: Peter Eisentraut (#25)

Re: default locale considered harmful? (was Re: [GENERAL]

Peter Eisentraut wrote:

Bruce Momjian writes:

Our default indexes will be able to do =, >, <, ORDER BY, and the
special index will be able to do LIKE, ORDER BY, and maybe equals. Do I
have that correct?

The default operator class supports comparisons (=, >, <, etc.) and ORDER
BY based on those operators. The other operator class supports pattern
matching operations (LIKE, SIMILAR, POSIX regexps).

Looking at CVS, I see the warning about non-C locales has been removed.
Should we instead mention the new LIKE index method?

I don't see a need. The old warning was mainly because once you
initdb'ed, you were basically stuck with your choice. Now we have plenty
of options to query and adjust things later.

How are people going to know to use these special LIKE indexes?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#27

Bruce Momjian

pgman@candle.pha.pa.us

over 22 years ago

In reply to: Bruce Momjian (#26)

Re: default locale considered harmful? (was Re: [GENERAL]

Peter Eisentraut wrote:

Bruce Momjian writes:

How are people going to know to use these special LIKE indexes?

The same way they presumably find out about anything else: RTFM. A couple
of more cross-references and index entries need to be added, though.

Well, this isn't one of those, "How do I do X" but rather something they
will only know they need if they wonder why their LIKE queries are slow
--- that isn't going to be obvious to too many people.  An FAQ may be
required for this --- fortunately we already have an item for indexes.

That is one thing I liked about the initdb mention --- it clearly told
them to watch out for something they might not have been looking for.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Import Notes

Reply to msg id not found: Pine.LNX.4.44.0306022126230.1643-100000@peter.localdomain | Resolved by subject fallback

#28

Tom Lane

tgl@sss.pgh.pa.us

over 22 years ago

In reply to: Bruce Momjian (#27)

Re: default locale considered harmful? (was Re: [GENERAL]

Bruce Momjian <pgman@candle.pha.pa.us> writes:

That is one thing I liked about the initdb mention --- it clearly told
them to watch out for something they might not have been looking for.

Only if they read the message, though. People who are running RPM
installations probably never get to see what initdb has to say ...
so I can't put much faith in the usefulness of warnings emitted by
initdb.

regards, tom lane

#29

Bruce Momjian

pgman@candle.pha.pa.us

over 22 years ago

In reply to: Tom Lane (#28)

Re: default locale considered harmful? (was Re: [GENERAL]

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

That is one thing I liked about the initdb mention --- it clearly told
them to watch out for something they might not have been looking for.

Only if they read the message, though. People who are running RPM
installations probably never get to see what initdb has to say ...
so I can't put much faith in the usefulness of warnings emitted by
initdb.

True, but for people who _do_ see initdb output, is it helpful, and what
other places can we put it?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#30

Alvaro Herrera

alvherre@dcc.uchile.cl

over 22 years ago

In reply to: Tom Lane (#28)

Re: default locale considered harmful? (was Re: [GENERAL]

On Wed, Jun 04, 2003 at 11:05:03PM -0400, Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

That is one thing I liked about the initdb mention --- it clearly told
them to watch out for something they might not have been looking for.

Only if they read the message, though. People who are running RPM
installations probably never get to see what initdb has to say ...
so I can't put much faith in the usefulness of warnings emitted by
initdb.

It'd be nice if the RPM installation mailed initdb's messages to someone
(root@localhost maybe). It's not impossible, and while it's likely that
RedHat would remove the feature, at least PGDG's RPM would do it. Same
for DEBs and other binary packages...

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Nunca confiarï¿½ en un traidor. Ni siquiera si el traidor lo he creado yo"
(Barï¿½n Vladimir Harkonnen)

#31

Nigel J. Andrews

nandrews@investsystems.co.uk

over 22 years ago

In reply to: Tom Lane (#28)

Re: default locale considered harmful? (was Re: [GENERAL]

On Wed, 4 Jun 2003, Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

That is one thing I liked about the initdb mention --- it clearly told
them to watch out for something they might not have been looking for.

Only if they read the message, though. People who are running RPM
installations probably never get to see what initdb has to say ...
so I can't put much faith in the usefulness of warnings emitted by
initdb.

Yes, I mentioned this when this thread was going a few weeks ago. I only caught
the locale setting being wrong on a system before it went into production
because I happened to install on another system and noticed the message. I then
had to ask the hosting company's SA to first check and then re-initdb. I was
even sat watching/directing what he was doing and missed it. He was using
Redhat with RPMs I was doing it properly from source.

Those RPMs are dangerous, they turn you mind off.

I voted for setting 'C' by default.

--
Nigel J. Andrews

#32

scott.marlowe

scott.marlowe@ihs.com

over 22 years ago

In reply to: Nigel J. Andrews (#31)

Re: default locale considered harmful? (was Re: [GENERAL]

On Thu, 5 Jun 2003, Nigel J. Andrews wrote:

On Wed, 4 Jun 2003, Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

That is one thing I liked about the initdb mention --- it clearly told
them to watch out for something they might not have been looking for.

Only if they read the message, though. People who are running RPM
installations probably never get to see what initdb has to say ...
so I can't put much faith in the usefulness of warnings emitted by
initdb.

Yes, I mentioned this when this thread was going a few weeks ago. I only caught
the locale setting being wrong on a system before it went into production
because I happened to install on another system and noticed the message. I then
had to ask the hosting company's SA to first check and then re-initdb. I was
even sat watching/directing what he was doing and missed it. He was using
Redhat with RPMs I was doing it properly from source.

Those RPMs are dangerous, they turn you mind off.

I voted for setting 'C' by default.

Everything Nigel just wrote plus one thing.

If it comes down to it, we could always require a --locale setting and
refuse to initdb without it. That way, whether it's in an RPM or from
source, somebody somewhere along the line has to choose something.

Or would that break RPM / automated installs in too nasty a way?

#33

Peter Eisentraut

peter_e@gmx.net

over 22 years ago

In reply to: scott.marlowe (#32)

Re: default locale considered harmful? (was Re: [GENERAL]

scott.marlowe writes:

If it comes down to it, we could always require a --locale setting and
refuse to initdb without it. That way, whether it's in an RPM or from
source, somebody somewhere along the line has to choose something.

By default, you choose when you install or configure your operating
system. In most cases, the region where you install your operating system
and the region where you run your database is the same, so equating these
settings by default is reasonable.

--
Peter Eisentraut peter_e@gmx.net

#34

scott.marlowe

scott.marlowe@ihs.com

over 22 years ago

In reply to: Peter Eisentraut (#33)

Re: default locale considered harmful? (was Re: [GENERAL]

On Thu, 5 Jun 2003, Peter Eisentraut wrote:

scott.marlowe writes:

If it comes down to it, we could always require a --locale setting and
refuse to initdb without it. That way, whether it's in an RPM or from
source, somebody somewhere along the line has to choose something.

By default, you choose when you install or configure your operating
system. In most cases, the region where you install your operating system
and the region where you run your database is the same, so equating these
settings by default is reasonable.

But it's not that simple. If one could flip a switch and change a
postgresql installation from one locale to another, then hey, no big deal.
If indexes on text worked right in other locales it would be no big deal.

If you don't choose locale=C with doing initdb then you
have to backup the whole database, reinitdb, and restore it in order to
switch to it.

If the postgresql engine could use indexes well in all
locales then it would be reasonable to pick up the environmental locale.
As long as locale C is the only one that can use indexes on text, it's not
reasonable to use the locale of the environment without knowing what the
user really wants to do with the database.

Especially since most of the folks I know who download it are going to
prefer a locale of C to en_US or whatnot, since they'll likely want fast
indexed access on text types.

I would at least suggest that certain locales default to be coerced to C
if the user doesn't pick one.

#35

Alvaro Herrera Munoz

alvherre@dcc.uchile.cl

over 22 years ago

In reply to: scott.marlowe (#32)

Re: default locale considered harmful? (was Re: [GENERAL]

On Thu, Jun 05, 2003 at 09:44:21AM -0600, scott.marlowe wrote:

On Thu, 5 Jun 2003, Nigel J. Andrews wrote:

Everything Nigel just wrote plus one thing.

If it comes down to it, we could always require a --locale setting and
refuse to initdb without it. That way, whether it's in an RPM or from
source, somebody somewhere along the line has to choose something.

Yeah, that way the RPM guys would put the --locale taking the locale
from the environment and you're back to ground zero.

There's no point in forcing things down the throat of users using this
kind of mechanisms, because someone is going to automate the thing
along the way. What is needed is a way to make the user aware of his
system's configuration.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)

#36

scott.marlowe

scott.marlowe@ihs.com

over 22 years ago

In reply to: Alvaro Herrera Munoz (#35)

Re: default locale considered harmful? (was Re: [GENERAL]

On Thu, 5 Jun 2003, Alvaro Herrera Munoz wrote:

On Thu, Jun 05, 2003 at 09:44:21AM -0600, scott.marlowe wrote:

On Thu, 5 Jun 2003, Nigel J. Andrews wrote:

Everything Nigel just wrote plus one thing.

If it comes down to it, we could always require a --locale setting and
refuse to initdb without it. That way, whether it's in an RPM or from
source, somebody somewhere along the line has to choose something.

Yeah, that way the RPM guys would put the --locale taking the locale
from the environment and you're back to ground zero.

There's no point in forcing things down the throat of users using this
kind of mechanisms, because someone is going to automate the thing
along the way. What is needed is a way to make the user aware of his
system's configuration.

But initdb IS different since it takes so much effort to change locales
once you've set up a cluster.

Unless the other locales can offer similar performance to the C locale, I
would suggest that we make the C locale the default. IF they need
something else they can change it after initdb.

If you're an old time user, you know how to set locale, and the
implications of a non-C locale, so a default of C is no big deal, and
you're likely to be looking at initdb to see the message telling you it's
using C.

If you're a beginner you likely need or want a locale of C, but don't know
it, and don't know that you can't change it without reinitdbing.

My only concern with going with a default locale of C is if it causes a
problem with data integrity (i.e. constraints that only behave right in a
certain locale).

#37

Peter Eisentraut

peter_e@gmx.net

over 22 years ago

In reply to: scott.marlowe (#34)

Re: default locale considered harmful? (was Re: [GENERAL]

scott.marlowe writes:

If indexes on text worked right in other locales it would be no big deal.

They will in version 7.4, so all these concerns about trading off locale
use vs. performance will become obsolete.

--
Peter Eisentraut peter_e@gmx.net

#38

scott.marlowe

scott.marlowe@ihs.com

over 22 years ago

In reply to: Peter Eisentraut (#37)

Re: default locale considered harmful? (was Re: [GENERAL]

On Fri, 6 Jun 2003, Peter Eisentraut wrote:

scott.marlowe writes:

If indexes on text worked right in other locales it would be no big deal.

They will in version 7.4, so all these concerns about trading off locale
use vs. performance will become obsolete.

Oh! I thought there were still issues that couldn't be worked out on that
front. In that case, heck yeah, set the locale on initdb to the current
system locale. sweet.

#39

scott.marlowe

scott.marlowe@ihs.com

over 22 years ago

In reply to: scott.marlowe (#38)

Re: default locale considered harmful? (was Re: [GENERAL]

On Fri, 6 Jun 2003, Bruce Momjian wrote:

scott.marlowe wrote:

On Fri, 6 Jun 2003, Peter Eisentraut wrote:

scott.marlowe writes:

If indexes on text worked right in other locales it would be no big deal.

They will in version 7.4, so all these concerns about trading off locale
use vs. performance will become obsolete.

Oh! I thought there were still issues that couldn't be worked out on that
front. In that case, heck yeah, set the locale on initdb to the current
system locale. sweet.

The problems go away _if_ the user knows about the new way of indexing
LIKE on non-C locales.

Should we have something about this mentioned in

http://developer.postgresql.org/docs/postgres/install-upgrading.html

for the 7.4 release? Or is there a more appropriate place?

Import Notes

Reply to msg id not found: 200306070123.h571NUs05903@candle.pha.pa.us | Resolved by subject fallback

#40

Bruce Momjian

pgman@candle.pha.pa.us

over 22 years ago

In reply to: scott.marlowe (#39)

Re: default locale considered harmful? (was Re: [GENERAL]

scott.marlowe wrote:

On Fri, 6 Jun 2003, Bruce Momjian wrote:

scott.marlowe wrote:

On Fri, 6 Jun 2003, Peter Eisentraut wrote:

scott.marlowe writes:

If indexes on text worked right in other locales it would be no big deal.

They will in version 7.4, so all these concerns about trading off locale
use vs. performance will become obsolete.

Oh! I thought there were still issues that couldn't be worked out on that
front. In that case, heck yeah, set the locale on initdb to the current
system locale. sweet.

The problems go away _if_ the user knows about the new way of indexing
LIKE on non-C locales.

Should we have something about this mentioned in

http://developer.postgresql.org/docs/postgres/install-upgrading.html

for the 7.4 release? Or is there a more appropriate place?

Yes, at a minimum, plus we need someone coming fresh to 7.5 to know this
is an issue.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073