settings for multi-language unicode DB

Started by J. Bagg · over 20 years ago · 19 messages · general
#1 J. Bagg
j.bagg@kent.ac.uk

Please could somebody give me advice on settings for PGSQL with a database
with UTF-8 strings in a large number of languages? I've had no problems so
far in storing/retrieving UTF-8 strings but can't find clear answers to
other issues.

What locale would be best for sorting this dataset? Does it have to be the
same as the one set for the OS (these seem very limited for Solaris)?

As well as European accented characters, I need to handle some others used
for transcription. I have to produce simple-to-use, sorted lists by initial
letter for users and staff.
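
For the "sorted lists by initial letter" part, the grouping logic can live outside the database collation entirely. A minimal Python sketch (stdlib only; the function names and sample titles are illustrative, not from the thread) that strips combining accents so that, say, 'É' files under 'E':

```python
import unicodedata
from itertools import groupby

def initial_letter(title: str) -> str:
    """Accent-stripped, uppercased first letter, for grouping.

    NFD decomposition separates base characters from combining accents,
    so 'É' groups under 'E'."""
    decomposed = unicodedata.normalize("NFD", title)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base[:1].upper()

def lists_by_initial(titles):
    """Sort titles, then group them under their accent-free initial."""
    ordered = sorted(titles, key=lambda t: (initial_letter(t), t.upper()))
    return {k: list(g) for k, g in groupby(ordered, key=initial_letter)}

# Hypothetical journal titles, purely for illustration:
titles = ["Études balkaniques", "Africa", "Éire-Ireland",
          "Zeitschrift für Ethnologie"]
for letter, group in lists_by_initial(titles).items():
    print(letter, group)
```

This sidesteps the limited Solaris locales for the initial-letter lists, though full alphabetical ordering within a group would still depend on a proper collation.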

The database is for a library which takes journals from all over the world
and has to store titles in all European languages (broadly defined) plus
western transcriptions of others. We are in the process of moving to
Unicode. The DBMS currently runs on Solaris 9 with a backup system on
Fedora 3 Linux, and user/editor access is via servlets (Java).

Thanks for any help.

Janet Bagg, CSAC, University of Kent, UK

#2 Bruce Momjian
bruce@momjian.us
In reply to: J. Bagg (#1)
Re: settings for multi-language unicode DB

Janet Bagg wrote:

Please could somebody give me advice on settings for PGSQL with a database
with UTF-8 strings in a large number of languages? I've had no problems so
far in storing/retrieving UTF-8 strings but can't find clear answers to
other issues.

What locale would be best for sorting this dataset? Does it have to be the
same as the one set for the OS (these seem very limited for Solaris)?

Yes, right now PostgreSQL uses the operating system to do locale
ordering. That might change in a future release.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#3 Wes
wespvp@syntegra.com
In reply to: Bruce Momjian (#2)
Deadlock Detected (revisited)

I haven't seen anything to indicate that 8.x improves foreign key
reference locking and fixes the foreign key 'deadlock detected' issue.
Has that been addressed in 8.1?

I sort my records before loading, but am still getting 'deadlock detected'
(still running 7.3.4).

I have

B references C
B references A
A references C

Any new records for C are loaded first (both A and B depend on C).

A is loaded next (B depends on A).

B is loaded last.

Both A and B are loaded in 'Table C' sorted order. However, a deadlock can
occur with one process's A records and another's B records, due to the fact
that both have a foreign key reference to C.

The only way I know of to eliminate the deadlocks without serializing the
processes is to remove the foreign key reference from either A or B.
However, that jeopardizes referential integrity if something goes wrong
with the load process.
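
The scenario above is the classic waits-for cycle: each loader holds a lock on a parent row in C that the other one needs. A simplified Python sketch of cycle detection in a waits-for graph (an illustration of the idea only, not PostgreSQL's actual deadlock detector; the loader names are made up):

```python
def find_deadlock(waits_for):
    """Return the processes involved in a waits-for cycle, or None.

    waits_for maps each blocked process to the process holding the
    lock it needs; a cycle means none of them can ever proceed."""
    for start in waits_for:
        seen = []
        node = start
        while node in waits_for:
            if node in seen:
                return sorted(seen[seen.index(node):])  # just the cycle
            seen.append(node)
            node = waits_for[node]
    return None

# Loader 1 holds the locked parent row of C that loader 2 needs, and
# vice versa -- the situation described above, with one process's A
# records and another's B records both referencing C:
print(find_deadlock({"loader1": "loader2", "loader2": "loader1"}))
```

When the detector finds such a cycle, the server's only recourse is to abort one participant, which is the 'deadlock detected' error being discussed.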

I've never quite understood why a READ of a record with a foreign key
reference results in the referenced record being locked with more than a
shared lock.

Wes

#4 Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Wes (#3)
Re: Deadlock Detected (revisited)

On Fri, 25 Nov 2005, Wes wrote:

I haven't seen anything to indicate that 8.x improves foreign key
reference locking and fixes the foreign key 'deadlock detected' issue.
Has that been addressed in 8.1?

8.1 should be using the new shared row locks for doing the checks. This
should fix the case mentioned.

I've never quite understood why a READ of a record with a foreign key
reference results in the referenced record being locked with more than a
shared lock.

Up until now, we didn't have one to get on a per-record basis.

#5 Wes
wespvp@syntegra.com
In reply to: Stephan Szabo (#4)
Re: Deadlock Detected (revisited)

On 11/25/05 2:40 AM, "Stephan Szabo" <sszabo@megazone.bigpanda.com> wrote:

8.1 should be using the new shared row locks for doing the checks. This
should fix the case mentioned.

Hmm. I'm looking in the "What's new" for 8.1 and see that now. I could
have sworn it wasn't there the last time I looked. Sorry.

Up until now, we didn't have one to get on a per-record basis.

Ok, I guess I'm confused... From the 7.3.2 manual (6.5 updates):

"Multiversion concurrency control (MVCC) This removes our old table-level
locking, and replaces it with a locking system that is superior to most
commercial database systems. In a traditional system, each row that is
modified is locked until committed, preventing reads by other users. MVCC
uses the natural multiversion nature of PostgreSQL to allow readers to
continue reading consistent data during writer activity. Writers continue to
use the compact pg_log transaction system. This is all performed without
having to allocate a lock for every row like traditional database systems.
So, basically, we no longer are restricted by simple table-level locking; we
have something better than row-level locking."

So, until 8.1 PostgreSQL had "something better than row-level locking" for
some things, but no row locking when needed? Or was it that row locking was
there, just without shared row locks?

Wes

#6 Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Wes (#5)
Re: Deadlock Detected (revisited)

On Fri, 25 Nov 2005, Wes wrote:

On 11/25/05 2:40 AM, "Stephan Szabo" <sszabo@megazone.bigpanda.com> wrote:

8.1 should be using the new shared row locks for doing the checks. This
should fix the case mentioned.

Hmm. I'm looking in the "What's new" for 8.1 and see that now. I could
have sworn it wasn't there the last time I looked. Sorry.

Up until now, we didn't have one to get on a per-record basis.

Ok, I guess I'm confused... From the 7.3.2 manual (6.5 updates):

"Multiversion concurrency control (MVCC) This removes our old table-level
locking, and replaces it with a locking system that is superior to most
commercial database systems. In a traditional system, each row that is
modified is locked until committed, preventing reads by other users. MVCC
uses the natural multiversion nature of PostgreSQL to allow readers to
continue reading consistent data during writer activity. Writers continue to
use the compact pg_log transaction system. This is all performed without
having to allocate a lock for every row like traditional database systems.
So, basically, we no longer are restricted by simple table-level locking; we
have something better than row-level locking."

So, until 8.1 PostgreSQL had "something better than row-level locking" for
some things, but no row locking when needed? Or was it that row locking was
there, just without shared row locks?

The latter, the row locks before were single owner and were such that a
second lock request for the same row would wait for the first to be
released. Now effectively you have two levels of locks at the row level,
the weaker of which conflicts with the stronger but not with itself. The
thing about MVCC is that readers do not have to get either lock if they
aren't trying to prevent modifications.
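
The two lock levels described above can be sketched as a small compatibility matrix (a Python illustration under assumed names; these are not PostgreSQL's internal lock modes):

```python
# Two row-lock levels: the weaker conflicts with the stronger
# but not with itself.
SHARE, EXCLUSIVE = "share", "exclusive"

CONFLICTS = {
    (SHARE, SHARE): False,        # FK checks from many sessions coexist
    (SHARE, EXCLUSIVE): True,     # an update waits for share lockers
    (EXCLUSIVE, SHARE): True,     # a check waits for an in-flight update
    (EXCLUSIVE, EXCLUSIVE): True,
}

def must_wait(held, requested):
    """True if a request for `requested` blocks behind a held `held` lock."""
    return CONFLICTS[(held, requested)]

# Before shared row locks, every FK check effectively took the
# exclusive level, so two loaders referencing the same parent row
# collided:
print(must_wait(EXCLUSIVE, EXCLUSIVE))  # True
# With shared locks, concurrent FK checks coexist:
print(must_wait(SHARE, SHARE))          # False
```

This is why the 8.1 change removes the deadlock in the A/B/C load scenario: the FK checks on C now take only the weaker level, which does not conflict with itself.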

#7 Wes
wespvp@syntegra.com
In reply to: Stephan Szabo (#6)
Re: Deadlock Detected (revisited)

On 11/25/05 9:12 PM, "Stephan Szabo" <sszabo@megazone.bigpanda.com> wrote:

The latter, the row locks before were single owner and were such that a
second lock request for the same row would wait for the first to be
released. Now effectively you have two levels of locks at the row level,
the weaker of which conflicts with the stronger but not with itself.

Now I understand. Thanks for clarifying.

It sounds like there may be some performance improvements that will be
beneficial to us also. Hopefully I can convince them to agree to the 8.1
upgrade. Besides verifying our application, it takes a full weekend to do
the dump/load - the database is just under a billion rows.

Wes

#8 Wes
wespvp@syntegra.com
In reply to: Wes (#7)
Excessive vacuum times

Some time ago I reported a problem where, right after an index rebuild, a
vacuum might take a couple of hours but gets slower and slower, at a
seemingly increasing rate, as time goes on. What took a couple of hours
after a full index rebuild now takes 12 hours or more.

The problem was determined to be due to the fact that indexes are vacuumed
in index order, not in disk storage order. I don't see anything about this
in the "What's new" for 8.1. Has anything been done to resolve this?

Scheduled index rebuilds are not feasible. I really don't even want to take
a chance on starting one. The database is just a hair under one billion
rows, and could take the entire weekend or more to rebuild.

Wes

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Wes (#8)
Re: Excessive vacuum times

Wes <wespvp@syntegra.com> writes:

The problem was determined to be due to the fact that indexes are vacuumed
in index order, not in disk storage order. I don't see anything about this
in the "What's new" for 8.1. Has anything been done to resolve this?

No. Avoiding that would require a new approach to
vacuum-vs-ordinary-indexscan interlocking, so it won't happen until
someone has a Bright Idea (tm).

regards, tom lane

#10 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#9)
Re: Excessive vacuum times

On Mon, Dec 12, 2005 at 06:26:37PM -0500, Tom Lane wrote:

Wes <wespvp@syntegra.com> writes:

The problem was determined to be due to the fact that indexes are vacuumed
in index order, not in disk storage order. I don't see anything about this
in the "What's new" for 8.1. Has anything been done to resolve this?

No. Avoiding that would require a new approach to
vacuum-vs-ordinary-indexscan interlocking, so it won't happen until
someone has a Bright Idea (tm).

Plus there is a TODO to only vacuum pages that are known to have dead
tuples, which should hopefully mean no more index-scans during vacuum as
well. Hopefully this makes it into 8.2...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#10)
Re: Excessive vacuum times

"Jim C. Nasby" <jnasby@pervasive.com> writes:

On Mon, Dec 12, 2005 at 06:26:37PM -0500, Tom Lane wrote:

No. Avoiding that would require a new approach to
vacuum-vs-ordinary-indexscan interlocking, so it won't happen until
someone has a Bright Idea (tm).

Plus there is a TODO to only vacuum pages that are known to have dead
tuples, which should hopefully mean no more index-scans during vacuum as
well.

No such luck. You delete any tuples, you need to scan the indexes.

regards, tom lane

#12 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#11)
Re: Excessive vacuum times

On Mon, Dec 12, 2005 at 11:09:01PM -0500, Tom Lane wrote:

"Jim C. Nasby" <jnasby@pervasive.com> writes:

On Mon, Dec 12, 2005 at 06:26:37PM -0500, Tom Lane wrote:

No. Avoiding that would require a new approach to
vacuum-vs-ordinary-indexscan interlocking, so it won't happen until
someone has a Bright Idea (tm).

Plus there is a TODO to only vacuum pages that are known to have dead
tuples, which should hopefully mean no more index-scans during vacuum as
well.

No such luck. You delete any tuples, you need to scan the indexes.

Even though you can see what the index values were for the now-dead
tuple?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#13 Wes
wespvp@syntegra.com
In reply to: Tom Lane (#9)
Re: Excessive vacuum times

On 12/12/05 5:26 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

The problem was determined to be due to the fact that indexes are vacuumed
in index order, not in disk storage order. I don't see anything about this
in the "What's new" for 8.1. Has anything been done to resolve this?

No. Avoiding that would require a new approach to
vacuum-vs-ordinary-indexscan interlocking, so it won't happen until
someone has a Bright Idea (tm).

Any ideas on how I might reconfigure to mitigate the issue? Separating
the most offending indexes onto separate drives probably isn't an option.

Wes

#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Wes (#13)
Re: Excessive vacuum times

Wes <wespvp@syntegra.com> writes:

Any ideas on how I might reconfigure to mitigate the issue? Separating
the most offending indexes onto separate drives probably isn't an option.

What are you using for vacuum_mem? A larger value should reduce the
number of times we have to scan the indexes of a large table.

regards, tom lane
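
The reason a larger vacuum_mem reduces index scans can be put in back-of-envelope terms: vacuum collects dead-tuple identifiers in memory and must rescan every index once per full batch. A rough Python sketch, assuming about 6 bytes per tuple identifier (an illustrative figure, not a documented constant):

```python
import math

def index_scan_passes(dead_tuples, vacuum_mem_kb, tid_bytes=6):
    """Estimate how many times vacuum must scan each index.

    Vacuum batches dead-tuple identifiers in vacuum_mem and rescans
    the indexes once per full batch; tid_bytes=6 is an assumed
    per-entry cost used purely for illustration."""
    per_pass = (vacuum_mem_kb * 1024) // tid_bytes
    return max(1, math.ceil(dead_tuples / per_pass))

# 50 million dead rows: a small 8 MB setting versus 256 MB.
print(index_scan_passes(50_000_000, 8 * 1024))    # dozens of passes
print(index_scan_passes(50_000_000, 256 * 1024))  # a couple of passes
```

On a table approaching a billion rows, sizing vacuum_mem so a typical round of dead tuples fits in one batch keeps the slow in-index-order scans to a single pass per index.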

#15 Wes
wespvp@syntegra.com
In reply to: Tom Lane (#11)
./configure --with-openssl=path fails

Because of our build environment, we need to build PostgreSQL specifying the
location of the OpenSSL libraries, to ensure everyone is using the same
libraries regardless of the system the software is built and executed on. At
7.4.5, we used:

./configure --with-openssl=path

This no longer works with 8.x. See:

<http://archives.postgresql.org/pgsql-committers/2003-11/msg00278.php>

What is the correct way to work around this restriction? Set
LD_LIBRARY_PATH?

This is on a variety of unix platforms.

Wes

#16 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Wes (#15)
Re: ./configure --with-openssl=path fails

Wes <wespvp@syntegra.com> writes:

./configure --with-openssl=path
This no longer works with 8.x. See:
<http://archives.postgresql.org/pgsql-committers/2003-11/msg00278.php>
What is the correct way to work around this restriction?

Use --with-includes and --with-libraries as needed.

regards, tom lane

#17 Wes
wespvp@syntegra.com
In reply to: Tom Lane (#16)
Re: ./configure --with-openssl=path fails

On 1/6/06 2:47 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

./configure --with-openssl=path
This no longer works with 8.x. See:
<http://archives.postgresql.org/pgsql-committers/2003-11/msg00278.php>
What is the correct way to work around this restriction?

Use --with-includes and --with-libraries as needed.

That allows me to build, but doesn't set the library search path in the
binaries ("-Wl,-rpath /blah/blah" on linux or "-R /blah/blah" for Solaris).
This prevents programs from loading, since they can't find the libraries if
they are not in the default library search path of the user.

Wes

#18 Wes
wespvp@syntegra.com
In reply to: Tom Lane (#16)
Re: ./configure --with-openssl=path fails

On 1/6/06 2:47 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

./configure --with-openssl=path
This no longer works with 8.x. See:
<http://archives.postgresql.org/pgsql-committers/2003-11/msg00278.php>
What is the correct way to work around this restriction?

Use --with-includes and --with-libraries as needed.

That doesn't get the library paths into the binary. If the libraries are
not in the default system search path, the user is screwed. Is there a way
to solve this? I think the previous --with-openssl=path set the search path
so the libraries would always be found.

Wes

#19 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Wes (#18)
Re: ./configure --with-openssl=path fails

Wes <wespvp@syntegra.com> writes:

On 1/6/06 2:47 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

Use --with-includes and --with-libraries as needed.

That doesn't get the library paths into the binary. If the libraries are
not in the default system search path, the user is screwed. Is there a way
to solve this? I think the previous --with-openssl=path set the search path
so the libraries would always be found.

No, it didn't. Really the best solution to this is to not have commonly
used libraries that aren't in the system search path ...

regards, tom lane