Index creation takes for ever

ohp@pyrenet.fr

almost 23 years ago

hackerspatches

Hi every one,

I've tried to reindex one of my customer's table to gain some disk space.

I had to stop after 90 m cpu...

I've then pg_dump'ed the database and recreate an other both on 7.3.4 and
7.4b

Both are still running after more than 30 minutes of CPU (100% cpu taken)
creating the levt_lu_ligne_evt_key.

Disks don't do anything, just cpu.

Here are the info I have
test=# \d ligneevt
Table "public.ligne_evt"
Column | Type | Modifiers

-----------------+--------------------------+-----------------------------------
-------------
levt_cod | integer | not null default nextval('seq_levt
_cod'::text)
levt_tevt_cod | integer | not null
levt_date | timestamp with time zone | not null
levt_type_per1 | integer | not null
levt_perso_cod1 | integer | not null
levt_type_per2 | integer |
levt_perso_cod2 | integer |
levt_texte | text |
levt_lu | character varying(2) | default 'N'
levt_visible | character varying(2) |
levt_attaquant | integer |
levt_cible | integer |
Indexes: ligne_evt_pkey primary key btree (levt_cod),
levt_attaquant_ligne_evt_key btree (levt_attaquant),
levt_cible_ligne_evt_key btree (levt_cible),
levt_lu_ligne_evt_key btree (levt_lu),
levt_visible_ligne_evt_key btree (levt_visible),
ligne_evt_levt_cod_key btree (levt_cod),
ligne_evt_levt_date_key btree (levt_date),
ligne_evt_levt_perso_cod1_key btree (levt_perso_cod1),
ligne_evt_levt_perso_cod2_key btree (levt_perso_cod2),
ligne_evt_levt_tevt_cod_key btree (levt_tevt_cod),
ligne_evt_levt_type_per1_key btree (levt_type_per1),
ligne_evt_levt_type_per2_key btree (levt_type_per2)
Triggers: RI_ConstraintTrigger_5453038

test=# SELECT count(*) from ligne_evt ;
count
--------
230670
(1 row)

test=# SELECT levt_lu,count(*) from ligne_evt group by levt_lu;
levt_lu | count
---------+--------
N | 49435
O | 181242
(2 rows)

Can some one help me?

Regards
--
Olivier PRENANT Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou +33-5-61-50-97-01 (Fax)
31190 AUTERIVE +33-6-07-63-80-64 (GSM)
FRANCE Email: ohp@pyrenet.fr
------------------------------------------------------------------------------
Make your life a dream, make your dream a reality. (St Exupery)

tgl@sss.pgh.pa.us

almost 23 years ago

In reply to: Olivier PRENANT (#1)

hackerspatches

Re: Index creation takes for ever

ohp@pyrenet.fr writes:

I've then pg_dump'ed the database and recreate an other both on 7.3.4 and
7.4b

Both are still running after more than 30 minutes of CPU (100% cpu taken)
creating the levt_lu_ligne_evt_key.

That's hard to believe. I get

regression=# SELECT levt_lu,count(*) from ligne_evt group by levt_lu;
levt_lu | count
---------+--------
N | 49435
O | 181242
(2 rows)

Time: 6927.28 ms
regression=# create index levt_lu_ligne_evt_key on ligne_evt (levt_lu);
CREATE INDEX
Time: 14946.74 ms

on a not-very-fast machine ... and it seems to be mostly I/O bound.

What platform are you on? I could believe that the local qsort() is
incredibly inefficient with many equal keys. Another possibility is
that you're using a non-C locale and strcoll() is horribly slow.

regards, tom lane

ohp@pyrenet.fr

almost 23 years ago

In reply to: Tom Lane (#2)

hackerspatches

Re: Index creation takes for ever

Hi Tom,
This is on unixware 7 (both 7.3.4 and 7.4b)

I'm on the FR language (I'll re-initdb whith lang=C to see what happens)
But I never had a problem before.
Waiting for a reply, I drop the table after copying it, and arranged sql
script to create the indexes BEFORE the COPY

The overall process took only 5 Min.
On Thu, 28 Aug 2003, Tom Lane wrote:

Date: Thu, 28 Aug 2003 10:13:21 -0400
From: Tom Lane <tgl@sss.pgh.pa.us>
To: ohp@pyrenet.fr
Cc: pgsql-hackers list <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Index creation takes for ever

ohp@pyrenet.fr writes:

I've then pg_dump'ed the database and recreate an other both on 7.3.4 and
7.4b

Both are still running after more than 30 minutes of CPU (100% cpu taken)
creating the levt_lu_ligne_evt_key.

That's hard to believe. I get

regression=# SELECT levt_lu,count(*) from ligne_evt group by levt_lu;
levt_lu | count
---------+--------
N | 49435
O | 181242
(2 rows)

Time: 6927.28 ms
regression=# create index levt_lu_ligne_evt_key on ligne_evt (levt_lu);
CREATE INDEX
Time: 14946.74 ms

on a not-very-fast machine ... and it seems to be mostly I/O bound.

What platform are you on? I could believe that the local qsort() is
incredibly inefficient with many equal keys. Another possibility is
that you're using a non-C locale and strcoll() is horribly slow.

regards, tom lane

--
Olivier PRENANT Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou +33-5-61-50-97-01 (Fax)
31190 AUTERIVE +33-6-07-63-80-64 (GSM)
FRANCE Email: ohp@pyrenet.fr
------------------------------------------------------------------------------
Make your life a dream, make your dream a reality. (St Exupery)

tgl@sss.pgh.pa.us

almost 23 years ago

In reply to: Olivier PRENANT (#3)

hackerspatches

Re: Index creation takes for ever

ohp@pyrenet.fr writes:

This is on unixware 7 (both 7.3.4 and 7.4b)

I'm on the FR language (I'll re-initdb whith lang=C to see what happens)

Okay. If you find it's still slow in C locale, the next thing to try
would be forcing use of our own qsort, as we already do for Solaris.
You'd need to tweak this bit in configure.in:

# Solaris has a very slow qsort in certain cases, so we replace it.
case $host_os in solaris*)
AC_LIBOBJ(qsort) ;;
esac

I'm not sure why it's done that way though. 'Twould be better to let
the per-platform template files determine it.

regards, tom lane

ohp@pyrenet.fr

almost 23 years ago

In reply to: Tom Lane (#4)

hackerspatches

Re: Index creation takes for ever

Okay, so far,
I've reinitdb (on 7.4b) with LANG=C and it worked.
So I reinitDB with LANG=FR and used LANG=C to psql -f xxx.sql template1 to
recreate the db and it worked too...

I did'nt initdb between cvs changes, maybe that's why.

7.4b seems ok.
However, is there a way I can reindex on my 7.3.4 production
cluster cluster without an initdb?

would LANG=C psql -c"reindex table blah" db would help?
On Thu, 28 Aug 2003, Tom Lane wrote:

Date: Thu, 28 Aug 2003 10:38:26 -0400
From: Tom Lane <tgl@sss.pgh.pa.us>
To: ohp@pyrenet.fr
Cc: pgsql-hackers list <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Index creation takes for ever

ohp@pyrenet.fr writes:

This is on unixware 7 (both 7.3.4 and 7.4b)

I'm on the FR language (I'll re-initdb whith lang=C to see what happens)

Okay. If you find it's still slow in C locale, the next thing to try
would be forcing use of our own qsort, as we already do for Solaris.
You'd need to tweak this bit in configure.in:

# Solaris has a very slow qsort in certain cases, so we replace it.
case $host_os in solaris*)
AC_LIBOBJ(qsort) ;;
esac

I'm not sure why it's done that way though. 'Twould be better to let
the per-platform template files determine it.

regards, tom lane

tgl@sss.pgh.pa.us

almost 23 years ago

In reply to: Olivier PRENANT (#5)

hackerspatches

Re: Index creation takes for ever

ohp@pyrenet.fr writes:

I've reinitdb (on 7.4b) with LANG=C and it worked.
So I reinitDB with LANG=FR and used LANG=C to psql -f xxx.sql template1 to
recreate the db and it worked too...

That's weird. I don't understand why an initdb in the same locale would
make the problem go away.

I did'nt initdb between cvs changes, maybe that's why.

If you had only seen the problem in 7.4b then I could believe that theory,
but since you also saw it in the stable 7.3 installation, I don't.

would LANG=C psql -c"reindex table blah" db would help?

No, the LANG environment of psql wouldn't make the slightest difference
here.

I don't think you told us your platform? (OS version, kernel, compiler,
etc)

regards, tom lane

ohp@pyrenet.fr

almost 23 years ago

In reply to: Tom Lane (#6)

hackerspatches

Re: Index creation takes for ever

My platforms are Unixware 713.

Am I right to be afraid that I have to pg_dump and reload?
Stiil it's amazind, 7.3 has been up for months and I discover the proclem
today...
Well, I don't reindex that often...
On Thu, 28 Aug 2003, Tom Lane wrote:

Date: Thu, 28 Aug 2003 12:19:13 -0400
From: Tom Lane <tgl@sss.pgh.pa.us>
To: ohp@pyrenet.fr
Cc: pgsql-hackers list <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Index creation takes for ever

ohp@pyrenet.fr writes:

I've reinitdb (on 7.4b) with LANG=C and it worked.
So I reinitDB with LANG=FR and used LANG=C to psql -f xxx.sql template1 to
recreate the db and it worked too...

That's weird. I don't understand why an initdb in the same locale would
make the problem go away.

I did'nt initdb between cvs changes, maybe that's why.

If you had only seen the problem in 7.4b then I could believe that theory,
but since you also saw it in the stable 7.3 installation, I don't.

would LANG=C psql -c"reindex table blah" db would help?

No, the LANG environment of psql wouldn't make the slightest difference
here.

I don't think you told us your platform? (OS version, kernel, compiler,
etc)

regards, tom lane

bruce@momjian.us

almost 23 years ago

In reply to: Tom Lane (#4)

hackerspatches

Re: Index creation takes for ever

Tom Lane wrote:

ohp@pyrenet.fr writes:

This is on unixware 7 (both 7.3.4 and 7.4b)

I'm on the FR language (I'll re-initdb whith lang=C to see what happens)

Okay. If you find it's still slow in C locale, the next thing to try
would be forcing use of our own qsort, as we already do for Solaris.
You'd need to tweak this bit in configure.in:

# Solaris has a very slow qsort in certain cases, so we replace it.
case $host_os in solaris*)
AC_LIBOBJ(qsort) ;;
esac

I'm not sure why it's done that way though. 'Twould be better to let
the per-platform template files determine it.

The macro has to be in the configure.in file, but we could flag whether
the macro should be used in the template file. I have avoided adding a
variable just for one platform because it just adds an extra level of
abstraction for little benefit.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

ohp@pyrenet.fr

almost 23 years ago

In reply to: Bruce Momjian (#8)

hackerspatches

Re: Index creation takes for ever

Thanks to evryone that help on this one.
I reinitdb --locale=C and reloaded everything today.
That did the trick.
However, is there a way to test that strcol is really the culprit?
On Sat, 30 Aug 2003, Bruce Momjian wrote:

Date: Sat, 30 Aug 2003 11:11:11 -0400 (EDT)
From: Bruce Momjian <pgman@candle.pha.pa.us>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: ohp@pyrenet.fr, pgsql-hackers list <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Index creation takes for ever

Tom Lane wrote:

ohp@pyrenet.fr writes:

This is on unixware 7 (both 7.3.4 and 7.4b)

I'm on the FR language (I'll re-initdb whith lang=C to see what happens)

Okay. If you find it's still slow in C locale, the next thing to try
would be forcing use of our own qsort, as we already do for Solaris.
You'd need to tweak this bit in configure.in:

# Solaris has a very slow qsort in certain cases, so we replace it.
case $host_os in solaris*)
AC_LIBOBJ(qsort) ;;
esac

I'm not sure why it's done that way though. 'Twould be better to let
the per-platform template files determine it.

The macro has to be in the configure.in file, but we could flag whether
the macro should be used in the template file. I have avoided adding a
variable just for one platform because it just adds an extra level of
abstraction for little benefit.

#10

ohp@pyrenet.fr

almost 23 years ago

In reply to: Tom Lane (#2)

hackerspatches

Re: Index creation takes for ever

Tom,
Let me come back on this one:
I initdb --locale=C and reloaded all bases including this one. Index
creation wassn't too bad. So I thought that was it.

Yesturday evening I decided to make a test so I pg_dump'd that database,
created a test db and reloaded evrything from the pg_dump.

it took 69 minutes to finish, 75% of this time was devoted to create 2
indexes on varchar(2) with value being 'O', 'N' or null;

I wonder if it's a configuration matter.
PGSQL is VERY solicited here (10's thousand connections/day , multiple
queries/connection

Here's my postgresql.conf FWIW, Any advice ?

#
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
# name = value
#
# (The '=' is optional.) White space may be used. Comments are introduced
# with '#' anywhere on a line. The complete list of option names and
# allowed values can be found in the PostgreSQL documentation. The
# commented-out settings shown in this file represent the default values.
#
# Any option can also be given as a command line switch to the
# postmaster, e.g. 'postmaster -c log_connections=on'. Some options
# can be changed at run-time with the 'SET' SQL command.
#
# This file is read on postmaster startup and when the postmaster
# receives a SIGHUP. If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect, or use
# "pg_ctl reload".

#========================================================================

#
# Connection Parameters
#
tcpip_socket = true
#ssl = false

max_connections = 64
#superuser_reserved_connections = 2

port = 5432
hostname_lookup = true
#show_source_port = false

#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777 # octal

#virtual_host = ''

#krb_server_keyfile = ''

#
# Shared Memory Size
#
shared_buffers = 10000 # min max_connections*2 or 16, 8KB each
max_fsm_relations = 1000 # min 10, fsm is free space map, ~40 bytes
max_fsm_pages = 10000 # min 1000, fsm is free space map, ~6 bytes
#max_locks_per_transaction = 64 # min 10
#wal_buffers = 8 # min 4, typically 8KB each

#
# Non-shared Memory Sizes
#
sort_mem = 10240 # min 64, size in KB
#vacuum_mem = 8192 # min 1024, size in KB

#
# Write-ahead log (WAL)
#
#checkpoint_segments = 3 # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300 # range 30-3600, in seconds
#
#commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5 # range 1-1000
#
#fsync = true
#wal_sync_method = fsync # the default varies across platforms:
# # fsync, fdatasync, open_sync, or open_datasync
#wal_debug = 0 # range 0-16

#
# Optimizer Parameters
#
#enable_seqscan = true
#enable_indexscan = true
#enable_tidscan = true
#enable_sort = true
#enable_nestloop = true
#enable_mergejoin = true
#enable_hashjoin = true

effective_cache_size = 10000 # typically 8KB each
#random_page_cost = 4 # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01 # (same)
#cpu_index_tuple_cost = 0.001 # (same)
#cpu_operator_cost = 0.0025 # (same)

#default_statistics_target = 10 # range 1-1000

#
# GEQO Optimizer Parameters
#
#geqo = true
#geqo_selection_bias = 2.0 # range 1.5-2.0
#geqo_threshold = 11
#geqo_pool_size = 0 # default based on tables in statement,
# range 128-1024
#geqo_effort = 1
#geqo_generations = 0
#geqo_random_seed = -1 # auto-compute seed

#
# Message display
#
#server_min_messages = notice # Values, in order of decreasing detail:
# debug5, debug4, debug3, debug2, debug1,
# info, notice, warning, error, log, fatal,
# panic
#client_min_messages = notice # Values, in order of decreasing detail:
# debug5, debug4, debug3, debug2, debug1,
# log, info, notice, warning, error
#silent_mode = false

log_connections = true
log_pid = true
log_statement = true
log_duration = true
#log_timestamp = false

#log_min_error_statement = error # Values in order of increasing severity:
# debug5, debug4, debug3, debug2, debug1,
# info, notice, warning, error, panic(off)

#debug_print_parse = false
#debug_print_rewritten = false
#debug_print_plan = false
debug_pretty_print = false

#explain_pretty_print = true

# requires USE_ASSERT_CHECKING
#debug_assertions = true

#
# Syslog
#
syslog = 2 # range 0-2
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'

#
# Statistics
#
#show_parser_stats = false
#show_planner_stats = false
#show_executor_stats = false
#show_statement_stats = false

# requires BTREE_BUILD_STATS
#show_btree_build_stats = false

#
# Access statistics collection
#
#stats_start_collector = true
#stats_reset_on_server_start = true
stats_command_string = true
stats_row_level = true
#stats_block_level = false

#
# Lock Tracing
#
#trace_notify = false

# requires LOCK_DEBUG
#trace_locks = false
#trace_userlocks = false
#trace_lwlocks = false
#debug_deadlocks = false
#trace_lock_oidmin = 16384
#trace_lock_table = 0

#
# Misc
#
#autocommit = true
#dynamic_library_path = '$libdir'
#search_path = '$user,public'
datestyle = 'postgres, european'
#timezone = unknown # actually, defaults to TZ environment setting
#australian_timezones = false
#client_encoding = sql_ascii # actually, defaults to database encoding
#authentication_timeout = 60 # 1-600, in seconds
#deadlock_timeout = 1000 # in milliseconds
#default_transaction_isolation = 'read committed'
#max_expr_depth = 10000 # min 10
#max_files_per_process = 1000 # min 25
#password_encryption = true
#sql_inheritance = true
#transform_null_equals = false
#statement_timeout = 0 # 0 is disabled, in milliseconds
#db_user_namespace = false

#
# Local Settings
LC_MESSAGES = 'fr_FR'
LC_MONETARY = 'fr_FR'

The machine has 1 G DDR ECC Registred RAM, 2 1,8 GZ XEON running uw713
PGversion is 7.3.4, macine is not swaping.

Those index creation took 100% of 1 CPU.

Regards
On Thu, 28 Aug 2003, Tom Lane wrote:

Date: Thu, 28 Aug 2003 10:13:21 -0400
From: Tom Lane <tgl@sss.pgh.pa.us>
To: ohp@pyrenet.fr
Cc: pgsql-hackers list <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Index creation takes for ever

ohp@pyrenet.fr writes:

I've then pg_dump'ed the database and recreate an other both on 7.3.4 and
7.4b

Both are still running after more than 30 minutes of CPU (100% cpu taken)
creating the levt_lu_ligne_evt_key.

That's hard to believe. I get

regression=# SELECT levt_lu,count(*) from ligne_evt group by levt_lu;
levt_lu | count
---------+--------
N | 49435
O | 181242
(2 rows)

Time: 6927.28 ms
regression=# create index levt_lu_ligne_evt_key on ligne_evt (levt_lu);
CREATE INDEX
Time: 14946.74 ms

on a not-very-fast machine ... and it seems to be mostly I/O bound.

What platform are you on? I could believe that the local qsort() is
incredibly inefficient with many equal keys. Another possibility is
that you're using a non-C locale and strcoll() is horribly slow.

regards, tom lane

#11

tgl@sss.pgh.pa.us

almost 23 years ago

In reply to: Olivier PRENANT (#10)

hackerspatches

Re: Index creation takes for ever

ohp@pyrenet.fr writes:

it took 69 minutes to finish, 75% of this time was devoted to create 2
indexes on varchar(2) with value being 'O', 'N' or null;
I wonder if it's a configuration matter.

I still say it's either strcoll or qsort's fault. Try swapping in our
own version of qsort to see if the behavior changes.

regards, tom lane

#12

ohp@pyrenet.fr

almost 23 years ago

In reply to: Tom Lane (#11)

hackerspatches

Re: Index creation takes for ever

Hi Tom,
I've made some tests with your qsort and it DEFINITIVLY help
~3 mn instead of 69.

However this is for 7.3.4 I've got no probs with 7.4b.
Did something change in btree creation?
On Mon, 1 Sep 2003, Tom Lane wrote:

Date: Mon, 01 Sep 2003 08:46:09 -0400
From: Tom Lane <tgl@sss.pgh.pa.us>
To: ohp@pyrenet.fr
Cc: pgsql-hackers list <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] Index creation takes for ever

ohp@pyrenet.fr writes:

it took 69 minutes to finish, 75% of this time was devoted to create 2
indexes on varchar(2) with value being 'O', 'N' or null;
I wonder if it's a configuration matter.

I still say it's either strcoll or qsort's fault. Try swapping in our
own version of qsort to see if the behavior changes.

regards, tom lane

#13

tgl@sss.pgh.pa.us

almost 23 years ago

In reply to: Olivier PRENANT (#12)

hackerspatches

Re: Index creation takes for ever

ohp@pyrenet.fr writes:

I've made some tests with your qsort and it DEFINITIVLY help
~3 mn instead of 69.

However this is for 7.3.4 I've got no probs with 7.4b.
Did something change in btree creation?

Hmm, I wouldn't have thought so, but perhaps we did change something
that would affect this. You might need to burrow down as far as seeing
exactly what qsort calls are being made in each version ...

regards, tom lane

#14

Manfred Koizar

mkoi-pg@aon.at

almost 23 years ago

In reply to: Tom Lane (#11)

hackerspatches

Re: [HACKERS] Index creation takes for ever

On Mon, 01 Sep 2003 08:46:09 -0400, Tom Lane <tgl@sss.pgh.pa.us>
wrote:

ohp@pyrenet.fr writes:

it took 69 minutes to finish, 75% of this time was devoted to create 2
indexes on varchar(2) with value being 'O', 'N' or null;

I still say it's either strcoll or qsort's fault.

If qsort is to blame, then maybe this patch could help. It sorts
equal key values on item pointer. And if it doesn't help index
creation speed, at least the resulting index has better correlation.

Test script:
CREATE TABLE t (i int NOT NULL, t text NOT NULL);
INSERT INTO t VALUES (1, 'lajshdflasjhdflajhsdfljhasdlfjhasdf');
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t VALUES (100, 's,dmfa.,smdn.famsndfamdnsbfmansdbf');
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
ANALYZE t;
CREATE INDEX t_i ON t(i);
SET enable_seqscan = 0;
SELECT ctid FROM t WHERE i=100 LIMIT 10;

Result without patch:
ctid
----------
(153,14)
(306,23)
(305,80)
(152,91)
(76,68)
(38,34)
(153,34)
(305,50)
(9,62)
(305,40)
(10 rows)

Result with patch:
ctid
--------
(0,5)
(0,10)
(0,15)
(0,20)
(0,25)
(0,30)
(0,35)
(0,40)
(0,45)
(0,50)
(10 rows)

For testing purposes I have made a second patch that provides a
boolean GUC variable sort_index. It is available here:
http://www.pivot.at/pg/23.test-IdxTupleSort.diff

Servus
Manfred

#15

bruce@momjian.us

almost 23 years ago

In reply to: Manfred Koizar (#14)

hackerspatches

Re: [HACKERS] Index creation takes for ever

I assume this completes this TODO:

* Order duplicate index entries by tid for faster heap lookups

and you will submit it for 7.5? If you want to post it now, I can get
it into the 7.5 queue so we don't forget it.

---------------------------------------------------------------------------

Manfred Koizar wrote:

On Mon, 01 Sep 2003 08:46:09 -0400, Tom Lane <tgl@sss.pgh.pa.us>
wrote:

ohp@pyrenet.fr writes:

it took 69 minutes to finish, 75% of this time was devoted to create 2
indexes on varchar(2) with value being 'O', 'N' or null;

I still say it's either strcoll or qsort's fault.

If qsort is to blame, then maybe this patch could help. It sorts
equal key values on item pointer. And if it doesn't help index
creation speed, at least the resulting index has better correlation.

Test script:
CREATE TABLE t (i int NOT NULL, t text NOT NULL);
INSERT INTO t VALUES (1, 'lajshdflasjhdflajhsdfljhasdlfjhasdf');
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t VALUES (100, 's,dmfa.,smdn.famsndfamdnsbfmansdbf');
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;
ANALYZE t;
CREATE INDEX t_i ON t(i);
SET enable_seqscan = 0;
SELECT ctid FROM t WHERE i=100 LIMIT 10;

Result without patch:
ctid
----------
(153,14)
(306,23)
(305,80)
(152,91)
(76,68)
(38,34)
(153,34)
(305,50)
(9,62)
(305,40)
(10 rows)

Result with patch:
ctid
--------
(0,5)
(0,10)
(0,15)
(0,20)
(0,25)
(0,30)
(0,35)
(0,40)
(0,45)
(0,50)
(10 rows)

For testing purposes I have made a second patch that provides a
boolean GUC variable sort_index. It is available here:
http://www.pivot.at/pg/23.test-IdxTupleSort.diff

Servus
Manfred

diff -ruN ../base/src/backend/utils/sort/tuplesort.c src/backend/utils/sort/tuplesort.c
--- ../base/src/backend/utils/sort/tuplesort.c	2003-08-17 21:58:06.000000000 +0200
+++ src/backend/utils/sort/tuplesort.c	2003-09-05 10:04:22.000000000 +0200
@@ -2071,6 +2071,33 @@
(errcode(ERRCODE_UNIQUE_VIOLATION),
errmsg("could not create unique index"),
errdetail("Table contains duplicated values.")));
+	else
+	{
+		/*
+		 * If key values are equal, we sort on ItemPointer.  This might help
+		 * for some bad qsort implementation having performance problems
+		 * with many equal items.  OTOH I wouldn't trust such a weak qsort
+		 * to handle pre-sorted sequences very well ...
+		 *
+		 * Anyway, this code doesn't hurt much, and it helps produce indices
+		 * with better index correlation which is a good thing per se.
+		 */
+		ItemPointer tid1 = &tuple1->t_tid;
+		ItemPointer tid2 = &tuple2->t_tid;
+		BlockNumber blk1 = ItemPointerGetBlockNumber(tid1);
+		BlockNumber blk2 = ItemPointerGetBlockNumber(tid2);
+		
+		if (blk1 != blk2)
+			return (blk1 < blk2) ? -1 : 1;
+		else
+		{
+			OffsetNumber pos1 = ItemPointerGetOffsetNumber(tid1);
+			OffsetNumber pos2 = ItemPointerGetOffsetNumber(tid2);
+			
+			if (pos1 != pos2)
+				return (pos1 < pos2) ? -1 : 1;
+		}
+	}

return 0;
}

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#16

tgl@sss.pgh.pa.us

almost 23 years ago

In reply to: Bruce Momjian (#15)

hackerspatches

Re: [HACKERS] Index creation takes for ever

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I assume this completes this TODO:
* Order duplicate index entries by tid for faster heap lookups

I don't know why that TODO entry exists, but I think the idea is
counterproductive. The existing btree code will tend to put newer
versions of a row earlier (because it puts a new entry in front of any
with duplicate keys), which usually reduces the time spent skipping dead
rows. Forcing tid ordering will cost us more in dead-row skipping than
it's likely to save elsewhere.

regards, tom lane

#17

bruce@momjian.us

almost 23 years ago

In reply to: Tom Lane (#16)

hackerspatches

Re: [HACKERS] Index creation takes for ever

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I assume this completes this TODO:
* Order duplicate index entries by tid for faster heap lookups

I don't know why that TODO entry exists, but I think the idea is
counterproductive. The existing btree code will tend to put newer
versions of a row earlier (because it puts a new entry in front of any
with duplicate keys), which usually reduces the time spent skipping dead
rows. Forcing tid ordering will cost us more in dead-row skipping than
it's likely to save elsewhere.

I assume you are talking about a unique index that probably only has a
few non-expired rows (in which case the newer rows first is better).
The TODO deals with cases where you have lots of valid duplicate index
rows, and you want to spin through all the matching rows in heap order
rather than randomly. This is related to our CLUSTER capability. The
idea originally came from Vadim.

At what point does this patch do the sorting?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#18

tgl@sss.pgh.pa.us

almost 23 years ago

In reply to: Bruce Momjian (#17)

hackerspatches

Re: [HACKERS] Index creation takes for ever

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

* Order duplicate index entries by tid for faster heap lookups

I don't know why that TODO entry exists, but I think the idea is
counterproductive.

I assume you are talking about a unique index that probably only has a
few non-expired rows (in which case the newer rows first is better).
The TODO deals with cases where you have lots of valid duplicate index
rows, and you want to spin through all the matching rows in heap order
rather than randomly.

Maybe so, but it would degrade the performance in the unique-index case
if we do it as the TODO is worded.

My own opinion is that the bitmap-index-lookup approach will be superior
to trying to keep the index entries in TID order. (That's the idea
we've been discussing for awhile of separating the heap-fetch stage from
the index-scan stage: scan the index, make a sparse bitmap of the TIDs
we need to visit, possibly AND or OR this bitmap with maps derived from
other indexes, and finally visit the rows in heap order.)

regards, tom lane

#19

bruce@momjian.us

almost 23 years ago

In reply to: Tom Lane (#18)

hackerspatches

Re: [HACKERS] Index creation takes for ever

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

* Order duplicate index entries by tid for faster heap lookups

I don't know why that TODO entry exists, but I think the idea is
counterproductive.

I assume you are talking about a unique index that probably only has a
few non-expired rows (in which case the newer rows first is better).
The TODO deals with cases where you have lots of valid duplicate index
rows, and you want to spin through all the matching rows in heap order
rather than randomly.

Maybe so, but it would degrade the performance in the unique-index case
if we do it as the TODO is worded.

Yes, the wording is just a guide.

My own opinion is that the bitmap-index-lookup approach will be superior
to trying to keep the index entries in TID order. (That's the idea
we've been discussing for awhile of separating the heap-fetch stage from
the index-scan stage: scan the index, make a sparse bitmap of the TIDs
we need to visit, possibly AND or OR this bitmap with maps derived from
other indexes, and finally visit the rows in heap order.)

Oh, yes.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#20

Manfred Koizar

mkoi-pg@aon.at

almost 23 years ago

In reply to: Bruce Momjian (#15)

hackerspatches

Re: [HACKERS] Index creation takes for ever

On Sun, 7 Sep 2003 11:43:42 -0400 (EDT), Bruce Momjian
<pgman@candle.pha.pa.us> wrote:

I assume this completes this TODO:

* Order duplicate index entries by tid for faster heap lookups

I don't think so, because the patch does nothing to keep the sort
order once the index is initially created.

If you want to post it now, [...]

I did already post it. It's only the last page or so of the original
message. The link in that message points to a testing aid which is
not part of what I would like to see committed.

Servus
Manfred

#21

Manfred Koizar

mkoi-pg@aon.at

almost 23 years ago

In reply to: Tom Lane (#18)

hackerspatches

#22