Index on integer or on string field

Started by Jobalmost 11 years ago9 messagesgeneral
Jump to latest
#1Job
Job@colliniconsulting.it

Hello,

i have a table of about 10 millions of records, with the index on a string field.
Actually is alphabetical; since queries are about 100/200 per seconds, i was looking for a better way to improve performance and reduce workload.

The unique values, of that fields, are about the 50 (category name), and we could create a second table to codify, with numerical integer values, the 50 recurring names.

Is index are integer and not characteral, performance are better and workload reduces?

Is there any comparisons?

Thank you!
Francesco

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Arthur Silva
arthurprs@gmail.com
In reply to: Job (#1)
Re: Index on integer or on string field

You should probably experiment with a btree-gin index on those.
Em 15/05/2015 12:22, "Job" <Job@colliniconsulting.it> escreveu:

Show quoted text

Hello,

i have a table of about 10 millions of records, with the index on a string
field.
Actually is alphabetical; since queries are about 100/200 per seconds, i
was looking for a better way to improve performance and reduce workload.

The unique values, of that fields, are about the 50 (category name), and
we could create a second table to codify, with numerical integer values,
the 50 recurring names.

Is index are integer and not characteral, performance are better and
workload reduces?

Is there any comparisons?

Thank you!
Francesco

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Job
Job@colliniconsulting.it
In reply to: Arthur Silva (#2)
R: Index on integer or on string field

Hello Arthur!

So, i read that btree-gin have got "the ability to enforce uniqueness".

If in this 10.millions long table i have, in index, 50 recurring values, i can leave the alphabetical field and change to btree-gin the index on it?!

Thank you!
Francesco

________________________________
Da: Arthur Silva [arthurprs@gmail.com]
Inviato: venerdì 15 maggio 2015 17.26
A: Job
Cc: pgsql-general@postgresql.org
Oggetto: Re: [GENERAL] Index on integer or on string field

You should probably experiment with a btree-gin index on those.

Em 15/05/2015 12:22, "Job" <Job@colliniconsulting.it<mailto:Job@colliniconsulting.it>> escreveu:
Hello,

i have a table of about 10 millions of records, with the index on a string field.
Actually is alphabetical; since queries are about 100/200 per seconds, i was looking for a better way to improve performance and reduce workload.

The unique values, of that fields, are about the 50 (category name), and we could create a second table to codify, with numerical integer values, the 50 recurring names.

Is index are integer and not characteral, performance are better and workload reduces?

Is there any comparisons?

Thank you!
Francesco

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org<mailto:pgsql-general@postgresql.org>)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Rob Sargent
robjsargent@gmail.com
In reply to: Job (#3)
Re: R: Index on integer or on string field

Are you saying your indexed field has only 50 distinct values? Seems a horrible candidate for an index. Might be good to partition on those fifty values but ten million records probably doesn't warrant that.

Sent from my iPhone

Show quoted text

On May 15, 2015, at 9:34 AM, Job <Job@colliniconsulting.it> wrote:

Hello Arthur!

So, i read that btree-gin have got "the ability to enforce uniqueness".

If in this 10.millions long table i have, in index, 50 recurring values, i can leave the alphabetical field and change to btree-gin the index on it?!

Thank you!
Francesco

Da: Arthur Silva [arthurprs@gmail.com]
Inviato: venerdì 15 maggio 2015 17.26
A: Job
Cc: pgsql-general@postgresql.org
Oggetto: Re: [GENERAL] Index on integer or on string field

You should probably experiment with a btree-gin index on those.
Em 15/05/2015 12:22, "Job" <Job@colliniconsulting.it> escreveu:

Hello,

i have a table of about 10 millions of records, with the index on a string field.
Actually is alphabetical; since queries are about 100/200 per seconds, i was looking for a better way to improve performance and reduce workload.

The unique values, of that fields, are about the 50 (category name), and we could create a second table to codify, with numerical integer values, the 50 recurring names.

Is index are integer and not characteral, performance are better and workload reduces?

Is there any comparisons?

Thank you!
Francesco

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Arthur Silva
arthurprs@gmail.com
In reply to: Job (#3)
Re: R: Index on integer or on string field

Yes that's my suggestion. Btree-Gin deals with lots of repeated values much
better than the Btree index as repeated keys are only stored once.
Em 15/05/2015 12:38, "Job" <Job@colliniconsulting.it> escreveu:

Show quoted text

Hello Arthur!

So, i read that btree-gin have got "the ability to enforce uniqueness".

If in this 10.millions long table i have, in index, 50 recurring values, i
can leave the alphabetical field and change to btree-gin the index on it?!

Thank you!
Francesco

------------------------------
*Da:* Arthur Silva [arthurprs@gmail.com]
*Inviato:* venerdì 15 maggio 2015 17.26
*A:* Job
*Cc:* pgsql-general@postgresql.org
*Oggetto:* Re: [GENERAL] Index on integer or on string field

You should probably experiment with a btree-gin index on those.
Em 15/05/2015 12:22, "Job" <Job@colliniconsulting.it> escreveu:

Hello,

i have a table of about 10 millions of records, with the index on a
string field.
Actually is alphabetical; since queries are about 100/200 per seconds, i
was looking for a better way to improve performance and reduce workload.

The unique values, of that fields, are about the 50 (category name), and
we could create a second table to codify, with numerical integer values,
the 50 recurring names.

Is index are integer and not characteral, performance are better and
workload reduces?

Is there any comparisons?

Thank you!
Francesco

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6William Dunn
dunnwjr@gmail.com
In reply to: Arthur Silva (#5)
Re: R: Index on integer or on string field

Hello Francesco,

You should probably set timing on, run an explain analyze, and use pgbadger
to diagnose your performance issue.

While it may be the case that comparison in the index might be slightly
faster because of the modulo arithmetic, those in-memory operations are
extremely fast and it is likely that the seek in that index is the fastest
part of your query. And since you only have 50 distinct values the btree is
probably extremely shallow and there will be very few comparisons anyway.

I don't know much about your query but I suspect that the issue is that
your index scan is not selective enough so Postgres needs to scan a lot
from disk (which is *extremely* slow). If you want to improve the
performance you should first try to make the query as selective as
possible, and try to put an index on a more selective column of the WHERE
clause.

If you really must rely primarily on that column which has only 50 distinct
values you can try periodically running a CLUSTER command on the table for
that column index (doc:
http://www.postgresql.org/docs/devel/static/sql-cluster.html), or partition
the table (doc:
http://www.postgresql.org/docs/devel/static/ddl-partitioning.html) so that
the data you are scanning is close together on disk and you can get as much
of it per IO operation as possible.

*Will J. Dunn*
*willjdunn.com <http://willjdunn.com&gt;*

On Fri, May 15, 2015 at 1:32 PM, Arthur Silva <arthurprs@gmail.com> wrote:

Show quoted text

Yes that's my suggestion. Btree-Gin deals with lots of repeated values
much better than the Btree index as repeated keys are only stored once.
Em 15/05/2015 12:38, "Job" <Job@colliniconsulting.it> escreveu:

Hello Arthur!

So, i read that btree-gin have got "the ability to enforce uniqueness".

If in this 10.millions long table i have, in index, 50 recurring values,
i can leave the alphabetical field and change to btree-gin the index on it?!

Thank you!
Francesco

------------------------------
*Da:* Arthur Silva [arthurprs@gmail.com]
*Inviato:* venerdì 15 maggio 2015 17.26
*A:* Job
*Cc:* pgsql-general@postgresql.org
*Oggetto:* Re: [GENERAL] Index on integer or on string field

You should probably experiment with a btree-gin index on those.
Em 15/05/2015 12:22, "Job" <Job@colliniconsulting.it> escreveu:

Hello,

i have a table of about 10 millions of records, with the index on a
string field.
Actually is alphabetical; since queries are about 100/200 per seconds, i
was looking for a better way to improve performance and reduce workload.

The unique values, of that fields, are about the 50 (category name), and
we could create a second table to codify, with numerical integer values,
the 50 recurring names.

Is index are integer and not characteral, performance are better and
workload reduces?

Is there any comparisons?

Thank you!
Francesco

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Scott Marlowe
scott.marlowe@gmail.com
In reply to: Job (#1)
Re: Index on integer or on string field

On Fri, May 15, 2015 at 9:18 AM, Job <Job@colliniconsulting.it> wrote:

Hello,

i have a table of about 10 millions of records, with the index on a string field.
Actually is alphabetical; since queries are about 100/200 per seconds, i was looking for a better way to improve performance and reduce workload.

The unique values, of that fields, are about the 50 (category name), and we could create a second table to codify, with numerical integer values, the 50 recurring names.

Is index are integer and not characteral, performance are better and workload reduces?

Is there any comparisons?

Have you considered using a multi-column index here? if there's a more
selective field you could index along with your rather non-selective
one that might work better. But it's hard to tell without looking at
you database usage etc.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8FarjadFarid(ChkNet)
farjad.farid@checknetworks.com
In reply to: Job (#1)
Re: Index on integer or on string field

Hi,

My approach would be to improve the uniqueness of each record/row. Otherwise
you'll have to traverse the entire table for every query. At 100/200 queries
per second you are asking for trouble on several fronts. Including crashing
your hard disk faster than need be.

Hope this helps. Good luck.

Best Regards

Farjad Farid

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Job
Sent: 15 May 2015 16:19
To: pgsql-general@postgresql.org
Subject: [GENERAL] Index on integer or on string field

Hello,

i have a table of about 10 millions of records, with the index on a string
field.
Actually is alphabetical; since queries are about 100/200 per seconds, i was
looking for a better way to improve performance and reduce workload.

The unique values, of that fields, are about the 50 (category name), and we
could create a second table to codify, with numerical integer values, the 50
recurring names.

Is index are integer and not characteral, performance are better and
workload reduces?

Is there any comparisons?

Thank you!
Francesco

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make
changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Yves Dorfsman
yves@zioup.com
In reply to: Scott Marlowe (#7)
Re: Index on integer or on string field

On 2015-05-15 18:31, Scott Marlowe wrote:

On Fri, May 15, 2015 at 9:18 AM, Job <Job@colliniconsulting.it> wrote:

i have a table of about 10 millions of records, with the index on a string field.
Actually is alphabetical; since queries are about 100/200 per seconds, i was looking for a better way to improve performance and reduce workload.

The unique values, of that fields, are about the 50 (category name), and we could create a second table to codify, with numerical integer values, the 50 recurring names.

Is index are integer and not characteral, performance are better and workload reduces?

Is there any comparisons?

Have you considered using a multi-column index here? if there's a more
selective field you could index along with your rather non-selective
one that might work better. But it's hard to tell without looking at
you database usage etc.

To add to Scott suggestion, think about what columns you include in your
queries when you are looking for a unique row (what columns are on the right
hand side of the WHEN clause). This will help you narrow down what makes
unique conditions on your table, from which you can create a better index.

--
http://yves.zioup.com
gpg: 4096R/32B0F416

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general