how to make this database / query faster

Started by markabout 18 years ago11 messagesgeneral
Jump to latest
#1mark
markkicks@gmail.com

Hi
I use postgres v 8.3 on a dual quad core, intel xeon e5405@2.00GHz, fedora
core 8 x86_64, and 32GB RAM

settings i changed on postgresql.conf:

shared_buffers = 1000MB # min 128kB or max_connections*16kB
effective_cache_size = 4000MB

I have a user table structure is attached and I have around 2 million rows
and adding like 10k-30k rows everyday..
id is the primary key, and i have an index session_key

i iterate through the users table like this

select * from users where session_key is not Null order by id offset OFFSET
limit 300

i want to go through the whole table... it gets really slow like greater
than 5 minutes when the OFFSET is over 500,000..

what is the best way to iterate through the whole table? should i increase
the limit?
thanks a lot!

Attachments:

usertable.sqltext/x-sql; name=usertable.sqlDownload
#2Richard Broersma
richard.broersma@gmail.com
In reply to: mark (#1)
Re: how to make this database / query faster

On Sat, Mar 15, 2008 at 4:21 PM, mark <markkicks@gmail.com> wrote:

select * from users where session_key is not Null order by id offset
OFFSET limit 300

OFFSET actually scans past all of the records that specify in the

"OFFSET". So the bigger your offset the longer the scan will be.

One solution is to retain the last ID from the previous scan:

SELECT *
FROM Users
WHERE session_key IS NOT NULL
AND id > your_last_id
LIMIT 300;

#3mark
markkicks@gmail.com
In reply to: Richard Broersma (#2)
Re: how to make this database / query faster

On Sat, Mar 15, 2008 at 4:37 PM, Richard Broersma <
richard.broersma@gmail.com> wrote:

On Sat, Mar 15, 2008 at 4:21 PM, mark <markkicks@gmail.com> wrote:

select * from users where session_key is not Null order by id offset
OFFSET limit 300

One solution is to retain the last ID from the previous scan:

SELECT *
FROM Users
WHERE session_key IS NOT NULL
AND id > your_last_id
LIMIT 300;

will this ensure that no row is repeated when i itereate over the table?
what are the rows ordered by?
thanks

#4Richard Broersma
richard.broersma@gmail.com
In reply to: mark (#3)
Re: how to make this database / query faster

On Sat, Mar 15, 2008 at 4:41 PM, mark <markkicks@gmail.com> wrote:

On Sat, Mar 15, 2008 at 4:37 PM, Richard Broersma <
richard.broersma@gmail.com> wrote:

On Sat, Mar 15, 2008 at 4:21 PM, mark <markkicks@gmail.com> wrote:

select * from users where session_key is not Null order by id offset
OFFSET limit 300

One solution is to retain the last ID from the previous scan:

SELECT *
FROM Users
WHERE session_key IS NOT NULL
AND id > your_last_id
LIMIT 300;

will this ensure that no row is repeated when i itereate over the table?
what are the rows ordered by?
thanks

Ya, sorry I forgot to include the order by.

SELECT *
FROM Users
WHERE session_key IS NOT NULL
AND id > your_last_id
ORDER BY id
LIMIT 300;

Yes there will not be any repeated rows sence you are using a order set
that who's ID are greated than the last set.

#5brian
brian@zijn-digital.com
In reply to: Richard Broersma (#4)
Re: how to make this database / query faster

Richard Broersma wrote:

On Sat, Mar 15, 2008 at 4:41 PM, mark <markkicks@gmail.com> wrote:

On Sat, Mar 15, 2008 at 4:37 PM, Richard Broersma <
richard.broersma@gmail.com> wrote:

On Sat, Mar 15, 2008 at 4:21 PM, mark <markkicks@gmail.com> wrote:

select * from users where session_key is not Null order by id offset
OFFSET limit 300

One solution is to retain the last ID from the previous scan:

SELECT *
FROM Users
WHERE session_key IS NOT NULL
AND id > your_last_id
LIMIT 300;

will this ensure that no row is repeated when i itereate over the table?
what are the rows ordered by?
thanks

Ya, sorry I forgot to include the order by.

SELECT *
FROM Users
WHERE session_key IS NOT NULL
AND id > your_last_id
ORDER BY id
LIMIT 300;

Yes there will not be any repeated rows sence you are using a order set
that who's ID are greated than the last set.

As there's an index on id would it be faster to transpose the WHERE
conditions?

WHERE id > your_last_id
AND session_key IS NOT NULL

I can't remember if the order of WHERE is significant.

#6mark
markkicks@gmail.com
In reply to: brian (#5)
Re: how to make this database / query faster

On Sat, Mar 15, 2008 at 5:04 PM, brian <brian@zijn-digital.com> wrote:

Richard Broersma wrote:

On Sat, Mar 15, 2008 at 4:41 PM, mark <markkicks@gmail.com> wrote:

On Sat, Mar 15, 2008 at 4:37 PM, Richard Broersma <
richard.broersma@gmail.com> wrote:

On Sat, Mar 15, 2008 at 4:21 PM, mark <markkicks@gmail.com> wrote:

select * from users where session_key is not Null order by id offset
OFFSET limit 300

One solution is to retain the last ID from the previous scan:

SELECT *
FROM Users
WHERE session_key IS NOT NULL
AND id > your_last_id
LIMIT 300;

will this ensure that no row is repeated when i itereate over the

table?

what are the rows ordered by?
thanks

Ya, sorry I forgot to include the order by.

SELECT *
FROM Users
WHERE session_key IS NOT NULL
AND id > your_last_id
ORDER BY id
LIMIT 300;

Yes there will not be any repeated rows sence you are using a order set
that who's ID are greated than the last set.

As there's an index on id would it be faster to transpose the WHERE
conditions?

WHERE id > your_last_id
AND session_key IS NOT NULL

I can't remember if the order of WHERE is significant.

brian,
i have an index on session_key also
but i dont have a session key combined on both. should i do that?

#7brian
brian@zijn-digital.com
In reply to: mark (#6)
Re: how to make this database / query faster

mark wrote:

On Sat, Mar 15, 2008 at 5:04 PM, brian <brian@zijn-digital.com> wrote:

As there's an index on id would it be faster to transpose the WHERE
conditions?

WHERE id > your_last_id
AND session_key IS NOT NULL

I can't remember if the order of WHERE is significant.

brian,
i have an index on session_key also

Yeah, I should have asked, "If there's just the index on id ..."

but i dont have a session key combined on both. should i do that?

I'll defer to someone with more insight.

#8Volkan YAZICI
yazicivo@ttmail.com
In reply to: mark (#1)
Re: how to make this database / query faster

On Sat, 15 Mar 2008, mark <markkicks@gmail.com> writes:

select * from users where session_key is not Null order by id offset OFFSET limit 300

i want to go through the whole table... it gets really slow like
greater than 5 minutes when the OFFSET is over 500,000..

Did you try your chance with CURSORs? (See manual page for DECLARE.)

Regards.

#9Richard Broersma
richard.broersma@gmail.com
In reply to: brian (#5)
Re: how to make this database / query faster

On Sat, Mar 15, 2008 at 5:04 PM, brian <brian@zijn-digital.com> wrote:

As there's an index on id would it be faster to transpose the WHERE
conditions?

WHERE id > your_last_id
AND session_key IS NOT NULL

I can't remember if the order of WHERE is significant.

I don't think that the order of WHERE clause conditions are significant with
PostgreSQL since it has a sophisticated planner. One way to be sure would
be to examine the EXPLAIN plans for different where clause ordering to see
if the query plan changes depending upon the order specified.

Another way to make this query faster would be to create a partial index on
id;

CREATE UNIQUE INDEX Partial_id
ON Users
WHERE session_key IS NOT NULL;

This index would be much smaller if there are many session_keys that are
null, so it should be loaded as searched much faster.

#10Martin Gainty
mgainty@hotmail.com
In reply to: mark (#1)
Re: how to make this database / query faster

Hi Richard-

My understanding is that Partial index is implemented for low cardinality scenarios ('Y'/'N') ('T'/'F') (null/not null) ?
http://en.wikipedia.org/wiki/Partial_index

Would it matter the selectivity is balanced?
thus 1 null record and 1 trillion null records would not apply

?
Martin-
----- Original Message -----
From: Richard Broersma
To: brian
Cc: pgsql-general@postgresql.org
Sent: Sunday, March 16, 2008 10:25 AM
Subject: Re: [GENERAL] how to make this database / query faster

On Sat, Mar 15, 2008 at 5:04 PM, brian <brian@zijn-digital.com> wrote:

As there's an index on id would it be faster to transpose the WHERE
conditions?

WHERE id > your_last_id
AND session_key IS NOT NULL

I can't remember if the order of WHERE is significant.

I don't think that the order of WHERE clause conditions are significant with PostgreSQL since it has a sophisticated planner. One way to be sure would be to examine the EXPLAIN plans for different where clause ordering to see if the query plan changes depending upon the order specified.

Another way to make this query faster would be to create a partial index on id;

CREATE UNIQUE INDEX Partial_id
ON Users
WHERE session_key IS NOT NULL;

This index would be much smaller if there are many session_keys that are null, so it should be loaded as searched much faster.

#11Richard Broersma
richard.broersma@gmail.com
In reply to: Martin Gainty (#10)
Re: how to make this database / query faster

On Sun, Mar 16, 2008 at 9:47 AM, Martin Gainty <mgainty@hotmail.com> wrote:

My understanding is that Partial index is implemented for low cardinality
scenarios ('Y'/'N') ('T'/'F') (null/not null) ?
http://en.wikipedia.org/wiki/Partial_index

Low cardinality can apply for more than just boolean or null/not null?

Let say I wanted to run the majority of my select queries on users with the
name 'Richard'. Now of the billion users that I have, only 500 have the
name 'Richard'. Since 'Richard' only makes up an insignificant part of the
users table, have a partial index on 'Richard' would greatly improve select
query performance for these kinds of queries.

If your boolean fields T and F were about 50% even throughout your entire
trillion record table, a partial index wouldn't do much to help since 50%
isn't selective enough. The same thing applies for records that have an even
distribution of nulls and not nulls.

Would it matter the selectivity is balanced?
thus 1 null record

In this case, a partial index would be a really good idea if you were mostly
interested in records that *were* null. However, if you were most
interested in records that were not null in a table distribution like this,
then a partial index would not do much for you in this case.

and 1 trillion null records would not apply

once again, assuming that you are mostly interested in querying the NOT NULL
records in a mostly null record table, then a partial index would be a
really good idea for query speed improvement.

I hope I am making sense.

Regards,
Richard Broersma Jr.