how to make this database / query faster
Hi
I'm using PostgreSQL 8.3 on a dual quad-core Intel Xeon E5405 @ 2.00GHz, Fedora
Core 8 x86_64, with 32GB RAM.
Settings I changed in postgresql.conf:
shared_buffers = 1000MB # min 128kB or max_connections*16kB
effective_cache_size = 4000MB
My user table structure is attached. I have around 2 million rows and am
adding around 10k-30k rows every day.
id is the primary key, and I have an index on session_key.
I iterate through the users table like this:

select * from users where session_key is not Null order by id offset OFFSET limit 300
I want to go through the whole table, but it gets really slow (greater than
5 minutes) once the OFFSET is over 500,000.
What is the best way to iterate through the whole table? Should I increase
the limit?
thanks a lot!
Attachments: usertable.sql (text/x-sql)
On Sat, Mar 15, 2008 at 4:21 PM, mark <markkicks@gmail.com> wrote:
select * from users where session_key is not Null order by id offset OFFSET limit 300

OFFSET actually scans past all of the records that you specify in the
"OFFSET", so the bigger your offset, the longer the scan will be.
One solution is to retain the last ID from the previous scan:
SELECT *
FROM Users
WHERE session_key IS NOT NULL
AND id > your_last_id
LIMIT 300;
On Sat, Mar 15, 2008 at 4:37 PM, Richard Broersma <
richard.broersma@gmail.com> wrote:
One solution is to retain the last ID from the previous scan:
SELECT *
FROM Users
WHERE session_key IS NOT NULL
AND id > your_last_id
LIMIT 300;
Will this ensure that no row is repeated when I iterate over the table?
What are the rows ordered by?
thanks
On Sat, Mar 15, 2008 at 4:41 PM, mark <markkicks@gmail.com> wrote:
Will this ensure that no row is repeated when I iterate over the table?
What are the rows ordered by?
Yeah, sorry, I forgot to include the ORDER BY.
SELECT *
FROM Users
WHERE session_key IS NOT NULL
AND id > your_last_id
ORDER BY id
LIMIT 300;
Yes, there will not be any repeated rows, since each query returns an ordered
set whose IDs are greater than those of the last set.
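The full iteration with this keyset approach would look something like the
sketch below (12345 is only a placeholder for whatever id value your
application remembered from the previous batch):

```sql
-- First batch: no lower bound yet.
SELECT *
FROM Users
WHERE session_key IS NOT NULL
ORDER BY id
LIMIT 300;

-- Every later batch: use the largest id returned by the previous batch.
-- (12345 stands in for that remembered value.)
SELECT *
FROM Users
WHERE session_key IS NOT NULL
  AND id > 12345
ORDER BY id
LIMIT 300;

-- Stop when a batch comes back with fewer than 300 rows.
```

Because the lower bound moves forward each time, every batch is an index
range scan starting at the remembered id, instead of scanning past all of
the skipped rows the way OFFSET does.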
Richard Broersma wrote:
SELECT *
FROM Users
WHERE session_key IS NOT NULL
AND id > your_last_id
ORDER BY id
LIMIT 300;

Yes, there will not be any repeated rows, since each query returns an ordered
set whose IDs are greater than those of the last set.
As there's an index on id, would it be faster to transpose the WHERE
conditions?
WHERE id > your_last_id
AND session_key IS NOT NULL
I can't remember if the order of WHERE is significant.
On Sat, Mar 15, 2008 at 5:04 PM, brian <brian@zijn-digital.com> wrote:
As there's an index on id, would it be faster to transpose the WHERE
conditions?

WHERE id > your_last_id
AND session_key IS NOT NULL

I can't remember if the order of WHERE is significant.
brian,
I have an index on session_key also,
but I don't have a combined index on both. Should I do that?
mark wrote:
On Sat, Mar 15, 2008 at 5:04 PM, brian <brian@zijn-digital.com> wrote:
As there's an index on id, would it be faster to transpose the WHERE
conditions?

WHERE id > your_last_id
AND session_key IS NOT NULL

I can't remember if the order of WHERE is significant.
brian,
I have an index on session_key also
Yeah, I should have asked, "If there's just the index on id ..."
but I don't have a combined index on both. Should I do that?
I'll defer to someone with more insight.
On Sat, 15 Mar 2008, mark <markkicks@gmail.com> writes:
select * from users where session_key is not Null order by id offset OFFSET limit 300
I want to go through the whole table... it gets really slow, like
greater than 5 minutes, when the OFFSET is over 500,000.

Have you tried using CURSORs? (See the manual page for DECLARE.)
Regards.
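A minimal sketch of the cursor approach (the cursor name is illustrative, and
DECLARE must run inside a transaction):

```sql
BEGIN;

-- Declare a cursor over the full result set, ordered once on the server side.
DECLARE user_cur CURSOR FOR
    SELECT *
    FROM users
    WHERE session_key IS NOT NULL
    ORDER BY id;

-- Fetch in batches; repeat the FETCH until it returns no rows.
FETCH 300 FROM user_cur;
FETCH 300 FROM user_cur;

CLOSE user_cur;
COMMIT;
```

Unlike OFFSET, each FETCH picks up exactly where the previous one left off,
so there is no re-scanning of already-visited rows; the trade-off is that the
whole iteration has to happen within one transaction.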
On Sat, Mar 15, 2008 at 5:04 PM, brian <brian@zijn-digital.com> wrote:
As there's an index on id would it be faster to transpose the WHERE
conditions?WHERE id > your_last_id
AND session_key IS NOT NULLI can't remember if the order of WHERE is significant.
I don't think that the order of WHERE clause conditions is significant in
PostgreSQL, since it has a sophisticated planner. One way to be sure would
be to examine the EXPLAIN plans for different WHERE clause orderings to see
if the query plan changes depending on the order specified.
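For example, a sketch of that check (note that EXPLAIN ANALYZE actually
executes the query, and 500000 is just an example boundary value):

```sql
-- Compare the plans for both orderings of the conditions;
-- they should come out identical.
EXPLAIN ANALYZE
SELECT * FROM users
WHERE id > 500000 AND session_key IS NOT NULL
ORDER BY id LIMIT 300;

EXPLAIN ANALYZE
SELECT * FROM users
WHERE session_key IS NOT NULL AND id > 500000
ORDER BY id LIMIT 300;
```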
Another way to make this query faster would be to create a partial index on
id:

CREATE UNIQUE INDEX partial_id
ON Users (id)
WHERE session_key IS NOT NULL;

This index would be much smaller if many session_keys are null, so it should
be loaded and searched much faster.
Hi Richard-
My understanding is that a partial index is meant for low-cardinality scenarios ('Y'/'N', 'T'/'F', null/not null)?
http://en.wikipedia.org/wiki/Partial_index
Would it matter whether the selectivity is balanced?
Thus 1 null record, or 1 trillion null records, would not apply?
Martin-
----- Original Message -----
From: Richard Broersma
To: brian
Cc: pgsql-general@postgresql.org
Sent: Sunday, March 16, 2008 10:25 AM
Subject: Re: [GENERAL] how to make this database / query faster
On Sun, Mar 16, 2008 at 9:47 AM, Martin Gainty <mgainty@hotmail.com> wrote:
My understanding is that Partial index is implemented for low cardinality
scenarios ('Y'/'N') ('T'/'F') (null/not null) ?
http://en.wikipedia.org/wiki/Partial_index
Low cardinality can apply to more than just boolean or null/not-null values.
Let's say I wanted to run the majority of my select queries on users with the
name 'Richard'. Of the billion users that I have, only 500 have the
name 'Richard'. Since 'Richard' makes up only an insignificant part of the
users table, having a partial index on 'Richard' would greatly improve select
query performance for these kinds of queries.
If your boolean fields T and F were split about 50/50 throughout your entire
trillion-record table, a partial index wouldn't do much to help, since 50%
isn't selective enough. The same thing applies to records that have an even
distribution of nulls and not nulls.
Would it matter whether the selectivity is balanced?
Thus 1 null record
In this case, a partial index would be a really good idea if you were mostly
interested in records that *were* null. However, if you were mostly
interested in the records that were not null in a table with a distribution
like this, then a partial index would not do much for you.
and 1 trillion null records would not apply
Once again, assuming that you are mostly interested in querying the NOT NULL
records in a mostly-null table, a partial index would be a really good idea
for query speed improvement.
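A sketch of such an index, with a hypothetical name:

```sql
-- Index only the minority of rows whose session_key is NOT NULL.
-- Queries whose WHERE clause implies "session_key IS NOT NULL"
-- can use this much smaller index.
CREATE INDEX users_session_key_notnull
    ON users (session_key)
    WHERE session_key IS NOT NULL;
```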
I hope I am making sense.
Regards,
Richard Broersma Jr.