is there a relationship between indexes and temporary file creation?

Started by Seref Arikanabout 11 years ago5 messagesgeneral
Jump to latest
#1Seref Arikan
serefarikan@kurumsalteknoloji.com

Greetings,
I have a table with 10s of millions of rows and I'm running a fairly
complex query with a lot of self joins.

This is an experimental db and the table structure needs to stay as it is.
I've noticed that some queries are not completed because postgres uses all
available space on disk for temporary data when doing complex joins.

The table has no indexes at the moment; I'm measuring performance with and
without indexes so I don't mind long query times. Running out of disk space
is a problem though.I can't think of a connection between indexes and the
temporary space needed for joins but I hope I can get some input from those
with (a lot) more knowledge about potgres internals.

Is there any possibility of indexing decreasing the required temporary disk
space when performing complex joins?

Best regards
Seref

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Seref Arikan (#1)
Re: is there a relationship between indexes and temporary file creation?

Seref Arikan <serefarikan@kurumsalteknoloji.com> writes:

I have a table with 10s of millions of rows and I'm running a fairly
complex query with a lot of self joins.

This is an experimental db and the table structure needs to stay as it is.
I've noticed that some queries are not completed because postgres uses all
available space on disk for temporary data when doing complex joins.

The table has no indexes at the moment; I'm measuring performance with and
without indexes so I don't mind long query times. Running out of disk space
is a problem though.I can't think of a connection between indexes and the
temporary space needed for joins but I hope I can get some input from those
with (a lot) more knowledge about potgres internals.

Is there any possibility of indexing decreasing the required temporary disk
space when performing complex joins?

Hm. Your alternatives don't seem terribly good here. Presumably you're
running out of space because either a merge or hash join will spill temp
data to disk if there's too much data. While an indexed nestloop join
wouldn't have that problem, for the amount of data you're apparently
dealing with, you might be dead of old age before it finishes :-(.
A slightly more promising idea is that an indexscan might substitute for
the sort needed by a mergejoin, so again not needing temp space; but
this still likely implies a whole lot of random I/O which will take
forever, unless maybe your data is more or less in order by the join key.

In short I doubt that indexes will be magic bullets for enormous joins.
You probably need to be smarter about what the queries are asking for.
But without a lot more detail than you've provided here, we're not going
to be able to help much about that.

If you'd like additional advice, I'd recommend reviewing
https://wiki.postgresql.org/wiki/Slow_Query_Questions
and then taking the discussion to the pgsql-performance list.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Seref Arikan
serefarikan@kurumsalteknoloji.com
In reply to: Tom Lane (#2)
Re: is there a relationship between indexes and temporary file creation?

Thanks Tom,
This is helpful. I'm unable to share the queries but your reasoning helps
nonetheless.

Another idea that popped into my head is to consider table partitioning. Am
I correct to assume that (inevitable) joins could benefit from partitions
when sub query criteria constraints results to a particular partition?
I know this is all too vague, but even brief opinions would help. I knew
the current approach would converge to infinity in terms of query time :)
My findings support that but I'm trying to write down some potential
remedies and partial improvements as future directions for research.

Best regards
Seref

On Sat, Mar 14, 2015 at 7:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Seref Arikan <serefarikan@kurumsalteknoloji.com> writes:

I have a table with 10s of millions of rows and I'm running a fairly
complex query with a lot of self joins.

This is an experimental db and the table structure needs to stay as it

is.

I've noticed that some queries are not completed because postgres uses

all

available space on disk for temporary data when doing complex joins.

The table has no indexes at the moment; I'm measuring performance with

and

without indexes so I don't mind long query times. Running out of disk

space

is a problem though.I can't think of a connection between indexes and the
temporary space needed for joins but I hope I can get some input from

those

with (a lot) more knowledge about potgres internals.

Is there any possibility of indexing decreasing the required temporary

disk

space when performing complex joins?

Hm. Your alternatives don't seem terribly good here. Presumably you're
running out of space because either a merge or hash join will spill temp
data to disk if there's too much data. While an indexed nestloop join
wouldn't have that problem, for the amount of data you're apparently
dealing with, you might be dead of old age before it finishes :-(.
A slightly more promising idea is that an indexscan might substitute for
the sort needed by a mergejoin, so again not needing temp space; but
this still likely implies a whole lot of random I/O which will take
forever, unless maybe your data is more or less in order by the join key.

In short I doubt that indexes will be magic bullets for enormous joins.
You probably need to be smarter about what the queries are asking for.
But without a lot more detail than you've provided here, we're not going
to be able to help much about that.

If you'd like additional advice, I'd recommend reviewing
https://wiki.postgresql.org/wiki/Slow_Query_Questions
and then taking the discussion to the pgsql-performance list.

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Seref Arikan (#3)
Re: is there a relationship between indexes and temporary file creation?

Seref Arikan <serefarikan@kurumsalteknoloji.com> writes:

Another idea that popped into my head is to consider table partitioning. Am
I correct to assume that (inevitable) joins could benefit from partitions
when sub query criteria constraints results to a particular partition?

That's too vague to answer really. My suspicion is that you'd still end
up having to modify the queries, ie hand-factoring the joins, but maybe
you wouldn't need to. You could experiment easily enough; just set up a
toy database with not much data in it and see if you get plans that don't
scan all the partitions.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Seref Arikan
serefarikan@kurumsalteknoloji.com
In reply to: Tom Lane (#4)
Re: is there a relationship between indexes and temporary file creation?

Thanks Tom,
I'll give it a try, the whole setup is synthetic anyway, should not be too
much of a trouble

Best regards
Seref

On Sun, Mar 15, 2015 at 3:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Seref Arikan <serefarikan@kurumsalteknoloji.com> writes:

Another idea that popped into my head is to consider table partitioning.

Am

I correct to assume that (inevitable) joins could benefit from partitions
when sub query criteria constraints results to a particular partition?

That's too vague to answer really. My suspicion is that you'd still end
up having to modify the queries, ie hand-factoring the joins, but maybe
you wouldn't need to. You could experiment easily enough; just set up a
toy database with not much data in it and see if you get plans that don't
scan all the partitions.

regards, tom lane