test data for query optimizer

Started by Noname over 23 years ago · 5 messages
#1 Noname
nconway@klamath.dyndns.org

I'd like to look at the performance of the query optimizer (both the
traditional one and GEQO) when joining large numbers of tables: 10-15,
or more. In order to do that (and to get meaningful results), I'll
need to work with some data that actually requires joins of that
magnitude. Ideally, I'd like the data to be somewhat realistic -- so
that the performance I'm seeing will reflect the performance a typical
user might see (i.e., I don't want an artificial benchmark).

However, I don't possess any data of that nature, and I'm unsure
where I can find some (or how to generate some of my own). Does
anyone know of:

- a freely available collection of data that requires queries
of this type, and is reasonably representative of "real world"
applications

- or, a means to programmatically generate some data that
fits the above criteria.

Thanks in advance,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#1)
Re: test data for query optimizer

nconway@klamath.dyndns.org (Neil Conway) writes:

I'd like to look at the performance of the query optimizer (both the
traditional one and GEQO) when joining large numbers of tables: 10-15,
or more. In order to do that (and to get meaningful results), I'll
need to work with some data that actually requires joins of that
magnitude.

The easiest way to construct a realistic many-way join is to use a star
schema. Here you have a primary "fact table" that includes a lot of
columns that individually join to the primary keys of other "detail
tables". For example, you might have a column "State" in the fact table
with values like "PA", "NY", etc, and you want to join it to a table
states(abbrev,fullname,...) so your query can display "Pennsylvania",
"New York", etc. It's easy to make up realistic examples that involve
any number of joins.
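As a concrete sketch of the star-schema shape described above (all table
and column names here are invented for illustration, not taken from any
real dataset), a short script can generate the DDL, random fact rows, and
the resulting N-way join query:

```python
import random

def make_star_schema(num_dims, rows_per_dim=100, fact_rows=1000):
    """Build a toy star schema: num_dims dimension tables, one fact table
    whose columns each reference one dimension, plus the N-way join query."""
    ddl = []
    for d in range(num_dims):
        ddl.append(f"CREATE TABLE dim{d} (id int PRIMARY KEY, label text);")
    fact_cols = ", ".join(
        f"dim{d}_id int REFERENCES dim{d}(id)" for d in range(num_dims)
    )
    ddl.append(f"CREATE TABLE fact ({fact_cols});")

    # Each fact row points at a random key in every dimension table.
    facts = [
        tuple(random.randrange(rows_per_dim) for _ in range(num_dims))
        for _ in range(fact_rows)
    ]

    # The query the optimizer has to plan: one join per dimension table.
    joins = " ".join(
        f"JOIN dim{d} ON fact.dim{d}_id = dim{d}.id" for d in range(num_dims)
    )
    query = f"SELECT count(*) FROM fact {joins};"
    return ddl, facts, query

ddl, facts, query = make_star_schema(12)
```

Varying `num_dims` around geqo_threshold would exercise both the
traditional optimizer and GEQO on the same schema shape.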

This is of course only one usage pattern for lots-o-joins, so don't put
too much credence in it alone as a benchmark, but it's certainly a
widely used pattern.

Searching for "star schema" at Google turned up some interesting things
last time I tried it.

regards, tom lane

#3 Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Noname (#1)
Re: test data for query optimizer

What about the OSDB benchmark? Does that contain a large dataset?

Chris

----- Original Message -----
From: "Neil Conway" <nconway@klamath.dyndns.org>
To: "PostgreSQL Hackers" <pgsql-hackers@postgresql.org>
Sent: Saturday, July 13, 2002 12:05 AM
Subject: [HACKERS] test data for query optimizer


I'd like to look at the performance of the query optimizer (both the
traditional one and GEQO) when joining large numbers of tables: 10-15,
or more. In order to do that (and to get meaningful results), I'll
need to work with some data that actually requires joins of that
magnitude. Ideally, I'd like the data to be somewhat realistic -- so
that the performance I'm seeing will reflect the performance a typical
user might see (i.e., I don't want an artificial benchmark).

However, I don't possess any data of that nature, and I'm unsure
where I can find some (or how to generate some of my own). Does
anyone know of:

- a freely available collection of data that requires queries
of this type, and is reasonably representative of "real world"
applications

- or, a means to programmatically generate some data that
fits the above criteria.

Thanks in advance,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC


#4 Mark Kirkwood
markir@slingshot.co.nz
In reply to: Noname (#1)
Re: test data for query optimizer

What about the TPC-H benchmark ?

I can't recall if it has more than 10 tables, but the queries seemed
"quite good" for a benchmark. In addition, it comes with a data
generator.
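One practical wrinkle with TPC-H's dbgen (a hedged sketch, assuming the
generator's usual pipe-delimited `.tbl` output, where every field — including
the last — is terminated by "|"): PostgreSQL's COPY with DELIMITER '|' treats
the trailing pipe as an extra empty column, so it has to be stripped before
loading. The file names below are illustrative:

```python
def strip_trailing_pipe(lines):
    """dbgen terminates every field, including the last, with '|';
    drop that trailing delimiter so COPY ... DELIMITER '|' accepts the rows."""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("|"):
            line = line[:-1]
        yield line

# Example usage, converting one generated table for COPY:
# with open("lineitem.tbl") as src, open("lineitem.txt", "w") as dst:
#     for row in strip_trailing_pipe(src):
#         dst.write(row + "\n")
```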

regards

Mark


On Sat, 2002-07-13 at 04:05, Neil Conway wrote:
I'd like to look at the performance of the query optimizer (both the
traditional one and GEQO) when joining large numbers of tables: 10-15,
or more. In order to do that (and to get meaningful results), I'll
need to work with some data that actually requires joins of that
magnitude. Ideally, I'd like the data to be somewhat realistic -- so
that the performance I'm seeing will reflect the performance a typical
user might see. (i.e. I don't want an artificial benchmark)

#5 Noname
nconway@klamath.dyndns.org
In reply to: Christopher Kings-Lynne (#3)
Re: test data for query optimizer

On Sat, Jul 13, 2002 at 11:18:14AM +0800, Christopher Kings-Lynne wrote:

What about the OSDB benchmark? Does that contain a large dataset?

No -- it uses only 5 relations in total, and its most complex query
involves just 4 joins.

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC