How to improve performance in reporting database?

Started by Matthew Wilson, over 15 years ago, 4 messages, general
#1 Matthew Wilson
matt@tplus1.com

I have a daily job that pushes data from the production database into
the reporting database, which right now, is an exact copy.

I have a webapp that builds lots of reports for users. Most of these
reports involve elaborate joins of lookup tables and lots of summations,
and they take too long to run, even after using everything I know to
tune the queries.

Since I know this is read-only data, it seems like I should be able to
speed everything up dramatically if I run the queries offline and then
save the results into new tables. Then the web app could just grab the
cached results out of these new tables and then spit them out quickly.
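Something like this is what I have in mind for the offline step; all the table and column names here are made up, just to show the shape of it:

```sql
-- Nightly job: rebuild a cached report table from scratch.
-- (report_sales, orders, and customers are made-up names.)
BEGIN;
DROP TABLE IF EXISTS report_sales;
CREATE TABLE report_sales AS
    SELECT c.region,
           date_trunc('month', o.ordered_at) AS month,
           sum(o.total) AS revenue
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    GROUP BY c.region, date_trunc('month', o.ordered_at);
-- Index the columns the web app filters on.
CREATE INDEX report_sales_region_month_idx
    ON report_sales (region, month);
COMMIT;
```

Then the web app only ever does a cheap SELECT against report_sales.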

I've heard people talking about using "materialized views" for this, but
that was with Oracle.

What's the postgresql way here?

More generally, any advice on running reporting databases well is
welcome.

Matt

#2 Greg Smith
gsmith@gregsmith.com
In reply to: Matthew Wilson (#1)
Re: How to improve performance in reporting database?

Matthew Wilson wrote:

I've heard people talking about using "materialized views" for this, but
that was with Oracle.

You can build those manually with PostgreSQL if you really want them:
http://wiki.postgresql.org/wiki/Materialized_Views

The fundamental architecture is sound for a lot of problems in this
area; you just have to figure out how to build the views efficiently.
In your case, you might consider whether there's a way to update the MV
in batches after each data import, rather than relying on triggers to
keep the data up to date.
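A batch refresh after the import can be as simple as this sketch (table names are made up):

```sql
-- After each nightly import, rebuild the MV contents in one
-- transaction instead of maintaining it row-by-row with triggers.
BEGIN;
TRUNCATE matview_order_totals;
INSERT INTO matview_order_totals (customer_id, order_count, total_spent)
    SELECT customer_id, count(*), sum(total)
    FROM orders
    GROUP BY customer_id;
COMMIT;
```

One caveat: TRUNCATE takes an ACCESS EXCLUSIVE lock, so report queries will block for the duration of the rebuild; if that matters, DELETE FROM instead of TRUNCATE avoids the heavy lock at the cost of some table bloat.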

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us

#3 Scott Marlowe
scott.marlowe@gmail.com
In reply to: Matthew Wilson (#1)
Re: How to improve performance in reporting database?

On Thu, Jul 22, 2010 at 8:45 AM, Matthew Wilson <matt@tplus1.com> wrote:

I have a daily job that pushes data from the production database into
the reporting database, which right now, is an exact copy.

I have a webapp that builds lots of reports for users.  Most of these
reports involve elaborate joins of lookup tables and lots of summations,
and they take too long to run, even after using everything I know to
tune the queries.

Since I know this is read-only data, it seems like I should be able to
speed everything up dramatically if I run the queries offline and then
save the results into new tables.  Then the web app could just grab the
cached results out of these new tables and then spit them out quickly.

I've heard people talking about using "materialized views" for this, but
that was with Oracle.

What's the postgresql way here?

http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views

They're kinda roll your own, but they're not that hard to do.

More generally, any advice on running reporting databases well is
welcome.

Throw more drives and RAM at the problem, and use materialized views.
Also, you're often better off with fewer, faster CPUs than more, slower
ones for reporting servers (the opposite of OLTP, where the number of
cores is far more important).

#4 Vick Khera
vivek@khera.org
In reply to: Greg Smith (#2)
Re: How to improve performance in reporting database?

On Thu, Jul 22, 2010 at 2:31 PM, Greg Smith <greg@2ndquadrant.com> wrote:

You can build those manually with PostgreSQL if you really want them:
 http://wiki.postgresql.org/wiki/Materialized_Views

Another thing to consider... In our case we use a materialized view to
keep track of counts of various things that are expensive to count,
such as the number of members of a specific account, or the number of
times a particular URL was viewed. What we end up with is many
processes trying to update the same counter row, which means lots and
lots of lock contention.
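The contention pattern looks like this (simplified, made-up names):

```sql
-- Many concurrent sessions all incrementing the same counter row:
UPDATE url_view_counts
   SET views = views + 1
 WHERE url_id = 42;
-- Every session touching the same url_id queues up on that row's
-- lock until the previous updater commits, so the hot counters
-- effectively serialize all the writers.
```

(A common workaround, not discussed in this thread, is to INSERT one delta row per event and have a periodic job sum the deltas into the counter, so writers never fight over a single row.)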

Which reminds me... Greg, are you done with your book yet and when can
I get you back in here to help with my locking problems? ;-)