PostgreSQL and Real Application Testing (RAT)

Started by ROS Didier · over 6 years ago · 5 messages
#1 ROS Didier
didier.ros@edf.fr

Hi

In my company, one of the things blocking the migration from Oracle to PostgreSQL is the lack of an equivalent of Oracle Real Application Testing.
This product captures a workload in production and replays it in a test environment.
This makes it possible to measure the impact of a migration to a newer version, the creation of an index, etc.
Is there an equivalent in the PostgreSQL community?
If not, do you think it is technically possible to build one?
Who would be interested in such a project?

Thanks in advance
Best Regards

Didier ROS
EDF

#2 Thomas Kellerer
shammat@gmx.net
In reply to: ROS Didier (#1)
Re: PostgreSQL and Real Application Testing (RAT)

ROS Didier wrote on 27.08.2019 at 12:47:

> [...]
>
> Is there an equivalent in the PostgreSQL community?
>
> If not, do you think it is technically possible to build one?
>
> Who would be interested in such a project?

Not sure how up-to-date that is, but you might want to have a look here:

https://wiki.postgresql.org/wiki/Statement_Playback

#3 Jaime Casanova
jaime.casanova@2ndquadrant.com
In reply to: ROS Didier (#1)
Re: PostgreSQL and Real Application Testing (RAT)

On Tue, 27 Aug 2019 at 05:47, ROS Didier <didier.ros@edf.fr> wrote:

> [...]
>
> Is there an equivalent in the PostgreSQL community?

I used https://github.com/laurenz/pgreplay recently to re-execute the
queries sent to a PostgreSQL 9.1 server against PostgreSQL 11. It was very
useful for finding queries that are affected by changes in the default
values of GUCs.

Normally, a query that works in an old version will work in a new one, but
this is useful to catch the few that don't, if any.
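
For reference, this is roughly what an invocation can look like, going by
pgreplay's README (the host and file names here are made up; check the
docs for the version you use):

    # parse a CSV-format server log into a compact "replay file"
    pgreplay -c -o workload.replay postgresql.csv

    # replay it against a test server, preserving the original timing
    pgreplay -f -h testhost -p 5432 workload.replay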

--
Jaime Casanova www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4 Nikolay Samokhvalov
samokhvalov@gmail.com
In reply to: ROS Didier (#1)
Re: PostgreSQL and Real Application Testing (RAT)

On Tue, Aug 27, 2019 at 3:47 AM ROS Didier <didier.ros@edf.fr> wrote:

> [...]
>
> Is there an equivalent in the PostgreSQL community?
>
> If not, do you think it is technically possible to build one?
>
> Who would be interested in such a project?

Replaying a workload might or might not apply well to your case.

There are several major difficulties if you want to replay a workload:

1) How to "record" the workload. You need to write all your queries to the
Postgres log. Three problems here:
1a) pgreplay expects log_statement = 'all', while you might prefer dealing
with log_min_duration_statement instead. This is a minor issue, though,
quite easy to solve with preprocessing (see the example settings after
this list).
1b) Under heavy load, log_min_duration_statement = 0 (or log_statement =
'all') will lead to performance degradation or even downtime. Possible
solutions: write the log to memory, or don't write it at all and send it
over the network instead.
1c) Ideally, recording just the queries is not enough. To replay a
workload "as is", we need to replay queries with known plans. There is no
easy solution to this problem in the Postgres ecosystem yet.
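
To make 1a concrete, here is a minimal sketch of the logging setup that
pgreplay's documentation asks for when recording from stderr-format logs
(taken from its README; verify against the version you use):

    # postgresql.conf on the recording (production) side
    log_statement = 'all'               # pgreplay wants every statement
    log_connections = on                # needed to reproduce sessions
    log_disconnections = on
    log_line_prefix = '%m|%u|%d|%c|'    # timestamp, user, database, session id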

A couple of additional points regarding items 1b and 1c. In Postgres 12,
there is a cool new capability: sampling for query logging, implemented by
Adrien Nayrat: https://commitfest.postgresql.org/20/1691/ With this, it
will be possible to fully log, say, 5% of all transactions and use that
for replaying. Moreover, with auto_explain, it will be possible to have
plans! Open questions are: (a) how to determine whether N% is enough, and
(b) how to replay with the specified plans. [If anyone is interested in
working in this direction, please reach out to me.]
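
As a sketch of what this could look like on Postgres 12 (the parameter
values are illustrative, not recommendations):

    # postgresql.conf, PostgreSQL 12+
    log_transaction_sample_rate = 0.05    # log all statements of ~5% of transactions

    # capture plans for the sampled statements with auto_explain
    shared_preload_libraries = 'auto_explain'
    auto_explain.log_min_duration = 0     # explain everything that gets logged
    auto_explain.sample_rate = 0.05       # sample plans too, to bound the overhead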

2) Issues with the replay itself. I can highlight at least two problems here:
2a) pgreplay might not be enough for your workload; it doesn't scale well.
If you're interested, look at its analog written in Go,
https://github.com/gocardless/pgreplay-go, but this is quite a young
project.
2b) Postgres logs have millisecond precision (if you switched from %t to
%m in log_line_prefix), and this might not be enough. There is a patch for
microsecond precision from David Fetter,
https://www.postgresql.org/message-id/20181023185050.GE6049@fetter.org,
but that conversation hasn't yet led to a commit.
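
For completeness, switching to millisecond timestamps is just a
log_line_prefix change, for example:

    # %t gives second precision; %m gives milliseconds
    log_line_prefix = '%m [%p] %u@%d '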

Another approach you might be interested in is workload simulation. This is
what we (Postgres.ai) now use most of the time when building "lab"
environments for our clients. The idea is as follows:
- carefully analyze the workload using pg_stat_statements (here, our
open-source tool called "postgres-checkup",
https://gitlab.com/postgres-ai/postgres-checkup, might be helpful; see the
reports in section K),
- take the most resource-consuming query groups (top N, ordered by
total_time),
- create a set of files with those statements, with randomly filled
parameters (this won't work in all cases; I discuss the restrictions below),
- use pgbench and feed the workload files to it, using multiple -f options
with balancing (-f filename@XX, where XX is to be taken from the
pg_stat_statements analysis; this time, "calls" and each group's share of
the whole workload are needed -- again, postgres-checkup can help here),
- run, analyze, and compare behavior (see the sketches after this list).
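
To illustrate the first step, a query along these lines pulls the top
consumers out of pg_stat_statements (column names as of Postgres 12; in 13,
total_time was split into total_exec_time and total_plan_time):

    -- top 10 query groups by cumulative execution time
    SELECT queryid, calls, total_time,
           round((100 * total_time / sum(total_time) OVER ())::numeric, 2)
             AS percent_of_total
    FROM pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 10;

And a sketch of the pgbench step; the script names and weights here are
invented for illustration, with the @ weights derived from each group's
share of "calls":

    # three workload files mixed 50/30/20, 16 clients, 10 minutes
    pgbench -n -c 16 -j 4 -T 600 \
        -f top_query_1.sql@50 -f top_query_2.sql@30 -f top_query_3.sql@20 \
        mydb

Each script randomizes its own parameters with pgbench variables, e.g.:

    -- top_query_1.sql (hypothetical statement and column names)
    \set cust_id random(1, 1000000)
    SELECT * FROM orders WHERE customer_id = :cust_id;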

The restrictions of this approach are obvious:
- it doesn't work well if most of your transactions have multiple statements,
- in many cases, randomization is hard (it is not obvious how to organize
it; a synthetic approach is far from the real data distribution in storage
and the real workload; etc.),
- the approach requires a significant amount of manual effort.

However, "workload simulation" is an extremely helpful approach in many
cases, helping with change management. It doesn't require anything that
might negatively affect your production workload, and it utilizes pgbench
(or any other tool), which is reliable, has great features, and scales well.

You might be interested in looking at the tool we built to conduct a large
number of database experiments, Nancy CLI:
https://gitlab.com/postgres-ai/nancy. It supports both the "workload
replay" method (with pgreplay) and "workload simulation" (with pgbench).
PM me if you're interested in discussing the details.

Thanks,
Nik

#5 Jaime Casanova
jaime.casanova@2ndquadrant.com
In reply to: Nikolay Samokhvalov (#4)
Re: PostgreSQL and Real Application Testing (RAT)

On Tue, 27 Aug 2019 at 19:33, Nikolay Samokhvalov <samokhvalov@gmail.com>
wrote:

> [...]
>
> 1c) Ideally, recording just the queries is not enough. To replay a
> workload "as is", we need to replay queries with known plans. There is no
> easy solution to this problem in the Postgres ecosystem yet.

Why? I'd prefer the queries to take advantage of new plans. For example, if
I'm migrating from 9.5 to 9.6+, I would prefer that, when replaying, the
queries use parallel plans, so I can quickly see whether that would somehow
be a problem (for example, by using more CPU than before).
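
As an aside, a quick way to check that is to EXPLAIN a replayed query on
the new version and look for parallel nodes (big_table here is just a
stand-in for one of your own queries):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*) FROM big_table;
    -- look for "Gather" / "Parallel Seq Scan" nodes in the output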

--
Jaime Casanova www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services