Easiest way to compare the results of two queries row by row and column by column

Started by Jason Longalmost 13 years ago10 messagesgeneral

Jump to latest

Jason Long

mailing.lists@octgsoftware.com

almost 13 years ago

Can someone suggest the easiest way to compare the results from two
queries to make sure they are identical?

I am rewriting a large number of views and I want to make sure that
nothing is changes in the results.

Something like

select compare_results('select * from v_old', 'select * from v_new');

I would want this to check that the row count and each row matched
column by column.

I am hoping someone has already written something for this...

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

David G. Johnston

david.g.johnston@gmail.com

almost 13 years ago

In reply to: Jason Long (#1)

Re: Easiest way to compare the results of two queries row by row and column by column

Jason Long-2 wrote

Can someone suggest the easiest way to compare the results from two
queries to make sure they are identical?

First thing that comes to mind:

WITH
before_qry (col1, col2, col3) AS ( VALUES (1,1,1),(2,2,2),(3,3,3) )
, after_qry (col1, col2, col3) AS ( VALUES (1,1,1),(2,2,2),(3,3,3) )
, before_array AS (SELECT array_agg(before_qry) AS before_agg_array
FROM before_qry)
, after_array AS (SELECT array_agg(before_qry) AS after_agg_array FROM
before_qry)
SELECT *, before_agg_array = after_agg_array
FROM before_array CROSS JOIN after_array

Basically turn the resultsets into arrays (of composites) and then see if
the arrays are the same. This has issues with respect to column names and
comparable datatypes (i.e., if one column is bigint and the other is integer
they still compare equally).

One thought would to only allow a view name (and possibly, separately, the
ORDER BY clause). Catalog lookups can be used to check for identical view
output types.

No idea of something like this exists and is readily available.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Easiest-way-to-compare-the-results-of-two-queries-row-by-row-and-column-by-column-tp5760209p5760215.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Jeff Janes

jeff.janes@gmail.com

almost 13 years ago

In reply to: Jason Long (#1)

Re: Easiest way to compare the results of two queries row by row and column by column

On Thu, Jun 20, 2013 at 3:18 PM, Jason Long
<mailing.lists@octgsoftware.com>wrote:

Can someone suggest the easiest way to compare the results from two
queries to make sure they are identical?

I am rewriting a large number of views and I want to make sure that
nothing is changes in the results.

Something like

select compare_results('select * from v_old', 'select * from v_new');

I'd run:

select * from v_old
except
select * from v_new ;

And then

select * from v_new
except
select * from v_old ;

Both should return no rows.

However, if the queries can contain duplicate rows this will not detect
differences in the number of times a row is replicated, i.e. if one query
has a row 2 times and the other has it 3 times. If you need to detect such
cases, I'd probably \copy out each query to a file, then use system tools
to sort and diff the files.

Cheers,

Jeff

Jason Long

mailing.lists@octgsoftware.com

almost 13 years ago

In reply to: David G. Johnston (#2)

Re: Re: Easiest way to compare the results of two queries row by row and column by column

Thank you. I will give it a try. I have never used WITH before.

Thank you for the tips.

On Thu, 2013-06-20 at 16:05 -0700, David Johnston wrote:

Show quoted text

Jason Long-2 wrote

Can someone suggest the easiest way to compare the results from two
queries to make sure they are identical?

First thing that comes to mind:

WITH
before_qry (col1, col2, col3) AS ( VALUES (1,1,1),(2,2,2),(3,3,3) )
, after_qry (col1, col2, col3) AS ( VALUES (1,1,1),(2,2,2),(3,3,3) )
, before_array AS (SELECT array_agg(before_qry) AS before_agg_array
FROM before_qry)
, after_array AS (SELECT array_agg(before_qry) AS after_agg_array FROM
before_qry)
SELECT *, before_agg_array = after_agg_array
FROM before_array CROSS JOIN after_array

Basically turn the resultsets into arrays (of composites) and then see if
the arrays are the same. This has issues with respect to column names and
comparable datatypes (i.e., if one column is bigint and the other is integer
they still compare equally).

One thought would to only allow a view name (and possibly, separately, the
ORDER BY clause). Catalog lookups can be used to check for identical view
output types.

No idea of something like this exists and is readily available.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Easiest-way-to-compare-the-results-of-two-queries-row-by-row-and-column-by-column-tp5760209p5760215.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Vik Fearing

vik@postgresfriends.org

almost 13 years ago

In reply to: Jeff Janes (#3)

Re: Easiest way to compare the results of two queries row by row and column by column

On 06/21/2013 01:07 AM, Jeff Janes wrote:

On Thu, Jun 20, 2013 at 3:18 PM, Jason Long
<mailing.lists@octgsoftware.com
<mailto:mailing.lists@octgsoftware.com>> wrote:

Can someone suggest the easiest way to compare the results from two
queries to make sure they are identical?

I am rewriting a large number of views and I want to make sure that
nothing is changes in the results.

Something like

select compare_results('select * from v_old', 'select * from v_new');

I'd run:

select * from v_old
except
select * from v_new ;

And then

select * from v_new
except
select * from v_old ;

Both should return no rows.

This is my solution as well.

However, if the queries can contain duplicate rows this will not
detect differences in the number of times a row is replicated, i.e. if
one query has a row 2 times and the other has it 3 times. If you need
to detect such cases, I'd probably \copy out each query to a file,
then use system tools to sort and diff the files.

No need, just use EXCEPT ALL.

Basically, the following query should return nothing:

(TABLE v_old EXCEPT ALL TABLE v_new)
UNION ALL
(TABLE v_new EXCEPT ALL TABLE v_old);

--
Vik

gilad905

gilad905@gmail.com

over 9 years ago

In reply to: Vik Fearing (#5)

Re: Easiest way to compare the results of two queries row by row and column by column

Vik, note that your new suggestion for a query might be more
readable/efficient than Jeff's, but IT STILL DOESN'T SOLVE THE PROBLEM OF
DETECTING DUPLICATE ROWS.

--
View this message in context: http://postgresql.nabble.com/Easiest-way-to-compare-the-results-of-two-queries-row-by-row-and-column-by-column-tp5760209p5916761.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Branden Visser

mrvisser@gmail.com

over 9 years ago

In reply to: gilad905 (#6)

Re: Re: Easiest way to compare the results of two queries row by row and column by column

On Wed, Aug 17, 2016 at 4:58 AM, gilad905 <gilad905@gmail.com> wrote:

Vik, note that your new suggestion for a query might be more
readable/efficient than Jeff's, but IT STILL DOESN'T SOLVE THE PROBLEM OF
DETECTING DUPLICATE ROWS.

I've done this in the past and in my use-case it was easy enough to
export the rows to CSV with a sort, and then use a diff tool to ensure
they're identical. My data had 100's of thousands of rows, but
obviously with massive tables this may not be appropriate.

If you want more details on this I can find the exact psql client
query form I used to get a CSV file of the results.

Hope that helps.

Branden

--
View this message in context: http://postgresql.nabble.com/Easiest-way-to-compare-the-results-of-two-queries-row-by-row-and-column-by-column-tp5760209p5916761.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Vik Fearing

vik@postgresfriends.org

over 9 years ago

In reply to: gilad905 (#6)

Re: Re: Easiest way to compare the results of two queries row by row and column by column

On 17/08/16 10:58, gilad905 wrote:

Vik, note that your new suggestion for a query might be more
readable/efficient than Jeff's, but IT STILL DOESN'T SOLVE THE PROBLEM OF
DETECTING DUPLICATE ROWS.

Yes, it does.

And please don't shout at me.
--
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

gilad905

gilad905@gmail.com

over 9 years ago

In reply to: Vik Fearing (#8)

Re: Easiest way to compare the results of two queries row by row and column by column

Didn't mean to give the impression I'm 'shouting' :) just wanted to make sure
other users will notice my reply.
But you're right! it does. It even disregards the order of the rows. For
some reason in a previous test I looked to me like it didn't.

--
View this message in context: http://postgresql.nabble.com/Easiest-way-to-compare-the-results-of-two-queries-row-by-row-and-column-by-column-tp5760209p5917059.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#10

anupping

anupping@gmail.com

almost 7 years ago

In reply to: Jason Long (#1)

Re: Easiest way to compare the results of two queries row by row and column by column

select * from (select id, my_age, ur_age as urage from datatable where
agegp != 'S') m where m.myage <> m.urage

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html