diff-/patch-functionality for text-type data inside PostgreSQL

Started by Markus Wollnyalmost 17 years ago5 messagesgeneral
Jump to latest
#1Markus Wollny
Markus.Wollny@computec.de

Hi!

I want to implement a versioning system for text columns of a table
inside a PostgreSQL 8.3 database. As some of the changes to a text field
might be very small compared to the total text size, I'd prefer storing
diffs instead of full previous versions of the text and use a patch-like
function whenever I want to actually roll back to a certain version. I
know that I could probably handle this quite easily in the application
code, but I'd prefer some solution running on the database itself, so
that the application wouldn't have to know anything about storing the
diffs, instead that process would be handled by a ON UPDATE trigger.

So far I have been playing around with PL/PerlU for diff/path
functionality, using CPAN modules Text::Diff and Text::Patch, but
haven't been too successful, as there seems to be some issue with this
mechanism if the text data doesn't contain newlines. Just as an
off-topic info, because it's some issue with the CPAN modules, not with
PostgreSQL:

#!/usr/bin/perl
use Text::Patch;
use Text::Diff;
$src = "foo sdffasd";
$dst = "34asd sdf";
$diff = diff( \$src, \$dst, { STYLE => 'Unified' } );
print $diff . "\n";
$out = patch( $src, $diff, { STYLE => 'Unified' } );
print "Patch successful\n" if $out eq $dst;

Running this results in the following output:
@@ -1 +1 @@
-foo sdffasd+34asd sdf
Hunk #1 failed at line 1.

Anyway, has anybody already done something in this direction,
preferrably in some way that is purely pl/* and wouldn't require any
custom-made C-library? So far I have only found this interesting
description of the implementation of the very same functionality here:
http://www.ciselant.de/projects/pg_ci_diff/doc.html - but there's no
source code supplied for the libpg_ci_diff.so library.

Kind regards

Markus

Computec Media AG
Sitz der Gesellschaft und Registergericht: Furth (HRB 8818)
Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch
Vorsitzender des Aufsichtsrates: Jurg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Markus Wollny (#1)
Re: diff-/patch-functionality for text-type data inside PostgreSQL

"Markus Wollny" <Markus.Wollny@computec.de> writes:

So far I have been playing around with PL/PerlU for diff/path
functionality, using CPAN modules Text::Diff and Text::Patch, but
haven't been too successful, as there seems to be some issue with this
mechanism if the text data doesn't contain newlines.

Almost all diff/patch functions operate line-by-line, so that hardly
seems surprising.

regards, tom lane

#3Martijn van Oosterhout
kleptog@svana.org
In reply to: Markus Wollny (#1)
Re: diff-/patch-functionality for text-type data inside PostgreSQL

On Mon, May 04, 2009 at 12:26:13PM +0200, Markus Wollny wrote:

So far I have been playing around with PL/PerlU for diff/path
functionality, using CPAN modules Text::Diff and Text::Patch, but
haven't been too successful, as there seems to be some issue with this
mechanism if the text data doesn't contain newlines. Just as an
off-topic info, because it's some issue with the CPAN modules, not with
PostgreSQL:

I've used the Algorithm::Diff module in the past with success. It works
on sequences of objects rather than just text but it works well. That
means you can diff on word or character level at your choice, and even
control what sequences you consider "equal". That said, it doesn't have
a patch function but that should be fairly easy to make. You'll need to
define your own storage format for the diff though.

http://search.cpan.org/~nedkonz/Algorithm-Diff-1.15/lib/Algorithm/Diff.pm

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#4Markus Wollny
Markus.Wollny@computec.de
In reply to: Tom Lane (#2)
Re: diff-/patch-functionality for text-type data inside PostgreSQL

Hi!

-----Ursprüngliche Nachricht-----
Von: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Gesendet: Montag, 4. Mai 2009 15:04

"Markus Wollny" <Markus.Wollny@computec.de> writes:

So far I have been playing around with PL/PerlU for diff/path
functionality, using CPAN modules Text::Diff and Text::Patch, but
haven't been too successful, as there seems to be some

issue with this

mechanism if the text data doesn't contain newlines.

Almost all diff/patch functions operate line-by-line, so that
hardly seems surprising.

Not so much surprising, no, but I hadn't expected it to fail altogether on entries that just end after one line of text just because they lack a newline character - they are a one line text after all, so I assumed that the diff would produce a "replace this old line with the new one" type of instruction instead of producing something that patch doesn't seem to be able to process at all.

Kind regards

Markus

Computec Media AG
Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch
Vorsitzender des Aufsichtsrates: Jürg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276

#5Markus Wollny
Markus.Wollny@computec.de
In reply to: Martijn van Oosterhout (#3)
Re: diff-/patch-functionality for text-type data insidePostgreSQL

Hi!

-----Ursprüngliche Nachricht-----
Von: Martijn van Oosterhout [mailto:kleptog@svana.org]
Gesendet: Montag, 4. Mai 2009 15:30

I've used the Algorithm::Diff module in the past with
success. It works on sequences of objects rather than just
text but it works well. That means you can diff on word or
character level at your choice, and even control what
sequences you consider "equal". That said, it doesn't have a
patch function but that should be fairly easy to make. You'll
need to define your own storage format for the diff though.

http://search.cpan.org/~nedkonz/Algorithm-Diff-1.15/lib/Algori
thm/Diff.pm

Thank you - I have considered using Algorithm::Diff before, as Text::Diff seems to be based on that and one could, as you describe, create even more granular deltas than just line by line comparisons. The latter would however be fully sufficient for my needs, but I think I'll give Algorithm::Diff a closer look now :)

Kind regards

Markus

Computec Media AG
Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch
Vorsitzender des Aufsichtsrates: Jürg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276