DBMirror.pl performance change
I discovered a performance problem in DBMirror.pl.
pending.c stores data in a format very similar to the PgSQL
"\"-escaped input format.
When a field is of type bytea and the source data is binary,
this produces 2 additional backslashes for every unprintable
char.
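
To illustrate the blow-up, here is a rough Python sketch (not code from pending.c or DBMirror.pl). It assumes the bytea text output writes each unprintable byte as "\ooo", and that a pending.c-like step then backslash-escapes every "\" and "'". Both function names are made up for the example.

```python
def bytea_escape_output(raw: bytes) -> str:
    # Assumed model of the bytea text output: printable ASCII is kept,
    # a backslash is doubled, anything else becomes backslash + 3 octal digits.
    out = []
    for b in raw:
        if b == 0x5C:
            out.append("\\\\")
        elif 0x20 <= b <= 0x7E:
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)
    return "".join(out)


def pending_escape(text: str) -> str:
    # Hypothetical stand-in for the escaping done when the row is stored:
    # every backslash is doubled, every single quote gets a backslash.
    return text.replace("\\", "\\\\").replace("'", "\\'")


stored = pending_escape(bytea_escape_output(b"\x00"))
# One unprintable byte now carries two backslashes in the stored text.
backslashes_per_byte = stored.count("\\")
```

Under these assumptions a file that is mostly unprintable bytes yields a stored string with roughly two backslashes per input byte, which is why splitting on "\" produces so many chunks.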
The performance of the extractData function in DBMirror.pl really suffers
under this condition, since it breaks the data into chunks of "\"-delimited
strings.
Informally speaking, run time tends to be O(n^2), where n is the size
of the data.
This can be remedied if we break the data into chunks delimited by "'" rather than "\".
"'" occurs much less frequently in common binary files (bz2, tiff, jpg,
pdf, etc.). If we note that an odd number of trailing "\" in a chunk signals an
intermediate "'", whereas an even number of "\" signals the final "'",
then we can make this routine run much faster.
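
One reading of this rule, sketched in Python for illustration only (this is not the attached Perl extractData; the function name and arguments are hypothetical): jump from one "'" to the next and count the backslashes immediately before it.

```python
def find_closing_quote(data: str, start: int) -> int:
    # 'start' is the index just after the opening quote of a value.
    # Returns the index of the quote that terminates the value.
    pos = start
    while True:
        q = data.find("'", pos)
        if q == -1:
            raise ValueError("unterminated quoted value")
        # Count the run of backslashes immediately before this quote.
        n = 0
        while q - 1 - n >= start and data[q - 1 - n] == "\\":
            n += 1
        if n % 2 == 0:
            # Even count: the backslashes pair up, so this quote is final.
            return q
        # Odd count: the quote itself is escaped, keep scanning.
        pos = q + 1


# Value text is  ab\'cd\\  followed by the closing quote.
sample = "ab\\'cd\\\\'"
end = find_closing_quote(sample, 0)
value = sample[:end]  # still-escaped value, without the closing quote
```

The scan only stops at quotes, so the amount of per-chunk work depends on how many "'" the data contains rather than how many "\".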
I attach the new extractData function.
Now replicating a 400 k tiff takes 3 seconds instead of the 12 minutes
it used to take.
I am wondering about the state of
http://www.whitebeam.org/library/guide/TechNotes/replicate.rhtm
Please feel free to comment.
Pete, could you test this new DBMirror.pl to see how it behaves
in comparison with your C++ solution?
--
-Achilleus
Attachments:
extractData.pl (text/plain; CHARSET=US-ASCII)
Peter,
It is much more convenient for you to run a test
(just change the last function in DBMirror.pl) than for me
(grab whitebeam, compile for FreeBSD, etc.).
Of course you would need to use the original .conf format
rather than the one you are using now.
It would be interesting to see some numbers.
P.S.
Please include my address explicitly, pgsql-general comes
to me in digest mode.
--
-Achilleus
Achilleus Mantzios wrote:
> Peter,
> It is much more convenient for you to run a test
> (just change the last function in DBMirror.pl) than for me
> (grab whitebeam, compile for FreeBSD, etc.).
> Of course you would need to use the original .conf format
> rather than the one you are using now.
> It would be interesting to see some numbers.
> P.S.
> Please include my address explicitly, pgsql-general comes
> to me in digest mode.
I'll take a look into this when I get a chance. Right now the only replicated systems I have are for live commercial clients - my development systems
aren't replicated, just backed-up periodically.
It is worth looking through the Perl version some more though. I'm pretty sure I worked around most of the escaping/unescaping when I looked at the
'C' version. I'm pretty sure some of the same approach could be used to improve performance of the Perl version. The main thing I found was that the
data table is un-escaped when read from the table and then re-escaped before being sent to the slave database. In practice the data doesn't have to be
touched.
My own preference right now is to stick with the C version now that I have it. Replication is just about simultaneous, with negligible CPU usage. When I get
a chance, I intend to decouple the 'C' version from the rest of Whitebeam so it can be built by itself. At the time I needed a solution quickly,
so making use of a few Whitebeam utility classes got me there.
Pete