Point-in-time data recovery - v.7.4
Hello
I am trying to find out when the 'Point-in-time data recovery'
functionality will be available in PostgreSQL, but I cannot find any
concrete information about this.
References on the mailing lists talk about version 7.4, and in the
TODO list it is under the 'urgent' section.
Does anybody know when this functionality will be available at
production quality?
--
With regards
Rafael Martinez
USIT, University of Oslo
I think it was delayed until 7.5... same for the win32 port.
Here is Bruce's message talking about both topics:
http://archives.postgresql.org/pgsql-hackers/2003-07/msg00284.php
On Tue, 2003-11-18 at 11:01, Rafael Martinez Guerrero wrote:
Hello
I am trying to find out when the 'Point-in-time data recovery'
functionality will be available in PostgreSQL, but I cannot find any
concrete information about this.
References on the mailing lists talk about version 7.4, and in the
TODO list it is under the 'urgent' section.
Does anybody know when this functionality will be available at
production quality?
Rafael Martinez Guerrero <r.m.guerrero@usit.uio.no> writes:
Anybody knows when this functionality will be available with
prod-quality?
It's likely to show up in 7.5, which it is hoped will be released around
the middle of next year. However, as this is free software, dependent on
volunteers and other people's priorities, neither the feature set of the
next release nor the exact release date is promised.
--
greg
It was planned for 7.4 but got bumped since the work was not finished.
There are patches floating around and people are putting some effort
into it, so hopefully we will see something in 7.5, but it is dependent
on the code being finished before the end of the 7.5 development cycle.
If you want to help code PITR, please send a message to the hackers list.
Robert Treat
On Tue, 2003-11-18 at 09:01, Rafael Martinez Guerrero wrote:
Hello
I am trying to find out when the 'Point-in-time data recovery'
functionality will be available in PostgreSQL, but I cannot find any
concrete information about this.
References on the mailing lists talk about version 7.4, and in the
TODO list it is under the 'urgent' section.
Does anybody know when this functionality will be available at
production quality?
--
With regards
Rafael Martinez
USIT, University of Oslo
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
On Tue, Nov 18, 2003 at 03:01:15PM +0100, Rafael Martinez Guerrero wrote:
References in mailinglists talk about version 7.4 and in the TODO list
is under the section 'urgent'.
It didn't get done.
Anybody knows when this functionality will be available with
prod-quality?
I don't think so.
Jan Wieck has a proposal for a new replication system which will
offer a trick for producing point-in-time recovery as a side benefit.
We are aiming to have that software in use sooner rather than later,
but it hasn't been written yet.
A
--
----
Andrew Sullivan 204-4141 Yonge Street
Afilias Canada Toronto, Ontario Canada
<andrew@libertyrms.info> M2P 2A8
+1 416 646 3304 x110
On Tue, 2003-11-18 at 08:29, Andrew Sullivan wrote:
On Tue, Nov 18, 2003 at 03:01:15PM +0100, Rafael Martinez Guerrero wrote:
References on the mailing lists talk about version 7.4, and in the
TODO list it is under the 'urgent' section.
It didn't get done.
Does anybody know when this functionality will be available at
production quality?
I don't think so.
Jan Wieck has a proposal for a new replication system which will
offer a trick for producing point-in-time recovery as a side benefit.
We are aiming to have that software in use sooner rather than later,
but it hasn't been written yet.
Supplanting PITR with replication? If so, I don't think that's
a good idea, since you'd either have to buy a new server or double
your disk capacity, both of which can cost noticeable amounts of money
if SCSI disks are used.
--
-----------------------------------------------------------------
Ron Johnson, Jr. ron.l.johnson@cox.net
Jefferson, LA USA
When Swedes start committing terrorism, I'll become suspicious of
Scandinavians.
What is the best method for storing files in postgres? Is it better to use
the large object functions or to just encode the data and store it in a
regular text or data field?
On Tue, Nov 18, 2003 at 12:39:40PM -0600, Ron Johnson wrote:
Supplanting PITR with Replication? If so, I don't think that's
No, I think the idea is that if you're already using the replication
system, you can get this for nothing along with it. Yes, it'll cost
you in hardware.
A
--
----
Andrew Sullivan 204-4141 Yonge Street
Afilias Canada Toronto, Ontario Canada
<andrew@libertyrms.info> M2P 2A8
+1 416 646 3304 x110
"Rick Gigger" <rick@alpinenetworking.com> writes:
What is the best method for storing files in postgres? Is it better to use
the large object functions or to just encode the data and store it in a
regular text or data field?
Depends on whether you need "random access" to the contents. You can
lo_seek() inside a large object and retrieve parts of the data with
lo_read(), while 'text' and 'bytea' currently require fetching the
whole field.
-Doug
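Doug's lo_seek()/lo_read() point can also be exercised from SQL using the server-side large object functions; a minimal sketch, with an invented table and a made-up descriptor value for illustration:

```sql
-- Hypothetical schema: the file contents live in a large object,
-- referenced from the table by an OID column.
CREATE TABLE files (id integer PRIMARY KEY, name text, content oid);

-- Large object descriptors are only valid inside a transaction.
BEGIN;
SELECT lo_open(content, 262144) FROM files WHERE id = 1;  -- 262144 = INV_READ
-- Assuming the call above returned descriptor 0:
SELECT lo_lseek(0, 1000, 0);   -- seek to byte offset 1000 (0 = SEEK_SET)
SELECT loread(0, 500);         -- read 500 bytes from there
COMMIT;
```

A bytea column, by contrast, is returned in full by an ordinary SELECT of that column.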
Quoting Rick Gigger <rick@alpinenetworking.com>:
What is the best method for storing files in postgres? Is it better to use
the large object functions or to just encode the data and store it in a
regular text or data field?
Rick,
This has been discussed recently, so you might want to dig through the
archives, but one thing to be aware of is that with large objects you
have to dump your database in a different, non-text format. This is
less portable than storing files as bytea, which can be dumped in both
the text and non-text formats. Arguably, if all you use is PostgreSQL,
then this might not be a big deal.
Performance-wise, I'm not qualified to say which is "better". As
always, it is going to depend on your specific application and
environment.
--
Keith C. Perry, MS E.E.
Director of Networks & Applications
VCSN, Inc.
http://vcsn.com
____________________________________
This email account is being hosted by:
VCSN, Inc : http://vcsn.com
I will search the archives but does anyone know off the top of their head
which performs better?
----- Original Message -----
From: "Keith C. Perry" <netadmin@vcsn.com>
To: "Rick Gigger" <rick@alpinenetworking.com>
Cc: "PgSQL General ML" <pgsql-general@postgresql.org>
Sent: Tuesday, November 18, 2003 12:25 PM
Subject: Re: [GENERAL] uploading files
"Rick Gigger" <rick@alpinenetworking.com> writes:
I will search the archives but does anyone know off the top of their head
which performs better?
It really depends on what you are trying to do. If you are always
fetching the entire file from the database, 'bytea' *might* have a
little less overhead, but it's not clearcut.
Both of them will ultimately be limited by how fast you can get data
off the disk and pump it through your client connection.
-Doug
Depends on whether you need "random access" to the contents. You can
lo_seek() inside a large object and retrieve parts of the data with
lo_read(), while 'text' and 'bytea' currently require fetching the
whole field.
Not so, unless I misunderstand. We use substr() on bytea for
chunked access to large files on OSes with inferior TCP/IP buffer
implementations. That's probably suboptimal performance-wise,
though.
Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
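The substr()-on-bytea chunking Karsten describes might look like the following sketch (table and column names are invented; note that bytea offsets in substr() are 1-based):

```sql
-- Hypothetical table with the file stored inline as bytea.
CREATE TABLE files (id integer PRIMARY KEY, name text, data bytea);

-- Pull the file down in 64 KB slices instead of one big fetch.
SELECT substr(data, 1, 65536)     FROM files WHERE id = 1;  -- bytes 1..65536
SELECT substr(data, 65537, 65536) FROM files WHERE id = 1;  -- bytes 65537..131072
-- ...and so on, until substr() returns an empty result.
```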
Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:
Depends on whether you need "random access" to the contents. You can
lo_seek() inside a large object and retrieve parts of the data with
lo_read(), while 'text' and 'bytea' currently require fetching the
whole field.
Not so, unless I misunderstand. We use substr() on bytea for
chunked access to large files on OSes with inferior TCP/IP buffer
implementations. That's probably suboptimal performance-wise,
though.
I think the field will still be completely loaded into memory on the
server side, though, while LOs are stored in "chunks" and can
theoretically be streamed to the client. I'm not really a definitive
authority, though...
-Doug
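The chunked storage Doug mentions is visible in the pg_largeobject system catalog, where each large object is split into page-sized rows; a small sketch (the OID 12345 is a placeholder for a real large object's OID):

```sql
-- Each large object is stored as rows of (loid, pageno, data).
SELECT pageno, length(data) AS chunk_bytes
FROM pg_largeobject
WHERE loid = 12345
ORDER BY pageno;
```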
I think the field will still be completely loaded into memory on the
server side, though, while LOs are stored in "chunks" and can
theoretically be streamed to the client. I'm not really a definitive
authority, though...
Ah! Sounds about right! Something new to learn every day :-)
Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
I am currently trying to import a text data file with about 45,000
records. At the end of the import it does an update on each of the
45,000 records. Doing all of the inserts completes in a fairly short
amount of time (about 2 1/2 minutes). Once it gets to the updates,
though, it slows to a crawl. After about 10 minutes it's only done
about 3000 records.
Is that normal? Is it because it's inside such a large transaction?
Is there anything I can do to speed it up? It seems awfully slow to me.
I didn't think that giving it more shared buffers would help but I tried
anyway. It didn't help.
I tried doing a full vacuum with analyze (vacuumdb -z -f) and it
cleaned up a lot of stuff, but it didn't speed up the updates at all.
I am using a dual 800MHz Xeon box with 2 GB of RAM. I've tried
anywhere from about 16,000 to 65,000 shared buffers.
What other factors are involved here?
Rick Gigger wrote:
I am currently trying to import a text data file with about 45,000
records. At the end of the import it does an update on each of the
45,000 records. Doing all of the inserts completes in a fairly short
amount of time (about 2 1/2 minutes). Once it gets to the updates,
though, it slows to a crawl. After about 10 minutes it's only done
about 3000 records.
Is that normal? Is it because it's inside such a large transaction?
Is there anything I can do to speed it up? It seems awfully slow to me.
I didn't think that giving it more shared buffers would help but I
tried anyway. It didn't help.
I tried doing a full vacuum with analyze (vacuumdb -z -f) and it
cleaned up a lot of stuff, but it didn't speed up the updates at all.
I am using a dual 800MHz Xeon box with 2 GB of RAM. I've tried
anywhere from about 16,000 to 65,000 shared buffers.
What other factors are involved here?
It is difficult to say without knowing either the definition of the
relation(s) or the update queries involved. Are there indexes being
created after the import that would allow PostgreSQL to locate the
rows being updated quickly, or is the update an unqualified update (no
WHERE clause) that affects all tuples?
EXPLAIN ANALYZE is your friend...
Mike Mascari
mascarm@mascari.com
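Following Mike's suggestion, a sketch of what to look for (the table and column names here are made up, since the real schema was never posted):

```sql
-- If the plan shows "Seq Scan", every row of the table is read to find
-- the one being updated; an "Index Scan" locates the row directly.
EXPLAIN ANALYZE
UPDATE import_data SET status = 'done' WHERE record_id = 42;
```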
-----Original Message-----
From: Rick Gigger [mailto:rick@alpinenetworking.com]
Sent: Tuesday, November 18, 2003 12:43 PM
To: PgSQL General ML
Subject: [GENERAL] performance problem
I am currently trying to import a text data file with about 45,000
records. At the end of the import it does an update on each of the
45,000 records. Doing all of the inserts completes in a fairly short
amount of time (about 2 1/2 minutes). Once it gets to the updates,
though, it slows to a crawl. After about 10 minutes it's only done
about 3000 records. Is that normal?
No way to know, from what you have told us.
Is it because it's inside such a large transaction? Is there anything
I can do to speed it up? It seems awfully slow to me. I didn't think
that giving it more shared buffers would help, but I tried anyway. It
didn't help. I tried doing a full vacuum with analyze (vacuumdb -z -f)
and it cleaned up a lot of stuff, but it didn't speed up the updates
at all. I am using a dual 800MHz Xeon box with 2 GB of RAM. I've tried
anywhere from about 16,000 to 65,000 shared buffers. What other
factors are involved here?
What is the structure of the tables? What sort of indexes are involved?
Is the query using a prepared statement? Can you perform the entire
update in a single SQL statement? What is the cardinality of the tables
involved?
Provide the exact code that you use to perform the update.
Provide the SQL schema for every table used in the queries.
Run EXPLAIN ANALYZE on the slow query. What does it say?
On Tue, 18 Nov 2003, Rick Gigger wrote:
I am currently trying to import a text data file without about 45,000
records. At the end of the import it does an update on each of the 45,000
records. Doing all of the inserts completes in a fairly short amount of
time (about 2 1/2 minutes). Once it gets to the the updates though it slows
to a craw. After about 10 minutes it's only done about 3000 records.
Are you doing your updates like this:
update table1 set field1='1' where id=1;
update table1 set field2=4 where id=1;
...
update table1 set field10='something else' where id=1;
update table1 set field1='3' where id=2;
...
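If the updates do look like that, they can usually be collapsed into one statement per row, saving a round trip and a row rewrite for every column touched; for example:

```sql
-- One statement doing the work of the field-by-field updates above:
UPDATE table1
SET field1 = '1',
    field2 = 4,
    field10 = 'something else'
WHERE id = 1;
```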
Maybe an analyze after the import and before the updates would help, as
might a vacuum [full]. If the table isn't updated by other processes,
probably not.
Maybe you've got a foreign key mismatch going on and a lot of
sequential scanning?
Is that normal? Is it because it's inside such a large transaction? Is
there anything I can do to speed that up. It seems awfully slow to me.
Possibly. If you are creating a lot of dead tuples, then the operations
can get slower and slower. Have you checked your FSM settings, etc.?
I didn't think that giving it more shared buffers would help but I tried
anyway. It didn't help.
Usually doesn't. More sort_mem might, though. Make it something like
16384 or 32768 (it's measured in kilobytes).
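On the 7.x releases, sort_mem can be raised for just the importing session rather than server-wide; a sketch:

```sql
-- sort_mem is measured in kilobytes; this raises it to 32 MB for the
-- current session only, leaving the server-wide default untouched.
SET sort_mem = 32768;
```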
I tried doing a analyze full on it (vacuumdb -z -f) and it cleaned up a lot
of stuff but it didn't speed up the updates at all.
You need to probably do the analyze between the import and the update.
I am using a dual 800mhz xeon box with 2 gb of ram. I've tried anywhere
from about 16,000 to 65000 shared buffers.
That's VERY high. When PostgreSQL has to manage a lot of buffers it is
actually slower than letting the kernel in Linux or BSD do it for you.
What other factors are involved here?
Not sure. More concrete examples would help. Have you run your queries
with EXPLAIN ANALYZE at the front and looked for differences in the
number of rows / loops? Those are the dead giveaways.
Take a look here:
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
Uh, I feel a little silly now. I had an index on the field in question
(needed to locate the row to update) but later recreated the table and
forgot to re-add it. I had assumed that it was there, but I double-checked
just now and it was gone. I then re-added the index and it finished in a
few minutes.
Sorry about that one. Thanks for the help.
rg
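For the record, the missing piece was an index of roughly this shape (the names here are invented, since the real schema was never posted):

```sql
-- Recreate the lookup index the update depends on, then refresh
-- the planner statistics so it gets used.
CREATE INDEX import_data_record_id_idx ON import_data (record_id);
ANALYZE import_data;
```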
----- Original Message -----
From: "Mike Mascari" <mascarm@mascari.com>
To: "Rick Gigger" <rick@alpinenetworking.com>
Cc: "PgSQL General ML" <pgsql-general@postgresql.org>
Sent: Tuesday, November 18, 2003 2:03 PM
Subject: Re: [GENERAL] performance problem