Data Warehousing

Started by Rob Kirkbride over 18 years ago · 9 messages · general
#1 Rob Kirkbride
rob.kirkbride@gmail.com

Hi,

I've got a postgres database collecting logged data, which I have to keep
for at least 3 years. The data is initially recorded in a postgres cluster;
it then needs to be moved to a reports database server for analysis. I'd
therefore like a job to dump data from the cluster, say, every hour and
record it in the reports database. The cluster database could then be
purged of data more than a week old.

So basically I need a dump/restore that only appends new data to the reports
server database.

I've googled but can't find anything, can anyone help?

Thanks

Rob

#2 Scott Marlowe
scott.marlowe@gmail.com
In reply to: Rob Kirkbride (#1)
Re: Data Warehousing

On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:

Hi,

I've got a postgres database collecting logged data, which I have to keep
for at least 3 years. The data is initially recorded in a postgres cluster;
it then needs to be moved to a reports database server for analysis. I'd
therefore like a job to dump data from the cluster, say, every hour and
record it in the reports database. The cluster database could then be
purged of data more than a week old.

So basically I need a dump/restore that only appends new data to the reports
server database.

I've googled but can't find anything, can anyone help?

You might find an answer in partitioning your data. There's a section in
the docs on it. If you're partitioning by week, you can just dump the old
data from the newest couple of partitions, and purge anything older with a
simple delete where date < now() - interval '1 week', or something like
that.
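As a rough sketch of both halves of that (the table and column names here
are made up; the inheritance-based scheme is the one described in the
partitioning section of the docs):

```sql
-- Hypothetical log table with a timestamp column:
--   CREATE TABLE log_data (id integer, logged_time timestamp, payload text);

-- Partitioning by week via inheritance: one child table per week, with a
-- CHECK constraint so constraint exclusion can skip irrelevant partitions.
CREATE TABLE log_data_2007w36 (
    CHECK (logged_time >= '2007-09-03' AND logged_time < '2007-09-10')
) INHERITS (log_data);

-- Without partitioning, the purge is just the delete described above:
DELETE FROM log_data WHERE logged_time < now() - interval '1 week';
```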

#3 Andrej Ricnik-Bay
andrej.groups@gmail.com
In reply to: Rob Kirkbride (#1)
Re: Data Warehousing

On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:

So basically I need a dump/restore that only appends new
data to the reports server database.

I guess that will all depend on whether or not your data
has a record of the time it was inserted into the cluster
... if there's no time-stamp attached to the records as
they get entered, I don't think it can be done.

Cheers,
Andrej

#4 Rob Kirkbride
rob.kirkbride@gmail.com
In reply to: Scott Marlowe (#2)
Re: Data Warehousing

On 03/09/07, Scott Marlowe <scott.marlowe@gmail.com> wrote:

On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:

Hi,

I've got a postgres database collecting logged data, which I have to keep
for at least 3 years. The data is initially recorded in a postgres cluster;
it then needs to be moved to a reports database server for analysis. I'd
therefore like a job to dump data from the cluster, say, every hour and
record it in the reports database. The cluster database could then be
purged of data more than a week old.

So basically I need a dump/restore that only appends new data to the
reports server database.

I've googled but can't find anything, can anyone help?

You might find an answer in partitioning your data. There's a section in
the docs on it. If you're partitioning by week, you can just dump the old
data from the newest couple of partitions, and purge anything older with a
simple delete where date < now() - interval '1 week', or something like
that.

We're using Hibernate to write to the database, so partitioning looks like
it would be too much of a re-architecture. In reply to Andrej: we do have a
logged_time column in the required tables. That being the case, how does
that help me with the tools provided?

Might I have to write a custom JDBC application to do the data migration?

Rob

#5 Andrej Ricnik-Bay
andrej.groups@gmail.com
In reply to: Rob Kirkbride (#4)
Re: Data Warehousing

On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:

We're using Hibernate to write to the database, so partitioning looks like
it would be too much of a re-architecture. In reply to Andrej: we do have a
logged_time column in the required tables. That being the case, how does
that help me with the tools provided?

Might I have to write a custom JDBC application to do the data migration?

That would be one option :}

If the server is on a Unix/Linux-platform you should be able
to achieve the result with a reasonably simple shell-script
and cron, I'd say.
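
For instance (a sketch only; the script path is made up), an hourly
crontab entry could look like:

```shell
# m h dom mon dow  command -- run the transfer script at the top of each hour
0 * * * *  /usr/local/bin/transfer_logs.sh >> /var/log/transfer_logs.log 2>&1
```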

Cheers,
Andrej

--
Please don't top post, and don't use HTML e-Mail :} Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm

#6 Rob Kirkbride
rob.kirkbride@gmail.com
In reply to: Andrej Ricnik-Bay (#5)
Re: Data Warehousing

Andrej Ricnik-Bay wrote:

On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:

We're using Hibernate to write to the database, so partitioning looks like
it would be too much of a re-architecture. In reply to Andrej: we do have a
logged_time column in the required tables. That being the case, how does
that help me with the tools provided?

Might I have to write a custom JDBC application to do the data migration?

That would be one option :}

If the server is on a Unix/Linux-platform you should be able
to achieve the result with a reasonably simple shell-script
and cron, I'd say.

I am on a Linux platform, but I'm going to need some pointers regarding
the cron job. Are you suggesting that I parse the dump file? I assume I
would need to switch to using inserts and then parse the dump, looking
for where I need to start from?

#7 Andrej Ricnik-Bay
andrej.groups@gmail.com
In reply to: Rob Kirkbride (#6)
Re: Data Warehousing

On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:

I am on a Linux platform, but I'm going to need some pointers regarding
the cron job. Are you suggesting that I parse the dump file? I assume I
would need to switch to using inserts and then parse the dump, looking
for where I need to start from?

The question is: how complex is the data you need to
extract? I guess where I was heading was to run a
select with the interval Scott described from psql into
a file, and then COPY FROM that into the analysis database.

However, if the structure is more complex, if you needed
to join tables, then parsing a dump file may be an option,
even though always retaining a week's worth might make
that into quite some overhead.
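
A minimal sketch of that select-then-copy approach (host, database, table,
and column names are all hypothetical):

```shell
#!/bin/sh
# Export the last hour of rows from the cluster, then append them to the
# reports database. Sketch only -- real use needs error handling and a
# recorded high-water mark instead of a fixed one-hour window.
psql -h cluster-host -d logs -c \
  "\copy (SELECT * FROM log_data WHERE logged_time >= now() - interval '1 hour') TO '/tmp/log_batch.csv' CSV"

psql -h reports-host -d reports -c \
  "\copy log_data FROM '/tmp/log_batch.csv' CSV"
```

Tracking the greatest logged_time already copied, rather than assuming
exactly one hour, would avoid gaps or duplicates if a cron run is delayed
or skipped.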

Cheers,
Andrej

#8 Ken Colson
ken.colson@sage.com
In reply to: Andrej Ricnik-Bay (#7)
Re: Data Warehousing

I am on a Linux platform, but I'm going to need some pointers regarding
the cron job. Are you suggesting that I parse the dump file? I assume I
would need to switch to using inserts and then parse the dump, looking
for where I need to start from?

Something you may want to consider is dblink from contrib. We have a
similar situation for archiving collected data and have been able to
implement a fairly easy solution that does not require parsing dump
files, just a simple(ish) query based on the time inserted.
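
A sketch of what Ken describes, assuming the dblink module from contrib is
installed, with made-up host, table, and column definitions (dblink returns
SETOF record, so the column list has to be spelled out):

```sql
-- Run on the reports server: pull the last hour of rows from the cluster
-- and append them to the local copy of the table.
INSERT INTO log_data
SELECT *
FROM dblink('host=cluster-host dbname=logs',
            'SELECT * FROM log_data
             WHERE logged_time >= now() - interval ''1 hour''')
     AS t(id integer, logged_time timestamp, payload text);
```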

-Ken

