Data Warehousing
Hi,
I've got a postgres database collecting logged data. I have to keep this data
for at least 3 years. The data in the first instance is being recorded in a
postgres cluster. It then needs to be moved to a reports database server for
analysis. Therefore I'd like a job to dump data on the cluster say every
hour and record this in the reports database. The clustered database
could be purged of data more than, say, a week old.
So basically I need a dump/restore that only appends new data to the reports
server database.
I've googled but can't find anything, can anyone help?
Thanks
Rob
On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:
I've got a postgres database collecting logged data. I have to keep this data
for at least 3 years. [...] I'd like a job to dump data on the cluster say
every hour and record this in the reports database. The clustered database
could be purged of data more than, say, a week old. So basically I need a
dump/restore that only appends new data to the reports server database.
You might find an answer in partitioning your data. There's a section
in the docs on it. Then you can just dump the data from the
newest couple of partitions if you're partitioning by week, and purge
anything older with a simple DELETE WHERE date < now() - interval '1
week', or something like that.
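A minimal sketch of what that could look like, assuming a hypothetical parent
table "log" split into weekly child tables (via inheritance and CHECK
constraints, as the docs section describes) on a logged_time column:

#!/bin/sh
# Hypothetical weekly maintenance: dump the current week's partition for
# loading into the reports server, then purge anything older than a week
# from the cluster. Table and host names are assumed, not from the thread.
pg_dump -a -t log_week_current -h cluster logs > /tmp/log_week_current.sql
psql -h cluster -d logs \
  -c "DELETE FROM log WHERE logged_time < now() - interval '1 week'"

With real partitions the purge could instead be a DROP TABLE of the oldest
child table, which is far cheaper than a bulk DELETE.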
On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:
So basically I need a dump/restore that only appends new
data to the reports server database.
I guess that will all depend on whether or not your
data has a record of the time it got stuck in the cluster
... if there's no concept of a time-stamp attached
to the records as they get entered, I don't think it can be
done.
Cheers,
Andrej
On 03/09/07, Scott Marlowe <scott.marlowe@gmail.com> wrote:
You might find an answer in partitioning your data. There's a section
in the docs on it. Then you can just dump the data from the newest
couple of partitions if you're partitioning by week, and purge anything
older with a simple DELETE.
We're using Hibernate to write to the database. Partitioning looks like it
will be too much of a re-architecture. In reply to Andrej, we do have a
logged_time column in the required tables. That being the case, how does that
help me with the tools provided?
Might I have to write a custom JDBC application to do the data migration?
Rob
On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:
In reply to Andrej, we do have a logged_time column in the required
tables. That being the case, how does that help me with the tools
provided? Might I have to write a custom JDBC application to do the
data migration?
That would be one option :}
If the server is on a Unix/Linux platform you should be able
to achieve the result with a reasonably simple shell script
and cron, I'd say.
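For instance, a crontab entry along these lines, where the script path is
just an assumed placeholder:

# Hypothetical crontab entry: run the transfer at the top of every hour.
0 * * * *  /usr/local/bin/transfer_logs.sh >> /var/log/transfer_logs.log 2>&1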
Cheers,
Andrej
--
Please don't top post, and don't use HTML e-Mail :} Make your quotes concise.
Andrej Ricnik-Bay wrote:
That would be one option :}
If the server is on a Unix/Linux platform you should be able
to achieve the result with a reasonably simple shell script
and cron, I'd say.
I am on a Linux platform but I'm going to need some pointers regarding
the cron job. Are you suggesting that I parse the dump file? I assume I
would need to switch to using inserts and then parse the dump looking
for where I need to start from?
On 9/3/07, Rob Kirkbride <rob.kirkbride@gmail.com> wrote:
Are you suggesting that I parse the dump file? I assume I
would need to switch to using inserts and then parse the dump looking
for where I need to start from?
The question is: how complex is the data you need to
extract? I guess where I was heading was to run a
SELECT with the interval Scott described from psql into
a file, and then COPY that into the analysis database.
However, if the structure is more complex, if you needed
to join tables, then parsing a dump file may be an option,
even though always retaining a week's worth might turn
that into quite some overhead.
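A minimal sketch of that simple case, meant to be run hourly from cron. The
table name (log), column name (logged_time), host names and the bookkeeping
file are all assumptions, and COPY of a SELECT needs PostgreSQL 8.2 or later:

#!/bin/sh
# Hypothetical hourly transfer: export rows logged since the last run
# from the cluster and append them to the reports database.
SINCE=$(cat /var/lib/pgsql/last_run 2>/dev/null || echo '1970-01-01')
NOW=$(date '+%Y-%m-%d %H:%M:%S')

# Export the new rows as CSV (COPY of a SELECT needs 8.2+) ...
psql -h cluster -d logs \
  -c "COPY (SELECT * FROM log
            WHERE logged_time >= '$SINCE' AND logged_time < '$NOW')
      TO STDOUT WITH CSV" > /tmp/log_batch.csv

# ... append them to the reports database ...
psql -h reports -d reports -c "\copy log FROM '/tmp/log_batch.csv' WITH CSV"

# ... and remember where we got to, so the next run only picks up newer rows.
echo "$NOW" > /var/lib/pgsql/last_run

The bookkeeping file is just one way to keep consecutive windows from
overlapping or leaving gaps; an index on logged_time keeps the hourly
extract cheap.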
Cheers,
Andrej
--
Please don't top post, and don't use HTML e-Mail :} Make your quotes concise.
I am on a Linux platform but I'm going to need some pointers regarding
the cron job. Are you suggesting that I parse the dump file?
Something that you may want to consider is dblink from contrib. We have a
similar situation for the archiving of collected data and have been able to
implement a fairly easy solution that does not require the parsing of dump
files, just a simple(ish) query based on the time inserted.
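For what it's worth, a rough sketch of that approach run on the reports
server. The connection string, table and column names, and the record
definition list are all assumptions, and contrib/dblink has to be installed
in the reports database first:

#!/bin/sh
# Hypothetical pull via contrib/dblink, executed on the reports server.
# dblink() returns SETOF record, so a column definition list is required;
# the columns shown here are made up for illustration.
psql -d reports <<'EOF'
INSERT INTO log
SELECT *
FROM dblink('host=cluster dbname=logs',
            'SELECT * FROM log
              WHERE logged_time >= now() - interval ''1 hour''')
     AS t(id integer, logged_time timestamp, message text);
EOF

The fixed one-hour window is the weak point; in practice you'd track the
last transferred logged_time (as in the cron sketch above) so a slow or
skipped run can't drop rows.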
-Ken