How to just "link" to some data feed
Hi,
~
I have some CSV files which I need to link to from within PG
~
From http://darkavngr.blogspot.com/2007/06/importar-datos-externos-nuestra-base-de.html
~
Say, you have the following data feed in /tmp/codigos_postales.csv
~
"01000","San Angel","Colonia","Alvaro Obregon","Distrito Federal"
"01010","Los Alpes","Colonia","Alvaro Obregon","Distrito Federal"
"01020","Guadalupe Inn","Colonia","Alvaro Obregon","Distrito Federal"
"01028","Secretaria de Contraloria y Desarrollo Administrativo","Gran usuario","Alvaro Obregon","Distrito Federal"
"01029","Infonavit","Gran usuario","Alvaro Obregon","Distrito Federal"
"01030","Axotla","Pueblo","Alvaro Obregon","Distrito Federal"
"01030","Florida","Colonia","Alvaro Obregon","Distrito Federal"
"01040","Campestre","Colonia","Alvaro Obregon","Distrito Federal"
~
then you can define the DB data model as:
~
CREATE TABLE codigos_postales(
cp char(5),
asentamiento varchar(120),
tipo_asentamiento varchar(120),
municipio varchar(120),
estado varchar(120)
);
~
and copy all values into the DB like this:
~
COPY codigos_postales FROM '/tmp/codigos_postales.csv' DELIMITERS ',' CSV;
~
But how do you just link to the data feed, effectively keeping it in
CSV file format rather than internal to PG?
~
Also I need for all changes to the data to (of course!) be propagated
to the data back end
~
Then you can link to the same data feed from some other engine that
reads in CSV (I think all do or should)
~
How do you link to a CSV using PG?
~
Thank you
lbrtchx
On 3-Jun-08, at 8:04 PM, Albretch Mueller wrote:
You can't.
You have to insert it into a table,
so you need some code which reads the CSV and inserts it into the table.
Dave
--
Sent via pgsql-jdbc mailing list (pgsql-jdbc@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-jdbc
On Tue, Jun 3, 2008 at 8:24 PM, Dave Cramer <pg@fastcrypt.com> wrote:
. . .
You can't.
Hmm! Doesn't PG have a way to do something like this, as in MySQL:
load data local infile 'uniq.csv' into table tblUniq
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(uniqName, uniqCity, uniqComments)
and even in low-end (not real) DBs like MS Access?
Is there a technical reason for that, or should I file an RFE?
Thank you very much
lbrtchx
Albretch Mueller wrote:
On Tue, Jun 3, 2008 at 8:24 PM, Dave Cramer <pg@fastcrypt.com> wrote:
. . . You can't.
Hmm! Doesn't PG have a way to do something like this, say in MySQL:
load data local infile 'uniq.csv' into table tblUniq
That's essentially the same as the COPY you quoted in your original
email, isn't it? So.. what exactly is it you want to do that COPY
doesn't do?
Anyway, it's not really JDBC related.
-O
On Tue, Jun 3, 2008 at 11:03 PM, Oliver Jowett <oliver@opencloud.com> wrote:
That's essentially the same as the COPY you quoted in your original email,
isn't it? So.. what exactly is it you want to do that COPY doesn't do?
~
well, actually, not exactly; based on:
~
http://postgresql.com.cn/docs/8.3/static/sql-copy.html
~
COPY <table_name> [FROM|TO] <data_feed> <OPTIONS>
~
imports/exports the data into/out of PG, so you will essentially be
duplicating the data and having to keep it in sync. This is exactly what I am
trying to avoid; I would like PG to handle the data right from the
data feed
~
Anyway, it's not really JDBC related.
~
Well, no, but I was hoping to get an answer here because I mostly
access PG through JDBC, and also because Java developers would generally
be more inclined toward these kinds of DB-independent data format,
reuse, and transfer issues
~
lbrtchx
Albretch Mueller wrote:
On Tue, Jun 3, 2008 at 11:03 PM, Oliver Jowett <oliver@opencloud.com> wrote:
That's essentially the same as the COPY you quoted in your original email,
isn't it? So.. what exactly is it you want to do that COPY doesn't do?
~
well, actually, not exactly; based on:
~
http://postgresql.com.cn/docs/8.3/static/sql-copy.html
~
COPY <table_name> [FROM|TO] <data_feed> <OPTIONS>
~
import/export the data into/out of PG, so you will be essentially
duplicating the data and having to synch it. This is exactly what I am
trying to avoid, I would like for PG to handle the data right from the
data feed
As Dave said, PG won't magically keep the data up to date for you; you
will need some external process to do the synchronization with the feed.
That could use COPY if it wanted ..
Then you said:
Hmm! Doesn't PG have a way to do something like this, say in MySQL:
load data local infile 'uniq.csv' into table tblUniq
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(uniqName, uniqCity, uniqComments)
and even in low end (not real) DBs like MS Access?
But isn't this doing exactly what PG's COPY does - loads data, once,
from a local file, with no ongoing synchronization?
Is there a technical reason for that, or should I apply for a RFE?
Personally I don't see this sort of synchronization as something that
you want the core DB to be doing anyway. The rules for how you get the
data, how often you check for updates, how you merge the updates, and so
on are very application specific.
-O
Oliver Jowett <oliver@opencloud.com> writes:
That's essentially the same as the COPY you quoted in your original
email, isn't it? So.. what exactly is it you want to do that COPY
doesn't do?
The OP seems to be laboring under the delusion that we might consider
storing "live" data in a flat text file. I'd be interested to know how
such a representation could handle MVCC updates, or indexing ... even
presuming that it was a good idea for any other reason, which he has not
provided.
Anyway, it's not really JDBC related.
Definitely not.
regards, tom lane
On Tue, Jun 3, 2008 at 9:58 PM, Albretch Mueller <lbrtchx@gmail.com> wrote:
On Tue, Jun 3, 2008 at 11:03 PM, Oliver Jowett <oliver@opencloud.com> wrote:
That's essentially the same as the COPY you quoted in your original email,
isn't it? So.. what exactly is it you want to do that COPY doesn't do?
~
well, actually, not exactly; based on:
~
http://postgresql.com.cn/docs/8.3/static/sql-copy.html
~
COPY <table_name> [FROM|TO] <data_feed> <OPTIONS>
~
import/export the data into/out of PG, so you will be essentially
duplicating the data and having to synch it. This is exactly what I am
trying to avoid, I would like for PG to handle the data right from the
data feed
I think what you're looking for is the equivalent of Oracle's external
tables, which invoke sqlldr in the background every time you access
them. No such animal in the PG universe that I know of.
Well, no, but I was hoping to get an answer here because I mostly
access PG through jdbc and also because java developer would generally
be more inclined to these types of DB-independent data formats,
reusing, transferring issues
Removed pgsql-jdbc from the cc list. You're still better off sending to
the right list, and so are we. The general list has a much larger
readership than jdbc, and it's far more likely you'll run into someone
here with Oracle experience who knows about the external table format,
etc...
Scott Marlowe wrote:
I think what you're looking for is the equivalent to oracles external
tables which invoke sqlldr every time you access them in the
background. No such animal in the pg universe that I know of.
There was a similar discussion of this on -hackers in April. Closest to this idea was
http://archives.postgresql.org/pgsql-hackers/2008-04/msg00250.php
Regards,
Stephen Denne.
Albretch Mueller wrote:
import/export the data into/out of PG, so you will be essentially
duplicating the data and having to synch it. This is exactly what I am
trying to avoid, I would like for PG to handle the data right from the
data feed
You could write a set-returning function that reads the file, and
perhaps a view on top of that. As the file is read on every invocation,
that's only practical for small tables, though.
It's likely a better idea to just COPY the table into the database
periodically, or perhaps write an external script that watches the
modification timestamp on the file and triggers a reload whenever it
changes.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
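Heikki's second suggestion (an external script that watches the file's modification timestamp and triggers a reload) could be sketched roughly like this. This is a minimal sketch, not part of PG itself: `run_sql` is a placeholder for however you actually reach the database (piping to psql, a JDBC connection, etc.), and only the table name, file path, and COPY statement are taken from the thread.

```python
import os
import time

CSV_PATH = "/tmp/codigos_postales.csv"  # the feed file from the thread
TABLE = "codigos_postales"

def reload_sql(table, csv_path):
    """Build the SQL that replaces the table's contents with the file's
    current contents, using the same COPY as in the original post."""
    return (
        "BEGIN;\n"
        f"TRUNCATE {table};\n"
        f"COPY {table} FROM '{csv_path}' DELIMITERS ',' CSV;\n"
        "COMMIT;\n"
    )

def watch_and_reload(table, csv_path, run_sql, poll_seconds=5):
    """Poll the file's mtime; on every change, hand the reload SQL to
    run_sql (a placeholder callable that reaches the database)."""
    last_mtime = None
    while True:
        mtime = os.stat(csv_path).st_mtime
        if mtime != last_mtime:
            run_sql(reload_sql(table, csv_path))
            last_mtime = mtime
        time.sleep(poll_seconds)
```

As Oliver noted, the reload policy (how often to poll, whether to truncate or merge) is application specific; this just makes the "external process" concrete.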
Hello.
I'd recommend having an intermediate (temporary?) table to load the
data into, and then syncing it to the main table with a simple insert
statement. The call sequence would look like:
copy temptbl FROM '/tmp/codigos_postales.csv' DELIMITERS ',' CSV;
insert into maintbl select * from temptbl where cp not in (select cp
from maintbl);
delete from temptbl;
commit;
P.S. The last step is not needed if you use a temporary table.
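The merge step above (insert only the rows whose cp is not already in the main table) can be sketched in plain Python to show the intent; the function is hypothetical glue, with only the sample rows and the cp column taken from the thread:

```python
import csv
import io

def rows_to_insert(existing_cps, csv_text):
    """Mirror the SQL above: keep only the feed rows whose cp (the
    first column) is not already present in the main table."""
    existing = set(existing_cps)
    return [row for row in csv.reader(io.StringIO(csv_text))
            if row and row[0] not in existing]

# Two rows from the sample feed; "01029" is already in the main table.
feed = (
    '"01029","Infonavit","Gran usuario","Alvaro Obregon","Distrito Federal"\n'
    '"01030","Axotla","Pueblo","Alvaro Obregon","Distrito Federal"\n'
)
print(rows_to_insert({"01029"}, feed))  # only the "01030" row survives
```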