How to insert .xls files into database
Hey,
I am using PostgreSQL 8.1.4 on Windows. I have a large amount of data stored
in .xls files which I want to insert into my database.
The columns in the .xls files are not exactly compatible with the database
schema. For example, event_id in every .xls file starts at 1, while in my
database event_id is the primary key. Also, some information, such as
event_type and event_location, is particular to each .xls file and is
therefore mentioned only once in the file, but the database has a separate
column for it.
For more clarification I am giving my database schema and attaching a sample
.xls file.
My database schema is as follows:
{
event_id int4 NOT NULL,
buoy char(1) NOT NULL,
deployment varchar(40),
depth int4 NOT NULL,
event_type varchar(64),
model_info_id varchar(256),
start_date float8 NOT NULL,
start_date_sd float8,
end_date float8 NOT NULL,
end_date_sd float8,
mean float8,
variance float8,
max float8,
min float8,
event varchar(20) NOT NULL,
depth_type varchar(20) NOT NULL,
buoy_location geometry,
duration float8,
Amplitude_sd float8
}
The .xls file is in the attachment. As you can see, all the bold attributes
are specified only once in each .xls file, and all the bold+italic ones have
to be manipulated a bit before storing. Even event_id starts at 1 in every
.xls file, but as this is a primary key I have to manipulate it as well.
I think that if I can transform each row into an INSERT statement, then I
can insert the data into my database. Please guide me on how to do this,
or suggest another way of doing it.
I am relatively new to this field, so please don't be offended if this
problem is quite obvious.
Thanks
Parang Saraf
parang.saraf@gmail.com
Parang Saraf wrote:
[del]
OpenOffice.org v2 can interface .xls files with PostgreSQL.
--
Ron Johnson, Jr.
Jefferson LA USA
Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
I guess the solution depends on what 'a large amount of data' is. The most
time-consuming part is going to be converting the single data elements at the
top of each sheet into multiple elements. I would create columns for that
data in the sheet, and at the same time order the columns to match the
database schema. Then it would be a matter of cut and paste to fill the
columns with the data. The event IDs could be renumbered using Excel's series
generator to create a non-repeating set of IDs. If the amount of data is very
large, it might pay to create some macros to do the work. Once the data is
filled in, you have a couple of choices. One, as mentioned by Ron, is to use
OpenOffice v2 to dump the data into the database. The other is to save the
data as CSV and use the psql \copy command to move the data into the table.
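If the renumbering is done outside Excel, a small script can stand in for the
series generator. A minimal sketch in Python; the file names are hypothetical,
and it assumes each sheet has already been saved as a CSV whose columns match
the table, with the per-file event_id in the first column:

import csv
import glob
import itertools

next_id = itertools.count(1)  # one non-repeating ID series across all files

with open("all_events.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for path in sorted(glob.glob("event_file_*.csv")):  # hypothetical names
        with open(path, newline="") as f:
            for row in csv.reader(f):
                row[0] = str(next(next_id))  # overwrite the per-file event_id
                writer.writerow(row)

The combined file could then be loaded from psql with something like
\copy events FROM 'all_events.csv' WITH CSV (the table name is an assumption).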
On Friday 07 July 2006 09:40 am, Parang Saraf wrote:
[del]
--
Adrian Klaver
aklaver@comcast.net
Adrian Klaver wrote:
[del]
Everything you described is familiar to me except the OpenOffice v2 dump.
Could you explain that in more detail, please? I have tried it many times
without success.
Thank you,
Tomas
[del]
--
Adrian Klaver
aklaver@comcast.net
One option is to write a Python translator to create CSV files, or even
an uploader to go directly from the Excel files to the database. There
is at least one module to read Excel files, in all their complexity:
http://cheeseshop.python.org/pypi/xlrd/0.5.2
and a number of Postgres modules:
http://wiki.python.org/moin/PostgreSQL
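A minimal sketch of such a translator using xlrd; the cell coordinates for
the once-per-file values and the row where the data block starts are
assumptions about the sample file's layout, not taken from the post:

import csv
import xlrd

book = xlrd.open_workbook("sample.xls")
sheet = book.sheet_by_index(0)

# Once-per-file header cells; coordinates are assumptions, adjust them to
# the real layout of the sheet.
event_type = sheet.cell_value(0, 1)
buoy_location = sheet.cell_value(1, 1)

with open("sample.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for rowx in range(3, sheet.nrows):  # data assumed to start on row 4
        row = [sheet.cell_value(rowx, colx) for colx in range(sheet.ncols)]
        # Replicate the single header values onto every data row.
        writer.writerow(row + [event_type, buoy_location])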
- John Burger
MITRE
So far I have only got this to work with the Postgres SDBC driver:
http://dba.openoffice.org/drivers/postgresql/index.html
1) Open the Data Source (F4) window in the spreadsheet.
2) Make a connection to the database. I usually do this by opening a table.
This is fairly important; otherwise, when you do the import, OO will try to
establish the connection at the same time as it is importing the data and
end up hanging.
3) Select the data in the spreadsheet you want to import. The column headings
will become the field names. NOTE: You want the data to be as plain as
possible. OpenOffice tends to auto-format cell contents, e.g. changing 1/2 to
a typeset fraction or straight quotes to curly quotes. This introduces format
codes that don't translate well.
4) Drag the selected data to the Tables heading of the appropriate data
source. At this point a dialog form will pop up.
5) Name the table. You will need to schema-qualify it, e.g. public.test_table.
Make sure you select Create a Primary Key. Without it, the table will be
created but the data will not be imported. This is why I use the SDBC driver;
the JDBC driver does not seem to support this step.
6) Select Next. Here you can choose the columns you wish to import.
7) Select Next. Here you can modify the data types of the columns.
8) Select Create. The table should be created and the data inserted.
On Friday 07 July 2006 03:41 pm, lanczos@t-zones.sk wrote:
[del]
--
Adrian Klaver
aklaver@comcast.net
On Fri, 2006-07-07 at 22:41 +0000, lanczos@t-zones.sk wrote:
[del]
Does the "OpenOffice v2 dump" convert the date correctly when
exporting into PostgreSQL?
The date in .xls when using excel is exported to CSV as a number
which is tricky to convert to a date. This is what I use :
date_pli('epoch'::date, date_num::integer - 25569) AS date_fmt
The number "25569" is a fudge factor, that can be different between
dumps, but stays consistent through the dump. I usually adjust it
and compare the result to the value shown in excel until I get a
match.
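For what it's worth, 25569 is not arbitrary: on Excel's 1900 date system,
serial dates effectively count days from 1899-12-30, and 1970-01-01 is day
25569, so serial - 25569 is days since the Unix epoch, which is what the
date_pli() expression above adds to 'epoch'::date. A quick sanity check in
Python (the serial 38905 is just an illustrative value); dumps that need a
different factor may be using Excel's 1904 date system:

from datetime import date, timedelta

EXCEL_EPOCH = date(1899, 12, 30)  # day 0 on Excel's 1900 date system
UNIX_EPOCH = date(1970, 1, 1)

print((UNIX_EPOCH - EXCEL_EPOCH).days)             # 25569
print(EXCEL_EPOCH + timedelta(days=38905))         # 2006-07-07
print(UNIX_EPOCH + timedelta(days=38905 - 25569))  # 2006-07-07, same result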
Dates don't transfer correctly. When I try it, the date moves two days ahead.
On Wednesday 12 July 2006 08:43 am, Guy Fraser wrote:
[del]
--
Adrian Klaver
aklaver@comcast.net
On Thu, 2006-13-07 at 06:52 -0700, Adrian Klaver wrote:
Dates don't transfer correctly. When I try it, the date moves two days ahead.
Thanks for the heads-up. I will continue to avoid using OpenOffice for
spreadsheet and DB activities. I prefer Gnumeric for spreadsheet work and am
very comfortable using psql for database work.