How to regularly import some CSV files with varying names?
Hi,
I need to import some log-files of an application running on a different
host.
This app can't talk to the db but only creates a daily dump in a remote
directory that the Linux host running the db-server can mount via samba.
The import would be easy if the files had a constant name, but the app
creates CSV files with names like "ExportYYYYMMDD".
I could have cron use "find" to search for all the files in the
mounted directory.
But how can I pipe a SQL script into the db-server that takes the
filenames from "find" as a parameter?
Because of the somewhat limited intelligence of the application that
creates the logs, I have to read the contents of each log into a temporary
table and insert from there only those lines that aren't already in the
actual log-table within the db.
I've got it all covered except the filenames, which change from day to day. :(
I could copy each of those files to a temp directory and import from
there, so that the SQL script wouldn't have to deal with the date within the
file-name, but I'd rather store the names in a table so that the
script could skip all the files that already got imported previously.
So how would I get the filenames into the sql-script?
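The "skip files that already got imported" bookkeeping can be as small as one extra table. A sketch (the table name imported_files and both helper names are invented for illustration):

```shell
#!/bin/sh
# Sketch: a bookkeeping table so re-runs skip files already loaded.
# imported_files is an invented name; adjust to taste.

make_bookkeeping_sql() {
    cat <<'SQL'
CREATE TABLE IF NOT EXISTS imported_files (
    filename    text PRIMARY KEY,
    imported_at timestamptz NOT NULL DEFAULT now()
);
SQL
}

# Emits a query returning one row iff "$1" was imported before; the
# shell wrapper can test the (empty or not) output and skip the file.
# NB: plain printf interpolation -- only safe for trusted file names.
already_imported_sql() {
    printf "SELECT 1 FROM imported_files WHERE filename = '%s';\n" "$1"
}

# usage sketch:
#   if [ -z "$(already_imported_sql "$f" | psql -At -d mydb)" ]; then
#       ... import "$f" and INSERT its name into imported_files ...
#   fi
```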
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 09/23/2013 05:19 PM, Andreas wrote:
Hi,
I need to import some log-files of an application running on a different
host.
This app can't talk to the db but only creates daily a dump in a remote
directory that is mountable via samba by a Linux host that runs the
db-server.
The import would be easy if the files had a constant name but the app
creates csv files with names like "ExportYYYYMMDD". [...]
So how would I get the filenames into the sql-script?
Do man on find and look for -exec.
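Spelled out, find's -exec can hand every matching file name to a script. A minimal sketch (the directory /mnt/dump and the helper name are invented; the printf is a stand-in for the real import call):

```shell
#!/bin/sh
# Sketch: print each ExportYYYYMMDD file found directly in DIR, one per
# line, via find -exec.  Replace printf with a call to the import script.
list_exports() {
    find "$1" -maxdepth 1 -name 'Export*' -exec sh -c '
        for f in "$@"; do
            printf "%s\n" "$f"     # here: ./import_one.sh "$f"
        done' sh {} +
}

# usage: list_exports /mnt/dump
```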
--
Adrian Klaver
adrian.klaver@gmail.com
Am 24.09.2013 02:25, schrieb Adrian Klaver:
On 09/23/2013 05:19 PM, Andreas wrote:
I need to import some log-files of an application [...]
The import would be easy if the files had a constant name but the app
creates csv files with names like "ExportYYYYMMDD". [...]
So how would I get the filenames into the sql-script?
Do man on find and look for -exec.
I could find the files and exec a shell script, but how can I have a SQL
script take the found filenames as a parameter?
The SQL script needs to create a temp table,
then COPY the file with the filename it got as a parameter into the temp table,
then insert from there into the log-table.
How would I get the filenames into the SQL script?
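One way to get the name into a fixed SQL script is psql's own variable mechanism: pass it with -v and reference it as a quoted literal with :'...'. A sketch (logtable, mydb and the script name are invented; the EXCEPT does the "only lines not already there" part):

```shell
#!/bin/sh
# Sketch: one fixed SQL script, the changing file name passed in as a
# psql variable.  :'logfile' is psql's quoted-literal interpolation.
cat > import.sql <<'EOF'
CREATE TEMP TABLE staging (LIKE logtable);
\copy staging FROM :'logfile' WITH CSV
INSERT INTO logtable SELECT * FROM staging
EXCEPT SELECT * FROM logtable;
EOF

# then, for each file name $f that find produced:
#   psql -d mydb -v logfile="$f" -f import.sql
```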
On 09/23/2013 05:47 PM, Andreas wrote:
Am 24.09.2013 02:25, schrieb Adrian Klaver:
On 09/23/2013 05:19 PM, Andreas wrote:
I need to import some log-files of an application [...]
The import would be easy if the files had a constant name but the app
creates csv files with names like "ExportYYYYMMDD". [...]
So how would I get the filenames into the sql-script?
Do man on find and look for -exec.
I could find the files and exec a shell script but how can I have a SQL
script take the found filenames as a parameter?
The SQL script needs to create a temp table,
then COPY the file with the filename it got as parameter into the temp table,
then insert from there into the log-table.
How would I get the filenames into the SQL script?
Just a thought:
1) Create a function that encapsulates the above logic where the
argument is the file name.
2) Create a shell script. Use substitution to take the filename passed
to the shell script, build the query string, and in the script do
psql -d db_name -U user_name -c "select the_function('filename')"
2a) Use an alternate language to do 2).
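That suggestion might look roughly like this (a sketch only; db_name, user_name, app_log and import_log_file are invented names, and the server-side COPY inside the function needs a role allowed to read server files, i.e. a superuser):

```shell
#!/bin/sh
# Sketch of the function-plus-wrapper idea: a server-side function that
# loads and dedupes one file, and the per-file psql command built by
# substitution.  All names are invented for illustration.

emit_function_sql() {
    cat <<'SQL'
CREATE OR REPLACE FUNCTION import_log_file(path text) RETURNS void AS $$
BEGIN
    CREATE TEMP TABLE staging (LIKE app_log) ON COMMIT DROP;
    -- server-side COPY: requires file-reading privilege on the server
    EXECUTE format('COPY staging FROM %L WITH (FORMAT csv)', path);
    INSERT INTO app_log
    SELECT * FROM staging EXCEPT SELECT * FROM app_log;
END;
$$ LANGUAGE plpgsql;
SQL
}

# Build (print) the per-file psql invocation from a file name:
import_cmd() {
    printf "psql -d db_name -U user_name -c \"SELECT import_log_file('%s')\"\n" "$1"
}

# usage: emit_function_sql | psql -d db_name    (one-time setup)
#        import_cmd /mnt/dump/Export20130923    (shows the per-file call)
```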
--
Adrian Klaver
adrian.klaver@gmail.com
On 09/23/2013 05:47 PM, Andreas wrote:
Am 24.09.2013 02:25, schrieb Adrian Klaver:
On 09/23/2013 05:19 PM, Andreas wrote:
I need to import some log-files of an application [...]
The import would be easy if the files had a constant name but the app
creates csv files with names like "ExportYYYYMMDD". [...]
So how would I get the filenames into the sql-script?
Do man on find and look for -exec.
I could find the files and exec a shell script but how can I have a SQL
script take the found filenames as a parameter?
The SQL script needs to create a temp table,
then COPY the file with the filename it got as parameter into the temp table,
then insert from there into the log-table.
How would I get the filenames into the SQL script?
Assuming your main script - the one that mounts the directory and finds
the file name - is in bash you can easily put a small script into a
heredoc block with variable substitution:
some script stuff that mounts remote directory and sets variable logfilename
...
psql -your -connection -parameters <<EOS
some preliminary setup statements
\copy .... from $logfilename ...
some processing statements
EOS
The disadvantage of this approach is that it is difficult-to-impossible
to detect and handle statement-level errors. But for short scripts like
simple imports this may not be an issue or may be easily solved by
wrapping things in a begin;...commit; block.
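Filled in with the begin;...commit; wrapper, such a heredoc might look like this (mydb, app_log and /mnt/dump are invented names; the SQL generation is split into a function so it can be inspected before being piped to psql):

```shell
#!/bin/sh
# Sketch of the heredoc approach with the transaction wrapper mentioned
# above.  All db/table/path names are invented for illustration.
logfilename=/mnt/dump/Export20130923

emit_sql() {
    cat <<EOS
BEGIN;
CREATE TEMP TABLE staging (LIKE app_log);
\copy staging FROM '$logfilename' WITH CSV
INSERT INTO app_log SELECT * FROM staging EXCEPT SELECT * FROM app_log;
COMMIT;
EOS
}

# emit_sql | psql -your -connection -parameters
```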
Cheers,
Steve
To be very straightforward:
your bash script will dynamically create the SQL query in a string, then
send it to the database using psql.
You can also use pipes.
For example:
$4 -c "COPY $1 FROM '/tmp/pipe_ply_binaire_vers_ply_ascii_"$1"' WITH CSV DELIMITER AS ' ';"
where $4 is the psql command to connect to the db, $1 the number in the
name of the file we are working on, etc.
Cheers,
Rémi-C
2013/9/24 Steve Crawford <scrawford@pinpointresearch.com>
On 24/09/2013 18:18, Rémi Cura wrote:
To be very straightforward :
your bash script will dynamically create the sql query in a string,
then send it to database using psql.
You can also use pipes. For example:
$4 -c "COPY $1 FROM '/tmp/pipe_ply_binaire_vers_ply_ascii_"$1"' WITH CSV DELIMITER AS ' ';"
where $4 is the psql command to connect to the db, $1 the number in the
name of the file we are working on, etc.
One problem with this is that it requires your shell script to have
superuser access to your database, since it is directly running a COPY
command. This may or may not matter, depending on setup - for instance,
you might trust local Unix sockets, meaning someone would need access to
the box first. It's probably best to never have a non-interactive
process able to connect with elevated privileges though.
--
Rowan Collins
[IMSoP]
On 24/09/2013 02:14, Adrian Klaver wrote:
Just a thought:
1) Create a function that encapsulates the above logic where the
argument is the file name.
2) Create a shell script. Use substitution to take the filename passed
to the shell script to build a string and in the script do
psql -d db_name -U user_name -c 'select the function(filename)'
2a) Use alternate language to do 2).
This has the advantage that you can enforce security inside the
function. Note that for very good reasons, Postgres requires server-side
COPY commands to be run under a super-user account - but this is exactly
where a SECURITY DEFINER function comes in handy. Basically, your psql
command would run as a non-privileged user, but the function would run
as the superuser that created/owned it. The important part is that you
are /asking/ for the command to run, not /demanding/ it - the script can
and should say no sometimes! (I have a half-written blog-post on this
subject kicking around...)
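A sketch of such a SECURITY DEFINER function, including the "say no" validation (every name here - import_export_file, app_log, import_user, /mnt/dump - is invented for illustration):

```shell
#!/bin/sh
# Sketch: a SECURITY DEFINER function lets an unprivileged login run a
# COPY that only the function's (superuser) owner could issue directly,
# and the function can refuse unexpected arguments.
emit_secdef_sql() {
    cat <<'SQL'
CREATE OR REPLACE FUNCTION import_export_file(fname text)
RETURNS void AS $$
BEGIN
    -- the "say no sometimes" part: accept only the expected name shape
    IF fname !~ '^Export[0-9]{8}$' THEN
        RAISE EXCEPTION 'unexpected file name: %', fname;
    END IF;
    CREATE TEMP TABLE staging (LIKE app_log) ON COMMIT DROP;
    EXECUTE format('COPY staging FROM %L WITH (FORMAT csv)',
                   '/mnt/dump/' || fname);
    INSERT INTO app_log
    SELECT * FROM staging EXCEPT SELECT * FROM app_log;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

-- create as the superuser, then let the unprivileged login call it:
-- GRANT EXECUTE ON FUNCTION import_export_file(text) TO import_user;
SQL
}

emit_secdef_sql
```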
The import would be easy if the files had a constant name but the app
creates csv files with names like "ExportYYYYMMDD".
If that's the /current/ year month and day, it should be easy enough to
generate the correct filename, rather than having to search the file
system. Or perhaps iterate backwards from today to find the latest (or
give up if you got to too old a date).
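Generating the name from the date and walking backwards could look like this (a sketch assuming GNU date's -d option; /mnt/dump is an invented mount point):

```shell
#!/bin/sh
# Sketch: build today's expected ExportYYYYMMDD name, then step back a
# day at a time until a file is found or we give up after a week.
latest_export() {
    dir=$1
    for days_back in 0 1 2 3 4 5 6 7; do
        f="$dir/Export$(date -d "-$days_back day" +%Y%m%d)"
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1   # nothing recent enough: give up
}

# usage: latest_export /mnt/dump || echo "no export found this week"
```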
On the other hand...
This app can't talk to the db but only creates daily a dump in a
remote directory that is mountable via samba
Is there a reason for that access constraint? As Steve mentioned, psql
has a \copy command which works like the COPY SQL statement, but reads
the file from the /client/ and sends it over a connection, eliminating
the need for a superuser account at the database end. If database
security is a concern, you could have a login with extremely limited
privileges, and whitelist it explicitly in pg_hba.conf for this purpose.
Regards,
--
Rowan Collins
[IMSoP]