Function to execute a program

Started by Magnus Haganderover 5 years ago7 messages
#1Magnus Hagander
magnus@hagander.net

Today we have pg_read_file() and pg_read_binary_file(), which mostly
correspond to COPY from a file, but reads a whole (or parts but based on
bytes) into a single value rather than rows.

Would it make sense to have a pg_execute_program() that corresponds to COPY
FROM PROGRAM? This would obviously have the same permissions restrictions
as COPY FROM PROGRAM.

The usecase would be to for example execute a command that returns json
format output, which could then be parsed and processed as part of a query.
Today, COPY FROM PROGRAM cannot do this, as we can't read a multiline json
value with COPY.

Thoughts?

//Magnus

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#1)
Re: Function to execute a program

Magnus Hagander <magnus@hagander.net> writes:

Would it make sense to have a pg_execute_program() that corresponds to COPY
FROM PROGRAM? This would obviously have the same permissions restrictions
as COPY FROM PROGRAM.
The usecase would be to for example execute a command that returns json
format output, which could then be parsed and processed as part of a query.
Today, COPY FROM PROGRAM cannot do this, as we can't read a multiline json
value with COPY.

copy ... from program 'random_json_producer | tr "\n\t" " "';

I don't necessarily object to providing such a function to make it
easier, but it's not the case that you can't have the functionality
today.

regards, tom lane

#3Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#2)
Re: Function to execute a program

On Sat, Sep 12, 2020 at 5:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Magnus Hagander <magnus@hagander.net> writes:

Would it make sense to have a pg_execute_program() that corresponds to

COPY

FROM PROGRAM? This would obviously have the same permissions restrictions
as COPY FROM PROGRAM.
The usecase would be to for example execute a command that returns json
format output, which could then be parsed and processed as part of a

query.

Today, COPY FROM PROGRAM cannot do this, as we can't read a multiline

json

value with COPY.

copy ... from program 'random_json_producer | tr "\n\t" " "';

I don't necessarily object to providing such a function to make it
easier, but it's not the case that you can't have the functionality
today.

"tr" is not typically available on Windows, for one :)

But yes, assuming tr is available, it is true that you can. (And you can
perhaps find something else to pipe it through on Windows). Of course, you
still can't use it in a query, since COPY can only target a table :)
Independently of something like this it would be nice to be able to target
say a CTE with COPY, but that's kan entirely different topic.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/&gt;
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/&gt;

#4Andrew Dunstan
andrew.dunstan@2ndquadrant.com
In reply to: Magnus Hagander (#3)
Re: Function to execute a program

On 9/12/20 11:11 AM, Magnus Hagander wrote:

On Sat, Sep 12, 2020 at 5:06 PM Tom Lane <tgl@sss.pgh.pa.us
<mailto:tgl@sss.pgh.pa.us>> wrote:

Magnus Hagander <magnus@hagander.net <mailto:magnus@hagander.net>>
writes:

Would it make sense to have a pg_execute_program() that

corresponds to COPY

FROM PROGRAM? This would obviously have the same permissions

restrictions

as COPY FROM PROGRAM.
The usecase would be to for example execute a command that

returns json

format output, which could then be parsed and processed as part

of a query.

Today, COPY FROM PROGRAM cannot do this, as we can't read a

multiline json

value with COPY.

copy ... from program 'random_json_producer | tr "\n\t" "  "';

I don't necessarily object to providing such a function to make it
easier, but it's not the case that you can't have the functionality
today.

"tr" is not typically available on Windows, for one :)

But yes, assuming tr is available, it is true that you can. (And you
can perhaps find something else to pipe it through on Windows). Of
course, you still can't use it in a query, since COPY can only target
a table :) Independently of something like this it would be nice to be
able to target say a CTE with COPY, but that's kan entirely different
topic.

A more robust recipe  would use "jq -c" as the filter. And it's
available on Windows via chocolatey.

I don't have a strong opinion on the suggested facility. Presumably you
can construct a function that runs COPY into a temp table and then
returns the results. But maybe that's more work than we should require
users to perform.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Stephen Frost
sfrost@snowman.net
In reply to: Magnus Hagander (#1)
Re: Function to execute a program

Greetings,

* Magnus Hagander (magnus@hagander.net) wrote:

Would it make sense to have a pg_execute_program() that corresponds to COPY
FROM PROGRAM? This would obviously have the same permissions restrictions
as COPY FROM PROGRAM.

Eh, perhaps.

The usecase would be to for example execute a command that returns json
format output, which could then be parsed and processed as part of a query.
Today, COPY FROM PROGRAM cannot do this, as we can't read a multiline json
value with COPY.

I'd rather come up with a way to import this kind of object into PG by
using COPY rather than adding a different way to pull them in.

In the "this is a crazy idea" category- have a way to specify an input
function to COPY, or a target data type (and then use it's input
function) to which COPY just keeps feeding bytes until the function
comes back with "ok, I got a value to return" or something.

Would be really neat to have a way to COPY in a JSON, XML, or whatever
even if it's multi-line and even if it has multiple objects (ie: 10
complete JSON documents in a single file would become 10 rows of jsonb).

Thanks,

Stephen

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#5)
Re: Function to execute a program

Stephen Frost <sfrost@snowman.net> writes:

* Magnus Hagander (magnus@hagander.net) wrote:

Would it make sense to have a pg_execute_program() that corresponds to COPY
FROM PROGRAM? This would obviously have the same permissions restrictions
as COPY FROM PROGRAM.

I'd rather come up with a way to import this kind of object into PG by
using COPY rather than adding a different way to pull them in.

I'm not for overloading COPY to try to make it handle every data import
use-case. The issue here AIUI is that Magnus wants the program output
to be read as an uninterpreted blob (which he'll then try to convert to
jsonb or whatever, but that's not the concern of the import code). This
is exactly antithetical to COPY's mission of reading some rows that are
made up of some columns and putting the result into a table.

Yeah, we could no doubt add some functionality to disable all the
row-splitting and column-splitting and associated escaping logic,
but that's going to make COPY slower and more complicated. And it
still doesn't address wanting to use the result directly in a query
instead of sticking it into a table.

regards, tom lane

#7Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#6)
Re: Function to execute a program

Greetings.

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Magnus Hagander (magnus@hagander.net) wrote:

Would it make sense to have a pg_execute_program() that corresponds to COPY
FROM PROGRAM? This would obviously have the same permissions restrictions
as COPY FROM PROGRAM.

I'd rather come up with a way to import this kind of object into PG by
using COPY rather than adding a different way to pull them in.

I'm not for overloading COPY to try to make it handle every data import
use-case. The issue here AIUI is that Magnus wants the program output
to be read as an uninterpreted blob (which he'll then try to convert to
jsonb or whatever, but that's not the concern of the import code). This
is exactly antithetical to COPY's mission of reading some rows that are
made up of some columns and putting the result into a table.

I don't really think there's anything inherent in the fact that "COPY"
today only has one way to handle data that the user wants to import that
it should be required to always operate in that manner.

As for slowing down the current method- I don't think that we'd
implement such a change as just a modification to the existing optimized
parsing code as that just wouldn't make any sense and would slow COPY
down for this use-case, but having a COPY command that's able to work in
a few different modes when it comes to importing data seems like it
could be sensible, fast, and clear to users.

One could imagine creating some other top-level command to handle more
complex import cases than what COPY does today but I don't actually
think that'd be an improvment.

Yeah, we could no doubt add some functionality to disable all the
row-splitting and column-splitting and associated escaping logic,
but that's going to make COPY slower and more complicated. And it
still doesn't address wanting to use the result directly in a query
instead of sticking it into a table.

The way that's handled for the cases that COPY does work with today is
file_fdw. Ideally, we'd do the same here.

Ultimately, COPY absolutely *is* our general data import tool- it's just
that today we push some of the work to make things 'fit' on the user and
that ends up with pain points like exactly what Magnus has pointed out
here. We should be looking to improve that situation, and I don't
really care for the solution to that being "create some random other new
thing for data import that users then have to know exists and learn how
to use".

Thanks,

Stephen