How to read an external pdf file from postgres?

Started by Amine Tengilimoglu about 4 years ago | 5 messages | general
#1 Amine Tengilimoglu
aminetengilimoglu@gmail.com

Hi;

I want to read an external PDF file from Postgres. The PDF file will
exist on disk; Postgres only knows its full path, stored as metadata. Is
there any software or extension that can be used for this, or do we have to
develop our own software? What is the best approach? I'd appreciate
suggestions from anyone with experience.

Thanks.

#2 Peter Eisentraut
peter_e@gmx.net
In reply to: Amine Tengilimoglu (#1)

On 12.01.22 12:16, Amine Tengilimoglu wrote:

> I want to read an external PDF file from Postgres. [...]

You could write a function in PL/Perl or PL/Python to open and read the
file and process the PDF data, using some third-party module that surely
exists somewhere.
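
A minimal sketch of that approach in plain Python 3, under stated assumptions: the stored metadata is an absolute path on the database server, the function names are hypothetical, and the optional text-extraction step assumes the third-party pypdf package is installed. Inside PostgreSQL the same body would go into a CREATE FUNCTION ... LANGUAGE plpython3u definition. PL/Python exists only as an untrusted language, so creating such a function requires a superuser, and the file must be readable by the operating-system user the server runs as.

```python
# Sketch only: the body a PL/Python (plpython3u) function might run.
# Assumptions: `path` is the absolute server-side path stored as metadata;
# read_pdf_bytes, pdf_version and extract_text are hypothetical names.
import os

PDF_MAGIC = b"%PDF-"  # PDF files begin with this header


def read_pdf_bytes(path: str) -> bytes:
    """Read the whole file at `path` and check that it looks like a PDF."""
    if not os.path.isabs(path):
        raise ValueError(f"expected an absolute path, got {path!r}")
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(PDF_MAGIC):
        raise ValueError(f"{path!r} does not start with a %PDF- header")
    return data


def pdf_version(data: bytes) -> str:
    """Return the version from the header line, e.g. '1.7' for b'%PDF-1.7'."""
    lines = data[:32].splitlines()
    if not lines or not lines[0].startswith(PDF_MAGIC):
        raise ValueError("missing %PDF- header")
    return lines[0][len(PDF_MAGIC):].decode("ascii").strip()


def extract_text(path: str) -> str:
    """Extract page text; assumes the third-party pypdf package is installed."""
    from pypdf import PdfReader  # imported lazily so the rest works without it

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

The header check is deliberately strict: it rejects files with stray bytes before %PDF-, which some viewers tolerate. Loading raw bytes and validating them is the easy part; whether extract_text returns anything useful depends heavily on how each PDF was produced.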

#3 Дмитрий Иванов (Dmitry Ivanov)
firstdismay@gmail.com
In reply to: Amine Tengilimoglu (#1)

What are you going to do with the data?
If you want to analyze it in some way, I can't think of a better option
than a Python function. Or do you just want to transfer it? There are
options there too, but in that case I also prefer Python.
--
Regards, Dmitry!

On Wed, 12 Jan 2022 at 16:16, Amine Tengilimoglu <aminetengilimoglu@gmail.com> wrote:

> I want to read an external PDF file from Postgres. [...]

#4 Ian Lawrence Barwick
barwick@gmail.com
In reply to: Amine Tengilimoglu (#1)

On Wed, 12 Jan 2022 at 20:16, Amine Tengilimoglu <aminetengilimoglu@gmail.com> wrote:

> I want to read an external PDF file from Postgres. [...]

By "read" do you mean "open the file and meaningfully extract data from it"? If
so, speaking from prior experience, don't. And if you really have to, make sure
the source PDF is guaranteed to be in a well-defined, predictable format
enforceable by contract law and/or people with sharp pointy sticks. I have
successfully suppressed the memories of whatever it is I once had to do with
reading data from PDFs, but though the data was eventually imported into
PostgreSQL, there was a lot of mangling probably involving a Perl module (other
languages are probably available) before it got anywhere near the database.

Regards

Ian Barwick

--
EnterpriseDB: https://www.enterprisedb.com

#5 Florents Tselai
florents.tselai@gmail.com
In reply to: Ian Lawrence Barwick (#4)

On 12 Jan 2022, at 4:35 PM, Ian Lawrence Barwick <barwick@gmail.com> wrote:

> By "read" do you mean "open the file and meaningfully extract data from it"? [...]

https://github.com/Florents-Tselai/pgpdf