best way to write large data-streams quickly?

Started by Mark Moellering · about 8 years ago · 4 messages · general
#1Mark Moellering
markmoellering@psyberation.com

Everyone,

We are trying to architect a new system, which will have to take several
large data streams (a total of ~200,000 parsed files per second) and place
them in a database. I am trying to figure out the best way to import that
sort of data into Postgres.

I keep thinking I can't be the first to have this problem and that there
are common solutions, but I can't find any. Does anyone know of some sort
of method, third-party program, etc., that can accept data from a number
of different sources and push it into Postgres as fast as possible?

Thanks in advance,

Mark Moellering

#2Steve Atkins
steve@blighty.com
In reply to: Mark Moellering (#1)
Re: best way to write large data-streams quickly?

On Apr 9, 2018, at 8:49 AM, Mark Moellering <markmoellering@psyberation.com> wrote:

<snip>

Take a look at http://ossc-db.github.io/pg_bulkload/index.html. Check the benchmarks for different situations compared to COPY.

Depending on what you're doing, using custom code to parse your data and
then doing multiple binary COPYs in parallel may be better.
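
Something along these lines is a minimal sketch of that approach
(psycopg2, a hypothetical events(ts, payload) table, and text-format
COPY standing in for binary are all assumptions here, not a tested
implementation):

# Sketch: several worker processes, each with its own connection,
# stream batches into the same table with COPY ... FROM STDIN.
import io
import multiprocessing
import psycopg2

DSN = "dbname=test"        # assumption: adjust to your environment
WORKERS = 4                # tune to cores / I/O capacity

def worker(queue):
    conn = psycopg2.connect(DSN)
    cur = conn.cursor()
    while True:
        rows = queue.get()   # a list of (ts, payload) tuples, or None to stop
        if rows is None:
            break
        # Build tab-delimited COPY text; assumes fields have no tabs/newlines.
        buf = io.StringIO("".join(f"{ts}\t{payload}\n" for ts, payload in rows))
        cur.copy_expert("COPY events (ts, payload) FROM STDIN", buf)
        conn.commit()        # each batch becomes visible on commit
    conn.close()

if __name__ == "__main__":
    q = multiprocessing.Queue(maxsize=100)
    procs = [multiprocessing.Process(target=worker, args=(q,))
             for _ in range(WORKERS)]
    for p in procs:
        p.start()
    # ... parse your incoming streams and q.put() batches of rows here ...
    for _ in procs:
        q.put(None)          # tell workers to drain and exit
    for p in procs:
        p.join()

Each worker holds one connection and commits per batch, so the load
spreads across cores while any single COPY statement stays bounded.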

Cheers,
Steve

#3Mark Moellering
markmoellering@psyberation.com
In reply to: Steve Atkins (#2)
Re: best way to write large data-streams quickly?

On Mon, Apr 9, 2018 at 12:01 PM, Steve Atkins <steve@blighty.com> wrote:

<snip>

> Take a look at http://ossc-db.github.io/pg_bulkload/index.html. Check the
> benchmarks for different situations compared to COPY.
>
> Depending on what you're doing, using custom code to parse your data and
> then doing multiple binary COPYs in parallel may be better.

(fighting google slightly to keep from top-posting...)

Thanks!

How long can you run COPY? I have been looking at it more closely. In
some ways, it would be simple just to take data from stdin and send it to
Postgres, but can I do that literally 24/7? I am monitoring data feeds
that will never stop, and I don't know if that is how COPY is meant to be
used or if I have to let it finish and start another one at some point.

Thanks for everyone's help and input!

Mark Moellering

#4Jerry Sievers
gsievers19@comcast.net
In reply to: Mark Moellering (#3)
Re: best way to write large data-streams quickly?

Mark Moellering <markmoellering@psyberation.com> writes:

<snip>

> How long can you run COPY? I have been looking at it more closely. In
> some ways, it would be simple just to take data from stdin and send it
> to Postgres, but can I do that literally 24/7? I am monitoring data
> feeds that will never stop, and I don't know if that is how COPY is
> meant to be used or if I have to let it finish and start another one at
> some point.

Launching a single COPY and piping data into it for an extended period
and/or in bulk is fine, but nothing will be visible until the statement
is finished and, if it were run in a transaction block, the block
committed.
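
In practice that means bounding each COPY and committing on a schedule,
so rows show up at a known lag instead of never. A minimal
single-connection sketch (psycopg2 and an events(ts, payload) table are
assumptions; the feed arrives on stdin as tab-delimited lines):

# Sketch: chop a never-ending feed into bounded COPY statements so
# each batch commits and becomes visible, instead of one endless COPY.
import io
import sys
import psycopg2

conn = psycopg2.connect("dbname=test")   # assumption: adjust DSN
cur = conn.cursor()

buf, n = io.StringIO(), 0
for line in sys.stdin:                    # the feed never stops
    buf.write(line)                       # already COPY text format
    n += 1
    if n >= 10000:                        # bound each COPY statement
        buf.seek(0)
        cur.copy_expert("COPY events (ts, payload) FROM STDIN", buf)
        conn.commit()                     # this batch is now visible
        buf, n = io.StringIO(), 0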

HTH

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800