psql \copy
Hello,
I am using psql to copy data extracted from an InfluxDB in CSV format into PostgreSQL.
I have a primary key on the time field, which I have defined as a bigint since the time I get
from InfluxDB is an epoch time.
My question is: does psql abort the copy if it hits a duplicate key, or does it keep processing?
Thanks,
--
Stephen Clark
NetWolves Managed Services, LLC.
Sr. Applications Architect
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.clark@netwolves.com
http://www.netwolves.com
Email Confidentiality Notice: The information contained in this transmission may contain privileged and confidential and/or protected health information (PHI) and may be subject to protection under the law, including the Health Insurance Portability and Accountability Act of 1996, as amended (HIPAA). This transmission is intended for the sole use of the individual or entity to whom it is addressed. If you are not the intended recipient, you are notified that any use, dissemination, distribution, printing or copying of this transmission is strictly prohibited and may subject you to criminal or civil penalties. If you have received this transmission in error, please contact the sender immediately and delete this email and any attachments from any computer. Vaso Corporation and its subsidiary companies are not responsible for data leaks that result from email messages received that contain privileged and confidential and/or protected health information (PHI).
On 4/24/20 8:55 AM, Steve Clark wrote:
Hello,
I am using psql to copy data extracted from an InfluxDB in csv format
into postgresql.
I have a key field on the time field which I have defined as a bigint
since the time I get
from InfluxDB is an epoch time.
My question is does psql abort the copy if it hits a duplicate key, or
does it keep processing?
Aborts.
\copy uses COPY so:
https://www.postgresql.org/docs/12/sql-copy.html
"COPY stops operation at the first error. This should not lead to
problems in the event of a COPY TO, but the target table will already
have received earlier rows in a COPY FROM. These rows will not be
visible or accessible, but they still occupy disk space. This might
amount to a considerable amount of wasted disk space if the failure
happened well into a large copy operation. You might wish to invoke
VACUUM to recover the wasted space."
--
Adrian Klaver
adrian.klaver@aklaver.com
On Fri, Apr 24, 2020 at 8:55 AM Steve Clark <steve.clark@netwolves.com>
wrote:
Hello,
I am using psql to copy data extracted from an InfluxDB in csv format into
postgresql.
I have a key field on the time field which I have defined as a bigint
since the time I get
from InfluxDB is an epoch time.
My question is does psql abort the copy if it hits a duplicate key, or
does it keep processing?
The copy will fail. You could import into a temporary table and preprocess
then copy to your permanent table or use an ETL solution to remove unwanted
data before importing. I don't know the nature of your data or project but
perhaps that column isn't suitable for a key.
Cheers,
Steve
On 04/24/2020 11:59 AM, Steve Crawford wrote:
On Fri, Apr 24, 2020 at 8:55 AM Steve Clark <steve.clark@netwolves.com> wrote:
Hello,
I am using psql to copy data extracted from an InfluxDB in csv format into postgresql.
I have a key field on the time field which I have defined as a bigint since the time I get
from InfluxDB is an epoch time.
My question is does psql abort the copy if it hits a duplicate key, or does it keep processing?
The copy will fail. You could import into a temporary table and preprocess then copy to your permanent table or use an ETL solution to remove unwanted data before importing. I don't know the nature of your data or project but perhaps that column isn't suitable for a key.
Cheers,
Steve
I am attempting to periodically pull time series data from an InfluxDB.
The column at issue is the timestamp. I have a script that pulls the last 15 minutes of data from the InfluxDB
as CSV and pipes it into a psql -c "\copy ..." command. I was looking for the simplest way to do this.
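For reference, the pipeline looks roughly like this (database, measurement, table, and column names here are placeholders, not the real schema):

```shell
influx -database 'metrics' -format csv \
  -execute "SELECT * FROM samples WHERE time > now() - 15m" | \
  psql -d mydb -c "\copy samples (ts, value) FROM STDIN WITH (FORMAT csv, HEADER)"
```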
--
Stephen Clark
NetWolves Managed Services, LLC.
Sr. Applications Architect
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.clark@netwolves.com
http://www.netwolves.com
On 4/24/20 9:12 AM, Steve Clark wrote:
On 04/24/2020 11:59 AM, Steve Crawford wrote:
On Fri, Apr 24, 2020 at 8:55 AM Steve Clark <steve.clark@netwolves.com> wrote:
Hello,
I am using psql to copy data extracted from an InfluxDB in csv
format into postgresql.
I have a key field on the time field which I have defined as a
bigint since the time I get
from InfluxDB is an epoch time.
My question is does psql abort the copy if it hits a duplicate
key, or does it keep processing?
The copy will fail. You could import into a temporary table and
preprocess then copy to your permanent table or use an ETL solution to
remove unwanted data before importing. I don't know the nature of your
data or project but perhaps that column isn't suitable for a key.
Cheers,
Steve
I am attempting to periodically pull time series data from an InfluxDB.
The column at issue is the timestamp. I have a script that pulls the
last 15 minutes of data from the InfluxDB
as csv data and pipes it into a psql -c "\copy...." command. I was
looking for the simplest way to do this.
Then, as suggested above, pull into a staging table that has no constraints,
e.g. no PK. Verify the data and then push it into the permanent table.
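A minimal sketch of that approach, assuming a permanent table samples with a primary key on a bigint ts column (all names here are placeholders):

```sql
-- staging table with no constraints; load the CSV into it with \copy
CREATE TEMP TABLE samples_staging (ts bigint, value double precision);
-- \copy samples_staging FROM 'chunk.csv' WITH (FORMAT csv)

-- push only non-duplicate rows into the permanent table;
-- ON CONFLICT DO NOTHING (PostgreSQL 9.5+) skips rows whose key already exists
INSERT INTO samples (ts, value)
SELECT DISTINCT ON (ts) ts, value
FROM samples_staging
ON CONFLICT (ts) DO NOTHING;
```

DISTINCT ON removes duplicates within the staged chunk itself; ON CONFLICT handles rows already present in the permanent table.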
--
Adrian Klaver
adrian.klaver@aklaver.com
On 4/24/20 10:12 AM, Steve Clark wrote:
On 04/24/2020 11:59 AM, Steve Crawford wrote:
On Fri, Apr 24, 2020 at 8:55 AM Steve Clark <steve.clark@netwolves.com> wrote:
Hello,
I am using psql to copy data extracted from an InfluxDB in csv
format into postgresql.
I have a key field on the time field which I have defined as a
bigint since the time I get
from InfluxDB is an epoch time.
My question is does psql abort the copy if it hits a duplicate
key, or does it keep processing?
The copy will fail. You could import into a temporary table and
preprocess then copy to your permanent table or use an ETL solution
to remove unwanted data before importing. I don't know the nature of
your data or project but perhaps that column isn't suitable for a key.
Cheers,
Steve
I am attempting to periodically pull time series data from an InfluxDB.
The column at issue is the timestamp. I have a script that pulls the
last 15 minutes of data from the InfluxDB
as csv data and pipes it into a psql -c "\copy...." command. I was
looking for the simplest way to do this.
Is the duplication due to overlapping 15-minute chunks (i.e. an imprecise
definition of "15 minutes ago")? Perhaps retain the last timestamp sent
to pg and use it in the get-from-influx call?
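That high-water-mark idea can be sketched as follows (assuming a permanent table samples with a bigint ts column; names are placeholders):

```sql
-- newest timestamp already loaded into Postgres
SELECT COALESCE(max(ts), 0) AS last_ts FROM samples;
```

The extraction script would then ask Influx only for rows with time > last_ts, so successive pulls never overlap.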
On 04/24/2020 12:15 PM, Adrian Klaver wrote:
On 4/24/20 9:12 AM, Steve Clark wrote:
On 04/24/2020 11:59 AM, Steve Crawford wrote:
On Fri, Apr 24, 2020 at 8:55 AM Steve Clark <steve.clark@netwolves.com> wrote:
Hello,
I am using psql to copy data extracted from an InfluxDB in csv
format into postgresql.
I have a key field on the time field which I have defined as a
bigint since the time I get
from InfluxDB is an epoch time.
My question is does psql abort the copy if it hits a duplicate
key, or does it keep processing?
The copy will fail. You could import into a temporary table and
preprocess then copy to your permanent table or use an ETL solution to
remove unwanted data before importing. I don't know the nature of your
data or project but perhaps that column isn't suitable for a key.
Cheers,
Steve
I am attempting to periodically pull time series data from an InfluxDB.
The column at issue is the timestamp. I have a script that pulls the
last 15 minutes of data from the InfluxDB
as csv data and pipe it into a psql -c "\copy...." command. I was
looking for the simplest way to do this.
Then as suggested above pull into staging table that has no constraints
e.g. PK. Verify data and then push into permanent table.
Thanks for the tip. I'll head down that road. Stay safe everyone.
--
Stephen Clark
*NetWolves Managed Services, LLC.*
Sr. Applications Architect
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.clark@netwolves.com
http://www.netwolves.com
You might want to investigate pg_bulkload for this activity.
On 4/24/20 10:55 AM, Steve Clark wrote:
Hello,
I am using psql to copy data extracted from an InfluxDB in csv format into
postgresql.
I have a key field on the time field which I have defined as a bigint
since the time I get
from InfluxDB is an epoch time.
My question is does psql abort the copy if it hits a duplicate key, or
does it keep processing?
--
Angular momentum makes the world go 'round.