psql \copy
Hello,
I am using psql to copy data extracted from an InfluxDB in CSV format into PostgreSQL.
I have a primary key on the time field, which I have defined as a bigint since the time I get
from InfluxDB is an epoch time.
My question is: does psql abort the copy if it hits a duplicate key, or does it keep processing?
Thanks,
--
Stephen Clark
NetWolves Managed Services, LLC.
Sr. Applications Architect
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.clark@netwolves.com
http://www.netwolves.com
Email Confidentiality Notice: The information contained in this transmission may contain privileged and confidential and/or protected health information (PHI) and may be subject to protection under the law, including the Health Insurance Portability and Accountability Act of 1996, as amended (HIPAA). This transmission is intended for the sole use of the individual or entity to whom it is addressed. If you are not the intended recipient, you are notified that any use, dissemination, distribution, printing or copying of this transmission is strictly prohibited and may subject you to criminal or civil penalties. If you have received this transmission in error, please contact the sender immediately and delete this email and any attachments from any computer. Vaso Corporation and its subsidiary companies are not responsible for data leaks that result from email messages received that contain privileged and confidential and/or protected health information (PHI).
On 4/24/20 8:55 AM, Steve Clark wrote:
Hello,
I am using psql to copy data extracted from an InfluxDB in csv format
into postgresql.
I have a key field on the time field which I have defined as a bigint
since the time I get
from InfluxDB is an epoch time.
My question is does psql abort the copy if it hits a duplicate key, or
does it keep processing?
Aborts.
\copy uses COPY so:
https://www.postgresql.org/docs/12/sql-copy.html
"COPY stops operation at the first error. This should not lead to
problems in the event of a COPY TO, but the target table will already
have received earlier rows in a COPY FROM. These rows will not be
visible or accessible, but they still occupy disk space. This might
amount to a considerable amount of wasted disk space if the failure
happened well into a large copy operation. You might wish to invoke
VACUUM to recover the wasted space."
--
Adrian Klaver
adrian.klaver@aklaver.com
On Fri, Apr 24, 2020 at 8:55 AM Steve Clark <steve.clark@netwolves.com>
wrote:
Hello,
I am using psql to copy data extracted from an InfluxDB in csv format into
postgresql.
I have a key field on the time field which I have defined as a bigint
since the time I get
from InfluxDB is an epoch time.
My question is does psql abort the copy if it hits a duplicate key, or
does it keep processing?
The copy will fail. You could import into a temporary table and preprocess
then copy to your permanent table or use an ETL solution to remove unwanted
data before importing. I don't know the nature of your data or project but
perhaps that column isn't suitable for a key.
Cheers,
Steve
On 04/24/2020 11:59 AM, Steve Crawford wrote:
On Fri, Apr 24, 2020 at 8:55 AM Steve Clark <steve.clark@netwolves.com> wrote:
Hello,
I am using psql to copy data extracted from an InfluxDB in csv format into postgresql.
I have a key field on the time field which I have defined as a bigint since the time I get
from InfluxDB is an epoch time.
My question is does psql abort the copy if it hits a duplicate key, or does it keep processing?
The copy will fail. You could import into a temporary table and preprocess then copy to your permanent table or use an ETL solution to remove unwanted data before importing. I don't know the nature of your data or project but perhaps that column isn't suitable for a key.
Cheers,
Steve
I am attempting to periodically pull time series data from an InfluxDB.
The column at issue is the timestamp. I have a script that pulls the last 15 minutes of data from the InfluxDB
as CSV and pipes it into a psql -c "\copy ..." command. I was looking for the simplest way to do this.
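For reference, the pipeline looks roughly like this (database, measurement, table, and column names here are placeholders, not the real schema):

```shell
influx -database 'metrics' -format csv \
  -execute "SELECT * FROM samples WHERE time > now() - 15m" | \
  psql -d mydb -c "\copy samples (ts, value) FROM STDIN WITH (FORMAT csv, HEADER)"
```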
--
Stephen Clark
NetWolves Managed Services, LLC.
Sr. Applications Architect
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.clark@netwolves.com
http://www.netwolves.com
On 4/24/20 9:12 AM, Steve Clark wrote:
On 04/24/2020 11:59 AM, Steve Crawford wrote:
On Fri, Apr 24, 2020 at 8:55 AM Steve Clark <steve.clark@netwolves.com> wrote:
Hello,
I am using psql to copy data extracted from an InfluxDB in csv
format into postgresql.
I have a key field on the time field which I have defined as a
bigint since the time I get
from InfluxDB is an epoch time.
My question is does psql abort the copy if it hits a duplicate
key, or does it keep processing?
The copy will fail. You could import into a temporary table and
preprocess then copy to your permanent table or use an ETL solution to
remove unwanted data before importing. I don't know the nature of your
data or project but perhaps that column isn't suitable for a key.
Cheers,
Steve
I am attempting to periodically pull time series data from an InfluxDB.
The column at issue is the timestamp. I have a script that pulls the
last 15 minutes of data from the InfluxDB
as csv data and pipes it into a psql -c "\copy...." command. I was
looking for the simplest way to do this.
Then, as suggested above, pull into a staging table that has no constraints,
e.g. no PK. Verify the data and then push it into the permanent table.
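A minimal sketch of that approach, assuming a permanent table samples with a primary key on a bigint ts column (all names here are placeholders):

```sql
-- staging table with no constraints; load the CSV into it with \copy
CREATE TEMP TABLE samples_staging (ts bigint, value double precision);
-- \copy samples_staging FROM 'chunk.csv' WITH (FORMAT csv)

-- push only non-duplicate rows into the permanent table;
-- ON CONFLICT DO NOTHING (PostgreSQL 9.5+) skips rows whose key already exists
INSERT INTO samples (ts, value)
SELECT DISTINCT ON (ts) ts, value
FROM samples_staging
ON CONFLICT (ts) DO NOTHING;
```

DISTINCT ON removes duplicates within the staged chunk itself; ON CONFLICT handles rows already present in the permanent table.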
--
Adrian Klaver
adrian.klaver@aklaver.com
On 4/24/20 10:12 AM, Steve Clark wrote:
On 04/24/2020 11:59 AM, Steve Crawford wrote:
On Fri, Apr 24, 2020 at 8:55 AM Steve Clark <steve.clark@netwolves.com> wrote:
Hello,
I am using psql to copy data extracted from an InfluxDB in csv
format into postgresql.
I have a key field on the time field which I have defined as a
bigint since the time I get
from InfluxDB is an epoch time.
My question is does psql abort the copy if it hits a duplicate
key, or does it keep processing?
The copy will fail. You could import into a temporary table and
preprocess then copy to your permanent table or use an ETL solution
to remove unwanted data before importing. I don't know the nature of
your data or project but perhaps that column isn't suitable for a key.
Cheers,
Steve
I am attempting to periodically pull time series data from an InfluxDB.
The column at issue is the timestamp. I have a script that pulls the
last 15 minutes of data from the InfluxDB
as csv data and pipes it into a psql -c "\copy...." command. I was
looking for the simplest way to do this.
Is the duplication due to overlapping 15-minute chunks (i.e. an imprecise
definition of "15 minutes ago")? Perhaps retain the last timestamp sent
to pg and use it in the get-from-influx call?
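That high-water-mark idea can be sketched as follows (assuming a permanent table samples with a bigint ts column; names are placeholders):

```sql
-- newest timestamp already loaded into Postgres
SELECT COALESCE(max(ts), 0) AS last_ts FROM samples;
```

The extraction script would then ask Influx only for rows with time > last_ts, so successive pulls never overlap.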
On 04/24/2020 12:15 PM, Adrian Klaver wrote:
On 4/24/20 9:12 AM, Steve Clark wrote:
On 04/24/2020 11:59 AM, Steve Crawford wrote:
On Fri, Apr 24, 2020 at 8:55 AM Steve Clark <steve.clark@netwolves.com> wrote:
Hello,
I am using psql to copy data extracted from an InfluxDB in csv
format into postgresql.
I have a key field on the time field which I have defined as a
bigint since the time I get
from InfluxDB is an epoch time.
My question is does psql abort the copy if it hits a duplicate
key, or does it keep processing?
The copy will fail. You could import into a temporary table and
preprocess then copy to your permanent table or use an ETL solution to
remove unwanted data before importing. I don't know the nature of your
data or project but perhaps that column isn't suitable for a key.
Cheers,
Steve
I am attempting to periodically pull time series data from an InfluxDB.
The column at issue is the timestamp. I have a script that pulls the
last 15 minutes of data from the InfluxDB
as csv data and pipe it into a psql -c "\copy...." command. I was
looking for the simplest way to do this.
Then as suggested above pull into staging table that has no constraints
e.g. PK. Verify data and then push into permanent table.
Thanks for the tip. I'll head down that road. Stay safe everyone.
--
Stephen Clark
*NetWolves Managed Services, LLC.*
Sr. Applications Architect
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.clark@netwolves.com
http://www.netwolves.com
You might want to investigate pg_bulkload for this activity.
On 4/24/20 10:55 AM, Steve Clark wrote:
Hello,
I am using psql to copy data extracted from an InfluxDB in csv format into
postgresql.
I have a key field on the time field which I have defined as a bigint
since the time I get
from InfluxDB is an epoch time.
My question is does psql abort the copy if it hits a duplicate key, or
does it keep processing?
--
Angular momentum makes the world go 'round.