Re: speed up insert query

Started by Martin Gainty · over 25 years ago · 3 messages · general
#1Martin Gainty
mgainty@hotmail.com

2 things
tr_tran_time needs to be already in 'time format'
is_ok needs to be indexed (preferably bitmapped index)

HTH/
Martin
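
A note on the index suggestion: PostgreSQL has no on-disk bitmapped indexes; the closest it gets is a bitmap index scan over an ordinary btree. For a boolean flag like is_ok, a partial index is usually the better fit. A minimal sketch, assuming is_ok is a boolean column on transaction_import (the index name is made up here):

```sql
-- Partial index: only indexes the rows the INSERT ... SELECT actually reads.
CREATE INDEX transaction_import_is_ok_idx
    ON transaction_import (is_ok)
    WHERE is_ok;

-- The query's WHERE clause must match the index predicate to use it:
--   ... FROM transaction_import WHERE is_ok;
```

That said, when roughly 69,000 of ~70,000 rows match the filter, a sequential scan is almost certainly the right plan anyway, which would explain the planner ignoring any index on is_ok.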
----- Original Message -----
From: "Tom Hart" <tomhart@coopfed.org>
To: "Postgres General List" <pgsql-general@postgresql.org>
Sent: Monday, November 26, 2007 5:30 PM
Subject: [GENERAL] speed up insert query

Hey everybody. I'm trying to speed up a query (not general optimization,
one query in particular), and I'm not sure if there's any way to get it
to go faster.

The query looks like this

INSERT INTO transaction
(
"tr_acct_num",
"tr_acct_typ",
"tr_atm_rec",
"tr_audit_seq",
"tr_branch_cd",
"tr_cash_amt",
...
"tr_tran_time",
"tr_trn_rev_point",
"tr_typ",
"tr_typ_cd",
"atm_trn_reg_e",
"dataset"
)
SELECT
iq_numeric("tr_acct_num"),
"tr_acct_typ",
iq_numeric("tr_atm_rec"),
iq_numeric("tr_audit_seq"),
iq_numeric("tr_branch_cd"),
iq_numeric("tr_cash_amt"),
...
cast("tr_tran_time" as time),
iq_numeric("tr_trn_rev_point"),
iq_numeric("tr_typ"),
iq_numeric("tr_typ_cd"),
"atm_trn_reg_e",
0

FROM transaction_import
WHERE is_ok = 'TRUE'
;

There's not a lot I seem to be able to do about the SELECT portion of
this query (I tried an index on is_ok, but the planner didn't even want to
use it), but is there anything I can do to speed up the insert?

This is the EXPLAIN ANALYZE on the query

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on transaction_import  (cost=0.00..30953.68 rows=69239 width=434) (actual time=0.146..2974.609 rows=68913 loops=1)
   Filter: is_ok
 Total runtime: 179091.119 ms
(3 rows)

The query is inserting ~70,000 rows into a table with ~1.8 million rows
already in it. Anybody have any idea how I can keep this query from
taking so long?
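
Reading the plan above: the sequential scan itself accounts for only ~3 seconds of the 179-second total, so the time is almost certainly going to the insert side, i.e. index, constraint, and trigger maintenance on the 1.8-million-row transaction table. A common bulk-load tactic is to drop the secondary indexes, load, and rebuild them in one pass. A sketch, with hypothetical index names (list the real ones from \d transaction first):

```sql
BEGIN;
-- These index names are made up for illustration.
DROP INDEX IF EXISTS transaction_tr_acct_num_idx;
DROP INDEX IF EXISTS transaction_tr_audit_seq_idx;

-- ... run the big INSERT INTO transaction ... SELECT here ...

-- Rebuilding once over ~1.9M rows is typically far cheaper than
-- maintaining each index row-by-row during the insert.
CREATE INDEX transaction_tr_acct_num_idx  ON transaction (tr_acct_num);
CREATE INDEX transaction_tr_audit_seq_idx ON transaction (tr_audit_seq);
COMMIT;
```

Whether this wins depends on how many indexes the table carries and whether concurrent readers can tolerate the indexes disappearing inside the transaction.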

--
Tom Hart
IT Specialist
Cooperative Federal
723 Westcott St.
Syracuse, NY 13210
(315) 471-1116 ext. 202
(315) 476-0567 (fax)

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

#2Tom Hart
tomhart@coopfed.org
In reply to: Martin Gainty (#1)

Martin Gainty wrote:

2 things
tr_tran_time needs to be already in 'time format'
is_ok needs to be indexed (preferably bitmapped index)

HTH/
Martin

The data is COPY'd from CSVs that our internal software creates, and
we don't have control over the output format. Is coaxing tr_tran_time into
proper time format on the _import table going to be less costly than
doing it on the fly in the query? Also, there are a couple more casts in
the query (as date). Are casts extremely costly?

The iq_numeric function uses regex to determine whether to return a
number or null. How costly are regex based functions used like this? I
can't see it being more efficient to edit this data while it's in a
table with all text fields, no key (have to permit duplicates at this
stage), and as of yet no indexes. (As I said I tried an index on is_ok,
both a btree and a hash, and the planner seems completely uninterested).
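
The definition of iq_numeric isn't shown in this thread, so the following is only a guess at its shape: a function that returns the value as numeric when it looks like a number and NULL otherwise. A plpgsql sketch:

```sql
-- Hypothetical reconstruction of iq_numeric; the real function may differ.
CREATE OR REPLACE FUNCTION iq_numeric(val text) RETURNS numeric AS $$
BEGIN
    -- Accept an optional sign, digits, and an optional decimal part.
    IF val ~ '^-?[0-9]+(\.[0-9]+)?$' THEN
        RETURN val::numeric;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
```

A simple regex match like this is cheap; even called a dozen times per row across 69,000 rows it cannot come close to explaining a 179-second runtime. If the function really is stable for a given input, declaring it IMMUTABLE (as above) also lets the planner treat it more favorably.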

Also, I'm sure you've heard this, but the date on your email client is
drastically wrong. I appreciate your assistance but I can only imagine
that there are quite a few people missing your good advice because
they're not looking through the new posts from 2000.


--

Tom Hart
IT Specialist
Cooperative Federal
723 Westcott St.
Syracuse, NY 13210
(315) 471-1116 ext. 202
(315) 476-0567 (fax)

#3Tom Hart
tomhart@coopfed.org
In reply to: Tom Hart (#2)

Tom Hart wrote:

Martin Gainty wrote:

2 things
tr_tran_time needs to be already in 'time format'
is_ok needs to be indexed (preferably bitmapped index)

HTH/
Martin

The data is COPY'ed from csv's that our internal software creates, and
we don't have control over output format. Is coaxing tr_tran_time into
proper time format on the _import table going to be less costly than
doing it on the fly in the query? Also, there are a couple more casts
in the query (as date). Are casts extremely costly?

The iq_numeric function uses regex to determine whether to return a
number or null. How costly are regex based functions used like this? I
can't see it being more efficient to edit this data while it's in a
table with all text fields, no key (have to permit duplicates at this
stage), and as of yet no indexes. (As I said I tried an index on
is_ok, both a btree and a hash, and the planner seems completely
uninterested).

Also, I'm sure you've heard this, but the date on your email client is
drastically wrong. I appreciate your assistance but I can only imagine
that there are quite a few people missing your good advice because
they're not looking through the new posts from 2000.

I found at least one major optimization (or rather, a de-optimization
already in place, if the English language doesn't mind being flexed a
bit). My supervisor had been experimenting with indexes on the rather large
transaction table. It turns out he had two multi-column indexes
composed of fields that were already indexed individually. These two indexes
didn't seem to be helping queries against the table much, but removing
just those two (there are still 4 or 5 single-column indexes) cut my
execution time by 70%. That brings it down to a much more manageable amount
of time.
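
For anyone hunting the same problem, the statistics views can point at dead-weight indexes. A sketch using the standard pg_stat_user_indexes view (check the columns against your version's docs):

```sql
-- Indexes on the transaction table with their scan counts and sizes.
-- An index with idx_scan = 0 is costing insert time but serving no reads.
SELECT indexrelname,
       idx_scan,
       pg_relation_size(indexrelid) AS size_bytes
FROM pg_stat_user_indexes
WHERE relname = 'transaction'
ORDER BY idx_scan;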

Thanks for your reply. Hope this helps somebody else :-)


--
Tom Hart
IT Specialist
Cooperative Federal
723 Westcott St.
Syracuse, NY 13210
(315) 471-1116 ext. 202
(315) 476-0567 (fax)