Two questions about TOAST
Hi all,
I have a couple of questions about TOAST:
1) Does REINDEX DATABASE also hit TOAST tables, or do they need to be
reindexed manually?
2) While executing a query against TOASTed values I got this debug
message, and I don't know what it means:
DEBUG: building index "pg_toast_33875_index" on table "pg_toast_33875" serially
The query I was executing was:
SELECT lower( f::text ) || lower( t::text ) FROM crashy_table
WHERE id = '16385'
via a plpgsql PERFORM.
Thanks,
Luca
On 11/15/20 9:03 AM, Luca Ferrari wrote:
Hi all,
I have a couple of questions about TOAST:
1) Does REINDEX DATABASE also hit TOAST tables, or do they need to be
reindexed manually?
https://www.postgresql.org/docs/current/sql-reindex.html
"DATABASE
Recreate all indexes within the current database. Indexes on shared
system catalogs are also processed. This form of REINDEX cannot be
executed inside a transaction block.
"
"TABLE
Recreate all indexes of the specified table. If the table has a
secondary “TOAST” table, that is reindexed as well.
"
Database --> Table --> TOAST table.
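In other words, a table-level REINDEX cascades to the TOAST index, so reindexing the table is enough. A minimal illustration (table name borrowed from the query below; the database name is a hypothetical placeholder):

```sql
-- Rebuilds the table's own indexes and, per the docs quoted above,
-- the index on its secondary TOAST table as well
REINDEX TABLE crashy_table;

-- REINDEX DATABASE processes every table in the current database,
-- so TOAST indexes are covered there too
REINDEX DATABASE mydb;  -- must name the current database
```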
2) While executing a query against TOASTed values I got this debug
message, and I don't know what it means.
Did this happen while you were doing the REINDEX DATABASE?
DEBUG: building index "pg_toast_33875_index" on table "pg_toast_33875" serially
The query I was executing was:
SELECT lower( f::text ) || lower( t::text ) FROM crashy_table
WHERE id = '16385'
via a plpgsql PERFORM.
Thanks,
Luca
--
Adrian Klaver
adrian.klaver@aklaver.com
On Sun, Nov 15, 2020 at 6:45 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
Database --> Table --> TOAST table.
Thanks, I was not sure about that.
2) While executing a query against TOASTed values I got this debug
message, and I don't know what it means.
Did this happen while you were doing the REINDEX DATABASE?
No, there was no REINDEX involved at that time.
Luca
Hello,
I am definitely out over my skis here so I’ll apologize in advance 😉.
Running version 12.5-1.pgdg20.04+1 on Ubuntu. It’s essentially a
personal database I use to ingest sales forecast spreadsheets from which
I create custom reports for my job function.
I pull a new forecast spreadsheet each Monday. 80% of the records are
the same as the existing records from the week before.
Here’s what I (REALLY) want:
Trigger looks at three fields prior to new insert: Deal ID (numeric),
revenue (numeric), stage (char). Example: 19743576  22072.37  Commit - 90%
1. If the NEW dealid doesn't match any of the OLD dealids, insert the
new row
2. if the NEW dealid, revenue and stage fields ALL match the OLD
dealid, revenue and stage, skip (don't insert the NEW row)
3. If the NEW dealid matches an OLD dealid but either the NEW revenue
OR the stage fields have changed (don't match OLD record) insert new
row (I'll review both rows manually)
*Attempt 1:* Update the chk field with 'same' if the OLD revusd or stage
are different from the NEW revusd or stage
CREATE TRIGGER chk4chg
BEFORE
INSERT ON sfdc
FOR EACH ROW
BEGIN
UPDATE sfdc
SET chk = 'same'
WHERE ndealid = :NEW.ndealid
AND revusd = :NEW.revusd
AND stage = :NEW.stage
END chk4chg;
Remarkably, that works in that it will UPDATE the chk field with 'same'
|ndealid |revusd |stage |chk |
|17713063|1300000|Propose - 60% |same |
However, I must manually enter the parameters in a dialogue box that
(inexplicably) pops up when I run this command.
*Attempt 2:*
CREATE OR REPLACE FUNCTION update_insert() RETURNS trigger AS $$
BEGIN
UPDATE sfdc
SET chk = 'same'
WHERE ndealid = OLD.ndealid;
AND NEW.revusd = OLD.revusd
AND NEW.stage = OLD.stage;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE FUNCTION
Query returned successfully in 136 msec.
That's good news but the trigger doesn't actually update. It lacks
BEFORE INSERT ON sfdc FOR EACH ROW so low prospect for success :-).
*Attempt 3:* A little more sophisticated: executing the function from a trigger
CREATE TRIGGER smart_update_same BEFORE INSERT ON sfdc
FOR EACH ROW EXECUTE FUNCTION update_insert();
CREATE OR REPLACE FUNCTION update_insert() RETURNS trigger AS $$
BEGIN
UPDATE sfdc
SET sfdc.chk = 'same'
WHERE NEW.ndealid = OLD.ndealid
AND NEW.revusd = OLD.revusd
AND NEW.stage = OLD.stage;
END;
$$ LANGUAGE plpgsql;
These 2 CREATEs return successfully but do not update the chk field on a
successful INSERT:
sales=# select ndealid,revusd,stage,chk from sfdc where ndealid = 19743576;
 ndealid  |  revusd  |    stage     | chk
----------+----------+--------------+-----
 19743576 | 22072.37 | Commit - 90% |
 19743576 | 22072.37 | Commit - 90% |
 19743576 | 22072.37 | Commit - 90% |
These 3 attempts won't give me what I REALLY want but I figure I could
use the chk field to delete the new inserts I didn't need.
Am I anywhere close (same county) to the right code?
Hagen
Larimer County, CO
On Saturday, November 21, 2020, Hagen Finley <hagen@datasundae.com> wrote:
I pull a new forecast spreadsheet each Monday. 80% of the records are the
same as the existing records from the week before. [...]
Am I anywhere close (same county) to the right code?
IMO, don’t use triggers. Load the data into a temporary, or unlogged
table, and then run commands to do what you want against the live tables.
Truncate/drop before doing that again the following week.
David J.
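A hedged sketch of that staging-table workflow, using the sfdc table and column names that appear elsewhere in the thread:

```sql
-- Stage the weekly spreadsheet in a temporary table with sfdc's shape
CREATE TEMPORARY TABLE sfdc_staging (LIKE sfdc INCLUDING ALL);

-- ... load this week's rows into sfdc_staging (COPY or application INSERTs) ...

-- Keep only rows that are new or changed on the three compared columns
INSERT INTO sfdc
SELECT s.*
FROM sfdc_staging s
WHERE NOT EXISTS (
    SELECT 1
    FROM sfdc t
    WHERE t.ndealid = s.ndealid
      AND t.revusd  = s.revusd
      AND t.stage   = s.stage
);

-- A temporary table vanishes at session end; an unlogged table would
-- need an explicit TRUNCATE before next week's load
```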
On 11/21/20 8:00 AM, Hagen Finley wrote:
Hello,
I am definitely out over my skis here so I’ll apologize in advance 😉. [...]
*Attempt 1:* Update the chk field with 'same' if the OLD revusd or stage
are different from the NEW revusd or stage:
CREATE TRIGGER chk4chg
BEFORE
INSERT ON sfdc
FOR EACH ROW
BEGIN
UPDATE sfdc
SET chk = 'same'
WHERE ndealid = :NEW.ndealid
AND revusd = :NEW.revusd
AND stage = :NEW.stage
END chk4chg;
Remarkably, that works in that it will UPDATE the chk field with 'same'
Not sure how.
More comments below.
|ndealid |revusd |stage |chk |
|17713063|1300000|Propose - 60% |same |
However, I must manually enter the parameters in a dialogue box that
(inexplicably) pops up when I run this command.
What client are you using?
[...]
These 3 attempts won't give me what I REALLY want but I figure I could
use the chk field to delete the new inserts I didn't need.
Am I anywhere close (same county) to the right code?
Don't do the UPDATE. Also I thought when the values matched you did
not want INSERT?
Instead:
IF NEW.ndealid = OLD.ndealid AND NEW.revusd = OLD.revusd
AND NEW.stage = OLD.stage THEN
RETURN NULL; --Will cancel INSERT
ELSE
RETURN NEW;
END IF;
--
Adrian Klaver
adrian.klaver@aklaver.com
On 11/21/20 8:20 AM, Adrian Klaver wrote:
Instead:
IF NEW.ndealid = OLD.ndealid AND NEW.revusd = OLD.revusd
AND NEW.stage = OLD.stage THEN
RETURN NULL; --Will cancel INSERT
ELSE
RETURN NEW;
END IF;
Well, this is what happens when I answer BC (before coffee). The above
will not work, if for no other reason than that OLD does not exist in an
INSERT. I will try to come up with something that is in the realm of
possibility.
--
Adrian Klaver
adrian.klaver@aklaver.com
On 11/21/20 8:47 AM, Adrian Klaver wrote:
[...]
Well, this is what happens when I answer BC (before coffee). The above
will not work, if for no other reason than that OLD does not exist in an
INSERT. I will try to come up with something that is in the realm of
possibility.
Alright, caffeine is in the bloodstream, so here is something that might actually work:
DECLARE
    match_ct integer;
BEGIN
    SELECT INTO match_ct
        count(*)
    FROM sfdc
    WHERE ndealid = NEW.ndealid
      AND revusd = NEW.revusd
      AND stage = NEW.stage;

    IF match_ct > 0 THEN
        RETURN NULL; -- Will cancel INSERT
    ELSE
        RETURN NEW;
    END IF;
END;
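For completeness, the body above might be wired up as follows (the function and trigger names are hypothetical, and an EXISTS test is shown as an equivalent alternative to counting every matching row):

```sql
CREATE OR REPLACE FUNCTION skip_dup_insert() RETURNS trigger AS $$
BEGIN
    -- If a row already matches on all three compared columns,
    -- returning NULL cancels this INSERT
    IF EXISTS (SELECT 1
               FROM sfdc
               WHERE ndealid = NEW.ndealid
                 AND revusd  = NEW.revusd
                 AND stage   = NEW.stage) THEN
        RETURN NULL;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER skip_dup_insert
    BEFORE INSERT ON sfdc
    FOR EACH ROW EXECUTE FUNCTION skip_dup_insert();
```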
Though I would also point you at David's solution. Given that you are
only looking at ~20% of the records being different it would save you a
lot of churning through INSERTs.
--
Adrian Klaver
adrian.klaver@aklaver.com
Thanks so much Adrian,
I like this approach but as you indicated it doesn't actually NULL the
INSERT.
Could we UPDATE the existing record (per my fledgling chk UPDATE) and
then RETURN NULL? (More proof I don't know what I am talking about ;-).
Hagen
David,
That's an interesting idea. I WOULD like to retain the OLD records that
are the same and only INSERT new or changed records. Is there a way to
compare the old and the new records without a trigger?
Hagen
On 11/21/20 9:47 AM, Hagen Finley wrote:
Thanks so much Adrian,
I like this approach but as you indicated it doesn't actually NULL the
INSERT.
It should cause the INSERT not to happen if a row exists with the same
values for ndealid, revusd and stage. Are you seeing an INSERT for those
conditions?
Could we UPDATE the existing record (per my fledgling chk UPDATE) and
then RETURN NULL? (More proof I don't know what I am talking about ;-).
The INSERT won't happen so I'm not sure what you want to check against?
--
Adrian Klaver
adrian.klaver@aklaver.com
On Saturday, November 21, 2020, Hagen Finley <hagen@datasundae.com> wrote:
David,
That's an interesting idea. I WOULD like to retain the OLD records that
are the same and only INSERT new or changed records. Is there a way to
compare the old and the new records without a trigger?
A where clause?
David J.
On Saturday, November 21, 2020, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Saturday, November 21, 2020, Hagen Finley <hagen@datasundae.com> wrote:
David,
That's an interesting idea. I WOULD like to retain the OLD records that
are the same and only INSERT new or changed records. Is there a way to
compare the old and the new records without a trigger?
A where clause?
Did you get the part in the plan where there are two tables, existing and
new? You write queries that join the two tables together and use the where
clause in those queries to limit records.
David J.
Yes, it didn't sink in at first, but the two-table join idea does make sense – I'll give that a try. THANK YOU.
From: David G. Johnston <david.g.johnston@gmail.com>
Sent: Saturday, November 21, 2020 11:25 AM
To: Hagen Finley <hagen@datasundae.com>
Cc: pgsql-general@lists.postgresql.org
Subject: Re: INSERT Trigger to check for existing records
[...]
If you can modify your insert statement, and live with an extra column in
the data, no trigger is needed as best I can figure.
Create a unique index over the existing columns, add a "created_on" field
and call insert on conflict (unique index) do nothing.
This should give the behavior you want.
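A minimal sketch of that suggestion, assuming the existing rows are already unique on the three compared columns (the index name and the created_on column are illustrative, not from the thread's schema):

```sql
-- One-time setup: uniqueness over the columns being compared
ALTER TABLE sfdc ADD COLUMN IF NOT EXISTS created_on date;
CREATE UNIQUE INDEX IF NOT EXISTS sfdc_dedup_idx
    ON sfdc (ndealid, revusd, stage);

-- Weekly load: rows matching an existing (ndealid, revusd, stage)
-- are silently skipped
INSERT INTO sfdc (ndealid, revusd, stage, created_on)
VALUES (19743576, 22072.37, 'Commit - 90%', current_date)
ON CONFLICT (ndealid, revusd, stage) DO NOTHING;
```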
Hello Michael,
Thanks so much for this advice. As I mentioned previously, I'm not very
good at this yet, so forgive me if my response is obtuse. I really love
databases but my sentiments may be unrequited.
In reality my table has lots of columns (~30) including a report date
(repdate), and each week's pull has a new repdate (in this case
2020-11-02 and 2020-11-09) which could function as a "created on" field.
To clarify, would I create a unique index on all the columns in the old
report records (2020-11-02) or just the three I am comparing
(dealid, stage and revenue)?
In either case, so far in my efforts it looks like the create index
fails because there are lots of rows with the same stage value, and a
few with the same revenue value.
Create UNIQUE INDEX idx_sfdc
ON sfdc(ndealid, stage, revusd);
ERROR: could not create unique index "idx_sfdc"
DETAIL: Key (ndealid, stage, revusd)=(19743576, Commit - 90% , 22072.37) is duplicated.
SQL state: 23505
I probably could create a unique index on the dealid column alone, as
that should be unique. Would that be enough? It seems like that would
insert ONLY the new records with a new (unique) dealid, and that would
definitely be an important step forward.
I hesitate to admit I have no idea how I would code the "call insert on
conflict (unique index) do nothing" syntax, but I would be excited to learn.
Best,
Hagen
On 11/22/20 9:53 AM, Hagen Finley wrote:
Hello Michael,
Thanks so much for this advice. As I mentioned previously, I'm not very
good at this yet, so forgive me if my response is obtuse. I really love
databases but my sentiments may be unrequited.
The overriding issue is lack of a plan. From your first post:
"Here’s what I (REALLY) want:
Trigger looks at three fields prior to new insert: Deal ID (numeric),
revenue (numeric), stage(char) Example: 19743576 22072.37 Commit
- 90%
1. If the NEW dealid doesn't match any of the OLD dealids, insert the
new row
2. if the NEW dealid, revenue and stage fields ALL match the OLD
dealid, revenue and stage, skip (don't insert the NEW row)
3. If the NEW dealid matches an OLD dealid but either the NEW revenue
OR the stage fields have changed (don't match OLD record) insert new
row (I'll review both rows manually)
"
And from later post:
" I figure I could
use the chk field to delete the new inserts I didn't need."
From this I come up with the following:
1) Data rules
a) If dealid in new data does not exist in old data INSERT row.
   b) If dealid is in both new and old data AND revenue or stage don't
match, then INSERT and mark for review.
c) If new dealid, revenue, stage match old dealid, revenue, stage
then do not INSERT.
2) Process the data. Choices
a) Use trigger on table sfdc
b) Use staging table to hold new data and then process into sfdc table
3) Process the data. Same basic principle for both choices in 2). Flowchart:
a) In new data search for dealid in table sfdc if it does not exist
add data to sfdc.
b) If new data dealid does exist in sfdc
1) If revenue or stage field differ mark for review
2) If they do match skip further processing
4) Thoughts about above.
a) To me table sfdc should only hold vetted data that is known to be
unique per row.
b) The data for review 1)b) 3)b) should end up in another review
table e.g. sfdc_review.
   c) Since, from the OP, '80% of the records are the same as the
existing records from the week before', it makes sense to use the
staging table 2)b) process rather than throwing away a lot of INSERTs.
If this makes sense then it comes down to decision in which choice in 2)
to use. At that point it is filling in the flowchart with the exact
steps to take.
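The data rules in 1) can be sanity-checked outside the database; a small Python sketch (pure illustration of the classification logic, not the actual load code):

```python
def classify(dealid, revenue, stage, existing):
    """Apply the data rules above to one incoming row.

    `existing` maps dealid -> (revenue, stage) for the current sfdc contents.
    Returns 'insert' (rule a), 'review' (rule b), or 'skip' (rule c).
    """
    if dealid not in existing:
        return "insert"            # a) unseen dealid: plain INSERT
    if existing[dealid] == (revenue, stage):
        return "skip"              # c) exact match: do not INSERT
    return "review"                # b) dealid known but fields changed

# Example using the deal from the thread
existing = {19743576: (22072.37, "Commit - 90%")}
print(classify(19743576, 22072.37, "Commit - 90%", existing))  # skip
print(classify(19743576, 30000.00, "Commit - 90%", existing))  # review
print(classify(11111111, 500.00, "Propose - 60%", existing))   # insert
```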
--
Adrian Klaver
adrian.klaver@aklaver.com
Thank you Adrian,
Your summary looks exactly right to me. I think option 2b looks more within reach for my limited skillset.
Let me see if I can make that work (dubious) and report.
Best,
Hagen
-----Original Message-----
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Sunday, November 22, 2020 11:26 AM
To: Hagen Finley <hagen@datasundae.com>; pgsql-general@lists.postgresql.org
Subject: Re: INSERT Trigger to check for existing records
[...]
Adrian and Michael,
My current insert logic (which works) is in a psycopg2 python script
which reads a spreadsheet row into an array, so for the moment I didn't
want to add that integration to my struggle.
cur = conn.cursor()
query = "INSERT INTO sfdc(theater,country,account,smotion,opname,cprod,opid,ndealid,qnum,stage,revusd,cdate,bdate,age,opown,opnum,sonum,fbdate,region,dqnum,pid,closed,won,onum,repdate) VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s);"
cur.executemany(query, frecords)
conn.commit()
conn.close()
However, the following is something of a stepping stone towards the
destination and (dare I say it? ;-) it works:
CREATE OR REPLACE FUNCTION same_test(did numeric)
RETURNS numeric AS $$
BEGIN
IF $1 IN
(SELECT ndealid from hygiene_112)
THEN
UPDATE hygiene_119 SET paid = 'SAME';
ELSE
UPDATE hygiene_119 SET paid = 'NEW';
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
I changed the dealid to something that doesn't exist in the 11-2 table
(14593030), and the function updates the status field in the 11-9 table to "NEW":
sales=# UPDATE hygiene_112 SET ndealid = 14593030 WHERE ndealid = 14593039;
UPDATE 1
SELECT same_test(14593039);
|repdate    |ndealid  |revusd     |stage         |status |
|2020-11-09 |14593039 |1015624.23 |Propose - 60% |NEW    |
When I changed it back I get the proper "SAME" update:
sales=# UPDATE hygiene_112 SET ndealid = 14593039 WHERE ndealid = 14593030;
UPDATE 1
SELECT same_test(14593039);
|repdate    |ndealid  |revusd     |stage         |status |
|2020-11-09 |14593039 |1015624.23 |Propose - 60% |SAME   |
I'm generally amazed when ANYTHING works so this is good news (to me).
It seems logical I could replace the UPDATE statement with an INSERT
statement at this point.
However, that only addresses one of the data rules on my checklist.
I'll keep forging ahead here and see what additional progress I can
attain. I very much appreciate your patient assistance here.
Best,
Hagen
Show quoted text
On 11/22/20 11:26 AM, Adrian Klaver wrote:
On 11/22/20 9:53 AM, Hagen Finley wrote:
Hello Michael,
Thanks so much for this advice. As I mentioned previously, I'm not
very good at this yet, so forgive me if my response is obtuse. I
really love databases but my sentiments may be unrequited.The overriding issue is lack of a plan. From your first post:
"Here’s what I (REALLY) want:
Trigger looks at three fields prior to new insert: Deal ID (numeric),
revenue (numeric), stage(char) Example: 19743576 22072.37 Commit
- 90%

1. If the NEW dealid doesn't match any of the OLD dealids, insert the new row
2. If the NEW dealid, revenue and stage fields ALL match the OLD dealid, revenue and stage, skip (don't insert the NEW row)
3. If the NEW dealid matches an OLD dealid but either the NEW revenue OR the stage fields have changed (don't match OLD record) insert new row (I'll review both rows manually)
"

And from a later post:
" I figure I could use the chk field to delete the new inserts I didn't need."

From this I come up with the following:

1) Data rules
   a) If dealid in new data does not exist in old data INSERT row.
   b) If dealid is in both new and old data AND revenue OR stage don't match then INSERT and mark for review.
   c) If new dealid, revenue, stage match old dealid, revenue, stage then do not INSERT.

2) Process the data. Choices
   a) Use trigger on table sfdc
   b) Use staging table to hold new data and then process into sfdc table

3) Process the data. Same basic principle for both choices in 2)
   Flowchart
   a) In new data search for dealid in table sfdc; if it does not exist add data to sfdc.
   b) If new data dealid does exist in sfdc
      1) If revenue or stage fields differ mark for review
      2) If they do match skip further processing

4) Thoughts about above.
   a) To me table sfdc should only hold vetted data that is known to be unique per row.
   b) The data for review 1)b) 3)b) should end up in another review table, e.g. sfdc_review.
   c) Since from the OP '80% of the records are the same as the existing records from the week before.' it makes sense to use the staging table 2)b) process rather than throwing away a lot of INSERTs.

If this makes sense then it comes down to a decision on which choice in 2) to use. At that point it is filling in the flowchart with the exact steps to take.

In reality my table has lots of columns (~30) including a report date (repdate), and each week's pull has a new repdate (in this case 2020-11-02 and 2020-11-09) which could function as a "created on" field.

To clarify, would I create a unique index on all the columns in the old report records (2020-11-02), or just the three I am comparing (dealid, stage and revenue)?

In either case, so far in my efforts it looks like the CREATE INDEX fails because there are lots of rows with the same stage value, and a few with the same revenue value.

CREATE UNIQUE INDEX idx_sfdc
ON sfdc (ndealid, stage, revusd);

ERROR: could not create unique index "idx_sfdc"
DETAIL: Key (ndealid, stage, revusd)=(19743576, Commit - 90%, 22072.37) is duplicated.
SQL state: 23505

I probably could create a unique index on the dealid column alone, as that should be unique. Would that be enough? It seems like that would insert ONLY the new records with a new (unique) dealid, and that would definitely be an important step forward.

I hesitate to admit I have no idea how I would code the "call insert on conflict (unique index) do nothing" syntax, but I would be excited to learn.

Best,
Hagen
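[One way the staging-table branch 2)b) could implement rule 1)b), sketched in SQL. The table names sfdc_staging and the review-table column list are hypothetical; only the three compared columns are shown.]

```sql
-- Sketch: new week's pull sits in sfdc_staging (same layout as sfdc).
-- Rule 1)b): dealid exists in both, but revenue or stage changed ->
-- copy the new values into sfdc_review for manual inspection.
INSERT INTO sfdc_review (ndealid, stage, revusd)
SELECT s.ndealid, s.stage, s.revusd
FROM sfdc_staging s
JOIN sfdc o ON o.ndealid = s.ndealid
WHERE s.revusd IS DISTINCT FROM o.revusd
   OR s.stage  IS DISTINCT FROM o.stage;
```

IS DISTINCT FROM is used instead of <> so that a NULL on either side still counts as a difference.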
On 11/22/20 8:54 AM, Michael Lewis wrote:
If you can modify your insert statement, and live with an extra column in the data, no trigger is needed as best I can figure.

Create a unique index over the existing columns, add a "created_on" field and call insert on conflict (unique index) do nothing.

This should give the behavior you want.
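[For reference, the ON CONFLICT syntax Michael describes looks roughly like this. A sketch only: the index name is made up and the column list is abbreviated from the real ~30-column table.]

```sql
-- ON CONFLICT needs a unique index (or constraint) as its target.
CREATE UNIQUE INDEX idx_sfdc_ndealid ON sfdc (ndealid);

-- Rows whose ndealid already exists in sfdc are silently skipped;
-- genuinely new dealids are inserted.
INSERT INTO sfdc (ndealid, stage, revusd, repdate)
VALUES (19743576, 'Commit - 90%', 22072.37, '2020-11-09')
ON CONFLICT (ndealid) DO NOTHING;
```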
Folks,

Just a quick question. Using this FUNCTION:

CREATE OR REPLACE FUNCTION same_test(did numeric)
RETURNS numeric AS $$
BEGIN
  IF $1 IN
    (SELECT dealid FROM hygiene_112)
  THEN
    UPDATE hygiene_119 SET paid = 'SAME';
  ELSE
    UPDATE hygiene_119 SET paid = 'NEW';
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

Does the following query pass the dealids that result from the SELECT statement into the parameter of the same_test() FUNCTION?

SELECT dealid, same_test(dealid) FROM hygiene_123;

I doubt it does (my query runs a *long* time) :-). I know I can utilize python to push SELECT results into an array and then run a 'FOR d in dealids' LOOP to feed the FUNCTION parameter, but I'd like to learn how to do that with nested SQL statements or FUNCTIONS.

Thanks!
Hagen
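[A query of that shape does feed each dealid into the function, one call per row of hygiene_123. The slowness is likely because, as written, same_test() has no WHERE clause, so every call rewrites every row of hygiene_119. If the intent is to flag each hygiene_119 row by its own dealid, a single set-based UPDATE could replace the per-row calls. A sketch, assuming hygiene_119 has a dealid column to correlate on:]

```sql
-- One statement instead of one function call (and one full-table
-- UPDATE) per row: flag each row SAME/NEW by its own dealid.
UPDATE hygiene_119 h
SET paid = CASE
             WHEN EXISTS (SELECT 1
                          FROM hygiene_112 o
                          WHERE o.dealid = h.dealid)
             THEN 'SAME'
             ELSE 'NEW'
           END;
```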
On 11/22/20 4:28 PM, Hagen Finley wrote:
Adrian and Michael,

My current insert logic (which works) is in a psycopg2 python script which reads a spreadsheet row into an array, so for the moment I didn't want to add that integration to my struggle.

cur = conn.cursor()
query = "INSERT INTO sfdc(theater,country,account,smotion,opname,cprod,opid,ndealid,qnum,stage,revusd,cdate,bdate,age,opown,opnum,sonum,fbdate,region,dqnum,pid,closed,won,onum,repdate) VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s);"
cur.executemany(query, frecords)
conn.commit()
conn.close()

However, the following is something of a stepping stone towards the destination and (dare I say it? ;-) it works:

CREATE OR REPLACE FUNCTION same_test(did numeric)
RETURNS numeric AS $$
BEGIN
  IF $1 IN
    (SELECT ndealid FROM hygiene_112)
  THEN
    UPDATE hygiene_119 SET paid = 'SAME';
  ELSE
    UPDATE hygiene_119 SET paid = 'NEW';
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

I changed the dealid to something that doesn't exist (14593030) in the 11-2 table, and the function updates the 11-9 table status field to "NEW":

sales=# UPDATE hygiene_112 SET ndealid = 14593030 WHERE ndealid = 14593039;
UPDATE 1
sales=# SELECT same_test(14593039);

 repdate    | ndealid  | revusd     | stage         | status
------------+----------+------------+---------------+--------
 2020-11-09 | 14593039 | 1015624.23 | Propose - 60% | NEW

When I changed it back I get the proper "SAME" update:

sales=# UPDATE hygiene_112 SET ndealid = 14593039 WHERE ndealid = 14593030;
UPDATE 1
sales=# SELECT same_test(14593039);

 repdate    | ndealid  | revusd     | stage         | status
------------+----------+------------+---------------+--------
 2020-11-09 | 14593039 | 1015624.23 | Propose - 60% | SAME

I'm generally amazed when ANYTHING works so this is good news (to me). It seems logical I could replace the UPDATE statement with an INSERT statement at this point.

However, that only addresses one of the data rules on my checklist. I'll keep forging ahead here and see what additional progress I can attain. Very much appreciate your patient assistance here.

Best,
Hagen
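[The "replace the UPDATE with an INSERT" idea above could be sketched like this, covering data rule 1)a). It assumes the new week's rows sit in a staging table (hypothetical name sfdc_staging) with the same layout as sfdc; the column list is abbreviated.]

```sql
-- Insert only rows whose ndealid is not already present in sfdc;
-- existing deals are left untouched for the separate review step.
INSERT INTO sfdc (ndealid, stage, revusd, repdate)
SELECT s.ndealid, s.stage, s.revusd, s.repdate
FROM sfdc_staging s
WHERE NOT EXISTS (SELECT 1 FROM sfdc o WHERE o.ndealid = s.ndealid);
```

With a unique index on sfdc(ndealid), the NOT EXISTS guard could equally be dropped in favor of ON CONFLICT (ndealid) DO NOTHING.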