postgresql - Updating duplicates from one-to-many relationships. -
this isn't standard "how find duplicates" question, know how find duplicates, see below. question how update said records have child items matching records?
alright, i'm going give whole scenario can work problem.
duplicate records inserted result of critical system failure.
finding later duplicates , marking parent commission_import_commission_junction "is_processed = true" solves problem.
the complication commission_import_commission_junction , children commission_import_commission_junction_line_items must identical on columns compare.
the tables are:
commission_import_commission_junction - id - created_date - columns checked duplication - columns not checked duplication commission_import_commission_junction_line_items - id - columns checked duplication - columns not checked duplication
(for full table spec, check out create table statements in bottom-most block of code.)
the query mark duplicates on parent table commission_import_commission_junction:
update commission_import_commission_junction cicj set is_processed = true ( select min(created_date) first_date, member_id, site_id, action_status, action_type, ad_id, commission_id, country, event_date, locking_date, order_id, original, original_action_id, posting_date, website_id, advertiser_name, commission_amount, sale_amount, aggregator_affiliate_id commission_import_commission_junction inner_imports join commission_import_commission_junction_line_items inner_items on inner_items.commission_import_commission_junction_id = inner_imports.commission_import_commission_junction_id group member_id, site_id, action_status, action_type, ad_id, commission_id, country, event_date, locking_date, order_id, original, original_action_id, posting_date, website_id, advertiser_name, commission_amount, sale_amount, aggregator_affiliate_id having (count(*) > 1) ) dups -- main table columnn list (cicj.member_id, cicj.site_id, cicj.action_status, cicj.action_type, cicj.ad_id, cicj.commission_id, cicj.country, cicj.event_date, cicj.locking_date, cicj.order_id, cicj.original, cicj.original_action_id, cicj.posting_date, cicj.website_id, cicj.advertiser_name, cicj.commission_amount, cicj.sale_amount, cicj.aggregator_affiliate_id) not distinct -- other table column list (dups.member_id, dups.site_id, dups.action_status, dups.action_type, dups.ad_id, dups.commission_id, dups.country, dups.event_date, dups.locking_date, dups.order_id, dups.original, dups.original_action_id, dups.posting_date, dups.website_id, dups.advertiser_name, dups.commission_amount, dups.sale_amount, dups.aggregator_affiliate_id) , cicj.created_date <> dups.first_date , cicj.is_processed = false;
somewhere , somehow need check line_items duplicates.
the code below setup database, remember postgres specific.
-- "commission_import_build" record keeps information process of collecting commission information. duplicate commission_import_commission_junction records not exist same commission_import_build_id -- "commission_import_commission_junction" record description commission information customers purchase. -- "commission_import_commission_junction_line_items" records describing items in purchase. drop table if exists commission_import_commission_junction_line_items; drop table if exists commission_import_commission_junction; drop table if exists commission_import_builds; create table commission_import_builds ( commission_import_build_id serial not null, build_date timestamp time zone not null, constraint pkey_commission_import_build_id primary key (commission_import_build_id), constraint commission_import_builds_build_date_key unique (build_date) ); insert commission_import_builds (commission_import_build_id, build_date) values (1, '2011-01-01'); insert commission_import_builds (commission_import_build_id, build_date) values (2, '2011-01-02'); insert commission_import_builds (commission_import_build_id, build_date) values (3, '2011-01-03'); create table commission_import_commission_junction ( commission_import_commission_junction_id serial not null, member_id integer, site_id integer, action_status character varying not null, action_type character varying not null, ad_id bigint, commission_id bigint not null, country character varying, event_date timestamp time zone not null, locking_date timestamp time zone, order_id character varying not null, original boolean, original_action_id bigint not null, posting_date timestamp time zone not null, website_id bigint not null, advertiser_name character varying, commission_amount numeric(19,2) not null, sale_amount numeric(19,2) not null, aggregator_affiliate_id integer not null, is_processed boolean not null default false, created_date timestamp time zone not null default now(), member_transaction_id integer, commission_import_build_id integer not null, constraint pkey_commission_import_commission_junction_commission_import_co primary key (commission_import_commission_junction_id), constraint fk_commission_import_commission_junction_commission_import_buil foreign key (commission_import_build_id) references commission_import_builds (commission_import_build_id) match simple on update no action on delete no action ); create index idx_commission_import_commission_junction_is_processed on commission_import_commission_junction using btree (is_processed); insert commission_import_commission_junction (commission_import_commission_junction_id, action_status, action_type, commission_id, event_date, order_id, original_action_id, posting_date, website_id, commission_amount, sale_amount, aggregator_affiliate_id, commission_import_build_id, created_date) values (1, 'new', 'sale', 1234, '2011-02-04 14:39:52.989499-07', 'test-order', 1234567, '2011-02-04 14:39:52.989499-07', 123, 12.35, 123.45, 9876, 1, '2011-02-05'); insert commission_import_commission_junction (commission_import_commission_junction_id, action_status, action_type, commission_id, event_date, order_id, original_action_id, posting_date, website_id, commission_amount, sale_amount, aggregator_affiliate_id, commission_import_build_id, created_date) values (2, 'new', 'sale', 1234, '2011-02-04 14:39:52.989499-07', 'test-order', 1234567, '2011-02-04 14:39:52.989499-07', 123, 12.35, 123.45, 9876, 2, '2011-02-06'); insert commission_import_commission_junction (commission_import_commission_junction_id, action_status, action_type, commission_id, event_date, order_id, original_action_id, posting_date, website_id, commission_amount, sale_amount, aggregator_affiliate_id, commission_import_build_id, created_date) values (3, 'new', 'sale', 1234, '2011-02-04 14:39:52.989499-07', 'test-order', 1234567, '2011-02-04 14:39:52.989499-07', 123, 12.35, 123.45, 9876, 3, '2011-02-07'); select * commission_import_commission_junction; create table commission_import_commission_junction_line_items ( commission_import_commission_junction_line_item_id serial not null, commission_import_commission_junction_id integer not null, sku character varying, quantity integer, posting_date timestamp time zone, sale_amount numeric(19,2), discount numeric(19,2), constraint pkey_commission_import_commission_junction_link_items_commissio primary key (commission_import_commission_junction_line_item_id), constraint fkey_commission_import_commission_junction_line_items_commissio foreign key (commission_import_commission_junction_id) references commission_import_commission_junction (commission_import_commission_junction_id) match simple on update no action on delete no action ); insert commission_import_commission_junction_line_items (commission_import_commission_junction_id, sku, quantity, sale_amount) values (1, 'test1', 3, 23.45); insert commission_import_commission_junction_line_items (commission_import_commission_junction_id, sku, quantity, sale_amount) values (1, 'test2', 3, 67.50); insert commission_import_commission_junction_line_items (commission_import_commission_junction_id, sku, quantity, sale_amount) values (1, 'test3', 3, 32.50); insert commission_import_commission_junction_line_items (commission_import_commission_junction_id, sku, quantity, sale_amount) values (2, 'test1', 3, 23.45); insert commission_import_commission_junction_line_items (commission_import_commission_junction_id, sku, quantity, sale_amount) values (2, 'test2', 3, 67.50); insert commission_import_commission_junction_line_items (commission_import_commission_junction_id, sku, quantity, sale_amount) values (2, 'test3', 3, 32.50); insert commission_import_commission_junction_line_items (commission_import_commission_junction_id, sku, quantity, sale_amount) values (3, 'test1', 3, 23.45); insert commission_import_commission_junction_line_items (commission_import_commission_junction_id, sku, quantity, sale_amount) values (3, 'test2', 3, 67.50); insert commission_import_commission_junction_line_items (commission_import_commission_junction_id, sku, quantity, sale_amount) values (3, 'test3', 3, 32.50);
reminds me of duplicate elimination in direct marketing mailing lists
regardless of details of tables, parent-child dupe elimination algorithm follows these steps:
1) duplicates list matches old key new key (temp table)
2) update foreign key in child table
3) delete dupes parent
i admire detail in post, i'm going keep simple , easier read example table/column names:
-- step 1, list -- warning: t-sql syntax, adjust postgres -- if doesn't placement of "into..." clause select keep.primarykey keepkey , dupe.primarykey dupekey #dupelist ( select min(primarykey) primarykey , dupecriteria1 , dupecriteria2 thetable group dupecriteria1,dupecritera2 having count(*) > 1 ) keep join thetable dupe on keep.dupecriteria1 = dupe.dupecriteria1 , keep.dupecriteria2 = dupe.dupecriteria2 , keep.primarykey <> dupe.primarykey
once have that, update foreign key in child table:
update childtable set foreignkey = #temp1.keepkey #temp1 foreignkey = #temp1.dupekey
then delete out of parent table:
delete parenttable primarykey in (select dupekey #temp1)
Comments
Post a Comment