Focal Point Banner
Community Center Education Summit Technical Support User Groups
Let's Get Social!

Facebook Twitter LinkedIn YouTube
Go
New
Search
Notify
Tools
Reply
  
Using Aggregate in ETL
 Login/Join
 
Silver Member
posted
Have a source file of transactions. The two key fields are “job nbr” and “activity date”. Need to grab only the most current record. The source file is in no order. Example:

Source file:

JOB_NBR ACTIVITY_DATE FIELDA FIELDB FIELDC
12345 20060128 RTM GATE ONLINE
12345 20060124 MTO CHDR ONLINE
12345 20060201 RTM GATE ONLINE

I only want the 3rd record with activity date of 20060201 because it has the most current information. This record will be passed onto the destination file to update any existing record or be inserted if there is no match.

I’ve been trying various methods using Aggregate in ETL but cannot get the results I need.

I can write an RPC to write what I need to a Focus file and use that as my source but wanted to know if it can be done in ETL.

Thanks,
Jim
 
Posts: 43 | Location: San Jose, California. | Registered: July 18, 2005Reply With QuoteReport This Post
Platinum Member
posted Hide Post
What version are you using?.

There is no straight forward way to do it in the columns options. You would have to override the SQL using a correlated sub query.

SELECT JOB_NBR,ACTIVITY_DATE..........
FROM TABLEA
WHERE ACTIVITY_DATE = (SELECT MAX(ACTIVITY_DATE)
FROM TABLEB
WHERE TABLEB.JOB_NBR = TABLEA.JOB_NBR)

It will work for non relational sources too but you pay the price in terms of performance.


WF7.1.4 Prod/Test, MRE, self serve, DM
 
Posts: 176 | Location: Desplaines | Registered: August 05, 2004Reply With QuoteReport This Post
Silver Member
posted Hide Post
I'm using ETL 5.33.
 
Posts: 43 | Location: San Jose, California. | Registered: July 18, 2005Reply With QuoteReport This Post
Silver Member
posted Hide Post
I think I may have found a method in ETL.

I'm using the SORT option from SELECT COLUMNS of the Data Flow. I'm sorting by JOB NUMBER(ASC) and ACTIVITY DATE (ASC).

Take the 3 records in my example and sort on job number and activity date. Then the 3 records in the source file go to the destination file sorted. If the job number doesn't exist, the 1st sample record will get inserted, then the 2nd sample record will update, then the 3rd sample record will update resulting in the most current record for that job number in the destination file.

I tested it and got it to work. I spot checked several records in the destination file and only the most current (latest activity date) was there.

Thanks much,
Jim
 
Posts: 43 | Location: San Jose, California. | Registered: July 18, 2005Reply With QuoteReport This Post
Platinum Member
posted Hide Post
There is a performance price you are paying. You most probably are using key matching logic on the
target. A key matching logic issues a read for every record going to target and this slows down the load.

However if performance is not of importance than what you are doing is fine.


WF7.1.4 Prod/Test, MRE, self serve, DM
 
Posts: 176 | Location: Desplaines | Registered: August 05, 2004Reply With QuoteReport This Post
Silver Member
posted Hide Post
I am using key matching logic and yes, I agree wth the performance issue. However, the transaction file is only about 20,000 records and only takes about 5 minutes or less to load. I will have to keep the performance issue in mind when we start doing this with much larger transaction files.

Thank you for your advice.
Jim
 
Posts: 43 | Location: San Jose, California. | Registered: July 18, 2005Reply With QuoteReport This Post
  Powered by Social Strata  
 


Copyright © 1996-2018 Information Builders, leaders in enterprise business intelligence.