Focal Point
Using Aggregate in ETL

This topic can be found at:
http://forums.informationbuilders.com/eve/forums/a/tpc/f/1381057331/m/9081093551

February 21, 2006, 02:49 PM
Jim Dale
Using Aggregate in ETL
Have a source file of transactions. The two key fields are “job nbr” and “activity date”. Need to grab only the most current record. The source file is in no order. Example:

Source file:

JOB_NBR ACTIVITY_DATE FIELDA FIELDB FIELDC
12345 20060128 RTM GATE ONLINE
12345 20060124 MTO CHDR ONLINE
12345 20060201 RTM GATE ONLINE

I only want the 3rd record with activity date of 20060201 because it has the most current information. This record will be passed onto the destination file to update any existing record or be inserted if there is no match.

I’ve been trying various methods using Aggregate in ETL but cannot get the results I need.

I can write an RPC to write what I need to a Focus file and use that as my source but wanted to know if it can be done in ETL.

Thanks,
Jim
February 21, 2006, 06:52 PM
newtofocus
What version are you using?.

There is no straight forward way to do it in the columns options. You would have to override the SQL using a correlated sub query.

SELECT JOB_NBR,ACTIVITY_DATE..........
FROM TABLEA
WHERE ACTIVITY_DATE = (SELECT MAX(ACTIVITY_DATE)
FROM TABLEB
WHERE TABLEB.JOB_NBR = TABLEA.JOB_NBR)

It will work for non relational sources too but you pay the price in terms of performance.


WF7.1.4 Prod/Test, MRE, self serve, DM
February 21, 2006, 08:03 PM
Jim Dale
I'm using ETL 5.33.
February 22, 2006, 04:30 PM
Jim Dale
I think I may have found a method in ETL.

I'm using the SORT option from SELECT COLUMNS of the Data Flow. I'm sorting by JOB NUMBER(ASC) and ACTIVITY DATE (ASC).

Take the 3 records in my example and sort on job number and activity date. Then the 3 records in the source file go to the destination file sorted. If the job number doesn't exist, the 1st sample record will get inserted, then the 2nd sample record will update, then the 3rd sample record will update resulting in the most current record for that job number in the destination file.

I tested it and got it to work. I spot checked several records in the destination file and only the most current (latest activity date) was there.

Thanks much,
Jim
February 23, 2006, 11:20 AM
newtofocus
There is a performance price you are paying. You most probably are using key matching logic on the
target. A key matching logic issues a read for every record going to target and this slows down the load.

However if performance is not of importance than what you are doing is fine.


WF7.1.4 Prod/Test, MRE, self serve, DM
February 24, 2006, 11:55 PM
Jim Dale
I am using key matching logic and yes, I agree wth the performance issue. However, the transaction file is only about 20,000 records and only takes about 5 minutes or less to load. I will have to keep the performance issue in mind when we start doing this with much larger transaction files.

Thank you for your advice.
Jim