[CLOSED] Is it possible to know or track the result of each data flow?

Is there any way to track, via a log or monitoring, the result that each data flow passes to the next flow? The result may be valid as far as the computer is concerned yet look abnormal to a human reader. Assume one data flow within a process miscalculates (one particular flow gives a wrong result), so the final result of the ETL is not what was expected.

How do we know the result is not as expected?
Ans: From human experience, from logic, or because the result is far off from the previous results.

Scenario: a process contains 5 data flows in total. The plan is to run this process 4 times, once per week. The entire process completes successfully each time, but the resulting record counts for the 4 weeks are as follows:

1st: 1000 (human: expected)
2nd: 1100 (human: expected)
3rd: 40 (human: abnormal, because the 2nd data flow generated an abnormal result, so the remaining flows referenced a wrong result)
4th: 50 (human: abnormal, same as 3rd week)

In the scenario above, we can tell that the 3rd week's result is abnormal, yet all processes completed successfully. In this case the issue may be caused by source data having been updated or replaced. Is there any record or log that tracks which table(s) were updated?

I'm thinking about using CDC (Change Data Capture) to track changes via the log in the source database, but I'm not sure whether it is the only way or whether there is another way to backtrack a flow. (I haven't tested CDC yet.)

My objective is to be able to backtrack through the process and its flows to find out which flow produced the abnormal result. Any suggestions?

This message has been edited. Last edited by: nox,

WebFOCUS v8.2.06 , Windows

Posts: 137 | Registered: August 29, 2019

dhagen

Virtuoso

Have you thought about using the statistics table to compare the result to the previous X runs? If the difference exceeds a predetermined percentage in either direction, then send a notification of a potential problem.
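
As a rough illustration of that comparison, here is a minimal Python sketch. It is not WebFOCUS or DataMigrator code; the function name `is_abnormal`, the 50% threshold, and the 3-run window are assumptions chosen for the example, and the counts are the four weekly results from the original question.

```python
from statistics import mean


def is_abnormal(previous_counts, current_count, max_pct_change=50.0, window=3):
    """Return True when current_count differs from the mean of the last
    `window` runs by more than max_pct_change percent (either direction).

    Hypothetical helper: the threshold and window are assumed values.
    """
    recent = previous_counts[-window:]
    if not recent:
        return False  # no history yet, nothing to compare against
    baseline = mean(recent)
    if baseline == 0:
        return current_count != 0
    pct_change = abs(current_count - baseline) / baseline * 100.0
    return pct_change > max_pct_change


# Weekly record counts from the scenario in the question.
weekly_counts = [1000, 1100, 40, 50]

# Check each run against the runs that came before it.
flags = [is_abnormal(weekly_counts[:i], count)
         for i, count in enumerate(weekly_counts)]

for week, (count, flagged) in enumerate(zip(weekly_counts, flags), start=1):
    status = "possible problem, notify" if flagged else "ok"
    print(f"week {week}: {count} rows -> {status}")
```

One caveat with this simple version: once an abnormal run is stored, it pulls the baseline down, so in practice you may want to exclude flagged runs from the history.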

"There is no limit to what you can achieve ... if you don’t care who gets the credit." Roger Abbott

Posts: 1102 | Location: Toronto, Ontario | Registered: May 26, 2004

nox

Platinum Member

Creating a statistics table to compare against would be an idea, but would it cause a performance issue once there are many flows to run? To keep track of each table within a flow, would I need to set up a statistic for each table?
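
To make the question concrete, one possible layout is a single small statistics table holding one row per flow per run, instead of a separate statistic per table. The sketch below is plain Python with SQLite, not WebFOCUS code. The table name `flow_stats`, its columns, and the helper functions are hypothetical, and the per-flow counts are made up for illustration (only the final counts, 1100 and 40, come from the scenario above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per (run, flow), so 5 flows add 5 rows per run.
conn.execute(
    "CREATE TABLE flow_stats ("
    " run_no INTEGER, flow_no INTEGER, row_count INTEGER,"
    " PRIMARY KEY (run_no, flow_no))"
)


def record(run_no, flow_no, row_count):
    """Store the output row count of one flow for one run."""
    conn.execute("INSERT INTO flow_stats VALUES (?, ?, ?)",
                 (run_no, flow_no, row_count))


def first_abnormal_flow(run_no, max_pct_change=50.0):
    """Return the earliest flow whose count changed by more than
    max_pct_change percent versus the previous run, or None."""
    rows = conn.execute(
        "SELECT cur.flow_no, prev.row_count, cur.row_count"
        " FROM flow_stats AS cur"
        " JOIN flow_stats AS prev"
        "   ON prev.flow_no = cur.flow_no AND prev.run_no = cur.run_no - 1"
        " WHERE cur.run_no = ?"
        " ORDER BY cur.flow_no",
        (run_no,),
    ).fetchall()
    for flow_no, before, after in rows:
        if before and abs(after - before) / before * 100.0 > max_pct_change:
            return flow_no
    return None


# Made-up per-flow counts for weeks 2 and 3 (flows 1 to 5).
for flow_no, count in enumerate([5000, 3000, 2000, 1500, 1100], start=1):
    record(2, flow_no, count)
for flow_no, count in enumerate([5100, 90, 60, 45, 40], start=1):
    record(3, flow_no, count)

print(first_abnormal_flow(3))  # earliest flow that deviates in week 3
```

With these made-up numbers, flow 1 changes by only 2% while flow 2 drops sharply, so the lookup points at flow 2 as the place to start backtracking. Whether the extra insert per flow is noticeable would depend on the environment, so it is worth measuring before ruling it out.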

WebFOCUS v8.2.06 , Windows

Posts: 137 | Registered: August 29, 2019

Marina

Silver Member

I would recommend opening a Hottrack case to address this type of question.

Thanks.

Posts: 37 | Registered: February 13, 2007
