As of December 1, 2020, Focal Point is retired and repurposed as a reference repository. We value the wealth of knowledge that's been shared here over the years. You'll continue to have access to this treasure trove of knowledge, for search purposes only.
Join the TIBCO Community TIBCO Community is a collaborative space for users to share knowledge and support one another in making the best use of TIBCO products and services. There are several TIBCO WebFOCUS resources in the community.
From the Home page, select Predict: WebFOCUS to view articles, questions, and trending articles.
Select Products from the top navigation bar, scroll, and then select the TIBCO WebFOCUS product page to view product overview, articles, and discussions.
Request access to the private WebFOCUS User Group (login required) to network with fellow members.
Former myibi community members should have received an email on 8/3/22 to activate their user accounts to join the community. Check your Spam folder for the email. Please get in touch with us at community@tibco.com for further assistance. Reference the community FAQ to learn more about the community.
I am creating a report which lists the number of accounts in a city. The data comes from a mainframe database with no spellcheck or error-correction parameters in effect.
Here are w examples of what I mean:
quote:
BROOKVILLE BROOKEVILLE The correct spelling is BROOKEVILLE
BURNTONSVILLE BURTONSIVILLE BURTONSVILE BURTONSVILLE The correct spelling is BURTONSVILLE
To clean up this data, I have created a define to evaluate all possible entries for a given city and provide the corrected spelling.
quote:
DEFINE FILE MMDB2T ZONE/A3 = DECODE MAP01(02 '755' 06 '755' 04 '756' 19 '756' 09 '757' 11 '757' 12 '758' 14 '758' ELSE '999'); LOCALB/A20 = IF LOCALITY EQ 'BURNTONSVILLE' OR 'BURNTONSVILE' OR 'BURTONSIVILLE' OR 'BURTONSVILE' THEN 'BURTONSVILLE' ELSE LOCALITY;
LOCALBR/A20 = IF LOCALITY EQ 'BROOKVILLE' OR 'BROOKVILLE ' THEN 'BROOKEVILLE' ELSE LOCALITY ;
BLOCAL/A20 = IF LOCALITY EQ LOCALB THEN LOCALB ELSE IF LOCALITY EQ LOCALBR THEN LOCALBR ELSE LOCALITY;
END -*
TABLE FILE MMDB2T COUNT ACCT# BY ZONE BY BLOCAL -* LOCALITY WHERE ZONE EQ '755'; -*IF RECORDLIMIT EQ 100 END -RUN
I am not successful at consolidating the LOCALB and LOCALBR fields into one field so I can show their correct spelling with regard to the other cities I will be correcting and displaying on the report.
Does anyone have a suggestion?
Thanks!This message has been edited. Last edited by: Kerry,
I would have one IF clause for every locality that needs cleaning up (though there may be a limit on the number of ELSE IF statements...):
DEFINE FILE MMDB2T
ZONE/A3 =
DECODE MAP01(02 '755' 06 '755' 04 '756' 19 '756'
09 '757' 11 '757' 12 '758' 14 '758' ELSE '999');
BLOCAL/A20 =
IF LOCALITY EQ 'BURNTONSVILLE' OR 'BURNTONSVILE' OR 'BURTONSIVILLE' OR 'BURTONSVILE'
THEN 'BURTONSVILLE'
ELSE
IF LOCALITY EQ 'BROOKVILLE' OR 'BROOKVILLE '
THEN 'BROOKEVILLE'
ELSE
LOCALITY ;
END
TABLE FILE MMDB2T
COUNT ACCT#
-* LOCALITY
BY ZONE
BY BLOCAL
WHERE ZONE EQ '755';
END
Francis
Give me code, or give me retirement. In FOCUS since 1991
Production: WF 7.7.05M, Dev Studio, BID, MRE, WebSphere, DB2 / Test: WF 8.1.05M, App Studio, BI Portal, Report Caster, jQuery, HighCharts, Apache Tomcat, MS SQL Server
Whilst I agree wholeheartedly with Waz, it's often something that will never happen unfortunately
Another thought that I had for this situation is that you could create a FOCUS or other file with two fields, the first being indexed and containing the SOUNDEX value and the second having the correct spelling. That way you could do a define based join to this table from your original source and get the true spelling for data correction in passage. Throw Waz's suggestion of an exception process into the mix to monitor the mismatches and the method becomes recursive. The advantage would be that you would always have this file (?) and it could be added to when you found new and inventive ways of spelling places people lovingly call home!!
T
In FOCUS since 1986
WebFOCUS Server 8.2.01M, thru 8.2.07 on Windows Svr 2008 R2
WebFOCUS App Studio 8.2.06 standalone on Windows 10
Posts: 5694 | Location: United Kingdom | Registered: April 08, 2004
DECODE is definitely the way to go. I do this all the time because in our industry suppliers and customers are constantly changing their names, getting amalgamated, splitting up, and so on.
The easiest way is to create a separate text file LOCALITYLIST with pair-wise comparisons. The first column contains the incorrect version and the second contains the correct version, so:
With this sort of situation, knowing the number of exceptions can be of great advantage.
If say you only had 10 exceptions, does that justify the code change ?
If you had 1000, then a programatical solution may be the way to go.
But I would have thought that there would be some sort of due diligence when customer/client information is entered into an application. But I guess that may be due to trained monkeys.
I would definitely send some feedback about the extent of the data issues.
FYI - There is a limit on the number of "if-then-else" clauses although I don't know that number. I just know that I do encounter it on some city cleanup code that I use. I also use LIKE in the evaluation as well to deal with flaky users entering bad data.
In my mind, I would expect the software to provide lookups to validate user input. Unfortunately, on our older POS system there is nothing that prevents a user from entering the most bogus city, state, and zip code information known to mankind. On the other hand, I would also expect people to at least review their input prior to committing the records but this doesn't happen either. Unless we cracked down hard on this practice - nothing will change and without software to hard stop on bad data it wouldn't matter anyway as people do what they do. So I spend a short time cleaning data at the beginning of each month.
The real bummer about this stuff is that I cannot completely automate the process of data extraction and placement, due to the need to look at the data and try to interpret what the user meant with bogus names, abbreviations etc.
NorbThis message has been edited. Last edited by: Norb Eckert,
In our situation it's not entirely a case of people entering bad data. As I said, our suppliers and customers are continually amalgamating, changing their name, etc. We need to keep the original data and name, but for reporting purposes we need the historical data to be listed under the new entity. So the DECODE list works very well in most situations. This certainly applies in the one-shot data cleanup cae.
In other cases I have separate FOCUS databases containing the pairs of values. These can be JOINed to the faulty database (or one with missing info) to produce the desired result.
Besides using IB's "Data Clensing Tool" (which name I forget at the moment), the use of DECODE seems to be the best alternative in these "after the fact" (of dat entry) situations. Using something like George has (separate FOCUS databases containing the pairs of values) for the source of that DECODE.
Posts: 3132 | Location: Tennessee, Nashville area | Registered: February 23, 2005