Focal Point Banner


As of December 1, 2020, Focal Point is retired and repurposed as a reference repository. We value the wealth of knowledge that's been shared here over the years. You'll continue to have access to this treasure trove of knowledge, for search purposes only.

Join the TIBCO Community
TIBCO Community is a collaborative space for users to share knowledge and support one another in making the best use of TIBCO products and services. There are several TIBCO WebFOCUS resources in the community.

  • From the Home page, select Predict: WebFOCUS to view articles, questions, and trending articles.
  • Select Products from the top navigation bar, scroll, and then select the TIBCO WebFOCUS product page to view product overview, articles, and discussions.
  • Request access to the private WebFOCUS User Group (login required) to network with fellow members.

Former myibi community members should have received an email on 8/3/22 to activate their user accounts to join the community. Check your Spam folder for the email. Please get in touch with us at community@tibco.com for further assistance. Reference the community FAQ to learn more about the community.


Focal Point    Focal Point Forums  Hop To Forum Categories  WebFOCUS/FOCUS Forum on Focal Point     [SOLVED] cleaning dirty data in a define

Read-Only Read-Only Topic
Go
Search
Notify
Tools
[SOLVED] cleaning dirty data in a define
 Login/Join
 
Master
posted
I am creating a report which lists the number of accounts in a city. The data comes from a mainframe database with no spellcheck or
error-correction parameters in effect.

Here are w examples of what I mean:

quote:

BROOKVILLE
BROOKEVILLE
The correct spelling is BROOKEVILLE

BURNTONSVILLE
BURTONSIVILLE
BURTONSVILE
BURTONSVILLE
The correct spelling is BURTONSVILLE


To clean up this data, I have created a define to evaluate all possible entries for a given city and provide the corrected spelling.

quote:

DEFINE FILE MMDB2T
ZONE/A3 =
DECODE MAP01(02 '755' 06 '755' 04 '756' 19 '756'
09 '757' 11 '757' 12 '758' 14 '758' ELSE '999');
LOCALB/A20 =
IF LOCALITY EQ 'BURNTONSVILLE' OR 'BURNTONSVILE' OR 'BURTONSIVILLE'
OR 'BURTONSVILE'
THEN 'BURTONSVILLE' ELSE
LOCALITY;

LOCALBR/A20 = IF LOCALITY EQ 'BROOKVILLE' OR 'BROOKVILLE '
THEN 'BROOKEVILLE' ELSE LOCALITY ;

BLOCAL/A20 = IF LOCALITY EQ LOCALB THEN LOCALB
ELSE IF LOCALITY EQ LOCALBR THEN LOCALBR ELSE LOCALITY;

END
-*

TABLE FILE MMDB2T
COUNT ACCT#
BY ZONE
BY BLOCAL
-* LOCALITY
WHERE ZONE EQ '755';
-*IF RECORDLIMIT EQ 100
END
-RUN


I am not successful at consolidating the LOCALB and LOCALBR fields into one field so I can show their correct spelling with regard to the other cities I will be correcting and displaying on the report.

Does anyone have a suggestion?

Thanks!

This message has been edited. Last edited by: Kerry,


Tomsweb
WebFOCUS 8.1.05M, 8.2.x
APP Studio, Developer Studio, InfoAssist, Dashboards, charts & reports
Apache Tomcat/8.0.36
 
Posts: 573 | Location: Baltimore, MD | Registered: July 06, 2006Report This Post
Expert
posted Hide Post
I would have one IF clause for every locality that needs cleaning up (though there may be a limit on the number of ELSE IF statements...):

DEFINE FILE MMDB2T
ZONE/A3 =
DECODE MAP01(02 '755' 06 '755' 04 '756' 19 '756'
09 '757' 11 '757' 12 '758' 14 '758' ELSE '999');

BLOCAL/A20 =

IF LOCALITY EQ 'BURNTONSVILLE' OR 'BURNTONSVILE' OR 'BURTONSIVILLE' OR 'BURTONSVILE'
THEN 'BURTONSVILLE' 

ELSE

IF LOCALITY EQ 'BROOKVILLE' OR 'BROOKVILLE '
THEN 'BROOKEVILLE' 

ELSE 

LOCALITY ;

END

TABLE FILE MMDB2T
COUNT ACCT#
-* LOCALITY
BY ZONE
BY BLOCAL
WHERE ZONE EQ '755';
END


Francis


Give me code, or give me retirement. In FOCUS since 1991

Production: WF 7.7.05M, Dev Studio, BID, MRE, WebSphere, DB2 / Test: WF 8.1.05M, App Studio, BI Portal, Report Caster, jQuery, HighCharts, Apache Tomcat, MS SQL Server
 
Posts: 10577 | Location: Toronto, Ontario, Canada | Registered: April 27, 2005Report This Post
Expert
posted Hide Post
You may be able to use a DECODE file where the file consists of pairs ('Incorrect spelling' 'Correct Spelling')
 
Posts: 3132 | Location: Tennessee, Nashville area | Registered: February 23, 2005Report This Post
Expert
posted Hide Post
I thought that SOUNDEX might help you but it would only give you 3 matches, although you could try other spellings that are close -
FILEDEF TOMMAS DISK TOMDATA.MAS
-RUN
-WRITE TOMMAS FILE=TOMDATA,SUFFIX=XFOC
-WRITE TOMMAS SEGNAME=SEG1
-WRITE TOMMAS FIELD=LOCALITY, ,A50,A50,$
-RUN
CREATE FILE TOMDATA
MODIFY FILE TOMDATA
FREEFORM LOCALITY
DATA
BURNTONSVILLE,$
BURNTONSVILE,$
BURTONSIVILLE,$
BURTONSVILE,$
BURTONSVILLE,$
TOMSWEBVILLE,$
END
DEFINE FILE TOMDATA
  REALNAME/A50 = SOUNDEX('50','BURTONSVILLE','A50');
  BURTONSVILLE/A50 = SOUNDEX('50', LOCALITY, 'A50');
END
TABLE FILE TOMDATA
PRINT *
WHERE BURTONSVILLE EQ REALNAME
END

T



In FOCUS
since 1986
WebFOCUS Server 8.2.01M, thru 8.2.07 on Windows Svr 2008 R2  
WebFOCUS App Studio 8.2.06 standalone on Windows 10 
 
Posts: 5694 | Location: United Kingdom | Registered: April 08, 2004Report This Post
Master
posted Hide Post
It works. Thank you


Tomsweb
WebFOCUS 8.1.05M, 8.2.x
APP Studio, Developer Studio, InfoAssist, Dashboards, charts & reports
Apache Tomcat/8.0.36
 
Posts: 573 | Location: Baltimore, MD | Registered: July 06, 2006Report This Post
Expert
posted Hide Post
I would strongly suggest lobbying the owners of the data to fix these sorts of discrepancies.

It should not be the function of reporting to correct data issues.

Either the source data needs to be fixed or a warehouse created and a clean copy loaded.

It would be easy to write an exception report detailing all the close sounding names with SOUNDEX


Waz...

Prod:WebFOCUS 7.6.10/8.1.04Upgrade:WebFOCUS 8.2.07OS:LinuxOutputs:HTML, PDF, Excel, PPT
In Focus since 1984
Pity the lost knowledge of an old programmer!

 
Posts: 6347 | Location: 33°49'23.0"S, 151°11'41.0"E | Registered: October 31, 2006Report This Post
Expert
posted Hide Post
Whilst I agree wholeheartedly with Waz, it's often something that will never happen unfortunately Frowner

Another thought that I had for this situation is that you could create a FOCUS or other file with two fields, the first being indexed and containing the SOUNDEX value and the second having the correct spelling. That way you could do a define based join to this table from your original source and get the true spelling for data correction in passage. Throw Waz's suggestion of an exception process into the mix to monitor the mismatches and the method becomes recursive. The advantage would be that you would always have this file (?) and it could be added to when you found new and inventive ways of spelling places people lovingly call home!!

T



In FOCUS
since 1986
WebFOCUS Server 8.2.01M, thru 8.2.07 on Windows Svr 2008 R2  
WebFOCUS App Studio 8.2.06 standalone on Windows 10 
 
Posts: 5694 | Location: United Kingdom | Registered: April 08, 2004Report This Post
Master
posted Hide Post
DECODE is definitely the way to go. I do this all the time because in our industry suppliers and customers are constantly changing their names, getting amalgamated, splitting up, and so on.

The easiest way is to create a separate text file LOCALITYLIST with pair-wise comparisons. The first column contains the incorrect version and the second contains the correct version, so:

BROOKVIL    BROOKVILLE
BROOKVILE   BROOKVILLE
BROOKVILL   BROOKVILLE


Just filedef the text file and then it's a simple case of:
LOCALITY = DECODE LOCALITY(LOCALITYLIST ELSE LOCALITY);


WebFOCUS 7.7.05 Windows, Linux, DB2, IBM Lotus Notes, Firebird, Lotus Symphony/OpenOffice. Outputs PDF, Excel 2007 (for OpenOffice integration), WP
 
Posts: 674 | Location: Guelph, Ontario, Canada ... In Focus since 1985 | Registered: September 28, 2010Report This Post
Expert
posted Hide Post
Cool

quote:
DECODE is definitely the way to go.

quote:
It should not be the function of reporting to correct data issues.
 
Posts: 3132 | Location: Tennessee, Nashville area | Registered: February 23, 2005Report This Post
Expert
posted Hide Post
With this sort of situation, knowing the number of exceptions can be of great advantage.

If say you only had 10 exceptions, does that justify the code change ?

If you had 1000, then a programatical solution may be the way to go.

But I would have thought that there would be some sort of due diligence when customer/client information is entered into an application. But I guess that may be due to trained monkeys.

I would definitely send some feedback about the extent of the data issues.


Waz...

Prod:WebFOCUS 7.6.10/8.1.04Upgrade:WebFOCUS 8.2.07OS:LinuxOutputs:HTML, PDF, Excel, PPT
In Focus since 1984
Pity the lost knowledge of an old programmer!

 
Posts: 6347 | Location: 33°49'23.0"S, 151°11'41.0"E | Registered: October 31, 2006Report This Post
Platinum Member
posted Hide Post
FYI - There is a limit on the number of "if-then-else" clauses although I don't know that number. I just know that I do encounter it on some city cleanup code that I use. I also use LIKE in the evaluation as well to deal with flaky users entering bad data.


prod:7.6.9, win2k3 mre, caster, bid, devstudio 7.6.9
 
Posts: 242 | Location: Minneapolis | Registered: February 16, 2006Report This Post
Platinum Member
posted Hide Post
Waz,

In my mind, I would expect the software to provide lookups to validate user input. Unfortunately, on our older POS system there is nothing that prevents a user from entering the most bogus city, state, and zip code information known to mankind. On the other hand, I would also expect people to at least review their input prior to committing the records but this doesn't happen either. Unless we cracked down hard on this practice - nothing will change and without software to hard stop on bad data it wouldn't matter anyway as people do what they do. So I spend a short time cleaning data at the beginning of each month.

The real bummer about this stuff is that I cannot completely automate the process of data extraction and placement, due to the need to look at the data and try to interpret what the user meant with bogus names, abbreviations etc. Confused

Norb

This message has been edited. Last edited by: Norb Eckert,


prod:7.6.9, win2k3 mre, caster, bid, devstudio 7.6.9
 
Posts: 242 | Location: Minneapolis | Registered: February 16, 2006Report This Post
Expert
posted Hide Post
Agreed,

Thats the main problem with name/address, and what are the legal requirements if you are transposing a customers details, that are incorrect.

I think I'm depressed, Sweating


Waz...

Prod:WebFOCUS 7.6.10/8.1.04Upgrade:WebFOCUS 8.2.07OS:LinuxOutputs:HTML, PDF, Excel, PPT
In Focus since 1984
Pity the lost knowledge of an old programmer!

 
Posts: 6347 | Location: 33°49'23.0"S, 151°11'41.0"E | Registered: October 31, 2006Report This Post
Master
posted Hide Post
In our situation it's not entirely a case of people entering bad data. As I said, our suppliers and customers are continually amalgamating, changing their name, etc. We need to keep the original data and name, but for reporting purposes we need the historical data to be listed under the new entity. So the DECODE list works very well in most situations. This certainly applies in the one-shot data cleanup cae.

In other cases I have separate FOCUS databases containing the pairs of values. These can be JOINed to the faulty database (or one with missing info) to produce the desired result.


WebFOCUS 7.7.05 Windows, Linux, DB2, IBM Lotus Notes, Firebird, Lotus Symphony/OpenOffice. Outputs PDF, Excel 2007 (for OpenOffice integration), WP
 
Posts: 674 | Location: Guelph, Ontario, Canada ... In Focus since 1985 | Registered: September 28, 2010Report This Post
Expert
posted Hide Post
Besides using IB's "Data Clensing Tool" (which name I forget at the moment), the use of DECODE seems to be the best alternative in these "after the fact" (of dat entry) situations. Using something like George has (separate FOCUS databases containing the pairs of values) for the source of that DECODE. Roll Eyes
 
Posts: 3132 | Location: Tennessee, Nashville area | Registered: February 23, 2005Report This Post
  Powered by Social Strata  

Read-Only Read-Only Topic

Focal Point    Focal Point Forums  Hop To Forum Categories  WebFOCUS/FOCUS Forum on Focal Point     [SOLVED] cleaning dirty data in a define

Copyright © 1996-2020 Information Builders