I should be able to do this...but I'm just not getting it. I've tried holding and matching and nothing is working right.
I have a table with these fields: Vendor Account Number, Name, Name2, Street, City, Zip.
These are apparently free-entry fields, so someone can type anything and it will generate a new account number:
John Jones 123 St John Street Houston Tx 77096
Then someone else can come in, not see this, and set up a new one:
John Jones 123 Saint John St Houston Tx 77096
Then yet another:
J Jones 123 St John Street Houston Tx 77096
So you end up with three entries meaning the same thing, under three different account numbers.
I need to be able to take these three records and print them out in a report as potential duplicates. I've tried things with CONTAINS or LIKE, and holding and matching, but I just can't seem to get both instances to appear in a report correctly. Anyone have some ideas?
In Focus since 1993. WebFOCUS 7.7.03 Win 2003
June 11, 2009, 04:37 PM
Francis Mariani
Data scrubbing/cleansing is a huge subject completely outside WebFOCUS - you should see all the services offered, e.g. Qualified Address: Standard List Scrubbing. Google may help find some scrubbing suggestions.
What are you testing in the CONTAINS or LIKE? How would you expect the names or addresses to be selected as potential duplicates? You may have to compare individual substrings of the whole string. You may want to strip out the words "Street", "St", "Road", "Rd", "Boulevard", "Blvd", etc from the address string before comparing. You may be able to use the SOUNDEX character function...
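To illustrate those two suggestions outside of WebFOCUS itself, here is a minimal Python sketch of (a) mapping street words to one canonical form before comparing and (b) a standard American Soundex code. The suffix list and function names are my own assumptions for illustration, not WebFOCUS syntax (WebFOCUS has its own SOUNDEX character function with its own calling conventions).

```python
# Map common street words to one canonical form before comparing.
# This word list is an assumption -- extend it for your own data.
STREET_WORDS = {"STREET": "ST", "ST": "ST", "SAINT": "ST",
                "ROAD": "RD", "RD": "RD",
                "BOULEVARD": "BLVD", "BLVD": "BLVD"}

def normalize_address(addr: str) -> str:
    """Uppercase, drop periods, and canonicalize street words."""
    words = addr.upper().replace(".", "").split()
    return " ".join(STREET_WORDS.get(w, w) for w in words)

def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, e.g. 'Jones' -> 'J520'."""
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    name = "".join(c for c in name.upper() if c.isalpha())
    if not name:
        return ""
    result = name[0]
    prev = codes.get(name[0], "")
    for c in name[1:]:
        code = codes.get(c, "")
        if code and code != prev:
            result += code
        if c not in "HW":        # H and W do not reset the previous code
            prev = code
    return (result + "000")[:4]
```

With this, "123 St John Street" and "123 Saint John St" normalize to the same string, and "Jones"/"Jonas" produce the same Soundex code, so either can serve as a blocking key for a duplicate report.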
Francis
Give me code, or give me retirement. In FOCUS since 1991
Production: WF 7.7.05M, Dev Studio, BID, MRE, WebSphere, DB2 / Test: WF 8.1.05M, App Studio, BI Portal, Report Caster, jQuery, HighCharts, Apache Tomcat, MS SQL Server
June 11, 2009, 05:09 PM
Jason K.
Dirty data...tsk tsk.
If it were me and the database were relatively small, I'd sort the data by zip code, then by address, then by name, and hand it to someone in MS Excel to look through and highlight all the potential problems.
If it were a larger database, I'd look into a SOUNDEX function, which compares words based on how they sound rather than how they are spelled, for the names and addresses.
Wish you the best. Projects go from fun and exciting to mundane and disgusting based entirely on the quality of the data, IMHO.
Prod: Single Windows 2008 Server running Webfocus 7.7.03 Reporting server Web server IIS6/Tomcat, AS400 DB2 database.
June 12, 2009, 09:06 AM
Prarie
Thanks Francis, I'll take a look at those. Jason, we are talking about a huge amount of data.
I was trying a COMPUTE to set a flag if NAME EQ LAST NAME, holding that data in a HOLD file, then doing a MATCH back to the original to get both records, but it is only giving me the one. I've tried OLD, NEW, and OLD-NOR-NEW. I thought it should be AFTER MATCH HOLD AS NEW, which should take all the records in my new file, match them with the old, and print both. But I'm not getting that.
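The logic being attempted here, bring back every record that shares a key rather than just one of them, is really a group-and-filter rather than a one-to-one match. A Python sketch of that shape (the sample rows and the crude "last word of the name" key are made-up illustrations, not the actual table):

```python
from collections import defaultdict

# Illustrative vendor rows only: (account, name, street)
rows = [
    ("A001", "JOHN JONES", "123 ST JOHN STREET"),
    ("A002", "JOHN JONES", "123 SAINT JOHN ST"),
    ("A003", "J JONES",    "123 ST JOHN STREET"),
    ("A004", "MARY SMITH", "9 ELM RD"),
]

# Group on a simplified key; every row keeps its account number,
# so all members of a duplicate group survive into the report.
groups = defaultdict(list)
for row in rows:
    key = row[1].split()[-1]   # crude "last name" key for illustration
    groups[key].append(row)

# Keep only groups with more than one account -- the duplicate candidates
duplicates = {k: v for k, v in groups.items() if len(v) > 1}
for key, members in sorted(duplicates.items()):
    for account, name, street in members:
        print(key, account, name, street)
```

Because the filter keeps whole groups instead of matching record-to-record, all three JONES accounts appear in the output together, which is the behavior the MATCH was supposed to produce.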
June 12, 2009, 10:00 AM
Prarie
quote:
SOUNDEX function
Thanks Jason, I was not familiar with that function...looks promising...
June 12, 2009, 12:11 PM
Darin Lee
We use SOUNDEX in several applications where we compare a policyholder name from our file with a company name submitted from external sources, and also for claimant name comparisons. It has proven pretty reliable at finding a lot of potential duplicates, but it still seems to miss a lot of others. It gets us part way there, which is better than nothing. You might also try using GETTOK to separate the first/middle and last names and compare those with SOUNDEX to achieve better results.
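As a language-neutral sketch of that GETTOK idea, splitting a full name into first/middle and last parts and then comparing the parts, here is one way it could look in Python; the helper names and the bare-initial rule are my own assumptions, not WebFOCUS syntax:

```python
def name_parts(full_name: str):
    """GETTOK-style split: last token as the last name, the rest as first/middle."""
    tokens = full_name.upper().split()
    return (" ".join(tokens[:-1]), tokens[-1]) if tokens else ("", "")

def likely_same_person(a: str, b: str) -> bool:
    """Match on identical last name plus a compatible first initial.
    (A SOUNDEX comparison of the last names could replace the equality test
    to also catch spelling variants.)"""
    first_a, last_a = name_parts(a)
    first_b, last_b = name_parts(b)
    if last_a != last_b:
        return False
    # Treat a bare initial ('J') or a missing first name as compatible
    # with a full first name ('JOHN')
    return (not first_a or not first_b or first_a[0] == first_b[0])
```

Comparing parts instead of whole strings is what lets "John Jones" and "J Jones" land in the same candidate group even though the full strings never match.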
Regards,
Darin
In FOCUS since 1991 WF Server: 7.7.04 on Linux and Z/OS, ReportCaster, Self-Service, MRE, Java, Flex Data: DB2/UDB, Adabas, SQL Server Output: HTML,PDF,EXL2K/07, PS, AHTML, Flex WF Client: 77 on Linux w/Tomcat
June 12, 2009, 12:57 PM
Dave Ayers
Prarie,
My first thought was to separate out the last name, then sort by zipcode and lastname, and detect duplicates (WHERE TOTAL GT 1) for output.
Look through the duplicates file for real duplicates versus coincidence.