[SOLVED] Fun with Name Column - Topic

Focal Point Banner

As of December 1, 2020, Focal Point is retired and repurposed as a reference repository. We value the wealth of knowledge that's been shared here over the years. You'll continue to have access to this treasure trove of knowledge, for search purposes only.

Join the TIBCO Community
TIBCO Community is a collaborative space for users to share knowledge and support one another in making the best use of TIBCO products and services. There are several TIBCO WebFOCUS resources in the community.

From the Home page, select Predict: WebFOCUS to view articles, questions, and trending articles.
Select Products from the top navigation bar, scroll, and then select the TIBCO WebFOCUS product page to view product overview, articles, and discussions.
Request access to the private WebFOCUS User Group (login required) to network with fellow members.

Former myibi community members should have received an email on 8/3/22 to activate their user accounts to join the community. Check your Spam folder for the email. Please get in touch with us at community@tibco.com for further assistance. Reference the community FAQ to learn more about the community.

Focal Point Forums

Hop To Forum Categories

WebFOCUS/FOCUS Forum on Focal Point

[SOLVED] Fun with Name Column

Read-Only Topic

Go

Search

Notify

Tools

Personal Zone	»
Member Directory
Focal Point Forums	»

Add to My Favorites

Printer Friendly Format

[SOLVED] Fun with Name Column

Platinum Member

posted

I have a request to search a db for misspelled names and the field I need to compare to is a single column that contains full name.

I have a PRSN_LST_NAME column with possible misspelled Last Name “Simth”

I need to compare it to a column on another table that has the INS_FULL_NAME ‘John Smith’.

I can't just say WHERE PRSN_LST_NAME NE INS_FULL_NAME because of the obvious. One has last name only and the other has full name so yes they are different.

What I need to do is be able to pull the last name out of the full name column. Is there a way to read this column and break it up into multiple columns using the ' 'space as a dilimiter?

If there was just First and Last name I think I could do it but we have no way of knowing how many different variations there could be.

Ex. John Smith; Mr John Smith; Mr John Smith III; Rev.John A Smith Jr III

If I could break it out I could then take the Last name column on the focus table and compare that with PRSR_LST_NAME to find only the records that are different.

Any help would be greatly appreciated.

Thanks

JV

This message has been edited. Last edited by: <Kathryn Henning>,

WebFOCUS 8105m
Windows, All Outputs

Posts: 156 | Registered: August 23, 2010

Report This Post

Expert

posted

GETTOK will work, you will have to use token number -1 to get the last word in the name.

This should work for Smith, but not for last names with multiple words.

Waz...

Prod:	WebFOCUS 7.6.10/8.1.04	Upgrade:	WebFOCUS 8.2.07	OS:	Linux	Outputs:	HTML, PDF, Excel, PPT
In Focus since 1984
Pity the lost knowledge of an old programmer!

Posts: 6347 | Location: 33°49'23.0"S, 151°11'41.0"E | Registered: October 31, 2006

Report This Post

Ignored post by Waz posted

Expert

posted

You can say

WHERE INS_FULL_NAME CONTAINS PRSN_LST_NAME

because "Smith" is in "Micheal W. Smith". However, as Waz pointed out, it wo'nt work for multi-word last names. But, You can parse out the fullname and search for each "piece" in the full name... Just some more fun with names...

Posts: 3132 | Location: Tennessee, Nashville area | Registered: February 23, 2005

Report This Post

Ignored post by Doug posted

Platinum Member

posted

Thanks for the feedback...

If I use contains I will only get the names that match, which would only give the names that are correctly spelled. I need to find the misspelled names which could be in either column but not in both.

So there is no way to break the INS_FULL_NAME column out into individual columns?
Ex. FNAME MNAME LNAME TNAME
John A Smith JR

WebFOCUS 8105m
Windows, All Outputs

Posts: 156 | Registered: August 23, 2010

Report This Post

Ignored post by Jveselka posted

Expert

posted

Check the help files within dev studio for SOUNDEX.

That may do what you need.

T

In FOCUS since 1986	WebFOCUS Server 8.2.01M, thru 8.2.07 on Windows Svr 2008 R2
	WebFOCUS App Studio 8.2.06 standalone on Windows 10

Posts: 5694 | Location: United Kingdom | Registered: April 08, 2004

Report This Post

Ignored post by Tony A posted

Virtuoso

posted

quote:
So there is no way to break the INS_FULL_NAME column out into individual columns?

Well, you could try and use GETTOK as Waz already suggested but this only works if you guarantee that your data follows a common and simple pattern (i.e. last name is always the second "token" and has no spaces in between) which is obviously not your case.

This requires a more advanced "parser" and you may be better off trying to have this separation implemented directly in your database but even doing it at that layer involves a similar complexity if your data was never captured separating the individual components of a name in the first place.

Prod/Dev: WF Server 8008/Win 2008 - WF Client 8008/Win 2008 - Dev. Studio: 8008/Windows 7 - DBMS: Oracle 11g Rel 2
Test: Dev. Studio 8008 /Windows 7 (Local) Output:HTML, EXL2K.

Posts: 1533 | Registered: August 12, 2005

Report This Post

Ignored post by njsden posted

Expert

posted

quote:
I need to find the misspelled names which could be in either column but not in both.

Have you onsidered MATCH FILE?

Posts: 3132 | Location: Tennessee, Nashville area | Registered: February 23, 2005

Report This Post

Ignored post by Doug posted

Virtuoso

posted

quote:
Originally posted by Doug:
You can say
WHERE INS_FULL_NAME CONTAINS PRSN_LST_NAME 
because "Smith" is in "Micheal W. Smith". ...

Not necessarily. CONTAINS PRSN_LST_NAME mean it contains all of it, including the trailing blanks. Even where the last name is truly last (no JR.), the truth value of that predicate depends on the sizes of the two fields.

If you can convert them to varchar (AnV), and get the length attribute to equal the trimmed length, it should work. Anyone know how to accomplish that?

- Jack Gross
WF through 8.1.05

Posts: 1925 | Location: NYC | In FOCUS since 1983 | Registered: January 11, 2005

Report This Post

Ignored post by j.gross posted

Virtuoso

posted

Here's a possible solution. Use function STRREP to replace the test value portion (last name) of your source string with a null. If a matching test value is found in the source string, it will be replaced with a null, otherwise no replacement will take place. Then compare the lengths of the source string before and after the replacement. If a null was inserted, a matching value was found and the string lengths will differ. If the string lengths are the same before and after the replacement, then no match was found. In the sample code below, I have upper-cased both source and search strings to ensure no-matches are not the result of differences in capitalization. I created DEFINE SEARCH_STR so some values will be found and some will not (i.e., JAPAN and ITALY). In your case, SEARCH_STR would be replaced by PRSN_LST_NAME. You can uncomment the WHERE clause to get a list of only those instances where no match was found.

DEFINE FILE CAR
 SOURCE_STR/A15 = UPCASE(15,COUNTRY,'A15');
 SOURCE_LEN/I2  = ARGLEN(15,SOURCE_STR,'I2');
 SEARCH_STR/A15 = IF (SOURCE_STR EQ 'ITALY') THEN 'LAT' ELSE
                  IF (SOURCE_STR EQ 'JAPAN') THEN 'PAJ' ELSE UPCASE(15,EDIT(COUNTRY,'$9999$'),'A15');
 STRING_LEN/I2  = ARGLEN(15,SEARCH_STR,'I2');
 REPLACED/A15   = STRREP(15,SOURCE_STR,STRING_LEN,SEARCH_STR,0,'X',15,'A15');
 REPLACE_LEN/I2 = ARGLEN(15,REPLACED,'I2');
END
-*
TABLE FILE CAR
 PRINT SOURCE_STR SOURCE_LEN SEARCH_STR REPLACED REPLACE_LEN
-*  WHERE (SOURCE_LEN EQ REPLACE_LEN);
END

Edit: Actually, here's a simpler solution using the POSIT function. Uncomment the WHERE clause to limit output to non-matching instances.

DEFINE FILE CAR
 SOURCE_STR/A15 = UPCASE(15,COUNTRY,'A15');
 SEARCH_STR/A15 = IF (SOURCE_STR EQ 'ITALY') THEN 'LAT' ELSE
                  IF (SOURCE_STR EQ 'JAPAN') THEN 'PAJ' ELSE UPCASE(15,EDIT(COUNTRY,'$9999$'),'A15');
 STRING_LEN/I2  = ARGLEN(15,SEARCH_STR,'I2');
 STRING_POS/I2  = POSIT(SOURCE_STR,15,SEARCH_STR,STRING_LEN,'I2');
END
-*
TABLE FILE CAR
 PRINT SOURCE_STR SEARCH_STR STRING_POS
-* WHERE (STRING_POS EQ 0);
END

This message has been edited. Last edited by: Dan Satchell,

WebFOCUS 7.7.05

Posts: 1213 | Location: Seattle, Washington - USA | Registered: October 22, 2007

Report This Post

Ignored post by Dan Satchell posted

Member

posted

Why not just compare to see if Full Name Contains the Last Name?

 
-*-********* TO TEST
DEFINE FILE CENTHR
lname/A14=IF CENTHR.EMPSEG.ID_NUM EQ 1 THEN 'DAMKIEWICZ,' ELSE CENTHR.EMPSEG.LNAME;
END
-***** for Production  uncomment
-*DEFINE FILE CENTHR
-*lname/A14=CENTHR.EMPSEG.LNAME;
-*END
TABLE FILE CENTHR
PRINT
     CENTHR.EMPSEG.FULLNAME
     CENTHR.EMPSEG.LNAME
     CENTHR.EMPSEG.ID_NUM
     COMPUTE Flag/I1 = IF CENTHR.EMPSEG.FULLNAME CONTAINS lname THEN 0 ELSE 1;
ON TABLE SET PAGE-NUM NOLEAD
ON TABLE NOTOTAL
ON TABLE HOLD AS HOLD_NAMES FORMAT ALPHA
ON TABLE SET STYLE *
$
ENDSTYLE
END

TABLE FILE HOLD_NAMES
PRINT
     HOLD_NAMES.HOLD_NAM.FULLNAME
     HOLD_NAMES.HOLD_NAM.LNAME
     HOLD_NAMES.HOLD_NAM.ID_NUM
     HOLD_NAMES.HOLD_NAM.lname
BY  LOWEST HOLD_NAMES.HOLD_NAM.Flag
WHERE HOLD_NAMES.HOLD_NAM.Flag EQ 0;
ON TABLE SET PAGE-NUM NOLEAD
ON TABLE NOTOTAL
ON TABLE PCHOLD FORMAT HTML
ON TABLE SET HTMLCSS ON
ON TABLE SET STYLE *
     INCLUDE = endeflt,
$
ENDSTYLE
END

The define might not be needed, you could compare right to the Last Name field, but I needed to misspell a last name to validate my logic.

~david

WebFOCUS 8.0.05
Windows- Excel, PDF

Posts: 21 | Location: Crystal City, VA & Kalamazoo, MI | Registered: October 11, 2010

Report This Post

Ignored post by djones posted

Master

posted

How about...

'John Alan Smith Jr. III'
Remove all spaces -> 'JohnAlanSmithJr.III'

Then test with '%smith%' would work fine...

( in Dutch there is even more fun with names : wiki )

G'luck
Dave

_____________________
WF: 8.0.0.9 > going 8.2.0.5

Posts: 668 | Location: Veghel, The Netherlands | Registered: February 16, 2010

Report This Post

Ignored post by Dave posted

Platinum Member

posted

Again, thanks for the feed back

Looks like all these examples are assuming I will know the name they are looking for but I won't. This is a report of misspelled names that will run bi weekly in report caster. Starting to look like this report is not freezable.

WebFOCUS 8105m
Windows, All Outputs

Posts: 156 | Registered: August 23, 2010

Report This Post

Ignored post by Jveselka posted

Virtuoso

posted

Other than trying the SOUNDEX function pointed out by Tony, I don't see how else you'd determine something as misspelt vs. a plain non-matching, unless of course you have access to some sort of dictionary table that gives you a list of possible misspellings for each "known" last name:

LAST_NAME    ALTERNATIVE_LM
-----------------------------
Smith        Simth
Smith        Smiht
Smith        Msith
Smith        Misth
Jones        Jnoes
Jones        Jonse
Jones        Ojnes
....

Having that should make it a breeze to get your records where PRSN_LST_NAME CONTAINS ALTERNATIVE_LM. Otherwise, well, interesting challenge.

Prod/Dev: WF Server 8008/Win 2008 - WF Client 8008/Win 2008 - Dev. Studio: 8008/Windows 7 - DBMS: Oracle 11g Rel 2
Test: Dev. Studio 8008 /Windows 7 (Local) Output:HTML, EXL2K.

Posts: 1533 | Registered: August 12, 2005

Report This Post

Ignored post by njsden posted

Member

posted

Something like this will give you every possible first/last name combination from your list of full names:

TABLE FILE CAR
PRINT
 COMPUTE NAME_ID/I11=LAST NAME_ID + 1;
 MODEL AS NAME
 COMPUTE WORD_CNT/I11 = 0;
 COMPUTE WORD/A24V = '';
ON TABLE SET ASNAMES ON
ON TABLE SET HOLDLIST PRINTONLY
ON TABLE HOLD FORMAT ALPHA
END
-RUN
-REPEAT LOOP &LINES TIMES;

-READ HOLD NOCLOSE &NAME_ID.I11. &NAME.A24.
-SET &WORD_CNT=0;
-IS_WORD
-SET &WORD_CNT=&WORD_CNT+1;
-SET &WORD=GETTOK(&NAME, 24, &WORD_CNT, ' ', 24, 'A24');
-IF &WORD EQ '' GOTO NO_WORD;

MODIFY FILE HOLD
FREEFORM NAME_ID NAME WORD_CNT WORD
MATCH NAME_ID WORD_CNT
ON NOMATCH INCLUDE
DATA
&NAME_ID,&NAME,&WORD_CNT,&WORD,$
END

-GOTO IS_WORD
-NO_WORD
-LOOP
-CLOSE HOLD

TABLE FILE HOLD
PRINT
 NAME_ID
 NAME
 WORD_CNT
 WORD
WHERE WORD_CNT GT 0;
ON TABLE HOLD FORMAT FOCUS
END

JOIN FILE HOLD AT NAME_ID TAG FN TO ALL
 FILE HOLD AT NAME_ID TAG LN AS J1
 WHERE FN.NAME_ID EQ LN.NAME_ID;
 WHERE FN.WORD_CNT LT LN.WORD_CNT;
END

TABLE FILE HOLD
PRINT
 FN.NAME_ID
 FN.NAME
 FN.WORD AS FIRST_NAME
 LN.WORD AS LAST_NAME;
END

You can then compare that with your list of correct first/last names to find those without an exact match.

I kind of doubt WebFOCUS's SOUNDEX will work well for finding the correct spelling. Ideally you'd have something like a Levenshtein distance subroutine to find the closest match.

WebFOCUS 7.6.7
Linux
HTML, Excel, PDF

Posts: 11 | Registered: February 19, 2009

Report This Post

Ignored post by Nick Hilt posted

Member

posted

If you have multi-word first/last names this won't work. It would really just be a lot easier to do this in an environment with arrays and regular expressions.

WebFOCUS 7.6.7
Linux
HTML, Excel, PDF

Posts: 11 | Registered: February 19, 2009

Report This Post

Ignored post by Nick Hilt posted

Expert

posted

I'd probaly get real creative with the SOUNDEX (NEW) with MATCH file (OLD) using NEW-NOT-OLD for starters...

Posts: 3132 | Location: Tennessee, Nashville area | Registered: February 23, 2005

Report This Post

Ignored post by Doug posted

Expert

posted

If I understand what you are trying to achieve, is this on the right track ?

You will have to do more checks to see if all parts match or just some.

-* Write out a master to read the TMP_NAM1 list
EX -LINES 5 EDAPUT MASTER,TMP_NAM1,CV,FILE
FILENAME=TMP_NAM1, SUFFIX=FIX,$
SEGNAME=TMP_NAM1, $
  FIELD=KEY       ,ALIAS=  ,I9  ,A10, $
  FIELD=FULL_NAME ,ALIAS=  ,A70 ,A70 ,$

-* Write out a master to read the TMP_NAM1 list
EX -LINES 6 EDAPUT MASTER,TMP_NAM2,CV,FILE
FILENAME=TMP_NAM2, SUFFIX=FIX,$
SEGNAME=TMP_NAM2, $
  FIELD=KEY        ,ALIAS=  ,I9  ,A10, $
  FIELD=FIRST_NAME ,ALIAS=  ,A30 ,A30 ,$
  FIELD=LAST_NAME  ,ALIAS=  ,A40 ,A40 ,$

-* Write out a master to read the TMP_NAM1 list
EX -LINES 3 EDAPUT FOCTEMP,TMP_NAM1,CV,FILE
1         Georg Johannes von Trapp
2         Mrs Jane Jones

-* Write out a master to read the TMP_NAM1 list
EX -LINES 3 EDAPUT FOCTEMP,TMP_NAM2,CV,FILE
1         Georg                         von Trapp
2         Jane                          Jonse

-RUN

FILEDEF TMP_NAM1 DISK tmp_nam1.ftm
FILEDEF TMP_NAM2 DISK tmp_nam2.ftm

-RUN

DEFINE FILE TMP_NAM1
 NAME_IDX/I9 = 1 ;
END

TABLE FILE TMP_NAM1
BY KEY
BY FULL_NAME
FOR NAME_IDX
1 OVER
2 OVER
3 OVER
4 OVER
5 OVER
6

ON TABLE HOLD AS HLD_NAM1

END

DEFINE FILE HLD_NAM1
 NAME_IDX/I9 = EDIT(E03) ;
END

TABLE FILE HLD_NAM1
PRINT FULL_NAME
      COMPUTE PART_NAME/A40 = GETTOK(FULL_NAME,70,NAME_IDX,' ',40,'A40') ;
BY    KEY
BY    NAME_IDX

ON TABLE HOLD AS FOC_NAM1 FORMAT FOCUS INDEX KEY

END

JOIN KEY IN TMP_NAM2 TO ALL KEY IN FOC_NAM1 AS J1

DEFINE FILE TMP_NAM2
 LAST_IDX/I9 = 1 ;
END

TABLE FILE TMP_NAM2
SUM   MAX.FIRST_NAME
      MAX.LAST_NAME
      MAX.FULL_NAME
      MAX.PART_NAME

BY    KEY
BY    NAME_IDX
PRINT COMPUTE DUMMY/A1 = ' ' ;

BY    KEY
BY    NAME_IDX
FOR LAST_IDX
1 OVER
2 OVER
3 OVER
4 OVER
5 OVER
6

ON TABLE HOLD AS HLD_NAM3
END

DEFINE FILE HLD_NAM3
 LAST_IDX/I9 = EDIT(E07) ;
END

TABLE FILE HLD_NAM3
PRINT 
      FIRST_NAME
      LAST_NAME
      FULL_NAME
      PART_NAME
      COMPUTE PART_LAST/A40 = GETTOK(LAST_NAME,70,LAST_IDX,' ',40,'A40') ;
      COMPUTE MATCH/A1 = IF PART_LAST EQ PART_NAME THEN 'Y' ELSE 'N' ;


BY    KEY
BY    NAME_IDX
BY    LAST_IDX

END

Waz...

Prod:	WebFOCUS 7.6.10/8.1.04	Upgrade:	WebFOCUS 8.2.07	OS:	Linux	Outputs:	HTML, PDF, Excel, PPT
In Focus since 1984
Pity the lost knowledge of an old programmer!

Posts: 6347 | Location: 33°49'23.0"S, 151°11'41.0"E | Registered: October 31, 2006

Report This Post

Ignored post by Waz posted

Expert

posted

... or to take Warren's lead (thanks Waz

) on the sample code but include the fact that you need misspelt matching -

-* Write out a master to read the TMP_NAM1 list
EX -LINES 15 EDAPUT MASTER,TMP_NAM1,CV,FILE
FILENAME=TMP_NAM1, SUFFIX=COM,$
SEGNAME=TMP_NAM1, $
  FIELD=KEY       ,ALIAS=  ,I9  ,A10, $
  FIELD=FULL_NAME ,ALIAS=  ,A70 ,A70 ,$
  DEFINE D_NM_PART1/A20 = GETTOK(FULL_NAME, 70, 1, ' ', 20, 'A20');
  DEFINE D_NM_SDEX1/A20 = SOUNDEX(20, D_NM_PART1, 'A4');
  DEFINE D_NM_PART2/A20 = GETTOK(FULL_NAME, 70, 2, ' ', 20, 'A20');
  DEFINE D_NM_SDEX2/A20 = SOUNDEX(20, D_NM_PART2, 'A4');
  DEFINE D_NM_PART3/A20 = GETTOK(FULL_NAME, 70, 3, ' ', 20, 'A20');
  DEFINE D_NM_SDEX3/A20 = SOUNDEX(20, D_NM_PART3, 'A4');
  DEFINE D_NM_PART4/A20 = GETTOK(FULL_NAME, 70, 4, ' ', 20, 'A20');
  DEFINE D_NM_SDEX4/A20 = SOUNDEX(20, D_NM_PART4, 'A4');
  DEFINE D_NM_PART5/A20 = GETTOK(FULL_NAME, 70, 5, ' ', 20, 'A20');
  DEFINE D_NM_SDEX5/A20 = SOUNDEX(20, D_NM_PART5, 'A4');

-* Write out a master to read the TMP_NAM1 list
EX -LINES 6 EDAPUT MASTER,TMP_NAM2,CV,FILE
FILENAME=TMP_NAM2, SUFFIX=COM,$
SEGNAME=TMP_NAM2, $
  FIELD=KEY        ,ALIAS=  ,I9  ,A10, $
  FIELD=LAST_NAME  ,ALIAS=  ,A20 ,A20 ,$
  DEFINE D_SOUNDEX/A4 = SOUNDEX(20, LAST_NAME, 'A4');

-* Write out a master to read the TMP_NAM1 list
EX -LINES 5 EDAPUT FOCTEMP,TMP_NAM1,CV,FILE
1,Michael W. Smith,$
2,Michael W. Smith III,$
3,Mrs Jane Jones,$
4,Ms Clara Johansen,$

-* Write out a master to read the TMP_NAM1 list
EX -LINES 5 EDAPUT FOCTEMP,TMP_NAM2,CV,FILE
1,Simth,$
2,Smith,$
3,Jones,$
4,Joens,$
-RUN

FILEDEF TMP_NAM1 DISK tmp_nam1.ftm
FILEDEF TMP_NAM2 DISK tmp_nam2.ftm
-RUN

MATCH FILE tmp_nam1
PRINT FULL_NAME D_NM_SDEX1 D_NM_SDEX2 D_NM_SDEX3 D_NM_SDEX4 D_NM_SDEX5
   BY KEY
RUN
FILE tmp_nam2
PRINT LAST_NAME D_SOUNDEX
   BY KEY
AFTER MATCH HOLD AS tmp_nam3 OLD-OR-NEW
END
-RUN

TABLE FILE tmp_nam3
PRINT FULL_NAME LAST_NAME
      COMPUTE POSS_MATCH/A1 = IF D_SOUNDEX EQ D_NM_SDEX1
                              OR D_SOUNDEX EQ D_NM_SDEX2
                              OR D_SOUNDEX EQ D_NM_SDEX3
                              OR D_SOUNDEX EQ D_NM_SDEX4
                              OR D_SOUNDEX EQ D_NM_SDEX5 THEN 'Y' ELSE 'N';
   BY KEY
END

Note the missplet

surnames in the second file.

T

In FOCUS since 1986	WebFOCUS Server 8.2.01M, thru 8.2.07 on Windows Svr 2008 R2
	WebFOCUS App Studio 8.2.06 standalone on Windows 10

Posts: 5694 | Location: United Kingdom | Registered: April 08, 2004

Report This Post

Ignored post by Tony A posted

Platinum Member

posted

Ok, thanks for all the help, but it looks like it is not going to happen this way. I was able to find another table that had a last name on it that I could compare to. Not exactly what the customer wanted but it will half to do.

Thanks again.

WebFOCUS 8105m
Windows, All Outputs

Posts: 156 | Registered: August 23, 2010

Report This Post

Ignored post by Jveselka posted

Powered by Social Strata

Please Wait. Your request is being processed...

Read-Only Topic

Focal Point Forums

Hop To Forum Categories

WebFOCUS/FOCUS Forum on Focal Point

[SOLVED] Fun with Name Column

Focal Point | Privacy Statement | Terms of Service

Copyright © 1996-2020 Information Builders

	View $GS_USERNAME's Public Profile
	Add $GS_USERNAME to my Ignore ListRemove $GS_USERNAME from my Ignore List
	View Recent Posts by $GS_USERNAME
	Notify me of New Posts by $GS_USERNAME

New Since your Last Visit