Problem:
JOINing two data files for which there exists DUPLICATE values on the HOST JOIN
field. Is the output generated correct? What would be JOINing the sample data
from the following two files using a ONE-TO-MANY JOIN?
DATAA:
* * * Top of File * * *
A ABC
A ABCD
B XXX
* * * End of File * * *
DATAB:
* * * Top of File * * *
A 025
A 030
A 031
A 040
A 050
B 015
B 020
B 040
B 045
B 050
* * * End of File * * *
Solution:
It will take every instance of the DATA1 value in the DATAA file and JOIN it to
every matching instance of DATA1 in the DATAB file.
Using the following JOIN:
JOIN DATAA.DATA1 IN DATAA TO ALL DATAB.DATA1 IN DATAB AS JOIN0
the results would be the following (using FOCUS databases):
PAGE 1
DATA1 DATA2 DATA3
----- ----- -----
A ABC 050
A ABC 040
A ABC 031
A ABC 030
A ABC 025
A ABCD 050
A ABCD 040
A ABCD 031
A ABCD 030
A ABCD 025
B XXX 050
B XXX 045
B XXX 040
B XXX 020
B XXX 015
If the files in question were sequential files (SUFFIX=FIX), you would have
received the following:
(FOC1072) DUPLICATES IN JOIN 'FROM' FIELD : DATAB / A
NUMBER OF RECORDS IN TABLE= 10 LINES= 10
PAUSE.. PLEASE ISSUE CARRIAGE RETURN WHEN READY
PAGE 1
DATA1 DATA2 DATA3
----- ----- -----
A ABC 025
A ABC 030
A ABC 031
A ABC 040
A ABC 050
B XXX 015
B XXX 020
B XXX 040
B XXX 045
B XXX 050
The second instance of 'A' in the DATA1 field in the DATAA file is a duplicate,
resulting in the FOC1072 error message and those records will not be displayed
in the report (5 records due to JOIN to XREF DATAB file).
Both of these results are expected behavior. The JOIN itself is working as
expected and it is supported behavior.
If you just want the first or the last instance of the HOST file DATAA value to
be JOINed to every instance of the XREF DATAB file, you could create an extract
off of DATAA that would aggregate the records to make the DATA1 value unique.
For example:
TABLE FILE DATAA
SUM DATA2 BY DATA1
ON TABLE HOLD AS HDATAA FORMAT FOCUS INDEX DATA1
END
-RUN
JOIN CLEAR *
JOIN HDATAA.DATA1 IN HDATAA TO ALL DATAB.DATA1 IN DATAB AS JOIN0
-RUN
TABLE FILE HDATAA
PRINT DATA1 DATA2 DATA3
END
Output generated:
PAGE 1
DATA1 DATA2 DATA3
----- ----- -----
A ABCD 050
A ABCD 040
A ABCD 031
A ABCD 030
A ABCD 025
B XXX 050
B XXX 045
B XXX 040
B XXX 020
B XXX 015
If I change the TABLE request to create the HDATAA HOLD FORMAT FOCUS file to
use FST., I will get the first instance rather than the last (since the field
is ALPHANUMERIC in nature, the last value would be picked up with aggregation
by default).
TABLE FILE DATAA
SUM FST.DATA2
BY DATA1
ON TABLE HOLD AS HDATAA FORMAT FOCUS INDEX DATA1
END
Output generated:
PAGE 1
DATA1 DATA2 DATA3
----- ----- -----
A ABC 050
A ABC 040
A ABC 031
A ABC 030
A ABC 025
B XXX 050
B XXX 045
B XXX 040
B XXX 020
B XXX 015
WFConsultant
WF 8105M on Win7/Tomcat