Focal Point
Join performance

This topic can be found at:
https://forums.informationbuilders.com/eve/forums/a/tpc/f/7971057331/m/4667026336

September 20, 2013, 12:39 AM
Waz
Join performance
I am hoping that someone has a good idea for this issue I have.

I have two tables both with possibly 10,000 records each.

Table 1 has Contract information (Hold File)
Table 2 has Price information (FOCUS File)

I need to get prices from table 2 where the price is between two dates in table 1.

The two dates in table 1 can be days or years apart.

I have working code that joins each table 1 record to all table 2 records, but the performance is abysmal.

Does anyone have any suggestions of better ways to do this ?


Waz...

Prod:WebFOCUS 7.6.10/8.1.04Upgrade:WebFOCUS 8.2.07OS:LinuxOutputs:HTML, PDF, Excel, PPT
In Focus since 1984
Pity the lost knowledge of an old programmer!

September 20, 2013, 05:31 AM
Danny-SRL
Waz,

One question: are the 2 dates in the same record or in different records?


Daniel
In Focus since 1982
wf 8.202M/Win10/IIS/SSA - WrapApp Front End for WF

September 20, 2013, 06:54 AM
Danny-SRL
Waz,

A bit of tinkering.
I created 2 files in the spirit of what you posted. The T1 file has both dates. See if this answers your problem:
  
-* File Waz.fex
TABLE FILE GGSALES
SUM UNITS
BY PRODUCT
RANKED BY DATE
ON TABLE HOLD AS T2 FORMAT FOCUS INDEX DATE
END
DEFINE FILE T2
P1/I3 WITH PRODUCT=IF PRODUCT EQ LAST PRODUCT THEN LAST P1 ELSE 1 + RDUNIF('D5.4') * 24;
P2/I3 WITH PRODUCT=IF PRODUCT EQ LAST PRODUCT THEN LAST P2 ELSE 1 + RDUNIF('D5.4') * 24;
END
TABLE FILE T2
LIST DATE BY PRODUCT BY DATE NOPRINT
WHERE RANK EQ P1 OR RANK EQ P2
ON TABLE SET HOLDLIST PRINTONLY
ON TABLE HOLD AS T1A
END
TABLE FILE T1A
SUM DATE AS DATE
BY PRODUCT
ACROSS LIST
ON TABLE SET ASNAMES ON
ON TABLE HOLD AS T1 FORMAT FOCUS
END
-RUN
JOIN FILE T1 AT DATE1 TAG T1
     TO MULTIPLE 
     FILE T2 AT DATE TAG T2 AS WAZ
     WHERE T1.PRODUCT EQ T2.PRODUCT AND T1.DATE1 LE T2.DATE;
     WHERE T1.PRODUCT EQ T2.PRODUCT AND T1.DATE2 GE T2.DATE;
END
TABLE FILE T1
PRINT T1.DATE1 T1.DATE2 T2.DATE T2.UNITS
BY T1.PRODUCT
END




Daniel
In Focus since 1982
wf 8.202M/Win10/IIS/SSA - WrapApp Front End for WF

September 20, 2013, 05:27 PM
Waz
I hadn't thought of an across.

I'll try it out.

Thanks

Oh, and yes the two dates are in the same record.


Waz...

Prod:WebFOCUS 7.6.10/8.1.04Upgrade:WebFOCUS 8.2.07OS:LinuxOutputs:HTML, PDF, Excel, PPT
In Focus since 1984
Pity the lost knowledge of an old programmer!

September 20, 2013, 05:41 PM
George Patton
What happens if you HOLD FORMAT FOCUS with an Index? Or maybe invert the join?


WebFOCUS 7.7.05 Windows, Linux, DB2, IBM Lotus Notes, Firebird, Lotus Symphony/OpenOffice. Outputs PDF, Excel 2007 (for OpenOffice integration), WP
September 21, 2013, 12:52 PM
Danny-SRL
Waz,

I used the ACROSS in order to create a record with 2 dates... but if it helps you, fine!

My idea was to use a WHERE-based JOIN.
Did this do something for you?


Daniel
In Focus since 1982
wf 8.202M/Win10/IIS/SSA - WrapApp Front End for WF

September 22, 2013, 06:17 PM
Waz
quote:
What happens if you HOLD FORMAT FOCUS with an Index



I should have mentioned that the working version uses an indexed focus file.

quote:
My idea was to use a WHERE-based JOIN


I had also used a WHERE based join. the performance did improve, but was still quite bad.

FYI,

My test case (Worst case scenario) has 100,000 in T1 and 100,000 in T2, and it took almost 50 hours to complete.

This is a job that is supposed to run daily.

This message has been edited. Last edited by: Waz,


Waz...

Prod:WebFOCUS 7.6.10/8.1.04Upgrade:WebFOCUS 8.2.07OS:LinuxOutputs:HTML, PDF, Excel, PPT
In Focus since 1984
Pity the lost knowledge of an old programmer!

September 22, 2013, 11:15 PM
j.gross
quote:
I have working code that joins each table 1 record to all table 2 records, but the performance is abysmal.


How is your Focus file structured?

If there is a sizeable collection of dependent fields in the reference table (file 2), it's advisable to isolate them from the fields that are needed to establish a match (item identifier, effective date expiry date), by placing them in a segtype=U segment.

If you build the reference table on the fly, with a multi-verb request, each verb in the TABLE ... HOLD request will generate a separate segment in the Focus file, with segtype reflecting its additional keys relative to its immediate parent. Thus,
TABLE FILE data_source
SUM COMPUTE NADA/A1=; NOPRINT 
  BY item_id
SUM FST.expiry_date 
  BY item_id
  BY effective_date
SUM other data fields
  BY item_id
  BY effective_date
ON TABLE HOLD FORMAT FOCUC INDEX item_id
END 

will produce a three-segment Focus file, with segtype of S1, S1, and U (respectively). A condition-based join of file 1 to this structure will use the index on item_id to locate the appropriate instance of the top segment; and will sift through its chain of children in the level two segment until the instance covering the date is located; and will then pull the price info from the corresponding level-three segment instance.
September 22, 2013, 11:25 PM
Waz
Jack, that is a great idea, multisegment may help immensely.


I must say,

Thanks Jerry for Focal Point.

Smiler


Waz...

Prod:WebFOCUS 7.6.10/8.1.04Upgrade:WebFOCUS 8.2.07OS:LinuxOutputs:HTML, PDF, Excel, PPT
In Focus since 1984
Pity the lost knowledge of an old programmer!

September 24, 2013, 10:17 AM
EricH
For this situation I recommend keeping the data in a relation DB - SQL Server, MySQL, etc.

In simplified terms your query will look something like this:
select
 whatever
from contract 
,    price 
where contract.key = contract.key
  and price.date between contract.start_date and contract.end_date

September 24, 2013, 11:11 AM
RSquared
Waz,

Why don't you extract all the key data for the join from table 1 where the records matc hthe date range, hild the file and then join to the Price table?


WF 7.6.11
Oracle
WebSphere
Windows NT-5.2 x86 32bit
September 24, 2013, 11:42 AM
Wep5622
Better yet, put it into an RDBMS that understands ranged types, such as PostgreSQL!

The downside; if you don't already have the required adapter, that solution gets expensive...

P.S. I have no experience with the WebFOCUS PostgreSQL adapter and have no idea what quality the queries it sends to the DB are. Unfortunately we don't use PG @work here.


WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :