Focal Point
[CLOSED] Loading mainframe FOCUS DB - efficiency tips?

This topic can be found at:
https://forums.informationbuilders.com/eve/forums/a/tpc/f/7971057331/m/3607081795

January 18, 2011, 03:04 PM
Francis Mariani
[CLOSED] Loading mainframe FOCUS DB - efficiency tips?
I'm looking for tips on making FOCUS database loads more efficient.

I have a 13 million row HOLD file that is loaded into a multi-segment FOCUS DB via MODIFY. I have set CHECK to 50000.

There are 5 segments in the hierarchy, with the top 4 containing one alpha field each. These are indexed, as are three fields in the lowest segment.

The load job is taking very long, so I am trying to improve efficiency. Is it more efficient to load the DB with one Master that has no indexes and then use REBUILD to create the indexes?

Thanks,



Francis


Give me code, or give me retirement. In FOCUS since 1991

Production: WF 7.7.05M, Dev Studio, BID, MRE, WebSphere, DB2 / Test: WF 8.1.05M, App Studio, BI Portal, Report Caster, jQuery, HighCharts, Apache Tomcat, MS SQL Server
January 18, 2011, 03:18 PM
<JG>
I'd ask Art or Noreen.
January 18, 2011, 04:24 PM
Waz
From memory, there is a SET command that speeds things up. I think it skips the initial match and just inserts.

Not sure what it is, and not sure if it is RDBMS related.


Waz...

Prod: WebFOCUS 7.6.10/8.1.04 Upgrade: WebFOCUS 8.2.07 OS: Linux Outputs: HTML, PDF, Excel, PPT
In Focus since 1984
Pity the lost knowledge of an old programmer!

January 18, 2011, 04:43 PM
Waz
Is the command LOADONLY ?


Waz...

Prod: WebFOCUS 7.6.10/8.1.04 Upgrade: WebFOCUS 8.2.07 OS: Linux Outputs: HTML, PDF, Excel, PPT
In Focus since 1984
Pity the lost knowledge of an old programmer!

January 18, 2011, 04:48 PM
j.gross
quote:
Is it more efficient to load the DB with one Master that has no indexes and then use REBUILD to create the indexes?


Absolutely. But you must use the primary Master for the CREATE, so that control information for all the indexed fields will be included in the FDT. (There is a limit to the number of indexes you can add after CREATE.)

I would put the three steps (CREATE, MODIFY, REBUILD INDEX) in separate invocations of FOCUS (separate jobsteps).
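A minimal sketch of those three jobsteps. All names here are hypothetical: BIGDB stands for the indexed primary Master, BIGDBNX for an alternate copy with every FIELDTYPE=I removed (both describing the same data file), and HOLDIN for the input HOLD file. The exact REBUILD answers vary by FOCUS release, so treat the last step as illustrative only.

```
-* Jobstep 1: CREATE with the primary (indexed) Master, so the
-* FDT reserves control information for every indexed field.
CREATE FILE BIGDB
END

-* Jobstep 2: bulk load through the non-indexed alternate Master.
MODIFY FILE BIGDBNX
FIXFORM FROM HOLDIN
DATA ON HOLDIN
END

-* Jobstep 3: build the indexes with the primary Master.
-* (In batch, REBUILD reads its answers from SYSIN; prompts
-* differ by release -- check the Maintaining Databases manual.)
REBUILD
INDEX
BIGDB
*
END
```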
January 19, 2011, 02:59 AM
GamP
LOADONLY is indeed for SQL-based databases only. It has no effect on FOCUS data files.

A maximum of 7 indexes can be added to the Master File after the data file has been created; they are added to the data file using REBUILD. Since you already have 7 indexes in place, you are at the limit of what can be added after creation. I would start off with only the first field in the first segment indexed, load the data, then add the other 6 indexes and run REBUILD INDEX. That leaves you the possibility of adding an eighth index when needed.


GamP

- Using AS 8.2.01 on Windows 10 - IE11.
in Focus since 1988
January 19, 2011, 04:28 AM
Waz
Keep in mind the order of the input file as well.


Waz...

Prod: WebFOCUS 7.6.10/8.1.04 Upgrade: WebFOCUS 8.2.07 OS: Linux Outputs: HTML, PDF, Excel, PPT
In Focus since 1984
Pity the lost knowledge of an old programmer!

January 19, 2011, 09:18 AM
j.gross
What is the key structure for the bottom segment? Can you post the MFD?


It may help to load the Focus file in stages, one layer at a time.

That would entail five MODIFYs, using five respective HOLD files, each covering all keys from the root down through the segment to be loaded, plus any non-key fields in that segment.

The MODIFY logic would then allow only "ON MATCH CONTINUE" on the parent segments and "ON NOMATCH INCLUDE" on the target segment; any other condition would deserve a REJECT.

The layered approach ensures that only one segment page at a time (on deck to extend the file) is actually being modified (receiving fresh segment instances), whereas a one-shot MODIFY would have updates to pages of all five segments interleaved. (I realize a parent segment's pointers get updated when the first child instance of a parent instance is included -- but the reduction in thrashing can still be significant.)

Also consider using XFOCUS -- its larger page size may result in less wasted space per page, which would reduce file size and hence the I/O requirements for the update.

[The XFOCUS data-page structure allows for non-homogeneous segment-instance size. In principle, this design would allow varchar (AnnV) fields to be stored in "trimmed" form (with the length descriptor set to the trimmed length of the content), with obvious file-size benefits -- but I do not know whether Modify is implemented to allow the application to control the stored length-descriptor value.]

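One layer of that staged load might look like the following sketch, for a hypothetical hierarchy with key fields KEY1 through KEY4 (KEY4 keying the target segment); BIGDB and HOLD4 are assumed names, and the parent instances are expected to exist already from the earlier passes:

```
-* Pass 4 of 5: load the fourth layer only. Parents were
-* created by previous passes; anything unexpected is
-* rejected rather than silently included.
MODIFY FILE BIGDB
FIXFORM FROM HOLD4
MATCH KEY1
  ON MATCH   CONTINUE
  ON NOMATCH REJECT
MATCH KEY2
  ON MATCH   CONTINUE
  ON NOMATCH REJECT
MATCH KEY3
  ON MATCH   CONTINUE
  ON NOMATCH REJECT
MATCH KEY4
  ON MATCH   REJECT
  ON NOMATCH INCLUDE
DATA ON HOLD4
END
```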
January 19, 2011, 10:42 AM
Francis Mariani
I ran the load job on the mainframe: 5 million rows took over 5 hours. This is a new DB and my first time working with XFOCUS. I decided on a multi-segment hierarchical DB thinking it would be more efficient than a single-segment DB - could I be wrong about that?

Master:

FILE=SALES1, SUFFIX=XFOC $
SEGMENT = TOP     , SEGTYPE=S1, $
  FIELD = TOP     , ALIAS=TOP     , FORMAT=A6   , FIELDTYPE=I, $
SEGMENT = DIVISION, SEGTYPE=S1, $
  FIELD = DIVISION, ALIAS=DIVISION, FORMAT=A6   , FIELDTYPE=I, $
SEGMENT = AREA    , SEGTYPE=S1, $
  FIELD = AREA    , ALIAS=AREA    , FORMAT=A6   , FIELDTYPE=I, $
SEGMENT = DISTRICT, SEGTYPE=S1, $
  FIELD = DISTRICT, ALIAS=DISTRICT, FORMAT=A6   , FIELDTYPE=I, $
SEGMENT = GROUP   , SEGTYPE=S1, $
  FIELD = GROUP   , ALIAS=GROUP   , FORMAT=A6   , FIELDTYPE=I, $
SEGMENT = UNIT    , SEGTYPE=S3, $
  FIELD = UNIT    , ALIAS=UNIT    , FORMAT=A6   , FIELDTYPE=I, $
  FIELD = ROW_NAME, ALIAS=ROW_NAME, FORMAT=A6   , FIELDTYPE=I, $
  FIELD = COL_NBR , ALIAS=COL_NBR , FORMAT=I2   , FIELDTYPE=I, $
  FIELD = SALES   , ALIAS=SALES   , FORMAT=P18.2BC, $
  FIELD = RETAIL  , ALIAS=RETAIL  , FORMAT=P18.2BC, $

Load program:

-SET &ECHO = 'ALL';

SET LINES      = 999999
SET ASNAMES    = ON
SET HOLDLIST   = PRINTONLY
SET HOLDFORMAT = ALPHA
-RUN

SET EXTSORT    = OFF
SET MASTER     = NEW
SET XFOCUS     = ON
SET JOINOPT    = NEW
-*SET XFCBINS = 256
-RUN

TABLE FILE FMDETH1
SUM
SALES/P18.2BC
RETAIL/P18.2BC

BY TOP
BY DIVISION
BY AREA
BY DISTRICT
BY GROUP
BY UNIT

BY ROW_NAME
BY COL_NBR

-*WHERE RECORDLIMIT EQ 100

ON TABLE HOLD AS FMDETH2
END
-RUN

CREATE FILE SALES1
-RUN

MODIFY FILE SALES1
CHECK 100000
FIXFORM FROM FMDETH2

MATCH TOP
  ON MATCH   CONTINUE
  ON NOMATCH INCLUDE

MATCH DIVISION
  ON MATCH   CONTINUE
  ON NOMATCH INCLUDE

MATCH AREA
  ON MATCH   CONTINUE
  ON NOMATCH INCLUDE

MATCH DISTRICT
  ON MATCH   CONTINUE
  ON NOMATCH INCLUDE

MATCH GROUP
  ON MATCH   CONTINUE
  ON NOMATCH INCLUDE

MATCH UNIT ROW_NAME COL_NBR
  ON MATCH   REJECT
  ON NOMATCH INCLUDE

DATA ON FMDETH2
END
-RUN

Thanks,


Francis


Give me code, or give me retirement. In FOCUS since 1991

Production: WF 7.7.05M, Dev Studio, BID, MRE, WebSphere, DB2 / Test: WF 8.1.05M, App Studio, BI Portal, Report Caster, jQuery, HighCharts, Apache Tomcat, MS SQL Server
January 19, 2011, 11:39 AM
GinnyJakes
Francis,

Here is a technique that I've used for years:
quote:
FOCUS Techniques/Code Examples
Tips for loading very large databases.
Many FOCUS applications require loading very large FOCUS databases. The following considerations help to reduce load time and CPU consumption:
- Have input records sorted (use the system sort utility if necessary).
- Limit the number of fields in the Master File Description by using an alternate description with one field spanning many fields.
- Establish indices after the data is loaded via the INDEX option of the REBUILD utility. The following steps should be taken:
  - Using a Master File Description with FIELDTYPE=I added to all fields that will be indexed, issue the CREATE FILE filename command. This formats the file for data and indices.
  - Using a Master File Description with FIELDTYPE=I removed from all fields that will be indexed, issue the MODIFY FILE filename command. This loads the file with data but without indices.
  - Using a Master File Description with FIELDTYPE=I added to all fields that will be indexed, issue the REBUILD/INDEX command. This adds the indices to the file.
- Remove all COMPUTEs and TYPEs wherever possible.
- Set the number of bins higher to lessen the number of I/Os (SET BINS=63).
- Suppress checkpointing or set CHECK very high (CHECK OFF or CHECK 10000).
- Avoid large numbers of rejected transactions.
- Avoid updating parent segments before including child segments. A separate routine for updates is preferable.
- Read segments into the MODIFY in the order in which they appear in the FOCUS database and use full field names to facilitate the search sequence.
These considerations can be applied together, separately, or in any way that your application dictates. The important point to remember is that there is no database size requirement for these considerations to apply to your application. For example, when loading only 15,000 unsorted transactions into a FOCUS database with 140 fields and 7 indices, the addition of many of the above techniques can reduce load time.
Note: The key to this technique is that it applies to the initial loading of a FOCUS database. However, many of the considerations can also apply to subsequent modifies to that database.
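A sketch of how the SET options from that list might be combined in a load procedure (BIGDB and HOLDIN are hypothetical names; the values are the illustrative ones from the tips, and note that CHECK is a MODIFY statement rather than a SET command):

```
-* Larger buffer pool: fewer physical I/Os during the load.
SET BINS = 63
-RUN
MODIFY FILE BIGDB
-* Suppress checkpointing for a bulk initial load (or use a
-* very high value such as CHECK 100000).
CHECK OFF
FIXFORM FROM HOLDIN
DATA ON HOLDIN
END
```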


This is what Mr. Gross suggested above. I'm currently looking for the command that prevents FOCUS from doing the 'match' when the database is being initially loaded. If I find it, I'll post back.


Ginny
---------------------------------
Prod: WF 7.7.01 Dev: WF 7.6.9-11
Admin, MRE,self-service; adapters: Teradata, DB2, Oracle, SQL Server, Essbase, ESRI, FlexEnable, Google
January 19, 2011, 11:46 AM
Francis Mariani
Ginny, Jack,

Thanks for the tips. It appears that all I have left to try is the multi-Master process: load without indexes, then REBUILD...

Cheers,


Francis


Give me code, or give me retirement. In FOCUS since 1991

Production: WF 7.7.05M, Dev Studio, BID, MRE, WebSphere, DB2 / Test: WF 8.1.05M, App Studio, BI Portal, Report Caster, jQuery, HighCharts, Apache Tomcat, MS SQL Server
January 19, 2011, 11:52 AM
GinnyJakes
And if it is a 'new' file, leave the MATCH logic out. MODIFY default is ON NOMATCH INCLUDE.
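Assuming the default INCLUDE behavior Ginny describes, Francis's load MODIFY from the program above would collapse to a sketch like:

```
MODIFY FILE SALES1
CHECK 100000
FIXFORM FROM FMDETH2
DATA ON FMDETH2
END
```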


Ginny
---------------------------------
Prod: WF 7.7.01 Dev: WF 7.6.9-11
Admin, MRE,self-service; adapters: Teradata, DB2, Oracle, SQL Server, Essbase, ESRI, FlexEnable, Google
January 20, 2011, 05:48 AM
Alan B
Ginny's tips are essential. Two other things I do are:

1) With the alternate load Master, use SEGTYPE=S0 for the MODIFY. This requires the input data to be in exact load order, but it is faster.
2) Rather than an internal INDEX, use an external index or MDI. I have found these to be excellent.
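Based on Francis's Master above, the bottom segment of an alternate load Master might be sketched like this (SEGTYPE=S0 stores instances in arrival order with no match checking, so the input must already be in final sorted order):

```
-* Load-only variant of the UNIT segment: no keys, no indexes.
SEGMENT = UNIT    , SEGTYPE=S0, $
  FIELD = UNIT    , ALIAS=UNIT    , FORMAT=A6     , $
  FIELD = ROW_NAME, ALIAS=ROW_NAME, FORMAT=A6     , $
  FIELD = COL_NBR , ALIAS=COL_NBR , FORMAT=I2     , $
  FIELD = SALES   , ALIAS=SALES   , FORMAT=P18.2BC, $
  FIELD = RETAIL  , ALIAS=RETAIL  , FORMAT=P18.2BC, $
```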


Alan.
WF 7.705/8.007
January 20, 2011, 11:23 AM
Francis Mariani
I used TABLEF to prepare the HOLD file, then used a non-indexed Master to load the data and then used REBUILD and an indexed Master to index the data.

This takes about 15 minutes to process 13 million rows, which is very satisfactory compared to the 12-hour-plus estimate if I had used an indexed Master to load the data.

Thanks very much for the suggestions.


Francis


Give me code, or give me retirement. In FOCUS since 1991

Production: WF 7.7.05M, Dev Studio, BID, MRE, WebSphere, DB2 / Test: WF 8.1.05M, App Studio, BI Portal, Report Caster, jQuery, HighCharts, Apache Tomcat, MS SQL Server