[SOLVED] Why is TABLEF multiplying the amount of records in the table?

What verb are you issuing - SUM, PRINT, WRITE, ADD?

Try capturing the SQL using XRETRIEVE = OFF to get only the SQL and not actually run the process.

I would hazard a guess that the SQL is requesting everything from your tables for WebFOCUS to do the work.

T

In FOCUS since 1986	WebFOCUS Server 8.2.01M, thru 8.2.07 on Windows Svr 2008 R2
	WebFOCUS App Studio 8.2.06 standalone on Windows 10

Posts: 5694 | Location: United Kingdom | Registered: April 08, 2004

Ignored post by Tony A posted

Virtuoso

posted

There's no SQL to capture - they're FOCUS files after all.

Besides, I am not so much interested in what it is doing but more in why it is doing what it's doing; what decision in the parse tree caused a different "query plan" (for lack of a better word) to be chosen.
I'm suspecting that one of the servers can sort the results in memory and the other can't (and spills to disk), but I have no means to verify that claim (and I certainly don't understand why it's having so much trouble sorting something that's already sorted, if my suspicion is on the mark).

I'm using PRINT.
Does PRINT behave differently between TABLEF and TABLE?

I'm seeing a vastly different amount of results between using:

TABLE FILE yadayada
PRINT a b c d e f g
BY h
BY i
BY j
BY k
END
(1,567,390 lines)

and using:

TABLEF FILE yadayada
PRINT a b c d e f g
BY h
BY i
BY j
BY k
END
(over 400,000,000 lines)

WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :

Posts: 1669 | Location: Enschede, Netherlands | Registered: August 12, 2010

Ignored post by Wep5622 posted

Waz

Expert

posted

Any joins or cross referenced files in the master ?

Waz...

Prod:	WebFOCUS 7.6.10/8.1.04	Upgrade:	WebFOCUS 8.2.07	OS:	Linux	Outputs:	HTML, PDF, Excel, PPT
In Focus since 1984
Pity the lost knowledge of an old programmer!

Posts: 6347 | Location: 33°49'23.0"S, 151°11'41.0"E | Registered: October 31, 2006

Ignored post by Waz posted

Dan Satchell

Virtuoso

posted

The DB records may already be in sort order, but with TABLE if you specify BY phrases FOCUS is going to load them into the internal matrix and sort them anyway. I'm not sure how FOCUS handles TABLEF with BY phrases. You might try moving them to the PRINT verb and see if that helps. You can also issue the ? STAT command after the query to see if the internal matrix was used or not. Ideally, you don't want the internal matrix involved if speed is your goal. These settings may also have some impact:

SET AUTOTABLEF = ON
SET SAVEMATRIX = OFF
SET EXTSORT = OFF

WebFOCUS 7.7.05

Posts: 1213 | Location: Seattle, Washington - USA | Registered: October 22, 2007

Ignored post by Dan Satchell posted

Virtuoso

posted

When using TABLEF against a FOCUS db, the retrieval of the records is very different than when using TABLE.

TABLE will read the records physically across the structure, and then sorts within the internal matrix. TABLEF reads records in logical, key sequence, order and no internal matrix or external sort is generated. This can be demonstrated simply by the CAR file.

TABLE FILE CAR
PRINT COUNTRY
END

ENGLAND 
JAPAN 
ITALY 
W GERMANY 
FRANCE 
-------------------
TABLEF FILE CAR
PRINT COUNTRY
END

ENGLAND 
FRANCE 
JAPAN 
ITALY 
W GERMANY

(This is why FRANCE is out of sequence, as a demonstration of the TABLE / TABLEF differences)

So when issuing a TABLEF, the BY fields are treated very differently.

Try sorting on a non-key field.

TABLE FILE GGSALES
BY REGION
END

v.

TABLEF FILE GGSALES
BY REGION
END

Where the data retrieval provides a form of sorting in TABLEF, then there will be some line reduction. So in GGSALES, when a 'Northeast' record is logically followed by another 'Northeast' record, then this becomes 1 record, not 2 etc.

A TABLE with BY REGION produces 4 records, and TABLEF produces 40, which represent the number of times REGION changes when the file is read logically. If you want to prove this TABLEF GGSALES and PRINT REGION, count the number of times the value of REGION changes, it is 40.

Also bear in mind that if a FOCUS file is disorganized, the benefits of TABLEF can be reduced as the reading can be bouncing around the file causing much higher I/O than expected.

Alan.
WF 7.705/8.007

Posts: 1451 | Location: Portugal | Registered: February 07, 2007

Ignored post by Alan B posted

Virtuoso

posted

quote:
Originally posted by Alan B:
When using TABLEF against a FOCUS db, the retrieval of the records is very different than when using TABLE.

TABLE will read the records physically across the structure, and then sorts within the internal matrix. TABLEF reads records in logical, key sequence, order and no internal matrix or external sort is generated.

Are you saying that, before any sorting is done, TABLE and TABLEF don't read the records in the file in the same order? Are you implying that TABLEF reads the records in index order instead of sequentially?

I'd expect both commands to scan the file sequentially. Well, unless a WHERE clause on an indexed field was specified for TABLE (not TABLEF obviously) and the matching records don't compose a majority of the file - for those cases an index scan would probably be more efficient, right?

In our case we're applying TABLEF to a FOCUS file that's already been sorted on (most of) the BY fields in a previous step. We don't want data reduction there, but data multiplication should definitely not happen.

We do something like this:

TABLE FILE CAR
BY COUNTRY
BY CAR
ON TABLE HOLD FORMAT FOCUS INDEX COUNTRY CAR
END
-RUN

DEFINE FILE HOLD
 A/A3 = SUBSTR(10, COUNTRY, 1, 3, 3, A);
 B/A3 = SUBSTR(10, COUNTRY, 4, 6, 3, B);
END
TABLEF FILE HOLD
BY COUNTRY
BY A
BY B
BY CAR
END

The above does exactly what we intended for our table, where it unfortunately behaves quite differently.

Our actual TABLE is in the middle of a 4 hour RC job and is a "bit" harder to explain. I was hoping to speed that up using TABLEF, since the number of records shouldn't change by doing so and since the file it's based on is already sorted in the order we need (like in the above example). The step of just splitting up a date into separate fields for year, period and week is taking over 2 hours for ~1.5 mln records, which seems ridiculously long. My assumption is that WF does the sorting of the file all over again - unnecessarily so! That's why I tried replacing it with TABLEF (as no additional sorting is necessary), but that obviously didn't work out quite like I expected...

But, I just noticed something in the generated master of the "parent" focus file; not all the BY fields are indexed. We have 9 BY fields, but fields 2 and 5 aren't marked as index fields (because they aren't specified in the INDEX clause). Would that explain what's happening with TABLEF?

Would it help to change the BY fields to regular PRINT fields and just put an INDEX on them? After all, the records are already in the correct order because of how the parent FOCUS-file was sorted, right?

WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :

Posts: 1669 | Location: Enschede, Netherlands | Registered: August 12, 2010

Ignored post by Wep5622 posted

Virtuoso

posted

quote:
Are you saying that, before any sorting is done, TABLE and TABLEF don't read the records in the file in the same order? Are you implying that TABLEF reads the records in index order instead of sequentially?

NO. I am not implying, I am stating, that TABLEF reads a FOCUS file in 'Key Field Sequence Order', not Index Order, and that this is different than TABLE which reads the FOCUS file in physical order.

Why would you expect TABLE and TABLEF to read the file in the same way? TABLEF was created to read a FOCUS file this way so the data would be sorted as per the key fields and no Internal Matrix is then required. Providing of course that you want the data in Key Sequence order.

As you are creating new BY fields in the TABLEF part, then the BY fields are no longer in key sequence order, hence my example with REGION in GGSALES. Place the DEFINE in the previous step and create the file in Key Sequence Order using BY and TABLEF will work, but that would defeat the object of your exercise, because you would then have the data in the order you require.

Please do not mix up Indices and Key Fields, they are not the same. You can use indices for retrieval using filename.indexname (that is one index) and the request will access the file through that Index. But not necessarily in any prescribed order of other fields. That would be done using BY and Internal Matrix.

Remember that TABLEF was around many years prior to Relational DBs, and the fact that TABLEF is used there is only to remove the Internal Matrix processing, the actual reading of the data is handled by the relational engine, not FOCUS/WebFOCUS.

TABLE reading the file physically is quicker by the way, but the data is unsorted, hence the Internal Matrix is used.

If you are licensed for MDI then that can be very useful to you in this case. Otherwise I believe you are better off with plain TABLE.

Alan.
WF 7.705/8.007

Posts: 1451 | Location: Portugal | Registered: February 07, 2007

Ignored post by Alan B posted

Virtuoso

posted

I think we're just using different terminology for the same thing. What you seem to call "Key Sequence Order" is what I call "Index order". the only "difference" between the two appears to me that this "key sequence" is an index that's baked into the focus file. Is that about right?

What you call "Indices" appears to refer to external indices (MDI, I suppose). We don't have that feature, although I think we should look into it. Possibly it would speed up some of our queries (does WF have its own terminology for that too?) at the expense of complicating the code required.

It seems that I interpreted what TABLE and TABLEF do opposite of how they work. That leaves one question remaining; can I tell TABLE to not process its internal matrix? The data is already in the correct order, there's really no need to attempt to order it again.

With regards to:

quote:
the fact that TABLEF is used there is only to remove the Internal Matrix processing, the actual reading of the data is handled by the relational engine, not FOCUS/WebFOCUS.

The relational engine in this case is the FOCUS engine. The data is a FOCUS file.

WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :

Posts: 1669 | Location: Enschede, Netherlands | Registered: August 12, 2010

Ignored post by Wep5622 posted

Virtuoso

posted

Let's clear up terminology for everyone.

You cannot tell the FOCUS database engine to run a TABLE without Internal Matrix, as the Internal Matrix is (generally) used for sorting the data into the order determined in the TABLE request.

Alan.
WF 7.705/8.007

Posts: 1451 | Location: Portugal | Registered: February 07, 2007

Ignored post by Alan B posted

Virtuoso

posted

Thanks for taking the time clearing up the definitions for us (or at least for me). It helps.

quote:
Key Field - Defines the sort order of the logical data structure. S1 says that the first field is the key of the segment and sorted low - high. S2 states that the first 2 fields of the segment are the key to the segment and sorted low - high etc. Use SHn to sort high - low. The key field(s) define the uniqueness of that segment.

In RDBMSes the key fields would be called the primary key, which is typically implemented as a unique index over those fields.

Are key fields in FOCUS also implemented as a (unique) index? I'm starting to think that they might not be, as FOCUS files do not typically (need to) enforce constraints - such as key uniqueness - on their data.
IIUC, key uniqueness in FOCUS files is a result of the TABLE request that created the file; either by sorting and grouping on the BY fields (SUM) or by an added FOCLIST field (PRINT), right?

WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :

Posts: 1669 | Location: Enschede, Netherlands | Registered: August 12, 2010

Ignored post by Wep5622 posted

Virtuoso

posted

quote:
Are key fields in FOCUS also implemented as a (unique) index?

Errmmm. The key fields in a FOCUS file are there as an aid to preserve uniqueness for that segment and provide a logical sort sequence. That is, for the CAR file the S1 on COUNTRY means that you should have only 1 of each COUNTRY. There is always a BUT though. It is not the keys themselves that force uniqueness, but the code that is written to load the data. e.g:

MODIFY FILE CAR
FREEFORM COUNTRY
MATCH COUNTRY
ON MATCH INCLUDE
ON NOMATCH INCLUDE
DATA
ENGLAND,$
END

. The ON MATCH INCLUDE forces duplicate values, albeit with a warning message.

Basically there is no built in intelligence engine within the FOCUS db to force behaviours, so the whole is controlled by the intelligence, or not, of the code.

quote:
IIUC, key uniqueness in FOCUS files is a result of the TABLE request that created the file; either by sorting and grouping on the BY fields (SUM) or by an added FOCLIST field (PRINT), right?

Basically correct, the LIST is there to force uniqueness, thereby allowing the code that loads the data to ensure that the keys are loaded in a unique manner, without duplicates.

Because there is no built in intelligence on the FOCUS db itself means that you can actually create techniques that allow data manipulation of the data and structure that would be unavailable with a relational environment.

Alan.
WF 7.705/8.007

Posts: 1451 | Location: Portugal | Registered: February 07, 2007

Ignored post by Alan B posted

Virtuoso

posted

I see.

So if you would put a screening condition on a key field in a TABLE request, the file would need to be scanned sequentially to find the matching records. Unless of course you'd put an explicit index on those key fields. Correct?

I found a solution for the problem for which I started experimenting with TABLEF and that raises a related question.
It would appear that a TABLE request will always sort its data on BY-fields, regardless of whether the data is already in the correct order, is that correct?

Based on the above, I just realised a huge performance increase on one of our RC jobs: one step in the procedure going from over 2 hours back to 11 minutes!

The relevant parts of that procedure were two subsequent processing steps to generate a hold-file for use in our reports.
I improved it by first HOLDING the ordered data in a BINARY format HOLD-file.
Next I created a FOCUS-format HOLD-file from that (with a few added DEFINE'd fields in the PRINT body) without performing any ordering. Ordering is not needed in that last step, because the order of the records was already decided in the previous step and this later step is just sequentially reading those records and simply writing them out again with a few per-record changes. The result will be in the same order.

Originally we put the same BY-clauses in the later TABLE request as we had in the step before it. Apparently just those BY-clauses caused a huge performance hit...

What we had originally was something like this:

DEFINE FILE CAR
   SEATS2/I10 = SEATS * 2;
END
TABLE FILE CAR
SUM SEATS2
BY COUNTRY
BY CAR
ON TABLE HOLD AS CARSEATS1 FORMAT FOCUS INDEX COUNTRY CAR
END
-RUN

TABLE FILE CARSEATS1
PRINT SEATS2
BY COUNTRY
BY CAR
ON TABLE HOLD AS CARSEATS FORMAT FOCUS INDEX COUNTRY CAR
END

Which I changed to:

TABLE FILE CAR
SUM SEATS
BY COUNTRY
BY CAR
ON TABLE HOLD AS CARSEATS1 FORMAT FOCUS INDEX COUNTRY CAR
END
-RUN

DEFINE FILE CARSEATS1
   SEATS2/I10 = SEATS * 2;
END
TABLE FILE CARSEATS1
PRINT
 COUNTRY
 CAR
 SEATS2
ON TABLE HOLD AS CARSEATS FORMAT FOCUS INDEX COUNTRY CAR
END

Of course, with files this small the performance difference is negligible. Our original files are ~1.3GB in size though, and there it makes a BIG difference!

WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :

Posts: 1669 | Location: Enschede, Netherlands | Registered: August 12, 2010

Ignored post by Wep5622 posted

Baillecl

Platinum Member

posted

Difference between Table and Tablef.
What a nice trip through time.
In the very Old Days, the internal matrix had some limits (Number of lines) that made Focus somehow loose control of it (Empty files instead of x99 999 records expected.
Tablef was then very necessary = Using the IBM sort. When Focus and IBM Sort turned out to be close friends ( EXTSORT or about = ON) Tablef was no longer the trick of the specialist ...
In some cases, I still use it but closely look at the SQL generated ( READLIMIT -> FETCH FIRST 99 ROWS ONLY ).
Anyway, The main point of this discussion is that Tony A is a great guy. Sir, I humbly bow my head in front of you
Cordially and Focusely

Focus Mainframe 7.6.11
Dev Studio 7.6.11 and !!!
PC Focus, Focus for OS/2, FFW Six, MSO

Posts: 134 | Registered: November 06, 2007

Ignored post by Baillecl posted

GamP

Virtuoso

posted

quote:
So if you would put a screening condition on a key field in a TABLE request, the file would need to be scanned sequentially to find the matching records. Unless of course you'd put an explicit index on those key fields. Correct?

yes

quote:
I found a solution for the problem for which I started experimenting with TABLEF and that raises a related question. It would appear that a TABLE request will always sort its data on BY-fields, regardless of whether the data is already in the correct order, is that correct?

and yes.

In your changed environment, since you aready read the data sorted, you could also use TABLEF, bypassing the creation of the internal matrix, which could further improve the performance (less file IO).

GamP

- Using AS 8.2.01 on Windows 10 - IE11.
	in Focus since 1988

Posts: 1961 | Location: Netherlands | Registered: September 25, 2007

Ignored post by GamP posted

Virtuoso

posted

quote:
So if you would put a screening condition on a key field in a TABLE request, the file would need to be scanned sequentially to find the matching records. Unless of course you'd put an explicit index on those key fields. Correct?

Look up AUTOSTRATEGY:
The AUTOSTRATEGY parameter determines when FOCUS stops the search for a key field specified in a WHERE or IF test. When set to ON, the search ends when the key field is found, optimizing retrieval speed. When set to OFF, the search continues to the end of the data source.

and AUTOINDEX:
FOCUS automatically expedites data retrievals by exploiting indexed fields in (most) cases where TABLE requests contain equality or range tests on those fields.

Basically these can be used to help with screening on KEY fields with or without an Index and boost performance.

Alan.
WF 7.705/8.007

Posts: 1451 | Location: Portugal | Registered: February 07, 2007

Ignored post by Alan B posted

Virtuoso

posted

quote:
In your changed environment, since you aready read the data sorted, you could also use TABLEF, bypassing the creation of the internal matrix, which could further improve the performance (less file IO).

Ah right, because of the BY-fields, the BINARY file in our improved approach is in the same key sequence order as the records are laid out on disk. So in the file there's no difference in scanning the records in physical order or in key sequence order. Interesting!

However, from measurements it would appear that TABLE outperforms TABLEF in our case. That's probably because scanning the file in key sequence order involves key lookups.

I deduce that from the fact that TABLEF is "scanned in key sequence order", while TABLE is "scanned in physical order". Some emphasis has been put on the distinction between the two approaches to scan files, which implies that their scanning order is different. Since physical order leaves no room for interpretation, it follows that key sequence order involves a translation from the key sequence "entry" to the physical record. This also implies that the key sequence is actually a type of index on the file, which is supported by Alan's remark about AUTOSTRATEGY.

Apparently in our case the extra overhead of creating an internal matrix, which I suppose is fairly minimal with no ordering requirements, is negligible in comparison to the overhead of translating key sequence to physical order for every record.
Perhaps the internal matrix is required anyway to help in the creation of the index at the end of the TABLE request? In that case, the index creation undoes any benefits TABLEF could have provided.

There's probably some threshold for the number of records after which TABLE is more efficient than TABLEF. Of course, this threshold would be highly system-dependant and would also depend on the key definitions (number and size).

So far this has been a pretty interesting discussion; thanks for all the input!

WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :

Posts: 1669 | Location: Enschede, Netherlands | Registered: August 12, 2010

Ignored post by Wep5622 posted