Focal Point
[CASE OPENED] Strange ERROROUT behaviour

This topic can be found at:
https://forums.informationbuilders.com/eve/forums/a/tpc/f/7971057331/m/6747003716

March 01, 2012, 05:13 AM
Wep5622
[CASE OPENED] Strange ERROROUT behaviour
We have a rather large FOCEXEC here that prepares a number of FOCUS files for use by other reports. In the top of the report we have:
ERROROUT = ON


This report sometimes prints these lines near the end:
 Exiting due to Exit on Error...
 Exiting due to Exit on Error...


However, no actual error is printed at all!
Stranger yet, the errors occur infrequently, but always at the same point in the report. I've added prints for the FOCERRNUM system variable after each TABLE FILE (each followed by -RUN, of course), and this is the result:
 0 NUMBER OF RECORDS IN TABLE=     5848  LINES=   5848
 FOC-error code: 0
 0 NUMBER OF RECORDS IN TABLE=      507  LINES=    507
 FOC-error code: 0
 0 NUMBER OF RECORDS IN TABLE=      129  LINES=    129
 0 NUMBER OF RECORDS IN TABLE=    48932  LINES=  12233
 FOC-error code: 0
 0 NUMBER OF RECORDS IN TABLE=    12233  LINES=  12233
 FOC-error code: 0
 0 NUMBER OF RECORDS SELECTED=    11391  LINES=  11391
 0 NUMBER OF RECORDS SELECTED=    12200  LINES=  12200
 0 LINES OF MATCH OUTPUT     =    23591
 0 NUMBER OF RECORDS SELECTED=     5848  LINES=   5848
 0 LINES OF MATCH OUTPUT     =    29439
 FOC-error code: 0
 0 NUMBER OF RECORDS IN TABLE=    29439  LINES=  29439
 FOC-error code: 0
 0 NUMBER OF RECORDS IN TABLE=    12160  LINES=  11843
 FOC-error code: 0
 0 NUMBER OF RECORDS IN TABLE=    29439  LINES=  29439
 FOC-error code: 0
 0 NUMBER OF RECORDS IN TABLE=    12161  LINES=  12161
 0 NUMBER OF RECORDS IN TABLE=    29439  LINES=  17560
 FOC-error code: 0
 0 NUMBER OF RECORDS IN TABLE=    12161  LINES=  12161
 FOC-error code: 0
 0 NUMBER OF RECORDS IN TABLE=   302535  LINES= 302535
 FOC-error code: 0
 0 NUMBER OF RECORDS IN TABLE=    29439  LINES=    107
 Exiting due to Exit on Error...
 Exiting due to Exit on Error...
 0 NUMBER OF RECORDS IN TABLE=        0  LINES=      0
 FOC-error code: 0
 0 NUMBER OF RECORDS IN TABLE=        0  LINES=      0
 FOC-error code: 0


So, ERROROUT is aborting execution, but there's no actual error?!? I am confused! What is going on here?

This message has been edited. Last edited by: Wep5622,


WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :
March 01, 2012, 03:40 PM
Waz
Are you execing other fexes in the program ?

Could they be going this ?

Have you turned on ECHO to see where these are appearing ?


Waz...

Prod:WebFOCUS 7.6.10/8.1.04Upgrade:WebFOCUS 8.2.07OS:LinuxOutputs:HTML, PDF, Excel, PPT
In Focus since 1984
Pity the lost knowledge of an old programmer!

March 02, 2012, 04:26 AM
Wep5622
quote:
Originally posted by Waz:
Are you execing other fexes in the program ?

Could they be going this ?


No, we don't. We do have a SET ALL=SQL in that script though; I think it's possible that one of the queries that are sent to the database (DB/400) are returning some kind of error code. But then again, wouldn't the result of such a query have 0 LINES?

quote:
Have you turned on ECHO to see where these are appearing ?


Yes, we have, although not for the entire file - only for the part where the error seemed to pop up. That didn't turn up anything.

The cause is probably not the size of the file, it's only about 52MB. It happens (intermittently as it is) on two different systems, so I'm fairly sure we're not running into some kind of file-system limitation.

I'm starting to think that one of the HOLD files gets corrupted near the end. My reasoning is as follows, does that make sense?:

The error seems to occur on a TABLE FILE that creates a "summary" of an earlier HOLD-file (for use in launch page forms). That's the only query in the fex that reads that entire file. Or perhaps it doesn't...
Said TABLE FILE only reads a single field, which is indexed - it's quite possible (assuming WebFOCUS is smart enough to do so) it only ever reads the index, which means it could also be the index that gets corrupted.

I seem to recall there was a thread about checking FOCUS file integrity a while back.


WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :
March 04, 2012, 03:31 PM
Waz
Being intermittent is going to be difficult.

Are you able to change the FOCUS file to a non FOCUS one, to take the Index possibility out of the picture ?


Waz...

Prod:WebFOCUS 7.6.10/8.1.04Upgrade:WebFOCUS 8.2.07OS:LinuxOutputs:HTML, PDF, Excel, PPT
In Focus since 1984
Pity the lost knowledge of an old programmer!

March 05, 2012, 04:47 AM
Wep5622
I added an automatic consistency check for the suspect FOCUS-file that triggers when the final TABLE FILEs in the fex return 0 lines:
-IF &LINES NE 0 THEN EXIT;
-* We have reason to assume data corruption occurred in the source file if an
-* error occured at this point.

REBUILD
CHECK
PUR00001H1

-EXIT


Reportcaster triggered that bit of code twice over the weekend (Saturday and this morning), but apparently without detecting any issues with the file...

Starting from the part of the fex where the error seems to occur the output of one such runs reads:
~~ FOC-error code: 0 Lines: 107
Exiting due to Exit on Error...
Exiting due to Exit on Error...
~~ FOC-error code: 0 Lines: 0
~~ FOC-error code: 0 Lines: 0
Enter option
1. REBUILD (Optimize the database structure)
2. REORG (Alter the database structure)
3. INDEX (Build/modify the database index)
4. EXTERNAL INDEX (Build/modify an external index database)
5. CHECK (Check the database structure)
6. TIMESTAMP (Change the database timestamp)
7. DATE NEW (Convert old date formats to smartdate formats)
8. MDINDEX (Build/modify a multidimensional index)
CHECK
ENTER NAME OF FOCUS/FUSION FILE
STARTING..
REFERENCE..AT SEGMENT 1000
REFERENCE..AT SEGMENT 2000
REFERENCE..AT SEGMENT 3000
REFERENCE..AT SEGMENT 4000
REFERENCE..AT SEGMENT 5000
REFERENCE..AT SEGMENT 6000
REFERENCE..AT SEGMENT 7000
REFERENCE..AT SEGMENT 8000
REFERENCE..AT SEGMENT 9000
REFERENCE..AT SEGMENT 10000
REFERENCE..AT SEGMENT 11000
REFERENCE..AT SEGMENT 12000
REFERENCE..AT SEGMENT 13000
REFERENCE..AT SEGMENT 14000
REFERENCE..AT SEGMENT 15000
REFERENCE..AT SEGMENT 16000
REFERENCE..AT SEGMENT 17000
REFERENCE..AT SEGMENT 18000
REFERENCE..AT SEGMENT 19000
REFERENCE..AT SEGMENT 20000
REFERENCE..AT SEGMENT 21000
REFERENCE..AT SEGMENT 22000
REFERENCE..AT SEGMENT 23000
REFERENCE..AT SEGMENT 24000
REFERENCE..AT SEGMENT 25000
REFERENCE..AT SEGMENT 26000
REFERENCE..AT SEGMENT 27000
REFERENCE..AT SEGMENT 28000
REFERENCE..AT SEGMENT 29000
NUMBER OF SEGMENTS RETRIEVED= 29454
CHECK COMPLETED...
Task finished.


So according to REBUILD CHECK, the file is fine! Either REBUILD is wrong (I doubt it) or we're looking at the wrong file after all?

My colleague meanwhile turned a ON TABLE HOLD FORMAT FOCUS into an ON TABLE HOLD AS FOCCACHE/FOO FORMAT FOCUS, after which the fex ran successfully again. Frankly, I doubt that was the cause and expect we'll be seeing this error again in a few days. But at this point, anything is worth a try!


WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :
March 05, 2012, 06:49 PM
Jim Morrow
Two other places to look for error messages the edaprint.log and in the "work folder" where all your tempoary files are created. When an agent crashes this folder is left on disk, unless you are stupid enough to have "crashcleanup" turned on. Then you have lost the footprint of what was happening.

From the comment on DB/400 have you turned on error mesage from the DBMS? I don't have the syntax here at home.


Jim Morrow
Web Focus 7.6.10 under Windows 2003
MVS 7.3.3



March 06, 2012, 10:30 AM
Wep5622
The edaprint log doesn't show anything obvious around the times the issue occurred.
Where would that "work folder" be located? I couldn't find anything of that name.

When you say "turn on error messages from the DBMS", do you mean that as a setting in the DB/400 adapter or in the database environment (AS/400) itself?


WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :
March 06, 2012, 03:41 PM
Waz
The work folder is typically \ibi\srv{xx}\wfs\edatemp\ts{nnnnnn}.

Adn if the run is successful, it will be deleted.

There is a setting to keep the temp directory, can remember what it is though.


Waz...

Prod:WebFOCUS 7.6.10/8.1.04Upgrade:WebFOCUS 8.2.07OS:LinuxOutputs:HTML, PDF, Excel, PPT
In Focus since 1984
Pity the lost knowledge of an old programmer!

March 07, 2012, 04:42 AM
Wep5622
Thanks Waz.

While we do have several of such directories, none are from the problem dates that I took the other information from. Moreover, most of those ts* directories just contain a 0-Byte stderr.log and stdout.log...

We do have a few of those directories from earlier dates, so it looks like if there ever was a saved trace of the issue, it would have been there.

I'm starting to think that the detection of an error is what's the error, especially since the system is apparently unable to provide any information about the error it allegedly encountered. It may be a phantom...

There appears to be no trace of a real problem if we don't let the fex abort on error. It's possible that the problem in the resulting hold files is just hidden well enough for us to notice, of course. That's what I'm hoping to rule out at least.

Error: No error occurred in this statement. Music


WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :
March 07, 2012, 06:40 AM
GamP
quote:
There is a setting to keep the temp directory, can remember what it is though

SET TEMPERASE=OFF


GamP

- Using AS 8.2.01 on Windows 10 - IE11.
in Focus since 1988
March 07, 2012, 11:11 AM
Wep5622
We did some more digging and ended up splitting the report into two separate fexes, with the split just before the last successfully executed TABLE FILE (the one resulting in 107 lines). At the split-point in the first fex, we called the 2nd part using EX.

The result is as follows:
0 NUMBER OF RECORDS IN TABLE=     5892  LINES=   5892
 FOC-error code: 0       Lines:     5892
 0 NUMBER OF RECORDS IN TABLE=      508  LINES=    508
 ~~ FOC-error code: 0       Lines:      508
 0 NUMBER OF RECORDS IN TABLE=      130  LINES=    130
 0 NUMBER OF RECORDS IN TABLE=    49208  LINES=  12302
 ~~ FOC-error code: 0       Lines:    12302
 0 NUMBER OF RECORDS IN TABLE=    12302  LINES=  12302
 ~~ FOC-error code: 0       Lines:    12302
 0 NUMBER OF RECORDS SELECTED=    11460  LINES=  11460
 0 NUMBER OF RECORDS SELECTED=    12269  LINES=  12269
 0 LINES OF MATCH OUTPUT     =    23729
 0 NUMBER OF RECORDS SELECTED=     5892  LINES=   5892
 0 LINES OF MATCH OUTPUT     =    29621
 ~~ FOC-error code: 0       Lines:    29621
 0 NUMBER OF RECORDS IN TABLE=    29621  LINES=  29621
 ~~ FOC-error code: 0       Lines:    29621
 0 NUMBER OF RECORDS IN TABLE=    12238  LINES=  11918
 ~~ FOC-error code: 0       Lines:    11918
 0 NUMBER OF RECORDS IN TABLE=    29621  LINES=  29621
 ~~ FOC-error code: 0       Lines:    29621
 0 NUMBER OF RECORDS IN TABLE=    12239  LINES=  12239
 ~~ FOC-error code: 0       Lines:    12239
 0 NUMBER OF RECORDS IN TABLE=    29621  LINES=  17673
 ~~ FOC-error code: 0       Lines:    17673
 0 NUMBER OF RECORDS IN TABLE=    12239  LINES=  12239
 ~~ FOC-error code: 0       Lines:    12239
 0 NUMBER OF RECORDS IN TABLE=   304319  LINES= 304319
 ~~ FOC-error code: 0       Lines:   304319
 -* Continue here
 -* (Split up in an attempt to prevent a mysterious error)
 EX PUR00001L1_Quarter_Delivered_pt2
 -RUN
 0 NUMBER OF RECORDS IN TABLE=    29621  LINES=    108
 ~~ FOC-error code: 0       Lines:      108
 Exiting due to Exit on Error...
 Exiting due to Exit on Error...
 0 NUMBER OF RECORDS IN TABLE=        0  LINES=      0
 ~~ FOC-error code: 0       Lines:        0
 0 NUMBER OF RECORDS IN TABLE=        0  LINES=      0
 ~~ FOC-error code: 0       Lines:        0

The output after the EX-command is from the 2nd fex, the 1st fex ends at the EX command.

That output suggests that the error occurs in the 2nd part. However, I ran that 2nd fex separately 10 times in a row and no error occurred! That strongly indicates that the error was caused in the first part and propagated on to the 2nd part somehow...?

For now we have split our report-caster task into two separate tasks, so that the error at least won't break the report for our users, but obviously something is quite not right here. This will also allow us to verify in which part the error occurs, as code-wise they are truly separated now.


WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :
March 07, 2012, 03:35 PM
Waz
In your WebFOCUS server, do you have agent refresh set to 1 ?


Waz...

Prod:WebFOCUS 7.6.10/8.1.04Upgrade:WebFOCUS 8.2.07OS:LinuxOutputs:HTML, PDF, Excel, PPT
In Focus since 1984
Pity the lost knowledge of an old programmer!

March 08, 2012, 03:48 AM
Wep5622
Oh, that's an interesting angle...!

That parameter is on 100 for all our data services on both servers, except for the one on our test-server where I've been trying to reproduce this issue - there it's set to 25. The issue also happens on our production server when called through RCaster, which we gave it's own data service to prevent things from getting in each other's way.

Now that I know about this setting, I can relate it to messages in EDAPRINT.LOG about refreshing an agent after 100 connections. That happens with some regularity, but that won't happen in the middle of executing a procedure, would it?

Is there any reason to ever want that setting on 1? I can't imagine why someone would do that...


WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :
March 08, 2012, 03:34 PM
Waz
Past experience is that the more an agent is used the higher the risk of memory issues, etc.

We have all ours set to 1. single use, then it kills its self, and a new one starts up.

I know its a long shot, but perhaps its something that could be looked at.


Waz...

Prod:WebFOCUS 7.6.10/8.1.04Upgrade:WebFOCUS 8.2.07OS:LinuxOutputs:HTML, PDF, Excel, PPT
In Focus since 1984
Pity the lost knowledge of an old programmer!

March 09, 2012, 04:02 AM
Wep5622
Using very similar reasoning we just reduced the number to 25 on our production server as well. Changing that parameter also initiated a restart of "something". Either of those changed the runtime environment enough that we haven't seen the problem reoccur yet!

I do find it a bit worrying that an agent can't just keep running "forever" without accumulating garbage though, but this is not the first time we've seen such in an IBI product.


WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :