Focal Point
[WORKAROUND] TABT input files

This topic can be found at:
https://forums.informationbuilders.com/eve/forums/a/tpc/f/7971057331/m/3777055326

August 27, 2012, 04:23 AM
Wep5622
[WORKAROUND] TABT input files
I have some input data files that are coming from an external source (outside my control) that I'm having a bit of trouble parsing.

The problem is, the files are tab-delimited, with a header, but...
The header covers the first two lines. In fact, seeing that there is a column for the number of headerlines in the accompanying database table, the number of header lines is possibly even variable!

I've got as far as automating the synonym creation of these files using DBMS TABT, but the resulting masters assume that the header line is only the first line. The second header line is interpreted as data and that causes the columns to be alpha-numeric instead of numeric.

Is there a setting to CREATE SYNONYM for TABT files that specifies how many header lines there are? Or another workaround for this?


P.S. The report will be reading ~50 of such files (~200 lines, 8 numeric columns each) and generate graphs from them - performance shouldn't be (much) worse than the Business Objects report its replacing.

This message has been edited. Last edited by: Wep5622,


WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :
August 27, 2012, 04:59 AM
Dave
You might be able to fix it.

Make a procedure that reads the incoming file like a flat-file.

Just -READ every row, incl. all header.
Based on your 'headerlines' and a counter you should be able, with a simple -IF, to -WRITE just one headerline and all following datalines.

Only 200 lines shouldn't take to long.


G'luck, Dave


_____________________
WF: 8.0.0.9 > going 8.2.0.5
August 27, 2012, 05:47 AM
Wep5622
I really had hoped that it wouldn't come down to rewriting 50*200 lines of data into new files before I can graph them. It's bad enough that I first need to generate 50 masters before I can even start graphing.

Perhaps I'll look into a way to apply Python to the problem and do the graphing external to WebFOCUS. That's probably less hard to do.


WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :
August 27, 2012, 09:32 AM
Dave
Or use phyton to strip the surplus headinglines?


_____________________
WF: 8.0.0.9 > going 8.2.0.5
August 27, 2012, 11:31 AM
Wep5622
Well, I tried with WebFOCUS first, as I realised I probably should download the files to the local disk anyway and it'd not be too inefficient to strip those extra header lines in the same process.

But now DM threw me a surprise!

The process is fairly simple, it goes like this:
FILEDEF INFILE ...
FILEDEF OUTFILE ...
-SET &LINENR = 1;
-SET &STATUS = 0;
...

-REPEAT :READLINE WHILE &STATUS EQ 0;
-READ INFILE, NOCLOSE &LINE
-&STATUS = &IORETURN;
-IF &STATUS NE 0 THEN GOTO :READLINE;

-IF &LINENR GT 1 AND &LINENR LE &NRHEADERLINES GOTO :SKPLINE;
-TYPE Write : &LINE
-WRITE OUTFILE NOCLOSE &LINE
-GOTO :CONTLINE
-:SKPLINE
-TYPE Skipped : &LINE
-:CONTLINE

-SET &LINENR = &LINENR +1;
-:READLINE

...
-CLOSE INFILE
-CLOSE OUTFILE


The actual process is a little more complicated, as I took the opportunity to combine records from different files that need to end up in the same graph into one file per graph.

The problem I run into is that in one of my input-files I have a header-line that goes like this:
PART_NO	No	T	E'	E"	tan รค	|E*|	f	Time	F dyn.	Stress dyn	Strain dyn	Stress stat	Strain stat	F stat.	Lo	Lm


I can see it printed to the screen, but it's not in my OUTFILE! Those two actions are in the same block of logic, so if a line gets printed to the screen, there SHOULD be a corresponding one in the file.

Other header-lines (from other files) are written just fine, it's just this one that DM chooses to ignore. Why is it doing that?

I checked a few things already:


WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :
August 27, 2012, 03:19 PM
Dan Satchell
I see a couple of possible issues with your DM code - although I don't know if they would cause the problem you are experiencing:

-REPEAT :READLINE WHILE &STATUS EQ 0;
-READ INFILE, NOCLOSE &LINE
-SET &STATUS = &IORETURN;
-IF &STATUS NE 0 THEN GOTO :READLINE;

Also, if you want to skip all header lines, wouldn't the code need to be:

-IF &LINENR GE 1 AND &LINENR LE &NRHEADERLINES GOTO :SKPLINE;

Finally, could you simply use your TABT synonyms and create a DEFINE record counter and use it to exclude the headers? Something like this:

DEFINE FILE xxx
 REC_COUNT/I5 = LAST REC_COUNT + 1 ;
END

TABLE/GRAPH FILE xxx
 PRINT /SUM ...
 BY ...
 WHERE (REC_COUNT GT &NRHEADERLINES);
END



WebFOCUS 7.7.05
August 27, 2012, 05:35 PM
Waz
quote:
The report will be reading ~50 of such files (~200 lines, 8 numeric columns each)


I'm coming in late here, but.

You say that you have ~50 200 line files to read, all with 8 columns of numerics

Is each file the same format ?, if so you could just have a generic master for all.

Secondly, if each file has two header records, then definitely strip them off, this can be done many ways.

With DM code
With TABLE FILE
With os call.

As for performance, 200 records is so tiny, there should be minimal hits in pre processing any files.


Waz...

Prod:WebFOCUS 7.6.10/8.1.04Upgrade:WebFOCUS 8.2.07OS:LinuxOutputs:HTML, PDF, Excel, PPT
In Focus since 1984
Pity the lost knowledge of an old programmer!

August 28, 2012, 04:36 AM
Wep5622
quote:
Originally posted by Dan Satchell:
I see a couple of possible issues with your DM code - although I don't know if they would cause the problem you are experiencing:

-REPEAT :READLINE WHILE &STATUS EQ 0;
-READ INFILE, NOCLOSE &LINE
-SET &STATUS = &IORETURN;
-IF &STATUS NE 0 THEN GOTO :READLINE;


Ah, that's just a typo introduced when simplifying my code. The THEN part of IF statements is optional, AFAIK, but it isn't incorrect to add it and IMO it's clearer.

quote:
Also, if you want to skip all header lines, wouldn't the code need to be:

-IF &LINENR GE 1 AND &LINENR LE &NRHEADERLINES GOTO :SKPLINE;


But I don't want to skip all header lines; The column titles are in the first line. If I'd remove those my masters would use the first line of data for the field definitions and that line wouldn't end up in my results as data, would it?

quote:
Finally, could you simply use your TABT synonyms and create a DEFINE record counter and use it to exclude the headers? Something like this:

DEFINE FILE xxx
 REC_COUNT/I5 = LAST REC_COUNT + 1 ;
END

TABLE/GRAPH FILE xxx
 PRINT /SUM ...
 BY ...
 WHERE (REC_COUNT GT &NRHEADERLINES);
END


Rather not, as that would cause my fields to be alphanumeric instead of numeric.
ISTM that the extra header-lines need to be stripped off before creating a master on these files, or the master definition will be incorrect.

I have to say I'm disappointed that there is no option to tell how many (header-)lines to skip when creating a master on a data file. That'd make synonym creation more flexible.


WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :
August 28, 2012, 04:53 AM
Wep5622
quote:
Originally posted by Waz:
Is each file the same format ?, if so you could just have a generic master for all.


Unfortunately not. Some have fewer colunms, and the columns are sometimes for different measures. They do tend to come in groups though, I already combine the files in the same group into a single data-file with a group-column prepended.

quote:
As for performance, 200 records is so tiny, there should be minimal hits in pre processing any files.


It probably is, but the report is competing with an existing report in another product that doesn't need to do all the re-processing that WF appears to be needing. It's too soon to tell whether that will noticeably impact performance, but in my experience file I/O tends to do just that, especially in concurrent situations. Well, we'll see.

Currently the bigger issue is that the processing isn't working reliably. As I mentioned we have header-lines that inexplicably go missing while it looks like the code actually attempted to write those lines.


WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :
August 28, 2012, 05:20 AM
Wep5622
It appears there was a logic error somewhere in my loops (I suspected as much), it's working now.


WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :
August 28, 2012, 05:41 PM
Waz
Just a thought, if it is always the same number of rows (including the header), the using a utility like tail from an OS call could do this with one line of code.


Waz...

Prod:WebFOCUS 7.6.10/8.1.04Upgrade:WebFOCUS 8.2.07OS:LinuxOutputs:HTML, PDF, Excel, PPT
In Focus since 1984
Pity the lost knowledge of an old programmer!

August 29, 2012, 05:17 AM
Wep5622
Heh, if I were on a real operating system then I wouldn't have a problem to begin with Wink

Unfortunately we're on a "tool" called Windows 2003.


WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :
August 29, 2012, 06:31 AM
GamP
quote:
But I don't want to skip all header lines;

just because you use TABT as suffix in the master?
You could also change all TABT to TAB as suffix after having created the mfd, and remove all header lines.


GamP

- Using AS 8.2.01 on Windows 10 - IE11.
in Focus since 1988
August 29, 2012, 08:25 AM
Wep5622
Actually... I need to reference the columns by number to create the graphs, so it is indeed more convenient to just drop all headers! Those pesky field-names just get in the way :P

Good call.


WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :