 As of December 1, 2020, Focal Point is retired and repurposed as a reference repository. We value the wealth of knowledge that's been shared here over the years. You'll continue to have access to this treasure trove of knowledge, for search purposes only. Moving forward, myibi is our community platform to learn, share, and collaborate. We have the same Focal Point forum categories in myibi, so you can continue to have all new conversations there. If you need access to myibi, contact us at myibi@ibi.com and provide your corporate email address, company, and name. Connect to myibi
Focal Point Focal Point Forums WebFOCUS/FOCUS Forum on Focal Point [CLOSED] Median or Standard Deviation Read-Only Topic
 Go Search Notify Tools [CLOSED] Median or Standard Deviation Silver Member posted August 10, 2004 03:03 PM
Hi

I tried searching the board but didnt find anything. Is there a way to calculate the median or standard deviation of a data set in WebFocus ?

Thanks
Stephen

This message has been edited. Last edited by: Kerry,

 Posts: 38 | Registered: May 20, 2004 IP
Silver Member posted August 10, 2004 03:15 PM Hide Post
Stephen ,
This is an old problem for WF.
I just calculated standard deviation yesterday with WF 5.2.1 on AIX (self service application)
Unfortunately you have to write the formula using the prefix ASQ. AVE. and then SQRT.
it would be better for you to use "Sorting with multiple Display Command" (see Manual)
Paolo

 Posts: 36 | Location: Bologna Italy | Registered: March 11, 2004 IP
Silver Member posted August 10, 2004 03:16 PM Hide Post
Hmmm. Thanks Ill give it a try.

Stephen

 Posts: 38 | Registered: May 20, 2004 IP
Platinum Member posted August 10, 2004 03:57 PM Hide Post
The formula for Standard Deviation in an example..
TABLE FILE CAR
SUM ASQ.WEIGHT AVE.WEIGHT CNT.WEIGHT
COMPUTE STD=((ASQ.WEIGHT-AVE.WEIGHT *AVE.WEIGHT) **.5) /CNT.WEIGHT ;
END

The reason its not a function in WebFocus is that
often the denominator has to be (N-1) and not N
as in this example.

 Posts: 226 | Registered: June 08, 2003 IP
Platinum Member posted August 10, 2004 04:06 PM Hide Post
The formula for MEDIAN is shown in this example..
TABLE FILE CAR
PRINT WEIGHT NOPRINT
COMPUTE MEDIAN=MEDIAN +1 ;
WHERE TOTAL MEDIAN EQ 9 ;
BY WEIGHT

END

Note that the number of values has to be known
in advance (9). But you can find this out
automatically , eg
TABLE FILE CAR
RRINT WEIGHT AND HOLD
END
-SET &COUNT = &LINES ;
TABLE FILE CAR
PRINT WEIGHT NOPRINT
COMPUTE MEDIAN=MEDIAN +1 ;
WHERE TOTAL MEDIAN EQ &LINES ;
BY WEIGHT

END

 Posts: 226 | Registered: June 08, 2003 IP
Platinum Member posted August 10, 2004 04:09 PM Hide Post
Slight error in median.
TABLE FILE CAR
RRINT WEIGHT AND HOLD
END
-SET &COUNT = &LINES ;
TABLE FILE CAR
PRINT WEIGHT NOPRINT
COMPUTE MEDIAN=MEDIAN +1 ;
WHERE TOTAL MEDIAN EQ &COUNT ;
BY WEIGHT

END

if the &COUNT is even then often it and the next value is used and averaged.

 Posts: 226 | Registered: June 08, 2003 IP
Silver Member posted August 10, 2004 04:34 PM Hide Post
Great thanks.

 Posts: 38 | Registered: May 20, 2004 IP
Expert posted August 10, 2004 04:57 PM Hide Post
Stephen, these need to be manual calculations, which imho is a good thing, bcause you can calc stats at levels within the data set, and not just for the entire data set. You can also check the shape of your distribution so that you know if your SD means anything at all,and youcancalc your own confidence limits, while you're at it. So it takes a few passes at the data to do this. Here's what works for me: For SD: you'll need your sample size N and you can get this as an &var ,if you want,by counting lines when you prepare your dataset, and you'll need to be able to exponentiate. Remember that in focus X**Y will perform whatever exponentiation you need, and X**.5 for example will take the square root. You can use one of the SD formulae and the Confidence Limit formulae that you'll find in Excel Help, to remind you how to set it all up. For Median: you'll need to order your data set, then use the LIST verb to number them, take the max of that LIST column, cut it in half, and then mark your data set where the values change from below that mark to above it, MAKING YOUR OWN decision about what to do if your median falls between two values, and proceed accordingly. I like to do these things manually, because I can then compute weighted medians, which is most common, or rank the groups of my entire data set inorder of , say, Confidence, Or present a report of member variance compared to group variance, and show the bad guys at the top of the list. Bored yet?

 Posts: 3811 | Location: Manhattan | Registered: October 28, 2003 IP
<Pietro De Santis> posted August 10, 2004 05:36 PM
Yikes! I hope I never have to do that!  IP
Member posted August 10, 2004 06:25 PM Hide Post
While Gerry's answer is (of course) correct, it's fairly simple, and uses TWO passes. The MEDIAN can be calculated with a single pass, and within sort fields, but it's a bit more complex. Below is annotated code, to allow this. It uses multi-verb, as well as COMPUTEs, and WHERE TOTAL tests.

-* median is the 'middle' value (if odd number of values), or the
-* average of the two middle values (if even number)
-* when all the values are sorted
TABLE FILE &FILE
-* use cnt. to get number of instances
WRITE CNT.&FIELD NOPRINT
COMPUTE
-* use this if even number of instances
M1/I4=INT((CNT.&FIELD + 1)/2); NOPRINT COMPUTE
-* use this too for even. if odd number of instances - same as m1
M2/I4=INT((CNT.&FIELD + 2)/2); NOPRINT
-* sort field(s) within which to calculate median
BY &BYFIELD
-* get details in sort fields and 'median'ed field order
PRINT
COMPUTE
-* concat all by fields to compute the xlist (for when to reset)
BYFIELDS/A10=&BYFIELD; NOPRINT
-* use this compute to order the field within sorts
COMPUTE XLIST/I5=IF BYFIELDS EQ LAST BYFIELDS THEN XLIST + 1 ELSE 1; NOPRINT
-* get cumulative values for median calculation
-* if count is odd, M1 = M2, and only one value. If even, need average
COMPUTE C&FIELD/&FMT = IF M1 NE M2
THEN (LAST &FIELD + &FIELD)/2 ELSE &FIELD;
AS 'MEDIAN,&FIELD'
BY &BYFIELD BY &FIELD
-* only look at possible median values (m1 = m2 if odd, only m2 if even)
WHERE TOTAL XLIST EQ M2
END

This message has been edited. Last edited by: <Mabel>,

 Posts: 25 | Location: 2 Penn Plaza 28 fl | Registered: March 27, 2003 IP
Silver Member posted August 11, 2004 11:36 AM Hide Post
Gerald ,
the formula is wrong.
A wonderful old IBI Manual "Statistical Analyse" says that for computational puprposes is used:
TABLE FILE CAR
SUM ASQ.WEIGHT AVE.WEIGHT CNT.WEIGHT
COMPUTE STD=(
(ASQ.WEIGHT-CNT.WEIGHT*AVE.WEIGHT*AVE.WEIGHT) **.5) /(CNT.WEIGHT -1); END

Paolo

 Posts: 36 | Location: Bologna Italy | Registered: March 11, 2004 IP
Platinum Member posted August 11, 2004 03:56 PM Hide Post
Paulo,
Nope, your formula would give a negative
number. Actually you can take my formula
and multiple by cnt.x/(cnt.x -1) to adjust for the degrees of freedom.

 Posts: 226 | Registered: June 08, 2003 IP
Member posted August 11, 2004 05:37 PM Hide Post
Since we're talking FOCUS, you can get both standard deviation and variance with a LET, as follows:

LET
-* Population variance
PVAR = COMPUTE PVAR.<1>=(ASQ.<1>-(AVE.<1>*AVE.<1>));;
-* sample variance ( population variance time N/(N-1)
SVAR = CNT.<1> NOPRINT COMPUTE # SVAR2 <1>;
SVAR2= SVAR.<1>=(ASQ.<1>-(AVE.<1>*AVE.<1>)) * (CNT.<1>/(CNT.<1>-1)) ;;
-* population standard deviation
PSDEV = PVAR <1> # NOPRINT COMPUTE PSDEV.<1>=SQRT(PVAR.<1>);;
-* sample standard deviation
SSDEV = SVAR <1> # NOPRINT COMPUTE SSDEV.<1>=SQRT(SVAR.<1>);;
END So, to get the POPULATION variance of WEIGHT, you'd use:

PVAR WEIGHT

to get the SAMPLE standard deviation, you'd use:

SSDEV WEIGHT

This message has been edited. Last edited by: <Mabel>,

 Posts: 25 | Location: 2 Penn Plaza 28 fl | Registered: March 27, 2003 IP
Silver Member posted August 12, 2004 10:17 AM Hide Post
I'm sorry but I repeat that the formula is wrong.

I try

STDEVA1=(ASQ.var-(AVE.var*AVE.var))*(CNT.var/(CNT.var-1)) ;
STDEV1=SQRT(STDEVA1);
and
STDEVA=(ASQ.var-(CNT.var*(AVE.var)**2))/(CNT.var-1));
STDEV=SQRT(STDEVA);

on my sample :
var
6,97
16,99
19,1
45,52
25,64
3,02
384,81
26,00

Then I test both with excel
and the good formula is my formula STDEV.

STDEV1= 411,04
STDEV= 129,47
ST.DEV (excel)= 129,4718873

Paolo

 Posts: 36 | Location: Bologna Italy | Registered: March 11, 2004 IP
Member posted August 17, 2004 02:40 PM Hide Post
Paolo,

I'm not sure what your ENTIRE request was, but I took your data, loaded it into a file, and ran your code, as follows. I had to remove the final right paren (one too many), and got the following results:

STDEVA1 STDEV1 STDEVA STDEV
------- ------ ------ -----
16,762.95 129.47 -2,261.45 .00

Seems correct to me, for the calculation we gave.

Here's the code:

TABLE FILE PAOLO
SUM CNT.DATA NOPRINT

COMPUTE
STDEVA1=(ASQ.DATA-(AVE.DATA*AVE.DATA))*(CNT.DATA/(CNT.DATA-1)) ;
STDEV1=SQRT(STDEVA1);

STDEVA=(ASQ.DATA-(CNT.DATA*(AVE.DATA)**2))/(CNT.DATA-1);
STDEV=SQRT(STDEVA);
END
Now, my Stats book defines VARIANCE (St. Dev. squared) as:

(n * SUM(x**2) - (SUM(x))**2)/n**2

which can be re-arranged as:

(n * SUM(x**2))/n**2 - (SUM(x)**2)/n**2

which becomes:

SUM(x**2)/n - (SUM(x)/n)**2

The first term is the definition of ASQ; the second is the average squared. The last part is giving sample values (uses n-1 instead of n instances). This is the calculation given.

This message has been edited. Last edited by: <Mabel>,

 Posts: 25 | Location: 2 Penn Plaza 28 fl | Registered: March 27, 2003 IP
Member posted August 17, 2004 08:22 PM Hide Post
Not to belabor the point, but I found an old ANALYSE manual, dated 10/91. The formula for VARIANCE (Std. Dev. squared) is given as:

(Sum(x**2) - n * ave.x**2)/(n-1)

That first term is ASQ.x * n, NOT ASQ.x

Maybe THAT's what's causing the confusion.

 Posts: 25 | Location: 2 Penn Plaza 28 fl | Registered: March 27, 2003 IP
Silver Member posted August 19, 2004 04:39 PM Hide Post
Thanks to all for the input.

 Posts: 38 | Registered: May 20, 2004 IP
Silver Member posted August 23, 2004 12:55 PM Hide Post
I must apologize for myself,
I used ASQ.x * n while I wrote ASQ.x without n

My justification is that
my query is very complex within multiple
sort fields and I went wrong to copy in the mail
Paolo

 Posts: 36 | Location: Bologna Italy | Registered: March 11, 2004 IP
Virtuoso posted May 01, 2012 09:24 AM Hide Post
quote:
SUM(x**2)/n - (SUM(x)/n)**2

The first term is the definition of ASQ; the second is the average squared. The last part is giving sample values (uses n-1 instead of n instances). This is the calculation given.

I was looking for a standard deviation function myself. The above definition is exactly what wikipedia gives as: In FOCUS, that appears to translate to:
```COMPUTE STD = (ASQ.WEIGHT - (AVE.WEIGHT ** 2)) ** 0.5;
```

WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :

 Posts: 1669 | Location: Enschede, Netherlands | Registered: August 12, 2010 IP
Gold member posted May 02, 2012 01:22 PM Hide Post
This is an example of the standard deviation code I have used for years.
```-* File Standard_Deviation.fex
LET
-* Population variance
PVAR = COMPUTE PVAR.<1>=(ASQ.<1>-(AVE.<1>*AVE.<1>));;
-* sample variance ( population variance time N/(N-1)
SVAR = CNT.<1> NOPRINT COMPUTE # SVAR2 <1>;
SVAR2 = SVAR.<1>=(ASQ.<1>-(AVE.<1>*AVE.<1>)) * (CNT.<1>/(CNT.<1>-1)) ;;
-* population standard deviation
PSDEV = PVAR <1> # NOPRINT COMPUTE PSDEV.<1>=SQRT(PVAR.<1>);;
-* sample standard deviation
SSDEV = SVAR <1> # NOPRINT COMPUTE SSDEV.<1>=SQRT(SVAR.<1>);;
END

SET ASNAMES=ON

TABLE FILE CAR
SUM
AVE.MPG AS AVE_MPG
MIN.MPG AS MIN_MPG
MAX.MPG AS MAX_MPG
SSDEV MPG AS SD_MPG
CNT.MPG AS N
BY COUNTRY
ON TABLE HOLD AS HLD_STATS
END
DEFINE FILE HLD_STATS
SE_MPG/D16.6 = SD_MPG/SQRT(N);
END
TABLE FILE HLD_STATS
SUM
AVE_MPG
MIN_MPG
MAX_MPG
SD_MPG
N
SE_MPG
BY COUNTRY
END
```

 Posts: 60 | Location: Ellensburg Washington | Registered: May 22, 2009 IP
Platinum Member posted May 03, 2012 08:47 AM Hide Post
Not a single word about ANALYSE that was supposed to be a SAS competitor.
Ce que c'est que de nous
( Ashes to ashes, if you prefer )

Focus Mainframe 7.6.11
Dev Studio 7.6.11 and !!!
PC Focus, Focus for OS/2, FFW Six, MSO

 Posts: 134 | Registered: November 06, 2007 IP
Gold member posted May 04, 2012 01:01 PM Hide Post
ANALYSE code ran for many versions after it was no longer documented. I had to rewrite a lot of reports upon its demise.

 Posts: 60 | Location: Ellensburg Washington | Registered: May 22, 2009 IP
Platinum Member posted June 27, 2012 04:46 AM Hide Post
But it still works, if not maintained.
The difficult point is that the Masters created by Analyse are not correct and Focus can't read the Data created by Analyse.
I manually correct the master and I can have Analyse at Work.
Of course, it's not very professional, but some old cow-boys like to spit into a copper spitoon and drink an old scotch with old Focusians friends.
Focusely and Cordially

Focus Mainframe 7.6.11
Dev Studio 7.6.11 and !!!
PC Focus, Focus for OS/2, FFW Six, MSO

 Posts: 134 | Registered: November 06, 2007 IP
Virtuoso posted June 02, 2014 05:37 AM Hide Post
quote:
Originally posted by Wep5622:
quote:
SUM(x**2)/n - (SUM(x)/n)**2

The first term is the definition of ASQ; the second is the average squared. The last part is giving sample values (uses n-1 instead of n instances). This is the calculation given.

I was looking for a standard deviation function myself. The above definition is exactly what wikipedia gives as: In FOCUS, that appears to translate to:
```COMPUTE STD = (ASQ.WEIGHT - (AVE.WEIGHT ** 2)) ** 0.5;
```

I just found out that this formula is incorrect!

The definition of the formula (according to wikipedia) contains SUM((x - AVE.x) **2), which is not equivalent to SUM(x **2) - SUM(AVE.x **2)!

Equivalent would be (if I remember my high-school math correctly):
```    s = SUM(x**2 - 2*x*AVE.x + AVE.x**2)
<=> s = SUM(x**2) - 2*SUM(AVE.x)*SUM(x) + ASQ.x
```

I think this is a correct implementation:
```TABLE FILE EXAMPLE
SUM AVE.X AS X_AVE
BY GROUP
ON TABLE HOLD AS EXAMPLE_AVG FORMAT FOCUS INDEX GROUP
END

JOIN GROUP IN EXAMPLE TO GROUP IN EXAMPLE_AVG AS J0
DEFINE FILE EXAMPLE
DEV/D20.2 = (X - X_AVG) **2;
END
TABLE FILE EXAMPLE
COMPUTE DEV_AVG_SMP/D20.2 = DEV / (CNT.X -1); NOPRINT

COMPUTE STDDEV_POP/D20.2 = AVE.DEV ** 0.5;
COMPUTE STDDEV_SMP/D20.2 = DEV_AVG_SMP ** 0.5;
BY GROUP
END
```

The complexity of this issue raises the question why there is no standard implementation in WebFOCUS! Do we have an ETA for that? It's dearly needed.

This message has been edited. Last edited by: Wep5622,

WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :

 Posts: 1669 | Location: Enschede, Netherlands | Registered: August 12, 2010 IP
Virtuoso posted June 03, 2014 06:26 AM Hide Post
Easier yet, but not always an option: use SQL passthru and calculate stddev there.

That can be done more often than you'd think, because many databases support windowing aggregate functions these days, so you can do stuff like:
```SELECT stddev_samp(X) over (PARTITION BY GROUP) AS X_STDDEV, X
FROM EXAMPLE
;
```

Which is quite a bit more flexible than the "old":
```SELECT stddev_samp(X) AS X_STDDEV
FROM EXAMPLE
GROUP BY X
;
```

WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010
: Member of User Group Benelux :

 Posts: 1669 | Location: Enschede, Netherlands | Registered: August 12, 2010 IP Read-Only Topic
Focal Point Focal Point Forums WebFOCUS/FOCUS Forum on Focal Point [CLOSED] Median or Standard Deviation