Focal Point

[CLOSED] Median or Standard Deviation

This topic can be found at:

http://forums.informationbuilders.com/eve/forums/a/tpc/f/7971057331/m/7441097331

http://forums.informationbuilders.com/eve/forums/a/tpc/f/7971057331/m/7441097331

August 10, 2004, 03:03 PM

[CLOSED] Median or Standard Deviation

HiI tried searching the board but didnt find anything. Is there a way to calculate the median or standard deviation of a data set in WebFocus ?

Thanks

Stephen

August 10, 2004, 03:15 PM

This is an old problem for WF.

I just calculated standard deviation yesterday with WF 5.2.1 on AIX (self service application)

Unfortunately you have to write the formula using the prefix ASQ. AVE. and then SQRT.

it would be better for you to use "Sorting with multiple Display Command" (see Manual)

Paolo

August 10, 2004, 03:16 PM

Stephen

August 10, 2004, 03:57 PM

TABLE FILE CAR

SUM ASQ.WEIGHT AVE.WEIGHT CNT.WEIGHT

COMPUTE STD=((ASQ.WEIGHT-AVE.WEIGHT *AVE.WEIGHT) **.5) /CNT.WEIGHT ;

END

The reason its not a function in WebFocus is that

often the denominator has to be (N-1) and not N

as in this example.

August 10, 2004, 04:06 PM

TABLE FILE CAR

PRINT WEIGHT NOPRINT

COMPUTE MEDIAN=MEDIAN +1 ;

WHERE TOTAL MEDIAN EQ 9 ;

BY WEIGHT

END

Note that the number of values has to be known

in advance (9). But you can find this out

automatically , eg

TABLE FILE CAR

RRINT WEIGHT AND HOLD

END

-SET &COUNT = &LINES ;

TABLE FILE CAR

PRINT WEIGHT NOPRINT

COMPUTE MEDIAN=MEDIAN +1 ;

WHERE TOTAL MEDIAN EQ &LINES ;

BY WEIGHT

END

August 10, 2004, 04:09 PM

TABLE FILE CAR

RRINT WEIGHT AND HOLD

END

-SET &COUNT = &LINES ;

TABLE FILE CAR

PRINT WEIGHT NOPRINT

COMPUTE MEDIAN=MEDIAN +1 ;

WHERE TOTAL MEDIAN EQ &COUNT ;

BY WEIGHT

END

if the &COUNT is even then often it and the next value is used and averaged.

August 10, 2004, 04:34 PM

August 10, 2004, 04:57 PM

August 10, 2004, 05:36 PM

August 10, 2004, 06:25 PM

-* median is the 'middle' value (if odd number of values), or the

-* average of the two middle values (if even number)

-* when all the values are sorted

TABLE FILE &FILE

-* use cnt. to get number of instances

WRITE CNT.&FIELD NOPRINT

COMPUTE

-* use this if even number of instances

M1/I4=INT((CNT.&FIELD + 1)/2); NOPRINT COMPUTE

-* use this too for even. if odd number of instances - same as m1

M2/I4=INT((CNT.&FIELD + 2)/2); NOPRINT

-* sort field(s) within which to calculate median

BY &BYFIELD

-* get details in sort fields and 'median'ed field order

COMPUTE

-* concat all by fields to compute the xlist (for when to reset)

BYFIELDS/A10=&BYFIELD; NOPRINT

-* use this compute to order the field within sorts

COMPUTE XLIST/I5=IF BYFIELDS EQ LAST BYFIELDS THEN XLIST + 1 ELSE 1; NOPRINT

-* get cumulative values for median calculation

-* if count is odd, M1 = M2, and only one value. If even, need average

COMPUTE C&FIELD/&FMT = IF M1 NE M2

THEN (LAST &FIELD + &FIELD)/2 ELSE &FIELD;

AS 'MEDIAN,&FIELD'

BY &BYFIELD BY &FIELD

-* only look at possible median values (m1 = m2 if odd, only m2 if even)

WHERE TOTAL XLIST EQ M2

END

August 11, 2004, 11:36 AM

the formula is wrong.

A wonderful old IBI Manual "Statistical Analyse" says that for computational puprposes is used:

TABLE FILE CAR

SUM ASQ.WEIGHT AVE.WEIGHT CNT.WEIGHT

COMPUTE STD=(

(ASQ.WEIGHT-CNT.WEIGHT*AVE.WEIGHT*AVE.WEIGHT) **.5) /(CNT.WEIGHT -1); END

Paolo

August 11, 2004, 03:56 PM

Nope, your formula would give a negative

number. Actually you can take my formula

and multiple by cnt.x/(cnt.x -1) to adjust for the degrees of freedom.

August 11, 2004, 05:37 PM

LET

-* Population variance

PVAR = COMPUTE PVAR.<1>=(ASQ.<1>-(AVE.<1>*AVE.<1>));;

-* sample variance ( population variance time N/(N-1)

SVAR = CNT.<1> NOPRINT COMPUTE # SVAR2 <1>;

SVAR2= SVAR.<1>=(ASQ.<1>-(AVE.<1>*AVE.<1>)) * (CNT.<1>/(CNT.<1>-1)) ;;

-* population standard deviation

PSDEV = PVAR <1> # NOPRINT COMPUTE PSDEV.<1>=SQRT(PVAR.<1>);;

-* sample standard deviation

SSDEV = SVAR <1> # NOPRINT COMPUTE SSDEV.<1>=SQRT(SVAR.<1>);;

END So, to get the POPULATION variance of WEIGHT, you'd use:

PVAR WEIGHT

to get the SAMPLE standard deviation, you'd use:

SSDEV WEIGHT

August 12, 2004, 10:17 AM

I try

STDEVA1=(ASQ.var-(AVE.var*AVE.var))*(CNT.var/(CNT.var-1)) ;

STDEV1=SQRT(STDEVA1);

and

STDEVA=(ASQ.var-(CNT.var*(AVE.var)**2))/(CNT.var-1));

STDEV=SQRT(STDEVA);

on my sample :

var

6,97

16,99

19,1

45,52

25,64

3,02

384,81

26,00

Then I test both with excel

and the good formula is my formula STDEV.

STDEV1= 411,04

STDEV= 129,47

ST.DEV (excel)= 129,4718873

Paolo

August 17, 2004, 02:40 PM

I'm not sure what your ENTIRE request was, but I took your data, loaded it into a file, and ran your code, as follows. I had to remove the final right paren (one too many), and got the following results:

STDEVA1 STDEV1 STDEVA STDEV

------- ------ ------ -----

16,762.95 129.47 -2,261.45 .00

Seems correct to me, for the calculation we gave.

Here's the code:

TABLE FILE PAOLO

SUM CNT.DATA NOPRINT

COMPUTE

STDEVA1=(ASQ.DATA-(AVE.DATA*AVE.DATA))*(CNT.DATA/(CNT.DATA-1)) ;

STDEV1=SQRT(STDEVA1);

STDEVA=(ASQ.DATA-(CNT.DATA*(AVE.DATA)**2))/(CNT.DATA-1);

STDEV=SQRT(STDEVA);

END

Now, my Stats book defines VARIANCE (St. Dev. squared) as:

(n * SUM(x**2) - (SUM(x))**2)/n**2

which can be re-arranged as:

(n * SUM(x**2))/n**2 - (SUM(x)**2)/n**2

which becomes:

SUM(x**2)/n - (SUM(x)/n)**2

The first term is the definition of ASQ; the second is the average squared. The last part is giving sample values (uses n-1 instead of n instances). This is the calculation given.

August 17, 2004, 08:22 PM

(Sum(x**2) - n * ave.x**2)/(n-1)

That first term is ASQ.x * n, NOT ASQ.x

Maybe THAT's what's causing the confusion.

August 19, 2004, 04:39 PM

August 23, 2004, 12:55 PM

I used ASQ.x * n while I wrote ASQ.x without n

My justification is that

my query is very complex within multiple

sort fields and I went wrong to copy in the mail

Paolo

May 01, 2012, 09:24 AM

quote:SUM(x**2)/n - (SUM(x)/n)**2

The first term is the definition of ASQ; the second is the average squared. The last part is giving sample values (uses n-1 instead of n instances). This is the calculation given.

I was looking for a standard deviation function myself. The above definition is exactly what wikipedia gives as:

In FOCUS, that appears to translate to:

COMPUTE STD = (ASQ.WEIGHT - (AVE.WEIGHT ** 2)) ** 0.5;

WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010

: Member of User Group Benelux :

May 02, 2012, 01:22 PM

-* File Standard_Deviation.fex LET -* Population variance PVAR = COMPUTE PVAR.<1>=(ASQ.<1>-(AVE.<1>*AVE.<1>));; -* sample variance ( population variance time N/(N-1) SVAR = CNT.<1> NOPRINT COMPUTE # SVAR2 <1>; SVAR2 = SVAR.<1>=(ASQ.<1>-(AVE.<1>*AVE.<1>)) * (CNT.<1>/(CNT.<1>-1)) ;; -* population standard deviation PSDEV = PVAR <1> # NOPRINT COMPUTE PSDEV.<1>=SQRT(PVAR.<1>);; -* sample standard deviation SSDEV = SVAR <1> # NOPRINT COMPUTE SSDEV.<1>=SQRT(SVAR.<1>);; END SET ASNAMES=ON TABLE FILE CAR SUM AVE.MPG AS AVE_MPG MIN.MPG AS MIN_MPG MAX.MPG AS MAX_MPG SSDEV MPG AS SD_MPG CNT.MPG AS N BY COUNTRY ON TABLE HOLD AS HLD_STATS END DEFINE FILE HLD_STATS SE_MPG/D16.6 = SD_MPG/SQRT(N); END TABLE FILE HLD_STATS SUM AVE_MPG MIN_MPG MAX_MPG SD_MPG N SE_MPG BY COUNTRY END

May 03, 2012, 08:47 AM

Ce que c'est que de nous

( Ashes to ashes, if you prefer )

Focus Mainframe 7.6.11

Dev Studio 7.6.11 and !!!

PC Focus, Focus for OS/2, FFW Six, MSO

May 04, 2012, 01:01 PM

June 27, 2012, 04:46 AM

The difficult point is that the Masters created by Analyse are not correct and Focus can't read the Data created by Analyse.

I manually correct the master and I can have Analyse at Work.

Of course, it's not very professional, but some old cow-boys like to spit into a copper spitoon and drink an old scotch with old Focusians friends.

Focusely and Cordially

Focus Mainframe 7.6.11

Dev Studio 7.6.11 and !!!

PC Focus, Focus for OS/2, FFW Six, MSO

June 02, 2014, 05:37 AM

quote:Originally posted by Wep5622:quote:SUM(x**2)/n - (SUM(x)/n)**2

The first term is the definition of ASQ; the second is the average squared. The last part is giving sample values (uses n-1 instead of n instances). This is the calculation given.

I was looking for a standard deviation function myself. The above definition is exactly what wikipedia gives as:

In FOCUS, that appears to translate to:COMPUTE STD = (ASQ.WEIGHT - (AVE.WEIGHT ** 2)) ** 0.5;

I just found out that this formula is incorrect!

The definition of the formula (according to wikipedia) contains SUM((x - AVE.x) **2), which is

Equivalent would be (if I remember my high-school math correctly):

s = SUM(x**2 - 2*x*AVE.x + AVE.x**2) <=> s = SUM(x**2) - 2*SUM(AVE.x)*SUM(x) + ASQ.x

I think this is a correct implementation:

TABLE FILE EXAMPLE SUM AVE.X AS X_AVE BY GROUP ON TABLE HOLD AS EXAMPLE_AVG FORMAT FOCUS INDEX GROUP END JOIN GROUP IN EXAMPLE TO GROUP IN EXAMPLE_AVG AS J0 DEFINE FILE EXAMPLE DEV/D20.2 = (X - X_AVG) **2; END TABLE FILE EXAMPLE COMPUTE DEV_AVG_SMP/D20.2 = DEV / (CNT.X -1); NOPRINT COMPUTE STDDEV_POP/D20.2 = AVE.DEV ** 0.5; COMPUTE STDDEV_SMP/D20.2 = DEV_AVG_SMP ** 0.5; BY GROUP END

The complexity of this issue raises the question why there is no standard implementation in WebFOCUS! Do we have an ETA for that? It's dearly needed.

WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010

: Member of User Group Benelux :

June 03, 2014, 06:26 AM

That can be done more often than you'd think, because many databases support windowing aggregate functions these days, so you can do stuff like:

SELECT stddev_samp(X) over (PARTITION BY GROUP) AS X_STDDEV, X FROM EXAMPLE ;

Which is quite a bit more flexible than the "old":

SELECT stddev_samp(X) AS X_STDDEV FROM EXAMPLE GROUP BY X ;

WebFOCUS 8.1.03, Windows 7-64/2008-64, IBM DB2/400, Oracle 11g & RDB, MS SQL-Server 2005, SAP, PostgreSQL 11, Output: HTML, PDF, Excel 2010

: Member of User Group Benelux :