Focal Point
[SHARING] Help with preventing cartesian scans of OLAP database?

This topic can be found at:
https://forums.informationbuilders.com/eve/forums/a/tpc/f/7971057331/m/576104125

September 03, 2010, 11:42 PM
PeterV
[SHARING] Help with preventing cartesian scans of OLAP database?
Hi. This might be tough for me to explain, as I'm not 100% sure why my report is bombing out.

Here is what I know, possbily someone might have a suggestion for me to resolve the issue I'm dealing with?

I have a very simple report with an SSRS2005 cube as the source.

My code looks like this:

 -SET &completeddate = DATECVT((DATEMOV((DATEADD((DATEADD((DATECVT(&YYMD,'I8YYMD','YYMD')),'D',-6)),'M', -1)), 'EOM')), 'YYMD','I8YYMD');

DEFINE FILE DASHBOARD_MIDNIGHT_INPATIENT_STAY ADD
MonthDatetime/YYMD = HDATE(HINPUT(14, Month1 , 8, 'HYYMD'),'YYMD');
END

TABLE FILE DASHBOARD_MIDNIGHT_INPATIENT_STAY
PRINT
       YTD_Count_of_Discharges_including_Same_Day_Stays/D12
BY HIGHEST 1 MonthDatetime NOPRINT
BY Facility
BY Serv_Code
BY Month1
BY Fiscal_Year1
WHERE MonthDatetime GT '&completeddate' ;
WHERE Acct_Type_Descr EQ 'Inpatient';
ON TABLE NOTOTAL
-*ON TABLE HOLD AS DISCHARGEHOLD
END
 


The amper variable '&completeddate' calculates a date back in time from the current date that we use as the threshold of what to report from... this is a rolling report, so we're always trying to grab it at the appropriate time, when a month's data has been considered "completed" (by business standards).

From what I know of FOCUS, it should process the WHERE clause first, must like MSSQL does... unfortunately, this doesn't seem to be happening in my query.

Watching the logs on the OLAP server, when I fire off this fex, it immediately runs a cartesian scan on the cube server. I would assume this to happen if I weren't limiting the dataset with the initial WHERE clause, but with it, it seems to be forcing a day-by-day calculation across all of the values in the BY fields... resorting in a grotesquely huge amount of data being cached.

BTW, not sure if this matters from a FOCUS perspective or not, but the measure 'YTD_Count_of_Discharges_including_Same_Day_Stays' is a calculated measure summing up the year-to-date counts for this measure.

I don't really have anything else I can say definitively about it... I can't find any documentation that helps, neither can I find forum posts that seem to help. Any ideas? I figured I'd try here first and then open a support ticket if I couldn't get an answer here.

All I need to do is pull back the date set limited to the WHERE filters, without loading the entire cube's worth of data first. It's bombing the agents and maxing CPU usage on the olap servers, which is getting me looked at rather unkindly from our OLAP admins Smiler

Thanks for anything you can offer!

This message has been edited. Last edited by: Kerry,


WebFOCUS v. 7.6.10
___________________
Win2k3 Server
Intel Xeon 2.3Ghz/4Gb Ram Apache / IIS / SQL Srvr 2005
September 04, 2010, 03:50 AM
Dan Satchell
It's difficult to know for certain without seeing the SQL trace, but I suspect the sort on a DEFINEd field (BY HIGHEST 1 MonthDatetime) is causing the problem. Try changing the DEFINE to a COMPUTE, which is performed after retrieval, and then use a BY TOTAL:

-SET &completeddate = DATECVT((DATEMOV((DATEADD((DATEADD((DATECVT(&YYMD,'I8YYMD','YYMD')),'D',-6)),'M', -1)), 'EOM')), 'YYMD','I8YYMD');

TABLE FILE DASHBOARD_MIDNIGHT_INPATIENT_STAY
 PRINT YTD_Count_of_Discharges_including_Same_Day_Stays/D12
 COMPUTE MonthDatetime/YYMD = HDATE(HINPUT(14, Month1 , 8, 'HYYMD'),'YYMD');
 BY TOTAL HIGHEST 1 MonthDatetime NOPRINT
 BY Facility
 BY Serv_Code
 BY Month1
 BY Fiscal_Year1
 WHERE MonthDatetime GT '&completeddate' ;
 WHERE Acct_Type_Descr EQ 'Inpatient';
 ON TABLE NOTOTAL
-*ON TABLE HOLD AS DISCHARGEHOLD
END



WebFOCUS 7.7.05
September 04, 2010, 05:48 PM
PeterV
That was a good suggestion, I didn't think of processing the date in a COMPUTE vs a DEFINE. I'll have to remember that for other stuff going forward.

Coincidentally, the code runs if you make your WHERE into a WHERE TOTAL MonthDateTime.

Unfortunately, it didn't resolve the problem, it's still bombing the OLAP server.

Would a SQL Trace file help? I could get one run on Tues, since this is our Labor Day weekend and things are closed until then.

Appreciate the help though, very much! There's still so much that our team is learning about this language and how it processes Smiler


WebFOCUS v. 7.6.10
___________________
Win2k3 Server
Intel Xeon 2.3Ghz/4Gb Ram Apache / IIS / SQL Srvr 2005
September 05, 2010, 03:17 AM
Dan Satchell
Yes, WHERE TOTAL is required for filtering on a COMPUTEd field and is not applied until after data retrieval. So the approach I suggested above won't help with your problem. I think you will need to convert your &completeddate to a format that can be used in a WHERE clause for comparison with column Month1. This should reduce the number of rows being returned. Something like this:

-SET &completeddate = DATECVT((DATEMOV((DATEADD((DATEADD((DATECVT(&YYMD,'I8YYMD','YYMD')),'D',-6)),'M',-1)),'EOM')),'YYMD','I8YYMD');
-SET &xcompleteddate = HCNVRT(HDTTM(DATECVT(&completeddate,'I8YYMD','YYMD'),12,'HYYMDm'),'(H14)',14,'A14');

DEFINE FILE DASHBOARD_MIDNIGHT_INPATIENT_STAY
 MonthDatetime/YYMD = HDATE(HINPUT(14,Month1,8,'HYYMD'),'YYMD');
END

TABLE FILE DASHBOARD_MIDNIGHT_INPATIENT_STAY
 PRINT YTD_Count_of_Discharges_including_Same_Day_Stays/D12
 BY HIGHEST 1 MonthDatetime NOPRINT
 BY Facility
 BY Serv_Code
 BY Month1
 BY Fiscal_Year1
 WHERE Month1 GT '&xcompleteddate';
 WHERE Acct_Type_Descr EQ 'Inpatient';
 ON TABLE NOTOTAL
-*ON TABLE HOLD AS DISCHARGEHOLD
END



WebFOCUS 7.7.05
September 06, 2010, 09:30 PM
j.gross
Your request will generate a value for YTD_Count_of_Discharges_including_Same_Day_Stays, for each distinct combination of (Facility, Serv_Code, Month1, Fiscal_Year1), and then discard rows whose Month1 does not map to the HIGHEST MonthDatetime value. Is the time portion of Month1 constant or variable?

What is the generated SQL for the TABLE request as is, and when you remove the BY HIGHEST line?


- Jack Gross
WF through 8.1.05
September 07, 2010, 11:31 AM
PeterV
Thanks again Dan. It didnt' fix it, but I appreciate the suggestion. I learn a bit more each time.

Jack, I'm meeting with our OLAP admins this mornign to run a trace on the OLAP server to see what is actually being sent. I'll post the results later.

The Month1 is an A14 char type in the synonymn, dates are stored like 'Aug 2010'. I ended up re-COMPUTEing it differently... this iteration looked like this:
 MonthDateTime/YYMD=HDATE(HINPUT(14, Month1 , 8, 'HYYMD'), 'YYMD') 


I thought maybe I was converting the date incorrectly before. It formats ok, but again, still immediately blasts the OLAP server.

Oh well, this is one of those 'lessons learned in pain' I won't quickly forget, once we figure it out. Again, very much appreciate the suggestions and insight.


WebFOCUS v. 7.6.10
___________________
Win2k3 Server
Intel Xeon 2.3Ghz/4Gb Ram Apache / IIS / SQL Srvr 2005
September 07, 2010, 03:22 PM
j.gross
You are screening twice on the defined field MonthDatetime

1. BY HIGHEST 1 MonthDatetime NOPRINT
2. WHERE MonthDatetime GT '&completeddate' ;

If at all possible, rephrase both in terms of a database date field.

If you cannot get the limitation to a single month passed through, I would make it a two-stage affair -- hold the list of Month1 values that lie in the "HIGHEST 1 MonthDatetime" and satisfy other constraints, and then impose the date-range limit on the report by reference to that hold file ("WHERE Month1 IN FILE hold;")
September 08, 2010, 11:56 AM
PeterV
Yeah, we see that as soon as the query executes, it pretty much opens all of the dimensions on the cube server and starts sucking in data.

We figured out a way around this... not sure if it's the most elegant solution or not (and I'd welcome opinions on optimizing further), but it is functional and seems sustainable.

We have a FiscalYear field. I created two rolling variables, one reflecting 1 year back, one reflecting one year forward. I made a FISCALYEAR FROM TO statement as the first WHERE, and limited the data to a 3 year maximum, then ran the rest of the WHERE clause. It allows the data to return in about 30 seconds, which is acceptable for these reports, since they're monthly generated and not done as ad hoc end-user reporting.

Appreciate so much the valuable insight though. And again, welcome any critiques or suggestions on this as a 'solution'.

Peter


WebFOCUS v. 7.6.10
___________________
Win2k3 Server
Intel Xeon 2.3Ghz/4Gb Ram Apache / IIS / SQL Srvr 2005