Focal Point Banner


As of December 1, 2020, Focal Point is retired and repurposed as a reference repository. We value the wealth of knowledge that's been shared here over the years. You'll continue to have access to this treasure trove of knowledge, for search purposes only.

Join the TIBCO Community
TIBCO Community is a collaborative space for users to share knowledge and support one another in making the best use of TIBCO products and services. There are several TIBCO WebFOCUS resources in the community.

  • From the Home page, select Predict: WebFOCUS to view articles, questions, and trending articles.
  • Select Products from the top navigation bar, scroll, and then select the TIBCO WebFOCUS product page to view product overview, articles, and discussions.
  • Request access to the private WebFOCUS User Group (login required) to network with fellow members.

Former myibi community members should have received an email on 8/3/22 to activate their user accounts to join the community. Check your Spam folder for the email. Please get in touch with us at community@tibco.com for further assistance. Reference the community FAQ to learn more about the community.


Focal Point    Focal Point Forums  Hop To Forum Categories  WebFOCUS/FOCUS Forum on Focal Point     [CLOSED] Hierarchical Focus data files, materialized projections

Read-Only Topic
[CLOSED] Hierarchical Focus data files, materialized projections
 
Virtuoso
posted
I had posted this to an earlier thread on a related topic (with no responses), and am re-posting to make the question more visible.


This is for an application querying an Oracle table of metrics data. The poor design of that table leads to abysmal response times, so I am thinking of using a set of hierarchical Focus files that are more tailored to the actual types of queries.

The data have over a dozen primary-key component columns which the user may filter or sort on, and about a half-million rows.

I was thinking of creating a hierarchical Focus database (an approach also suggested by Vivian). That led me to wonder about the best way of
(1) ordering the segment key fields, and
(2) deciding whether to devote a segment to each or to combine some ("highly correlated") consecutive keys.

-- My hunch is that the top segment key should be the one that has the fewest distinct values (the one that, if it were the sole key, would yield the lowest cardinality). And, proceeding inductively, after n top keys have been identified, the next one (#n+1 in the series) should be the one which induces the least increase in cardinality: the one among the remaining candidates that yields the smallest cardinality for a projection of the data on the list of n+1 keys when it is used in the final slot.

-- That approach, by pushing the higher-fragmentation branch-points further down the segment list, should result in the fewest possible segment instances in the Focus file, and hence the greatest efficiency in extracting the answer-set for a query. Note that the row count in the lowest segment of the structure is invariant w/r/t the ordering of the keys.
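To make the inductive procedure concrete, here is a minimal Python sketch of the greedy ordering (column names and the list-of-dicts layout are illustrative only; in practice the distinct counts would come from COUNT queries against the source table):

```python
def order_keys(rows, keys):
    """Greedy key ordering: at each step, append the remaining key that
    yields the smallest distinct-tuple count (cardinality) for the
    prefix chosen so far plus that key.  `rows` is a list of dicts,
    `keys` the candidate key-column names."""
    chosen = []
    remaining = list(keys)
    while remaining:
        def cardinality(k):
            # distinct combinations of (chosen keys + candidate k)
            return len({tuple(r[c] for c in chosen + [k]) for r in rows})
        best = min(remaining, key=cardinality)
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

On synthetic data where column a has 2 distinct values, b has 4, and c has 8, the sketch yields the order a, b, c, i.e. lowest-fragmentation keys first.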

Anyone with practical (production) experience to cite, please chime in.


It also occurred to me that one could significantly improve response time by creating a collection of projections, Focus files that each cover the same measures, suitably aggregated, for just a subset of the available dimensions. If we maintain a control table, listing

  • the available data sources;
  • which of the dimension keys of the full table each covers and which not;
  • and the cardinality of each table,

then, at query time, the application could analyze the filter parameters and aggregation parameters of a query, determine which available tables cover all the requisite keys, and which among those has least cardinality -- and direct the query to that table.
   -* (PROJLIST is a placeholder name for the control table)
   TABLE FILE PROJLIST
   SUM PROJECTION_NAME
   BY LOWEST 1 CARDINALITY
   WHERE HAS_KEY1 GE &NEEDS_KEY1;
   etc.
   END

-- with the HAS_KEYi columns and the &NEEDS_KEYi amper variables each set to 0 (no) or 1 (yes)
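The same dispatch logic, sketched in Python (the control-table layout and field names are assumptions for illustration): among the projections whose key coverage is a superset of the query's required keys, pick the one with the least cardinality.

```python
def pick_projection(control, needed_keys):
    """control: list of dicts with 'name', 'keys' (the set of dimension
    keys the projection covers), and 'cardinality' (its row count).
    Return the name of the smallest projection covering every needed
    key, or None if no projection qualifies."""
    candidates = [p for p in control if set(needed_keys) <= set(p["keys"])]
    if not candidates:
        return None
    # least cardinality among adequate projections
    return min(candidates, key=lambda p: p["cardinality"])["name"]
```

A query filtering only on date and geography would be routed to the smallest projection covering those two keys; a query that also needs an uncovered key falls back to the full base table.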


Such a structure (of

  • materialized projections,
  • their metadata, and
  • a decision tool for selecting the most compact adequate data source),

would be quite tunable. (Think of it as a poor man's cube structure.)

Again, has anyone out there explored such a structure for online queries of aggregated measures?

This message has been edited. Last edited by: Kathleen Butler,


- Jack Gross
WF through 8.1.05
 
Posts: 1925 | Location: NYC | In FOCUS since 1983 | Registered: January 11, 2005
Platinum Member
posted
Hi Jack,

I did just that; in fact, I wrote an article, "Roll Em Up Zip Em Up," for the old Systems Journal on just this topic.

Put the key with the fewest distinct values at the top, aggregate the amounts/measures at the next levels, and continue down. If you know which fields the user will filter on, make those fields indexed, and always, always use indexed views for retrieval. My tables had 85 million rows that I aggregated in hierarchical fashion, and we had retrieval times for reports of under 5 seconds -- some nearly instantaneous.

The only thing to be careful about is to reduce the number of records at each level by analyzing the data -- you should try a couple of different structures and see which works best.
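One rough way to compare candidate structures, sketched in Python (the data layout is illustrative; in practice you would use ? FILE or COUNT queries): total the segment instances each structure would create, since the instance count of a segment equals the distinct count of the concatenated keys down to that level.

```python
def total_segment_instances(rows, segments):
    """segments: list of segments, each a list of key columns, ordered
    root-first.  Segment i's instance count is the distinct count of
    the concatenated keys of segments 0..i.  The sum over all segments
    is a rough size/IO proxy for the structure."""
    total, prefix = 0, []
    for seg in segments:
        prefix += seg
        total += len({tuple(r[k] for k in prefix) for r in rows})
    return total
```

Running this for each candidate ordering/grouping and keeping the smallest total is one concrete form of "try a couple of different structures and see which works best."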

Hope that helps.


Vivian Perlmutter
Aviter, Inc.


WebFOCUS Keysheet Rel. 8.0.2
(Almost) 1001 Ways to Work with Dates thru Rel. 8.0.2
Focus since 1982
WebFOCUS since the beginning
Vivian@aviter.com

 
Posts: 191 | Location: Henderson, Nevada | Registered: April 29, 2003
Virtuoso
posted
Vivian --

Impressive!

If I read you right, you placed rollups of the measures at each level (obviously with distinct names, or qualified by segment; either way it means fiddling, via Dialogue Manager, with the code of the query).

Our application has activity-date-range + two organizational hierarchies + geographic + a few other attributes. The two org hierarchies are mutually exclusive -- a query either filters on one, or the other, or neither. My point is, there is no overall natural hierarchy to the dimensions, so I figured on having several materialized views, including the base view with all possible keys, tailoring the collections to the usage pattern, and store measures (with standard fieldnames) in the bottom segment in each (materialized) view.

Given that structure, I figure FOR EACH VIEW order the keys it encompasses as described (first level and inductively from there) to keep the upper-level segments relatively sparse.

The number of tests to run, in order to successively choose the best key for each successive position in a table covering K keys, will be K + (K-1) + ... + 1 = K * (K+1) / 2. By the nature of the application, those tests can be based on scans of just the last 12 months rather than the entire history; and, as the views get populated, they need not necessarily run against the base file: if we build and populate each view as soon as its "shape" is defined, the same method that will ultimately serve user queries can be applied here as well, to choose the most compact of the views built so far that covers each of the K(K+1)/2 test queries.

I figure a reasonable rule of thumb would be to combine successive keys (use S2 or S3 segtype) if the additional keys at that point are sufficiently correlated to the earlier keys that they would add no more than (say) 25% to the cardinality if the music stopped there. Those cardinality numbers need to be determined regardless in order to order the keys, so grouping them into segments can be attended to after the ordering of the keys is complete.
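That rule of thumb can be sketched in Python (the 25% threshold and data layout are illustrative): walk the ordered keys, extending the current segment while each added key raises the cumulative prefix cardinality by no more than the threshold, and starting a new segment otherwise.

```python
def group_segments(rows, ordered_keys, threshold=0.25):
    """Group consecutive keys into one (combined-key) segment while
    each added key raises the cumulative prefix cardinality by no
    more than `threshold` (0.25 = 25%); otherwise start a new segment."""
    def card(prefix):
        return len({tuple(r[k] for k in prefix) for r in rows})
    segments, current, prefix = [], [], []
    prev_card = None
    for k in ordered_keys:
        prefix.append(k)
        c = card(prefix)
        if current and prev_card is not None and c <= prev_card * (1 + threshold):
            current.append(k)          # key is highly correlated: combine
        else:
            if current:
                segments.append(current)
            current = [k]              # fragmentation jump: new segment
        prev_card = c
    segments.append(current)
    return segments
```

As the post notes, the cardinalities are already in hand from the ordering step, so the grouping is nearly free.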


Anyone else have thoughts to add?


-jg
 
Posts: 1925 | Location: NYC | In FOCUS since 1983 | Registered: January 11, 2005
Platinum Member
posted
Jack,

One other thought -- you can have different databases if they have nothing in common, and simply call out the database that you need; or, if there is a common top key, you can have multiple mutually exclusive children under the top segment.

A thought


Vivian Perlmutter
Aviter, Inc.


WebFOCUS Keysheet Rel. 8.0.2
(Almost) 1001 Ways to Work with Dates thru Rel. 8.0.2
Focus since 1982
WebFOCUS since the beginning
Vivian@aviter.com

 
Posts: 191 | Location: Henderson, Nevada | Registered: April 29, 2003
Master
posted
We do the vast majority of our reporting out of FOCUS and XFOCUS data structures. Depending on need, they are updated every few hours, daily, or weekly. Our data response times are usually a second or two; it generally takes longer to render the report (mostly Excel) than it does to actually retrieve the data. Some of our structures have 20 years of data.

A properly built and maintained FOCUS/XFOCUS data file is very fast.

While we don't use it, you may want to investigate IBI's Hyperstage.


In FOCUS since 1985. Prod WF 8.0.08 (z90/Suse Linux) DB (Oracle 11g), Self Serv, Report Caster, WebServer Intel/Linux.
 
Posts: 975 | Location: Oklahoma City | Registered: October 27, 2006
Virtuoso
posted
quote:
If you know which fields the user will filter on -- make that indexed and always always use indexed views for retrieval.


That has to be done judiciously. You need to determine (or guess intelligently) which index will provide the greatest boost in efficiency.

We have several attributes representing contract options, each indicated by a Yes/No column in the detail-level source database table. The query user can specify "only Yes", "only No", or "All, regardless". Yes is statistically rare in the data, so it pairs with a relatively sparse selection of the other dimensions' values.

If, among the filter parameters, the user specified "only Yes", you potentially gain a lot of mileage from entering through that index; for "only No", there's little to gain. So the optimization may need to consider not only which filters are active, but also which filter values were specified.
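In Python terms, a rough sketch of that value-aware decision (the frequency table and column names are assumptions for illustration; the counts would be gathered offline from the data):

```python
def best_entry_index(filters, value_counts, total_rows):
    """filters: {column: value} of the query's active equality filters.
    value_counts: {column: {value: row_count}} gathered ahead of time.
    Return the filtered column whose specified value matches the fewest
    rows -- the most selective index to enter through -- or None if no
    filter is active."""
    if not filters:
        return None
    def matched(col):
        # unknown column/value: assume no selectivity benefit
        return value_counts.get(col, {}).get(filters[col], total_rows)
    return min(filters, key=matched)
```

With a rare "Yes" the option index wins; with the common "No" the same index loses to even a moderately selective geographic filter, which is exactly the asymmetry described above.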


- Jack Gross
WF through 8.1.05
 
Posts: 1925 | Location: NYC | In FOCUS since 1983 | Registered: January 11, 2005


Copyright © 1996-2020 Information Builders