[SOLVED]Best practices around querying/staging data

I am relatively new to WebFOCUS (with 2 months of experience) and I have been asked to review our current implementation and come up with suggestions on improving how we we stage/query data.

In a given fex, our current/old approach has been to query each individual table and "hold them". Then, we do all the filtering/joining/sorting in WebFOCUS. I think we all agree that this is not the most efficient approach. Our goal is to create wide denormalized tables at the database level perform the join/filter/sort at the database level.

So, if I were to create one of those views and I have multiple fexes (in a single Portal page) that query this view, would it be more efficient to perform multiple database calls (from each fex) or should I consider caching the data with the initial database query and have the subsequent queries reference the cached dataset? I understand that a lot of this depends on the size of the dataset, how the underlying tables are indexed, etc.

At this point, I am looking for some references regarding best practices and whatever suggestions you guys can offer. Thanks.

Raj

This message has been edited. Last edited by: <Emily McAllister>,

WebFOCUS 8.x, Windows, HTML/PDF

Posts: 23 | Registered: April 29, 2016

BabakNYC

Virtuoso

posted

Hide Post

What database are you using?
Are all the hold files created off of the same database?

In general, I always prefer to push as much of the data processing to the DBMS. In some cases, that's not possible. For instance, if you have to JOIN data from different data sources (say Oracle and SQL Server), then it might be a good idea to create some hold files and join them together. You'll have to figure out why the original design of the application called for creating hold files. There might be a good reason why they did it that way.

WebFOCUS 8206, Unix, Windows

Posts: 1853 | Location: New York City | Registered: December 30, 2015

Ignored post by BabakNYC posted

Show Post

rajje

Member

posted

Hide Post

as of now, we have data in both Oracle and SQL Server. Some of the hold files are created off of the same database and some off of different ones.

For cases where hold files were created off of the same database, my understanding was that our WebFOCUS developers didn't have permission/bandwidth to create database views/denormalized tables and thus everything was done in WebFOCUS.

So, what you are saying is that, provided we are able to stage all our data in a single data mart/warehouse, we should stay away from hold files, right?

WebFOCUS 8.x, Windows, HTML/PDF

Posts: 23 | Registered: April 29, 2016

Ignored post by rajje posted

Show Post

rajje

Member

posted

Hide Post

Also, what is the most efficient way to implement sorting? Our current practice is to call the same fex and rebuild the entire dataset.

WebFOCUS 8.x, Windows, HTML/PDF

Posts: 23 | Registered: April 29, 2016

Ignored post by rajje posted

Show Post

John_Edwards

Virtuoso

posted

Hide Post

If you have tables that are reused regularly (either by multiple reports or by single reports used many times) and require the combination of multiple tables and derived fields (i.e., "flattening") you likely should be considering some level of data warehousing. That is, create data tables that are run overnight and contain all of your normalized data in a denormalized form that better suits your reporting needs. In short, replace your Hold files with tables in a dbms.

Data Migrator (an I-Way tool often used in conjunction with WebFOCUS) may be a product you already own that will facilitate this, though you can do it with WebFOCUS and Report Caster as well.

The idea is that you do your hard-crunching once each night, or each hour, or each minute depending on your need for freshness. Your reporting code becomes much simpler because your joins and computed fields are already built in to the single table you're reporting from.

Views can do the same thing, but they run on-demand and that can slow your execution speed down. Having the job run for ten minutes overnight doesn't bother anyone.

The follow-on benefit to this approach is that you can create some seriously plush InfoAssist targets, with human-readable fields that can be pressed into service quickly, and can be indexed to make things run faster even in an ad-hoc environment.

This message has been edited. Last edited by: John_Edwards,

Posts: 1012 | Location: At the Mast | Registered: May 17, 2007

Ignored post by John_Edwards posted

Show Post

BabakNYC

Virtuoso

posted

Hide Post

1. Whenever possible, denormalize the tables using RDBMS views instead of creating hold files.
2. If your reports need to access operational tables because you don't have a separate Data Mart, then hold files are a good way to avoid hitting your live tables.
3. If you have to blend Oracle and SQL Server data with large tables go with hold files.

What I'd do is audit each application to find out if it's practical to replace hold files. Hold is just one of many different strategies in creating applications. Sometimes, it's the best way to approach the problem and sometimes there are alternative ways. Like John says, if you have users hitting a relatively sedentary data set over and over, maybe creating a hold file for that purpose is the most efficient way to let them do their work. If you have 10's of tables joined in a funky structure, denormalizing them in a view would make it a lot easier to work with.

WebFOCUS 8206, Unix, Windows

Posts: 1853 | Location: New York City | Registered: December 30, 2015

Ignored post by BabakNYC posted

Show Post

John_Edwards

Virtuoso

posted

Hide Post

I'll just mention here that I believe holding to FOCUS files for long term usage is not legit use of WebFOCUS software. I had a colleague get pinged on that when an SE showed up at his shop. I wouldn't be too concerned about it, but it's worth a mention.

Posts: 1012 | Location: At the Mast | Registered: May 17, 2007

Ignored post by John_Edwards posted

Show Post

CoolGuy

Virtuoso

posted

Hide Post

When my company was on 7.x (sadly we still are to a degree), we had a few users with code access (that should've never had it) that would build all sorts of convoluted holds and joins (across server nodes) all in a single fex that were all 500+ lines of code that ultimately ended in like a few lines at the very bottom for the actual output request. Sometimes even loads more code, just for 1 aggregate record of output. Not to mention no comments (so we have little idea what their end goal was in the first place...). Now we are stuck sifting through all the bad code of theirs to migrate it over to 8.x.

So, in conclusion, use holds sparingly if needed so that you don't hit operational tables, or are trying to combine data sets across nodes. Otherwise, denormalized (human-readable) views or tables on the data server end are the best way to go if you can get them created for you at your site. Dimensional hierarchies are the best for InfoAssist users, etc.

8.2.02M (production), 8.2.02M (test), Windows 10, all outputs.

Posts: 1113 | Location: USA | Registered: January 27, 2015

Ignored post by CoolGuy posted

Show Post

rajje

Member

posted

Hide Post

Hi guys, thank you very much for the feedback. We are definitely going to be using HOLD files sparingly from now on.

CoolGuy, we are in the same predicament that you are in. So, that's why we are making it a priority to stage the data properly and not repeat the same mistake.

WebFOCUS 8.x, Windows, HTML/PDF

Posts: 23 | Registered: April 29, 2016

Ignored post by rajje posted

Show Post

CoolGuy

Virtuoso

posted

Hide Post

rajje,

I feel your pain then... lol Best of luck to you and your team!

8.2.02M (production), 8.2.02M (test), Windows 10, all outputs.

Posts: 1113 | Location: USA | Registered: January 27, 2015

Ignored post by CoolGuy posted