Focal Point
[CASE-OPENED] Let's suppose I have an incredibly long string

This topic can be found at:
https://forums.informationbuilders.com/eve/forums/a/tpc/f/7971057331/m/6297001576

March 12, 2015, 02:33 PM
John_Edwards
[CASE-OPENED] Let's suppose I have an incredibly long string
I actually need to deal with this in Data Migrator, but for the moment I'm doing the R&D in WebFOCUS.

I have a field in SQLServer 2008 that is very long, an entire xml file dropped into a varchar(max) field. Even SQLServer's Management Studio is having trouble dealing with it. My best guess is that 100,000 characters is the outside edge of the length of this field.

The (ahem) LONG and short of it is this -- I need the contents of that field, one way or another, to pass through WebFOCUS. I need to extract specific parts of the xml and place them into separate fields in a datamart for a reporting application.


Here's the question -- how do I get the rest of the string? I'm open to creative solutions, be they on the WebFOCUS side or the SQL*Server side.

J.

This message has been edited. Last edited by: <Kathryn Henning>,



March 12, 2015, 02:40 PM
j.gross
I haven't been there, but there was a recent post regarding creating a synonym for an XML file. -- reporting against that synonym would enable you to directly address the parts, which are presumably of manageable size.

-j

NB - But if the xml record is part of another record, you'd need to isolate it as a separate file; which would bring us back to WF's varchar size limitations.
March 12, 2015, 02:52 PM
Alan B
The last time I had to deal with large XML strings, I believe I found that there was a 32k limit for WebFOCUS, strings / width of data retrieved in one record.

Somewhere I did get a solution, but cannot remember atm. I'll hunt through if nobody else can suggest.


Alan.
WF 7.705/8.007
March 12, 2015, 03:10 PM
John_Edwards
This is what I'm grinding on now -- a SQL*Server view with this in it. Note that "Detail" is my long field:

SUBSTRING(Detail, 1, 30000) AS Detail_1,
SUBSTRING(Detail, 30001, 30000) AS Detail_2,
SUBSTRING(Detail, 60001, 30000) AS Detail_3,
SUBSTRING(Detail, 90001, 30000) AS Detail_4,
SUBSTRING(Detail, 120001, 30000) AS Detail_5,
SUBSTRING(Detail, 150001, 30000) AS Detail_6

I can get a running master file against the view and have the six fields with the full string available to me in A30000V fields.

That lets me address the entire string from WF and I-Way. But I'll be searching for strings and I need to make sure the words (or even entire tags) I'm searching for don't get split across the boundaries of two Detail fields. So I'll likely have to have overlap between each string boundary to make sure I have at least one copy of each tag in one piece, and then do some sort of redundant search across all of them. A little kludgy, but that's what I-Way is for -- creative solutions.

The killer is that there's thousands of blank spaces between the tags in each copy of the string being stored. Likely each string could be cut in half, saving about 30k per record in a table with millions of records. It would be much easier if these records were tightened.

J.



March 12, 2015, 03:29 PM
Clif
Let the server treat the xml as xml. Describe the column as TX.

If you have an xsd that describes the xml that would be ideal. If not start by extracting one xml document from the MS SQL Server table and save it as a file with an extension of .xml.

Right click on the field name and select "Map External XML" and select the xsd or xml document.

That adds a new segment to the synonym with the elements in the xml.


N/A
March 13, 2015, 09:18 AM
John_Edwards
That sounds great Clif, but step one is getting a lasso around the doggone thing. As it stands even SQL*Server's tools aren't letting me extract the entire string in one piece. TX truncated or errorred-out in my master file.

That sounds like an interesting feature in the WF tool that I didn't know existed though -- is this from the Edit As Text part or the deluxe Edit In Synonym Editor part? I was of the opinion that I would need the XML Adapter to do XML work.

There's about a dozen values I need to pull and the rest I can throw on the floor. I'll need to do this about 2 million times, so I may just got straight to the fields I need the old fashioned way.



March 13, 2015, 11:36 AM
Clif



N/A
March 13, 2015, 11:44 AM
Clif



N/A
March 13, 2015, 11:44 AM
John_Edwards
That menu option is not lit on the environment I'm currently working on. I'm with a different client today so I can't check it in the environment I need it, but I'll give it a shot when I get the chance. Thanks for the tip.

J.



March 18, 2015, 10:21 AM
John_Edwards
Alright, great news, that Map External XML worked like a champ once I figured out the details and figured out a way to get a file loaded up for it.

-- I did a cut and paste on the six fields I built to subdivide the original field, dropping them into a text doc and splicing the pieces together (even that took a little creative thinking).

-- As a charity to all machines involved I did a find-and-replace on the huge volumes of blank characters in the file, which trimmed its size immensely.

-- Went back to Data Migrator and ran the Map External XML, using the sample .xml file that I built. (I did this on a copy of the master file in question in case it puked, which it didn't).

-- Boink! Revised master file with the xml fields included, all fields populated. Apparently the xml parser doesn't mind files that run over 32,767 characters, which is a good thing for xml because it's so doggone verbose.

That's a pretty cool feature, thanks for pointing me to it.

Of course, a bit of inspection with the new structure reveals to me that there are at least three different xml formats being contained in that single field, and I can't rule out there being sixteen or twenty due to the kind of data being managed. So I'll likely have to break each different xml file type out into different fields in a view, and then run Map External XML separately on each of those fields. This will almost assuredly produce THE SINGLE LARGEST MASTER FILE IN INFORMATION BUILDERS HISTORY.

Gentlemen, we stand on the shoulders of giants. Well you guys do, I stand on the shoulders of an aardvark. Hard to see from here.

Thank you all for your input.

J.



March 18, 2015, 12:24 PM
John_Edwards
. . . but . . . now I get Not Enough Virtual Memory when I try to report against it, even with a recordlimit of 1. I'll need to look in the synonym to see what's blowing the lid off the pot.



March 19, 2015, 03:56 PM
John_Edwards
Just to follow up, it's pretty clear I'm handling more than 32,767 characters when I'm parsing the XML field, but I'm not able to handle the longer values in the XML field. So EDA is handling more than 32k characters when it is resolving all the fields but is not handling the longest strings out there.

Does anyone on the inside have an idea how many characters EDA is able to handle internally when reaching for XML from a data field?

J.



April 15, 2015, 02:32 PM
John_Edwards
quote:
Let the server treat the xml as xml. Describe the column as TX.


Alright Clif, you say this, but I still can't consume the entire string for some entries. Here's my current descrip for the field:

FIELDNAME=DETAIL, ALIAS=Detail, USAGE=TX50, ACTUAL=TX, FIELDTYPE=R,
MISSING=ON, $


This pops the top off of Data Migrator job. I get a Not Enough Virtual Memory error when attempting to get data from the longer fields. Do you have a better way of making the connection to varchar(max)?

J.