Accelerate XML Write Speed?

I'm using Data Migrator to write XML output, and it slows down to a snail's pace if I write a file of any significant size. Is there a secret sauce that I need to add in order to get the write speed to remain constant throughout the length of the run?

Update -- Looking at the Agent as it runs, it appears that the write speed falls as the file grows. At 1000 records it writes about 300 per minute; at 2000 records it has slowed down to about 150 per minute. It almost appears as if it is scanning the output to date for some reason, and that takes longer as the output grows.

J.


Posts: 1012 | Location: At the Mast | Registered: May 17, 2007

Clif

Guru


When writing XML the entire document is accumulated in memory and then written to a file when complete.

There have been improvements in writing XML over the last several releases so please confirm you are using the current production 7.7.04M release.

If you open a hottrack case and provide a repro we can research this.

N/A

Posts: 397 | Location: New York City | Registered: May 03, 2007


John_Edwards

Virtuoso


I am running 7.7.03. I have a burning deadline on the 17th so I am not going to enter a ticket on this one right now. I'll be honest -- it looks slower than 7.6.11 to me, but I made some minor changes to the field assignments so I'm not doing an identical comparison.

It cruises along nicely doing its joins. It's when the agent steps to the Modify step that the slowdown begins.

J.


John_Edwards

Virtuoso


Alright, with about 6000 records already written, the XML adapter's throughput had dropped to about 20 records per minute. This means that a 15,000 record output consumes about ten hours of clock time. I can't help but think that there was some go-slow button that I pushed, so I'm going to open a case.

J.


John_Edwards

Virtuoso


Ok, I've figured out how to circumvent this problem for my particular project. It isn't the prettiest, but boy is it fast.

The problem is all about the MATCH. Data Migrator is performing a Match where the two choices are:

ON MATCH INCLUDE
ON NOMATCH INCLUDE

Doesn't strike me as a valuable addition to the run.

My particular structure has tens of thousands of entries of one particular element within the output file. The file is essentially a wrapper (the top element with a few attributes and a dozen single-copy elements within it) followed by one big, massively repeated structure. That big structure is the tens-of-thousands part I mentioned above, and the reason the Match runs so slowly.

Data Migrator insists on looking for duplicates of this big structure, and there's no option to avoid it. The nature of my particular situation guarantees that duplicates can't exist. At write time Data Migrator issues the MATCH, which in my case will never find anything. As the output grows, the search time for the Match grows as well and I'm stuck waiting. The run becomes a dirge.
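To make the growth concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not Data Migrator's actual implementation: it assumes each write does a plain linear scan of everything written so far before including the new record. The function name and the use of integers as record keys are made up for illustration.

```python
def comparisons_with_match(n_records: int) -> int:
    """Count key comparisons when every write first scans all prior records.

    Assumption: the duplicate check is a linear scan. Keys are unique,
    so the scan never finds a match and always walks the whole list.
    """
    written = []
    comparisons = 0
    for key in range(n_records):
        for existing in written:      # look for a duplicate of this record
            comparisons += 1
            if existing == key:       # never true here: keys are unique
                break
        written.append(key)           # no match found, so include the record
    return comparisons


if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        print(n, comparisons_with_match(n))
```

Under that assumption the total work is n(n-1)/2 comparisons, so doubling the record count roughly quadruples the total work and halves the per-record write rate. That is in line with the drop from about 300 to about 150 records per minute between 1000 and 2000 records reported at the top of the thread.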

My solution -- I pulled the big structure out of the surrounding XML (this is the master file I'm talking about) so that the master file defines just the big, heavily-repeated structure. I created a run that only writes the big structure. The result is an XML output that has multiple copies of the big structure, but doesn't perform the Match because it considers each of them a separate doc. It puts all of them into a single text file as peers one right after the other, attaching the attribute

 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

. . . to each one. I do a global find-and-replace on that string to remove it from the entire run. 30MB of output writes in about 15 seconds and the find-and-replace in Notepad runs in about 30 seconds. What remains is to cut and paste the wrapper on the top and bottom of this monster output.

So I'm doing the final steps by hand.
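For anyone who wants to script those manual steps, here is a minimal Python sketch. It assumes the only cleanup needed is removing the repeated xmlns:xsi attribute exactly as it appears above (with one leading space), and that the wrapper's opening and closing text are supplied by hand. The function names, file paths, and wrapper strings are all hypothetical.

```python
# The attribute Data Migrator attaches to each peer document, with the
# leading space that separates it from the element name (an assumption
# about how it appears in the output).
XSI_ATTR = ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'


def wrap_fragments(fragments: str, header: str, footer: str) -> str:
    """Strip the repeated xsi namespace attribute and add the wrapper."""
    body = fragments.replace(XSI_ATTR, "")
    return header + body + footer


def wrap_file(src_path: str, dst_path: str, header: str, footer: str) -> None:
    """Read the fragment file, clean it, and write the wrapped document.

    Reads the whole file into memory, which should be fine for output
    in the tens of megabytes.
    """
    with open(src_path, encoding="utf-8") as src:
        fragments = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(wrap_fragments(fragments, header, footer))
```

A call would look something like `wrap_file("big_structure.xml", "final.xml", header_text, footer_text)`, where `header_text` holds the top element with its attributes and single-copy elements, and `footer_text` holds the closing tag. If anything inside the repeated structure uses an `xsi:` prefixed attribute, the wrapper's top element would need to declare that namespace itself.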

I'll be honest -- I'm not sure you necessarily want Match to happen on an XML stream output. In a case like this, making it optional would have saved me hours of work and would save the product hours of execution time, making it look super-fast and super-slick. As it stands I cannot release the tool to my end-users, but I can respond in ten minutes when they need to adjust a calculation and resend 45,000 records.

J.
