Ok, I've figured out how to circumvent this problem for my particular project. It isn't the prettiest, but boy is it fast.
The problem is all about the MATCH. Data Migrator is performing a Match where the two choices are:
ON MATCH INCLUDE
ON NOMATCH INCLUDE
Both branches do the same thing, so the Match doesn't strike me as a valuable addition to the run.
My output file has tens of thousands of entries of one particular element. The file is pretty much a wrapper (the top element with a few attributes and a dozen single-copy elements within it) followed by one big, massively duplicated structure inside it. That big structure is the tens-of-thousands part I mentioned above, and it's the reason the Match runs so slowly.
Data Migrator insists on looking for duplicates of this big structure. There's no option to avoid it. The nature of my particular situation guarantees that duplicates can't exist, so at write time Data Migrator issues a MATCH that, for me, will never find anything. As the output grows, the search time for the Match grows as well, and I'm stuck waiting. The run becomes a dirge.
My solution -- I pulled the big structure out of the surrounding XML in the master file, so that the master file defines just the big, heavily-repeated structure. Then I created a run that only writes the big structure. The result is an XML output that has multiple copies of the big structure, but Data Migrator doesn't perform the Match because it considers each copy a separate doc. It puts all of them into a single text file as peers, one right after the other, attaching the attribute
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
. . . to each one. I do a global find-and-replace on that string to remove it from the entire run. 30MB of output writes in about 15 seconds and the find-and-replace in Notepad runs in about 30 seconds. What remains is to cut and paste the wrapper on the top and bottom of this monster output.
So I'm doing the final steps by hand.
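For anyone who wants to script those last hand steps, here is a minimal Python sketch of the same two operations: strip the repeated xmlns:xsi attribute, then put the wrapper on the top and bottom. The function names and the idea of keeping the wrapper's opening and closing text in separate strings are assumptions for illustration, not anything Data Migrator provides. It also assumes the output is small enough to hold in memory, which 30MB is.

```python
import re

# The attribute that gets attached to every copy of the big structure.
XSI_ATTR = 'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'

# Match the attribute together with any whitespace in front of it, so that
# '<Rec xmlns:xsi="...">' becomes '<Rec>' rather than '<Rec >'.
_XSI_RE = re.compile(r"\s*" + re.escape(XSI_ATTR))


def strip_xsi(text: str) -> str:
    """Remove every occurrence of the xmlns:xsi attribute from text."""
    return _XSI_RE.sub("", text)


def wrap_output(body: str, header: str, footer: str) -> str:
    """Return header + cleaned body + footer, one newline between the parts.

    header and footer are hypothetical: the opening and closing text of the
    wrapper, the part that was being cut and pasted by hand.
    """
    parts = [header.strip("\n"), strip_xsi(body).strip("\n"), footer.strip("\n")]
    return "\n".join(parts) + "\n"


def finish_file(body_path: str, header_path: str, footer_path: str, out_path: str) -> None:
    """File-based convenience wrapper; all four paths are placeholders."""
    with open(body_path, encoding="utf-8") as f:
        body = f.read()
    with open(header_path, encoding="utf-8") as f:
        header = f.read()
    with open(footer_path, encoding="utf-8") as f:
        footer = f.read()
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(wrap_output(body, header, footer))
```

If the wrapper itself needs the xsi namespace, declare it once in the header text. The encoding is assumed to be UTF-8; change it if the run writes something else.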
I'll be honest -- I'm not sure you necessarily want Match to happen on an XML stream output. In a case like this, making it optional would have saved me hours of work and would save the product hours of execution time, making it look super-fast and super-slick. As it stands, I cannot release the tool to my end-users, but I can respond in ten minutes when they need to adjust a calculation and resend 45,000 records.
J.