I don't have any documentation other than what ships with the software; most of my knowledge of code pages comes from using and testing many different clients on a range of operating systems and databases.
Things you need to know:
- The default code page of your server is critical. It defines the code page that the data you are working with gets converted to during the execution of a process, specifically when data is written. (a)
- The default code page should encapsulate all the characters you are expecting to process. (b)
- Each DBMS behaves differently. You have to read all the documentation for the DBMSs that you are using, because each has its own configuration specifics at install and configuration time. For example, DB2 can be installed using a UTF-8 code page (1208); however, if you use the DB2 native CLI to connect, the environment variable DB2CODEPAGE must be set to the installed DBMS code page. If it is not, the client will convert the data stream you send it to the client default, and then the server will convert it back.
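To see why that double conversion matters, here is a minimal Python sketch of the idea. It does not call DB2 at all; it only simulates a client that converts UTF-8 data to its own default code page before the server converts it back. The choice of cp1252 as the client default and the function name are assumptions for illustration.

```python
# Assumption: cp1252 stands in for a client default code page that differs
# from the UTF-8 code page of the database. Your real client default may differ.
CLIENT_DEFAULT = "cp1252"


def round_trip_via_client(text: str, client_codepage: str = CLIENT_DEFAULT) -> str:
    """Simulate: UTF-8 data -> client default code page -> back to text."""
    utf8_bytes = text.encode("utf-8")
    # The client converts the stream to its own default code page.
    # Characters that code page cannot represent are replaced with "?".
    client_bytes = utf8_bytes.decode("utf-8").encode(client_codepage, errors="replace")
    # The server then converts it back, but the lost characters stay lost.
    return client_bytes.decode(client_codepage)


french = round_trip_via_client("café")   # every character exists in cp1252
polish = round_trip_via_client("Łódź")   # Ł and ź do not exist in cp1252

print(french)
print(polish)
```

Anything the intermediate code page cannot represent is gone by the time it reaches the server, which is the reason for keeping the client and DBMS code pages aligned.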
Examples:
(a) - If your server is set to code page 65001 (UTF-8) and you read a single-byte Latin-1 file (CODEPAGE=137 in the master), any hold file you create will be converted to 65001 (the server default). If you need single-byte output, you would have to write the data using a MODIFY to an existing master that already has the CODEPAGE=137 attribute.
(b) - If your server is set to the installed default (437) and you have French characters, a hold file you create will not hold them properly, because the 437 code page does not cover the full set of French characters.
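Both cases can be approximated outside WF with a short Python sketch. This is not how the server does it internally; it assumes Python's "latin-1", "cp437" and "utf-8" codecs are reasonable stand-ins for code pages 137, 437 and 65001, and the sample strings are made up.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Convert raw bytes from one code page to another."""
    return data.decode(source).encode(target)


# Case (a): single-byte Latin-1 input written out on a UTF-8 server.
latin1_bytes = "Crème brûlée".encode("latin-1")
utf8_bytes = convert(latin1_bytes, "latin-1", "utf-8")
print(len(latin1_bytes), len(utf8_bytes))  # the three accented letters grow to 2 bytes each

# Case (b): French text written out on a server left at 437.
# errors="replace" mimics a lossy write instead of raising an exception.
held = "ÉLÈVE".encode("cp437", errors="replace").decode("cp437")
print(held)
```

The first case changes the byte length of the data but loses nothing; the second case loses a character for good.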
My recommended installation for the last 2 years:
- WF/DM server set to 65001
- WF Web application set to 65001
- Configure all DBMSs for UTF-8 data
- Add the code page attribute to all masters with non-UTF-8 data (for example, add CODEPAGE=137 to car.mas)
Lastly, the easiest way to understand this is to play with all the code page settings. I am confident that you can get a high level of understanding with a couple of days of playing around. Also, when it comes to WF, you are primarily dealing with 3 code pages: 437, 137, 65001.
437 is all numbers and symbols and all English characters as single bytes.
137 has all the 437 characters, plus French characters (minus 2 characters), all as single bytes.
65001 has everything in 137, with numbers and English characters stored as single bytes and all French characters stored as multi-byte sequences, plus virtually every other character from almost every other language you should ever need.
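If you want a quick feel for those three code pages before touching the server, a small Python sketch can report how many bytes a character takes in each. The mapping of 437, 137 and 65001 to Python's "cp437", "latin-1" and "utf-8" codecs is an assumption, and the helper name is hypothetical.

```python
from typing import Optional

# Assumption: these Python codecs approximate the WF code pages.
CODECS = {437: "cp437", 137: "latin-1", 65001: "utf-8"}


def byte_width(ch: str, code_page: int) -> Optional[int]:
    """Return the number of bytes ch needs, or None if it is not representable."""
    try:
        return len(ch.encode(CODECS[code_page]))
    except UnicodeEncodeError:
        return None


for ch in ["A", "È", "œ", "日"]:
    widths = {cp: byte_width(ch, cp) for cp in CODECS}
    print(ch, widths)
```

A result of None means the character cannot be stored in that code page at all, which is the situation to avoid when picking the server default.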