EBCDIC is not being displayed correctly
-
- Full Member
- Posts: 28
- Joined: Wed Jan 06, 2010 4:21 am
EBCDIC is not being displayed correctly
EDP has a Binary Comparison Character set option to display binary data in either EBCDIC or ASCII. As an IBM Mainframe programmer, I use EBCDIC a lot, and noticed some characters are not displaying correctly:
On the left is the string: "Success ! CDSG, STPQ and LPQ: OK".
On the right is the string: "Success! CDSG, STPQ and LPQ: OK!".
(A blank was removed before the first exclamation mark and added to the end of the string after "OK".)
As you can see, the exclamation mark is being incorrectly displayed as a right square bracket.
I don't know what Code Page EDP is using, but in the CP037 Code Page (which is the one I would expect to be used), hex 5A is an exclamation mark (ASCII hex 21), not a right square bracket:
* https://www.kreativekorp.com/charset/encoding/CP037/
* https://en.wikipedia.org/wiki/Code_page_37
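A quick way to double-check the CP037 claim is sketched below with Python's built-in `cp037` codec. Python is used purely for illustration here; it is an assumption of this example and says nothing about how EDP performs its translation.

```python
# Encode a sample string with Python's standard cp037 (EBCDIC) codec.
text = "Success!"
ebcdic = text.encode("cp037")

# Show the EBCDIC bytes; the last one is the exclamation mark.
print(ebcdic.hex(" ").upper())

# Decoding the single byte hex 5A with cp037 yields '!' (ASCII hex 21).
bang = bytes([0x5A]).decode("cp037")
print(repr(bang), hex(ord(bang)))
```

If a viewer shows `]` for that last byte, it is not translating with CP037.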
Can this either be fixed, or a new option be provided so the user can choose which Code Page they prefer instead of whatever Code Page EDP is currently using?
Thanks!
Keep up the otherwise good work!
Last edited by David B. Trout on Thu Jan 05, 2023 7:03 pm, edited 1 time in total.
"Fish" (David B. Trout)
"Programming today is a race between
software engineers striving to build bigger
and better idiot-proof programs, and the
Universe trying to produce bigger and better
idiots. So far, the Universe is winning"
- Rich Cook
Re: EBCDIC is not being displayed correctly
FYI: Other programs seem to display EBCDIC data just fine:
"Fish" (David B. Trout)
Re: EBCDIC is not being displayed correctly
P.S. It would also be nice if the left-hand file offset column weren't so wide. In the EDP comparison example I posted, the file is only 224 bytes in size. Yet the left-hand file offset column is 16 hexadecimal digits wide!
I seriously doubt anyone would be comparing two 16-exabyte (2^64-byte) binary files with EDP.
IMHO, an 8-character (8 hex digits = 32 bits) wide file offset column should be plenty.
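The arithmetic behind that estimate can be sketched in a few lines of Python. The helper name `hex_digits_needed` is made up for this illustration and is not part of EDP.

```python
def hex_digits_needed(file_size: int) -> int:
    """Smallest number of hex digits that can show every offset in a file."""
    last_offset = max(file_size - 1, 0)
    # Each hex digit holds 4 bits; always show at least one digit.
    return max(1, (last_offset.bit_length() + 3) // 4)

print(hex_digits_needed(224))     # the 224-byte example file
print(hex_digits_needed(2**32))   # a file of exactly 4 GiB
print(16**8 == 2**32)             # 8 hex digits cover 4 GiB of offsets
```

So the 224-byte file needs only two digits, and eight digits are enough for anything up to 4 GiB.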
Last edited by David B. Trout on Thu Jan 05, 2023 7:08 pm, edited 2 times in total.
"Fish" (David B. Trout)
Re: EBCDIC is not being displayed correctly
We use a third-party Hex Editor library, and here's their conversion table:
Code: Select all
const int e2a [256] =
{
//0 1 2 3 4 5 6 7 8 9 A B C D E F
0, 1, 2, 3, 156, 9, 134, 127, 151, 141, 142, 11, 12, 13, 14, 15, // 0
16, 17, 18, 19, 157, 133, 8, 135, 24, 25, 146, 143, 28, 29, 30, 31, // 1
128, 129, 130, 131, 132, 10, 23, 27, 136, 137, 138, 139, 140, 5, 6, 7, // 2
144, 145, 22, 147, 148, 149, 150, 4, 152, 153, 154, 155, 20, 21, 158, 26, // 3
' ', 160, 161, 162, 163, 164, 165, 166, 167, 168, 91, '.', '<', '(', '+', 33, // 4
'&', 169, 170, 171, 172, 173, 174, 175, 176, 177, 93, '$', '*', ')', ';', 94, // 5
'-', '/', 178, 179, 180, 181, 182, 183, 184, 185, 124, ',', '%', 95, '>', '?', // 6
186, 187, 188, 189, 190, 191, 192, 193, 194, 96, ':', '#', '@', 39, '=', 34, // 7
195, 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 196, 197, 198, 199, 200, 201, // 8
202, 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 203, 204, 205, 206, 207, 208, // 9
209, 126, 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 210, 211, 212, 213, 214, 215, // A
216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, // B
123, 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 232, 233, 234, 235, 236, 237, // C
125, 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 238, 239, 240, 241, 242, 243, // D
92, 159, 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 244, 245, 246, 247, 248, 249, // E
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 250, 251, 252, 253, 254, 255 // F
};
So yes, 5A is converted to ASCII code 93, which is the closing bracket. We can change it to '!' but there may be other problems in this table, so perhaps you could take a look.
As for the address column, it's the standard address length for 64 bits. A 32-bit build has half of this length.
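To make the lookup concrete, here is a small sketch that copies only row 5 of the table posted above into Python and reads the entry for EBCDIC hex 5A. The language and the name `ROW_5` are choices made for this illustration; the library itself uses the C array shown in the post.

```python
# Row 5 of the posted e2a table (EBCDIC bytes 0x50..0x5F), copied verbatim.
ROW_5 = [ord("&"), 169, 170, 171, 172, 173, 174, 175,
         176, 177, 93, ord("$"), ord("*"), ord(")"), ord(";"), 94]

ebcdic_byte = 0x5A
ascii_code = ROW_5[ebcdic_byte - 0x50]    # column A of row 5

print(ascii_code, repr(chr(ascii_code)))  # what the table yields for hex 5A
print(ord("!"))                           # the ASCII code of an exclamation mark
```

In the same table the value 33 (the exclamation mark) sits in row 4, column F, that is, at EBCDIC hex 4F rather than hex 5A.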
psguru
PrestoSoft
Re: EBCDIC is not being displayed correctly
Just out of curiosity, do they document where THEY got it from?
I do not wish to be rude, but is there a reason why you guys can't do that? I posted the URLs to the official CP037 table, which is the one you should be using IMO:
* https://www.kreativekorp.com/charset/encoding/CP037/
* https://en.wikipedia.org/wiki/Code_page_37
Nevertheless, I shall take a look at it myself and will let you know which code points appear to be incorrect. Thanks.
Duh!
My point was that you do not need such a wide file offset column, since the files being compared are highly unlikely to be large enough to need it.
Regardless of the bitness of the host operating system or its file system (32 vs. 64), the FILES being compared are almost always going to be much SMALLER than 4GB. So there's no need for such a wide file offset column. The width of the file offset column should depend on the size of the files being compared, not on the bitness of the host operating system or of the file system on which the files reside.
So IMHO, the default should be to use only a 32-bit file offset column width, and only switch to a 64-bit file offset column width if/when such is actually needed. (which in my opinion is likely to be never)
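A minimal sketch of that policy, in Python for illustration only. The function names `offset_column_width` and `format_offset` are hypothetical and do not correspond to anything in EDP or its hex editor library.

```python
def offset_column_width(*file_sizes: int) -> int:
    """Return 8 hex digits unless some file needs offsets beyond 32 bits."""
    largest = max(file_sizes, default=0)
    # A file of up to 2**32 bytes has a last offset of at most 0xFFFFFFFF.
    return 8 if largest <= 2**32 else 16

def format_offset(offset: int, width: int) -> str:
    """Zero-padded uppercase hex offset of the given width."""
    return f"{offset:0{width}X}"

width = offset_column_width(224, 224)   # the two 224-byte example files
print(width, format_offset(0xD0, width))
```

The width is chosen once per comparison from the larger of the two files, so both panes stay aligned.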
Does that make sense now?
In any case, I thank you for your response. I really appreciate it. I will post my analysis of your "e2a" table in a few minutes.
Thanks.
Last edited by David B. Trout on Wed Mar 01, 2023 8:03 pm, edited 1 time in total.
"Fish" (David B. Trout)
Re: EBCDIC is not being displayed correctly
Code: Select all
I do not wish to be rude, but is there a reason why you guys can't do that?
Code: Select all
So IMHO, the default should be to use only a 32-bit file offset column width, and only switch to a 64-bit file offset column width if/when such is actually needed. (which in my opinion is likely to be never)
Code: Select all
In any case, I thank you for your response. I really appreciate it. I will post my analysis of your "e2a" table in a few minutes.
psguru
PrestoSoft
Re: EBCDIC is not being displayed correctly
Nevertheless, I shall take a look at it myself and will let you know which code points appear to be incorrect. Thanks.
I will post my analysis of your "e2a" table in a few minutes.
I THINK I FOUND THE PROBLEM!
EDP appears to be using code page 500! (not 37):
(https://www.kreativekorp.com/charset/encoding/CP500/)
(https://en.wikipedia.org/wiki/Code_page ... code_pages):
Code page 500, known as "International EBCDIC", "International Latin-1" or "International Number 5", is the other major EBCDIC encoding for the ISO/IEC 8859-1 repertoire. It is used in Belgium, Switzerland and on AS/400 systems in Canada. It is related to code page 37 and has the same repertoire, but differs in seven positions; in particular, it encodes [ and ] at 4A hex and 5A hex respectively, which are used for the cent sign (¢) and exclamation point (!) in code page 37. The caret (^) is also encoded at 5F hex, similarly to code page 1047. The ¢ is encoded at B0 hex, the ¬ at BA hex, the ! at 4F hex and the pipe character (|) at BB hex.
Which exactly matches the translation table you posted.
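The seven differing positions can be listed by comparing Python's standard `cp037` and `cp500` codecs byte by byte. This is only a sketch: Python decodes to Unicode rather than to an 8-bit ASCII table like the one posted, so it illustrates the two code pages, not the library's table.

```python
# Collect every byte value that cp037 and cp500 decode differently.
diffs = []
for byte in range(256):
    as_037 = bytes([byte]).decode("cp037")
    as_500 = bytes([byte]).decode("cp500")
    if as_037 != as_500:
        diffs.append((byte, as_037, as_500))

for byte, as_037, as_500 in diffs:
    print(f"{byte:02X}  cp037={as_037!r}  cp500={as_500!r}")
```

Hex 5A appears in that list as `!` under cp037 and `]` under cp500, which is the symptom reported at the top of this thread.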
BUT... according to Wikipedia, code page 37 is actually one of the most used and best supported EBCDIC code pages:
(https://www.kreativekorp.com/charset/encoding/CP037/)
(https://en.wikipedia.org/wiki/Code_page_37):
Code page 37 is one of the most-used and best-supported EBCDIC code pages. It is used as the default z/OS code page in the United States and other English speaking countries. It is considered the "required" EBCDIC code page for the United States, and also used in Australia, New Zealand, the Netherlands, Portugal and Brazil, and on ESA/390 systems in Canada, but not on Canadian AS/400 systems, which use Code page 500 instead. It is one of four EBCDIC code pages (alongside 500, 875 and 1026) with mapping data supplied by Microsoft to the Unicode Consortium, and one of seven (alongside 273, 424, 500, 875, 1026 and 1140) supported by Python as standard.
So in my opinion the default table that EDP uses should be 37 (not 500), and you should provide an option (two radio buttons?) to allow the user to choose which code page they prefer (37 or 500).
Doing that would provide EDP with the widest compatibility range possible, and should make the largest number of customers happy: those who prefer code page 500 and those who, like me, prefer code page 37 (one of the most widely used and best supported EBCDIC code pages in the world).
Is there any chance of that maybe happening at some point in the future? I'm a Windows C/C++ GUI programmer myself, and in my experience the change seems, in all honesty, to be fairly simple and straightforward.
Thank you for listening, and thank you for considering this change (bug fix?) request!
"Fish" (David B. Trout)
Re: EBCDIC is not being displayed correctly
It is one of four EBCDIC code pages (alongside 500, 875 and 1026) with mapping data supplied by Microsoft to the Unicode Consortium, and one of seven (alongside 273, 424, 500, 875, 1026 and 1140) supported by Python as standard.
Wow! The best way to create a mess, I think.
EBCDIC... And you call me archaic!
I agree with David: what's the use of so many digits for the file offset? A lot of screen real estate is wasted.
(N.B. I most often use the 32-bit version, so I'm not particularly concerned with this problem, but anyway...)
Re: EBCDIC is not being displayed correctly
We'll add the following requests to the list of planned features:
Binary comparison improvements
- Ability to switch between code pages 500 and 37 for EBCDIC encoding
- Reduce the size of the address column
psguru
PrestoSoft
Re: EBCDIC is not being displayed correctly
THANK YOU!!
You guys are the greatest!
EDP totally rocks!
(And it keeps getting better!)
"Fish" (David B. Trout)
Re: EBCDIC is not being displayed correctly
Could you please provide some sample files that are encoded with code page 37? And maybe a couple of code page 500 files?
psguru
PrestoSoft
Re: EBCDIC is not being displayed correctly
Just create a 256-byte binary file containing the values hex 00 to hex FF.
Then translate it from code page 37 to ASCII, and display/dump the results.
Then do the exact same thing, but translate from code page 500 to ASCII.
Then simply eyeball each result to make sure each byte was translated correctly.
Then do the same thing, but in reverse: translate the same hex 00 to hex FF table from ASCII to code page 37, then to code page 500, and display/dump the results, and compare (eyeball) each against the ASCII output of the first test to, again, make sure things are being translated properly.
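The procedure above can be sketched with Python's built-in `cp037` and `cp500` codecs. Two caveats: Python translates to Unicode text rather than to 8-bit ASCII, and this sketch checks the reverse direction as a round trip instead of translating ASCII input, so it only approximates what a hex viewer such as EDP would do.

```python
all_bytes = bytes(range(256))           # the hex 00 to hex FF test data

for codepage in ("cp037", "cp500"):
    text = all_bytes.decode(codepage)   # EBCDIC -> text
    back = text.encode(codepage)        # text -> EBCDIC again
    # Replace non-printable characters with '.' for the dump.
    printable = "".join(ch if ch.isprintable() else "." for ch in text)
    for row in range(0, 256, 16):
        print(f"{codepage} {row:02X}: {printable[row:row + 16]}")
    print(codepage, "round-trips:", back == all_bytes)
```

Writing `all_bytes` to disk gives the 256-byte file itself; comparing the two dumps row by row shows where the code pages disagree.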
I could certainly sit down and write such a program if I had the time to do so, but I'm not grasping why I'm the one that needs to do it.
I understand that I am the one requesting the change to EDP, but if you're already in the process of adding code to your product to perform such translations, wouldn't it then be trivially easy to test such code using the technique I described?
(sigh) Give me some time and I will try to create some test files for you.
In the meantime, while you are waiting for me, please give the hex 00 to hex FF table a try. I'm sure it should work just as well as any test file I could provide to you.
I do appreciate that you are making this change for me! Thank you for that!
"Fish" (David B. Trout)
Re: EBCDIC is not being displayed correctly
Here are two test files for you:
I hope that helps!
"Fish" (David B. Trout)
Re: EBCDIC is not being displayed correctly
Thanks. We were thinking about these code pages... Here's an idea: why should EBCDIC files be treated as second-class citizens and compared as binary files? Or, for that matter, files in any other non-Unicode (ANSI) code page? So one potential approach would be to have an option in EDP to define the default code page (with the default set to the Windows system page, typically 1252 in the US). This would allow, e.g., EBCDIC files to be opened and saved as text files, not as binary. Of course, with the option set to, say, page 37, this will make all "regular" text files look like garbage.
Another (perhaps future) approach is to specify a file's code page in the File Open dialog, overriding the default setting. This way you could compare EBCDIC files by setting the page to 37/500 just for them.
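A rough sketch of how a default code page with a per-file override might behave, written in Python for illustration. `DEFAULT_CODEPAGE` and `decode_file_bytes` are hypothetical names, not EDP settings or functions.

```python
from typing import Optional

# Hypothetical global setting; 1252 is the typical US Windows page.
DEFAULT_CODEPAGE = "cp1252"

def decode_file_bytes(data: bytes, codepage: Optional[str] = None) -> str:
    """Decode with the per-file override if given, else the default page."""
    return data.decode(codepage or DEFAULT_CODEPAGE, errors="replace")

ebcdic = "Success!".encode("cp037")         # stand-in for an EBCDIC file

print(decode_file_bytes(ebcdic, "cp037"))   # per-file override: readable
print(decode_file_bytes(ebcdic))            # default page: looks like garbage
```

The override leaves the global default untouched, so ordinary text files keep opening with the system page.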
psguru
PrestoSoft
Re: EBCDIC is not being displayed correctly
You're very welcome. Because I very much want to see this option implemented, I took a break from my busy schedule and decided to try and create the test files for you. (I needed a break anyway.)
It was easier than I expected. I didn't even need to write any program at all! Our existing product (https://en.wikipedia.org/wiki/Hercules_(emulator)) allowed me to create them very quickly and easily. It supports a multitude of different code pages.
FYI: I also have a test file for EBCDIC code page 1047, if you're interested in it. It's another semi-popular EBCDIC code page.
Precisely!
That sounds ideal!
True, but that's to be expected. When you save a text file using an EBCDIC code page, "it looks like garbage" when you open it in e.g. Notepad too, because it's not in (Duh!) ASCII. It's in EBCDIC.
But oftentimes one needs to deal with EBCDIC files when working on mainframes. While the mainframes themselves are always EBCDIC, many mainframers use Windows, so when a file is transferred from the mainframe to Windows in binary mode (because you want the file you receive to be an EXACT copy of what's on the mainframe), it'd be nice if there were a tool such as EDP that could properly deal with these EBCDIC files. Hence my request.
That would work too.
I REALLY appreciate you guys looking into and seriously considering this request!
EDP ROCKS!
Last edited by David B. Trout on Fri Feb 24, 2023 11:10 am, edited 1 time in total.
"Fish" (David B. Trout)