EBCDIC is not being displayed correctly

David B. Trout
Junior Member
Posts: 24
Joined: Wed Jan 06, 2010 4:21 am

EBCDIC is not being displayed correctly

Post by David B. Trout »

EDP has a Binary Comparison Character set option to display binary data in either EBCDIC or ASCII. As an IBM Mainframe programmer, I use EBCDIC a lot, and noticed some characters are not displaying correctly: :(

edp-ebcdic.png

On the left is the string: "Success ! CDSG, STPQ and LPQ: OK".
On the right is the string: "Success! CDSG, STPQ and LPQ: OK!".

(A blank was removed before the first exclamation mark and added to the end of the string after "OK".)

As you can see, the exclamation mark is being incorrectly displayed as a right square bracket.

I don't know what Code Page EDP is using, but in the CP037 Code Page (which is the one I would expect to be used), hex 5A is an exclamation mark (ASCII hex 21), not a right square bracket:

* https://www.kreativekorp.com/charset/encoding/CP037/
* https://en.wikipedia.org/wiki/Code_page_37
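
To make the expected result concrete, here is a minimal sketch in C of how the eight bytes of "Success!" should decode under CP037. The names `cp037_to_ascii` and `cp037_decode` are hypothetical, and the helper only covers letters, digits, blank and the punctuation used in the sample string. It is not EDP's code or the hex editor library's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decode one CP037 byte to ASCII.
   Only letters, digits, blank and the punctuation from the sample
   string are handled; every other byte is shown as '.'. */
char cp037_to_ascii(unsigned char b)
{
    if (b >= 0x81 && b <= 0x89) return (char)('a' + (b - 0x81)); /* a-i */
    if (b >= 0x91 && b <= 0x99) return (char)('j' + (b - 0x91)); /* j-r */
    if (b >= 0xA2 && b <= 0xA9) return (char)('s' + (b - 0xA2)); /* s-z */
    if (b >= 0xC1 && b <= 0xC9) return (char)('A' + (b - 0xC1)); /* A-I */
    if (b >= 0xD1 && b <= 0xD9) return (char)('J' + (b - 0xD1)); /* J-R */
    if (b >= 0xE2 && b <= 0xE9) return (char)('S' + (b - 0xE2)); /* S-Z */
    if (b >= 0xF0 && b <= 0xF9) return (char)('0' + (b - 0xF0)); /* 0-9 */
    switch (b) {
    case 0x40: return ' ';
    case 0x5A: return '!';   /* the byte this thread is about */
    case 0x6B: return ',';
    case 0x7A: return ':';
    default:   return '.';
    }
}

/* Decode len bytes into out, which must have room for len + 1 chars. */
void cp037_decode(const unsigned char *in, size_t len, char *out)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++)
        out[i] = cp037_to_ascii(in[i]);
    out[len] = '\0';
}
```

Feeding it the bytes E2 A4 83 83 85 A2 A2 5A gives "Success!". Under code page 500 the final 5A would come out as a right square bracket instead, which is what the screenshot shows.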

Can this either be fixed or a new option provided so the user can choose which Code Page they prefer to be used instead of whatever code page EDP is currently using?

Thanks!

Keep up the otherwise good work! :)
Last edited by David B. Trout on Thu Jan 05, 2023 7:03 pm, edited 1 time in total.
"Fish" (David B. Trout)
"Programming today is a race between
software engineers striving to build bigger
and better idiot-proof programs, and the
Universe trying to produce bigger and better
idiots. So far, the Universe is winning"
- Rich Cook
David B. Trout
Junior Member
Posts: 24
Joined: Wed Jan 06, 2010 4:21 am

Re: EBCDIC is not being displayed correctly

Post by David B. Trout »

FYI: Other programs seem to display EBCDIC data just fine:

HXD.png

hexedit.png
David B. Trout
Junior Member
Posts: 24
Joined: Wed Jan 06, 2010 4:21 am

Re: EBCDIC is not being displayed correctly

Post by David B. Trout »

P.S. It would also be nice if the left-hand file offset column weren't so wide. In the EDP comparison example I posted, the file is only 224 bytes in size. Yet the left-hand file offset column is 16 hexadecimal digits wide!

I seriously doubt anyone would be comparing two 16-exabyte binary files with EDP. :P

IMHO, an 8 character (8 hex digits = 32-bits) wide file offset column should be plenty. :wink:
Last edited by David B. Trout on Thu Jan 05, 2023 7:08 pm, edited 2 times in total.
psguru
Site Admin
Posts: 2145
Joined: Sat May 15, 2004 4:23 pm
Location: California

Re: EBCDIC is not being displayed correctly

Post by psguru »

We use a third-party Hex Editor library, and here's their conversion table:

Code: Select all

const int e2a [256] =
{
//0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
  0,   1,   2,   3, 156,   9, 134, 127, 151, 141, 142,  11,  12,  13,  14,  15,	// 0
 16,  17,  18,  19, 157, 133,   8, 135,  24,  25, 146, 143,  28,  29,  30,  31,	// 1
128, 129, 130, 131, 132,  10,  23,  27, 136, 137, 138, 139, 140,   5,   6,   7,	// 2
144, 145,  22, 147, 148, 149, 150,   4, 152, 153, 154, 155,  20,  21, 158,  26,	// 3
' ', 160, 161, 162, 163, 164, 165, 166, 167, 168,  91, '.', '<', '(', '+',  33,	// 4
'&', 169, 170, 171, 172, 173, 174, 175, 176, 177,  93, '$', '*', ')', ';',  94,	// 5
'-', '/', 178, 179, 180, 181, 182, 183, 184, 185, 124, ',', '%',  95, '>', '?',	// 6
186, 187, 188, 189, 190, 191, 192, 193, 194,  96, ':', '#', '@',  39, '=',  34,	// 7
195, 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 196, 197, 198, 199, 200, 201,	// 8
202, 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 203, 204, 205, 206, 207, 208,	// 9
209, 126, 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 210, 211, 212, 213, 214, 215,	// A
216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231,	// B
123, 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 232, 233, 234, 235, 236, 237,	// C
125, 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 238, 239, 240, 241, 242, 243,	// D
 92, 159, 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 244, 245, 246, 247, 248, 249,	// E
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 250, 251, 252, 253, 254, 255	// F
};
So yes, 5A is converted to ASCII code 93, which is the closing bracket. We can change it to '!' but there may be other problems in this table, so perhaps you could take a look.

As for the address column, it's the standard address length for 64 bits. A 32-bit build has half of this length.
psguru
PrestoSoft
David B. Trout
Junior Member
Posts: 24
Joined: Wed Jan 06, 2010 4:21 am

Re: EBCDIC is not being displayed correctly

Post by David B. Trout »

psguru wrote: Thu Jan 05, 2023 3:25 pm We use a third-party Hex Editor library, and here's their conversion table:
Just out of curiosity, do they document where THEY got it from?

psguru wrote: Thu Jan 05, 2023 3:25 pm We can change it to '!' but there may be other problems in this table, so perhaps you could take a look.
I do not wish to be rude, but is there a reason why you guys can't do that? I posted the URLs to the official CP037 table, which is the one you should be using IMO:

..... https://www.kreativekorp.com/charset/encoding/CP037/
..... https://en.wikipedia.org/wiki/Code_page_37

Nevertheless, I shall take a look at it myself and will let you know which code points appear to be incorrect. Thanks.

psguru wrote: Thu Jan 05, 2023 3:25 pm As for the address column, it's the standard address length for 64 bits. A 32-bit build has half of this length.
Duh! :P

My point was that you do not need such a wide file offset column, since the files being compared are highly unlikely to be large enough to need it.

Regardless of the bitness of the host operating system or its file system (32 vs. 64), the FILES being compared are almost always going to be much SMALLER than 4GB. So there's no need for such a wide file offset column. The width of the file offset column should depend on the size of the files being compared, not on the bitness of the host operating system or the file system on which the files reside.

So IMHO, the default should be to use only a 32-bit file offset column width, and only switch to a 64-bit file offset column width if/when such is actually needed. (which in my opinion is likely to be never)
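
A minimal sketch of that idea in C, assuming the viewer knows the size of the larger of the two files before it draws the column. The function name `offset_column_digits` is hypothetical and has nothing to do with the third-party library's actual code. It keeps 8 hex digits as the floor and only grows the column when the last offset needs more.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of hex digits to show in the file offset
   column when the larger of the two files is larger_file_size bytes.
   The column is never narrower than 8 digits (a 32-bit offset). */
int offset_column_digits(uint64_t larger_file_size)
{
    /* The largest offset ever displayed is that of the last byte. */
    uint64_t last_offset = (larger_file_size > 0) ? larger_file_size - 1 : 0;
    int digits = 1;

    /* Count how many hex digits the last offset occupies. */
    while ((last_offset >>= 4) != 0)
        digits++;

    assert(digits <= 16);   /* a 64-bit offset has at most 16 hex digits */
    return (digits < 8) ? 8 : digits;
}
```

For the 224-byte file in the screenshot this returns 8, and it stays at 8 for anything up to 4GB. Only files larger than that would widen the column.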

Does that make sense now?

In any case, I thank you for your response. I really appreciate it. I will post my analysis of your "e2a" table in a few minutes.

Thanks.
psguru
Site Admin
Posts: 2145
Joined: Sat May 15, 2004 4:23 pm
Location: California

Re: EBCDIC is not being displayed correctly

Post by psguru »

David B. Trout wrote: I do not wish to be rude, but is there a reason why you guys can't do that?
Because it's not something we know well. We did look at the web resources, but they are not very clear to us, at least at our level of knowledge of EBCDIC encoding.

David B. Trout wrote: So IMHO, the default should be to use only a 32-bit file offset column width, and only switch to a 64-bit file offset column width if/when such is actually needed. (which in my opinion is likely to be never)
Unfortunately, the code in the library is not easy to change in this area, so it's likely to stay as is.

David B. Trout wrote: In any case, I thank you for your response. I really appreciate it. I will post my analysis of your "e2a" table in a few minutes.
Thank you.
psguru
PrestoSoft
David B. Trout
Junior Member
Posts: 24
Joined: Wed Jan 06, 2010 4:21 am

Re: EBCDIC is not being displayed correctly

Post by David B. Trout »

Nevertheless, I shall take a look at it myself and will let you know which code points appear to be incorrect. Thanks.
I will post my analysis of your "e2a" table in a few minutes.

I THINK I FOUND THE PROBLEM!

EDP appears to be using code page 500! (not 37):

(https://www.kreativekorp.com/charset/encoding/CP500/)
(https://en.wikipedia.org/wiki/Code_page ... code_pages):
Code page 500, known as "International EBCDIC", "International Latin-1" or "International Number 5", is the other major EBCDIC encoding for the ISO/IEC 8859-1 repertoire. It is used in Belgium, Switzerland and on AS/400 systems in Canada. It is related to code page 37 and has the same repertoire, but differs in seven positions; in particular, it encodes [ and ] at 4A hex and 5A hex respectively, which are used for the cent sign (¢) and exclamation point (!) in code page 37. The caret (^) is also encoded at 5F hex, similarly to code page 1047. The ¢ is encoded at B0 hex, the ¬ at BA hex, the ! at 4F hex and the pipe character (|) at BB hex.
Which exactly matches the translation table you posted.
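
The seven differences are few enough to list outright. The sketch below uses hypothetical names (`cp_override`, `cp037_overrides`, `make_cp037_table`). It copies a CP500-style table and overwrites just those seven entries with their CP037 values, written as ISO 8859-1 codes. Whether the library's table uses ISO 8859-1 for its non-ASCII entries is an assumption, and the rest of the posted table is not verified here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One EBCDIC position and the ISO 8859-1 code it maps to in CP037. */
struct cp_override {
    unsigned char ebcdic;
    int latin1;
};

/* The seven positions where CP037 differs from CP500. */
const struct cp_override cp037_overrides[7] = {
    { 0x4A, 0xA2 },  /* cent sign        (CP500: '[')        */
    { 0x4F, '|'  },  /* vertical bar     (CP500: '!')        */
    { 0x5A, '!'  },  /* exclamation mark (CP500: ']')        */
    { 0x5F, 0xAC },  /* not sign         (CP500: '^')        */
    { 0xB0, '^'  },  /* caret            (CP500: cent sign)  */
    { 0xBA, '['  },  /* left bracket     (CP500: not sign)   */
    { 0xBB, ']'  },  /* right bracket    (CP500: '|')        */
};

/* Hypothetical helper: build a CP037 table from a CP500 one.
   The source table is left untouched so both can be offered. */
void make_cp037_table(const int cp500[256], int cp037[256])
{
    size_t count = sizeof cp037_overrides / sizeof cp037_overrides[0];

    assert(cp500 != NULL && cp037 != NULL);
    memcpy(cp037, cp500, 256 * sizeof cp037[0]);
    for (size_t i = 0; i < count; i++)
        cp037[cp037_overrides[i].ebcdic] = cp037_overrides[i].latin1;
}
```

Because the original table is only copied, both tables stay available, which is all that is needed to let the user pick between the two code pages.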


BUT... according to Wikipedia, code page 37 is actually one of the most used and best supported EBCDIC code pages in the world:

(https://www.kreativekorp.com/charset/encoding/CP037/)
(https://en.wikipedia.org/wiki/Code_page_37):
Code page 37 is one of the most-used and best-supported EBCDIC code pages. It is used as the default z/OS code page in the United States and other English speaking countries. It is considered the "required" EBCDIC code page for the United States, and also used in Australia, New Zealand, the Netherlands, Portugal and Brazil, and on ESA/390 systems in Canada, but not on Canadian AS/400 systems, which use Code page 500 instead. It is one of four EBCDIC code pages (alongside 500, 875 and 1026) with mapping data supplied by Microsoft to the Unicode Consortium, and one of seven (alongside 273, 424, 500, 875, 1026 and 1140) supported by Python as standard.

So in my opinion the default table that EDP should be using should be 37 (not 500), and you should provide an option (two radio buttons?) to allow the user to choose which code page they prefer (37 or 500).

Doing that would provide EDP with the widest compatibility range possible, and should make the largest number of customers happy: those who prefer code page 500 and those who, like me, prefer code page 37 (one of the most widely used and best supported EBCDIC code pages in the world).

Is there any chance of that maybe happening at some point in the future? I'm a Windows C/C++ GUI programmer myself, and in my experience the change seems, in all honesty, to be fairly simple and straightforward.

Thank you for listening, and thank you for considering this change (bug fix?) request! :D
MSpagni
Expert Member
Posts: 523
Joined: Mon Mar 30, 2009 12:53 am
Location: Italy

Re: EBCDIC is not being displayed correctly

Post by MSpagni »

It is one of four EBCDIC code pages (alongside 500, 875 and 1026) with mapping data supplied by Microsoft to the Unicode Consortium, and one of seven (alongside 273, 424, 500, 875, 1026 and 1140) supported by Python as standard.
Wow! The best way to create a mess, I think. :D

EBCDIC... And you call me archaic! :lol:

I agree with David: what's the use of so many digits for the file offset? A lot of screen real estate is wasted.
(N.B. I most often use the 32-bit version, so I'm not particularly concerned with this problem, but anyway...)
psguru
Site Admin
Posts: 2145
Joined: Sat May 15, 2004 4:23 pm
Location: California

Re: EBCDIC is not being displayed correctly

Post by psguru »

We'll add the following requests to the list of planned features:

Binary comparison improvements
  • Ability to switch between code pages 500 and 37 for EBCDIC encoding
  • Reduce the size of the address column
psguru
PrestoSoft
David B. Trout
Junior Member
Posts: 24
Joined: Wed Jan 06, 2010 4:21 am

Re: EBCDIC is not being displayed correctly

Post by David B. Trout »

psguru wrote: Fri Jan 06, 2023 11:17 am We'll add the following requests to the list of planned features:

Binary comparison improvements
  • Ability to switch between code pages 500 and 37 for EBCDIC encoding
  • Reduce the size of the address column
THANK YOU!! :D

You guys are the greatest!

EDP totally rocks!

(And it keeps getting better!) :D :D