Support for COBOL

General questions about using ExamDiff Pro, ideas for new features, bug reports, and usage tips.
ross51
New Member
Posts: 4
Joined: Wed Apr 05, 2023 8:35 am

Support for COBOL

Post by ross51 »

For us dinosaurs, could you add support for COBOL?
I just upgraded to the latest version with the expectation that COBOL would be supported, as there continue to be billions of lines of code still in operation. (Yes, COBOL is verbose.)

Alternatively, where could I find information on creating a plugin with all the COBOL reserved words and comment settings?
Thanks!

Another suggestion: make switching between dark mode and light mode a one-button setting.
psguru
Site Admin
Posts: 2228
Joined: Sat May 15, 2004 4:23 pm
Location: California

Re: Support for COBOL

Post by psguru »

While we don't plan on implementing COBOL syntax highlighting, you can define a custom document type for COBOL, as described at https://blog.prestosoft.com/2012/09/com ... f-pro.html . Unfortunately, the syntax highlighting library we use does not support COBOL.
psguru
PrestoSoft
JeremyNicoll
Expert Member
Posts: 108
Joined: Sun May 02, 2010 12:00 pm
Location: Edinburgh

Re: Support for COBOL

Post by JeremyNicoll »

psguru wrote: Wed Apr 05, 2023 1:46 pm Unfortunately, the syntax highlighting library we use does not support COBOL.
It's worse than that: it doesn't support lots of things. One might be inclined to ask why you chose it.

And it looks like defining a parser for it would be very hard work - suitable perhaps as an academic project or a job for a comp sci student over a summer vacation.

It's a pity that no-one (so far as I know) has written a tool that could read popular text editors' syntax-colouring definitions (which may be much less sophisticated than full language parsers) and generate Tree-sitter parser definitions from them. It's also not clear to me whether one could use Tree-sitter to parse/colour structured data (i.e. not programming languages). Also, one needs to have, or be prepared to install, a load of unfamiliar (to me anyway) programming tools even to get started on trying to define (and then generate) a Tree-sitter parser. And ... even if one does somehow manage that - would EDP be able to load user-defined parsers dynamically?

See: https://tree-sitter.github.io/tree-sitt ... ng-parsers


It's also, as far as I can tell, hard to add a language (or I suppose, a data-format) to the Guesslang software used in your "Automatic Programming Language Detection" feature. That though, might possibly be amenable to some sort of user-written plugin, if you could define for us some syntax for returning a "probable file format" decision and allow arbitrary code to be called to make that decision, either instead of Guesslang or (configurably) before or after Guesslang (depending on whether a user thought their own guesser should get first crack at the job, or only be called if Guesslang can't decide).
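To make the "before or after Guesslang" idea concrete, here is a minimal sketch in Python. It is not EDP's plug-in interface: the `Guesser` signature, `guess_cobol`, `builtin_guess`, and `detect` are all hypothetical names invented for illustration, and the built-in detector is replaced by a stand-in that never decides.

```python
from typing import Callable, Optional, Sequence

# Hypothetical contract: a guesser receives the text buffer and returns
# a document-type name, or None when it cannot decide.
Guesser = Callable[[str], Optional[str]]


def guess_cobol(text: str) -> Optional[str]:
    """Hypothetical user-written guesser: look for COBOL division headers."""
    upper = text.upper()
    if "IDENTIFICATION DIVISION" in upper or "PROCEDURE DIVISION" in upper:
        return "COBOL"
    return None


def builtin_guess(text: str) -> Optional[str]:
    """Stand-in for the built-in detector; in this sketch it is always undecided."""
    return None


def detect(text: str, guessers: Sequence[Guesser]) -> Optional[str]:
    """Try each guesser in the configured order; the first definite answer wins."""
    for guess in guessers:
        result = guess(text)
        if result is not None:
            return result
    return None
```

Passing `[guess_cobol, builtin_guess]` gives the user's guesser first crack at the job; passing `[builtin_guess, guess_cobol]` calls it only when the built-in detector cannot decide.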
Last edited by JeremyNicoll on Fri Apr 07, 2023 2:52 am, edited 2 times in total.
psguru
Site Admin
Posts: 2228
Joined: Sat May 15, 2004 4:23 pm
Location: California

Re: Support for COBOL

Post by psguru »

Yes, we are well aware of these limitations and are constantly looking for alternatives to the parser engine we use. Tree-sitter is one alternative we are considering. As for user-defined parsers and language detectors, this is something we don't foresee in the immediate future.
psguru
PrestoSoft
JeremyNicoll
Expert Member
Posts: 108
Joined: Sun May 02, 2010 12:00 pm
Location: Edinburgh

Re: Support for COBOL

Post by JeremyNicoll »

psguru wrote: Thu Apr 06, 2023 10:13 am As for user-defined parsers and language detectors, this is something we don't foresee in the immediate future.
When you say "user-defined parsers", do you mean loading Tree-sitter parsers /if/ someone had managed to create one themselves?

I'd have thought the effort involved in allowing users to supply their own guesser would be nearly trivial; you already have code to support a plugin interface. I suppose it'd perhaps need to be called more than once, as (of course) plugins can already change the content that will later be compared. So there might need to be a guess attempted on an unconverted file, and then on a converted one. But that problem must already affect when you call the Guesslang code? Surely all that'd really be new is returning a what-sort-of-file answer to EDP?
psguru
Site Admin
Posts: 2228
Joined: Sat May 15, 2004 4:23 pm
Location: California

Re: Support for COBOL

Post by psguru »

JeremyNicoll wrote: When you say "user-defined parsers", do you mean loading Tree-sitter parsers /if/ someone had managed to create one themselves?
I mean any kind of a parser library we end up switching to (if and when this is going to happen).
JeremyNicoll wrote: I'd have thought the effort involved in allowing users to supply their own guesser would be nearly trivial; you already have code to support a plugin interface. I suppose it'd perhaps need to be called more than once, as (of course) plugins can already change the content that will later be compared. So there might need to be a guess attempted on an unconverted file, and then on a converted one. But that problem must already affect when you call the Guesslang code? Surely all that'd really be new is returning a what-sort-of-file answer to EDP?
We don't use Guesslang via the plug-in architecture. Guessing is done on the text buffer; whether it was generated by a plug-in is irrelevant. Then there's mapping to EDP doc types, currently only system-defined. I can foresee mapping to a user-defined type but that language has to be supported by Guesslang (https://github.com/yoeo/guesslang). Remember, type detection only occurs for untyped files, so it's not extremely common.
psguru
PrestoSoft
JeremyNicoll
Expert Member
Posts: 108
Joined: Sun May 02, 2010 12:00 pm
Location: Edinburgh

Re: Support for COBOL

Post by JeremyNicoll »

psguru wrote: Thu Apr 06, 2023 11:11 am We don't use Guesslang via the plug-in architecture. Guessing is done on the text buffer; whether it was generated by a plug-in is irrelevant. Then there's mapping to EDP doc types, currently only system-defined. I can foresee mapping to a user-defined type but that language has to be supported by Guesslang (https://github.com/yoeo/guesslang). Remember, type detection only occurs for untyped files, so it's not extremely common.
I didn't suppose that Guesslang was currently invoked by the plugin architecture, but was suggesting that if you decided to implement alternative ways of having guesses made, by user-supplied code, you could use the same basic logic to do it.

The whole point of my suggestion is that using Guesslang (or anything else) that doesn't support languages that your users use is really not very much use to us. So it'd be helpful if at least there was a place where our own code could make guesses about our own data - which will never - not least because in many cases it'd be proprietary - be supported by any public domain project. As for languages, I expect that many people might have loads of files on their PCs that are all e.g. .c, but that the variant of C in them is not the same in all cases.

When you say "untyped" files, do you mean those with no extension? Because it seems to me that there's lots of others - zillions of types of data all in .txt files, or logs from different applications all in generic ".log" files.

I have loads of BASIC programs which have all come from other platforms - in essence just backed-up on this system - and not only are some partly tokenised and some plain text, and they're for different flavours of BASIC, but their filetypes vary widely according to the needs of the real or emulated systems they're used on. I've renamed some of them so that they're more easily inspected by the text editors I use under Windows but not all of them. The same is true of lots of files which came from MVS systems where it's the full name of the PDS (vaguely analogous to a folder) that determines the type of the files held within. On Windows lots of those have been given .txt extensions so they'll readily load into an editor. It's not reasonable that one should have to force a set of files with no, or misleading, file extensions to some common form just so one can sensibly compare them.
psguru
Site Admin
Posts: 2228
Joined: Sat May 15, 2004 4:23 pm
Location: California

Re: Support for COBOL

Post by psguru »

JeremyNicoll wrote: When you say "untyped" files, do you mean those with no extension?
It means files whose extensions are not defined in Options | Doc Types, as well as Default/Plain Text files.
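As a rough illustration of that rule, the check could be sketched in Python as below. This is not EDP's code: the `DOC_TYPES` table is a made-up stand-in for the Options | Doc Types list, and mapping `.txt` to Default/Plain Text is an assumption made only for the example.

```python
from pathlib import PurePath

# Hypothetical stand-in for the Options | Doc Types table (extension -> doc type).
# The entries, including .txt -> Default/Plain Text, are assumptions.
DOC_TYPES = {
    ".c": "C/C++",
    ".py": "Python",
    ".txt": "Default/Plain Text",
}


def needs_detection(filename: str) -> bool:
    """True when the file counts as untyped, so language detection would run."""
    ext = PurePath(filename).suffix.lower()
    doc_type = DOC_TYPES.get(ext)
    # Untyped: extension not in the table at all, or mapped to plain text.
    return doc_type is None or doc_type == "Default/Plain Text"
```

Under these assumptions, `PAYROLL.cbl`, `notes.txt`, and an extensionless `README` would all go through detection, while `main.c` would not.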
psguru
PrestoSoft
ross51
New Member
Posts: 4
Joined: Wed Apr 05, 2023 8:35 am

Re: Support for COBOL

Post by ross51 »

I wasn't thinking of a parser, which would be difficult.
I was thinking of a simple match to a syntax file to color the keywords, like the way my text/programming editor does with the attached file.
Lol, after looking at this file, there are so many reserved words I don't know, much less have used. I think I normally use about the same number of reserved words as the Python set.
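The kind of simple keyword match described here can be sketched in a few lines of Python. This is only an illustration of the idea, not how EDP or any particular editor does it: `COBOL_KEYWORDS` is a tiny hand-picked subset rather than the contents of the attached file, and `keyword_spans` is a hypothetical helper name.

```python
import re

# Small illustrative subset; a real list would be loaded from a syntax file.
COBOL_KEYWORDS = {"MOVE", "TO", "PERFORM", "UNTIL", "DISPLAY", "ADD", "IF", "END-IF"}

# COBOL words may contain hyphens (END-IF, WS-TOTAL), so match them as one unit.
WORD = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def keyword_spans(line: str) -> list[tuple[int, int, str]]:
    """Return (start, end, keyword) for each reserved word found in the line."""
    spans = []
    for match in WORD.finditer(line):
        word = match.group().upper()  # reserved words are matched case-insensitively
        if word in COBOL_KEYWORDS:
            spans.append((match.start(), match.end(), word))
    return spans
```

A display layer could then color each returned span. Note that this sketch knows nothing about comments or string literals, so a reserved word inside a quoted string would be colored too.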
Attachments
cobol-syntax.txt
(6.62 KiB) Downloaded 233 times
psguru
Site Admin
Posts: 2228
Joined: Sat May 15, 2004 4:23 pm
Location: California

Re: Support for COBOL

Post by psguru »

We are talking about the same thing. We'd have to implement the whole COBOL parser (this is how EDP does syntax highlighting) in order to color keywords, detect comments etc.
psguru
PrestoSoft
ross51
New Member
Posts: 4
Joined: Wed Apr 05, 2023 8:35 am

Re: Support for COBOL

Post by ross51 »

Ok, thanks. I appreciate the quick response.
ross51
New Member
Posts: 4
Joined: Wed Apr 05, 2023 8:35 am

Re: Support for COBOL

Post by ross51 »

After using EDP this morning, I realized that coloring the reserved words would not make any difference to its effectiveness.
psguru
Site Admin
Posts: 2228
Joined: Sat May 15, 2004 4:23 pm
Location: California

Re: Support for COBOL

Post by psguru »

It doesn't; it just makes the display of files prettier and easier to navigate.
psguru
PrestoSoft
MSpagni
Expert Member
Posts: 537
Joined: Mon Mar 30, 2009 12:53 am
Location: Italy

Re: Support for COBOL

Post by MSpagni »

Don't forget you at least have a possibility: you can define a regexp for detecting comments in your files.
This also has the bonus of letting you enable the "ignore comments" option.
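As a starting point, here is a sketch of two such patterns, tested in Python. It assumes fixed-format source, where `*` or `/` in column 7 marks a comment line, plus the `*>` inline comment; whether EDP's regex dialect accepts these exact patterns is an assumption to check, and `strip_comments` is just a hypothetical helper for trying them out.

```python
import re

# Fixed-format COBOL: '*' or '/' in column 7 (the indicator area) marks a comment line.
FIXED_COMMENT = re.compile(r"^.{6}[*/].*$", re.MULTILINE)

# Inline comment: '*>' runs to the end of the line.
INLINE_COMMENT = re.compile(r"\*>.*$", re.MULTILINE)


def strip_comments(source: str) -> str:
    """Blank out comment lines and inline comments, keeping line breaks intact."""
    source = FIXED_COMMENT.sub("", source)
    return INLINE_COMMENT.sub("", source)
```

These patterns are deliberately naive: a `*>` inside a string literal would also be treated as a comment, which is the usual trade-off of a purely regex-based approach.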

F.Y.I. I am an UltraEdit user and I have made many .uew syntax definition files. They include the comment definitions, the various groups of keyword lists (with specific colors) and the regexp formulas to detect function definitions (and some other syntactic elements).
No need for a parser; a lexical analyzer is enough, together with a regexp engine, of course.
psguru
Site Admin
Posts: 2228
Joined: Sat May 15, 2004 4:23 pm
Location: California

Re: Support for COBOL

Post by psguru »

While this is doable, regex parsers can be slow on large files and are generally considered somewhat obsolete. Tree-sitter-based parsers, on the other hand, are compiled, which makes them a lot faster. The downside is, of course, that one has to write the code for a specific language. EDP uses Crystal Edit Library, which is also compiled.
psguru
PrestoSoft