Support for COBOL

Posted: Wed Apr 05, 2023 8:47 am
by ross51
For us dinosaurs, could you add support for COBOL?
I just upgraded to the latest version with the expectation that COBOL would be supported, as there continue to be billions of lines of code still in operation. (Yes, COBOL is verbose.)

Alternatively, where could I find information on creating a plugin with all the COBOL reserved words and comment settings?
Thanks!

Another suggestion is to make switching from dark mode to light a one-button setting.

Re: Support for COBOL

Posted: Wed Apr 05, 2023 1:46 pm
by psguru
While we don't plan on implementing COBOL syntax highlighting, you can define a custom document type for COBOL, as described at https://blog.prestosoft.com/2012/09/com ... f-pro.html . Unfortunately, the syntax highlighting library we use does not support COBOL.

Re: Support for COBOL

Posted: Thu Apr 06, 2023 10:04 am
by JeremyNicoll
psguru wrote: Wed Apr 05, 2023 1:46 pm Unfortunately, the syntax highlighting library we use does not support COBOL.
It's worse than that: it doesn't support lots of things. One might be inclined to ask why you chose it.

And it looks like defining a parser for it would be very hard work - suitable perhaps as an academic project or a job for a comp sci student over a summer vacation.

It's a pity that no-one (so far as I know) has written a tool that could read popular text editors' syntax-colouring definitions (which may be much less sophisticated than full language parsers) and generate Tree-sitter parser definitions from them. It's also not clear to me whether one could use Tree-sitter to parse/colour structured data (i.e. not programming languages). Also one needs to have, or be prepared to install, a load of unfamiliar (to me anyway) programming tools even to get started on trying to define (and then generate) a Tree-sitter parser. And ... even if one does somehow manage that - would EDP be able to load user-defined parsers dynamically?

See: https://tree-sitter.github.io/tree-sitt ... ng-parsers


It's also, as far as I can tell, hard to add a language (or I suppose, a data-format) to the Guesslang software used in your "Automatic Programming Language Detection" feature. That though, might possibly be amenable to some sort of user-written plugin, if you could define for us some syntax for returning a "probable file format" decision and allow arbitrary code to be called to make that decision, either instead of Guesslang or (configurably) before or after Guesslang (depending on whether a user thought their own guesser should get first crack at the job, or only be called if Guesslang can't decide).
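
To make the suggestion above concrete, here is a minimal sketch of a configurable guesser chain. Everything in it is hypothetical: the `Guesser` signature, the `looks_like_cobol` heuristic and the `builtin_guesser` stand-in are assumptions for illustration only, not part of EDP or of the Guesslang API.

```python
from typing import Callable, Optional, Sequence

# A guesser takes the text buffer and returns a document-type name, or None
# when it cannot decide. This signature is an assumption for illustration.
Guesser = Callable[[str], Optional[str]]


def looks_like_cobol(text: str) -> Optional[str]:
    """Hypothetical user-written guesser: look for COBOL division headers."""
    upper = text.upper()
    if "IDENTIFICATION DIVISION" in upper or "PROCEDURE DIVISION" in upper:
        return "COBOL"
    return None


def builtin_guesser(text: str) -> Optional[str]:
    """Stand-in for the built-in detector (not the real Guesslang API)."""
    return "Python" if "def " in text else None


def detect_type(text: str, guessers: Sequence[Guesser]) -> Optional[str]:
    """Try each guesser in order; the first non-None answer wins."""
    for guess in guessers:
        result = guess(text)
        if result is not None:
            return result
    return None


# The user's guesser gets first crack; the built-in detector is the fallback.
user_first = [looks_like_cobol, builtin_guesser]
# The built-in detector goes first; the user's guesser runs only if it cannot decide.
user_last = [builtin_guesser, looks_like_cobol]
```

The order of the list is the whole "before or after" configuration: `detect_type(buffer, user_first)` and `detect_type(buffer, user_last)` differ only in which guesser is consulted first.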

Re: Support for COBOL

Posted: Thu Apr 06, 2023 10:13 am
by psguru
Yes, we are well aware of these limitations and are constantly looking for alternatives to the parser engine we use. Tree-sitter is one alternative we are considering. As for user-defined parsers and language detectors, this is something we don't foresee in the immediate future.

Re: Support for COBOL

Posted: Thu Apr 06, 2023 10:39 am
by JeremyNicoll
psguru wrote: Thu Apr 06, 2023 10:13 am As for user-defined parsers and language detectors, this is something we don't foresee in the immediate future.
When you say "user-defined parsers", do you mean loading Tree-sitter parsers /if/ someone had managed to create one themselves?

I'd have thought the effort involved in allowing users to supply their own guesser would be nearly trivial; you already have code to support a plugin interface. I suppose it'd perhaps need called more than once as (of course) plugins can already change the content that will later be compared. So there might need to be a guess attempted on an unconverted file, and then on a converted one. But that problem must already affect when you call the Guesslang code? Surely all that'd really be new is returning a what-sort-of-file answer to EDP?

Re: Support for COBOL

Posted: Thu Apr 06, 2023 11:11 am
by psguru
JeremyNicoll wrote: When you say "user-defined parsers", do you mean loading Tree-sitter parsers /if/ someone had managed to create one themselves?
I mean any kind of a parser library we end up switching to (if and when this is going to happen).
JeremyNicoll wrote: I'd have thought the effort involved in allowing users to supply their own guesser would be nearly trivial; you already have code to support a plugin interface. I suppose it'd perhaps need called more than once as (of course) plugins can already change the content that will later be compared. So there might need to be a guess attempted on an unconverted file, and then on a converted one. But that problem must already affect when you call the Guesslang code? Surely all that'd really be new is returning a what-sort-of-file answer to EDP?
We don't use Guesslang via the plug-in architecture. Guessing is done on the text buffer; whether it was generated by a plug-in is irrelevant. Then there's mapping to EDP doc types, currently only system-defined. I can foresee mapping to a user-defined type but that language has to be supported by Guesslang (https://github.com/yoeo/guesslang). Remember, type detection only occurs for untyped files, so it's not extremely common.

Re: Support for COBOL

Posted: Thu Apr 06, 2023 11:32 am
by JeremyNicoll
psguru wrote: Thu Apr 06, 2023 11:11 am We don't use Guesslang via the plug-in architecture. Guessing is done on the text buffer; whether it was generated by a plug-in is irrelevant. Then there's mapping to EDP doc types, currently only system-defined. I can foresee mapping to a user-defined type but that language has to be supported by Guesslang (https://github.com/yoeo/guesslang). Remember, type detection only occurs for untyped files, so it's not extremely common.
I didn't suppose that Guesslang was currently invoked by the plugin architecture, but was suggesting that if you decided to implement alternative ways of having guesses made, by user-supplied code, you could use the same basic logic to do it.

The whole point of my suggestion is that using Guesslang (or anything else) that doesn't support languages that your users use is really not very much use to us. So it'd be helpful if at least there was a place where our own code could make guesses about our own data - which will never - not least because in many cases it'd be proprietary - be supported by any public domain project. As for languages, I expect that many people might have loads of files on their PCs that are all e.g. .c, but that the variant of C in them is not the same in all cases.

When you say "untyped" files, do you mean those with no extension? Because it seems to me that there's lots of others - zillions of types of data all in .txt files, or logs from different applications all in generic ".log" files.

I have loads of BASIC programs which have all come from other platforms - in essence just backed-up on this system - and not only are some partly tokenised and some plain text, and they're for different flavours of BASIC, but their filetypes vary widely according to the needs of the real or emulated systems they're used on. I've renamed some of them so that they're more easily inspected by the text editors I use under Windows but not all of them. The same is true of lots of files which came from MVS systems where it's the full name of the PDS (vaguely analogous to a folder) that determines the type of the files held within. On Windows lots of those have been given .txt extensions so they'll readily load into an editor. It's not reasonable that one should have to force a set of files with no, or misleading, file extensions to some common form just so one can sensibly compare them.

Re: Support for COBOL

Posted: Thu Apr 06, 2023 11:38 am
by psguru
JeremyNicoll wrote: When you say "untyped" files, do you mean those with no extension?
It means files with extensions not defined in Options | Doc Types and Default/Plain Text files.
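
As a rough illustration of that definition, the check might look like the sketch below. The `DOC_TYPES` table is a made-up stand-in for whatever is configured under Options | Doc Types, and treating `.txt` as a Default/Plain Text file is an assumption; this is not EDP's actual logic.

```python
import os

# Hypothetical mapping of extensions to document types, standing in for
# whatever is configured under Options | Doc Types.
DOC_TYPES = {".py": "Python", ".c": "C/C++", ".txt": "Default/Plain Text"}


def is_untyped(path, doc_types=DOC_TYPES):
    """True if the extension is not defined, or maps to Default/Plain Text."""
    ext = os.path.splitext(path)[1].lower()
    doc_type = doc_types.get(ext)
    return doc_type is None or doc_type == "Default/Plain Text"
```

Under these assumptions a file with no extension, an unlisted extension such as `.cbl`, or a plain `.txt` file would all be candidates for automatic detection, while `main.py` would not.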

Re: Support for COBOL

Posted: Thu Apr 06, 2023 3:44 pm
by ross51
I wasn't thinking of a parser, which would be difficult.
I was thinking of a simple match to a syntax file to color the keywords, like the way my text/programming editor does with the attached file.
Lol, after looking at this file, there are so many reserved words I don't know, much less have used. I think I normally use about the same number of reserved words as the Python set.
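
For reference, the kind of simple keyword matching described here could be sketched as follows. The `KEYWORDS` set is a tiny illustrative subset, not the attached syntax file, and `keyword_spans` is a hypothetical helper rather than anything EDP provides. Note that it knows nothing about string literals or comments, so a reserved word inside a literal would still be reported.

```python
import re

# A tiny, illustrative subset of COBOL reserved words; a real syntax file
# would list several hundred.
KEYWORDS = {"MOVE", "TO", "PERFORM", "UNTIL", "IF", "END-IF", "DISPLAY", "ADD"}

# COBOL words may contain hyphens, so a word is letters/digits joined by "-".
WORD = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def keyword_spans(line):
    """Return (start, end, word) for each reserved word found in the line."""
    return [
        (m.start(), m.end(), m.group().upper())
        for m in WORD.finditer(line)
        if m.group().upper() in KEYWORDS
    ]
```

Matching whole hyphenated words means `END-IF` is reported once, and user names such as `WS-A` are skipped rather than being split into pieces.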

Re: Support for COBOL

Posted: Thu Apr 06, 2023 3:51 pm
by psguru
We are talking about the same thing. We'd have to implement the whole COBOL parser (this is how EDP does syntax highlighting) in order to color keywords, detect comments etc.

Re: Support for COBOL

Posted: Fri Apr 07, 2023 6:10 am
by ross51
Ok, thanks. I appreciate the quick response.

Re: Support for COBOL

Posted: Fri Apr 07, 2023 7:04 am
by ross51
After using EDP this morning, I realized that coloring the reserved words would not make any difference to its effectiveness.

Re: Support for COBOL

Posted: Fri Apr 07, 2023 7:48 am
by psguru
It doesn't, it just makes the display of files prettier and easier to navigate.

Re: Support for COBOL

Posted: Sat Apr 08, 2023 3:00 am
by MSpagni
Don't forget you at least have a possibility: you can define a regexp for detecting comments in your files.
This also has the bonus of letting you enable the "ignore comments" option.

F.Y.I. I am an Ultraedit user and I made many .uew syntax definition files. They include the comments definition, the various groups of lists of keywords (with specific colors) and the regexp formula to detect the function definitions (and some other syntactic elements).
No need for a parser; a lexical analyzer is enough, together with a regexp engine, of course.
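
As a sketch of what such comment regexps might look like for COBOL: fixed-format source marks a whole-line comment with "*" or "/" in column 7, and "*>" starts a comment that runs to the end of the line. The patterns below are written for Python's `re` module and are only an illustration; the exact syntax EDP or UltraEdit expects may differ, `strip_comments` is a hypothetical helper, and a "*>" inside a string literal would be misread as a comment.

```python
import re

# Fixed-format COBOL: "*" or "/" in column 7 marks a whole-line comment.
FIXED_COMMENT = re.compile(r"^.{6}[*/].*$", re.MULTILINE)
# "*>" starts a comment that runs to the end of the line.
INLINE_COMMENT = re.compile(r"\*>.*$", re.MULTILINE)


def strip_comments(source):
    """Blank out comment text so two files can be compared without it."""
    without_full_lines = FIXED_COMMENT.sub("", source)
    return INLINE_COMMENT.sub("", without_full_lines)
```

Line breaks are left in place, so line numbers in the stripped text still match the original file.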

Re: Support for COBOL

Posted: Sat Apr 08, 2023 10:35 am
by psguru
While this is doable, regex parsers can be slow on large files, and generally are considered somewhat obsolete. Tree-sitter-based parsers, on the other hand, are compiled, which makes them a lot faster. The downside is, of course, that one has to write the code for a specific language. EDP uses Crystal Edit Library, which is also compiled.