"standard" regex \r and \n switches do not work

why_am_i_hiding
Junior Member
Posts: 10
Joined: Tue Feb 24, 2009 1:18 pm

"standard" regex \r and \n switches do not work

Post by why_am_i_hiding »

I have 2 text files, one with:
---
Invoice Date: 01/20/2009 PAYMENT DUE DATE:

02/19/2009
---
And another with:
---
Invoice Date: 02/20/2009 PAYMENT DUE DATE:

03/20/2009
---
(Yes, there is a "blank line" between the text and the date)
The "regular expressions" site (http://www.regular-expressions.info/) tutorial says that line ends (blank lines) can be trapped with an expression like:
Date:\r\n\r\n\d\d[/]\d\d[/]
This SHOULD allow for the blank line and correctly ignore the different dates for the "payment due" part.
(The "invoice date" part is easily and correctly trapped with a different regex)
My little regex testing program "RegexBuddy" says that this is valid and correct, but ExamDiff (both v4.0 and 4.5beta) does not seem to support this.
Even using the "fuzzy line matching" advanced feature in v4.5b is of no help.
Is there a plan to include support for the "\r\n" combo, (or something similar) to allow for what is a valid regex that can cover a "block" of lines?
psguru
Site Admin
Posts: 2396
Joined: Sat May 15, 2004 4:23 pm
Location: California

Re: "standard" regex \r and \n switches do not work

Post by psguru »

The "Lines matching regular expression" option in Options | Compare works only on single lines, so \r and \n have no effect. However, the Comments feature (see Options | Document Types) allows you to specify any regex that can span multiple lines. You can then use Options | Compare | Ignore Comments (or the toolbar button) to ignore your multi-line regex.
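
A rough way to picture the difference between single-line and multi-line matching is the Python sketch below. It is only an analogy using Python's `re` module, not ExamDiff Pro's actual implementation. The sample text is taken from the first post; the plain "\n" line endings and the uppercase "DATE:" in the pattern are assumptions made for the illustration.

```python
import re

# Sample from the first post; "\n" line endings are an assumption.
text = "Invoice Date: 01/20/2009 PAYMENT DUE DATE:\n\n02/19/2009\n"

# A pattern that has to cross a blank line to reach the date.
pattern = re.compile(r"DATE:\n\n\d\d/\d\d/\d{4}")

# Line-by-line matching: each line is tested on its own, so the
# line breaks are never part of the text the pattern sees.
per_line_hits = [line for line in text.splitlines() if pattern.search(line)]

# Whole-text matching: the line breaks are part of the input,
# so the pattern can span them.
whole_text_hit = pattern.search(text)

print(per_line_hits)
print(repr(whole_text_hit.group(0)) if whole_text_hit else None)
```

In this sketch the per-line pass finds nothing, while the whole-text pass finds the block that runs from "DATE:" through the date on the third line.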
psguru
PrestoSoft

Re: "standard" regex \r and \n switches do not work

Post by why_am_i_hiding »

MOST EXCELLENT -THANKS! I will try this out immediately!

Re: "standard" regex \r and \n switches do not work

Post by why_am_i_hiding »

Here is the regex I tried:
Issue\sDate:\s\r\n\d\d-\d\d-\d\d
Here is an example of the text from 2 different files, file1:

(Ed. 9-06) Issue Date:
01-16-09
BUREAU

... for File2:

(Ed. 9-06) Issue Date:
01-30-09
BUREAU

This does NOT get trapped. I also tried "Issue\sDate:\s\n\r\d\d-\d\d-\d\d" just in case the text file is "weird". Also non-functional.
The similar problem (when there is a blank line between the "Date: " and the date) also continues.
Replacing "\n\r" with "\n\r\n\r" also does not work in those instances.
The blank line is truly blank - I checked (in both text and binary modes) with ultra-edit and there is NOTHING there.
In the general options, selecting "comments" or not makes no difference.
This is the only thing I have for the text file type.
:(

Re: "standard" regex \r and \n switches do not work

Post by why_am_i_hiding »

In continuation, I also tried it without specifying the "\s" for the blank character that follows the "Date: " text.
Nope!
It just does not want to pay attention to the "\r\n" combo, and I have run out of ideas.
Thanks for your (attempted) help.
:?

Re: "standard" regex \r and \n switches do not work

Post by psguru »

Depending on what you want, this will work:

.*Date.*\n\d\d-\d\d-\d\d\n
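
As a quick sanity check outside ExamDiff Pro, the expression above can be tried in Python against the two samples posted earlier. This is a sketch under assumptions: the samples use "\n" line endings, and `strip_ignored` is a hypothetical helper that removes the matched block to imitate "ignoring" it. It is not how ExamDiff Pro works internally.

```python
import re

# The two samples from the earlier post; "\n" line endings are assumed.
file1 = "(Ed. 9-06) Issue Date: \n01-16-09\nBUREAU\n"
file2 = "(Ed. 9-06) Issue Date: \n01-30-09\nBUREAU\n"

# The multi-line expression suggested above.
pattern = re.compile(r".*Date.*\n\d\d-\d\d-\d\d\n")

def strip_ignored(text: str) -> str:
    """Hypothetical helper: drop every block the pattern matches."""
    return pattern.sub("", text)

print(file1 == file2)                                # the raw texts differ
print(strip_ignored(file1) == strip_ignored(file2))  # equal once the block is dropped
```

With the matched block removed, both samples reduce to the same remaining line ("BUREAU"), which is the effect the ignore option is meant to achieve.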
psguru
PrestoSoft

Re: "standard" regex \r and \n switches do not work

Post by why_am_i_hiding »

Once again - GREAT - I'll try it out and let you know.
(In advance and anticipation: THANKS!)
:mrgreen:

Re: "standard" regex \r and \n switches do not work

Post by why_am_i_hiding »

Once again, I am sorry to report that your suggestion is NFG (not functioning good). I tried it, both with and without the exact text I need to test, and the result is the same. ExamDiff (both 4.0 and 4.5Beta) insists on identifying the "date" line as both a changed line and as changed in changed, highlighting, in this case, the "day" field.
Is it possible that there is some form of "interference" by some other regex? One of the expressions I use in "within the line" is:
"|Issue\sDate:\s{1,4}\d\d[-/]\d\d[-/]\d\d"
(please see the full regex line for "words" below. I have tried relocating it, removing it, placing it on comments, all to no avail) The above is needed by other lines where the fixed text is like what you see here: "Issue Date: 01-" etc.
The number of space characters between the colon and the first digit of the date varies from line to line (anywhere from 1 to 4), as "implied" by the regex. The character separating the digits of the date also varies (dashes or slashes). I tried removing this particular part of the "words" regex. No effect. I tried placing it along with your "comment" regex, also no effect; worse yet, other differences which it had (correctly) filtered now appeared as differences.
The "comment" thing does not seem to be working, because if the expression were acting properly, should it not function equally well in the "word" or in the "comment" parts? Is there something "interesting" about "comment" regexes? I have tried selecting and not selecting the "comments" option on the main "options - compare" section - Nothing. Is there some other option that I should select or deselect? I have tried every combination of the various "blanks" options, with no result that I consider positive. In fact, the only one needed that does what I expect is the "all white space in lines" (which works perfectly).
I also tried all of the above combinations/variants of your suggestion both in "Words" and just in "comments" using the \r switch. Also NFG. I also tried replacing your suggested \n with \n. No change (sorry, I could not resist the "pun")
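
For what it is worth, the "Issue Date" alternative from the "within the line" regex can be checked on its own. The Python sketch below is illustrative only: the sample strings are made up to exercise the 1-to-4-space and dash-or-slash cases described above, and the per-line check is an assumption about how a single-line option would see the split text.

```python
import re

# The "Issue Date" alternative quoted above, without the leading "|".
issue_date = re.compile(r"Issue\sDate:\s{1,4}\d\d[-/]\d\d[-/]\d\d")

# Made-up sample lines covering the cases described in the post.
samples = [
    "Issue Date: 01-16-09",      # one space, dashes
    "Issue Date:    01/30/09",   # four spaces, slashes
    "Issue Date:01-16-09",       # no space
    "Issue Date:     01-16-09",  # five spaces
]
results = [bool(issue_date.search(s)) for s in samples]
print(results)

# When the date sits on the NEXT line, testing each line separately
# cannot match: neither line contains both the label and the date.
split_text = "Issue Date: \n01-16-09"
split_hits = [ln for ln in split_text.splitlines() if issue_date.search(ln)]
print(split_hits)
```

The first two samples match and the last two do not, and the split case yields no per-line hits, which is consistent with this expression helping on the single-line dates but not on the ones broken across lines.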

I hate to abandon this as it seems to be so close to a solution. Just in case, I include the complete "words" regex:
Printed:\s\d\d[-:/]\d\d[-:/]\d\d|WSD\s\d{7}|WSD\d{7}|\d{6}\s\d{2}|\d{5}[0]{8}|Invoice\sDate:\s{3}\d\d[-/]\d\d[-/]\d{4}|Issue\sDate:\s{1,4}\d\d[-/]\d\d[-/]\d\d|PAYMENT\sDUE\sDATE:\s{4}\d\d[-/]\d\d[-/]\d{4}|Oper\sInit:\s{4}.{1,7}\s|Operator:\s.{2,8}|INTERIM\sAUDIT\s\d{3}\s\d\d[-/]\d\d[-/]\d\d\s\d\d[-/]\d\d[-/]\d\d|\$|error\scorrect|total|safety|premium|subject|standard|discount|expense|constant
Totally removing the above (and/or deselecting "words") had no positive effect. It just found more differences that the above had previously correctly removed. Cutting the whole mess and putting it into comments also did not work. It found all the things previously (correctly) ignored. (Another factor in my suspicion that there is something funny about "comments")
(I also tried all three detail levels. In "Lines" it id's the entire date line as different, with the other two, it highlights only the day part of the date - all above combos that I tried did the same)
From the report: (edited only to remove totally irrelevant text and to SUBSTITUTE UNDERSCORES FOR SPACES BECAUSE OF WHAT THE "POSTING" DOES TO LEADING AND TRAILING BLANKS)
... diff 1
___(Ed. 9-06)___Issue_Date:_
01-16-09
...
... diff 2:
(Ed. 9-06)____Issue Date:_
01-16-09
...
DOCUMENT "2":
___(Ed. 9-06)___Issue Date:_
01-30-09
...
(Ed. 9-06)____Issue Date:_
01-30-09
...
Any other ideas? :?

Re: "standard" regex \r and \n switches do not work

Post by why_am_i_hiding »

Sorry - too late, I just noticed an error in my previous post:
I also tried replacing your suggested \n with \n.
should be
I also tried replacing your suggested \n with \r.

Re: "standard" regex \r and \n switches do not work

Post by psguru »

Well, let me show you my results:
[Attached images: EDP4.png, EDP5.png, EDP3.png]
psguru
PrestoSoft

Re: "standard" regex \r and \n switches do not work

Post by why_am_i_hiding »

WELL GLORY BE!
How did I NOT know that I had to make ANOTHER "TEXT" type?!? That the "default" was NOT where the expression should have gone!
With that, it worked!
THANKS!
:mrgreen: :!: :lol:

Re: "standard" regex \r and \n switches do not work

Post by psguru »

To be fair, your case is not particularly trivial. Notice, however, that you cannot enable the "Participate in ignoring comments" option for Default/Plain Text, and you needed this option for obvious reasons.
psguru
PrestoSoft

Re: "standard" regex \r and \n switches do not work

Post by why_am_i_hiding »

So that is why it was not available! I just thought that it was one of those "non-option options" as in "accept this or else" style.
Presumptive of me.
I apologise.
Anyway, your solution works BRILLIANTLY.
Thanks again.
Just out of curiosity: Why use the "dot asterisk" instead of specifying the "\r"? Were you trying for a more generic answer than I was actually trying for? Is it BETTER to do it that way than to specify the \r?

Re: "standard" regex \r and \n switches do not work

Post by psguru »

Just out of curiosity: Why use the "dot asterisk" instead of specifying the "\r"? Were you trying for a more generic answer than I was actually trying for? Is it BETTER to do it that way than to specify the \r?
I actually don't think "\r" will work - use "\n" when you need to specify a linebreak. ".*" simply means "any character any number of times", and I use it for this purpose.
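
To illustrate the point about "\r" versus "\n", here is a small Python sketch. It assumes the text being matched has had its line endings normalized to "\n" before the regex is applied; whether ExamDiff Pro does this internally is an assumption, not something confirmed here. It also shows that in Python ".*" stops at the line break unless told otherwise.

```python
import re

# A Windows-style sample, then the same text with "\r\n" normalized to "\n".
# The normalization step is an assumption about what the tool sees.
crlf_text = "Issue Date: \r\n01-16-09\r\nBUREAU\r\n"
normalized = crlf_text.replace("\r\n", "\n")

# The earlier attempt that spelled out "\r\n" ...
with_cr = re.compile(r"Issue\sDate:\s\r\n\d\d-\d\d-\d\d")
# ... and the suggested expression that uses only "\n".
with_lf = re.compile(r".*Date.*\n\d\d-\d\d-\d\d\n")

cr_on_normalized = with_cr.search(normalized)  # no "\r" left to match
lf_on_normalized = with_lf.search(normalized)  # matches across the break

# ".*" on its own consumes characters only up to the first line break.
dot_star = re.match(r".*", normalized).group(0)

print(cr_on_normalized)
print(repr(lf_on_normalized.group(0)))
print(repr(dot_star))
```

Under that assumption, the "\r\n" version finds nothing, the "\n" version matches the label line plus the date line, and ".*" alone covers just the first line.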
psguru
PrestoSoft