"standard" regex \r and \n switches do not work
-
- Junior Member
- Posts: 10
- Joined: Tue Feb 24, 2009 1:18 pm
"standard" regex \r and \n switches do not work
I have 2 text files, one with:
---
Invoice Date: 01/20/2009 PAYMENT DUE DATE:
02/19/2009
---
And another with:
---
Invoice Date: 02/20/2009 PAYMENT DUE DATE:
03/20/2009
---
(Yes, there is a "blank line" between the text and the date)
The "regular expressions" site (http://www.regular-expressions.info/) tutorial says that line ends (blank lines) can be trapped with an expression like:
Date:\r\n\r\n\d\d[/]\d\d[/]
This SHOULD allow for the blank line and correctly ignore the different dates for the "payment due" part.
(The "invoice date" part is easily and correctly trapped with a different regex)
My little regex testing program "RegExBuddy" says that this is valid and correct, but ExamDiff (both v4.0 and 4.5beta) does not seem to support this.
Even using the "fuzzy line matching" advanced feature in v4.5b is of no help.
Is there a plan to include support for the "\r\n" combo, (or something similar) to allow for what is a valid regex that can cover a "block" of lines?
Re: "standard" regex \r and \n switches do not work
The "Lines matching regular expression" option in Options | Compare works only on single lines, so \r and \n have no effect. However, the Comments feature (see Options | Document Types) lets you specify any regex that can span multiple lines. You can then use Options | Compare | Ignore Comments (or the toolbar button) to ignore your multi-line regex.
psguru
PrestoSoft
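For readers who want to see the single-line limitation outside ExamDiff, here is a minimal Python sketch. It is only an illustration of the general idea, not ExamDiff's actual implementation. The sample text, the CRLF line endings, and the case-insensitive flag are assumptions (the pattern in the first post says "Date:" while the sample line ends in "DATE:").

```python
import re

# Sample text modelled on the first post; CRLF line endings are an assumption.
text = "Invoice Date: 01/20/2009 PAYMENT DUE DATE:\r\n\r\n02/19/2009\r\n"

# Pattern from the first post. IGNORECASE is an assumption, needed because
# the sample line ends in "DATE:" rather than "Date:".
pattern = re.compile(r"Date:\r\n\r\n\d\d[/]\d\d[/]", re.IGNORECASE)

# Applied to the whole text, the pattern can see the line breaks and matches.
whole_text_match = pattern.search(text)

# Applied one line at a time, the line breaks are gone before the regex runs,
# so \r and \n can never match anything.
per_line_matches = [line for line in text.splitlines() if pattern.search(line)]

print(whole_text_match is not None)  # True
print(per_line_matches)              # []
```

The same pattern succeeds or fails depending only on whether it is given the whole text or one line at a time, which is the distinction between the two ExamDiff options described above.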
-
- Junior Member
- Posts: 10
- Joined: Tue Feb 24, 2009 1:18 pm
Re: "standard" regex \r and \n switches do not work
MOST EXCELLENT -THANKS! I will try this out immediately!
-
- Junior Member
- Posts: 10
- Joined: Tue Feb 24, 2009 1:18 pm
Re: "standard" regex \r and \n switches do not work
Here is the regex I tried:
Issue\sDate:\s\r\n\d\d-\d\d-\d\d
Here is an example of the text from 2 different files, file1:
(Ed. 9-06) Issue Date:
01-16-09
BUREAU
... for File2:
(Ed. 9-06) Issue Date:
01-30-09
BUREAU
This does NOT get trapped. I also tried "Issue\sDate:\s\n\r\d\d-\d\d-\d\d" just in case the text file is "weird". Also non-functional.
The similar problem (when there is a blank line between the "Date: " and the date) also continues.
Replacing "\n\r" with "\n\r\n\r" also does not work in those instances.
The blank line is truly blank - I checked (in both text and binary modes) with ultra-edit and there is NOTHING there.
In the general options, selecting "comments" or not makes no difference.
This is the only thing I have for the text file type.

-
- Junior Member
- Posts: 10
- Joined: Tue Feb 24, 2009 1:18 pm
Re: "standard" regex \r and \n switches do not work
In continuation, I also tried it without specifying the "\s" for the blank character that follows the "Date: " text.
Nope!
It just does not want to pay attention to the "\r\n" combo, and I have run out of ideas.
Thanks for your (attempted) help.

Re: "standard" regex \r and \n switches do not work
Depending what you want, this will work:
Code: Select all
.*Date.*\n\d\d-\d\d-\d\d\n
psguru
PrestoSoft
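The suggested expression can be tried outside ExamDiff with a short Python sketch. This is a rough stand-in under stated assumptions: the samples use LF-only line endings, and "ignoring" a match is simulated by deleting the matched span before comparing, which may not be how ExamDiff handles comments internally.

```python
import re

# Samples modelled on the "Issue Date" excerpts above; LF endings are an assumption.
file1 = "(Ed. 9-06) Issue Date: \n01-16-09\nBUREAU\n"
file2 = "(Ed. 9-06) Issue Date: \n01-30-09\nBUREAU\n"

# The multi-line expression suggested in the reply above.
pattern = re.compile(r".*Date.*\n\d\d-\d\d-\d\d\n")

# Stand-in for "Ignore Comments": drop every matched span, then compare.
stripped1 = pattern.sub("", file1)
stripped2 = pattern.sub("", file2)

print(file1 == file2)          # False
print(stripped1 == stripped2)  # True
print(repr(stripped1))         # 'BUREAU\n'
```

Because the expression consumes the label line and the date line together, the differing dates never reach the comparison.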
-
- Junior Member
- Posts: 10
- Joined: Tue Feb 24, 2009 1:18 pm
Re: "standard" regex \r and \n switches do not work
Once again - GREAT - I'll try it out and let you know.
(In advance and anticipation: THANKS!)

-
- Junior Member
- Posts: 10
- Joined: Tue Feb 24, 2009 1:18 pm
Re: "standard" regex \r and \n switches do not work
Once again, I am sorry to report that your suggestion is NFG (not functioning good). I tried it, both with and without the exact text I need to test, and the result is the same. ExamDiff (both 4.0 and 4.5Beta) insists on identifying the "date" line as both a changed line and as changed in changed, highlighting, in this case, the "day" field.
Is it possible that there is some form of "interference" by some other regex? One of the expressions I use in "within the line" is:
"|Issue\sDate:\s{1,4}\d\d[-/]\d\d[-/]\d\d"
(please see the full regex line for "words" below. I have tried relocating it, removing it, placing it on comments, all to no avail) The above is needed by other lines where the fixed text is like what you see here: "Issue Date: 01-" etc.
The number of space characters between the colon and the first digit of the date changes on various lines (anywhere from 1 to 4), as "implied" by the regex. Also, the character separating the digits of the date changes (dashes or slashes). I tried removing this particular part of the "words" regex. No effect. I tried placing it along with your "comment" regex, also no effect, worse yet, other differences which it had (correctly) filtered now appeared as differences. The "comment" thing does not seem to be working, because if the expression were properly acting, should it not function equally well in the "word" or in the "comment" parts? Is there something "interesting" about "comment" regexes? I have tried selecting and not selecting the "comments" option on the main "options - compare" section - Nothing. Is there some other option that I should select or deselect? I have tried every combination of the various "blanks" options, no result that I consider positive. In fact, the only one needed that does what I expect is the "all white space in lines" (which works perfectly)
I also tried all of the above combinations/variants of your suggestion both in "Words" and just in "comments" using the \r switch. Also NFG. I also tried replacing your suggested \n with \n. No change (sorry, I could not resist the "pun")
I hate to abandon this as it seems to be so close to a solution. Just in case, I include the complete "words" regex:
Printed:\s\d\d[-:/]\d\d[-:/]\d\d|WSD\s\d{7}|WSD\d{7}|\d{6}\s\d{2}|\d{5}[0]{8}|Invoice\sDate:\s{3}\d\d[-/]\d\d[-/]\d{4}|Issue\sDate:\s{1,4}\d\d[-/]\d\d[-/]\d\d|PAYMENT\sDUE\sDATE:\s{4}\d\d[-/]\d\d[-/]\d{4}|Oper\sInit:\s{4}.{1,7}\s|Operator:\s.{2,8}|INTERIM\sAUDIT\s\d{3}\s\d\d[-/]\d\d[-/]\d\d\s\d\d[-/]\d\d[-/]\d\d|\$|error\scorrect|total|safety|premium|subject|standard|discount|expense|constant
Totally removing the above (and or deselecting "words") had no positive effect. It just found more differences that the above had previously correctly removed. Cutting the whole mess and putting it into comments also did not work. It found all the things previously (correctly) ignored. (Another factor in my suspicion that there is something funny about "comments")
(I also tried all three detail levels. In "Lines" it id's the entire date line as different, with the other two, it highlights only the day part of the date - all above combos that I tried did the same)
From the report: (edited only to remove totally irrelevant text and to SUBSTITUTE UNDERSCORES FOR SPACES BECAUSE OF WHAT THE "POSTING" DOES TO LEADING AND TRAILING BLANKS)
... diff 1
___(Ed. 9-06)___Issue_Date:_
01-16-09
...
... diff 2:
(Ed. 9-06)____Issue Date:_
01-16-09
...
DOCUMENT "2":
___(Ed. 9-06)___Issue Date:_
01-30-09
...
(Ed. 9-06)____Issue Date:_
01-30-09
...
Any other ideas?
-
- Junior Member
- Posts: 10
- Joined: Tue Feb 24, 2009 1:18 pm
Re: "standard" regex \r and \n switches do not work
Sorry - too late, I just noticed an error in my previous post:
I also tried replacing your suggested \n with \n.
should be
I also tried replacing your suggested \n with \r.
Re: "standard" regex \r and \n switches do not work
Well, let me show you my results:
psguru
PrestoSoft
-
- Junior Member
- Posts: 10
- Joined: Tue Feb 24, 2009 1:18 pm
Re: "standard" regex \r and \n switches do not work
WELL GLORY BE!
How did I NOT know that I had to make ANOTHER "TEXT" type?!? That the "default" was NOT where the expression should have gone!
With that, it worked!
THANKS!

Re: "standard" regex \r and \n switches do not work
To be fair, your case is not particularly trivial. Notice, however, that you cannot enable the "Participate in ignoring comments" option for Default/Plain Text, and you needed this option for obvious reasons.
psguru
PrestoSoft
-
- Junior Member
- Posts: 10
- Joined: Tue Feb 24, 2009 1:18 pm
Re: "standard" regex \r and \n switches do not work
So that is why it was not available! I just thought that it was one of those "non-option options" as in "accept this or else" style.
Presumptive of me.
I apologise.
Anyway, your solution works BRILLIANTLY.
Thanks again.
Just out of curiosity: Why use the "dot asterisk" instead of specifying the "\r"? Were you trying for a more generic answer than I was actually trying for? Is it BETTER to do it that way than to specify the \r?
Re: "standard" regex \r and \n switches do not work
I actually don't think "\r" will work - use "\n" when you need to specify a line break. ".*" simply means "any character any number of times", and I use it for this purpose.
psguru
PrestoSoft
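A small Python sketch may help show why an explicit "\r" can fail where "\n" works. It assumes, hypothetically, that the text has had its CRLF line endings normalized to "\n" before the regex is applied; the thread does not confirm that ExamDiff does this. It also shows that in Python, as in most regex engines by default, "." stops at a line break, so ".*" covers the rest of one line only.

```python
import re

# CRLF sample modelled on the "Issue Date" excerpt; the normalization step
# below is a hypothetical stand-in for whatever the comparison tool does.
crlf_text = "Issue Date: \r\n01-16-09\r\n"
lf_text = crlf_text.replace("\r\n", "\n")

with_cr = re.compile(r"Issue\sDate:\s\r\n\d\d-\d\d-\d\d")  # expects \r\n
with_lf = re.compile(r"Issue\sDate:\s\n\d\d-\d\d-\d\d")    # expects \n only

print(with_cr.search(crlf_text) is not None)  # True
print(with_cr.search(lf_text) is not None)    # False: no \r left to match
print(with_lf.search(lf_text) is not None)    # True

# By default "." does not match "\n", so ".*" ends at the line break.
first_line = re.match(r".*", "abc\ndef").group()
print(first_line)  # abc
```

Under that assumption, writing ".*" followed by "\n" avoids depending on exactly which characters sit at the end of the line.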