Exclude known different words

jerry8989
New Member
Posts: 2
Joined: Thu Dec 10, 2020 11:11 am

Exclude known different words

Post by jerry8989 »

Hello,

I'm using ExamDiff Pro to compare website code and it works great except when there are references to different URLs.
For example, https://www.tempTEST.com and https://www.tempPROD.com don't match, but that is an expected difference. Is there a way to exclude it from the comparison, so that if "https://www.temp" and ".com" are the same on both pages, it would be considered a match?

This is an issue when I'm comparing directories and looking for files that differ in ways other than this one URL difference.

Thank you
psguru
Site Admin
Posts: 2228
Joined: Sat May 15, 2004 4:23 pm
Location: California

Re: Exclude known different words

Post by psguru »

There's no direct support for substituting words, but you can exclude parts of lines matching a regular expression, such as

tempTEST|tempPROD
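
To illustrate the idea outside ExamDiff Pro, here is a minimal Python sketch. It assumes that "excluding parts of lines matching a regular expression" behaves roughly like deleting the matched text from both lines before comparing them; how ExamDiff Pro does this internally is not stated here. The function name lines_match and the sample lines are hypothetical.

```python
import re

# Assumption: ignoring matched text is modelled by removing it from
# each line before the two lines are compared.
IGNORE = re.compile(r"tempTEST|tempPROD")

def lines_match(left, right, ignore=IGNORE):
    """Return True if the lines are equal once ignored text is removed."""
    return ignore.sub("", left) == ignore.sub("", right)

# Hypothetical sample lines based on the URLs in the question.
test_line = '<a href="https://www.tempTEST.com">Home</a>'
prod_line = '<a href="https://www.tempPROD.com">Home</a>'
other_line = '<a href="https://www.other.com">Home</a>'

print(lines_match(test_line, prod_line))   # True
print(lines_match(test_line, other_line))  # False
```

Only the listed words are ignored, so any other difference on the line still shows up as a mismatch.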
psguru
PrestoSoft
jerry8989
New Member
Posts: 2
Joined: Thu Dec 10, 2020 11:11 am

Re: Exclude known different words

Post by jerry8989 »

Thank you psguru,

I was able to get it to work, but when I added another check, it stopped working for the first one.

If I use tempTest|tempProd, webTest|webProd, both checks stop working.

Thanks again
JeremyNicoll
Expert Member
Posts: 108
Joined: Sun May 02, 2010 12:00 pm
Location: Edinburgh

Re: Exclude known different words

Post by JeremyNicoll »

I don't think you can have a comma-separated list of conditions. You can have one regular expression. So maybe

tempTest|tempProd|webTest|webProd

that is, a list of four alternative values all of which, if present, will be ignored. Having said that, you said that

tempTest|tempProd, webTest|webProd

didn't work. I think it DID work. It just didn't do what you expected. It should have meant:

"tempTest" or "tempProd, webTest" or "webProd"

so it probably did exclude "tempTest" and "webProd", but as EDP would (presumably) never have found the literal
"tempProd, webTest" in any lines of the files, it would not have excluded either "tempProd" or "webTest", and would thus
appear not to work when in fact it had done precisely what you told it to do.
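
The parsing described above can be checked with a short sketch. It uses Python's re module as a stand-in, on the assumption that top-level "|" alternation behaves the same way in EDP's engine as it does in most regex flavours. The sample string is made up for illustration.

```python
import re

pattern_text = "tempTest|tempProd, webTest|webProd"

# With no groups or escapes in the pattern, splitting on "|" shows
# the alternatives the regex engine sees.
alternatives = pattern_text.split("|")
print(alternatives)  # ['tempTest', 'tempProd, webTest', 'webProd']

comma_pattern = re.compile(pattern_text)
fixed_pattern = re.compile("tempTest|tempProd|webTest|webProd")

# Hypothetical sample text containing all four words, without commas.
sample = "tempTest tempProd webTest webProd"

print(comma_pattern.findall(sample))  # ['tempTest', 'webProd']
print(fixed_pattern.findall(sample))  # ['tempTest', 'tempProd', 'webTest', 'webProd']
```

The comma and the space after it are ordinary characters, so they become part of the middle alternative rather than separating two conditions.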

You probably need to read a decent regex tutorial. EDP's Help has a single-page aide-memoire, and a link to a
tutorial website. There are lots of tutorials available. Note that several "flavours" of RegEx exist; the general
ideas are the same in them all, but the exact characters which have special meanings, and just how complex an
expression can be, vary. Make sure you know (from the aide-memoire) which characters do what in EDP's RE
engine, even if reading a tutorial for some other RE flavour.

I should add that the term "regular expression" is an academic one, stemming from the formal classification of types
of grammar for arbitrary languages. Linguists study forms of natural language, while computer scientists have for
ages studied forms of much more restricted languages, for example programming languages. It's important when one
designs a programming language that no program that might be written in it is ambiguous - there has to be only one
correct way of interpreting what it means. When a compiler or interpreter is written it has to be able to recognise
exactly whether a programmer's program is valid or not, and if it is, generate code that will when executed do exactly
what the program meant. [Note that that's not necessarily what the programmer thought it meant.]

Anyway, what makes a "regular expression" special is that (although perhaps complex to write by hand) it has a tightly
defined syntax (so easy for a program to understand what it means) and is unambiguous.

You might find this useful too: https://en.wikipedia.org/wiki/Regular_expression