
Oxfordshire Family History Society

Transcribed Wills - Regular Expressions



Using Regular Expressions to search for Surnames Mentioned

Suppose you want to search the Index of Persons Mentioned in wills for every instance of a surname you are researching. The simplest approach is to enter the surname into the search box. The snag is that the same surname is often spelled in many different ways, so ideally you want to enter a pattern that will match any one of a variety of spellings. A regular expression is a very precise way to specify such a pattern. You may think the phrase "regular expression" sounds daunting and must be very complicated. Far from it! All you need to know can be summed up in the following simple "rules", which show how a search pattern is built up from letters and a few punctuation symbols.

(letters) Have their normal meaning.
Example:   SMITH matches SMITH and nothing else.
. (full stop) Matches any single character.
Example:   BE.L matches BEAL, BEBL, BECL ... BELL ... etc.
[ ] (square brackets) Match any one of the letters inside the brackets.
Example:   BE[AEL]L matches BEAL, BEEL or BELL and nothing else.
+ (plus sign) Matches one or more instances of the previous item.
Example:   BEL+MAN matches BELMAN or BELLMAN, but not BEMAN.
* (asterisk) Matches zero or more instances of the previous item.
Example:   BEL*MAN matches BEMAN or BELMAN or BELLMAN.
? (question mark) Matches zero or one instance of the previous item.
Example:   BEL?MAN matches BEMAN or BELMAN but not BELLMAN. (Rarely needed when matching surnames.)

Of course this covers only a fraction of what is possible using regular expressions, but it really is all you need to know to match any surname pattern you may be looking for.
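
If you would like to check the rules above for yourself, the following is a minimal sketch using Python's standard re module. Python is used here purely for illustration; it is not what the search page runs, but these six rules behave the same way in it. The helper name matches is our own invention, and it uses re.fullmatch so that the pattern has to account for the whole surname, which is the behaviour the examples above assume.

```python
import re

def matches(pattern: str, surname: str) -> bool:
    """Return True if the pattern accounts for the whole surname."""
    return re.fullmatch(pattern, surname) is not None

# Each entry: (pattern, surnames that should match, surnames that should not).
RULE_EXAMPLES = [
    ("SMITH",    ["SMITH"],                      ["SMYTH"]),
    ("BE.L",     ["BEAL", "BEBL", "BELL"],       ["BEL"]),
    ("BE[AEL]L", ["BEAL", "BEEL", "BELL"],       ["BEBL"]),
    ("BEL+MAN",  ["BELMAN", "BELLMAN"],          ["BEMAN"]),
    ("BEL*MAN",  ["BEMAN", "BELMAN", "BELLMAN"], []),
    ("BEL?MAN",  ["BEMAN", "BELMAN"],            ["BELLMAN"]),
]

for pattern, should_match, should_not_match in RULE_EXAMPLES:
    for name in should_match:
        assert matches(pattern, name), (pattern, name)
    for name in should_not_match:
        assert not matches(pattern, name), (pattern, name)
    print(pattern, "behaves as described")
```

Changing one of the surnames in the table (for instance moving BEMAN into the "should match" column for BEL+MAN) makes the corresponding check fail, which is a quick way to test your understanding of a rule.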

Here are some more practical examples to study. When you have looked at these, click here to try some patterns for yourself. Then when you are comfortable searching for single surnames and feel ready to be a bit more ambitious, read on to the next section.

SM[IY]THE* Matches SMITH, SMYTH, SMITHE and SMYTHE.
P[AE]+R*SON Matches PARSON, PERSON, PEARSON, PEERSON etc.
W*RIG*H*T+E* Matches WRITE, RITE, RIGHT, WRITT etc.
[BP]R.* Matches all surnames starting BR or PR.
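
The practical patterns above can be tried against a list of names in the same way. This is only a sketch in Python (the search page itself does not use it); the list CANDIDATES is made-up sample data and matching_names is a hypothetical helper name.

```python
import re

# Made-up sample spellings to try against each practical pattern.
CANDIDATES = ["SMITH", "SMYTHE", "PEARSON", "PEERSON", "WRIGHT",
              "RITE", "WRITT", "BROWN", "PRICE", "BARNES"]

PATTERNS = ["SM[IY]THE*", "P[AE]+R*SON", "W*RIG*H*T+E*", "[BP]R.*"]

def matching_names(pattern, names):
    """Return the names that the pattern matches in full."""
    return [name for name in names if re.fullmatch(pattern, name)]

for pattern in PATTERNS:
    print(pattern, "->", matching_names(pattern, CANDIDATES))
```

Note that BARNES is not picked up by [BP]R.* because its second letter is not R, whereas BROWN and PRICE are.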


A Shortcut

When looking for surname variants, certain patterns are needed again and again. For example, vowels change very frequently, so that PARSON may appear as PERSON, PEARSON, PORSON, PURSON, PARSUN etc. To save typing, we provide a shortcut for this situation. Note that this is not standard regular expression syntax; it is a convenience introduced specifically for surname matching.

~ (tilde) Matches any vowel or consecutive sequence of vowels.
i.e.   ~   is exactly equivalent to [AEIOUY]+
Example:   P~RS~N matches PERSON, PEARSON, PORSON, PURSON, PARSUN etc.
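
Because the tilde is not standard syntax, anything that handles it has to translate it first. The sketch below shows one way this could be done, based only on the equivalence stated above; it is written in Python for illustration, the function name expand_tilde is hypothetical, and the search page's own code may well do it differently.

```python
import re

VOWEL_RUN = "[AEIOUY]+"  # what the page says ~ stands for

def expand_tilde(user_pattern: str) -> str:
    """Replace every ~ with the standard expression for a run of vowels."""
    return user_pattern.replace("~", VOWEL_RUN)

expanded = expand_tilde("P~RS~N")
print(expanded)  # P[AEIOUY]+RS[AEIOUY]+N

# PRSN is included to show that ~ needs at least one vowel.
for name in ["PERSON", "PEARSON", "PORSON", "PURSON", "PARSUN", "PRSN"]:
    print(name, bool(re.fullmatch(expanded, name)))
```

Every name in the list matches except PRSN, because the + in [AEIOUY]+ demands at least one vowel in each position.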


Some Clever Stuff

The previous section has told you all you need to know to search for a surname with its spelling variants. Maybe there are several different surnames you are interested in and you would like to look for them all at once. There are a couple more symbols you can include in your search pattern to make this possible.

( ) (round brackets) Group parts of a pattern together and treat the group as a single entity.
Examples:   (BE[AEL]L) matches BEAL, BEEL or BELL.
(B[OU]+K) matches BOK, BUK, BOUK or BOOK.
(You will see why we need the brackets next.)
| (vertical bar) Acts as an OR operator to combine two or more entities.
Example:   (BE[AEL]L)|(B[OU]+K) matches BELL or BOOK (and their variants).
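
As an illustration of how the two bracketed groups behave, the Python sketch below reports which group a surname matched. The function name which_alternative is our own; match.lastindex is the standard re attribute giving the number of the last bracketed group that took part in the match.

```python
import re

COMBINED = "(BE[AEL]L)|(B[OU]+K)"

def which_alternative(surname: str):
    """Return 1 or 2 for the bracketed group that matched, or None."""
    match = re.fullmatch(COMBINED, surname)
    return match.lastindex if match else None

for name in ["BELL", "BEAL", "BOOK", "BOUK", "BALL"]:
    print(name, which_alternative(name))
```

BELL and BEAL report group 1, BOOK and BOUK report group 2, and BALL reports None because it fits neither surname pattern.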

In this way, you can build up a complicated pattern that selects all the names you are currently researching, with their spelling variants. So for example, I use the following combination to hunt for all the DEELEY, EELEY, FREEBORN and PARSONS families I am currently researching:

([DH]'*~LE*Y)|(FR~S*B~.*NE*)|(P~RS~NE*S*)

One point is worth noting: the apostrophe that appears in the D'OYLEY variant of the DEELEY name has no special meaning in a regular expression, so it can be included in the pattern just like a letter.
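
The combined pattern can be tried out in the same spirit. In this Python sketch the tilde is first replaced by [AEIOUY]+, as described under "A Shortcut"; the helper name is_of_interest and the short list of test names are our own, and the real search page may process the pattern differently.

```python
import re

USER_PATTERN = "([DH]'*~LE*Y)|(FR~S*B~.*NE*)|(P~RS~NE*S*)"

# Translate the non-standard ~ shortcut into standard syntax.
FULL_PATTERN = USER_PATTERN.replace("~", "[AEIOUY]+")

def is_of_interest(surname: str) -> bool:
    """Return True if the surname fits any of the three alternatives."""
    return re.fullmatch(FULL_PATTERN, surname) is not None

for name in ["DEELEY", "D'OYLEY", "FREEBORN", "PARSONS", "SMITH"]:
    print(name, is_of_interest(name))
```

D'OYLEY is accepted because the apostrophe is matched literally by '* and the letters OY count as a run of vowels; SMITH is rejected because it fits none of the three alternatives.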

Of course you will write and test each of the surname bits on its own to make sure it does what you want, and only combine them when you are happy with each separately, but even so you are probably wondering:

"Am I really expected to type in such a complicated pattern every time I search for wills? Surely it would be easier to search for one name at a time?"

True, but here comes the clever bit. Once you have built up the pattern and checked that it works, use ctrl-D to "bookmark" the resulting page in your web browser, or add it to your "favorites", depending on the terminology your browser uses (Firefox and Opera use the word "bookmark"; Internet Explorer uses "favorite"). Fortunately, the ctrl-D key combination works in all popular browsers. The complete search pattern will be remembered as part of your bookmark/favorite. Next time you want to see whether there are any more wills of interest, just click the bookmark and the job is done!
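
A bookmark can remember the pattern only if the pattern is carried in the address of the results page. The Python sketch below shows, in general terms, how a pattern with punctuation survives being packed into a web address and unpacked again. The address https://example.org/wills/search and the parameter name pattern are placeholders invented for this illustration; they are not the real address or parameter used by the search page.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical address and parameter name, for illustration only.
BASE_URL = "https://example.org/wills/search"
PARAMETER = "pattern"

user_pattern = "([DH]'*~LE*Y)|(FR~S*B~.*NE*)|(P~RS~NE*S*)"

# Punctuation such as brackets and the vertical bar is percent-encoded.
bookmark_url = BASE_URL + "?" + urlencode({PARAMETER: user_pattern})
print(bookmark_url)

# Decoding the address gives back exactly the pattern that was typed.
recovered = parse_qs(urlsplit(bookmark_url).query)[PARAMETER][0]
print(recovered == user_pattern)
```

The point of the round trip is that nothing is lost: however complicated the pattern, the bookmarked address holds all of it.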

If you are not familiar with the use of "bookmarks" or "favorites", have a look at your browser's "help" information.

If this has whetted your appetite and you are curious to know even more about regular expressions, try putting "javascript regular expression" into your favourite internet search engine. This will bring you many references to the complete syntax. On the other hand, if you are already a regular expression expert, you may have noticed that some of the examples given above should, strictly speaking, not work. That is because we have presented a rather simplified user syntax, sufficient for the task in hand. The extra bits needed to make correctly formed JavaScript regular expressions are added "behind the scenes".
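
For the curious, here is a guess at the kind of "extra bits" that might be involved, written as a Python sketch rather than the JavaScript the page actually uses. It assumes two steps: expanding the ~ shortcut, and wrapping the pattern in ^(?: ... )$ so that it must match the whole surname. The function name to_full_regex is hypothetical and the real page may add different or additional pieces.

```python
import re

def to_full_regex(user_pattern: str) -> str:
    """Assumed translation: expand ~ and anchor to the whole surname."""
    expanded = user_pattern.replace("~", "[AEIOUY]+")
    # The (?: ... ) wrapper keeps | inside the anchors, so that
    # ^ and $ apply to every alternative and not just the first and last.
    return "^(?:" + expanded + ")$"

full = to_full_regex("SMITH")
print(full)  # ^(?:SMITH)$

# Without anchors, a plain search finds SMITH inside SMITHSON.
print(bool(re.search("SMITH", "SMITHSON")))  # True
# With anchors, only the exact surname is accepted.
print(bool(re.search(full, "SMITHSON")))     # False
print(bool(re.search(full, "SMITH")))        # True
```

This is why the simplified rule "SMITH matches SMITH and nothing else" holds on the search page even though a bare regular expression SMITH would also be found inside longer names.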


Registered Charity Number 275891

© 2000-2008 Oxfordshire Family History Society
webmaster@ofhs.org.uk
This page structure was last updated on 2008-09-10