As you'll notice, text files turn out to be important under Unix. One consequence of this is that you'll often need to search for something in text files. For example, you are editing a file in Emacs and are looking for something, or you are in a directory and want to know in which files a given string occurs, etc.
The simplest searches are searches for a simple word. This is the kind of search you do in Emacs when you do a C-s. However, you often want to search for something more general. You don't want to search exactly for a string, but any string that matches some "shape" you care about. For example, you may want to search for all strings made up exclusively of 0's and 1's.
There is a convenient notation for writing such things: regular expressions. If you stay in Computer Science long enough, you'll encounter regular expressions again and again.
Formally (and we'll try not to be too formal here), a regular expression r (also called simply a regexp) represents a set of strings. A string is said to match a regexp r if the string is one of the strings represented by r.
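To make the idea of "a regexp represents a set of strings" concrete, here is a minimal sketch using Python's re module. Python and the helper name matches are assumptions of this example, not something these notes rely on, and Python's notation is close to, but not identical to, the regexps used by Emacs and grep. The bracket notation [01]* is used here only to express the earlier example of strings made up exclusively of 0's and 1's.

```python
import re

def matches(regexp: str, s: str) -> bool:
    """Return True if the whole string s is one of the strings represented by regexp."""
    # fullmatch succeeds only if the entire string matches, not just a part of it.
    return re.fullmatch(regexp, s) is not None

# A regexp with no special characters represents exactly one string: itself.
print(matches("abc", "abc"))        # True
print(matches("abc", "abcd"))       # False

# Strings made up exclusively of 0's and 1's.
print(matches("[01]*", "010011"))   # True
print(matches("[01]*", "0102"))     # False
```

Note that this sketch checks the whole string against the regexp, which is the formal definition above. Search tools usually look for a match anywhere inside a line instead.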
By default, a regexp which is simply a string of characters matches a single string, namely itself. Hence, the regexp abc matches a single string, abc. However, some characters have a special meaning in a regexp:
What if you want to match a "special" character exactly? For example, what if you want to match the * character? If you escape a special character, i.e., if you precede it by a backslash character \, then the character stands for itself, not for a regular expression operator. (To match a \ exactly, you escape it as well, i.e., you write \\.) Thus, a\**c matches an a followed by any number of *'s, followed by a c.
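The escaping rule can be checked with a short sketch. It again assumes Python's re module, where the backslash escape behaves the same way for these cases; the helper name matches_exactly is made up for illustration.

```python
import re

def matches_exactly(regexp: str, s: str) -> bool:
    """Return True if the whole string s matches regexp."""
    return re.fullmatch(regexp, s) is not None

# a, then any number of literal * characters, then c.
# The r"..." prefix keeps Python itself from interpreting the backslash.
pattern = r"a\**c"

for s in ["ac", "a*c", "a***c", "abc"]:
    print(s, matches_exactly(pattern, s))
# ac True
# a*c True
# a***c True
# abc False

# An escaped backslash matches one literal backslash character.
print(matches_exactly(r"\\", "\\"))  # True
```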
Some characters help delineate where a matching can occur:
The above operators define so-called basic regular expressions. These operators are supported in most programs using regular expressions of some sort. Some programs in fact implement extended regular expressions, which provide slightly more operators:
More operators in fact exist, some allowing you to match the same substring in multiple places in a given string. I'll refer you to the links on the web page for more information.
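As one illustration of matching the same substring in more than one place, the sketch below uses a backreference to find an accidentally doubled word. It assumes Python's re module, where \1 means "whatever the first parenthesized group matched"; other tools spell this slightly differently, and the function name first_doubled_word is hypothetical.

```python
import re

def first_doubled_word(text: str):
    """Return the first word that appears twice in a row, or None if there is none."""
    # (\w+) captures a word; \1 requires the same text again after one space.
    # \b marks a word boundary, so "this is" does not count as a doubled "is".
    m = re.search(r"\b(\w+) \1\b", text)
    return m.group(1) if m else None

print(first_doubled_word("this is is a test"))   # is
print(first_doubled_word("no repeats here"))     # None
```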
We saw that C-s and C-r can be used to perform searches for a string in Emacs. To search for a regular expression instead, we can use M-C-s and M-C-r. The minibuffer will ask you for a regexp to search for.
It is also possible to replace the strings matched by a regular expression with some other string, possibly including parts of the matched string. One function that does this is replace-regexp; I'll refer you to the Emacs documentation for the details.
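The idea of a replacement that reuses parts of the matched string can be sketched outside Emacs. The example below assumes Python's re.sub, where parentheses capture a part of the match and \1, \2 in the replacement refer back to those parts. Emacs writes its groups a little differently, so take this as an illustration of the concept rather than of replace-regexp itself; the function name swap_names is made up.

```python
import re

def swap_names(text: str) -> str:
    """Rewrite every 'first last' pair of words as 'last, first'."""
    # Group 1 is the first word, group 2 the second; the replacement swaps them.
    return re.sub(r"(\w+) (\w+)", r"\2, \1", text)

print(swap_names("Ada Lovelace"))  # Lovelace, Ada
```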
It is possible to search for a regular expression in a file directly from the command line. There are in fact many commands to do so, differing in the kind of regular expressions they can handle:
By default, grep and company print every line of the files that contains a match of the regular expression. Some interesting options to grep are -i, to ignore the case of letters during matches, and -v, to output the lines for which there is no match.
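To show what these options do, here is a rough imitation of grep's line filtering in Python. This is only a sketch of the behavior described above, not how grep is implemented; the function grep and its parameters ignore_case and invert are hypothetical stand-ins for the -i and -v options.

```python
import re

def grep(regexp, lines, ignore_case=False, invert=False):
    """Return the lines containing a match of regexp (or, if invert, those that do not)."""
    flags = re.IGNORECASE if ignore_case else 0
    compiled = re.compile(regexp, flags)
    # search looks for a match anywhere in the line, as grep does.
    return [line for line in lines
            if (compiled.search(line) is not None) != invert]

lines = ["Hello world", "hello there", "goodbye"]

print(grep("hello", lines))                     # ['hello there']
print(grep("hello", lines, ignore_case=True))   # ['Hello world', 'hello there']
print(grep("hello", lines, invert=True))        # ['Hello world', 'goodbye']
```

Unlike the formal definition of matching, a line is selected as soon as some part of it matches the regexp; the line as a whole does not have to match.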