Tuesday, 11 September 2007

Javascript: RegExp (regular expression) object

Regular expressions are a powerful tool for performing pattern matches in Strings in JavaScript. You can perform complex tasks that once required lengthy procedures with just a few lines of code using regular expressions. Regular expressions are implemented in JavaScript in two ways:

Literal syntax:

//match all 7 digit numbers
var phonenumber=/\d{7}/g

Dynamically, with the RegExp() constructor:

//match all 7 digit numbers (note how "\d" is defined as "\\d")
var phonenumber=new RegExp("\\d{7}", "g")

The RegExp() constructor lets you build the search pattern dynamically from a string, which is useful when the pattern is not known ahead of time. Because the pattern is a string, every backslash in it must be doubled.
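
As a rough illustration of building a pattern at run time, the sketch below searches for a term that is only known when the code runs. The helper names escapeRegExp and countOccurrences are made up for this example and are not part of JavaScript; the escaping step assumes the term should be matched literally.

```javascript
// Hypothetical helper: backslash-escape every character that is special in a pattern
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: count case-insensitive occurrences of a term chosen at run time
function countOccurrences(haystack, term) {
  const pattern = new RegExp(escapeRegExp(term), "gi");
  const matches = haystack.match(pattern);
  return matches === null ? 0 : matches.length;
}

console.log(countOccurrences("No pain no gain", "AIN")); // 2
console.log(countOccurrences("Cost: $5 or $10?", "$"));  // 2
```

Without the escaping step, a term such as "$" would be read as the end-of-string anchor rather than as a literal dollar sign.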

Pattern flags (switches)

Flag Description Example
 i Ignore the case of characters. /The/i matches "the" and "The" and "tHe"
 g Global search for all occurrences of a pattern /ain/g matches both "ain"s in "No pain no gain", instead of just the first.
 gi Global search, ignore case. /it/gi matches all "it"s in "It is our IT department" 
 m Multiline mode. Causes ^ to match the beginning of a line as well as the beginning of the string, and $ to match the end of a line as well as the end of the string. JavaScript 1.5+ only. /hip$/m matches "hip" on its own as well as the "hip" in "hip\nhop"
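
The flags above can be tried out with a few one-liners. This is only a sketch; the variable names are arbitrary and the sample strings are borrowed from the table.

```javascript
const text = "No pain no gain";

// Without "g", match() reports only the first occurrence (and its index)
const first = text.match(/ain/);
console.log(first[0], first.index); // ain 4

// With "g", match() returns every occurrence
const all = text.match(/ain/g);
console.log(all); // [ 'ain', 'ain' ]

// "i" ignores case
const insensitive = /the/i.test("THE END");
console.log(insensitive); // true

// "m" lets ^ and $ match at line breaks, not just at the ends of the string
const perLine = "hip\nhop".match(/^h.p$/gm);
const wholeString = "hip\nhop".match(/^h.p$/g);
console.log(perLine);     // [ 'hip', 'hop' ]
console.log(wholeString); // null
```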

Position Matching

Symbol Description Example
 ^ Only matches the beginning of a string. /^The/ matches "The" in "The night" but not in "In The Night"
 $ Only matches the end of a string. /and$/ matches "and" in "Land" but not "landing"
 \b Matches any word boundary (test characters must exist at the beginning or end of a word within the string) /ly\b/ matches "ly" in "This is really cool."
 \B Matches any non-word boundary. /\Bor/ matches "or" in "normal" but not "origami."
(?=pattern) A positive lookahead. Requires that the pattern match the input that immediately follows. The pattern is not included as part of the actual match. JavaScript 1.5+ only. /Chapter(?= \d+)/ matches the word "Chapter" only when it is followed by a space and digits, as in "Chapter 2", though not in "Chapter One".
(?!pattern) A negative lookahead. Requires that the pattern not match the input that immediately follows. The pattern is not included as part of the actual match. JavaScript 1.5+ only. /JavaScript(?! Kit)/ matches any occurrence of the word "JavaScript" except when it's inside the phrase "JavaScript Kit"
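
A short sketch of the position symbols in action follows. The variable names and some of the sample strings are invented for illustration.

```javascript
// Anchors: ^ ties the match to the start of the string, $ to the end
const startsWithThe = [/^The/.test("The night"), /^The/.test("In The Night")];
const endsWithAnd = [/and$/.test("Land"), /and$/.test("landing")];
console.log(startsWithThe, endsWithAnd); // [ true, false ] [ true, false ]

// \b is a word boundary, \B is anywhere that is not a word boundary
const adverb = "This is really cool.".match(/\w+ly\b/)[0];
const innerOr = [/\Bor/.test("normal"), /\Bor/.test("origami")];
console.log(adverb, innerOr); // really [ true, false ]

// Lookaheads check what follows without consuming it
const chapterWord = "Chapter 2".match(/Chapter(?= \d)/)[0];
const shortened = "JavaScript Kit and JavaScript".replace(/JavaScript(?! Kit)/g, "JS");
console.log(chapterWord); // Chapter
console.log(shortened);   // JavaScript Kit and JS
```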

Literals

Symbol Description
Alphanumeric All alphabetical and numerical characters match themselves literally. So /2 days/ will match "2 days" inside a string.
 \0 Matches the NUL character.
 \n Matches a new line character
 \f Matches a form feed character
 \r Matches carriage return character
 \t Matches a tab character
 \v Matches a vertical tab character
 \xxx Matches the ASCII character expressed by the octal number xxx.

"\50" matches the left parenthesis character "("
 \xdd Matches the character expressed by the hex number dd.

"\x28" matches the left parenthesis character "("
 \uxxxx Matches the Unicode character expressed by the four hex digits xxxx.

"\u00A3" matches "£".

The backslash (\) is also used when you wish to match a special character literally. For example, if you wish to match the symbol "$" literally instead of having it signal the end of the string, backslash it: /\$/ 
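
The escapes above can be checked directly. This is a minimal sketch with arbitrary variable names; the last line assumes the RegExp() constructor form, where the backslash itself has to be doubled inside the string.

```javascript
// Hex and Unicode escapes name a character by its code
const hexParen = /\x28/.test("(");
const poundSign = /\u00A3/.test("£");
console.log(hexParen, poundSign); // true true

// Control-character escapes
const hasTab = /\t/.test("a\tb");
const hasNewline = /\n/.test("line one\nline two");
console.log(hasTab, hasNewline); // true true

// A backslash turns a special character into a literal one
const dollarIndex = "Price: $5".search(/\$/);
console.log(dollarIndex); // 7

// In a string passed to RegExp(), the backslash must be written twice
const dollarFromString = new RegExp("\\$").test("Price: $5");
console.log(dollarFromString); // true
```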

Character Classes

Symbol Description Example
 [xyz] Match any one character enclosed in the character set. You may use a hyphen to denote a range. For example, /[a-z]/ matches any lowercase letter in the alphabet and /[0-9]/ any single digit. /[AN]BC/ matches "ABC" and "NBC" but not "BBC" since the leading "B" is not in the set.
 [^xyz] Match any one character not enclosed in the character set. The caret indicates that none of the characters in the set should be matched.

NOTE: the caret used within a character class is not to be confused with the caret that denotes the beginning of a string. Negation is only performed within the square brackets.

/[^AN]BC/ matches "BBC" but not "ABC" or "NBC".
 . (Dot). Match any character except newline or another Unicode line terminator. /b.t/ matches "bat", "bit", "bet" and so on.
 \w Match any single alphanumeric character including the underscore. Equivalent to [a-zA-Z0-9_]. /\w/ matches "2" in "200%", and /\w+/ matches "200"
 \W Match any single non-word character. Equivalent to [^a-zA-Z0-9_]. /\W/ matches "%" in "200%"
 \d Match any single digit. Equivalent to [0-9].
 \D Match any single non-digit. Equivalent to [^0-9]. /\D/ matches "N" in "No 342222", and /\D+/ matches "No "
 \s Match any single whitespace character. Covers [ \t\r\n\v\f] as well as other Unicode whitespace such as the non-breaking space.
 \S Match any single non-whitespace character. The inverse of \s.
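
The character classes above could be exercised like this. It is a sketch only; the variable names are arbitrary and a few sample strings are added to those in the table.

```javascript
const stations = "ABC NBC BBC";

// A set matches one character from the list; a leading caret negates the set
const inSet = stations.match(/[AN]BC/g);
const notInSet = stations.match(/[^AN]BC/g);
console.log(inSet, notInSet); // [ 'ABC', 'NBC' ] [ 'BBC' ]

// The dot matches any single character except a line terminator
const dotMatches = "bat bit bet b\nt".match(/b.t/g);
console.log(dotMatches); // [ 'bat', 'bit', 'bet' ]

// Shorthand classes match ONE character each; add + to match a run
const wordRun = "200%".match(/\w+/)[0];
const nonWord = "200%".match(/\W/)[0];
const digits = "No 342222".match(/\d+/)[0];
console.log(wordRun, nonWord, digits); // 200 % 342222

// \s covers spaces, tabs and line breaks
const pieces = "a b\tc".split(/\s/);
console.log(pieces); // [ 'a', 'b', 'c' ]
```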
 

Repetition

Symbol Description Example
{x} Match exactly x occurrences of a regular expression. /\d{5}/ matches 5 digits.
{x,} Match x or more occurrences of a regular expression. /\s{2,}/ matches at least 2 whitespace characters.
{x,y} Matches x to y number of occurrences of a regular expression. /\d{2,4}/ matches at least 2 but no more than 4 digits.
? Match zero or one occurrences. Equivalent to {0,1}. /a\s?b/ matches "ab" or "a b".
* Match zero or more occurrences. Equivalent to {0,}. /we*/ matches "w" in "why" and "wee" in "between", but nothing in "bad"
+ Match one or more occurrences. Equivalent to {1,}. /fe+d/ matches both "fed" and "feed"
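
To see how the repetition symbols behave, here is a small sketch. The variable names and the zip-code style test are assumptions made for the example.

```javascript
// {x}: exactly x occurrences (anchored here so the whole string must be 5 digits)
const fiveDigits = [/^\d{5}$/.test("12345"), /^\d{5}$/.test("1234")];
console.log(fiveDigits); // [ true, false ]

// {x,}: x or more; used here to collapse runs of whitespace
const collapsed = "a  b   c d".replace(/\s{2,}/g, " ");
console.log(collapsed); // a b c d

// {x,y}: between x and y; \b keeps longer numbers from matching partially
const twoToFour = "1 22 333 55555".match(/\b\d{2,4}\b/g);
console.log(twoToFour); // [ '22', '333' ]

// ?: zero or one
const optionalSpace = [/a\s?b/.test("ab"), /a\s?b/.test("a b"), /a\s?b/.test("a  b")];
console.log(optionalSpace); // [ true, true, false ]

// *: zero or more, +: one or more
const star = ["why".match(/we*/)[0], "between".match(/we*/)[0], "bad".match(/we*/)];
const plus = [/fe+d/.test("fed"), /fe+d/.test("feed"), /fe+d/.test("fd")];
console.log(star); // [ 'w', 'wee', null ]
console.log(plus); // [ true, true, false ]
```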

Alternation & Grouping

Symbol Description Example
( ) Grouping characters together to create a clause. May be nested. /(abc)+(def)/ matches one or more occurrences of "abc" followed by one occurrence of "def".
(?: ) Grouping only, so items are grouped into a single unit, but the characters that match this group are not remembered. In other words, no numbered references are created for the items within the parenthesis. JavaScript 1.5 feature. /(?:.d){2}/ matches but doesn't capture "cdad".
 
| Alternation combines clauses into one regular expression and then matches any of the individual clauses. Similar to "OR" statement. /(ab)|(cd)|(ef)/ matches "ab" or "cd" or "ef".
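
The difference between capturing and non-capturing groups shows up in the array that match() returns. The sketch below uses arbitrary variable names and a few invented sample strings.

```javascript
// Capturing groups: index 0 is the whole match, 1..n are the groups.
// A repeated group remembers only its last repetition.
const captured = "abcabcdef".match(/(abc)+(def)/);
console.log(captured[0], captured[1], captured[2]); // abcabcdef abc def

// Non-capturing group: same grouping, but nothing is remembered
const uncaptured = "cdad".match(/(?:.d){2}/);
console.log(uncaptured[0], uncaptured.length); // cdad 1

// Alternation: match any one of the clauses
const either = "ab cd ef gh".match(/ab|cd|ef/g);
console.log(either); // [ 'ab', 'cd', 'ef' ]

// Grouping limits how far the alternation reaches
const pets = [/^(?:cat|dog)s?$/.test("dogs"), /^(?:cat|dog)s?$/.test("catdog")];
console.log(pets); // [ true, false ]
```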

Back references

Symbol Description Example
( )\n Matches the same text that a parenthesized clause earlier in the pattern matched. n is the number of that clause, counting opening parentheses from left to right. (\w+)\s+\1 matches any word that occurs twice in a row, such as "hubba hubba." The \1 denotes that the text after the space must be identical to the portion of the string matched by the first set of parentheses. If there were more than one set of parentheses in the pattern string you would use \2 or \3 to match the appropriate grouping to the left of the backreference. The RegExp $1…$9 properties expose only the first nine groups, although a pattern with more groups can refer to them with higher numbers such as \10.
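
A brief sketch of backreferences follows. The \b anchors and the sample sentences are additions for this example, not part of the pattern shown in the table.

```javascript
// \1 must repeat exactly what the first group matched
const doubledWord = /\b(\w+)\s+\1\b/;
const found = "hubba hubba".match(doubledWord)[0];
console.log(found); // hubba hubba

// Remove accidental repeated words; $1 in the replacement is the first group
const cleaned = "This is is a test".replace(/\b(\w+)\s+\1\b/g, "$1");
console.log(cleaned); // This is a test

// With two groups, \2 and \1 refer to each one separately
const mirrored = [/(\d)(\d)\2\1/.test("1221"), /(\d)(\d)\2\1/.test("1234")];
console.log(mirrored); // [ true, false ]
```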

Regular Expression methods

Method Description Example
String.match( regular expression ) Executes a search for a match within a string based on a regular expression. It returns an array of match information, or null if no match is found. With the "g" flag, the array holds every matched substring.

Note: Also updates the $1…$9 properties in the RegExp object.

var oldstring="Peter has 8 dollars and Jane has 15"
var newstring=oldstring.match(/\d+/g)
//returns the array ["8","15"]
String.replace( regular expression, replacement text ) Searches for the regular expression and replaces the matched portion with the replacement text (every match when the "g" flag is set).

Note: The replacement text may refer to the parenthesized submatches as $1…$9.

var oldstring="(304)434-5454"
var newstring=oldstring.replace(/[\(\)-]/g, "")
//returns "3044345454" (removes "(", ")", and "-")
String.split ( string literal or regular expression ) Breaks up a string into an array of substrings based on a regular expression or fixed string. var oldstring="1,2, 3,  4,   5"
var newstring=oldstring.split(/\s*,\s*/)
//returns the array ["1","2","3","4","5"]
String.search( regular expression ) Tests for a match in a string. It returns the index of the first match, or -1 if not found. Does NOT support global searches (the "g" flag has no effect). "Amy and George".search(/george/i)
//returns 8
RegExp.exec(string) Applies the RegExp to the given string, and returns the match information. var match = /s(amp)le/i.exec("Sample text")
//returns ["Sample","amp"]
RegExp.test(string) Tests if the given string matches the Regexp, and returns true if matching, false if not. var pattern=/george/i
pattern.test("Amy and George")
//returns true
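
Two of the methods above have behaviour that is easier to see in code: replace() with $1…$9 in the replacement text, and exec() called repeatedly on a pattern with the "g" flag, where each call resumes from the previous match. This is a sketch with arbitrary variable names, reusing the sample strings from the table.

```javascript
// replace(): $1, $2 and $3 stand for the three parenthesized submatches
const reformatted = "(304)434-5454".replace(/\((\d{3})\)(\d{3})-(\d{4})/, "$1-$2-$3");
console.log(reformatted); // 304-434-5454

// exec() with "g": loop until it returns null to visit every match
const numberPattern = /\d+/g;
const sentence = "Peter has 8 dollars and Jane has 15";
const hits = [];
let hit;
while ((hit = numberPattern.exec(sentence)) !== null) {
  // hit[0] is the matched text, hit.index is where it starts
  hits.push({ value: hit[0], index: hit.index });
}
console.log(hits); // [ { value: '8', index: 10 }, { value: '15', index: 33 } ]
```

Unlike match() with the "g" flag, the exec() loop keeps the position of each match.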

Example- Replace "<", ">", "&" and quotes (" and ') with the equivalent HTML entity instead

//Pass the form fields themselves (not their values) so the
//converted text can be written back into each field
function html2entities(){
var re=/[<>"'&]/g
for (var i=0; i<arguments.length; i++)
arguments[i].value=arguments[i].value.replace(re, replacechar)
}

//Return the HTML entity for a single matched character
function replacechar(match){
if (match=="<")
return "&lt;"
else if (match==">")
return "&gt;"
else if (match=="\"")
return "&quot;"
else if (match=="'")
return "&#039;"
else if (match=="&")
return "&amp;"
}

html2entities(document.form.namefield, document.form.hobbyfield)
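
For use outside a form, the same replacement can be sketched as a function that takes and returns a plain string. The names HTML_ENTITIES and escapeHtml are invented for this example, and a lookup table stands in for the if/else chain.

```javascript
// Hypothetical lookup table: one entity per character that needs replacing
const HTML_ENTITIES = {
  "<": "&lt;",
  ">": "&gt;",
  "&": "&amp;",
  '"': "&quot;",
  "'": "&#039;"
};

// Hypothetical helper: replace each special character with its entity
function escapeHtml(text) {
  return text.replace(/[<>"'&]/g, function (ch) {
    return HTML_ENTITIES[ch];
  });
}

console.log(escapeHtml("<b>Tom & Jerry's</b>"));
// &lt;b&gt;Tom &amp; Jerry&#039;s&lt;/b&gt;
```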



--
Prakash Samariya (IT Professional, HDSE)
Mob: 9879074678 Res: +91-79-32924610
http://ps-india.blogspot.com/
http://psamariya.googlepages.com/
Below Nelson's School, Opp SBI, Punit Ahram Road, Maninagar, Ahmedabad - 380008, Gujarat, India.
