Thursday, July 31, 2008

Regular Expression

Regular Expression Glossary

Table 1: Anchoring Metacharacters
Metacharacter Description
^ Anchor the expression to the start of a line
$ Anchor the expression to the end of a line

Table 2: Quantifiers, or Repetition Operators
Quantifier Description
* Match 0 or more times
? Match 0 or 1 time
+ Match 1 or more times
{m} Match exactly m times
{m,} Match at least m times
{m, n} Match at least m times but no more than n times
Table 3: Predefined POSIX Character Classes
Character Class Description
[:alpha:] Alphabetic characters
[:lower:] Lowercase alphabetic characters
[:upper:] Uppercase alphabetic characters
[:digit:] Numeric digits
[:alnum:] Alphanumeric characters
[:space:] Space characters (nonprinting), such as carriage return, newline, vertical tab, and form feed
[:punct:] Punctuation characters
[:cntrl:] Control characters (nonprinting)
[:print:] Printable characters
Table 4: Alternate Matching and Grouping of Expressions
Metacharacter Description
| Alternation Separates alternates, often used with grouping operator ()
( ) Group Groups subexpression into a unit for alternations, for quantifiers, or for backreferencing (see "Backreferences" section)
[char] Character list Indicates a character list; most metacharacters inside a character list are understood as literals, with the exception of character classes, and the ^ and - metacharacters

Table 5: The REGEXP_LIKE Operator

Syntax Description
REGEXP_LIKE(source_string, pattern
[, match_parameter])
source_string supports character datatypes (CHAR, VARCHAR2, CLOB, NCHAR, NVARCHAR2, and NCLOB but not LONG). The pattern parameter is another name for the regular expression. match_parameter allows optional parameters such as handling the newline character, retaining multiline formatting, and providing control over case-sensitivity.

Table 6: The REGEXP_INSTR Function

Syntax Description
REGEXP_INSTR(source_string, pattern
[, start_position
[, occurrence
[, return_option
[, match_parameter]]]])
This function looks for a pattern and returns the first position of the pattern. Optionally, you can indicate the start_position you want to begin the search. The occurrence parameter defaults to 1 unless you indicate that you are looking for a subsequent occurrence. The default value of the return_option is 0, which returns the starting position of the pattern; a value of 1 returns the starting position of the next character following the match.

Table 7: Explanation of 5-digit + 4 Zip-Code Expression

Syntax Description
Empty space that must be matched
[:digit:] POSIX numeric digit class
] End of character list
{5} Repeat exactly five occurrences of the character list
( Start of subexpression
- A literal hyphen, because it is not a range metacharacter inside a character list
[ Start of character list
[:digit:] POSIX [:digit:] class
[ Start of character list
] End of character list
{4} Repeat exactly four occurrences of the character list
) Closing parenthesis, to end the subexpression
? The ? quantifier matches the grouped subexpression 0 or 1 time thus making the 4-digit code optional
$ Anchoring metacharacter, to indicate the end of the line

Table 8: The REGEXP_SUBSTR Function

Syntax Description
REGEXP_SUBSTR(source_string, pattern
[, position [, occurrence
[, match_parameter]]])
The REGEXP_SUBSTR function returns the substring that matches the pattern.

Table 9: The REGEXP_REPLACE Function

Syntax Description
REGEXP_REPLACE(source_string, pattern
[, replace_string [, position
[,occurrence, [match_parameter]]]])
This function replaces the matching pattern with a specified replace_string, allowing complex search-and-replace operations.

Table 10: Backreference Metacharacter

Metacharacter Description
\digit Backslash Followed by a digit between 1 and 9, the backslash matches the preceding digit-th parenthesized subexpression.
(Note: The backslash has another meaning in regular expressions; depending on the context it can also mean the Escape character

Table 11: Explanation of Pattern-Swap Regular Expression

Regular-Expression Item Description
( Start of first subexpression
. Match any single character except a newline
* Repetition operator, matches previous . metacharacter 0 to n times
) End of first subexpression; result of the match is captured in \1
(In this example, it's Ellen.)
Empty space that needs to be present
( Start of the second subexpression
. Match any single character except a newline
* Repetition operator matches the previous . metacharacter 0 to n times
) End of second subexpression; result of this match is captured in \2
(In this example, it stores Hildi.)
Empty space
( Start of third subexpression
. Match any single character except a newline
* Repetition operator matches . metacharacter 0 to n times
) End of third subexpression; result of this match is captured in \3
(In this example, it holds Smith.)

Table 12: Explanation of the Social Security Number Regular Expression

Regular-Expression Item Description
^ Start of line character (Regular expression cannot have any leading characters before the match.)
( Start subexpression and list alternates separated by the | metacharacter
[ Start of character list
[:digit:] POSIX numeric digit class
] End of character list
{3} Repeat exactly three occurrences of character list
- A hyphen
[ Start of character list
[:digit:] POSIX numeric digit class
] End of character list
{2} Repeat exactly two occurrences of character list
- Another hyphen
[ Start of character list
[:digit:] POSIX numeric digit class
] End of character list
{4} Repeat exactly four occurrences of character list
| Alternation metacharacter; ends the first choice and starts the next alternate expression
[ Start of character list
[:digit:] POSIX numeric digit class.
] End of character list
{9} Repeat exactly nine occurrences of character list
) Ending parenthesis, to close the subexpression group used for alternation
$ Anchoring metacharacter, to indicate the end of the line; no extra characters can follow the pattern


No comments:

Post a Comment