grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, fixed = FALSE, useBytes = FALSE, invert = FALSE)
• pattern: character string containing a regular expression (or, with fixed = TRUE, a literal string) to be matched
• x: a character vector in which matches are sought (other vectors are coerced to character)
• ignore.case: logical. If FALSE (the default), matching is case-sensitive; if TRUE, case is ignored
• perl: logical. Should Perl-compatible regular expressions be used?
• value: logical. If FALSE, the indices of the matching elements are returned; if TRUE, the matching elements themselves
• fixed: logical. If TRUE, pattern is a string to be matched as is. Overrides all conflicting arguments
• useBytes: logical. If TRUE, matching is done byte-by-byte rather than character-by-character
• invert: logical. If TRUE, return indices or values for elements that do not match
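A short sketch of how two of these flags change what grep() matches. The vector `words` is a made-up example, not part of the tutorial's data: ignore.case widens a literal match to other letter cases, and fixed = TRUE stops "." from acting as a wildcard.

```r
# Made-up input vector for illustration
words <- c("Apple", "apple pie", "a.b", "aXb")

# Case-sensitive by default: only "apple pie" matches
grep("apple", words)                      # returns 2
# ignore.case = TRUE also matches "Apple"
grep("apple", words, ignore.case = TRUE)  # returns 1 2

# As a regular expression, "." matches any character, so "aXb" matches too
grep("a.b", words)                        # returns 3 4
# fixed = TRUE treats the pattern literally: only the real "a.b" matches
grep("a.b", words, fixed = TRUE)          # returns 3
```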
• grep(value = FALSE) returns an integer vector of the indices of the elements of x that yielded a match (or, with invert = TRUE, that did not match).
> grep("rect", "draw a rectangle")
[1] 1
> str <- c("Regular", "expression", "examples of R language")
> x <- grep("ex", str, value = FALSE)
> x
[1] 2 3
> x <- "line 4322: He is now 25 years old, and weighs 130lbs"
> x <- grep("\\d", x)
> x
[1] 1
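The invert argument mentioned above can be sketched on the same `str` vector. This also shows what grep() gives back when nothing matches: a zero-length integer vector rather than -1 or NA.

```r
str <- c("Regular", "expression", "examples of R language")

# invert = TRUE selects the elements that do NOT contain "ex"
grep("ex", str, invert = TRUE)                # returns 1
# invert and value can be combined
grep("ex", str, invert = TRUE, value = TRUE)  # returns "Regular"

# No match at all: a zero-length integer vector
grep("zzz", str)                              # returns integer(0)
```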
• grep(value = TRUE) returns a character vector containing the selected elements of x (after coercion, preserving names but no other attributes).
> grep("rect", "draw a rectangle", value = TRUE)
[1] "draw a rectangle"
> x <- grep("ex", str, value = TRUE)
> x
[1] "expression" "examples of R language"
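To see what "preserving names but no other attributes" means in practice, here is a small sketch with a named vector. The names `a`, `b`, `c` and the extra attribute "source" are hypothetical, added only for illustration.

```r
# Named input vector with one extra (hypothetical) attribute
langs <- c(a = "Regular", b = "expression", c = "examples of R language")
attr(langs, "source") <- "tutorial"

hits <- grep("ex", langs, value = TRUE)

names(hits)           # names of the matching elements survive: "b" "c"
attr(hits, "source")  # other attributes are dropped: NULL
```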
• grepl returns a logical vector indicating, for each element of x, whether it matches.
> x <- grepl("ex", str)
> x
[1] FALSE TRUE TRUE
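Because grepl() returns one logical per element, it plugs directly into subsetting and counting. A minimal sketch relating it back to the two forms of grep():

```r
str <- c("Regular", "expression", "examples of R language")
matches <- grepl("ex", str)

str[matches]    # same elements as grep("ex", str, value = TRUE)
which(matches)  # same indices as grep("ex", str): 2 3
sum(matches)    # number of matching elements: 2
str[!matches]   # non-matching elements, like invert = TRUE: "Regular"
```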
R has several functions for regular-expression-based matching and replacement. The grep, grepl, regexpr and gregexpr functions search for matches, while sub and gsub perform replacements.
• sub(pattern, replacement, x) replaces the first match of pattern in each element of x with replacement, while gsub replaces every match. Both return a character vector of the same length and with the same attributes as x (after possible coercion to character). Elements of x that are not substituted are returned unchanged (including any declared encoding). If useBytes = FALSE, a non-ASCII substituted result will often be in UTF-8 with a marked encoding (e.g. if there is a UTF-8 input, and in a multibyte locale unless fixed = TRUE).
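The first-match versus every-match difference is easiest to see on a string with repeated letters. A sketch on a made-up named vector, which also shows that the names of x carry over to the result:

```r
s <- c(first = "banana", second = "apple")

sub("a", "A", s)          # only the first "a" in each element: "bAnana" "Apple"
gsub("a", "A", s)         # every "a": "bAnAnA" "Apple"
names(gsub("a", "A", s))  # attributes of x are kept: "first" "second"
```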
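> str <- c("Regular", "expression", "examples of R language")
> x <- sub("x.ress", "", str)
> x
[1] "Regular" "eion" "examples of R language"
> x <- sub("x.+e", "", str)
> x
[1] "Regular" "ession" "e"
Note that .+ is greedy: in the third element it consumes everything from the "x" up to the last "e", leaving only the leading "e".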
> x <- "line 4322: He is now 25 years old, and weighs 130lbs"
> x <- gsub("[[:digit:]]", "", x)
> x
[1] "line : He is now  years old, and weighs lbs"
> x <- "line 4322: He is now 25 years old, and weighs 130lbs"
> x <- gsub("\\d+", "", x)
> x
[1] "line : He is now  years old, and weighs lbs"
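Removing "25" leaves two spaces between "now" and "years". One possible cleanup, sketched here as an illustration rather than a required step, is a second gsub() that collapses any run of whitespace into a single space:

```r
x <- "line 4322: He is now 25 years old, and weighs 130lbs"

no_digits <- gsub("\\d+", "", x)          # "now  years" now has two spaces
cleaned   <- gsub("\\s+", " ", no_digits)  # collapse whitespace runs to one space
cleaned   # "line : He is now years old, and weighs lbs"
```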
• regexpr(pattern, text) returns an integer vector of the same length as text giving the starting position of the first match, or -1 if there is none, with attribute "match.length", an integer vector giving the length of the matched text (or -1 for no match). The match positions and lengths are in characters unless useBytes = TRUE is used, in which case they are in bytes.
> str <- c("Regular", "expression", "examples of R language")
> x <- regexpr("x*ress", str)
> x
[1] -1 4 -1
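The start positions and the "match.length" attribute together locate the matched text. A sketch that cuts it out with base R's substring(); regmatches(), also in base R, does the same extraction in one call:

```r
str <- c("Regular", "expression", "examples of R language")
m <- regexpr("x*ress", str)

start <- as.integer(m)           # -1 4 -1 (attributes stripped)
len   <- attr(m, "match.length") # -1 4 -1

# Keep only elements that matched, then extract the matched substring
hit <- start != -1
substring(str[hit], start[hit], start[hit] + len[hit] - 1)  # "ress"

# Same result with regmatches()
regmatches(str, m)               # "ress"
```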
• gregexpr returns a list of the same length as text, each element of which has the same form as the return value of regexpr, except that the starting positions of every (disjoint) match are given.
> str <- c("Regular", "expression", "examples of R language")
> x <- gregexpr("x*ress", str)
> x
[[1]]
[1] -1
attr(,"match.length")
[1] -1
attr(,"useBytes")
[1] TRUE

[[2]]
[1] 4
attr(,"match.length")
[1] 4
attr(,"useBytes")
[1] TRUE

[[3]]
[1] -1
attr(,"match.length")
[1] -1
attr(,"useBytes")
[1] TRUE
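Since gregexpr reports every disjoint match, it is the natural choice for pulling all numbers out of a string. A sketch on the sentence used earlier, using base R's regmatches() to turn positions and lengths into the matched text:

```r
x <- "line 4322: He is now 25 years old, and weighs 130lbs"
m <- gregexpr("[0-9]+", x)

as.integer(m[[1]])               # start of each run of digits: 6 22 47
attr(m[[1]], "match.length")     # length of each run: 4 2 3
regmatches(x, m)[[1]]            # "4322" "25" "130"
```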
Regular Expression Syntax: