| grep {base} | R Documentation |
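grep searches for matches to pattern (its first
argument) within the character vector x (second
argument). regexpr does the same for its text argument, but
reports the position and length of the first match in each element
instead of the indices of the matching elements.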
sub and gsub perform replacement of matches
determined by regular expression matching.
grep(pattern, x, ignore.case=FALSE, extended=TRUE, value=FALSE)
sub(pattern, replacement, x,
ignore.case=FALSE, extended=TRUE)
gsub(pattern, replacement, x,
ignore.case=FALSE, extended=TRUE)
regexpr(pattern, text, extended=TRUE)
| pattern | character string containing a regular expression to be matched in the character vector x (or text). |
| x, text | a character vector where matches are sought. |
| ignore.case | if FALSE, the pattern matching is case sensitive; if TRUE, case is ignored during matching. |
| extended | if TRUE, extended regular expression matching is used; if FALSE, basic regular expressions are used. |
| value | if FALSE, a vector containing the (integer) indices of the matches determined by grep is returned; if TRUE, a vector containing the matching elements themselves is returned. |
| replacement | a replacement for the matched pattern in sub and gsub. |
The two *sub functions differ only in that sub replaces only
the first occurrence of a pattern whereas gsub replaces
all occurrences.
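A minimal sketch of that difference, relying only on the default arguments. The input vector s and the result names are made up for illustration and are not part of the original page.

```r
## Hypothetical input, chosen only to illustrate first-vs-all replacement
s <- c("banana", "cherry")

first_only <- sub("a", "A", s)   # replaces only the first "a" in each element
every      <- gsub("a", "A", s)  # replaces every "a" in each element

first_only  # "bAnana" "cherry"
every       # "bAnAnA" "cherry"

## Elements without a match are returned unchanged, so both results
## have the same length as the input
length(first_only) == length(s)
```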
The regular expressions used are those specified by POSIX 1003.2,
either extended or basic, depending on the value of the
extended argument.
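As a rough illustration of extended syntax, the sketch below uses alternation, grouping and an optional character without backslash escapes. It relies on extended matching being the default, as documented above, and does not pass the extended argument explicitly; the vector words is a made-up example.

```r
## Hypothetical input vector
words <- c("arm", "arms", "foot", "hand")

## "(arm|foot)" is a group with two alternatives and "s?" is an optional "s";
## extended syntax treats these characters as operators
hits <- grep("^(arm|foot)s?$", words)

hits         # 1 2 3
words[hits]  # "arm" "arms" "foot"
```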
For grep, a vector giving either the indices of the elements
of x that yielded a match or, if value is TRUE,
the matched elements.
For sub and gsub, a character vector of the same
length as x.
For regexpr, an integer vector of the same length as
text giving the starting position of the first match, or -1
if there is none, with attribute "match.length" giving the
length of the matched text (or -1 for no match).
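The regexpr result can be unpacked as sketched below. The input vector s and the helper names start, len and ok are assumptions made for illustration; only regexpr, attr and substring are standard functions.

```r
## Hypothetical input vector
s <- c("foot", "arm", "bafoobar")

m <- regexpr("oo", s)

start <- as.integer(m)            # starting positions, -1 where nothing matched
len   <- attr(m, "match.length")  # lengths of the matched text, -1 for no match

start  # 2 -1 4
len    # 2 -1 2

## Extract the matched text from the elements that did match
ok <- start > 0
matched <- substring(s[ok], start[ok], start[ok] + len[ok] - 1)
matched  # "oo" "oo"
```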
charmatch, pmatch, match.
apropos uses regexps and has nice examples.
grep("[a-z]", letters)
txt <- c("arm","foot","lefroo", "bafoobar")
if(length(i <- grep("foo", txt)))  # grep returns indices, so test the length
    cat("'foo' appears at least once in\n\t", txt, "\n")
i # 2 and 4
txt[i]
## Double all 'a' or 'b's; "\" must be escaped, i.e. `doubled'
gsub("([ab])", "\\1_\\1_", "abc and ABC")
txt <- c("The", "licenses", "for", "most", "software", "are",
"designed", "to", "take", "away", "your", "freedom",
"to", "share", "and", "change", "it.",
"", "By", "contrast,", "the", "GNU", "General", "Public", "License",
"is", "intended", "to", "guarantee", "your", "freedom", "to",
"share", "and", "change", "free", "software", "--",
"to", "make", "sure", "the", "software", "is",
"free", "for", "all", "its", "users")
( i <- grep("[gu]", txt) ) # indices
stopifnot( txt[i] == grep("[gu]", txt, value = TRUE) )
(ot <- sub("[b-e]",".", txt))
txt[ot != gsub("[b-e]",".", txt)]#- gsub does "global" substitution
txt[gsub("g","#", txt) !=
gsub("g","#", txt, ignore.case = TRUE)] # the "G" words
regexpr("en", txt)