public final class Matcher extends Object implements MatchResult
An engine that performs match operations on a character sequence by interpreting a Pattern.
A matcher is created from a pattern by invoking the pattern's
matcher method. Once created, a matcher can be used
to perform three different kinds of match operations:
- The matches method attempts to match the entire input sequence against the pattern.
- The lookingAt method attempts to match the input sequence, starting at the beginning, against the pattern.
- The find method scans the input sequence looking for the next subsequence that matches the pattern.
Each of these methods returns a boolean indicating success or failure. More information about a successful match can be obtained by querying the state of the matcher.
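The difference between the three operations is easiest to see side by side. The following is a minimal sketch, assuming the standard java.util.regex.Pattern and Matcher classes, which expose methods of the same names; the class documented here may live in a different package. The class name MatchKindsDemo and the helper describe are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchKindsDemo {
    // Runs matches(), lookingAt() and find() on fresh matchers and reports each result.
    static String describe(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        boolean matches = p.matcher(input).matches();     // whole input must match
        boolean lookingAt = p.matcher(input).lookingAt(); // a prefix must match
        boolean find = p.matcher(input).find();           // any subsequence may match
        return "matches=" + matches + ",lookingAt=" + lookingAt + ",find=" + find;
    }

    public static void main(String[] args) {
        System.out.println(describe("\\d+", "123"));    // matches=true,lookingAt=true,find=true
        System.out.println(describe("\\d+", "123abc")); // matches=false,lookingAt=true,find=true
        System.out.println(describe("\\d+", "abc123")); // matches=false,lookingAt=false,find=true
    }
}
```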
A matcher finds matches in a subset of its input called the region. By
default, the region contains all of the matcher's input. The region can be
modified via the region method and queried via the
regionStart and regionEnd methods.
The way that the region boundaries interact with some pattern constructs can
be changed. See useAnchoringBounds and
useTransparentBounds for more details.
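As an illustration of restricting a matcher to a region, here is a small sketch. It assumes the standard java.util.regex classes behave like the class documented here for region, regionStart and regionEnd; RegionDemo and matchRegion are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionDemo {
    // Matches the regex against input[start, end) only; returns null if the region does not match.
    static String matchRegion(String regex, String input, int start, int end) {
        Matcher m = Pattern.compile(regex).matcher(input);
        m.region(start, end);
        if (!m.matches()) {
            return null;
        }
        return m.group() + "@" + m.regionStart() + "-" + m.regionEnd();
    }

    public static void main(String[] args) {
        // The region covers exactly the word "two".
        System.out.println(matchRegion("\\w+", "one two three", 4, 7)); // two@4-7
        // The region now starts at the space before "two", so matches() fails.
        System.out.println(matchRegion("\\w+", "one two three", 3, 7)); // null
    }
}
```

Note that matches() here requires the whole region, not the whole input, to match.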
This class also defines methods for replacing matched subsequences with new
strings whose contents can, if desired, be computed from the match result.
The appendReplacement and appendTail methods can be used in tandem in order to collect the result into
an existing string buffer, or the more convenient replaceAll method can be used to create a string in which every matching
subsequence in the input sequence is replaced.
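A replacement computed from each match can be sketched with the appendReplacement and appendTail pair. The example assumes the standard java.util.regex classes; ComputedReplacementDemo and doubleNumbers are illustrative names, and quoteReplacement is used so the computed text is inserted literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ComputedReplacementDemo {
    // Replaces every integer in the input with twice its value.
    static String doubleNumbers(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // Copies the text before the match, then the computed replacement.
            m.appendReplacement(sb, Matcher.quoteReplacement(Integer.toString(doubled)));
        }
        m.appendTail(sb); // copies whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 12 pears")); // 6 apples and 24 pears
    }
}
```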
The explicit state of a matcher includes the start and end indices of the most recent successful match. It also includes the start and end indices of the input subsequence captured by each capturing group in the pattern as well as a total count of such subsequences. As a convenience, methods are also provided for returning these captured subsequences in string form.
The explicit state of a matcher is initially undefined; attempting to query
any part of it before a successful match will cause an
IllegalStateException to be thrown. The explicit state of a matcher
is recomputed by every match operation.
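The rule about undefined explicit state can be checked with a short sketch. It assumes the standard java.util.regex classes; ExplicitStateDemo and queryThrows are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExplicitStateDemo {
    // Returns true if querying start() throws IllegalStateException.
    static boolean queryThrows(String regex, String input, boolean attemptFind) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (attemptFind) {
            m.find(); // result deliberately ignored; it may succeed or fail
        }
        try {
            m.start();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(queryThrows("a", "a", false)); // true: no match attempted yet
        System.out.println(queryThrows("a", "b", true));  // true: the match attempt failed
        System.out.println(queryThrows("a", "a", true));  // false: a successful match exists
    }
}
```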
The implicit state of a matcher includes the input character sequence as well
as the append position, which is initially zero and is updated by the
appendReplacement method.
A matcher may be reset explicitly by invoking its reset() method or,
if a new input sequence is desired, its reset(CharSequence) method. Resetting a matcher discards its explicit state
information and sets the append position to zero.
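The effect of both reset variants on an ongoing search might look like the following sketch, again assuming the standard java.util.regex classes; ResetDemo, starts and demo are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {
    // Collects the start index of every remaining match.
    static List<Integer> starts(Matcher m) {
        List<Integer> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    static String demo() {
        Matcher m = Pattern.compile("a").matcher("aXa");
        List<Integer> first = starts(m);      // scans the whole input
        List<Integer> exhausted = starts(m);  // nothing left to find
        m.reset();                            // same input, search restarts
        List<Integer> afterReset = starts(m);
        m.reset("bba");                       // new input sequence
        List<Integer> newInput = starts(m);
        return first + " " + exhausted + " " + afterReset + " " + newInput;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [0, 2] [] [0, 2] [2]
    }
}
```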
Instances of this class are not safe for use by multiple concurrent threads.
| Modifier and Type | Field and Description |
|---|---|
| static int | `CAPTURE_TREE` Enables the creation of a so-called Capture Tree during matching. |
| Modifier and Type | Method and Description |
|---|---|
| Matcher | `appendReplacement(StringBuffer sb, CaptureReplacer replacer)` Implements a non-terminal append-and-replace step. |
| Matcher | `appendReplacement(StringBuffer sb, java.util.function.Function<Matcher,String> evaluator)` |
| Matcher | `appendReplacement(StringBuffer sb, String replacement)` Implements a non-terminal append-and-replace step. |
| StringBuffer | `appendTail(StringBuffer sb)` Implements a terminal append-and-replace step. |
| CaptureTree | `captureTree()` Returns the CaptureTree of the previous match operation. |
| int | `end()` Returns the offset after the last character matched. |
| int | `end(int group)` Returns the offset after the last character of the subsequence captured by the given group during the previous match operation. |
| int | `end(String name)` Returns the offset after the last character of the subsequence captured by the given named-capturing group during the previous match operation. |
| boolean | `find()` Attempts to find the next subsequence of the input sequence that matches the pattern. |
| boolean | `find(int start)` Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index. |
| int | `getMode()` Returns this matcher's matching mode. |
| String | `group()` Returns the input subsequence matched by the previous match. |
| String | `group(int group)` Returns the input subsequence captured by the given group during the previous match operation. |
| String | `group(String name)` Returns the input subsequence captured by the given named-capturing group during the previous match operation. |
| int | `groupCount()` Returns the number of capturing groups in this matcher's pattern. |
| boolean | `hasAnchoringBounds()` Queries the anchoring of region bounds for this matcher. |
| boolean | `hasTransparentBounds()` Queries the transparency of region bounds for this matcher. |
| boolean | `hitEnd()` Returns true if the end of input was hit by the search engine in the last match operation performed by this matcher. |
| boolean | `lookingAt()` Attempts to match the input sequence, starting at the beginning of the region, against the pattern. |
| boolean | `matches()` Attempts to match the entire region against the pattern. |
| Pattern | `pattern()` Returns the pattern that is interpreted by this matcher. |
| static String | `quoteReplacement(String s)` Returns a literal replacement String for the specified String. |
| Matcher | `region(int start, int end)` Sets the limits of this matcher's region. |
| int | `regionEnd()` Reports the end index (exclusive) of this matcher's region. |
| int | `regionStart()` Reports the start index of this matcher's region. |
| String | `replaceAll(CaptureReplacer replacer)` Replaces every subsequence of the input sequence that matches the pattern with the replacement string computed with the given CaptureReplacer. |
| String | `replaceAll(java.util.function.Function<Matcher,String> evaluator)` Replaces every subsequence of the input sequence that matches the pattern with the replacement string computed with the given Match Evaluator. |
| String | `replaceAll(String replacement)` Replaces every subsequence of the input sequence that matches the pattern with the given replacement string. |
| String | `replaceFirst(CaptureReplacer replacer)` Replaces the first subsequence of the input sequence that matches the pattern with the replacement string computed with the given CaptureReplacer. |
| String | `replaceFirst(java.util.function.Function<Matcher,String> evaluator)` |
| String | `replaceFirst(String replacement)` Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string. |
| boolean | `requireEnd()` Returns true if more input could change a positive match into a negative one. |
| Matcher | `reset()` Resets this matcher. |
| Matcher | `reset(CharSequence input)` Resets this matcher with a new input sequence. |
| void | `setMode(int mode)` Sets this matcher's matching mode. |
| int | `start()` Returns the start index of the previous match. |
| int | `start(int group)` Returns the start index of the subsequence captured by the given group during the previous match operation. |
| int | `start(String name)` Returns the start index of the subsequence captured by the given named-capturing group during the previous match operation. |
| MatchResult | `toMatchResult()` Returns the match state of this matcher as a MatchResult. |
| String | `toString()` Returns the string representation of this matcher. |
| Matcher | `useAnchoringBounds(boolean b)` Sets the anchoring of region bounds for this matcher. |
| Matcher | `usePattern(Pattern newPattern)` Changes the Pattern that this Matcher uses to find matches with. |
| Matcher | `useTransparentBounds(boolean b)` Sets the transparency of region bounds for this matcher. |
public static final int CAPTURE_TREE
See Also: setMode(int), captureTree(), Constant Field Values
public Pattern pattern()
public MatchResult toMatchResult()
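Returns the match state of this matcher as a MatchResult. The result
is unaffected by subsequent operations performed upon this matcher.
Returns: a MatchResult with the state of this matcher
public Matcher usePattern(Pattern newPattern)
Changes the Pattern that this Matcher uses to find matches with.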
This method causes this matcher to lose information about the groups of the last match that occurred. The matcher's position in the input is maintained and its last append position is unaffected.
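Because the position in the input is maintained, usePattern can be used to alternate patterns while scanning a single input. The sketch below assumes the standard java.util.regex.Matcher.usePattern, which has the same signature; UsePatternDemo and tokens are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UsePatternDemo {
    // Alternates between a digit pattern and a letter pattern on one matcher.
    static List<String> tokens(String input) {
        Pattern digits = Pattern.compile("\\d+");
        Pattern letters = Pattern.compile("[a-z]+");
        Matcher m = digits.matcher(input);
        List<String> out = new ArrayList<>();
        boolean useDigits = true;
        while (m.find()) {
            out.add(m.group());
            useDigits = !useDigits;
            // Switch patterns; the next find() continues after the previous match.
            m.usePattern(useDigits ? digits : letters);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("12ab34")); // [12, ab, 34]
    }
}
```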
Parameters: newPattern - The new pattern used by this matcher
Throws: IllegalArgumentException - If newPattern is null
public Matcher reset()
Resets this matcher.
Resetting a matcher discards all of its explicit state information and sets its append position to zero. The matcher's region is set to the default region, which is its entire character sequence. The anchoring and transparency of this matcher's region boundaries are unaffected.
public Matcher reset(CharSequence input)
Resetting a matcher discards all of its explicit state information and sets its append position to zero. The matcher's region is set to the default region, which is its entire character sequence. The anchoring and transparency of this matcher's region boundaries are unaffected.
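Parameters: input - The new input character sequence
public int start()
Returns the start index of the previous match.
Specified by: start in interface MatchResult
Throws: IllegalStateException - If no match has yet been attempted, or if the previous match operation failed
public int start(int group)
Returns the start index of the subsequence captured by the given group during the previous match operation.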
Capturing groups are indexed from left to right, starting at one. Group zero denotes the entire pattern, so the expression m.start(0) is equivalent to m.start().
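Specified by: start in interface MatchResult
Parameters: group - The index of a capturing group in this matcher's pattern
Throws: IllegalStateException - If no match has yet been attempted, or if the previous match operation failed
Throws: IndexOutOfBoundsException - If there is no capturing group in the pattern with the given index
public int start(String name)
Returns the start index of the subsequence captured by the given named-capturing group during the previous match operation.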
Specified by: start in interface MatchResult
Parameters: name - The name of a named-capturing group in this matcher's pattern
Returns: the start index of the subsequence captured by the group, or -1 if the match was successful but the group itself did not match anything
Throws: IllegalStateException - If no match has yet been attempted, or if the previous match operation failed
Throws: IllegalArgumentException - If there is no capturing group in the pattern with the given name
public int end()
Returns the offset after the last character matched.
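Specified by: end in interface MatchResult
Throws: IllegalStateException - If no match has yet been attempted, or if the previous match operation failed
public int end(int group)
Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.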
Capturing groups are indexed from left to right, starting at one. Group zero denotes the entire pattern, so the expression m.end(0) is equivalent to m.end().
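Specified by: end in interface MatchResult
Parameters: group - The index of a capturing group in this matcher's pattern
Throws: IllegalStateException - If no match has yet been attempted, or if the previous match operation failed
Throws: IndexOutOfBoundsException - If there is no capturing group in the pattern with the given index
public int end(String name)
Returns the offset after the last character of the subsequence captured by the given named-capturing group during the previous match operation.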
Specified by: end in interface MatchResult
Parameters: name - The name of a named-capturing group in this matcher's pattern
Returns: the offset after the last character captured by the group, or -1 if the match was successful but the group itself did not match anything
Throws: IllegalStateException - If no match has yet been attempted, or if the previous match operation failed
Throws: IllegalArgumentException - If there is no capturing group in the pattern with the given name
public String group()
Returns the input subsequence matched by the previous match.
For a matcher m with input sequence s, the expressions m.group() and s.substring(m.start(), m.end()) are equivalent.
Note that some patterns, for example a*, match the empty string. This method will return the empty string when the pattern successfully matches the empty string in the input.
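Specified by: group in interface MatchResult
Throws: IllegalStateException - If no match has yet been attempted, or if the previous match operation failed
public String group(int group)
Returns the input subsequence captured by the given group during the previous match operation.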
For a matcher m, input sequence s, and group index g, the expressions m.group(g) and s.substring(m.start(g), m.end(g)) are equivalent.
Capturing groups are indexed from left to right, starting at one. Group zero denotes the entire pattern, so the expression m.group(0) is equivalent to m.group().
If the match was successful but the group specified failed to match any part of the input sequence, then null is returned. Note that some groups, for example (a*), match the empty string. This method will return the empty string when such a group successfully matches the empty string in the input.
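The distinction between a group that did not participate (null) and a group that matched the empty string can be seen in a short sketch. It assumes the standard java.util.regex classes; GroupNullDemo and describe are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNullDemo {
    // Group 1 is optional and absent from the input; group 2 matches the empty string.
    static String describe() {
        Matcher m = Pattern.compile("(x)?(a*)b").matcher("b");
        if (!m.matches()) {
            return "no match";
        }
        return "g1=" + m.group(1) + ",start1=" + m.start(1) + ",g2='" + m.group(2) + "'";
    }

    public static void main(String[] args) {
        System.out.println(describe()); // g1=null,start1=-1,g2=''
    }
}
```

Callers that concatenate or dereference group values usually need a null check for optional groups.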
Specified by: group in interface MatchResult
Parameters: group - The index of a capturing group in this matcher's pattern
Throws: IllegalStateException - If no match has yet been attempted, or if the previous match operation failed
Throws: IndexOutOfBoundsException - If there is no capturing group in the pattern with the given index
public String group(String name)
Returns the input subsequence captured by the given named-capturing group during the previous match operation.
If the match was successful but the group specified failed to match any part of the input sequence, then null is returned. Note that some groups, for example (a*), match the empty string. This method will return the empty string when such a group successfully matches the empty string in the input.
Specified by: group in interface MatchResult
Parameters: name - The name of a named-capturing group in this matcher's pattern
Throws: IllegalStateException - If no match has yet been attempted, or if the previous match operation failed
Throws: IllegalArgumentException - If there is no capturing group in the pattern with the given name
public void setMode(int mode)
Sets this matcher's matching mode.
Parameters: mode - The matching mode, a bit mask that currently may include only CAPTURE_TREE
See Also: captureTree()
public int getMode()
Returns this matcher's matching mode.
See Also: setMode(int)
public CaptureTree captureTree()
Returns the CaptureTree of the previous match operation.
The CaptureTree contains all captures made during the previous match
operation of all capturing groups in a
hierarchical data structure. E.g.
Matcher matcher = Pattern.compile("(?x)" + "(?(DEFINE)" + "(?<sum> (?'summand')(?:\\+(?'summand'))+ )"
+ "(?<summand> (?'product') | (?'number') )" + "(?<product> (?'factor')(?:\\*(?'factor'))+ )"
+ "(?<factor>(?'number') ) " + "(?<number>\\d++)" + ")" + "(?'sum')").matcher("5+6*8");
matcher.setMode(Matcher.CAPTURE_TREE);
matcher.matches();
System.out.println(matcher.captureTree());
prints out
0
  sum
    summand
      number
    summand
      product
        factor
          number
        factor
          number
Returns: the CaptureTree of the previous match operation
Throws: IllegalStateException - If no match has yet been attempted, or if the previous match operation failed, or if the CAPTURE_TREE matching mode hasn't been set with setMode(int)
See Also: CaptureTree, setMode(int), CAPTURE_TREE
public int groupCount()
Returns the number of capturing groups in this matcher's pattern.
Group zero denotes the entire pattern by convention. It is not included in this count.
Any non-negative integer smaller than or equal to the value returned by this method is guaranteed to be a valid group index for this matcher.
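The guarantee about valid group indices means a loop from 0 to groupCount() inclusive is always safe. A minimal sketch, assuming the standard java.util.regex classes; GroupCountDemo and allGroups are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Returns group 0 (the whole match) followed by every capturing group.
    static List<String> allGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> groups = new ArrayList<>();
        if (m.matches()) {
            // groupCount() excludes group zero, hence the inclusive upper bound.
            for (int g = 0; g <= m.groupCount(); g++) {
                groups.add(m.group(g));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(allGroups("(\\d+)-(\\d+)", "10-20")); // [10-20, 10, 20]
    }
}
```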
Specified by: groupCount in interface MatchResult
public boolean matches()
Attempts to match the entire region against the pattern.
If the match succeeds then more information can be obtained via the start, end, and group methods.
public boolean find()
This method starts at the beginning of this matcher's region, or, if a previous invocation of the method was successful and the matcher has not since been reset, at the first character not matched by the previous match.
If the match succeeds then more information can be obtained via the start, end, and group methods.
public boolean find(int start)
If the match succeeds then more information can be obtained via the
start, end, and group methods, and subsequent
invocations of the find() method will start at the first character
not matched by this match.
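How find(int) combines with later find() calls can be sketched as follows, assuming the standard java.util.regex classes; FindFromDemo and startsFrom are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindFromDemo {
    // Collects match start indices, beginning the search at index 'from'.
    static List<Integer> startsFrom(String regex, String input, int from) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<Integer> starts = new ArrayList<>();
        if (m.find(from)) {          // resets the matcher, then searches from 'from'
            starts.add(m.start());
            while (m.find()) {       // continues after the previous match
                starts.add(m.start());
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // The two 'o' characters at indices 1 and 2 lie before the start index and are skipped.
        System.out.println(startsFrom("o", "foo boo", 3)); // [5, 6]
    }
}
```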
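Parameters: start - the index to start searching for a match
Throws: IndexOutOfBoundsException - If start is less than zero or if start is greater than the length of the input sequence.
public boolean lookingAt()
Attempts to match the input sequence, starting at the beginning of the region, against the pattern.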
Like the matches method, this method always starts at the
beginning of the region; unlike that method, it does not require that the
entire region be matched.
If the match succeeds then more information can be obtained via the start, end, and group methods.
public static String quoteReplacement(String s)
Returns a literal replacement String for the specified String.
This method produces a String that will work as a literal
replacement s in the appendReplacement method of
the Matcher class. The String produced will match the
sequence of characters in s treated as a literal sequence.
Backslashes ('\') and dollar signs ('$') will be given no special meaning.
Parameters: s - The string to be literalized
public Matcher appendReplacement(StringBuffer sb, String replacement)
Implements a non-terminal append-and-replace step.
This method performs the following actions:
1. It reads characters from the input sequence, starting at the append position, and appends them to the given string buffer. It stops after reading the last character preceding the previous match, that is, the character at index start() - 1.
2. It appends the given replacement string to the string buffer.
3. It sets the append position of this matcher to the index of the last character matched, plus one, that is, to end().
The replacement string may contain references to subsequences captured during
the previous match: Each occurrence of ${name} or
$g will be replaced by the result of evaluating the
corresponding group(name) or group(g) respectively. For $g, the first number after the
$ is always treated as part of the group reference. Subsequent
numbers are incorporated into g if they would form a legal group reference.
Only the numerals '0' through '9' are considered as potential components of
the group reference. If the second group matched the string "foo",
for example, then passing the replacement string "$2bar" would cause
"foobar" to be appended to the string buffer. A dollar sign
($) may be included as a literal in the replacement string by
preceding it with a backslash (\$).
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
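The effect of dollar signs and backslashes in a replacement string can be sketched as below. The example assumes the standard java.util.regex classes and uses replaceAll(String), which is documented to interpret the replacement string in the same way as appendReplacement; ReplacementEscapesDemo and replace are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementEscapesDemo {
    static String replace(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // $2 refers to the second group; "bar" is appended literally.
        System.out.println(replace("(\\w+)@(\\w+)", "user@host", "$2bar")); // hostbar
        // \$ is a literal dollar sign, followed by a reference to group 1.
        System.out.println(replace("(\\d+)", "price: 5", "\\$$1"));         // price: $5
        // quoteReplacement makes the whole replacement literal.
        System.out.println(replace("(\\d+)", "price: 5",
                Matcher.quoteReplacement("$1")));                            // price: $1
    }
}
```

Quoting is the safer default whenever the replacement text comes from user input or another untrusted source.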
This method is intended to be used in a loop together with the
appendTail and find methods. The following
code, for example, writes one dog two dogs in the
yard to the standard-output stream:
Pattern p = Pattern.compile("cat");
Matcher m = p.matcher("one cat two cats in the yard");
StringBuffer sb = new StringBuffer();
while (m.find()) {
    m.appendReplacement(sb, "dog");
}
m.appendTail(sb);
System.out.println(sb.toString());
Parameters: sb - The target string buffer
Parameters: replacement - The replacement string
Throws: IllegalStateException - If no match has yet been attempted, or if the previous match operation failed
Throws: IllegalArgumentException - If the replacement string refers to a named-capturing group that does not exist in the pattern
Throws: IndexOutOfBoundsException - If the replacement string refers to a capturing group that does not exist in the pattern
public Matcher appendReplacement(StringBuffer sb, java.util.function.Function<Matcher,String> evaluator)
public Matcher appendReplacement(StringBuffer sb, CaptureReplacer replacer)
Throws: IllegalStateException - If no match has yet been attempted, or if the previous match operation failed, or if the CAPTURE_TREE matching mode hasn't been set with setMode(int)
See Also: replaceAll(CaptureReplacer)
public StringBuffer appendTail(StringBuffer sb)
Implements a terminal append-and-replace step.
This method reads characters from the input sequence, starting at the append
position, and appends them to the given string buffer. It is intended to be
invoked after one or more invocations of the appendReplacement method in order to copy the remainder of the input
sequence.
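Parameters: sb - The target string buffer
public String replaceAll(String replacement)
Replaces every subsequence of the input sequence that matches the pattern with the given replacement string.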
This method first resets this matcher. It then scans the input sequence
looking for matches of the pattern. Characters that are not part of any match
are appended directly to the result string; each match is replaced in the
result by the replacement string. The replacement string may contain
references to captured subsequences as in the appendReplacement method.
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
Given the regular expression a*b, the input "aabfooaabfooabfoob", and the replacement string "-", an invocation of this method on a matcher for that expression would yield the string "-foo-foo-foo-".
Invoking this method changes this matcher's state. If the matcher is to be used in further matching operations then it should first be reset.
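The a*b example above, together with the advice to reset the matcher afterwards, can be reproduced with a short sketch. It assumes the standard java.util.regex classes; ReplaceAllDemo and demo are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceAllDemo {
    static String demo() {
        Matcher m = Pattern.compile("a*b").matcher("aabfooaabfooabfoob");
        String replaced = m.replaceAll("-");
        // replaceAll scanned to the end of the input, so nothing is left to find.
        boolean findBeforeReset = m.find();
        // After an explicit reset the matcher starts from the beginning again.
        boolean findAfterReset = m.reset().find();
        return replaced + " " + findBeforeReset + " " + findAfterReset;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // -foo-foo-foo- false true
    }
}
```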
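Parameters: replacement - The replacement string
public String replaceAll(java.util.function.Function<Matcher,String> evaluator)
Replaces every subsequence of the input sequence that matches the pattern with the replacement string computed with the given Match Evaluator.
Parameters: evaluator - The Match Evaluator to be used to compute the replacement string
See Also: replaceAll(String)
public String replaceAll(CaptureReplacer replacer)
Replaces every subsequence of the input sequence that matches the pattern with the replacement string computed with the given CaptureReplacer.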
E.g.
Pattern pattern = Pattern.compile("(?x)" + "(?(DEFINE)" + "(?<sum> (?'summand')(?:\\+(?'summand'))+ )"
+ "(?<summand> (?'product') | (?'number') )" + "(?<product> (?'factor')(?:\\*(?'factor'))+ )"
+ "(?<factor>(?'number') )" + "(?<number>\\d++)" + ")" + "(?'sum')");
Matcher matcher = pattern.matcher("First: 6+7*8 Second: 6*8+7");
String replacement = matcher.replaceAll(new DefaultCaptureReplacer() {
    @Override
    public String replace(CaptureTreeNode node) {
        if ("sum".equals(node.getGroupName())) {
            return "sum(" + node.getChildren().stream().filter(n -> "summand".equals(n.getGroupName()))
                    .map(n -> replace(n)).collect(Collectors.joining(",")) + ")";
        } else if ("product".equals(node.getGroupName())) {
            return "product(" + node.getChildren().stream().filter(n -> "factor".equals(n.getGroupName()))
                    .map(n -> replace(n)).collect(Collectors.joining(",")) + ")";
        } else
            return super.replace(node);
    }
});
System.out.println(replacement);
prints out
First: sum(6,product(7,8)) Second: sum(product(6,8),7)
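Parameters: replacer - The CaptureReplacer to be used to compute the replacement string
See Also: replaceAll(String), DefaultCaptureReplacer
public String replaceFirst(String replacement)
Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string.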
This method first resets this matcher. It then scans the input sequence
looking for a match of the pattern. Characters that are not part of the match
are appended directly to the result string; the match is replaced in the
result by the replacement string. The replacement string may contain
references to captured subsequences as in the appendReplacement method.
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
Given the regular expression dog, the input "zzzdogzzzdogzzz", and the replacement string "cat", an invocation of this method on a matcher for that expression would yield the string "zzzcatzzzdogzzz".
Invoking this method changes this matcher's state. If the matcher is to be used in further matching operations then it should first be reset.
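Parameters: replacement - The replacement string
public String replaceFirst(java.util.function.Function<Matcher,String> evaluator)
public String replaceFirst(CaptureReplacer replacer)
Replaces the first subsequence of the input sequence that matches the pattern with the replacement string computed with the given CaptureReplacer.
Parameters: replacer - The CaptureReplacer to be used to compute the replacement string
See Also: replaceAll(CaptureReplacer)
public Matcher region(int start, int end)
Sets the limits of this matcher's region. The region starts at the index specified by the start parameter and ends at the index specified by the end parameter.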
Depending on the transparency and anchoring being used (see
useTransparentBounds and
useAnchoringBounds), certain constructs such as
anchors may behave differently at or around the boundaries of the region.
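Parameters: start - The index to start searching at (inclusive)
Parameters: end - The index to end searching at (exclusive)
Throws: IndexOutOfBoundsException - If start or end is less than zero, if start is greater than the length of the input sequence, if end is greater than the length of the input sequence, or if start is greater than end.
public int regionStart()
Reports the start index of this matcher's region. Searches are limited to finding matches within regionStart (inclusive) and regionEnd (exclusive).
public int regionEnd()
Reports the end index (exclusive) of this matcher's region. Searches are limited to finding matches within regionStart (inclusive) and regionEnd (exclusive).
public boolean hasTransparentBounds()
Queries the transparency of region bounds for this matcher.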
This method returns true if this matcher uses transparent bounds, false if it uses opaque bounds.
See useTransparentBounds for a description of
transparent and opaque bounds.
By default, a matcher uses opaque region boundaries.
See Also: Matcher.useTransparentBounds(boolean)
public Matcher useTransparentBounds(boolean b)
Sets the transparency of region bounds for this matcher.
Invoking this method with an argument of true will set this matcher to use transparent bounds. If the boolean argument is false, then opaque bounds will be used.
Using transparent bounds, the boundaries of this matcher's region are transparent to lookahead, lookbehind, and boundary matching constructs. Those constructs can see beyond the boundaries of the region to see if a match is appropriate.
Using opaque bounds, the boundaries of this matcher's region are opaque to lookahead, lookbehind, and boundary matching constructs that may try to see beyond them. Those constructs cannot look past the boundaries so they will fail to match anything outside of the region.
By default, a matcher uses opaque bounds.
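A lookbehind that needs text just before the region shows the difference between opaque and transparent bounds. The sketch assumes the standard java.util.regex classes; TransparentBoundsDemo and findInRegion are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TransparentBoundsDemo {
    // Searches for "bar" preceded by "foo", with the region limited to "bar".
    static boolean findInRegion(boolean transparent) {
        Matcher m = Pattern.compile("(?<=foo)bar").matcher("foobar");
        m.region(3, 6);                      // region covers only "bar"
        m.useTransparentBounds(transparent); // may the lookbehind see "foo"?
        return m.find();
    }

    public static void main(String[] args) {
        System.out.println(findInRegion(false)); // false: "foo" is hidden behind the region start
        System.out.println(findInRegion(true));  // true: the lookbehind can see "foo"
    }
}
```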
Parameters: b - a boolean indicating whether to use opaque or transparent regions
See Also: Matcher.hasTransparentBounds()
public boolean hasAnchoringBounds()
Queries the anchoring of region bounds for this matcher.
This method returns true if this matcher uses anchoring bounds, false otherwise.
See useAnchoringBounds for a description of
anchoring bounds.
By default, a matcher uses anchoring region boundaries.
See Also: Matcher.useAnchoringBounds(boolean)
public Matcher useAnchoringBounds(boolean b)
Sets the anchoring of region bounds for this matcher.
Invoking this method with an argument of true will set this matcher to use anchoring bounds. If the boolean argument is false, then non-anchoring bounds will be used.
Using anchoring bounds, the boundaries of this matcher's region match anchors such as ^ and $.
Without anchoring bounds, the boundaries of this matcher's region will not match anchors such as ^ and $.
By default, a matcher uses anchoring region boundaries.
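Whether ^ matches at the start of a region depends on this setting, as the following sketch suggests. It assumes the standard java.util.regex classes; AnchoringBoundsDemo and caretMatchesAtRegionStart are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoringBoundsDemo {
    // Tests whether ^ matches at the region start (index 3) of "foobar".
    static boolean caretMatchesAtRegionStart(boolean anchoring) {
        Matcher m = Pattern.compile("^bar").matcher("foobar");
        m.region(3, 6);
        m.useAnchoringBounds(anchoring);
        return m.find();
    }

    public static void main(String[] args) {
        System.out.println(caretMatchesAtRegionStart(true));  // true: region start counts as input start
        System.out.println(caretMatchesAtRegionStart(false)); // false: ^ only matches at index 0
    }
}
```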
Parameters: b - a boolean indicating whether or not to use anchoring bounds.
See Also: Matcher.hasAnchoringBounds()
public String toString()
Returns the string representation of this matcher. The string representation
of a Matcher contains information that may be useful for
debugging. The exact format is unspecified.
public boolean hitEnd()
Returns true if the end of input was hit by the search engine in the last match operation performed by this matcher.
When this method returns true, then it is possible that more input would have changed the result of the last search.
public boolean requireEnd()
Returns true if more input could change a positive match into a negative one.
If this method returns true, and a match was found, then more input could cause the match to be lost. If this method returns false and a match was found, then more input might change the match but the match won't be lost. If a match was not found, then requireEnd has no meaning.
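The two end-of-input flags are mostly useful when input arrives in chunks. A minimal sketch of how they behave for successful matches, assuming the standard java.util.regex classes; EndStateDemo and endState are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EndStateDemo {
    // Reports the find() result together with hitEnd() and requireEnd().
    static String endState(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean found = m.find();
        return "found=" + found + ",hitEnd=" + m.hitEnd() + ",requireEnd=" + m.requireEnd();
    }

    public static void main(String[] args) {
        // The match ends before the input does: more input cannot affect it.
        System.out.println(endState("a+", "aaab")); // found=true,hitEnd=false,requireEnd=false
        // The match runs up to the end: more 'a' characters would extend it.
        System.out.println(endState("a+", "baaa")); // found=true,hitEnd=true,requireEnd=false
        // The match depends on $: appending input could make it fail.
        System.out.println(endState("cat$", "cat")); // found=true,hitEnd=true,requireEnd=true
    }
}
```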
Copyright © 2019. All rights reserved.