Section <GRPAR> contains rules to complete the
partial parsing provided by the chart parser. The tree is
completed by combining chunk pairs as stated by the rules. Rules
are applied from highest priority (lower values) to lowest
priority (higher values), and left to right.
That is, the pair of adjacent chunks matching the highest-priority
rule is found, and the rule is applied, joining both chunks into
one. The process is repeated until only one chunk is left.
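The completion loop described above can be sketched roughly as follows. This is only an illustration, not the parser's actual implementation: the Chunk class, the complete function and the tuple layout of the rules are hypothetical names, only the top_left and top_right operations are handled, and flags, contexts and label conditions are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    # Hypothetical minimal tree node: a label plus a list of daughters.
    label: str
    children: list = field(default_factory=list)


def complete(chunks, rules):
    """Join adjacent chunks until one is left or no rule matches.

    rules is a list of (priority, left_label, right_label, operation)
    tuples; this layout is an assumption made for the sketch.
    """
    chunks = list(chunks)
    ordered = sorted(rules, key=lambda r: r[0])  # lower number = higher priority
    while len(chunks) > 1:
        for _prio, left, right, op in ordered:
            # Leftmost adjacent pair matching this rule, if any.
            hit = next((i for i in range(len(chunks) - 1)
                        if chunks[i].label == left
                        and chunks[i + 1].label == right), None)
            if hit is None:
                continue
            a, b = chunks[hit], chunks[hit + 1]
            if op == "top_left":
                a.children.append(b)
                merged = a
            else:  # treated as top_right in this sketch
                b.children.insert(0, a)
                merged = b
            chunks[hit:hit + 2] = [merged]
            break
        else:
            break  # no rule matched any pair: stop to avoid looping forever
    return chunks
```

Note that the sketch restarts from the highest-priority rule after every join, which is one reading of the description above.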
The rules can be enabled/disabled via the activation of global flags. Each rule may be stated to be enabled only if certain flags are on. If none of its enabling flags are on, the rule is not applied. Each rule may also state which flags have to be toggled on/off after its application, thus enabling/disabling other rule subsets.
Each line contains a rule, with the format:
priority flags context (ancestor,descendant) operation op-params flag-ops
where:
priority is a number stating the priority of a rule
(the lower the number, the higher the priority).
flags is a list of strings separated by vertical bars
(``|''). Each string is the name of a flag that will
cause the rule to be enabled. If the flags field equals
``-'', the rule will always be enabled.
context is a context limiting the application of the
rule only to chunk pairs that are surrounded by the appropriate
context (``-'' means no limitations, and the rule is applied to
any matching chunk pair) (see below).
(ancestor,descendant) are the labels of the adjacent pair of
chunks the rule will be applied to. The labels are either assigned by
the chunk parser, or by a RELABEL operation in some other completion rule.
The pair must be enclosed in parentheses, separated by a comma, and
contain NO whitespace.
The chunk labels may be suffixed with one extra condition of the form:
(form), <lemma>, [class], or {PoS_regex}.
For instance,
| The label: | Would match: |
|---|---|
| np | any chunk labeled np by the chunker |
| np(cats) | any chunk labeled np by the chunker with a head word with form cats |
| np<cat> | any chunk labeled np by the chunker with a head word with lemma cat |
| np[animal] | any chunk labeled np by the chunker with a head word with a lemma in the animal category (see CLASS section below) |
| np{^N.M} | any chunk labeled np by the chunker with a head word with a PoS tag matching the ^N.M regular expression |
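To make the label conditions concrete, the following Python sketch splits a label specification into its parts and tests it against a chunk head. It is an illustration under stated assumptions, not the parser's code: parse_label, label_matches and the classes dictionary are hypothetical names, form and lemma comparison is assumed to be exact and case-sensitive, and the PoS regular expression is assumed to be searched (not fully matched) against the tag.

```python
import re

# label, optionally followed by ONE of: (form) <lemma> [class] {PoS_regex}
LABEL_RE = re.compile(
    r"^(?P<label>[^(<\[{]+)"
    r"(?:\((?P<form>.+)\)|<(?P<lemma>.+)>|\[(?P<cls>.+)\]|\{(?P<pos>.+)\})?$"
)


def parse_label(spec):
    """Return (label, condition_kind, condition_value)."""
    m = LABEL_RE.match(spec)
    if m is None:
        raise ValueError(f"malformed label: {spec!r}")
    for kind in ("form", "lemma", "cls", "pos"):
        if m.group(kind) is not None:
            return m.group("label"), kind, m.group(kind)
    return m.group("label"), None, None


def label_matches(spec, label, form, lemma, pos, classes):
    """classes maps a class name to a set of lemmas (hypothetical layout)."""
    want, kind, value = parse_label(spec)
    if want != label:
        return False
    if kind is None:
        return True
    if kind == "form":
        return form == value
    if kind == "lemma":
        return lemma == value
    if kind == "cls":
        return lemma in classes.get(value, set())
    return re.search(value, pos) is not None  # kind == "pos"
```

For example, with a head word whose form is cats, lemma is cat and PoS tag is NCMP000, the specification np{^N.M} matches an np chunk because the tag starts with N, any character, then M.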
operation is the way in which ancestor
and descendant nodes are to be combined (see below).
op-params component has two meanings, depending
on the operation field: top_left and
top_right operations must be followed by the literal
RELABEL plus the new label(s) to assign to the chunks.
Other operations must be followed by the literal MATCHING
plus the label to be matched.
For top_left and top_right operations the labels
following the keyword RELABEL state the labels with which
each chunk in the pair will be relabelled, in the format
label1:label2. If specified, label1 will be the
new label for the left chunk, and label2 the one for the
right chunk. A dash (``-'') means no relabelling. If neither
chunk is to be relabelled, ``-'' may be used instead of
``-:-''.
For example, the rule:
20 - - (np,pp<of>) top_left RELABEL np-of:- -
will hang the pp chunk as a daughter of the left chunk
in the pair (i.e. np), then
relabel the np to np-of, and leave the label for
the pp unchanged.
For last_left, last_right and
cover_last_left operations, the label following the keyword
MATCHING states the label that a node must have in
order to be considered a valid ``last'' and get the subtree as
a new child. This label may carry the same modifying suffixes
as the chunk labels. If no node with this label is found in
the tree, the rule is not applied.
For example, the rule:
20 - - (vp,pp<of>) last_left MATCHING np -
will hang the pp chunk as a daughter of the last subtree
labeled np found inside the vp chunk.
flag-ops is a space-separated list of
flags to be toggled on/off. The list may be empty (meaning that
the rule doesn't change the status of any flag). If a flag name
is preceded by a ``+'', it will be toggled on. If the leading
symbol is a ``-'', it will be toggled off.
For instance, the rule:
20 - - (np,pp<of>) top_left RELABEL - -
states that if two subtrees labelled np and pp are
found contiguous in the partial tree, and the head word of the second has
lemma of, then the latter (rightmost) is added as a new child
of the former (leftmost), whatever the context is, without requiring
any special flag to be active, and performing no relabelling of the new
tree root.
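As a rough illustration of the rule format, the sketch below splits one rule line into its fields. The Rule class and parse_rule function are hypothetical names introduced here. Treating a bare ``-'' in the flag-ops position as "no flag operations" is an assumption inferred from the examples in this section, not something the format description states.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    priority: int
    flags: list       # enabling flags; empty list means always enabled
    context: str
    left: str         # label spec of the left chunk in the pair
    right: str        # label spec of the right chunk in the pair
    operation: str
    keyword: str      # RELABEL or MATCHING
    param: str
    flags_on: list
    flags_off: list


def parse_rule(line):
    fields = line.split()
    if len(fields) < 7:
        raise ValueError(f"too few fields: {line!r}")
    priority, flags, context, pair, operation, keyword, param = fields[:7]
    if not (pair.startswith("(") and pair.endswith(")")):
        raise ValueError(f"pair must be parenthesized: {pair!r}")
    left, sep, right = pair[1:-1].partition(",")
    if not sep:
        raise ValueError(f"pair must contain a comma: {pair!r}")
    if keyword not in ("RELABEL", "MATCHING"):
        raise ValueError(f"unknown keyword: {keyword!r}")
    flags_on, flags_off = [], []
    for tok in fields[7:]:
        if tok == "-":            # assumed placeholder: no flag operation
            continue
        if tok.startswith("+"):
            flags_on.append(tok[1:])
        elif tok.startswith("-"):
            flags_off.append(tok[1:])
        else:
            raise ValueError(f"flag operation needs + or -: {tok!r}")
    return Rule(int(priority), [] if flags == "-" else flags.split("|"),
                context, left, right, operation, keyword, param,
                flags_on, flags_off)
```

Splitting on whitespace works because the pair field is required to contain no whitespace.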
The supported tree-building operations are the following:
top_left: The right subtree is added as a daughter of
the left subtree. The root of the new tree is the root of the
left subtree. If a label value other than ``-'' is
specified, the root is relabelled with that string.
last_left: The right subtree is added as a daughter of
the last node inside the left subtree matching label value
(or to the root if none is found). The root of the new tree is
the root of the left subtree.
top_right: The left subtree is added as a new daughter
of the right subtree. The root of the new tree is the root of the
right subtree. If a label value other than ``-'' is
specified, the root is relabelled with that string.
last_right: The left subtree is added as a daughter of
the last node inside the right subtree matching label
value (or to the root if none is found). The root of the new tree
is the root of the right subtree.
cover_last_left: The left subtree (label value. The node
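The four operations that are fully described above can be sketched on a toy tree as follows. This is an illustration only: Node and the function names mirror the operation names but are hypothetical, ``last'' is assumed to mean the last matching node in a left-to-right pre-order traversal, and labels are compared as plain strings without suffix conditions. When no node matches, the sketch returns None (rule not applied), which is one of the two behaviours this section mentions. cover_last_left is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def preorder(node):
    yield node
    for child in node.children:
        yield from preorder(child)


def find_last(root, label):
    """Last node with the given label in pre-order (assumed meaning of 'last')."""
    last = None
    for node in preorder(root):
        if node.label == label:
            last = node
    return last


def top_left(left, right, new_label="-"):
    left.children.append(right)          # right hangs under the left root
    if new_label != "-":
        left.label = new_label
    return left


def top_right(left, right, new_label="-"):
    right.children.insert(0, left)       # left hangs under the right root
    if new_label != "-":
        right.label = new_label
    return right


def last_left(left, right, label):
    target = find_last(left, label)
    if target is None:
        return None                      # sketch: rule not applied
    target.children.append(right)
    return left


def last_right(left, right, label):
    target = find_last(right, label)
    if target is None:
        return None                      # sketch: rule not applied
    target.children.insert(0, left)
    return right
```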
The context may be specified as a sequence of chunk labels,
separated by underscores ``_''.
One of the chunk labels must be $$, and refers to the pair of chunks
which the rule is being applied to.
For instance, the rule:
20 - $$_vp (np,pp<of>) top_left RELABEL -
would add the rightmost chunk in the pair (pp<of>) under the
leftmost (np) only if the chunk immediately to the right of the pair
is labeled vp.
Other admitted labels in the context are: ? (matching exactly
one chunk, with any label), * (matching zero or more chunks
with any label), and OUT (matching a sentence boundary).
For instance the context np_$$_*_vp_?_OUT would match a
sentence in which the focus pair of chunks is immediately after an
np, and the second-to-last chunk is labeled vp.
Context conditions can be globally negated by preceding them with an exclamation
mark (!). E.g. !np_$$_*_vp would cause the rule to be applied only
if that particular context is not satisfied.
Context condition components may also be individually negated
by preceding them with the symbol ~. E.g. the context
np_$$_~vp would be satisfied if the preceding chunk is
labeled np and the following chunk has any label but
vp.
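The context conditions can be illustrated with a small matcher. This is a sketch under assumptions, not the parser's algorithm: context_matches and its helpers are hypothetical names, the sentence is reduced to a list of plain chunk labels with the focus pair at positions i and i+1, and a negated component such as ~vp is assumed to require an actual chunk (so it does not match a sentence boundary).

```python
def _token_ok(tok, item):
    """Does one context component match one item (a label or 'OUT')?"""
    if tok.startswith("~"):
        # Assumption: a negated component still needs a real chunk.
        return item != "OUT" and not _token_ok(tok[1:], item)
    if tok == "?":
        return item != "OUT"             # exactly one chunk, any label
    return tok == item                   # plain label, or OUT boundary


def _prefix_match(tokens, items):
    """True if tokens match a prefix of items; '*' spans zero or more items."""
    if not tokens:
        return True
    tok, rest = tokens[0], tokens[1:]
    if tok == "*":
        return any(_prefix_match(rest, items[k:])
                   for k in range(len(items) + 1))
    if not items or not _token_ok(tok, items[0]):
        return False
    return _prefix_match(rest, items[1:])


def context_matches(context, labels, i):
    """labels: chunk labels of the sentence; the pair is labels[i], labels[i+1]."""
    negated = context.startswith("!")
    if negated:
        context = context[1:]
    if context == "-":
        return True
    tokens = context.split("_")
    pivot = tokens.index("$$")
    left = ["OUT"] + list(labels[:i])          # boundary before the sentence
    right = list(labels[i + 2:]) + ["OUT"]     # boundary after the sentence
    # The left side is matched outwards from the pair, hence the reversal.
    ok = (_prefix_match(tokens[:pivot][::-1], left[::-1])
          and _prefix_match(tokens[pivot + 1:], right))
    return ok != negated
```

Both sides are anchored at the pair, so np_$$ only constrains the chunk immediately before it, and the far end of the sentence is constrained only when OUT is given.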
Enabling flags may be defined and used at the grammarian's will. For instance, the rule:
20 INIT|PH1 $$_vp (np,pp<of>) last_left MATCHING npms[animal] +PH2 -INIT -PH1
will be applied if either of the INIT or PH1 flags is
on, the chunk pair is an np followed by a pp with head
lemma of, and the context (one vp chunk following the
pair) is met. Then, the deepest rightmost node matching the label
npms[animal] will be sought in the left chunk, and the right
chunk will be linked as one of its children. If no such node is found,
the rule will not be applied.
After applying the rule, the flag PH2 will be toggled
on, and the flags INIT and PH1 will be toggled
off.
The only predefined flag is INIT, which is toggled on when
the parsing starts. The grammarian can define any alphanumerical
string as a flag, simply toggling it on in some rule.
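The flag mechanism can be sketched as a set of active flag names. The helper names is_enabled and apply_flag_ops are hypothetical, and a bare ``-'' in the flag operations is assumed to be a placeholder meaning "no change".

```python
def is_enabled(rule_flags, active):
    """rule_flags is the flags field of a rule, e.g. 'INIT|PH1' or '-'."""
    if rule_flags == "-":
        return True                       # always enabled
    return any(name in active for name in rule_flags.split("|"))


def apply_flag_ops(flag_ops, active):
    """flag_ops is the space-separated flag-ops field; active is a set."""
    for op in flag_ops.split():
        if op == "-":                     # assumed placeholder: no operation
            continue
        name = op[1:]
        if op.startswith("+"):
            active.add(name)              # toggle on (defines the flag if new)
        elif op.startswith("-"):
            active.discard(name)          # toggle off
        else:
            raise ValueError(f"flag operation needs + or -: {op!r}")
    return active


# INIT is the only flag that is on when parsing starts.
active = {"INIT"}
if is_enabled("INIT|PH1", active):
    apply_flag_ops("+PH2 -INIT -PH1", active)
```

After this run only PH2 is active, so rules enabled by INIT or PH1 are skipped from then on, while rules whose flags field is ``-'' remain enabled.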
Lluís Padró 2013-09-09