This chapter explains how to parse sentences with a grammar developed with the MAYZ toolkit.
The MAYZ toolkit supports the development of a lexicon and templates, but parsing sentences with a grammar developed with MAYZ also requires a parser. The MAYZ package includes "UP", an efficient general-purpose parser for unification-based grammars. By implementing several interfaces required by UP, you can parse sentences with your grammar.
To use UP, you must implement interfaces for accessing the grammar and the probabilistic models. These interfaces are defined in "mayz/parser.lil".
The following interfaces are the minimum that UP requires for parsing. Grammar writers need to implement all of them. The only exception is 'id_schema_unary/4', which can be omitted when the grammar has no unary rules.

| sentence_to_word_lattice(+$Input, -$WordLattice) | |
|---|---|
| $Input | input sentence |
| $WordLattice | word lattice (a list of extents) |
| Splits the input sentence $Input into words and returns the word lattice $WordLattice. | |

| lexical_entry(+$Word, -$LexName) | |
|---|---|
| $Word | input word |
| $LexName | name of a lexical entry |
| Returns the name of a lexical entry assigned to $Word. A word can have more than one $LexName. | |

| lexical_entry_sign(+$LexName, -$Sign) | |
|---|---|
| $LexName | name of a lexical entry |
| $Sign | sign of a lexical entry |
| Returns the sign of a lexical entry. Exactly one sign must correspond to each $LexName. | |

| id_schema_unary(+$SchemaName, +$Dtr, -$Mother, -$DCP) | |
|---|---|
| $SchemaName | schema name |
| $Dtr | sign of the daughter |
| $Mother | sign of the mother |
| $DCP | LiLFeS program executed after schema application |
| Applies a unary schema. If your grammar does not require unary rules, this need not be implemented. | |

| id_schema_binary(+$SchemaName, +$Left, +$Right, -$Mother, -$DCP) | |
|---|---|
| $SchemaName | schema name |
| $Left | sign of the left daughter |
| $Right | sign of the right daughter |
| $Mother | sign of the mother |
| $DCP | LiLFeS program executed after schema application |
| Applies a binary schema. | |

| root_sign($Sign) | |
|---|---|
| $Sign | sign of the root node |
| Specifies the conditions that the sign of a root node must satisfy. | |

| reduce_sign(+$InSign, -$OutSign, -$SignPlus) | |
|---|---|
| $InSign | sign of the mother produced by schema application |
| $OutSign | reduced sign |
| $SignPlus | information removed from $InSign to obtain $OutSign |
| This predicate is applied to the mother sign after a schema application succeeds. In the subsequent parsing steps, $OutSign is used instead of $InSign. By removing unnecessary information from $InSign (e.g., daughter structures), mothers whose $OutSigns are equivalent are factored and treated as a single sign from then on. $SignPlus can hold the information removed from the sign; it is stored in SIGN_PLUS of 'edge_link'. | |

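The factoring performed through 'reduce_sign/3' can be pictured with a small language-neutral model. The Python sketch below is neither LiLFeS code nor UP's implementation. For illustration only, it assumes that a sign is a dictionary, that daughter structures live under a hypothetical key `DTRS`, and that edges are grouped by their reduced sign. Two mothers that differ only in their daughters reduce to the same $OutSign and end up in one entry, and the removed parts are kept as $SignPlus values.

```python
import json

# Hypothetical name of the feature that holds daughter structures.
DTRS = "DTRS"


def reduce_sign(in_sign):
    """Model of reduce_sign/3: return (out_sign, sign_plus).

    out_sign is in_sign without its daughter structures; sign_plus keeps
    the removed information so that it is not lost."""
    out_sign = {k: v for k, v in in_sign.items() if k != DTRS}
    sign_plus = {DTRS: in_sign.get(DTRS)}
    return out_sign, sign_plus


def sign_key(sign):
    # Canonical, hashable form: equal signs produce equal keys.
    return json.dumps(sign, sort_keys=True)


def pack(mothers):
    """Group mother signs whose reduced forms are equal.

    Returns {key: (out_sign, [sign_plus, ...])}: one entry per distinct
    reduced sign, each keeping every SignPlus that was removed from it."""
    chart = {}
    for mother in mothers:
        out_sign, sign_plus = reduce_sign(mother)
        entry = chart.setdefault(sign_key(out_sign), (out_sign, []))
        entry[1].append(sign_plus)
    return chart


# Two analyses of the same span: same category, different daughters.
m1 = {"HEAD": "verb", "SUBCAT": [], DTRS: ["NP", "VP"]}
m2 = {"HEAD": "verb", "SUBCAT": [], DTRS: ["NP", "VP+PP"]}
chart = pack([m1, m2])
print(len(chart))  # 1: both mothers are packed into one entry
```

Packing happens because the daughters are dropped. If `reduce_sign` kept them, every distinct derivation would get its own entry and no factoring would occur.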
"mayz/sample_hpsg.lil" is an example HPSG grammar that includes a sample implementation of the above interfaces.
Because the above interfaces give no access to probabilistic models, the parser cannot perform disambiguation with them alone. If your grammar implements only these interfaces, run UP with the option "-nofom". For example, to use "mayz/sample_hpsg.lil", run the following command:
% up -i -nofom -l mayz/sample_hpsg
When you need disambiguation, the following interfaces must also be implemented. Once they are implemented, UP computes figures of merit (FOMs) during parsing, and the best analysis can be obtained with 'best_fom_sign/2' and similar predicates. Since FOMs are summed, log probabilities should be used when you apply probabilistic models.

| fom_root(+$Sign, -$FOM) | |
|---|---|
| $Sign | sign of the root node |
| $FOM | FOM of the root node |
| Returns the FOM of the root node. | |

| fom_binary(+$RuleName, +$LeftDtr, +$RightDtr, +$MotherSign, +$SignPlus, -$FOM) | |
|---|---|
| $RuleName | schema name |
| $LeftDtr | sign of the left daughter |
| $RightDtr | sign of the right daughter |
| $MotherSign | sign of the mother |
| $SignPlus | third argument of 'reduce_sign/3' |
| $FOM | FOM |
| Returns the FOM of a binary schema application. | |

| fom_unary(+$RuleName, +$Dtr, +$MotherSign, +$SignPlus, -$FOM) | |
|---|---|
| $RuleName | schema name |
| $Dtr | sign of the daughter |
| $MotherSign | sign of the mother |
| $SignPlus | third argument of 'reduce_sign/3' |
| $FOM | FOM |
| Returns the FOM of a unary schema application. | |

| fom_terminal(+$LexName, +$Sign, +$SignPlus, -$FOM) | |
|---|---|
| $LexName | LEX_NAME (the second argument of 'lexical_entry/2') |
| $Sign | sign of a lexical entry |
| $SignPlus | third argument of 'reduce_sign/3' |
| $FOM | FOM |
| Returns the FOM of a terminal sign. | |

| fom_lexical_entry(+$Word, +$LexName, -$FOM) | |
|---|---|
| $Word | word |
| $LexName | LEX_NAME (the second argument of 'lexical_entry/2') |
| $FOM | FOM |
| Returns the FOM of a lexical entry. | |

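Log probabilities are needed because UP sums FOMs. A sum of log probabilities is the log of the product of the probabilities, so a summed FOM behaves like the log probability of the whole analysis. The Python sketch below models this under assumptions and is not UP's code. It assumes, hypothetically, that a terminal's FOM is the value from 'fom_lexical_entry/3' plus the value from 'fom_terminal/4', that an inner edge's FOM is the sum of its daughters' FOMs plus the value from 'fom_unary/5' or 'fom_binary/6', and that 'fom_root/2' is added at the top. The helper names and probabilities are made up.

```python
import math

# Hypothetical composition of FOMs; every argument is a probability,
# and every returned value is a log probability (an FOM).


def leaf(lex_entry_prob, terminal_prob):
    # Terminal edge: FOM from fom_lexical_entry plus FOM from fom_terminal.
    return math.log(lex_entry_prob) + math.log(terminal_prob)


def unary(dtr_fom, schema_prob):
    # Daughter FOM plus the FOM of the unary schema application (fom_unary).
    return dtr_fom + math.log(schema_prob)


def binary(left_fom, right_fom, schema_prob):
    # Both daughters' FOMs plus the FOM of the binary schema application (fom_binary).
    return left_fom + right_fom + math.log(schema_prob)


def root(fom, root_prob):
    # Whole analysis: add the FOM of the root node (fom_root).
    return fom + math.log(root_prob)


# Two words; the right one goes through a unary schema before combining.
left = leaf(0.5, 0.9)
right = unary(leaf(0.4, 0.8), 0.95)
total = root(binary(left, right, 0.7), 0.6)

# The summed FOM equals the log of the product of all probabilities.
product = 0.5 * 0.9 * 0.4 * 0.8 * 0.95 * 0.7 * 0.6
print(math.isclose(total, math.log(product)))  # True
```

Summing raw probabilities instead would not correspond to the probability of an analysis. This is why the models should supply log probabilities.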
When your grammar implements these interfaces as well, run UP with the option "-fom" or "-iter". For example, if the grammar file is "mygrammar.lil", execute the following command:
% up -i -iter -l mygrammar
See the UP manual for its other functions.
MAYZ itself provides only functions for retrieving a lexicon and templates from a database. Grammar developers are expected to implement the UP interfaces themselves. For details, see "How to use UP".
MAYZ provides the following tools for accessing the databases of a lexicon and templates. They are implemented in "mayz/grammar.lil". MAYZ also provides a tool for employing an external tagger.

| import_lexicon($LexFile, $TemplateFile) | |
|---|---|
| $LexFile | file name of a lexicon |
| $TemplateFile | file name of a template database |
| Imports a lexicon and a template database. | |

| lookup_lexicon(+$Word, -$TempNameList) | |
|---|---|
| $Word | a feature structure representing a "word" |
| $TempNameList | a list of lex_template |
| Looks up the lexicon and returns the list of template names assigned to $Word. | |

| lookup_template(+$TempName, -$Template) | |
|---|---|
| $TempName | lex_template |
| $Template | a feature structure |
| Looks up the template database and returns the feature structure of the lexical entry template $TempName. | |

To use the above tools, you need to implement the following interfaces.

| lexicon_lookup_key(+$Word, -$Key) | |
|---|---|
| $Word | a feature structure representing a "word" |
| $Key | a key for looking up the lexicon |
| Given a feature structure representing a "word" (an element of the list returned by 'sentence_to_word_lattice/2'), this interface returns a key for looking up the lexicon (corresponding to the third argument of 'inverse_lexical_rule/5' and the fourth argument of 'lexical_rule/5'). | |

| unknown_word_lookup_key(+$Word, -$Key) | |
|---|---|
| $Word | a feature structure representing a "word" |
| $Key | a key for looking up the lexicon |
| Given a feature structure representing a "word", this interface returns a key for looking up the lexicon when the word is unknown. | |

'lookup_lexicon/2' and 'lookup_template/2' are the tools to use when building lexical entries in 'lexical_entry/2' and 'lexical_entry_sign/2'.
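The division of labor among these predicates can be modeled as follows. In this Python sketch (not LiLFeS), plain dictionaries stand in for the databases loaded by 'import_lexicon/2'. A word is a dictionary with hypothetical `BASE` and `POS` fields, and the two key functions are hypothetical implementations of 'lexicon_lookup_key/2' and 'unknown_word_lookup_key/2'. Falling back to the unknown-word key when the normal key is not found is an assumption made for the illustration. For simplicity, $LexName is modeled as the bare template name.

```python
# Stand-ins for the databases read by import_lexicon/2 (hypothetical contents).
LEXICON = {
    ("run", "VB"): ["verb_intrans", "verb_trans"],
    ("dog", "NN"): ["noun_count"],
    ("<UNK>", "NN"): ["noun_count"],  # entry reached through the unknown-word key
}
TEMPLATES = {
    "verb_intrans": {"HEAD": "verb", "SUBCAT": ["NP"]},
    "verb_trans": {"HEAD": "verb", "SUBCAT": ["NP", "NP"]},
    "noun_count": {"HEAD": "noun", "SUBCAT": []},
}


def lexicon_lookup_key(word):
    # Hypothetical key: base form and part of speech.
    return (word["BASE"], word["POS"])


def unknown_word_lookup_key(word):
    # Hypothetical key for unknown words: only the part of speech is kept.
    return ("<UNK>", word["POS"])


def lookup_lexicon(word):
    """Model of lookup_lexicon/2: word -> list of template names."""
    names = LEXICON.get(lexicon_lookup_key(word))
    if names is None:  # assumed fallback for words missing from the lexicon
        names = LEXICON.get(unknown_word_lookup_key(word), [])
    return names


def lookup_template(name):
    """Model of lookup_template/2: template name -> feature structure."""
    return TEMPLATES[name]


def lexical_entries(word):
    """Model of lexical_entry/2 with lexical_entry_sign/2: (name, sign) pairs."""
    return [(name, lookup_template(name)) for name in lookup_lexicon(word)]


print([n for n, _ in lexical_entries({"BASE": "run", "POS": "VB"})])
# ['verb_intrans', 'verb_trans']
print([n for n, _ in lexical_entries({"BASE": "cat", "POS": "NN"})])
# ['noun_count']
```

Keeping the key functions separate from the lookup means a grammar can change how words are normalized without touching the database access.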
Probabilistic models developed with unimaker or forestmaker can be used as figure-of-merit (FOM) models in UP. MAYZ provides a parser, mayzup, specialized for probabilistic models developed with MAYZ. It offers built-in predicates that compute FOMs (log probabilities) through the same interfaces that were used to develop the probabilistic models, i.e., extract_XXX_event and 'feature_mask/3'.
The following predicates are provided only in mayzup.

| init_amis_model(+$ModelName, +$ModelFile) | |
|---|---|
| $ModelName | model name |
| $ModelFile | name of the parameter file |
| Initializes a model by reading parameters from $ModelFile, and also incorporates the corresponding feature masks. | |

| delete_amis_model(+$ModelName) | |
|---|---|
| $ModelName | model name |
| Deletes a model created by 'init_amis_model/2'. | |

| amis_event_weight(+$ModelName, +$Category, +$Event, -$FOM) | |
|---|---|
| $ModelName | model name |
| $Category | category name |
| $Event | event (list of strings) |
| $FOM | FOM of the event (log probability) |
| Returns the FOM (log probability) of an event represented as a list of strings. The 'feature_mask/3' definitions for the category $Category are used. | |

| amis_log_probability(+$ModelName, +$Category, +$EventList, -$FOM) | |
|---|---|
| $ModelName | model name |
| $Category | category name |
| $EventList | list of events (a list of lists of strings) |
| $FOM | list of FOMs |
| Computes the weight of each event in $EventList and converts the weights into probabilities by normalizing them. | |

The FOM of an event can be computed with the above built-in predicates. The computed FOMs are passed to the parser through the interfaces introduced in "How to use UP".
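AMIS is a maximum entropy estimator, so the relation between the two built-ins can be sketched with a log-linear model. The Python code below is not mayzup's implementation, and every detail is an assumption for illustration. A feature mask is modeled as a tuple of 0/1 flags that selects which strings of an event a feature keeps, and an event's weight is the sum of the parameters of its masked features. Normalization across competing events is modeled as log-sum-exp, which turns weights into log probabilities. How mayzup scales the value for a single event is not modeled; the sketch leaves it unnormalized. The masks, feature strings, and parameter values are invented.

```python
import math

# Hypothetical feature masks for one category: a 1 keeps the event string
# at that position, a 0 replaces it with None.
MASKS = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]

# Hypothetical parameters (lambda values) for masked features.
PARAMS = {
    ("schema=head_comp", "left=verb", None): 0.8,
    ("schema=head_comp", None, "right=noun"): 1.1,
    ("schema=head_subj", "left=noun", None): 0.3,
}


def masked_features(event):
    # Apply every mask to the event, giving one feature per mask.
    return [tuple(s if keep else None for s, keep in zip(event, mask))
            for mask in MASKS]


def event_weight(event):
    # Sum of the parameters of the features that fire (unseen features weigh 0).
    return sum(PARAMS.get(f, 0.0) for f in masked_features(event))


def log_probabilities(events):
    # Normalize over competing events: log p_i = w_i - log(sum_j exp(w_j)).
    weights = [event_weight(e) for e in events]
    m = max(weights)  # subtract the maximum for numerical stability
    log_z = m + math.log(sum(math.exp(w - m) for w in weights))
    return [w - log_z for w in weights]


events = [
    ["schema=head_comp", "left=verb", "right=noun"],
    ["schema=head_subj", "left=noun", "right=verb"],
]
foms = log_probabilities(events)
print([round(math.exp(f), 3) for f in foms])
```

The values returned by `log_probabilities` are already log probabilities, so they can be summed as FOMs as the interfaces in "How to use UP" expect.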
mayzup is used in almost the same way as up. For example, to use "mygrammar.lil", run the following command:
% mayzup -i -iter -l mygrammar