kgrams 0.1.0
Word dictionary for language models.
#include <Dictionary.h>

Public Member Functions
| | Dictionary () | Default constructor. |
| | Dictionary (const std::vector< std::string > &dict) | Initialize Dictionary from list of words. |
| bool | contains (std::string word) const | Check if a word is contained in the Dictionary. |
| void | insert (std::string word) | Insert a word in the Dictionary. |
| std::string | word (std::string index) const | Return the word corresponding to a given word index. |
| std::string | index (std::string word) const | Return the index corresponding to a given word. |
| size_t | length () const | Return size of the dictionary, excluding the special tokens (BOS, EOS, UNK). |
| size_t | size () const | Return size of the dictionary, excluding the special tokens (BOS, EOS, UNK). |
| std::pair< size_t, std::string > | kgram_code (std::string kgram) const | Extract k-gram code from a string. |
Word dictionary for language models.
This class has two main purposes: (i) to store a list of "known" words to be used within a language model, and (ii) to convert word and k-gram tokens to and from word and k-gram codes (strings of integers). The codes are used in the internal implementation of the kgramFreqs class.
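The two roles described above can be illustrated with a minimal, self-contained sketch. `ToyDictionary` is a hypothetical stand-in, not the kgrams implementation: the token spellings (`<BOS>`, `<EOS>`, `<UNK>`), the reserved codes "0" to "2", and the fallback to UNK for unknown words or codes are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the documented class (not the kgrams source).
class ToyDictionary {
public:
    ToyDictionary() {
        // Assumption: special tokens occupy the first three codes.
        add("<BOS>");  // code "0"
        add("<EOS>");  // code "1"
        add("<UNK>");  // code "2"
    }

    explicit ToyDictionary(const std::vector<std::string>& words) : ToyDictionary() {
        for (const auto& w : words) insert(w);
    }

    // Inserting an already known word is a no-op.
    void insert(const std::string& w) { if (!contains(w)) add(w); }

    bool contains(const std::string& w) const { return code_of_.count(w) > 0; }

    // Word -> code (a string of digits); unknown words map to the UNK code.
    std::string index(const std::string& w) const {
        auto it = code_of_.find(w);
        return it != code_of_.end() ? it->second : code_of_.at("<UNK>");
    }

    // Code -> word; unknown codes map to the UNK token.
    std::string word(const std::string& code) const {
        auto it = word_of_.find(code);
        return it != word_of_.end() ? it->second : std::string("<UNK>");
    }

    // Number of ordinary words, excluding the three special tokens.
    std::size_t size() const { return code_of_.size() - 3; }

private:
    // Assign the next free code to a new token and record both directions.
    void add(const std::string& w) {
        const std::string code = std::to_string(code_of_.size());
        code_of_.emplace(w, code);
        word_of_.emplace(code, w);
    }

    std::unordered_map<std::string, std::string> code_of_;
    std::unordered_map<std::string, std::string> word_of_;
};
```

In this sketch, a dictionary built from the words "the" and "cat" reports a size of 2, and `word(index(w))` returns `w` for any known word.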
| Dictionary::Dictionary () | inline |
Default constructor.
Only special tokens (BOS, EOS, UNK) are included in the dictionary.
| Dictionary::Dictionary (const std::vector< std::string > &dict) | inline |
Initialize Dictionary from list of words.
| dict | A vector of strings. List of words to be included in the dictionary. |
In addition to the words explicitly included, the constructor also adds the special tokens (BOS, EOS, UNK) to the dictionary.
| bool Dictionary::contains (std::string word) const | inline |
Check if a word is contained in the Dictionary.
| word | A string. |
| std::string Dictionary::index (std::string word) const | inline |
Return the index corresponding to a given word.
| word | A string. |
| void Dictionary::insert (std::string word) | inline |
Insert a word in the Dictionary.
| word | A string. |
| std::pair< size_t, std::string > Dictionary::kgram_code (std::string kgram) const | inline |
Extract k-gram code from a string.
| kgram | A string. |
Leading, trailing, and repeated spaces are handled automatically, and the EOS token is recognized.
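The whitespace handling described above can be sketched as follows. This is a hedged illustration rather than the kgrams implementation: the free function `toy_kgram_code`, the lookup table passed in as `codes`, the `unk_code` fallback, and the reading of the returned pair as (k-gram order, space-separated word codes) are all assumptions. Special tokens such as EOS are treated here simply as ordinary entries of the lookup table.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical helper: encode a k-gram as (order, space-separated word codes).
std::pair<std::size_t, std::string> toy_kgram_code(
    const std::string& kgram,
    const std::unordered_map<std::string, std::string>& codes,
    const std::string& unk_code)
{
    std::pair<std::size_t, std::string> result{0, ""};
    std::istringstream stream(kgram);
    std::string word;
    // operator>> skips any run of whitespace, so leading, trailing, and
    // repeated spaces never produce empty words.
    while (stream >> word) {
        auto it = codes.find(word);
        if (result.first > 0) result.second += ' ';
        // Words missing from the table fall back to the assumed UNK code.
        result.second += (it != codes.end()) ? it->second : unk_code;
        ++result.first;
    }
    return result;
}
```

With a table mapping "the" to "3", "cat" to "4", and `<EOS>` to "1", the input `"  the   cat <EOS> "` yields order 3 and the code string `"3 4 1"` under these assumptions.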
| size_t Dictionary::length () const | inline |
Return size of the dictionary, excluding the special tokens (BOS, EOS, UNK).
| size_t Dictionary::size () const | inline |
Return size of the dictionary, excluding the special tokens (BOS, EOS, UNK).
| std::string Dictionary::word (std::string index) const | inline |
Return the word corresponding to a given word index.
| index | A string. |