antu.io.token_indexers package
Submodules
antu.io.token_indexers.char_token_indexer module
class antu.io.token_indexers.char_token_indexer.CharTokenIndexer(related_vocabs: List[str], transform: Callable[[str], str] = lambda x: x)
Bases: antu.io.token_indexers.token_indexer.TokenIndexer
A CharTokenIndexer determines how string tokens are represented as arrays of lists of character indices in a model.
Parameters:
- related_vocabs : List[str]
  The vocabularies that are related to this indexer.
- transform : Callable[[str], str], optional (default=``lambda x: x``)
  The change applied to each token when counting or indexing. Lowercasing functions are a common choice.
Methods
- count_vocab_items(token, counters, Dict[str, …)
  Each character in the token is counted directly as an element.
- tokens_to_indices(tokens, vocab)
  Takes a list of tokens and converts them to one or more sets of indices.
count_vocab_items(token: str, counters: Dict[str, Dict[str, int]]) → None
Each character in the token is counted directly as an element.
Parameters:
- counters : Dict[str, Dict[str, int]]
  The counters to update. A string is counted in each counter that needs to count it.
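As a rough illustration of the character-level counting described above, the sketch below re-implements the behavior in plain Python. It is not antu's implementation: the function name `count_chars`, the keying of `counters` by vocabulary name, and the choice to apply `transform` to the whole token before iterating over its characters are all assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def count_chars(
    token: str,
    counters: Dict[str, Dict[str, int]],
    related_vocabs: List[str],
    transform: Callable[[str], str] = lambda x: x,
) -> None:
    """Hypothetical stand-in for CharTokenIndexer.count_vocab_items."""
    for vocab_name in related_vocabs:
        # Assumption: counters is keyed by vocabulary name.
        for ch in transform(token):
            counters[vocab_name][ch] += 1


# Nested defaultdicts so unseen vocabularies and characters start at zero.
counters: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
count_chars("Hello", counters, ["char"], transform=str.lower)
# counters["char"] now holds {"h": 1, "e": 1, "l": 2, "o": 1}
```

With a lowercasing `transform`, "H" and "h" share one count, which is the typical reason for passing one.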
tokens_to_indices(tokens: List[str], vocab: antu.io.vocabulary.Vocabulary) → Dict[str, List[List[int]]]
Takes a list of tokens and converts them to one or more sets of indices. During indexing, each token corresponds to a list of indices in the vocabulary.
Parameters:
- vocab : Vocabulary
  vocab is used to get the index of each item.
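The nested return type `Dict[str, List[List[int]]]` can be easier to read with a concrete case. The following is a minimal standalone sketch, not antu's code: a plain nested dict stands in for `antu.io.vocabulary.Vocabulary`, and the function name `chars_to_indices` and the `unk_index` fallback for unseen characters are hypothetical.

```python
from typing import Callable, Dict, List


def chars_to_indices(
    tokens: List[str],
    vocab: Dict[str, Dict[str, int]],
    related_vocabs: List[str],
    transform: Callable[[str], str] = lambda x: x,
    unk_index: int = 0,
) -> Dict[str, List[List[int]]]:
    """Hypothetical stand-in for CharTokenIndexer.tokens_to_indices."""
    return {
        # One inner list per token, one index per character.
        name: [
            [vocab[name].get(ch, unk_index) for ch in transform(tok)]
            for tok in tokens
        ]
        for name in related_vocabs
    }


# Assumption: a toy vocabulary mapping vocabulary name to {character: index}.
toy_vocab = {"char": {"a": 1, "b": 2, "c": 3}}
char_res = chars_to_indices(["ab", "caz"], toy_vocab, ["char"])
# char_res == {"char": [[1, 2], [3, 1, 0]]}
```

The outer list has one entry per token and each inner list has one entry per character, so tokens of different lengths yield inner lists of different lengths.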
antu.io.token_indexers.single_id_token_indexer module
class antu.io.token_indexers.single_id_token_indexer.SingleIdTokenIndexer(related_vocabs: List[str], transform: Callable[[str], str] = lambda x: x)
Bases: antu.io.token_indexers.token_indexer.TokenIndexer
A SingleIdTokenIndexer determines how string tokens are represented as arrays of single-ID indices in a model.
Parameters:
- related_vocabs : List[str]
  The vocabularies that are related to this indexer.
- transform : Callable[[str], str], optional (default=``lambda x: x``)
  The change applied to each token when counting or indexing. Lowercasing functions are a common choice.
Methods
- count_vocab_items(token, counters, Dict[str, …)
  The token is counted directly as an element.
- tokens_to_indices(tokens, vocab)
  Takes a list of tokens and converts them to one or more sets of indices.
count_vocab_items(token: str, counters: Dict[str, Dict[str, int]]) → None
The token is counted directly as an element.
Parameters:
- counters : Dict[str, Dict[str, int]]
  The counters to update. A string is counted in each counter that needs to count it.
tokens_to_indices(tokens: List[str], vocab: antu.io.vocabulary.Vocabulary) → Dict[str, List[int]]
Takes a list of tokens and converts them to one or more sets of indices. During indexing, each item corresponds to one index in the vocabulary.
Parameters:
- vocab : Vocabulary
  vocab is used to get the index of each item.
Returns:
- res : Dict[str, List[int]]
  If the tokens and their indices are [w1:5, w2:3, w3:0], the result is {'vocab_name': [5, 3, 0]}.
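The `[w1:5, w2:3, w3:0]` example above can be reproduced with a short standalone sketch. This is an illustration under assumptions rather than antu's code: a nested dict replaces `Vocabulary`, the function name `tokens_to_single_ids` is hypothetical, and out-of-vocabulary tokens are not handled.

```python
from typing import Callable, Dict, List


def tokens_to_single_ids(
    tokens: List[str],
    vocab: Dict[str, Dict[str, int]],
    related_vocabs: List[str],
    transform: Callable[[str], str] = lambda x: x,
) -> Dict[str, List[int]]:
    """Hypothetical stand-in for SingleIdTokenIndexer.tokens_to_indices."""
    return {
        # One index per token; a missing token raises KeyError in this sketch.
        name: [vocab[name][transform(tok)] for tok in tokens]
        for name in related_vocabs
    }


# Assumption: a toy vocabulary mapping vocabulary name to {token: index}.
toy_vocab = {"vocab_name": {"w1": 5, "w2": 3, "w3": 0}}
res = tokens_to_single_ids(["W1", "w2", "W3"], toy_vocab, ["vocab_name"], str.lower)
# res == {"vocab_name": [5, 3, 0]}
```

Because `str.lower` is passed as `transform`, "W1" and "W3" are looked up as "w1" and "w3".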
antu.io.token_indexers.token_indexer module
class antu.io.token_indexers.token_indexer.TokenIndexer
Bases: object
A TokenIndexer determines how string tokens get represented as arrays of indices in a model.
Methods
- count_vocab_items(token, counter, Dict[str, …)
  Defines how each token in the field is counted.
- tokens_to_indices(tokens, vocab)
  Takes a list of tokens and converts them to one or more sets of indices.
count_vocab_items(token: str, counter: Dict[str, Dict[str, int]]) → None
Defines how each token in the field is counted. In most cases, the string itself is used as the key. A character-level TokenIndexer, however, needs to traverse each character in the string.
Parameters:
- counter : Dict[str, Dict[str, int]]
  The counters to update. A string is counted in each counter that needs to count it.
tokens_to_indices(tokens: List[str], vocab: antu.io.vocabulary.Vocabulary) → Dict[str, Indices]
Takes a list of tokens and converts them to one or more sets of indices. This could be just an ID for each token from the vocabulary.
Parameters:
- vocab : Vocabulary
  vocab is used to get the index of each item.
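To show how the two methods of the base interface might fit together, here is a hedged, self-contained sketch of a count, build, index round trip. None of it is taken from antu's source: `SketchTokenIndexer`, `WholeTokenSketch`, the `Indices` alias, the dict-based vocabulary, and the frequency-ranked index assignment are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Union

# Assumption: Indices covers the two shapes the documented subclasses return.
Indices = Union[List[int], List[List[int]]]
# Assumption: a nested dict stands in for antu.io.vocabulary.Vocabulary.
ToyVocab = Dict[str, Dict[str, int]]


class SketchTokenIndexer(ABC):
    """Hypothetical mirror of the two documented TokenIndexer methods."""

    @abstractmethod
    def count_vocab_items(
        self, token: str, counter: Dict[str, Dict[str, int]]
    ) -> None:
        """Update the per-vocabulary counts for one token."""

    @abstractmethod
    def tokens_to_indices(
        self, tokens: List[str], vocab: ToyVocab
    ) -> Dict[str, Indices]:
        """Map tokens to indices, keyed by vocabulary name."""


class WholeTokenSketch(SketchTokenIndexer):
    """Counts and indexes each token as a single item in one vocabulary."""

    def __init__(self, vocab_name: str) -> None:
        self.vocab_name = vocab_name

    def count_vocab_items(self, token, counter):
        bucket = counter.setdefault(self.vocab_name, {})
        bucket[token] = bucket.get(token, 0) + 1

    def tokens_to_indices(self, tokens, vocab):
        table = vocab[self.vocab_name]
        return {self.vocab_name: [table[tok] for tok in tokens]}


indexer = WholeTokenSketch("word")
counter: Dict[str, Dict[str, int]] = {}
for tok in ["a", "b", "a"]:
    indexer.count_vocab_items(tok, counter)

# Toy vocabulary construction: the most frequent item gets index 0.
ranked = sorted(counter["word"], key=lambda t: (-counter["word"][t], t))
vocab = {"word": {tok: i for i, tok in enumerate(ranked)}}
result = indexer.tokens_to_indices(["a", "b", "a"], vocab)
# result == {"word": [0, 1, 0]}
```

A character-level subclass would differ only in iterating over the characters of each token in both methods, which changes the returned shape from `List[int]` to `List[List[int]]`.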