antu.io.token_indexers package

Submodules

antu.io.token_indexers.char_token_indexer module

class antu.io.token_indexers.char_token_indexer.CharTokenIndexer(related_vocabs: List[str], transform: Callable[[str], str] = lambda x: x)

Bases: antu.io.token_indexers.token_indexer.TokenIndexer

A CharTokenIndexer determines how string tokens are represented as lists of character indices in a model.

Parameters:	related_vocabs : `List[str]` The vocabularies related to this indexer.
	transform : `Callable[[str], str]`, optional (default=``lambda x: x``) The transformation applied to the token when counting or indexing. Lowercasing is a common choice.

Methods

`count_vocab_items`(token, counters, Dict[str, …)	Each character in the token is counted directly as an element.
`tokens_to_indices`(tokens, vocab)	Takes a list of tokens and converts them to one or more sets of indices.

count_vocab_items(token: str, counters: Dict[str, Dict[str, int]]) → None

Each character in the token is counted directly as an element.

Parameters:	counters : `Dict[str, Dict[str, int]]` The counters to update. Each string is counted in every counter that needs to count it.
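The character-level counting described above can be sketched without antu installed. The helper `count_chars` below is a hypothetical stand-in, not the library's implementation. It assumes that only vocabularies named in `related_vocabs` and present in `counters` are updated, and that `transform` is applied to the whole token before its characters are counted.

```python
from collections import Counter
from typing import Callable, Dict, List


def count_chars(token: str,
                counters: Dict[str, Dict[str, int]],
                related_vocabs: List[str],
                transform: Callable[[str], str] = lambda x: x) -> None:
    """Hypothetical stand-in: count every character of `token`."""
    for vocab_name in related_vocabs:
        # Assumption: vocabularies without a counter are skipped.
        if vocab_name not in counters:
            continue
        # Assumption: the transform is applied to the token, then each
        # character of the result is counted as its own element.
        for ch in transform(token):
            counters[vocab_name][ch] += 1


counters = {"char": Counter()}
for tok in ["hello", "world"]:
    count_chars(tok, counters, related_vocabs=["char"])

char_counts = dict(counters["char"])
```

Using `collections.Counter` for the inner dictionaries lets unseen characters start at zero, so the sketch needs no explicit initialization.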

tokens_to_indices(tokens: List[str], vocab: antu.io.vocabulary.Vocabulary) → Dict[str, List[List[int]]]

Takes a list of tokens and converts them to one or more sets of indices. During indexing, each token corresponds to a list of indices in the vocabulary, one per character.

Parameters:	vocab : `Vocabulary` `vocab` is used to get the index of each item.
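The nested return type `Dict[str, List[List[int]]]` is easier to see in a small example. The sketch below replaces the `Vocabulary` object with a plain nested dictionary and uses a hypothetical helper, `chars_to_indices`. The unknown-character index of 0 is an assumption made for illustration, not documented antu behaviour.

```python
from typing import Dict, List

# Hypothetical character vocabulary: vocabulary name -> {character: index}.
char_vocab: Dict[str, Dict[str, int]] = {"char": {"a": 1, "b": 2, "c": 3}}


def chars_to_indices(tokens: List[str],
                     vocab: Dict[str, Dict[str, int]],
                     related_vocabs: List[str],
                     unk_index: int = 0) -> Dict[str, List[List[int]]]:
    """Hypothetical stand-in: one inner list of character indices per token."""
    result = {}
    for name in related_vocabs:
        # Characters missing from the vocabulary fall back to `unk_index`
        # (an assumption of this sketch).
        result[name] = [[vocab[name].get(ch, unk_index) for ch in tok]
                        for tok in tokens]
    return result


indices = chars_to_indices(["cab", "bad"], char_vocab, ["char"])
# indices == {"char": [[3, 1, 2], [2, 1, 0]]}
```

The outer list has one entry per token and each inner list has one entry per character, so the inner lists differ in length when tokens do.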

antu.io.token_indexers.single_id_token_indexer module

class antu.io.token_indexers.single_id_token_indexer.SingleIdTokenIndexer(related_vocabs: List[str], transform: Callable[[str], str] = lambda x: x)

Bases: antu.io.token_indexers.token_indexer.TokenIndexer

A SingleIdTokenIndexer determines how string tokens are represented as single indices in a model, one index per token.

Parameters:	related_vocabs : `List[str]` The vocabularies related to this indexer.
	transform : `Callable[[str], str]`, optional (default=``lambda x: x``) The transformation applied to the token when counting or indexing. Lowercasing is a common choice.

Methods

`count_vocab_items`(token, counters, Dict[str, …)	The token is counted directly as an element.
`tokens_to_indices`(tokens, vocab)	Takes a list of tokens and converts them to one or more sets of indices.

count_vocab_items(token: str, counters: Dict[str, Dict[str, int]]) → None

The token is counted directly as an element.

Parameters:	counters : `Dict[str, Dict[str, int]]` The counters to update. Each string is counted in every counter that needs to count it.
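A lowercase `transform` changes what gets counted at the token level. The following is a minimal sketch with a hypothetical helper, `count_token`, that mimics the documented behaviour. It is not the antu implementation, and skipping vocabularies that have no counter is an assumption.

```python
from collections import Counter
from typing import Callable, Dict, List


def count_token(token: str,
                counters: Dict[str, Dict[str, int]],
                related_vocabs: List[str],
                transform: Callable[[str], str] = lambda x: x) -> None:
    """Hypothetical stand-in: count the whole transformed token once."""
    for vocab_name in related_vocabs:
        if vocab_name in counters:  # assumption: skip missing counters
            counters[vocab_name][transform(token)] += 1


word_counters = {"word": Counter()}
for tok in ["The", "the", "cat"]:
    count_token(tok, word_counters, ["word"], transform=str.lower)

word_counts = dict(word_counters["word"])
# word_counts == {"the": 2, "cat": 1}
```

With the default identity transform, "The" and "the" would be counted as two separate vocabulary items.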

tokens_to_indices(tokens: List[str], vocab: antu.io.vocabulary.Vocabulary) → Dict[str, List[int]]

Takes a list of tokens and converts them to one or more sets of indices. During the indexing process, each item corresponds to an index in the vocabulary.

Parameters:	vocab : `Vocabulary` `vocab` is used to get the index of each item.
Returns:	res : `Dict[str, List[int]]` If the tokens and their indices are [w1:5, w2:3, w3:0], the result is {'vocab_name': [5, 3, 0]}.
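The return value in the example above can be reproduced with a small sketch. A plain nested dictionary stands in for the `Vocabulary` object, and `tokens_to_single_ids` is a hypothetical helper written for illustration. Neither is part of antu.

```python
from typing import Dict, List

# Hypothetical vocabulary mirroring the example: w1 -> 5, w2 -> 3, w3 -> 0.
vocab: Dict[str, Dict[str, int]] = {"vocab_name": {"w1": 5, "w2": 3, "w3": 0}}


def tokens_to_single_ids(tokens: List[str],
                         vocab: Dict[str, Dict[str, int]],
                         related_vocabs: List[str]) -> Dict[str, List[int]]:
    """Hypothetical stand-in: one index per token for each related vocabulary."""
    return {name: [vocab[name][tok] for tok in tokens]
            for name in related_vocabs}


res = tokens_to_single_ids(["w1", "w2", "w3"], vocab, ["vocab_name"])
# res == {"vocab_name": [5, 3, 0]}
```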

antu.io.token_indexers.token_indexer module

class antu.io.token_indexers.token_indexer.TokenIndexer

Bases: object

A TokenIndexer determines how string tokens get represented as arrays of indices in a model.

Methods

`count_vocab_items`(token, counter, Dict[str, …)	Defines how each token in the field is counted.
`tokens_to_indices`(tokens, vocab)	Takes a list of tokens and converts them to one or more sets of indices.

count_vocab_items(token: str, counter: Dict[str, Dict[str, int]]) → None

Defines how each token in the field is counted. In most cases, the token string itself is used as the key. A character-level TokenIndexer, however, must traverse each character in the string.

Parameters:	counter : `Dict[str, Dict[str, int]]` The counters to update. Each string is counted in every counter that needs to count it.

tokens_to_indices(tokens: List[str], vocab: antu.io.vocabulary.Vocabulary) → Dict[str, Indices]

Takes a list of tokens and converts them to one or more sets of indices. This could be just an ID for each token from the vocabulary.

Parameters:	vocab : `Vocabulary` `vocab` is used to get the index of each item.
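A custom indexer implements the two methods documented above. The sketch below mirrors that interface with a hypothetical abstract class, `TokenIndexerSketch`, so that it runs without antu. Whether the real `TokenIndexer` is declared with `abc` is an assumption, and `LengthIndexer` is a toy subclass invented for illustration.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Any, Dict, List


class TokenIndexerSketch(ABC):
    """Hypothetical mirror of the two documented methods."""

    @abstractmethod
    def count_vocab_items(self, token: str,
                          counter: Dict[str, Dict[str, int]]) -> None:
        ...

    @abstractmethod
    def tokens_to_indices(self, tokens: List[str],
                          vocab: Any) -> Dict[str, Any]:
        ...


class LengthIndexer(TokenIndexerSketch):
    """Toy subclass: represents each token by its length."""

    def count_vocab_items(self, token, counter):
        # Count the length (as a string key) instead of the token itself.
        counter["length"][str(len(token))] += 1

    def tokens_to_indices(self, tokens, vocab):
        # No vocabulary lookup is needed for this toy representation.
        return {"length": [len(tok) for tok in tokens]}


indexer = LengthIndexer()
length_counter = {"length": Counter()}
for tok in ["a", "bb", "cc"]:
    indexer.count_vocab_items(tok, length_counter)
lengths = indexer.tokens_to_indices(["a", "bb", "cc"], vocab=None)
```

Both methods key their results by name, which is what allows one indexer to produce several index sets from the same tokens.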
