antu.io.token_indexers package

Submodules

antu.io.token_indexers.char_token_indexer module

class antu.io.token_indexers.char_token_indexer.CharTokenIndexer(related_vocabs: List[str], transform: Callable[[str], str] = <function CharTokenIndexer.<lambda>>)

Bases: antu.io.token_indexers.token_indexer.TokenIndexer

A CharTokenIndexer determines how string tokens are represented in a model as arrays of lists of character indices.

Parameters:
related_vocabs : List[str]

Which vocabularies are related to the indexer.

transform : Callable[[str,], str], optional (default=``lambda x:x``)

The transformation applied to the token when counting or indexing. Lowercasing is a common choice.

Methods

count_vocab_items(token, counters, Dict[str, …) Each character in the token is counted directly as an element.
tokens_to_indices(tokens, vocab) Takes a list of tokens and converts them to one or more sets of indices.
count_vocab_items(token: str, counters: Dict[str, Dict[str, int]]) → None

Each character in the token is counted directly as an element.

Parameters:
counters : Dict[str, Dict[str, int]]

The counters to update. A string is counted only in the counters that need to count it.
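The character-level counting described above can be sketched without the library. This is a minimal illustration, not antu's implementation: `count_chars` is a hypothetical stand-in, and it assumes that `counters` is keyed by vocabulary name and that `transform` is applied to each character.

```python
from typing import Callable, Dict, List


def count_chars(
    token: str,
    counters: Dict[str, Dict[str, int]],
    related_vocabs: List[str],
    transform: Callable[[str], str] = lambda x: x,
) -> None:
    """Hypothetical stand-in for CharTokenIndexer.count_vocab_items."""
    for vocab_name in related_vocabs:
        # Assumption: only counters that are present get updated.
        if vocab_name not in counters:
            continue
        bucket = counters[vocab_name]
        for ch in token:
            key = transform(ch)
            bucket[key] = bucket.get(key, 0) + 1


counters: Dict[str, Dict[str, int]] = {"char": {}}
count_chars("Hello", counters, related_vocabs=["char"], transform=str.lower)
print(counters)  # {'char': {'h': 1, 'e': 1, 'l': 2, 'o': 1}}
```

With `transform=str.lower`, "H" and "h" land in the same entry, which is the usual reason for passing a lowercase transform.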

tokens_to_indices(tokens: List[str], vocab: antu.io.vocabulary.Vocabulary) → Dict[str, List[List[int]]]

Takes a list of tokens and converts them to one or more sets of indices. During the indexing process, each token corresponds to a list of character indices in the vocabulary.

Parameters:
vocab : Vocabulary

vocab is used to get the index of each item.
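To make the nested return shape concrete, here is a rough sketch that uses a plain dictionary in place of antu's `Vocabulary`. The function name `chars_to_indices`, the dictionary layout, and the choice of index 0 for unknown characters are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


def chars_to_indices(
    tokens: List[str],
    vocab: Dict[str, Dict[str, int]],
    related_vocabs: List[str],
    transform: Callable[[str], str] = lambda x: x,
    unk_index: int = 0,
) -> Dict[str, List[List[int]]]:
    """Hypothetical stand-in for CharTokenIndexer.tokens_to_indices."""
    res: Dict[str, List[List[int]]] = {}
    for vocab_name in related_vocabs:
        table = vocab[vocab_name]
        # One inner list per token, one index per character.
        res[vocab_name] = [
            [table.get(transform(ch), unk_index) for ch in tok] for tok in tokens
        ]
    return res


# Plain-dict vocabulary; the real Vocabulary class is not used here.
char_vocab = {"char": {"<unk>": 0, "a": 1, "b": 2, "c": 3}}
char_ids = chars_to_indices(["ab", "cab", "z"], char_vocab, ["char"])
print(char_ids)  # {'char': [[1, 2], [3, 1, 2], [0]]}
```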

antu.io.token_indexers.single_id_token_indexer module

class antu.io.token_indexers.single_id_token_indexer.SingleIdTokenIndexer(related_vocabs: List[str], transform: Callable[[str], str] = <function SingleIdTokenIndexer.<lambda>>)

Bases: antu.io.token_indexers.token_indexer.TokenIndexer

A SingleIdTokenIndexer determines how string tokens are represented in a model as arrays of single-ID indices.

Parameters:
related_vocabs : List[str]

Which vocabularies are related to the indexer.

transform : Callable[[str,], str], optional (default=``lambda x:x``)

The transformation applied to the token when counting or indexing. Lowercasing is a common choice.

Methods

count_vocab_items(token, counters, Dict[str, …) The token is counted directly as an element.
tokens_to_indices(tokens, vocab) Takes a list of tokens and converts them to one or more sets of indices.
count_vocab_items(token: str, counters: Dict[str, Dict[str, int]]) → None

The token is counted directly as an element.

Parameters:
counters : Dict[str, Dict[str, int]]

The counters to update. A string is counted only in the counters that need to count it.
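For comparison with the character-level case, whole-token counting can be sketched as follows. `count_token` is a hypothetical helper rather than the library method, and the assumption that `counters` is keyed by vocabulary name is mine.

```python
from typing import Callable, Dict, List


def count_token(
    token: str,
    counters: Dict[str, Dict[str, int]],
    related_vocabs: List[str],
    transform: Callable[[str], str] = lambda x: x,
) -> None:
    """Hypothetical stand-in for SingleIdTokenIndexer.count_vocab_items."""
    key = transform(token)  # the whole token is one vocabulary element
    for vocab_name in related_vocabs:
        if vocab_name in counters:
            bucket = counters[vocab_name]
            bucket[key] = bucket.get(key, 0) + 1


word_counters: Dict[str, Dict[str, int]] = {"word": {}}
for tok in ["The", "cat", "saw", "the", "dog"]:
    count_token(tok, word_counters, related_vocabs=["word"], transform=str.lower)

print(word_counters)  # {'word': {'the': 2, 'cat': 1, 'saw': 1, 'dog': 1}}
```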

tokens_to_indices(tokens: List[str], vocab: antu.io.vocabulary.Vocabulary) → Dict[str, List[int]]

Takes a list of tokens and converts them to one or more sets of indices. During the indexing process, each item corresponds to an index in the vocabulary.

Parameters:
vocab : Vocabulary

vocab is used to get the index of each item.

Returns:
res : Dict[str, List[int]]

If the tokens and their indices are [w1:5, w2:3, w3:0], the result will be {'vocab_name': [5, 3, 0]}.
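The return value above can be reproduced with a small sketch. It substitutes a plain dictionary for `Vocabulary`; `tokens_to_single_ids` and the fallback index 0 for unknown tokens are illustrative assumptions, not part of antu.

```python
from typing import Callable, Dict, List


def tokens_to_single_ids(
    tokens: List[str],
    vocab: Dict[str, Dict[str, int]],
    related_vocabs: List[str],
    transform: Callable[[str], str] = lambda x: x,
    unk_index: int = 0,
) -> Dict[str, List[int]]:
    """Hypothetical stand-in for SingleIdTokenIndexer.tokens_to_indices."""
    return {
        name: [vocab[name].get(transform(tok), unk_index) for tok in tokens]
        for name in related_vocabs
    }


# Plain-dict vocabulary mirroring the w1:5, w2:3, w3:0 example.
example_vocab = {"vocab_name": {"w1": 5, "w2": 3, "w3": 0}}
result = tokens_to_single_ids(["w1", "w2", "w3"], example_vocab, ["vocab_name"])
print(result)  # {'vocab_name': [5, 3, 0]}
```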

antu.io.token_indexers.token_indexer module

class antu.io.token_indexers.token_indexer.TokenIndexer

Bases: object

A TokenIndexer determines how string tokens get represented as arrays of indices in a model.

Methods

count_vocab_items(token, counter, Dict[str, …) Defines how each token in the field is counted.
tokens_to_indices(tokens, vocab) Takes a list of tokens and converts them to one or more sets of indices.
count_vocab_items(token: str, counter: Dict[str, Dict[str, int]]) → None

Defines how each token in the field is counted. In most cases, the string itself is used as the key. A character-level TokenIndexer, however, needs to traverse each character in the string.

Parameters:
counter : Dict[str, Dict[str, int]]

The counters to update. A string is counted only in the counters that need to count it.

tokens_to_indices(tokens: List[str], vocab: antu.io.vocabulary.Vocabulary) → Dict[str, Indices]

Takes a list of tokens and converts them to one or more sets of indices. This could be just an ID for each token from the vocabulary.

Parameters:
vocab : Vocabulary

vocab is used to get the index of each item.
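The two-method contract can be illustrated with a standalone sketch. Nothing here imports antu: `SketchTokenIndexer`, `WholeTokenIndexer`, the `Indices` alias, and the plain-dict vocabulary are hypothetical, and modeling the base class with `abc` is an assumption about how such an interface might be enforced.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Union

# Assumed shapes; the real Indices alias and Vocabulary class may differ.
Indices = Union[List[int], List[List[int]]]
PlainVocab = Dict[str, Dict[str, int]]


class SketchTokenIndexer(ABC):
    """Hypothetical mirror of the TokenIndexer interface."""

    @abstractmethod
    def count_vocab_items(self, token: str, counter: Dict[str, Dict[str, int]]) -> None:
        """Update the counters for one token."""

    @abstractmethod
    def tokens_to_indices(self, tokens: List[str], vocab: PlainVocab) -> Dict[str, Indices]:
        """Convert tokens to indices, keyed by vocabulary name."""


class WholeTokenIndexer(SketchTokenIndexer):
    """Uses the string itself as the key, the common case described above."""

    def __init__(self, vocab_name: str) -> None:
        self.vocab_name = vocab_name

    def count_vocab_items(self, token: str, counter: Dict[str, Dict[str, int]]) -> None:
        if self.vocab_name in counter:
            bucket = counter[self.vocab_name]
            bucket[token] = bucket.get(token, 0) + 1

    def tokens_to_indices(self, tokens: List[str], vocab: PlainVocab) -> Dict[str, Indices]:
        table = vocab[self.vocab_name]
        # Assumption: index 0 stands for unknown tokens.
        return {self.vocab_name: [table.get(tok, 0) for tok in tokens]}


indexer = WholeTokenIndexer("word")
counter: Dict[str, Dict[str, int]] = {"word": {}}
for tok in ["a", "b", "a"]:
    indexer.count_vocab_items(tok, counter)

indexed = indexer.tokens_to_indices(["a", "b", "c"], {"word": {"<unk>": 0, "a": 1, "b": 2}})
print(counter)  # {'word': {'a': 2, 'b': 1}}
print(indexed)  # {'word': [1, 2, 0]}
```

A character-level subclass would keep the same two methods but loop over the characters of each token, which yields the nested `List[List[int]]` form of `Indices`.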

Module contents