antu.io package

Submodules

antu.io.instance module

class antu.io.instance.Instance(fields: List[antu.io.fields.field.Field] = None)

Bases: collections.abc.Mapping, typing.Generic

An Instance is a collection (list) of multiple data fields.

Parameters:	fields : `List[Field]`, optional (default=``None``) A list of multiple data fields.

Methods

`add_field`(field)	Add the field to the existing `Instance`.
`count_vocab_items`(counter)	Increments counts in the given `counter` for all of the vocabulary items in all of the `Fields` in this `Instance`.
`dynamic_index_fields`(vocab, dynamic_fields)	Indexes all fields in this `Instance` using the provided `Vocabulary`.
`get`(k[,d])
`index_fields`(vocab)	Indexes all fields in this `Instance` using the provided `Vocabulary`.
`items`()
`keys`()
`values`()

add_field(field: antu.io.fields.field.Field) → None

Add the field to the existing Instance.

Parameters:	field : `Field` The field to add.
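
Because `Instance` derives from `collections.abc.Mapping`, the `get`, `items`, `keys`, and `values` methods listed above come from the mapping protocol. The sketch below shows how such a container could be built. `ToyField` and `MappingInstance` are hypothetical stand-ins, and keying fields by a `name` attribute is an assumption, not something this page confirms about antu.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass
class ToyField:
    """Hypothetical field: a name plus the raw tokens it carries."""
    name: str
    tokens: List[str]


class MappingInstance(Mapping):
    """Hypothetical stand-in for an instance that behaves like a mapping."""

    def __init__(self, fields: Optional[List[ToyField]] = None) -> None:
        self._fields: Dict[str, ToyField] = {}
        for field in fields or []:
            self.add_field(field)

    def add_field(self, field: ToyField) -> None:
        # Assumption: fields are stored under their own name.
        self._fields[field.name] = field

    # Mapping only requires these three methods; get(), keys(),
    # items(), and values() are derived from them automatically.
    def __getitem__(self, key: str) -> ToyField:
        return self._fields[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._fields)

    def __len__(self) -> int:
        return len(self._fields)


inst = MappingInstance([ToyField("sentence", ["the", "cat"])])
inst.add_field(ToyField("tags", ["DET", "NOUN"]))
```

With this stand-in, `inst.keys()` yields the field names in insertion order and `inst.get("missing")` returns `None` rather than raising.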

count_vocab_items(counter: Dict[str, Dict[str, int]]) → None

Increments counts in the given counter for all of the vocabulary items in all of the Fields in this Instance.

Parameters:	counter : `Dict[str, Dict[str, int]]` Nested counters to update. Each string that needs counting is added to the counters it belongs to.
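
The nested `Dict[str, Dict[str, int]]` shape can be illustrated with the standard library alone. This is a minimal sketch and does not call antu. It assumes the outer key is a vocabulary name, and `count_tokens` is a hypothetical helper.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Assumption: outer key = vocabulary name, inner mapping = string -> count.
counter: Dict[str, Dict[str, int]] = defaultdict(Counter)


def count_tokens(tokens: List[str], vocab_name: str,
                 counter: Dict[str, Dict[str, int]]) -> None:
    """Increment the count of every token under the given vocabulary name."""
    for token in tokens:
        counter[vocab_name][token] += 1


count_tokens(["the", "cat", "saw", "the", "dog"], "word", counter)
count_tokens(["DET", "NOUN", "VERB", "DET", "NOUN"], "tag", counter)
```

A counter of this shape is what the `Vocabulary` constructor and `extend_from_counter` accept as `counters`.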

dynamic_index_fields(vocab: antu.io.vocabulary.Vocabulary, dynamic_fields: Set[str]) → Dict[str, Dict[str, Indices]]

Indexes all fields in this Instance using the provided Vocabulary. This mutates the current object; it does not return a new Instance. A DataIterator calls this on each pass through a dataset, and the indexed flag ensures that indexing happens only once. As a result, if you modify your vocabulary after indexing your instances, you may get unexpected behavior.

Parameters:	vocab : `Vocabulary` `vocab` is used to get the index of each item.
Returns:	res : `Dict[str, Dict[str, Indices]]` The indices corresponding to the instance. The first key is the field name and the second key is the vocabulary name.

index_fields(vocab: antu.io.vocabulary.Vocabulary) → Dict[str, Dict[str, Indices]]

Indexes all fields in this Instance using the provided Vocabulary. This mutates the current object; it does not return a new Instance. A DataIterator calls this on each pass through a dataset, and the indexed flag ensures that indexing happens only once. As a result, if you modify your vocabulary after indexing your instances, you may get unexpected behavior.

Parameters:	vocab : `Vocabulary` `vocab` is used to get the index of each item.
Returns:	res : `Dict[str, Dict[str, Indices]]` The indices corresponding to the instance. The first key is the field name and the second key is the vocabulary name.
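
The return shape and the effect of the indexed flag can be sketched with plain dictionaries. `IndexedInstance` and `toy_vocab` are hypothetical stand-ins, not antu classes. The sketch only mirrors the documented behavior: field name first, vocabulary name second, and indexing performed once.

```python
from typing import Dict, List

# Toy vocabulary: vocabulary name -> token -> index (not antu's Vocabulary).
toy_vocab: Dict[str, Dict[str, int]] = {
    "word": {"the": 0, "cat": 1, "sleeps": 2},
    "tag": {"DET": 0, "NOUN": 1, "VERB": 2},
}


class IndexedInstance:
    """Hypothetical stand-in that mimics the documented return shape."""

    def __init__(self, fields: Dict[str, Dict[str, List[str]]]) -> None:
        # field name -> vocabulary name -> raw tokens
        self.fields = fields
        self.indexed = False
        self.indices: Dict[str, Dict[str, List[int]]] = {}

    def index_fields(
        self, vocab: Dict[str, Dict[str, int]]
    ) -> Dict[str, Dict[str, List[int]]]:
        # The flag makes repeated calls cheap, but it also means later
        # changes to the vocabulary are silently ignored.
        if not self.indexed:
            self.indices = {
                field_name: {
                    vocab_name: [vocab[vocab_name][tok] for tok in tokens]
                    for vocab_name, tokens in per_vocab.items()
                }
                for field_name, per_vocab in self.fields.items()
            }
            self.indexed = True
        return self.indices


inst = IndexedInstance({
    "sentence": {"word": ["the", "cat", "sleeps"]},
    "tags": {"tag": ["DET", "NOUN", "VERB"]},
})
first = inst.index_fields(toy_vocab)

# Changing the vocabulary afterwards does not change the stored indices.
toy_vocab["word"]["cat"] = 99
second = inst.index_fields(toy_vocab)
```

In this sketch the second call returns the cached indices, which is the kind of stale result the warning above describes.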

antu.io.vocabulary module

class antu.io.vocabulary.Vocabulary(counters: Dict[str, Dict[str, int]] = {}, min_count: Dict[str, int] = {}, pretrained_vocab: Dict[str, List[str]] = {}, intersection_vocab: Dict[str, str] = {}, no_pad_namespace: Set[str] = {}, no_unk_namespace: Set[str] = {})

Bases: object

Parameters:

counters : Dict[str, Dict[str, int]], optional (default= dict() ): Element statistics for datasets.
min_count : Dict[str, int], optional (default= dict() ): Defines the minimum number of occurrences an item needs when a counter is converted to a vocabulary.
pretrained_vocab : Dict[str, List[str]], optional (default= dict() ): External pre-trained vocabulary.
intersection_vocab : Dict[str, str], optional (default= dict() ): Defines which vocabulary to intersect with when loading an oversized pre-trained vocabulary.
no_pad_namespace : Set[str], optional (default= set() ): Defines which vocabularies do not have a pad token.
no_unk_namespace : Set[str], optional (default= set() ): Defines which vocabularies do not have an OOV (unknown) token.
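
How `counters`, `min_count`, `no_pad_namespace`, and `no_unk_namespace` could interact is sketched below. This is a hedged illustration, not antu's implementation. `build_vocab`, the `<pad>` and `<unk>` strings, the default threshold of 1, and the alphabetical index order are all assumptions.

```python
from typing import Dict, Optional, Set

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens


def build_vocab(
    counters: Dict[str, Dict[str, int]],
    min_count: Optional[Dict[str, int]] = None,
    no_pad_namespace: Optional[Set[str]] = None,
    no_unk_namespace: Optional[Set[str]] = None,
) -> Dict[str, Dict[str, int]]:
    """Turn per-vocabulary counts into per-vocabulary token -> index maps."""
    min_count = min_count or {}
    no_pad = no_pad_namespace or set()
    no_unk = no_unk_namespace or set()
    vocab: Dict[str, Dict[str, int]] = {}
    for name, counts in counters.items():
        tokens = []
        if name not in no_pad:
            tokens.append(PAD)
        if name not in no_unk:
            tokens.append(UNK)
        # Assumption: a vocabulary without an entry keeps every item.
        threshold = min_count.get(name, 1)
        tokens += sorted(t for t, c in counts.items() if c >= threshold)
        vocab[name] = {tok: i for i, tok in enumerate(tokens)}
    return vocab


counters = {
    "word": {"the": 5, "cat": 2, "zyx": 1},
    "tag": {"DET": 5, "NOUN": 3},
}
vocab = build_vocab(
    counters,
    min_count={"word": 2},
    no_pad_namespace={"tag"},
    no_unk_namespace={"tag"},
)
```

Here the rare item `zyx` is dropped from `word`, while `tag` keeps every label and receives no special tokens.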

Methods

`add_token_to_namespace`(token, namespace)	Extend the vocabulary by adding a token to a vocabulary namespace.
`extend_from_counter`(counters, …)	Extend the vocabulary from the dataset statistic counters after defining the vocabulary.
`extend_from_pretrained_vocab`(…)	Extend the vocabulary from the pre-trained vocabulary after defining the vocabulary.
`get_token_from_index`(index, vocab_name)	Gets the token of an index in the vocabulary.
`get_token_index`(token, vocab_name)	Gets the index of a token in the vocabulary.
`get_vocab_size`(namespace)	Gets the size of a vocabulary.

get_padding_index
get_unknow_index

add_token_to_namespace(token: str, namespace: str) → None

Extend the vocabulary by adding a token to a vocabulary namespace.

Parameters:	token : `str` The token to add. namespace : `str` The vocabulary to add the token to.

extend_from_counter(counters: Dict[str, Dict[str, int]], min_count: Union[int, Dict[str, int]] = {}, no_pad_namespace: Set[str] = {}, no_unk_namespace: Set[str] = {}) → None

Extend the vocabulary from the dataset statistic counters after defining the vocabulary.

Parameters:

counters : Dict[str, Dict[str, int]]: Element statistics for datasets.
min_count : Union[int, Dict[str, int]], optional (default= dict() ): Defines the minimum number of occurrences an item needs when a counter is converted to a vocabulary.
no_pad_namespace : Set[str], optional (default= set() ): Defines which vocabularies do not have a pad token.
no_unk_namespace : Set[str], optional (default= set() ): Defines which vocabularies do not have an OOV (unknown) token.

extend_from_pretrained_vocab(pretrained_vocab: Dict[str, List[str]], intersection_vocab: Dict[str, str] = {}, no_pad_namespace: Set[str] = {}, no_unk_namespace: Set[str] = {}) → None

Extend the vocabulary from the pre-trained vocabulary after defining the vocabulary.

Parameters:

pretrained_vocab : Dict[str, List[str]]: External pre-trained vocabulary.
intersection_vocab : Dict[str, str], optional (default= dict() ): Defines which vocabulary to intersect with when loading an oversized pre-trained vocabulary.
no_pad_namespace : Set[str], optional (default= set() ): Defines which vocabularies do not have a pad token.
no_unk_namespace : Set[str], optional (default= set() ): Defines which vocabularies do not have an OOV (unknown) token.
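
One plausible reading of `intersection_vocab` is that it maps a pre-trained vocabulary name to the name of an existing vocabulary, and only tokens present in both are kept. The sketch below illustrates that reading. `filter_pretrained` and the `existing` argument are hypothetical, and the actual antu behavior may differ.

```python
from typing import Dict, List, Set


def filter_pretrained(
    pretrained_vocab: Dict[str, List[str]],
    intersection_vocab: Dict[str, str],
    existing: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Keep only pre-trained tokens that also occur in the paired vocabulary."""
    kept: Dict[str, List[str]] = {}
    for name, tokens in pretrained_vocab.items():
        other = intersection_vocab.get(name)
        if other is None:
            # No pairing given: keep the whole pre-trained list.
            kept[name] = list(tokens)
        else:
            allowed = existing[other]
            kept[name] = [tok for tok in tokens if tok in allowed]
    return kept


kept = filter_pretrained(
    pretrained_vocab={"glove": ["the", "cat", "aardvark", "zebra"]},
    intersection_vocab={"glove": "word"},
    existing={"word": {"the", "cat", "dog"}},
)
```

Filtering this way keeps a large pre-trained list from adding tokens that never occur in the dataset.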

get_padding_index(namespace: str) → int

get_token_from_index(index: int, vocab_name: str) → str

Gets the token of an index in the vocabulary.

Parameters:	index : `int` The index whose token is returned. vocab_name : `str` The vocabulary this index belongs to.
Returns:	Token : `str`

get_token_index(token: str, vocab_name: str) → int

Gets the index of a token in the vocabulary.

Parameters:	token : `str` The token whose index is returned. vocab_name : `str` The vocabulary this token belongs to.
Returns:	Index : `int`
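
The two lookups are inverses of each other for known tokens. The sketch below shows the round trip with a hypothetical `ToyVocab` that reuses the documented method names. Falling back to an unknown-token index for unseen tokens is an assumption, since this page does not state what antu does in that case.

```python
from typing import Dict, List

UNK = "<unk>"  # hypothetical unknown-token marker


class ToyVocab:
    """Hypothetical stand-in; method names mirror the documented ones."""

    def __init__(self, tokens_by_name: Dict[str, List[str]]) -> None:
        self._index_to_token = {
            name: list(tokens) for name, tokens in tokens_by_name.items()
        }
        self._token_to_index = {
            name: {tok: i for i, tok in enumerate(tokens)}
            for name, tokens in tokens_by_name.items()
        }

    def get_token_index(self, token: str, vocab_name: str) -> int:
        table = self._token_to_index[vocab_name]
        if token in table:
            return table[token]
        # Assumed fallback; raises KeyError if the vocabulary has no UNK.
        return table[UNK]

    def get_token_from_index(self, index: int, vocab_name: str) -> str:
        return self._index_to_token[vocab_name][index]

    def get_vocab_size(self, namespace: str) -> int:
        return len(self._index_to_token[namespace])


vocab = ToyVocab({"word": [UNK, "the", "cat"]})
idx = vocab.get_token_index("cat", "word")
token = vocab.get_token_from_index(idx, "word")
unseen = vocab.get_token_index("dog", "word")
```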

get_unknow_index(namespace: str) → int

get_vocab_size(namespace: str) → int

Gets the size of a vocabulary.

Parameters:	namespace : `str` The vocabulary whose size is returned.
Returns:	Vocabulary size : `int`
