putput.presets package

Submodules

putput.presets.displaCy module

putput.presets.displaCy.preset() → Callable

Configures the Pipeline for the ‘DISPLACY’ ENT format.

The ENT format: https://spacy.io/usage/visualizers#manual-usage

Returns: A Callable that when called returns parameters for instantiating a Pipeline. This Callable can be passed into putput.Pipeline as the ‘preset’ argument.
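
To make the ENT format concrete, the sketch below builds such a dictionary by hand from (label, phrase) pairs. It is an illustration only: `to_ent_format` and `token_spans` are hypothetical names that are not part of putput, and the sketch assumes phrases are joined by single spaces. Note that ‘end’ is an exclusive character offset, so text[start:end] yields the labelled phrase.

```python
from typing import Dict, List, Sequence, Tuple


def to_ent_format(spans: Sequence[Tuple[str, str]], title: str) -> Dict:
    """Build a displaCy-style manual ENT dict from (label, phrase) pairs.

    Hypothetical helper for illustration; not part of putput.
    """
    ents: List[Dict] = []
    offset = 0
    for label, phrase in spans:
        # 'end' is exclusive, so text[start:end] == phrase.
        ents.append({"start": offset, "end": offset + len(phrase), "label": label})
        # Skip the single space that joins this phrase to the next one.
        offset += len(phrase) + 1
    text = " ".join(phrase for _, phrase in spans)
    return {"text": text, "ents": ents, "title": title}


# Hypothetical input mirroring the utterance used in the examples on this page.
token_spans = [
    ("ADD", "can she get"), ("ITEM", "fries"),
    ("ADD", "can she get"), ("ITEM", "fries"),
    ("CONJUNCTION", "and"), ("ITEM", "fries"),
]
token_visualizer = to_ent_format(token_spans, "Tokens")
```

A dictionary of this shape is what spaCy's displacy.render accepts when called with style="ent" and manual=True.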

Examples

>>> import json
>>> from pathlib import Path
>>> from putput.pipeline import Pipeline
>>> pattern_def_path = Path(__file__).parent.parent.parent / 'tests' / 'doc' / 'example_pattern_definition.yml'
>>> dynamic_token_patterns_map = {'ITEM': ('fries',)}
>>> p = Pipeline.from_preset(preset(),
...                          pattern_def_path,
...                          dynamic_token_patterns_map=dynamic_token_patterns_map)
>>> generator = p.flow(disable_progress_bar=True)
>>> for token_visualizer, group_visualizer in generator:
...     print(json.dumps(token_visualizer, sort_keys=True))
...     print(json.dumps(group_visualizer, sort_keys=True))
...     break
{"ents": [{"end": 11, "label": "ADD", "start": 0},
          {"end": 17, "label": "ITEM", "start": 12},
          {"end": 29, "label": "ADD", "start": 18},
          {"end": 35, "label": "ITEM", "start": 30},
          {"end": 39, "label": "CONJUNCTION", "start": 36},
          {"end": 45, "label": "ITEM", "start": 40}],
 "text": "can she get fries can she get fries and fries",
 "title": "Tokens"}
{"ents": [{"end": 17, "label": "ADD_ITEM", "start": 0},
          {"end": 35, "label": "ADD_ITEM", "start": 18},
          {"end": 39, "label": "None", "start": 36},
          {"end": 45, "label": "None", "start": 40}],
 "text": "can she get fries can she get fries and fries",
 "title": "Groups"}

putput.presets.factory module

putput.presets.factory.get_preset(preset: str) → Callable

A factory that gets a ‘preset’ Callable.

Parameters: preset – The preset’s name.
Returns: The return value of calling a preset’s ‘preset’ function without arguments.

Examples

>>> from pathlib import Path
>>> from putput.pipeline import Pipeline
>>> pattern_def_path = Path(__file__).parent.parent.parent / 'tests' / 'doc' / 'example_pattern_definition.yml'
>>> dynamic_token_patterns_map = {'ITEM': ('fries',)}
>>> p = Pipeline.from_preset('IOB2',
...                          pattern_def_path,
...                          dynamic_token_patterns_map=dynamic_token_patterns_map)
>>> generator = p.flow(disable_progress_bar=True)
>>> for utterance, tokens, groups in generator:
...     print(utterance)
...     print(tokens)
...     print(groups)
...     break
can she get fries can she get fries and fries
('B-ADD I-ADD I-ADD', 'B-ITEM', 'B-ADD I-ADD I-ADD', 'B-ITEM', 'B-CONJUNCTION', 'B-ITEM')
('B-ADD_ITEM I-ADD_ITEM I-ADD_ITEM I-ADD_ITEM', 'B-ADD_ITEM I-ADD_ITEM I-ADD_ITEM I-ADD_ITEM',
 'B-None', 'B-None')
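
The name-to-Callable lookup described above can be pictured as a small registry. The following is a minimal sketch under stated assumptions, not putput's implementation: `_PRESETS`, `iob2_preset`, `luis_preset`, and `get_preset_sketch` are hypothetical names, and the error raised for an unknown name is an assumption.

```python
from typing import Callable, Dict


def iob2_preset() -> Callable:
    """Hypothetical stand-in for a preset module's 'preset' function."""
    def init_preset() -> Dict[str, str]:
        return {"format": "IOB2"}
    return init_preset


def luis_preset() -> Callable:
    """Hypothetical stand-in for a second preset module's 'preset' function."""
    def init_preset() -> Dict[str, str]:
        return {"format": "LUIS"}
    return init_preset


# Hypothetical registry mapping preset names to their 'preset' functions.
_PRESETS: Dict[str, Callable[[], Callable]] = {
    "IOB2": iob2_preset,
    "LUIS": luis_preset,
}


def get_preset_sketch(name: str) -> Callable:
    """Look up a preset by name and call it without arguments."""
    try:
        factory = _PRESETS[name]
    except KeyError:
        # Assumption: unknown names are rejected with a ValueError.
        raise ValueError(f"Unrecognized preset: {name!r}") from None
    # Called with no arguments, so every keyword argument keeps its default.
    return factory()
```

Because the lookup calls the preset function without arguments, a string name can only select default behaviour; customising a preset requires calling its ‘preset’ function directly.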

putput.presets.iob2 module

putput.presets.iob2.preset(*, tokens_to_include: Optional[Sequence[str]] = None, tokens_to_exclude: Optional[Sequence[str]] = None, groups_to_include: Optional[Sequence[str]] = None, groups_to_exclude: Optional[Sequence[str]] = None) → Callable

Configures the Pipeline for the ‘IOB2’ format.

Adheres to IOB2: https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging).

This function should be used as the ‘preset’ argument of putput.Pipeline instead of the ‘IOB2’ str to specify which tokens and groups map to ‘O’.

Parameters:
  • tokens_to_include – A sequence of tokens that should not be mapped to ‘O’. Useful if the majority of tokens should be excluded. Cannot be used in conjunction with ‘tokens_to_exclude’.
  • tokens_to_exclude – A sequence of tokens that should map to ‘O’. Useful if the majority of tokens should be included. Cannot be used in conjunction with ‘tokens_to_include’.
  • groups_to_include – A sequence of groups that should not be mapped to ‘O’. Useful if the majority of groups should be excluded. Cannot be used in conjunction with ‘groups_to_exclude’.
  • groups_to_exclude – A sequence of groups that should map to ‘O’. Useful if the majority of groups should be included. Cannot be used in conjunction with ‘groups_to_include’.
Returns:

A Callable that when called returns parameters for instantiating a Pipeline. This Callable can be passed into putput.Pipeline as the ‘preset’ argument.

Examples

>>> from pathlib import Path
>>> from putput.pipeline import Pipeline
>>> pattern_def_path = Path(__file__).parent.parent.parent / 'tests' / 'doc' / 'example_pattern_definition.yml'
>>> dynamic_token_patterns_map = {'ITEM': ('fries',)}
>>> p = Pipeline.from_preset(preset(tokens_to_include=('ITEM',), groups_to_include=('ADD_ITEM',)),
...                          pattern_def_path,
...                          dynamic_token_patterns_map=dynamic_token_patterns_map)
>>> generator = p.flow(disable_progress_bar=True)
>>> for utterance, tokens, groups in generator:
...     print(utterance)
...     print(tokens)
...     print(groups)
...     break
can she get fries can she get fries and fries
('O O O', 'B-ITEM', 'O O O', 'B-ITEM', 'O', 'B-ITEM')
('B-ADD_ITEM I-ADD_ITEM I-ADD_ITEM I-ADD_ITEM', 'B-ADD_ITEM I-ADD_ITEM I-ADD_ITEM I-ADD_ITEM', 'O', 'O')
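
The include/exclude behaviour shown in the example above can be sketched in a few lines. This is an illustration under stated assumptions rather than putput's code: `to_iob2`, `Span`, and `token_spans` are hypothetical names, and each phrase is assumed to be non-empty and whitespace-separated.

```python
from typing import Optional, Sequence, Tuple

Span = Tuple[str, str]  # (label, phrase), e.g. ("ADD", "can she get")


def to_iob2(spans: Sequence[Span],
            include: Optional[Sequence[str]] = None,
            exclude: Optional[Sequence[str]] = None) -> Tuple[str, ...]:
    """Tag each phrase in IOB2; labels filtered out are mapped to 'O'.

    Hypothetical helper for illustration; not part of putput.
    """
    if include is not None and exclude is not None:
        raise ValueError("include and exclude cannot be used together")
    tags = []
    for label, phrase in spans:
        words = phrase.split()
        keep = ((include is None or label in include)
                and (exclude is None or label not in exclude))
        if keep:
            # The first word gets 'B-', every following word gets 'I-'.
            word_tags = ["B-" + label] + ["I-" + label] * (len(words) - 1)
        else:
            word_tags = ["O"] * len(words)
        tags.append(" ".join(word_tags))
    return tuple(tags)


# Hypothetical input mirroring the utterance used in the example above.
token_spans = [
    ("ADD", "can she get"), ("ITEM", "fries"),
    ("ADD", "can she get"), ("ITEM", "fries"),
    ("CONJUNCTION", "and"), ("ITEM", "fries"),
]
only_items = to_iob2(token_spans, include=("ITEM",))
```

Here `include=("ITEM",)` and `exclude=("ADD", "CONJUNCTION")` produce the same tags, which is why the two arguments are offered as alternatives: whichever list is shorter is the more convenient one to write.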

putput.presets.luis module

putput.presets.luis.preset(*, intent_map: Mapping[str, str] = None, entities: Optional[Sequence[str]] = None) → Callable

Configures the Pipeline for LUIS test format.

Adheres to: https://docs.microsoft.com/en-us/azure/cognitive-services/luis/luis-tutorial-batch-testing.

This function should be used as the ‘preset’ argument of putput.Pipeline instead of the ‘LUIS’ str to specify intents and entities.

Examples

>>> import json
>>> from pathlib import Path
>>> from putput.pipeline import Pipeline
>>> from pprint import pprint
>>> import random
>>> random.seed(0)
>>> pattern_folder = Path(__file__).parent.parent.parent / 'tests' / 'doc'
>>> pattern_def_path = pattern_folder / 'example_pattern_definition_with_intents.yml'
>>> dynamic_token_patterns_map = {'ITEM': ('fries',)}
>>> p = Pipeline.from_preset('LUIS',
...                          pattern_def_path,
...                          dynamic_token_patterns_map=dynamic_token_patterns_map)
>>> for luis_result in p.flow(disable_progress_bar=True):
...     print(json.dumps(luis_result, sort_keys=True))
...     break
{"entities": [{"endPos": 16, "entity": "ITEM", "startPos": 12},
              {"endPos": 34, "entity": "ITEM", "startPos": 30},
              {"endPos": 44, "entity": "ITEM", "startPos": 40}],
 "intent": "ADD_INTENT",
 "text": "can she get fries can she get fries and fries"}
Parameters:
  • intent_map – A mapping from an utterance pattern string to a single intent. The value ‘__DISCARD’ is reserved.
  • entities – A sequence of tokens that are considered entities. To make all tokens entities, pass a list containing only the value ‘__ALL’, e.g. entities=[‘__ALL’].
Returns:

A Callable that when called returns parameters for instantiating a Pipeline. This Callable can be passed into putput.Pipeline as the ‘preset’ argument.
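
In the example output above, ‘endPos’ is the index of an entity's last character (‘fries’ spans 12 to 16), so it is inclusive. The sketch below shows how such offsets and the ‘__ALL’ value could be handled. It is an assumption-laden illustration: `to_luis_entities` and `token_spans` are hypothetical names, not putput functions, and phrases are assumed to be joined by single spaces.

```python
from typing import Dict, List, Sequence, Tuple


def to_luis_entities(spans: Sequence[Tuple[str, str]],
                     entities: Sequence[str]) -> List[Dict]:
    """Return LUIS-style entity dicts for the labels listed in 'entities'.

    Hypothetical helper for illustration; not part of putput.
    """
    # A list holding only '__ALL' selects every label.
    select_all = list(entities) == ["__ALL"]
    result: List[Dict] = []
    offset = 0
    for label, phrase in spans:
        if select_all or label in entities:
            result.append({
                "entity": label,
                "startPos": offset,
                # Inclusive: the index of the phrase's last character.
                "endPos": offset + len(phrase) - 1,
            })
        # Skip the single space that joins this phrase to the next one.
        offset += len(phrase) + 1
    return result


# Hypothetical input mirroring the utterance used in the example above.
token_spans = [
    ("ADD", "can she get"), ("ITEM", "fries"),
    ("ADD", "can she get"), ("ITEM", "fries"),
    ("CONJUNCTION", "and"), ("ITEM", "fries"),
]
item_entities = to_luis_entities(token_spans, ["ITEM"])
all_entities = to_luis_entities(token_spans, ["__ALL"])
```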

putput.presets.stochastic module

putput.presets.stochastic.preset(*, chance: int = 20) → Callable

Randomly replaces words with synonyms from wordnet synsets.

Tags each word in the utterance with nltk’s part-of-speech tagger. Each word is then replaced, subject to the specified chance, with a randomly chosen word from the first synset that has the same part of speech as the word. If no synset with the same part of speech exists, the original word is left unchanged.

Downloads nltk’s wordnet, punkt, and averaged_perceptron_tagger if they are not already present on the host.

Parameters: chance – The probability, as a percentage in [0, 100], that each word is replaced by a synonym.
Returns: A Callable that when called returns parameters for instantiating a Pipeline. This Callable can be passed into putput.Pipeline as the ‘preset’ argument.

Examples

>>> from pathlib import Path
>>> from putput.pipeline import Pipeline
>>> pattern_def_path = Path(__file__).parent.parent.parent / 'tests' / 'doc' / 'example_pattern_definition.yml'
>>> dynamic_token_patterns_map = {'ITEM': ('fries',)}
>>> p = Pipeline.from_preset(preset(chance=100),
...                          pattern_def_path,
...                          dynamic_token_patterns_map=dynamic_token_patterns_map,
...                          seed=0)
>>> generator = p.flow(disable_progress_bar=True)
>>> for utterance, tokens, groups in generator:
...     print(utterance)
...     print(tokens)
...     print(groups)
...     break
can she acquire chips can she acquire french-fried_potatoes and french_fries
('[ADD(can she acquire)]', '[ITEM(chips)]',
 '[ADD(can she acquire)]', '[ITEM(french-fried_potatoes)]',
 '[CONJUNCTION(and)]', '[ITEM(french_fries)]')
('{[ADD(can she acquire)] [ITEM(chips)]}',
 '{[ADD(can she acquire)] [ITEM(french-fried_potatoes)]}',
 '{[CONJUNCTION(and)]}', '{[ITEM(french_fries)]}')
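
The role of ‘chance’ can be illustrated without nltk. The sketch below swaps the part-of-speech tagging and WordNet lookup for a hand-written synonym table, so it shows only the per-word percentage check. `replace_with_synonyms` and `SYNONYMS` are hypothetical names, not part of putput.

```python
import random
from typing import Mapping, Sequence

# Hypothetical synonym table standing in for WordNet synsets.
SYNONYMS: Mapping[str, Sequence[str]] = {
    "get": ("acquire",),
    "fries": ("chips", "french_fries"),
}


def replace_with_synonyms(utterance: str,
                          synonyms: Mapping[str, Sequence[str]],
                          chance: int,
                          rng: random.Random) -> str:
    """Replace each word that has synonyms with probability chance/100.

    Hypothetical helper for illustration; not part of putput.
    """
    if not 0 <= chance <= 100:
        raise ValueError("chance must be in [0, 100]")
    words = []
    for word in utterance.split():
        candidates = synonyms.get(word)
        # randrange(100) is uniform over 0..99, so the comparison below
        # succeeds with probability chance/100.
        if candidates and rng.randrange(100) < chance:
            words.append(rng.choice(candidates))
        else:
            # No synonyms known, or the draw failed: keep the word.
            words.append(word)
    return " ".join(words)


rng = random.Random(0)  # seeded so repeated runs make the same choices
unchanged = replace_with_synonyms("can she get fries", SYNONYMS, 0, rng)
replaced = replace_with_synonyms("can she get fries", SYNONYMS, 100, rng)
```

With chance=0 the utterance is returned untouched, and with chance=100 every word that has an entry in the table is replaced, while words without an entry are always kept.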

Module contents