dstk.adaptors package

Submodules

dstk.adaptors.adaptors module

This module provides function decorators that adapt the input types of processing functions to improve flexibility and composability across workflows.

Specifically, it includes:

  • accepts_sentences_and_collocates: Allows a function to seamlessly handle both individual token sequences and lists of such sequences (e.g., sentences or collocate groups).

  • accepts_tags: Allows functions designed for plain tokens to accept and return POS-tagged inputs (POSTaggedWord), preserving tag alignment.

These adaptors make it easier to integrate diverse data types into a unified processing pipeline without requiring duplication of logic.

dstk.adaptors.adaptors.accepts_sentences_and_collocates(method: Callable[[...], T]) → Callable[[...], list[T] | T]

Decorator that allows a function to accept either a single input (e.g., a list of tokens or collocates) or a list of such inputs (e.g., sentences or collocate groups). If a list of inputs is passed, the function is applied to each element in the list, and a list of results is returned.

If the input is not a list of sentences or collocates, the function is applied normally.

Parameters:

method (Callable[..., T]) – The function to wrap.

Returns:

A wrapped function that handles both single and batched inputs.

Return type:

Callable[…, list[T] | T]
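
The dispatch described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the name batched and the batch test (a non-empty list whose elements are all lists) are assumptions made for the example, and the real decorator may detect sentences and collocate groups differently.

```python
from __future__ import annotations

from functools import wraps
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def batched(method: Callable[..., T]) -> Callable[..., list[T] | T]:
    """Hypothetical stand-in for accepts_sentences_and_collocates."""

    @wraps(method)
    def wrapper(tokens: Any, *args: Any, **kwargs: Any):
        # Assumption: a batch is a non-empty list whose items are all lists.
        is_batch = (
            isinstance(tokens, list)
            and len(tokens) > 0
            and all(isinstance(item, list) for item in tokens)
        )
        if is_batch:
            # Apply the method to each element and collect the results.
            return [method(item, *args, **kwargs) for item in tokens]
        # Anything else is passed through unchanged.
        return method(tokens, *args, **kwargs)

    return wrapper


@batched
def lowercase(tokens: list[str]) -> list[str]:
    return [token.lower() for token in tokens]


single = lowercase(["The", "Cat"])
batch = lowercase([["The", "Cat"], ["A", "Dog"]])
```

The decorated function is written once for a single token list; callers can pass either shape and get a result of the matching shape back.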

dstk.adaptors.adaptors.accepts_tags(method: Callable[[...], T]) → Callable[[...], list[POSTaggedWord] | T]

Decorator that allows a function designed to operate on plain tokens to also handle POS-tagged word inputs (i.e., sequences of POSTaggedWord).

The wrapper extracts the tokens from the tagged input, applies the method to them, and re-attaches the POS tags to the result. If the number of returned tokens does not match the original length, the POS tags are inferred from a lowercase mapping of the original words.

Parameters:

method (Callable[..., T]) – The function to wrap.

Returns:

A wrapped function that returns a POSTaggedWordList when given POS-tagged input; per the return type, other input yields the method's own result (T).

Return type:

Callable[…, POSTaggedWordList | T]
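
The strip, apply, and re-attach cycle can be sketched as follows. Everything here is illustrative: TaggedWord is a hypothetical stand-in for POSTaggedWord, with_tags is a hypothetical name, and dropping returned words that have no entry in the lowercase mapping is an assumption of this sketch, not documented behaviour.

```python
from __future__ import annotations

from functools import wraps
from typing import Any, Callable, NamedTuple


class TaggedWord(NamedTuple):
    """Hypothetical stand-in for POSTaggedWord: a word plus its POS tag."""
    word: str
    pos: str


def with_tags(method: Callable[..., list[str]]) -> Callable[..., Any]:
    """Hypothetical stand-in for accepts_tags."""

    @wraps(method)
    def wrapper(tokens: Any, *args: Any, **kwargs: Any):
        tagged = (
            isinstance(tokens, list)
            and len(tokens) > 0
            and all(isinstance(t, TaggedWord) for t in tokens)
        )
        if not tagged:
            # Plain input: behave exactly like the undecorated method.
            return method(tokens, *args, **kwargs)

        result = method([t.word for t in tokens], *args, **kwargs)

        if len(result) == len(tokens):
            # Same length: tags stay aligned by position.
            return [TaggedWord(w, t.pos) for w, t in zip(result, tokens)]

        # Different length: look tags up by lowercased original word.
        # Assumption: words without a match are dropped.
        tag_by_word = {t.word.lower(): t.pos for t in tokens}
        return [
            TaggedWord(w, tag_by_word[w.lower()])
            for w in result
            if w.lower() in tag_by_word
        ]

    return wrapper


@with_tags
def lower(tokens: list[str]) -> list[str]:
    return [t.lower() for t in tokens]


@with_tags
def drop_articles(tokens: list[str]) -> list[str]:
    return [t for t in tokens if t.lower() != "the"]


sentence = [TaggedWord("The", "DET"), TaggedWord("Cat", "NOUN"), TaggedWord("sleeps", "VERB")]
lowered = lower(sentence)
filtered = drop_articles(sentence)
```

One consequence of a lowercase mapping is that two original words differing only in case share a single tag, so the fallback path is less precise than positional alignment.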

dstk.adaptors.typeguards module

Provides a set of type guard functions to safely and explicitly check the types of various token and workflow-related objects.

These functions help with runtime type checking and enable more precise type hinting and static analysis when working with linguistic data structures such as:

  • POS-tagged word lists

  • Collocates lists

  • Sentences (token or string sequences)

  • Workflow step definitions

  • Token-based collocates

By using these type guards, code can branch safely based on the structure and types of input data, improving robustness and developer experience.

Example:

if is_pos_tags(tokens):
    # tokens is now narrowed to POSTaggedWordList type
    process_pos_tags(tokens)

dstk.adaptors.typeguards.is_collocates(tokens: Any) → TypeGuard[list[tuple[Word, ...]]]

Checks if the input is a list of collocate tuples, where each tuple contains strings or Token instances, excluding types like POSTaggedWord or Bigram.

Parameters:

tokens (Any) – The object to check.

Returns:

True if tokens is a non-empty list of tuples of strings or Token instances (excluding POSTaggedWord and Bigram), otherwise False.

Return type:

bool

dstk.adaptors.typeguards.is_pos_tags(tokens: Any) → TypeGuard[list[POSTaggedWord]]

Checks if the input is a list of POS-tagged words (POSTaggedWordList).

Parameters:

tokens (Any) – The object to check.

Returns:

True if tokens is a non-empty list where all elements are instances of POSTaggedWord, otherwise False.

Return type:

bool

dstk.adaptors.typeguards.is_sentence(tokens: Any) → TypeGuard[list[list[Word]] | list[list[tuple[Word, ...]] | list[tuple[Word, tuple[str, str]]] | list[POSTaggedWord] | list[Bigram]]]

Checks if the input is a list of sentences, where each sentence is either:

  • A list of Token instances,

  • A list of strings, or

  • A list of POSTaggedWord instances.

Parameters:

tokens (Any) – The object to check.

Returns:

True if tokens matches the described sentence structure, otherwise False.

Return type:

bool
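
The three sentence shapes listed above can be checked roughly as follows. This is a sketch under assumptions: Token and TaggedWord are hypothetical stand-ins, each sentence is required to be homogeneous, an empty outer list is rejected, and an empty inner list is accepted. None of these edge cases is specified in the description above.

```python
from __future__ import annotations

from typing import Any, NamedTuple


class Token:
    """Hypothetical stand-in for the library's Token type."""

    def __init__(self, text: str) -> None:
        self.text = text


class TaggedWord(NamedTuple):
    """Hypothetical stand-in for POSTaggedWord."""
    word: str
    pos: str


# The element types a single sentence may consist of.
SENTENCE_KINDS = (Token, str, TaggedWord)


def is_uniform_sentence(sentence: Any) -> bool:
    # A sentence is a list whose elements all share one of the allowed kinds.
    return isinstance(sentence, list) and any(
        all(isinstance(word, kind) for word in sentence)
        for kind in SENTENCE_KINDS
    )


def looks_like_sentences(tokens: Any) -> bool:
    # Assumption: an empty outer list does not count as a list of sentences.
    if not isinstance(tokens, list) or not tokens:
        return False
    return all(is_uniform_sentence(sentence) for sentence in tokens)
```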

dstk.adaptors.typeguards.is_token_collocates(collocates: tuple[Word, ...]) → TypeGuard[tuple[Token, ...]]

Checks if the input collocates tuple consists exclusively of Token instances and excludes Bigram and POSTaggedWord types.

Parameters:

collocates (Collocates) – The collocates tuple to check.

Returns:

True if all elements in collocates are Token instances and not Bigram or POSTaggedWord, otherwise False.

Return type:

bool

dstk.adaptors.typeguards.is_workflow(workflow: Any) → TypeGuard[list[dict[str, dict[str, Any]]]]

Checks if the input is a workflow structure, i.e., a non-empty list of dictionaries where each dictionary maps string method names to argument dictionaries with string keys.

Parameters:

workflow (Any) – The object to check.

Returns:

True if workflow matches the workflow structure, otherwise False.

Return type:

bool

Module contents