dstk.adaptors package
Submodules
dstk.adaptors.adaptors module
This module provides function decorators that adapt the input types of processing functions to improve flexibility and composability across workflows.
Specifically, it includes:
accepts_sentences_and_collocates: Allows a function to seamlessly handle both individual token sequences and lists of such sequences (e.g., sentences or collocate groups).
accepts_tags: Allows functions designed for plain tokens to accept and return POS-tagged inputs (POSTaggedWord), preserving tag alignment.
These adaptors make it easier to integrate diverse data types into a unified processing pipeline without requiring duplication of logic.
- dstk.adaptors.adaptors.accepts_sentences_and_collocates(method: Callable[[...], T]) → Callable[[...], list[T] | T]
Decorator that allows a function to accept either a single input (e.g., a list of tokens or collocates) or a list of such inputs (e.g., sentences or collocate groups). If a list of inputs is passed, the function is applied to each element in the list, and a list of results is returned.
Otherwise, the function is called once on the input as given and its result is returned unchanged.
- Parameters:
method (Callable[..., T]) – The function to wrap.
- Returns:
A wrapped function that handles both single and batched inputs.
- Return type:
Callable[…, list[T] | T]
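The single-versus-batched dispatch described above can be illustrated with a minimal sketch. This is not the dstk implementation: `accepts_batches` and `is_batch` are hypothetical names, and the batch test used here (a non-empty list whose items are all lists or tuples) is an assumption standing in for the library's own sentence and collocate checks.

```python
from __future__ import annotations

from functools import wraps
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def is_batch(value: Any) -> bool:
    """Hypothetical check: a non-empty list whose items are all lists or tuples."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(item, (list, tuple)) for item in value)
    )


def accepts_batches(method: Callable[..., T]) -> Callable[..., list[T] | T]:
    """Hypothetical decorator: map `method` over a batch, else call it once."""

    @wraps(method)
    def wrapper(tokens: Any, *args: Any, **kwargs: Any) -> Any:
        if is_batch(tokens):
            # Batched input: apply the function to each element in turn.
            return [method(item, *args, **kwargs) for item in tokens]
        # Single input: call the function normally.
        return method(tokens, *args, **kwargs)

    return wrapper


@accepts_batches
def lowercase(tokens: list[str]) -> list[str]:
    return [token.lower() for token in tokens]


single = lowercase(["The", "Cat"])
batched = lowercase([["The", "Cat"], ["A", "Dog"]])
```

With this sketch, `single` is one processed token list and `batched` is a list of processed token lists, so the decorated function needs no batching logic of its own.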
- dstk.adaptors.adaptors.accepts_tags(method: Callable[[...], T]) → Callable[[...], list[POSTaggedWord] | T]
Decorator that allows a function designed to operate on plain tokens to also handle POS-tagged word inputs (i.e., sequences of POSTaggedWord).
The function will automatically extract the token part, apply the method, and then re-attach the POS tags to the result. If the number of returned tokens does not match the original length, the POS tags are inferred from a lowercase mapping of the original words.
- Parameters:
method (Callable[..., T]) – The function to wrap.
- Returns:
A wrapped function that accepts POS-tagged input and, for such input, returns a POSTaggedWordList.
- Return type:
Callable[…, POSTaggedWordList | T]
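The extract, apply, and re-attach behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `TaggedWord` is a hypothetical stand-in for POSTaggedWord (its real structure is not shown on this page), `accepts_tagged` is a hypothetical name, and the `"UNK"` fallback for words missing from the lowercase mapping is an assumption.

```python
from __future__ import annotations

from functools import wraps
from typing import Any, Callable, NamedTuple


class TaggedWord(NamedTuple):
    """Hypothetical stand-in for POSTaggedWord: a word plus its POS tag."""

    word: str
    pos: str


def accepts_tagged(method: Callable[..., list[str]]) -> Callable[..., Any]:
    """Hypothetical decorator: let a plain-token function take tagged input."""

    @wraps(method)
    def wrapper(tokens: Any, *args: Any, **kwargs: Any) -> Any:
        is_tagged = (
            isinstance(tokens, list)
            and len(tokens) > 0
            and all(isinstance(token, TaggedWord) for token in tokens)
        )
        if not is_tagged:
            # Plain input: call the function unchanged.
            return method(tokens, *args, **kwargs)

        words = [token.word for token in tokens]
        result = method(words, *args, **kwargs)

        if len(result) == len(words):
            # Same length: tags stay aligned by position.
            return [TaggedWord(w, t.pos) for w, t in zip(result, tokens)]

        # Length changed: recover tags from a lowercase word-to-tag mapping.
        tag_by_word = {token.word.lower(): token.pos for token in tokens}
        # Assumption: words not found in the mapping get the tag "UNK".
        return [TaggedWord(w, tag_by_word.get(w.lower(), "UNK")) for w in result]

    return wrapper


@accepts_tagged
def lowercase(tokens: list[str]) -> list[str]:
    return [token.lower() for token in tokens]


@accepts_tagged
def drop_articles(tokens: list[str]) -> list[str]:
    return [token for token in tokens if token.lower() not in {"the", "a"}]


tagged = [TaggedWord("The", "DET"), TaggedWord("Cat", "NOUN"), TaggedWord("sleeps", "VERB")]
same_length = lowercase(tagged)
shorter = drop_articles(tagged)
plain = lowercase(["The", "Cat"])
```

Note that a lowercase mapping cannot distinguish two occurrences of the same word carrying different tags; in this sketch the last occurrence wins.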
dstk.adaptors.typeguards module
Provides a set of type guard functions to safely and explicitly check the types of various token and workflow-related objects.
These functions help with runtime type checking and enable more precise type hinting and static analysis when working with linguistic data structures such as:
POS-tagged word lists
Collocates lists
Sentences (token or string sequences)
Workflow step definitions
Token-based collocates
By using these type guards, code can branch safely based on the structure and types of input data, improving robustness and developer experience.
Example:
if is_pos_tags(tokens):
    # tokens is now narrowed to POSTaggedWordList type
    process_pos_tags(tokens)
- dstk.adaptors.typeguards.is_collocates(tokens: Any) → TypeGuard[list[tuple[Word, ...]]]
Checks if the input is a list of collocate tuples, where each tuple contains strings or Token instances, excluding types like POSTaggedWord or Bigram.
- Parameters:
tokens (Any) – The object to check.
- Returns:
True if tokens is a non-empty list of tuples of strings or Token instances (excluding POSTaggedWord and Bigram), otherwise False.
- Return type:
bool
- dstk.adaptors.typeguards.is_pos_tags(tokens: Any) → TypeGuard[list[POSTaggedWord]]
Checks if the input is a list of POS-tagged words (POSTaggedWordList).
- Parameters:
tokens (Any) – The object to check.
- Returns:
True if tokens is a non-empty list where all elements are instances of POSTaggedWord, otherwise False.
- Return type:
bool
- dstk.adaptors.typeguards.is_sentence(tokens: Any) → TypeGuard[list[list[Word]] | list[list[tuple[Word, ...]] | list[tuple[Word, tuple[str, str]]] | list[POSTaggedWord] | list[Bigram]]]
Checks if the input is a list of sentences, where each sentence is either:
A list of Token instances,
A list of strings, or
A list of POSTaggedWord instances.
- Parameters:
tokens (Any) – The object to check.
- Returns:
True if tokens matches the described sentence structure, otherwise False.
- Return type:
bool
- dstk.adaptors.typeguards.is_token_collocates(collocates: tuple[Word, ...]) → TypeGuard[tuple[Token, ...]]
Checks if the input collocates tuple consists exclusively of Token instances and excludes Bigram and POSTaggedWord types.
- Parameters:
collocates (Collocates) – The collocates tuple to check.
- Returns:
True if all elements in collocates are Token instances and not Bigram or POSTaggedWord, otherwise False.
- Return type:
bool
- dstk.adaptors.typeguards.is_workflow(workflow: Any) → TypeGuard[list[dict[str, dict[str, Any]]]]
Checks if the input is a workflow structure, i.e., a non-empty list of dictionaries where each dictionary maps string method names to argument dictionaries with string keys.
- Parameters:
workflow (Any) – The object to check.
- Returns:
True if workflow matches the workflow structure, otherwise False.
- Return type:
bool