dstk.templates package#

Submodules#

dstk.templates.rules module#

Defines type-based exclusion rules that constrain which methods can be applied to data at different stages in a workflow.

Each rule maps a data type (e.g., POSTaggedWordList, Sentences, str, Neighbors) to a set of module-specific restrictions. These rules help ensure that operations are semantically valid and compatible with the current data representation, enabling type-aware validation and error handling during workflow execution.

Structure:

Each rule is a RulesTemplate (dict) where:

  • Keys are module names (e.g., “tokenizer”, “text_processor”).

  • Values define methods to exclude (either a list of method names or “*” for all).

The TypeRules dictionary aggregates all individual rules and serves as a centralized configuration for type-based behavior enforcement.
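The structure above can be sketched as a small, self-contained example. The specific type names, module names, and method names below are illustrative assumptions based on this description, not the actual dstk definitions:

```python
# Hypothetical rule templates: each maps a module name to the methods
# excluded for a given data type ("*" means all methods are excluded).
POSTaggedWordListRules = {
    "tokenizer": "*",                      # re-tokenizing tagged words is invalid
    "text_processor": ["lower", "join"],   # only these methods are excluded
}

SentencesRules = {
    "matrix_builder": "*",  # sentences must be tokenized before matrix building
}

# TypeRules aggregates the per-type rules into one lookup table.
TypeRules = {
    "POSTaggedWordList": POSTaggedWordListRules,
    "Sentences": SentencesRules,
}

def is_excluded(data_type: str, module: str, method: str) -> bool:
    """Return True if `method` of `module` is excluded for `data_type`."""
    excluded = TypeRules.get(data_type, {}).get(module)
    if excluded is None:
        return False
    return excluded == "*" or method in excluded
```

A validator can then reject, for example, any `tokenizer` call once the data is already a `POSTaggedWordList`, while leaving unlisted types unrestricted.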

Use case: These rules are primarily used in the validation step of workflow builders to prevent method misuse based on data type.

dstk.templates.templates module#

Defines reusable templates that specify the structure and constraints of workflows and step-based pipelines.

Each template outlines the allowed sequence of method steps for a given module, enforcing constraints such as:

  • Which methods are permitted or excluded at each step (include / exclude)

  • Whether methods can be used more than once (repeat)

  • Whether more than one method can be selected at each step (chaining)

  • How types are transformed via triggers
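The four constraint kinds above can be illustrated with a hypothetical step list. The field names (`include`, `exclude`, `repeat`, `chaining`, `triggers`) mirror the terms in this description, but the exact schema and method names are assumptions, not the real dstk template format:

```python
# A sketch of a tokenizer workflow template as a list of step constraints.
TokenizerTemplateSketch = [
    {   # Step 1: model selection — exactly one method, used once.
        "include": ["load_spacy", "load_nltk"],
        "repeat": False,
        "chaining": False,
    },
    {   # Step 2: unit selection — triggers record the resulting data type.
        "include": ["to_sentences", "to_tokens"],
        "repeat": False,
        "chaining": False,
        "triggers": {"to_sentences": "Sentences", "to_tokens": "TokenList"},
    },
    {   # Step 3: token processing — any method except `detokenize`,
        # and several methods may be chained or repeated.
        "exclude": ["detokenize"],
        "repeat": True,
        "chaining": True,
    },
]
```

Under this reading, a builder walking the template would check each chosen method against the step's include/exclude lists and update the tracked data type whenever a trigger fires.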

Key Components:

  • Workflow Templates (WorkflowTemplate): Describe valid method sequences for individual modules (e.g., tokenization, text processing, dimensionality reduction).

  • Stage Templates (StageTemplate): Group related modules into stages to define multi-module workflows.

  • Stage Modules (StageModules): Define allowed module names for each stage in a stage-based workflow.

These templates are used by WorkflowBuilder and StageWorkflowBuilder to validate workflows and enforce correct sequencing of operations.
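How stage templates group modules can be sketched with a toy check. The stage and module names below are invented for illustration, and this simple function only approximates the kind of validation the builders perform:

```python
# Hypothetical stage-to-modules mapping in the spirit of StageModules.
STAGE_MODULES = {
    "preprocessing": ["tokenizer", "text_processor"],
    "modeling": ["matrix_builder", "dim_reduction"],
    "visualization": ["plot_embeddings"],
}

def validate_stage_sequence(workflow, stage_modules):
    """Toy check: every (stage, module) pair must use an allowed module."""
    for stage, module in workflow:
        if module not in stage_modules.get(stage, []):
            return False
    return True
```

For instance, a workflow that tries to run `tokenizer` inside the `modeling` stage would be rejected, while the same module under `preprocessing` passes.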

Examples of Defined Templates:

  • TokenizerTemplate: Defines the tokenization workflow including model selection, unit selection (sentences/tokens), and token processing.

  • TextProcessorTemplate: Defines generic text processing steps like lowercasing, joining, etc.

  • TextMatrixBuilderTemplate: Specifies the steps to create document-term and co-occurrence matrices.

  • PlotEmbeddingsTemplate: Governs how word embeddings are plotted after clustering.

The templates provide a flexible and declarative way to define what each step in a workflow is allowed to do based on processing intent and data type.
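The declarative validation described above can be demonstrated end to end with a toy checker. The per-step `include`/`exclude` schema is an assumption mirroring the constraint list earlier in this page, not the actual dstk implementation:

```python
# A minimal template: one include-constrained step, one exclude-constrained step.
TEMPLATE = [
    {"include": ["lower", "strip"]},   # step 1: only these methods allowed
    {"exclude": ["lower"]},            # step 2: any method except `lower`
]

def validate_sequence(steps, template):
    """Check a proposed method sequence against a step-template list."""
    if len(steps) != len(template):
        return False
    for method, rule in zip(steps, template):
        if "include" in rule and method not in rule["include"]:
            return False
        if "exclude" in rule and method in rule["exclude"]:
            return False
    return True
```

A sequence such as `["lower", "strip"]` satisfies both steps, while `["lower", "lower"]` fails at step 2 because `lower` is excluded there.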

Module contents#