dstk.data_visualization package
Submodules
dstk.data_visualization.plot_embeddings module
Visualization utilities for word embeddings using UMAP dimensionality reduction.
This module provides a tool for projecting high-dimensional word embeddings into 2D or 3D space for visualization. It uses the UMAP algorithm to reduce dimensionality while approximately preserving the local neighborhood structure of the embeddings. As a result, words that lie close together in the original vector space tend to stay close together in the plot, which lets users visually explore how words relate to one another.
Core functionalities include:
- Projecting high-dimensional embeddings into both 2D and 3D coordinate systems.
- Generating interactive scatter plots using Plotly for intuitive exploration.
- Automatically identifying and coloring clusters based on input data.
- Customizing UMAP parameters (e.g., number of neighbors, distance metrics) to refine the projection.
- Exporting visualizations as standalone HTML files for easy sharing or offline viewing.
This module is designed to help linguists and digital humanities researchers interpret word embeddings by identifying semantic clusters and patterns through visual inspection.
- dstk.data_visualization.plot_embeddings.plot_embeddings(embeddings: DataFrame, n_dimensions: Literal[2, 3] = 2, labels: bool = False, show: bool = True, path: str | None = None, n_neighbors: int = 15, projection_metric: str = 'cosine', min_dist: float = 0.1, approximate: int | None = None) → Figure
Generates a 2D or 3D visualization of word embeddings using UMAP.
- Parameters:
embeddings (DataFrame) – DataFrame containing word embeddings.
n_dimensions (Literal[2, 3]) – Output dimensionality: 2 or 3.
labels (bool) – If True, display word labels on points.
show (bool) – If True, display the plot.
path (str | None) – If provided, save the figure as an HTML file.
n_neighbors (int) – Number of neighbors used by UMAP.
projection_metric (str) – Distance metric used by UMAP (e.g. “cosine”).
min_dist (float) – Minimum distance between points in the low-dimensional projection; smaller values let UMAP pack points into tighter clusters.
approximate (int | None) – If set, subsamples the embeddings for faster computation.
- Returns:
Plotly figure with the projected embeddings.
- Return type:
Figure