Imageomics/emb-explorer


Image Embedding Explorer

Visual exploration and clustering tool for image embeddings. Users can either bring pre-calculated embeddings to explore, or use the interface to embed their images and then explore those embeddings.

Screenshots

Embed & Explore: Embedding Interface, Cluster Summary
Precalculated Embedding Exploration: Smart Filtering, Interactive Exploration, Taxonomy Tree

Features

Embed & Explore - Embed images using pretrained models (CLIP, BioCLIP), cluster with K-Means, visualize with PCA/t-SNE/UMAP, and repartition images by cluster.

Precalculated Embeddings - Load Parquet files (or directories of Parquet files) with precomputed embeddings, apply dynamic cascading filters, and explore clusters with taxonomy tree navigation. See Data Format for the expected schema and Backend Pipeline for how embeddings flow through clustering and visualization.
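To make the clustering step mentioned above more concrete, here is a minimal, dependency-free sketch of K-Means (Lloyd's algorithm) on a handful of toy vectors. It is an illustration only, not the app's code: the function name `kmeans`, the farthest-point initialization, and the toy data are all assumptions made for this example. The app itself delegates clustering to its cuML, FAISS, or sklearn backends.

```python
from math import dist


def kmeans(points, k, iters=20):
    """Toy K-Means on a list of equal-length float tuples (illustrative only)."""
    # Deterministic farthest-point initialization: start from the first point,
    # then repeatedly add the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


if __name__ == "__main__":
    # Two well-separated groups standing in for image embeddings.
    toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    labels, centroids = kmeans(toy, k=2)
    print(labels)
```

Real embeddings have hundreds of dimensions and many more rows, which is why the optimized backends matter; the assignment and update steps are the same idea at a larger scale.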

Installation

git clone https://github.com/Imageomics/emb-explorer.git
cd emb-explorer

# Using uv (recommended)
uv venv && source .venv/bin/activate
uv pip install -e .

GPU Acceleration (optional)

A GPU is not required; everything works on CPU out of the box. If you have an NVIDIA GPU with CUDA, however, clustering and dimensionality reduction (K-Means, t-SNE, UMAP) will be significantly faster via cuML.

# CUDA 12.x 
uv pip install -e ".[gpu-cu12]"

# CUDA 13.x
uv pip install -e ".[gpu-cu13]"

The app auto-detects GPU availability at runtime and falls back to CPU if anything goes wrong, so no configuration is needed. You can also manually select a backend (cuML, FAISS, sklearn) in the sidebar.
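The fallback behavior described above can be pictured as a "try each backend in order" loop. The sketch below is an assumption about the general pattern, not the app's actual detection logic: `pick_backend` is a hypothetical helper, and it only checks that a module imports cleanly, which is a weaker test than confirming a usable GPU.

```python
import importlib


def pick_backend(preferred=("cuml", "faiss", "sklearn")):
    """Return the name of the first module in `preferred` that imports cleanly.

    Hypothetical helper for illustration. Returns None if nothing imports.
    """
    for name in preferred:
        try:
            importlib.import_module(name)
        except Exception:
            # Missing package, broken CUDA install, driver mismatch, etc.
            # Any failure simply means "try the next backend".
            continue
        return name
    return None


if __name__ == "__main__":
    backend = pick_backend()
    print(f"Selected backend: {backend}")
```

Catching a broad `Exception` is deliberate here: GPU libraries can fail at import time for reasons other than `ImportError`, and the goal is to degrade to CPU rather than crash.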

Usage

Standalone Apps

# Embed & Explore - Interactive image embedding and clustering
streamlit run apps/embed_explore/app.py

# Precalculated Embeddings - Explore precomputed embeddings from parquet
streamlit run apps/precalculated/app.py

Entry Points (after pip install)

emb-embed-explore    # Launch Embed & Explore app
emb-precalculated    # Launch Precalculated Embeddings app
list-models          # List available embedding models

Example Data

An example dataset (data/example_1k.parquet) is provided with BioCLIP 2 embeddings for testing. Please see the data README for more information about this sample set.

Remote HPC Usage

# On compute node
streamlit run apps/precalculated/app.py --server.port 8501

# On local machine (port forwarding)
ssh -N -L 8501:<COMPUTE_NODE>:8501 <USER>@<LOGIN_NODE>

# Access at http://localhost:8501
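If the page does not load after setting up the tunnel, it can help to confirm that something is actually listening on the forwarded port before debugging further. The snippet below is a small sketch using only the Python standard library; `port_is_open` is a hypothetical helper name, and the host and port defaults simply mirror the example above.

```python
import socket


def port_is_open(host="localhost", port=8501, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Port 8501 is reachable; open http://localhost:8501")
    else:
        print("Nothing is listening on localhost:8501; check the ssh tunnel")
```

A successful check only shows that the local end of the tunnel accepts connections; if the Streamlit app is not running on the compute node, the browser may still fail to load the page.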

Acknowledgements

OpenCLIP | Streamlit | Altair

About

An interactive tool for classifying images with a pretrained model and exploring clustering results in 2D space.
