Runpod Flash Examples

A collection of example applications showcasing Runpod Flash, a framework for building production-ready AI applications with distributed GPU and CPU computing.

What is Flash?

Flash is a Python framework that lets you run functions on Runpod's Serverless infrastructure with a single decorator. Write code locally and deploy globally: Flash handles provisioning, scaling, and routing automatically.

from runpod_flash import Endpoint, GpuType

@Endpoint(name="image-gen", gpu=GpuType.NVIDIA_GEFORCE_RTX_4090, dependencies=["torch", "diffusers"])
async def generate_image(prompt: str) -> bytes:
    # This runs on a cloud GPU, not your laptop
    ...
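The decorator above attaches deployment settings (name, GPU, dependencies) to an ordinary async function. To make that pattern concrete, here is a toy decorator that records its keyword arguments in a registry and then runs the function in-process. `toy_endpoint` and `REGISTRY` are hypothetical names for illustration only; this is not how Flash is implemented, and it does no remote execution.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable

# Hypothetical registry standing in for "deployed endpoints".
# Flash's real internals are not described in this README.
REGISTRY: dict[str, dict[str, Any]] = {}


def toy_endpoint(name: str, **config: Any):
    """Record deployment config for an async function, then run it locally."""

    def decorate(fn: Callable[..., Awaitable[Any]]):
        # Remember which function this endpoint wraps, plus its settings.
        REGISTRY[name] = {"function": fn.__name__, **config}

        @functools.wraps(fn)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            # A real remote runner would serialize args and ship them to a
            # worker here; this toy simply awaits the function in-process.
            return await fn(*args, **kwargs)

        return wrapper

    return decorate


@toy_endpoint(name="echo", gpu="hypothetical-gpu", dependencies=["numpy"])
async def echo(prompt: str) -> str:
    return prompt.upper()


print(asyncio.run(echo("hello")))        # HELLO
print(REGISTRY["echo"]["dependencies"])  # ['numpy']
```

Because `functools.wraps` is used, the decorated function keeps its original name and docstring, which matters for any tooling that discovers endpoints by inspecting functions.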

Key features:

@Endpoint decorator: Mark any async function to run on serverless infrastructure
Auto-scaling: Scale to zero when idle, scale up under load
Local development: flash run starts a local server with hot reload
One-command deploy: flash deploy packages and ships your code

Prerequisites

Python 3.10+
uv: Install with curl -LsSf https://astral.sh/uv/install.sh | sh
Runpod account: Sign up here

Quick Start

# Clone and install
git clone https://github.com/runpod/flash-examples.git
cd flash-examples
uv sync && uv pip install -e .

# Authenticate with Runpod
uv run flash login

# Run all examples locally
uv run flash run

Open http://localhost:8888/docs to explore all endpoints.

Using pip, poetry, or conda? See DEVELOPMENT.md for alternative setups.

Examples

Category	Example	Description
Getting Started	01_hello_world	Basic GPU worker
	02_cpu_worker	CPU-only worker
	03_mixed_workers	GPU + CPU pipeline
	04_dependencies	Dependency management
ML Inference	01_text_to_speech	Qwen3-TTS model serving
Advanced	05_load_balancer	HTTP routing with load balancer
Scaling	01_autoscaling	Worker autoscaling configuration
Data	01_network_volumes	Persistent storage with network volumes

More examples coming soon in each category.

CLI Commands

flash login                # Authenticate with Runpod (opens browser)
flash run                  # Run development server (localhost:8888)
flash build                # Build deployment package
flash deploy --env <name>  # Build and deploy to environment
flash undeploy <name>      # Delete deployed endpoint

See CLI-REFERENCE.md for complete documentation.

Key Concepts

Endpoint

The Endpoint class configures functions for execution on Runpod's serverless infrastructure:

Queue-based (one function = one endpoint):

from runpod_flash import Endpoint, GpuType

@Endpoint(name="my-worker", gpu=GpuType.NVIDIA_GEFORCE_RTX_4090, workers=(0, 3), dependencies=["torch"])
async def process(data: dict) -> dict:
    import torch
    # this code runs on Runpod GPUs
    return {"result": "processed"}
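A useful mental model for a queue-based endpoint: each call becomes a job in a queue, and up to the maximum number of workers (3 for `workers=(0, 3)`) pull jobs and run the function. The local simulation below uses only `asyncio`; `handler`, `worker`, and `simulate` are hypothetical names, and this is a sketch of the concept rather than how Runpod actually schedules jobs.

```python
import asyncio


async def handler(data: dict) -> dict:
    # Stand-in for the body of the decorated function.
    await asyncio.sleep(0.01)
    return {"result": data["x"] * 2}


async def worker(queue: asyncio.Queue, results: dict) -> None:
    # Each worker repeatedly pulls the next job and stores its output.
    while True:
        job_id, payload = await queue.get()
        try:
            results[job_id] = await handler(payload)
        finally:
            queue.task_done()


async def simulate(jobs: list[dict], max_workers: int = 3) -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    results: dict = {}
    for job_id, payload in enumerate(jobs):
        queue.put_nowait((job_id, payload))
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(max_workers)]
    await queue.join()  # wait until every queued job is marked done
    for w in workers:
        w.cancel()      # idle workers are still blocked on queue.get()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


print(asyncio.run(simulate([{"x": n} for n in range(5)])))
```

Results are keyed by job ID because jobs can finish in any order once several workers run concurrently.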

Load-balanced (multiple routes, shared workers):

from runpod_flash import Endpoint

api = Endpoint(name="my-api", cpu="cpu3c-1-2", workers=(1, 3))

@api.get("/health")
async def health():
    return {"status": "ok"}

@api.post("/compute")
async def compute(data: dict) -> dict:
    return {"result": data}
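In the load-balanced style, several HTTP routes share one pool of workers. The toy class below mimics that shape with `get`/`post` decorators that register handlers by method and path, plus a round-robin choice of worker per request. `ToyApi` and `dispatch` are hypothetical and only illustrate the routing idea; Flash's actual routing and balancing strategy are not documented here.

```python
import asyncio
import itertools
from typing import Any, Awaitable, Callable

Handler = Callable[..., Awaitable[Any]]


class ToyApi:
    """Hypothetical stand-in: several routes share one pool of workers."""

    def __init__(self, workers: int) -> None:
        self.routes: dict[tuple[str, str], Handler] = {}
        # Round-robin over worker IDs to mimic a simple load balancer.
        self._next_worker = itertools.cycle(range(workers))

    def _register(self, method: str, path: str):
        def decorate(fn: Handler) -> Handler:
            self.routes[(method, path)] = fn
            return fn
        return decorate

    def get(self, path: str):
        return self._register("GET", path)

    def post(self, path: str):
        return self._register("POST", path)

    async def dispatch(self, method: str, path: str, *args: Any) -> tuple[int, Any]:
        handler = self.routes.get((method, path))
        if handler is None:
            raise LookupError(f"no route for {method} {path}")
        worker_id = next(self._next_worker)
        return worker_id, await handler(*args)


api = ToyApi(workers=2)


@api.get("/health")
async def health():
    return {"status": "ok"}


@api.post("/compute")
async def compute(data: dict) -> dict:
    return {"result": data}


async def main() -> None:
    print(await api.dispatch("GET", "/health"))              # (0, {'status': 'ok'})
    print(await api.dispatch("POST", "/compute", {"a": 1}))  # (1, {'result': {'a': 1}})
    print(await api.dispatch("GET", "/health"))              # (0, {'status': 'ok'})


asyncio.run(main())
```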

Client mode (connect to an existing endpoint):

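import asyncio

from runpod_flash import Endpoint

async def main():
    ep = Endpoint(id="ep-abc123")            # ID of an already-deployed endpoint
    job = await ep.run({"prompt": "hello"})  # submit a job
    await job.wait()                         # wait for the job to finish
    print(job.output)

asyncio.run(main())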
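When unit-testing code that uses client mode, it can help to swap in a local fake with the same shape. The sketch below mirrors only the three members shown above (`run`, `wait`, `output`); `FakeEndpoint` and `FakeJob` are hypothetical, and the real job object may expose more attributes or behave differently (for example, in how it reports errors).

```python
import asyncio
from typing import Any, Callable


class FakeJob:
    """Hypothetical stand-in with the run/wait/output shape used above."""

    def __init__(self, task: "asyncio.Task[Any]") -> None:
        self._task = task
        self.output: Any = None  # populated only after wait()

    async def wait(self) -> None:
        self.output = await self._task


class FakeEndpoint:
    """Runs a local handler instead of calling a deployed endpoint."""

    def __init__(self, handler: Callable[[dict], Any]) -> None:
        self._handler = handler

    async def run(self, payload: dict) -> FakeJob:
        async def _execute() -> Any:
            await asyncio.sleep(0)  # yield control, as a remote call would
            return self._handler(payload)

        return FakeJob(asyncio.create_task(_execute()))


async def main() -> None:
    ep = FakeEndpoint(lambda p: {"echo": p["prompt"]})
    job = await ep.run({"prompt": "hello"})
    await job.wait()
    print(job.output)  # {'echo': 'hello'}


asyncio.run(main())
```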

Resource Types

GPU Workers (gpu=):

Type	GPU (VRAM)
`GpuType.NVIDIA_GEFORCE_RTX_4090`	RTX 4090 (24GB)
`GpuType.NVIDIA_RTX_6000_ADA_GENERATION`	RTX 6000 Ada (48GB)
`GpuType.NVIDIA_A100_80GB_PCIe`	A100 (80GB)
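Choosing a GPU usually comes down to how much VRAM the model needs. The helper below picks the listed GPU with the least VRAM that still fits a requirement, using only the three rows in the table above. It works on member names as strings so it runs without Flash installed; `GPU_VRAM_GB` and `smallest_gpu_for` are hypothetical helpers, and estimating the model's VRAM requirement is left to you.

```python
# VRAM per GPU type, copied from the table above (values in GB).
GPU_VRAM_GB = {
    "NVIDIA_GEFORCE_RTX_4090": 24,
    "NVIDIA_RTX_6000_ADA_GENERATION": 48,
    "NVIDIA_A100_80GB_PCIe": 80,
}


def smallest_gpu_for(required_gb: float) -> str:
    """Return the GpuType member name with the least VRAM that still fits."""
    fitting = [(vram, name) for name, vram in GPU_VRAM_GB.items() if vram >= required_gb]
    if not fitting:
        raise ValueError(f"no listed GPU has {required_gb} GB of VRAM")
    return min(fitting)[1]  # smallest VRAM first


print(smallest_gpu_for(16))  # NVIDIA_GEFORCE_RTX_4090
print(smallest_gpu_for(40))  # NVIDIA_RTX_6000_ADA_GENERATION
```

Since the examples access members as attributes (`GpuType.NVIDIA_GEFORCE_RTX_4090`), `getattr(GpuType, smallest_gpu_for(40))` should return the matching member, assuming the names above match the installed version of `runpod_flash`.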

CPU Workers (cpu=):

Type	Specs
`cpu3g-2-8`	2 vCPU, 8GB RAM
`cpu3c-4-8`	4 vCPU, 8GB RAM (Compute)
`cpu5c-4-16`	4 vCPU, 16GB RAM (Latest)
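The CPU type strings in this table, and `cpu3c-1-2` in the load-balanced example, appear to follow the pattern `cpu<generation><class>-<vCPU>-<RAM in GB>`. That pattern is an inference from these rows, not a documented format. Under that assumption, a small parser can sanity-check a type string before you pass it to `cpu=`; `parse_cpu_type` and `CpuSpec` are hypothetical names.

```python
import re
from typing import NamedTuple

# ASSUMED pattern, inferred from the rows above: cpu<gen><class>-<vCPU>-<RAM GB>.
CPU_TYPE = re.compile(r"^cpu(?P<gen>\d+)(?P<cls>[a-z])-(?P<vcpu>\d+)-(?P<ram>\d+)$")


class CpuSpec(NamedTuple):
    generation: int
    family: str  # single letter, e.g. "c" for the Compute rows
    vcpus: int
    ram_gb: int


def parse_cpu_type(name: str) -> CpuSpec:
    match = CPU_TYPE.match(name)
    if match is None:
        raise ValueError(f"unrecognized CPU type: {name!r}")
    return CpuSpec(int(match["gen"]), match["cls"], int(match["vcpu"]), int(match["ram"]))


print(parse_cpu_type("cpu5c-4-16"))
# CpuSpec(generation=5, family='c', vcpus=4, ram_gb=16)
```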

Auto-Scaling

Workers automatically scale based on demand:

workers=(0, 3) - Scale from 0 to 3 workers (cost-efficient)
workers=(1, 5) - Keep 1 warm, scale up to 5
idle_timeout=5 - Minutes a worker can sit idle before it is scaled down
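The `(min, max)` tuple bounds the worker count: load decides how many workers are wanted, and the result is clamped to the range. After `idle_timeout` minutes without work, extra workers are released back toward the minimum. The functions below sketch that behavior; `desired_workers`, `after_idle_check`, and the `jobs_per_worker` parameter are hypothetical, and this is a simplified policy, not Runpod's actual scaling algorithm.

```python
import math


def desired_workers(pending_jobs: int, bounds: tuple[int, int], jobs_per_worker: int = 1) -> int:
    """Workers needed for the current load, clamped to a (min, max) range like workers=(0, 3)."""
    low, high = bounds
    needed = math.ceil(pending_jobs / jobs_per_worker)
    return max(low, min(high, needed))


def after_idle_check(current: int, bounds: tuple[int, int], idle_minutes: float, idle_timeout: float = 5) -> int:
    """Drop back to the minimum once workers have been idle for idle_timeout minutes."""
    low, _ = bounds
    return low if idle_minutes >= idle_timeout else current


print(desired_workers(0, (0, 3)))   # 0  (scaled to zero)
print(desired_workers(10, (0, 3)))  # 3  (capped at max)
print(desired_workers(0, (1, 5)))   # 1  (one worker kept warm)
print(after_idle_check(4, (1, 5), idle_minutes=6))  # 1
```

The trade-off is the one the list describes: a minimum of 0 costs nothing while idle but the first request after a quiet period must wait for a worker to start, while a minimum of 1 keeps a warm worker at the cost of paying for it.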

Resources

Contributing

See CONTRIBUTING.md for contribution guidelines and DEVELOPMENT.md for development setup.

License

MIT License - see LICENSE for details.

