Efficient Triton Kernels for LLM Training
FlagGems is an operator library for large language models implemented in the Triton Language.
LLM notes covering model inference, transformer model structure, and LLM framework code analysis.
A lightweight LLaMA-like LLM inference framework built on Triton kernels.
Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels.
A "standard library" of Triton kernels.
Manifold-Constrained Hyper-Connections with fused Triton kernels for efficient training
Educational resource demonstrating common GPU programming pitfalls and solutions using Triton kernels.
Official code for the paper "ELMO: Efficiency via Low-precision and Peak Memory Optimization in Large Output Spaces" (ICML 2025).
KernelHeim – a development ground for custom Triton and CUDA kernels designed to optimize and accelerate machine learning workloads on NVIDIA GPUs. Inspired by the mythical stronghold of the gods, KernelHeim is a forge where high-performance kernels are crafted to unlock the full potential of the hardware.
A container of various PyTorch neural network modules written in Triton.
Repository for learning Triton GPU programming
FlashAttention2 Analysis in Triton
💥 Optimize linear attention models with efficient Triton-based implementations in PyTorch, compatible across NVIDIA, AMD, and Intel platforms.
Yandex LLM Scaling Week 2025
A memory-efficient and CUDA-independent Triton implementation of Sparse Convolution, optimized for high-performance 3D Perception.
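Most repositories above build on the same Triton programming model: a kernel is written once and launched as a grid of program instances, each handling one fixed-size tile of the data. A minimal pure-Python emulation of that tiling idea (illustrative only; real Triton code uses `@triton.jit`, `tl.program_id`, and masked `tl.load`/`tl.store`, and the block size below is an arbitrary choice):

```python
BLOCK_SIZE = 4  # tile width; real kernels tune this per GPU

def vector_add_kernel(x, y, out, pid, block_size):
    """Emulates one Triton program instance: computes one tile of x + y."""
    start = pid * block_size
    # Clamp to the array length, as masked tl.load/tl.store would do
    # for out-of-bounds lanes in the final partial tile.
    for i in range(start, min(start + block_size, len(x))):
        out[i] = x[i] + y[i]

def vector_add(x, y):
    """Launches a 1D 'grid' of program instances, one per tile."""
    out = [0.0] * len(x)
    num_programs = (len(x) + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceil division
    for pid in range(num_programs):
        vector_add_kernel(x, y, out, pid, BLOCK_SIZE)
    return out

print(vector_add([1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0, 40.0, 50.0]))
# → [11.0, 22.0, 33.0, 44.0, 55.0]
```

On a GPU the loop over `pid` disappears: every program instance runs in parallel, which is why block size and masking are the central tuning and correctness concerns in the kernel libraries listed here.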