From `web/`:

```
npm run test:e2e
```

Useful options:

- `npm run test:e2e:headed` to watch the browser.
- `npm run test:e2e:ui` to use the Playwright UI.

Notes:

- The runner starts `vite` automatically at `http://127.0.0.1:5173`.
- HTML report: `web/playwright-report/`.
- Failure artifacts (screenshots/videos/traces): `web/test-results/`.
Base:

```
uv run python scripts/play_pygame.py
```

Main flags:

- `--mode {play,spectate}`
- `--agent1 {human,random,heuristic,model}`
- `--agent2 {human,random,heuristic,model}`
- `--level1 {easy,normal,hard}` heuristic level for P1
- `--level2 {easy,normal,hard}` heuristic level for P2
- `--level {easy,normal,hard}` default level for both (fallback)
- `--ckpt <path>` model checkpoint
- `--sims <int>` MCTS simulations
- `--device {auto,cpu,cuda}`
- `--seed <int>` use `-1` for non-deterministic

Examples:

```
uv run python scripts/play_pygame.py --mode play --agent1 human --agent2 heuristic --level2 normal
uv run python scripts/play_pygame.py --mode spectate --agent1 heuristic --agent2 heuristic --level1 easy --level2 hard
uv run python scripts/play_pygame.py --mode spectate --agent1 model --agent2 heuristic --level2 hard --ckpt checkpoints/last.ckpt --sims 220
```

The main entrypoint is now:

```
uv run python train.py
```

`train_improved.py` is kept as a compatibility wrapper.
To avoid running a GitHub Action for days, the flow is split into two workflows:

- `train-runpod-start.yml`: creates/starts the training pod and finishes quickly.
- `train-runpod-reconcile.yml`: checks the pod state (manually or on a cron) and destroys the pod when training ends.

Required GitHub secrets:

- `RUNPOD_API_TOKEN`
- `HF_TOKEN`
- `PULUMI_ACCESS_TOKEN`

Recommended GitHub variable (Repository variables):

- `RUNPOD_TRAIN_STACK` (example: `dieg0code/train`)

Start training:

```
gh workflow run train-runpod-start.yml \
  --ref main \
  -f stack=dieg0code/train \
  -f pod_name=ataxx-zero-train \
  -f gpu_type_id="NVIDIA GeForce RTX 4090" \
  -f cloud_type=SECURE \
  -f image_name="runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04" \
  -f repository=dieg0code/ataxx-zero \
  -f git_ref=main \
  -f hf_repo_id=dieg0code/ataxx-zero \
  -f hf_run_id=policy_spatial_v1 \
  -f train_args="--no-onnx --quiet --devices 1 --strategy auto --num-workers 4 --keep-local-ckpts 2 --keep-log-versions 1 --hf --iterations 40 --episodes 70 --sims 600 --epochs 5 --batch-size 512 --lr 9e-4 --weight-decay 1e-4 --save-every 3 --opp-self 0.45 --opp-heuristic 0.50 --opp-random 0.05 --opp-heu-easy 0.00 --opp-heu-normal 0.25 --opp-heu-hard 0.75 --model-swap-prob 0.5 --selfplay-workers 8 --monitor-log-every 3"
```

Reconcile manually (to force a check/destroy):

```
gh workflow run train-runpod-reconcile.yml \
  --ref main \
  -f stack=dieg0code/train
```

Operational notes:

- The pod keeps training on RunPod even after the Action finishes.
- The reconcile workflow destroys pods in a terminal state to stop billing.
- HF checkpoints are separated by `--hf-run-id` so model lines do not get mixed.
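The reconcile step boils down to "destroy the pod only once it is terminal". A minimal sketch of that logic, where `get_pod_status`, `terminate_pod`, and the state names are hypothetical stand-ins for the real RunPod API calls made by the workflow:

```python
# Assumption: these state names approximate RunPod's terminal pod states.
TERMINAL_STATES = {"COMPLETED", "FAILED", "EXITED", "TERMINATED"}

def reconcile(pod_id, get_pod_status, terminate_pod):
    """Destroy the pod once it reaches a terminal state; return True if destroyed."""
    if get_pod_status(pod_id) in TERMINAL_STATES:
        terminate_pod(pod_id)  # cut billing as soon as training ends
        return True
    return False  # pod still training: leave it alone
```

Running this on a cron means a forgotten pod is billed for at most one cron interval after training ends.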
Use dependency groups so each environment installs only what it needs.
- Base only:

```
uv sync
```

- API runtime (+dev tools):

```
uv sync --group api --group dev
```

- Training (+dev tools):

```
uv sync --group train --group dev
```

- Pygame UI (+dev tools):

```
uv sync --group ui --group dev
```

- ONNX export (+dev tools):

```
uv sync --group export --group dev
```

- Full environment (all groups):

```
uv sync --all-groups
```

Training flags:

- `--iterations <int>`
- `--episodes <int>` episodes per iteration
- `--sims <int>` MCTS simulations
- `--epochs <int>`
- `--batch-size <int>`
- `--lr <float>`
- `--weight-decay <float>`
- `--save-every <int>`
- `--seed <int>`
- `--checkpoint-dir <path>`
- `--log-dir <path>`
- `--onnx-path <path>`
- `--no-onnx` disable ONNX export at checkpoint time
- `--quiet` less console output (recommended for Kaggle)
- `--keep-local-ckpts <int>` local manual checkpoints to keep
- `--keep-log-versions <int>` TensorBoard versions to keep
- `--devices <int>` trainer devices (GPUs/accelerator processes)
- `--strategy <name>` Lightning strategy (`auto`, `ddp`, etc.)
- `--num-workers <int>` DataLoader workers
- `--persistent-workers` keeps workers alive between epochs (if `num-workers > 0`)
- `--no-persistent-workers` disables the above
- `--strict-probs` validates that the percentages sum to exactly 1.0
- `--no-eval` disables periodic evaluation
- `--eval-every <int>` evaluate every N iterations
- `--eval-games <int>` number of evaluation games
- `--eval-sims <int>` MCTS simulations during evaluation
- `--eval-heuristic-level {easy,normal,hard}` heuristic opponent for evaluation
- `--opp-self <float>` weight of the `self` opponent (model vs itself)
- `--opp-heuristic <float>` weight of the heuristic opponent
- `--opp-random <float>` weight of the random opponent
- `--opp-heuristic-level {easy,normal,hard}` heuristic level in the pool
- `--opp-heu-easy <float>` weight of `easy` within the heuristic pool
- `--opp-heu-normal <float>` weight of `normal` within the heuristic pool
- `--opp-heu-hard <float>` weight of `hard` within the heuristic pool
- `--model-swap-prob <float>` probability of swapping sides (P1/P2) per episode
- `--verbose`
- `--hf` enable Hugging Face upload
- `--hf-repo-id <org_or_user/repo>`
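The opponent-pool flags compose a two-level weighted draw: first the opponent kind (`self`/`heuristic`/`random`), then, for heuristic opponents, the level. A sketch of that sampling, assuming the weights feed `random.choices` directly (the real trainer's implementation may differ):

```python
import random

def sample_opponent(opp_self=0.85, opp_heuristic=0.12, opp_random=0.03,
                    heu_easy=0.05, heu_normal=0.20, heu_hard=0.75,
                    strict=False, rng=random):
    """Pick (kind, heuristic_level) for one self-play episode."""
    weights = [opp_self, opp_heuristic, opp_random]
    if strict and abs(sum(weights) - 1.0) > 1e-9:
        # mirrors the intent of --strict-probs
        raise ValueError("opponent weights must sum to exactly 1.0")
    kind = rng.choices(["self", "heuristic", "random"], weights=weights)[0]
    if kind != "heuristic":
        return kind, None
    level = rng.choices(["easy", "normal", "hard"],
                        weights=[heu_easy, heu_normal, heu_hard])[0]
    return kind, level
```

Note that `random.choices` normalizes the weights internally, which is why `--strict-probs` is opt-in: non-normalized weights still work, they are just harder to read.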
Examples:

Quick smoke run:

```
uv run python train.py --iterations 2 --episodes 8 --epochs 1 --sims 80 --batch-size 64 --save-every 1 --verbose
```

Kaggle clean run (low logs + auto cleanup):

```
uv run python train.py --no-onnx --quiet --keep-local-ckpts 2 --keep-log-versions 1 --iterations 20 --episodes 50 --sims 300 --epochs 4 --batch-size 96 --lr 1e-3 --weight-decay 1e-4 --save-every 3
```

Kaggle 2x T4 (use both GPUs):

```
uv run python train.py --no-onnx --quiet --devices 2 --strategy ddp --keep-local-ckpts 2 --keep-log-versions 1 --hf --hf-repo-id your_user/ataxx-zero --iterations 40 --episodes 70 --sims 420 --epochs 5 --batch-size 96 --lr 9e-4 --weight-decay 1e-4 --save-every 3
```

Stable Kaggle run with opponent pool (recommended):

```
uv run python train.py --no-onnx --quiet --devices 1 --strategy auto --keep-local-ckpts 2 --keep-log-versions 1 --hf --hf-repo-id your_user/ataxx-zero --iterations 40 --episodes 70 --sims 420 --epochs 5 --batch-size 96 --lr 9e-4 --weight-decay 1e-4 --save-every 3 --opp-self 0.80 --opp-heuristic 0.15 --opp-random 0.05 --opp-heu-easy 0.20 --opp-heu-normal 0.50 --opp-heu-hard 0.30 --model-swap-prob 0.5
```

Stable Kaggle run + automatic evaluation + best checkpoint:

```
uv run python train.py --no-onnx --quiet --devices 1 --strategy auto --num-workers 3 --persistent-workers --keep-local-ckpts 2 --keep-log-versions 1 --hf --hf-repo-id your_user/ataxx-zero --iterations 40 --episodes 70 --sims 420 --epochs 5 --batch-size 96 --lr 9e-4 --weight-decay 1e-4 --save-every 3 --strict-probs --eval-every 3 --eval-games 12 --eval-sims 220 --eval-heuristic-level hard --opp-self 0.85 --opp-heuristic 0.12 --opp-random 0.03 --opp-heu-easy 0.05 --opp-heu-normal 0.20 --opp-heu-hard 0.75 --model-swap-prob 0.5
```

If your environment is missing ONNX tooling, use:

```
uv run python train.py --no-onnx ...
```

Standard local run:

```
uv run python train.py --iterations 20 --episodes 50 --epochs 5 --sims 400 --batch-size 128 --lr 1e-3 --weight-decay 1e-4
```

With Hugging Face checkpoint upload:

```
# set token first (PowerShell)
$env:HF_TOKEN="your_token_here"
uv run python train.py --hf --hf-repo-id your_user/ataxx-zero --save-every 5
```

Use this when you have a good GPU session and want the best quality per run.
```
uv run python train.py \
  --iterations 40 \
  --episodes 120 \
  --sims 600 \
  --epochs 8 \
  --batch-size 128 \
  --lr 8e-4 \
  --weight-decay 1e-4 \
  --save-every 5 \
  --verbose \
  --hf --hf-repo-id your_user/ataxx-zero
```

Good quality/speed balance for regular experimentation.
```
uv run python train.py \
  --iterations 20 \
  --episodes 50 \
  --sims 300 \
  --epochs 4 \
  --batch-size 128 \
  --lr 1e-3 \
  --weight-decay 1e-4 \
  --save-every 5
```

Use this to verify the pipeline, logging, checkpoints, and no-NaN behavior quickly.
```
uv run python train.py \
  --iterations 3 \
  --episodes 10 \
  --sims 80 \
  --epochs 1 \
  --batch-size 64 \
  --save-every 1 \
  --verbose
```

When you want stronger targets (MCTS) but cheaper gradient updates.
```
uv run python train.py \
  --iterations 18 \
  --episodes 70 \
  --sims 500 \
  --epochs 3 \
  --batch-size 96 \
  --lr 9e-4 \
  --save-every 3
```

If you already have HF checkpoints and only want to continue training.
```
$env:HF_TOKEN="your_token_here"
uv run python train.py \
  --iterations 30 \
  --episodes 40 \
  --sims 250 \
  --epochs 3 \
  --hf --hf-repo-id your_user/ataxx-zero
```

Notes for Colab/Kaggle:

- If runtime time is limited, prioritize lowering `--episodes` first, then `--iterations`.
- `--sims` has the biggest impact on self-play quality and runtime.
- If you hit memory limits, lower `--batch-size` to `96` or `64`.
- Keep `--save-every` small (`3` to `5`) when using temporary sessions.
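The advice above follows from a rough cost model: self-play work scales with iterations × episodes × sims. This sketch assumes self-play dominates total runtime and scales linearly in each factor, which is an approximation, not a measurement:

```python
def relative_selfplay_cost(iterations, episodes, sims):
    """Rough relative cost, assuming self-play dominates and scales linearly."""
    return iterations * episodes * sims

smoke = relative_selfplay_cost(3, 10, 80)       # smoke-test profile
standard = relative_selfplay_cost(20, 50, 300)  # standard local profile
print(standard / smoke)  # 125.0: the standard run costs ~125x the smoke run
```

This is why cutting `--episodes` (a linear factor applied every iteration) buys back time faster than trimming a few iterations at the end.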
Best choice: decent-but-fast profile.

```
uv run python train.py \
  --iterations 16 \
  --episodes 35 \
  --sims 220 \
  --epochs 3 \
  --batch-size 96 \
  --lr 1e-3 \
  --save-every 4
```

Best choice: strong self-play with moderate epochs.
```
uv run python train.py \
  --iterations 24 \
  --episodes 60 \
  --sims 380 \
  --epochs 4 \
  --batch-size 128 \
  --lr 9e-4 \
  --save-every 4
```

Best choice: pro mode.
```
uv run python train.py \
  --iterations 45 \
  --episodes 130 \
  --sims 700 \
  --epochs 8 \
  --batch-size 192 \
  --lr 8e-4 \
  --weight-decay 1e-4 \
  --save-every 5 \
  --hf --hf-repo-id your_user/ataxx-zero
```

Quick rule:

- T4: prioritize shorter runs and checkpoint often.
- L4: use as the default if available.
- A100: maximize self-play quality (`--sims`, `--episodes`) and use a larger batch.
Install the API environment:

```
uv sync --group api --group dev
```

Run the server:

```
uv run uvicorn api.app:app --app-dir src --host 0.0.0.0 --port 8000 --reload
```

Web UI (browser): `http://127.0.0.1:8000/web`

The web UI is a first playable version (Human P1 vs AI P2) and calls `POST /api/v1/gameplay/move` for AI decisions.
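A minimal Python client for that endpoint, using only the stdlib. The request payload shape (`board` grid plus `current_player`) is an assumption for illustration; check the API schema for the real field names:

```python
import json
import urllib.request

def build_move_payload(board, current_player):
    """Serialize the request body; these field names are assumptions."""
    return json.dumps({"board": board, "current_player": current_player}).encode()

def request_ai_move(board, current_player, base_url="http://127.0.0.1:8000"):
    """POST the position to the gameplay endpoint and return the parsed response."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/gameplay/move",
        data=build_move_payload(board, current_player),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # expected to contain the AI's chosen move
```

For example, `request_ai_move([[0] * 7 for _ in range(7)], 1)` against a running server would submit an empty 7x7 board with player 1 to move.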
Scaffold location: `web/`

Install dependencies:

```
cd web
npm install
```

Run in development:

```
npm run dev
```

Build production assets:
```
npm run build
```

This repo deploys the app as a single service (FastAPI + static frontend) using the root Dockerfile and GitHub Actions with the Railway CLI.

Workflow: `.github/workflows/deploy-railway-app.yml`

It triggers automatically on push to main/master when any of these change:

- `src/**`
- `web/**`
- `Dockerfile`
- `pyproject.toml`
- `uv.lock`
- `alembic/**`
- `alembic.ini`

Required GitHub secrets:

- `RAILWAY_TOKEN`
- `RAILWAY_PROJECT_ID`
- `RAILWAY_ENVIRONMENT_ID`
- `RAILWAY_SERVICE_ID`

Command used by the workflow:

```
railway up --ci --project $RAILWAY_PROJECT_ID --environment $RAILWAY_ENVIRONMENT_ID --service $RAILWAY_SERVICE_ID
```

You can set this up without hunting through the UI:

- Create the project/service once in Railway (if it does not exist yet).
- Get the IDs with the local CLI:

```
railway login
railway link
railway status
```

- Save `RAILWAY_TOKEN`, `RAILWAY_PROJECT_ID`, `RAILWAY_ENVIRONMENT_ID`, and `RAILWAY_SERVICE_ID` as GitHub Secrets.

Notes:

- `.railwayignore` reduces the context uploaded on deploy.
- The final container serves the API and the static frontend on the same domain.
- Pulumi is kept for the RunPod training infrastructure.
Design direction:

- Brand: underbyteLabs - ataxx-zero
- Mobile-first layout
- Public ranking
- Multi-skin theme system: `terminal-neo`, `amber-crt`, `oxide-red`
Build the API image (multi-stage, runtime target):

```
docker build -t ataxx-api:latest --target runtime .
```

Run API + Postgres with compose:

```
docker compose up --build
```

The API will be available at `http://127.0.0.1:8000`.

Model checkpoint handling in Docker:

- The default compose file mounts local `./checkpoints` into the container as read-only.
- The API expects the checkpoint at `MODEL_CHECKPOINT_PATH` (default `/app/checkpoints/last.ckpt`).
- If the checkpoint is missing, inference endpoints return `503`.
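Clients can treat that `503` as "model not loaded yet" and retry, e.g. while a checkpoint volume is still being mounted. A sketch, with `call` as a stand-in for the actual HTTP request returning `(status, body)`:

```python
import time

def call_with_model_wait(call, retries=3, delay=2.0):
    """Retry while the API reports 503 (no checkpoint available yet)."""
    status, body = call()
    for _ in range(retries - 1):
        if status != 503:
            break
        time.sleep(delay)  # give the container time to mount the checkpoint
        status, body = call()
    return status, body
```

If all retries still return `503`, the caller gets the last response and can surface "model unavailable" instead of a generic failure.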
Optional: bake the checkpoint into the image:

```
docker build -t ataxx-api:with-model --target runtime-with-model .
```

Then run without the checkpoint volume (or keep it mounted).

Optional: bake ONNX into the image:

```
docker build -t ataxx-api:with-onnx --target runtime-with-onnx .
```

Health check:

```
curl http://127.0.0.1:8000/health
```

Readiness check (includes DB connectivity):

```
curl http://127.0.0.1:8000/health/ready
```

CORS configuration (via .env):

```
APP_CORS_ORIGINS=["http://localhost:5173"]
APP_CORS_ALLOW_CREDENTIALS=true
APP_CORS_ALLOW_METHODS=["*"]
APP_CORS_ALLOW_HEADERS=["*"]
```

Observability configuration (via .env):

```
APP_LOG_LEVEL="INFO"
APP_LOG_JSON=true
APP_LOG_REQUESTS=true
```

When `APP_LOG_REQUESTS=true`, each request logs method/path/status/duration/request_id.
Alembic is configured in this repo for SQLModel metadata under src/api/db/models.
Install dependencies:

```
uv sync --group api --group dev
```

Check migration status:

```
uv run alembic current
uv run alembic heads
```

Apply all migrations:

```
uv run alembic upgrade head
```

Create a new migration after changing models:

```
uv run alembic revision --autogenerate -m "describe change"
```

Roll back one migration:

```
uv run alembic downgrade -1
```

PowerShell shortcut script:

```
.\scripts\db.ps1 up
.\scripts\db.ps1 down
.\scripts\db.ps1 new "add user profile fields"
.\scripts\db.ps1 current
.\scripts\db.ps1 heads
```

Notes:

- Alembic reads the DB connection from `.env` through `api.config.settings`.
- Use migrations (`alembic upgrade`) for shared/prod DBs; `init_db()` remains useful only for isolated tests and local ephemeral DBs.
List endpoints now use a common paginated shape:

```
{
  "items": [],
  "total": 0,
  "limit": 20,
  "offset": 0,
  "has_more": false
}
```

Supported list endpoints:

- `GET /api/v1/gameplay/games?limit=20&offset=0`
- `GET /api/v1/training/samples?limit=100&offset=0`
- `GET /api/v1/model-versions?limit=50&offset=0`
- `GET /api/v1/ranking/leaderboard/{season_id}?limit=100&offset=0`
- `GET /api/v1/identity/users?limit=50&offset=0` (admin)

Examples:

```
curl "http://127.0.0.1:8000/api/v1/gameplay/games?limit=10&offset=0" -H "Authorization: Bearer <ACCESS_TOKEN>"
curl "http://127.0.0.1:8000/api/v1/training/samples?limit=25&offset=25&split=train"
curl "http://127.0.0.1:8000/api/v1/model-versions?limit=10&offset=0"
curl "http://127.0.0.1:8000/api/v1/ranking/leaderboard/<SEASON_ID>?limit=20&offset=0"
curl "http://127.0.0.1:8000/api/v1/identity/users?limit=10&offset=0" -H "Authorization: Bearer <ADMIN_ACCESS_TOKEN>"
```

Pagination behavior:

- `limit` is clamped per endpoint for safety.
- `offset` starts at `0`.
- `has_more=true` means you can request the next page with `offset + limit`.
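Walking a full collection with this envelope is a simple loop on `has_more`. A sketch where `fetch_page` stands in for an HTTP GET that returns the parsed JSON body:

```python
def iter_items(fetch_page, limit=20):
    """Yield every item from a paginated endpoint, page by page."""
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        yield from page["items"]
        if not page["has_more"]:
            break
        offset += limit  # next page starts at offset + limit
```

Because `limit` is clamped server-side, a robust client could also advance by `page["limit"]` (the value the server actually applied) rather than the requested one.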
Register:

```
curl -X POST "http://127.0.0.1:8000/api/v1/auth/register" \
  -H "Content-Type: application/json" \
  -d "{\"username\":\"diego\",\"email\":\"diego@example.com\",\"password\":\"supersecret123\"}"
```

Login (returns access_token + refresh_token):

```
curl -X POST "http://127.0.0.1:8000/api/v1/auth/login" \
  -H "Content-Type: application/json" \
  -d "{\"username_or_email\":\"diego\",\"password\":\"supersecret123\"}"
```

Sample login response:

```
{
  "access_token": "eyJhbGciOiJIUzI1NiIs...",
  "refresh_token": "eyJhbGciOiJIUzI1NiIs...",
  "token_type": "bearer",
  "expires_in": 1800
}
```

Get the current user (`/me`) with an access token:

```
curl "http://127.0.0.1:8000/api/v1/auth/me" \
  -H "Authorization: Bearer <ACCESS_TOKEN>"
```

Refresh tokens:

```
curl -X POST "http://127.0.0.1:8000/api/v1/auth/refresh" \
  -H "Content-Type: application/json" \
  -d "{\"refresh_token\":\"<REFRESH_TOKEN>\"}"
```

Logout (revoke refresh token):

```
curl -X POST "http://127.0.0.1:8000/api/v1/auth/logout" \
  -H "Content-Type: application/json" \
  -d "{\"refresh_token\":\"<REFRESH_TOKEN>\"}"
```

Notes:

- Send `Authorization: Bearer <ACCESS_TOKEN>` to protected endpoints.
- After `refresh`, prefer replacing both stored tokens.
- `logout` revokes the refresh token; the current access token remains valid until expiry.
- `login` and `refresh` are rate-limited (returns `429` + `Retry-After` header).
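A client-side sketch of the token-handling rules above (replace both tokens after a refresh; surface the rate limit). `post` stands in for an HTTP POST helper returning `(status, json_body)`:

```python
class TokenStore:
    """Holds the access/refresh token pair for one logged-in session."""

    def __init__(self, access_token, refresh_token):
        self.access_token = access_token
        self.refresh_token = refresh_token

    def auth_header(self):
        """Header to attach to protected endpoints."""
        return {"Authorization": f"Bearer {self.access_token}"}

    def refresh(self, post):
        status, body = post("/api/v1/auth/refresh",
                            {"refresh_token": self.refresh_token})
        if status == 429:
            raise RuntimeError("rate limited; wait for the Retry-After delay")
        # Replace BOTH stored tokens, as the notes recommend.
        self.access_token = body["access_token"]
        self.refresh_token = body["refresh_token"]
```

Replacing both tokens matters if the server rotates refresh tokens: keeping the old one would make the next refresh fail.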
Auth error examples (standard error envelope):

401 Unauthorized (missing/invalid token):

```
{
  "error_code": "unauthorized",
  "message": "Not authenticated",
  "detail": "Not authenticated",
  "request_id": "req-123"
}
```

403 Forbidden (insufficient permissions):

```
{
  "error_code": "forbidden",
  "message": "Admin privileges required.",
  "detail": "Admin privileges required.",
  "request_id": "req-456"
}
```

422 Validation Error (invalid payload):

```
{
  "error_code": "validation_error",
  "message": "Validation failed",
  "detail": "Validation failed",
  "request_id": "req-789",
  "details": [
    {
      "type": "missing",
      "loc": ["body", "password"],
      "msg": "Field required",
      "input": {}
    }
  ]
}
```

Gameplay/Matches error examples:

400 Bad Request (invalid board / illegal move):

```
{
  "error_code": "bad_request",
  "message": "Illegal move for current board state.",
  "detail": "Illegal move for current board state.",
  "request_id": "req-101"
}
```

403 Forbidden (not a participant / not your turn):

```
{
  "error_code": "forbidden",
  "message": "It is not your turn.",
  "detail": "It is not your turn.",
  "request_id": "req-102"
}
```

404 Not Found (game/match/sample/version missing):

```
{
  "error_code": "not_found",
  "message": "Game not found: <uuid>",
  "detail": "Game not found: <uuid>",
  "request_id": "req-103"
}
```

Create a game (needed for the sample FK):
```
curl -X POST http://127.0.0.1:8000/api/v1/gameplay/games -H "Content-Type: application/json" -d "{}"
```

Create one training sample:

```
curl -X POST http://127.0.0.1:8000/api/v1/training/samples -H "Content-Type: application/json" -d "{\"game_id\":\"<GAME_ID>\",\"ply\":0,\"player_side\":\"p1\",\"observation\":{\"grid\":[[0,0,0,0,0,0,0],[0,0,0,0,0,0,0],[0,0,0,0,0,0,0],[0,0,0,0,0,0,0],[0,0,0,0,0,0,0],[0,0,0,0,0,0,0],[0,0,0,0,0,0,0]],\"current_player\":1},\"policy_target\":{\"10\":1.0},\"value_target\":1.0,\"sample_weight\":1.0,\"split\":\"train\",\"source\":\"self_play\"}"
```

List samples:

```
curl "http://127.0.0.1:8000/api/v1/training/samples?limit=50&split=train"
```

Samples stats:

```
curl "http://127.0.0.1:8000/api/v1/training/samples/stats?split=train"
```

Export samples as NDJSON:

```
curl "http://127.0.0.1:8000/api/v1/training/samples/export.ndjson?split=train&limit=500" -o training_samples.ndjson
```

Export samples as NPZ:

```
curl "http://127.0.0.1:8000/api/v1/training/samples/export.npz?split=train&limit=500" -o training_samples.npz
```

Ingest samples from a finished game:

```
curl -X POST "http://127.0.0.1:8000/api/v1/training/samples/ingest-game/<GAME_ID>?split=train&source=self_play&overwrite=true"
```

Install ONNX tooling:
```
uv sync --group api --group export --group dev
```

Export the final checkpoint to ONNX:

```
uv run python scripts/export_model_onnx.py \
  --checkpoint checkpoints/last.ckpt \
  --output checkpoints/last.onnx
```

Validate torch vs ONNX parity:

```
uv run python scripts/check_onnx_parity.py \
  --checkpoint checkpoints/last.ckpt \
  --onnx checkpoints/last.onnx \
  --samples 32 \
  --policy-tol 2e-3 \
  --value-tol 2e-3
```

Enable ONNX-first inference in the API (.env):

```
MODEL_CHECKPOINT_PATH="checkpoints/last.ckpt"  # fallback + strong mode (MCTS)
MODEL_ONNX_PATH="checkpoints/last.onnx"        # fast mode preferred backend
INFERENCE_PREFER_ONNX=true
```

Runtime behavior:

- `fast` mode: uses ONNX first; falls back to the torch checkpoint if ONNX fails.
- `strong` mode: uses torch+MCTS; if the torch checkpoint is unavailable, it degrades to `fast`.
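That mode/backend selection can be summarized in a small dispatch function. This is a sketch of the documented behavior, not the actual API code; the availability flags stand in for whether each artifact loaded successfully:

```python
def pick_backend(mode, onnx_ok, torch_ok):
    """Return the backend name the API would use, or None if nothing loads."""
    if mode == "strong":
        if torch_ok:
            return "torch+mcts"
        mode = "fast"  # degrade to fast when the checkpoint is unavailable
    if mode == "fast":
        if onnx_ok:
            return "onnx"       # ONNX-first when INFERENCE_PREFER_ONNX=true
        if torch_ok:
            return "torch"      # fallback when the ONNX session fails
    return None                 # neither artifact available -> 503 upstream
```

For example, `pick_backend("strong", onnx_ok=True, torch_ok=False)` returns `"onnx"`: strong mode degrades to fast, and fast prefers ONNX.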