ICD-10 microservice, Docker stack, and CI improvements #25
Open
MelbourneDeveloper wants to merge 76 commits into main from
- Create Samples/ICD10AM folder structure for new microservice
- Add comprehensive SPEC.md with:
  - RAG search feature using MedEmbed-Large-v1 medical embedding model
  - Basic lookup with JSON and FHIR response formats
  - Mermaid ER diagram for database schema
  - API endpoint documentation
  - PostgreSQL with pgvector for vector similarity search
  - RLS (Row Level Security) via user impersonation
- Add icd10am-schema.yaml with DataProvider YAML migrations for:
  - ICD-10-AM chapters, blocks, categories, and codes
  - ACHI procedure blocks and codes
  - Embedding tables for vector storage
  - Coding standards and user search history
- Add Python import script (import_icd10am.py) to:
  - Parse IHACPA data files
  - Generate medical embeddings with MedEmbed
  - Bulk import into PostgreSQL
- Remove "Too Many Cooks" multi-agent section from CLAUDE.md

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
- Replace all SQL query files with LQL equivalents
- Add ICD10AM.Api.csproj with LQL transpilation support
- Add DataProvider.json configuration
- Add DatabaseSetup.cs and GlobalUsings.cs
- All queries now use pipeline syntax: filter, join, select, order_by

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
- Hierarchical browse: chapters, blocks, categories, codes
- Code lookup with JSON and FHIR format support
- ACHI procedure endpoints
- RAG search with embedding service integration
- Cosine similarity ranking for semantic search
- LQL transpilation enabled in csproj
- Updated DataProvider.json to use .generated.sql files

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
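The cosine-similarity ranking in this commit lives in the C# API. As a language-neutral illustration only, ranking stored code vectors against a query vector can be sketched in Python; `rank_codes` is a hypothetical helper, and the toy 3-dimensional vectors stand in for real embeddings.

```python
import heapq
import math
from typing import Mapping, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """dot(a, b) / (|a| * |b|); returns 0.0 when either vector has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_codes(
    query: Sequence[float],
    code_vectors: Mapping[str, Sequence[float]],
    k: int = 10,
) -> list[tuple[str, float]]:
    """Return up to k (code, similarity) pairs, most similar first."""
    scored = ((code, cosine_similarity(query, vector)) for code, vector in code_vectors.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy 3-dimensional vectors standing in for real embeddings (hypothetical data).
vectors = {
    "E11.9": [0.9, 0.1, 0.0],
    "I21.9": [0.0, 1.0, 0.0],
    "J45.909": [0.1, 0.0, 1.0],
}
print(rank_codes([1.0, 0.0, 0.0], vectors, k=2))
```

`heapq.nlargest` keeps only the top k scores instead of sorting every candidate, which is the same shape of work a `LIMIT k` query does when the ranking is pushed down into the database.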
- Dockerfile using MedEmbed-Small-v0.1 (384 dims, ~500MB)
- FastAPI service with /embed and /embed/batch endpoints
- docker-compose.yml for easy deployment
- Health checks and resource limits configured
- Model downloaded at build time for fast startup

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
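For exercising callers without Docker or the model download, the two routes can be mimicked with Python's standard library. This is a hypothetical stub, not the PR's FastAPI service: the JSON field names (`text`, `texts`, `embedding`, `embeddings`) are assumptions, and `fake_embed` replaces MedEmbed with a deterministic hash-based unit vector of the same 384 dimensions.

```python
import hashlib
import json
import math
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

DIMENSIONS = 384  # MedEmbed-Small-v0.1 output size, per the commit message


def fake_embed(text: str) -> list[float]:
    """Hypothetical stand-in for the model: a deterministic unit vector built from SHA-256 hashes."""
    values: list[float] = []
    counter = 0
    while len(values) < DIMENSIONS:
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        values.extend(b / 255.0 - 0.5 for b in digest)
        counter += 1
    values = values[:DIMENSIONS]
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]


class EmbedHandler(BaseHTTPRequestHandler):
    def _send_json(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            request = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            request = None
        if not isinstance(request, dict):
            self._send_json(400, {"error": "expected a JSON object"})
        elif self.path == "/embed":
            self._send_json(200, {"embedding": fake_embed(str(request.get("text", "")))})
        elif self.path == "/embed/batch":
            texts = request.get("texts", [])
            self._send_json(200, {"embeddings": [fake_embed(str(t)) for t in texts]})
        else:
            self._send_json(404, {"error": "not found"})

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging


def serve_in_background(port: int = 0) -> ThreadingHTTPServer:
    """Start the stub on 127.0.0.1 (port 0 picks a free port) in a daemon thread."""
    server = ThreadingHTTPServer(("127.0.0.1", port), EmbedHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Ignore any proxy environment variables when talking to the local stub.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def post_json(url: str, payload: dict) -> dict:
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with _opener.open(request) as response:
        return json.loads(response.read())
```

Start it with `serve_in_background()`, read the chosen port from `server.server_address[1]`, and call `server.shutdown()` when finished.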
- ICD10AMApiFactory with seeded test data
- ChapterEndpointTests: hierarchical browse tests
- CodeLookupTests: code search and FHIR format tests
- AchiEndpointTests: procedure code tests
- HealthEndpointTests: health check tests
- Real database, zero mocking

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
- generate_sample_data.py creates test SQLite database
- Includes common ICD-10-AM codes (infectious, diabetes, cardiac, respiratory, etc.)
- Includes sample ACHI procedures (angiography, appendicectomy, hip replacement, etc.)
- Note: Full ICD-10-AM data requires licensing from IHACPA

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
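A minimal sketch of what a sample-data generator like this can do with Python's built-in `sqlite3` module. The `chapter` and `code` tables and their columns are hypothetical (the real script's schema is not shown in this PR), and the three seed rows use ICD-10-CM descriptions, which differ slightly in wording from ICD-10-AM.

```python
import sqlite3

# Hypothetical minimal schema; the real generate_sample_data.py may differ.
SCHEMA = """
CREATE TABLE chapter (
    id INTEGER PRIMARY KEY,
    chapter_number TEXT NOT NULL,
    title TEXT NOT NULL,
    code_range TEXT NOT NULL
);
CREATE TABLE code (
    code TEXT PRIMARY KEY,
    description TEXT NOT NULL,
    chapter_id INTEGER NOT NULL REFERENCES chapter(id)
);
"""

CHAPTERS = [
    (4, "4", "Endocrine, nutritional and metabolic diseases", "E00-E89"),
    (9, "9", "Diseases of the circulatory system", "I00-I99"),
]

CODES = [
    ("E11.9", "Type 2 diabetes mellitus without complications", 4),
    ("I10", "Essential (primary) hypertension", 9),
    ("I21.9", "Acute myocardial infarction, unspecified", 9),
]


def create_sample_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the schema and seed rows; foreign keys are enforced per connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    with conn:  # commit both inserts as one transaction
        conn.executemany("INSERT INTO chapter VALUES (?, ?, ?, ?)", CHAPTERS)
        conn.executemany("INSERT INTO code VALUES (?, ?, ?)", CODES)
    return conn


def codes_in_chapter(conn: sqlite3.Connection, chapter_number: str) -> list[str]:
    """Hierarchical browse: every code under one chapter, ordered by code."""
    rows = conn.execute(
        """
        SELECT c.code
        FROM code c
        JOIN chapter ch ON ch.id = c.chapter_id
        WHERE ch.chapter_number = ?
        ORDER BY c.code
        """,
        (chapter_number,),
    )
    return [row[0] for row in rows]
```

Passing a file path instead of `":memory:"` produces a database file that tests can open directly.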
- Update SPEC.md with IHACPA licensing requirements
- Add .gitignore for generated files, databases, Python cache
- Note: Full ICD-10-AM data requires purchase from IHACPA

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
BREAKING: No more licensed IHACPA data!

- Add import_icd10cm.py that downloads FREE data from CMS.gov
- Successfully imports 71,704 diagnosis codes
- 19 chapters, 1,910 blocks, 1,910 categories
- Update SPEC.md to document free data sources
- Remove licensing requirements (CMS data is public domain)

Data sources:
- Primary: https://www.cms.gov/medicare/coding-billing/icd-10-codes
- Mirror: GitHub JSON gist for faster downloads

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
- Fixed syntax error in try/except blocks around IHACPA download
- Added CDC ICD-10-CM as fallback when IHACPA returns 503 errors
- Uses free US Government CDC data (74,260 codes), which shares the WHO ICD-10 base with Australian ICD-10-AM
- Script now successfully imports codes when IHACPA is unavailable

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
- Renamed folder from ICD10AM to ICD10CM to reflect the actual data source
- Simplified import script: CDC data only, no fallback
- 74,260 ICD-10-CM codes from CDC (public domain)
- Clean database schema with icd10cm_ table prefix

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
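The commit doesn't say which CDC file import_icd10cm.py reads. Assuming it is the fixed-width `icd10cm_order` text file that CMS/CDC publish (order number in columns 1-5, code without its decimal point in columns 7-13, a `0`/`1` header/billable flag in column 15, short description in columns 17-76, long description from column 78), one line could be parsed as below. `OrderRecord`, `format_code` and `parse_order_line` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderRecord:
    order: int
    code: str                # dotted display form, e.g. "A00.0"
    billable: bool           # flag "1" = valid for HIPAA transactions, "0" = header only
    short_description: str
    long_description: str


def format_code(raw: str) -> str:
    """ICD-10-CM files omit the decimal point; it belongs after the third character."""
    raw = raw.strip()
    return raw if len(raw) <= 3 else f"{raw[:3]}.{raw[3:]}"


def parse_order_line(line: str) -> OrderRecord:
    """Parse one fixed-width line (0-based slices of the 1-based column layout)."""
    line = line.rstrip("\r\n")
    return OrderRecord(
        order=int(line[0:5]),                    # columns 1-5
        code=format_code(line[6:13]),            # columns 7-13
        billable=line[14] == "1",                # column 15
        short_description=line[16:76].strip(),   # columns 17-76
        long_description=line[77:].strip(),      # column 78 to end
    )
```

Keeping the flag distinguishes header codes such as `A00` from billable leaf codes such as `A00.0`.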
- Fix DI registration for Func<HttpClient> embedding service
- Fix DatabaseSetup to skip initialization if tables exist (for tests)
- Remove unsupported 'unique' property from schema indexes
- Remove SearchCodes/SearchAchiCodes LQL files (LIKE not supported)
- Implement manual SQL search endpoints in Program.cs
- Disable AOT in LqlCli.SQLite to avoid missing ILCompiler packages
- Update GlobalUsings to remove unused Search result types
- Disable NuGet audit in Directory.Build.props for proxy issues

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
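Hand-written LIKE search needs parameter binding rather than string concatenation, plus escaping of `%` and `_` so user input is matched literally. A Python/`sqlite3` sketch of that idea follows; the actual endpoints are C# in Program.cs, and the `code` table and its columns here are hypothetical.

```python
import sqlite3


def escape_like(term: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards (% and _) and the escape character itself."""
    return (
        term.replace(escape, escape + escape)  # must run first
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


def search_codes(conn: sqlite3.Connection, term: str, limit: int = 20) -> list[tuple[str, str]]:
    """Substring search on code or description, with the term bound as a parameter."""
    pattern = f"%{escape_like(term)}%"
    return conn.execute(
        """
        SELECT code, description
        FROM code
        WHERE code LIKE ? ESCAPE '\\' OR description LIKE ? ESCAPE '\\'
        ORDER BY code
        LIMIT ?
        """,
        (pattern, pattern, limit),
    ).fetchall()
```

SQLite's `LIKE` is case-insensitive only for ASCII letters, while PostgreSQL's `LIKE` is case-sensitive (`ILIKE` is its case-insensitive form), so the same query can behave differently across the two databases.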
- generate_embeddings.py: Populates icd10cm_code_embedding table using MedEmbed-small-v0.1 model (384 dimensions)
- embedding_service.py: Runtime service for encoding user queries
- SPEC.md: Document the 3-step setup process:
  1. Import codes (import_icd10cm.py)
  2. Generate embeddings (generate_embeddings.py)
  3. Start embedding service (embedding_service.py)

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
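The commit doesn't show how generate_embeddings.py writes vectors. If the target is a pgvector column, one dependency-free option is pgvector's text input format (`[x1,x2,...]`), cast with `::vector` in a parameterised INSERT. These helpers are hypothetical, and the table and column names in the docstring are placeholders, not the PR's schema.

```python
import math
from typing import Iterable, Iterator, Sequence

DIMENSIONS = 384  # MedEmbed-small-v0.1, per the commit message


def to_pgvector_literal(values: Sequence[float]) -> str:
    """Format floats as pgvector text input, e.g. [0.5,-1.0,2.0]."""
    if not values:
        raise ValueError("pgvector vectors need at least one dimension")
    if any(math.isnan(v) or math.isinf(v) for v in values):
        raise ValueError("pgvector rejects NaN and infinite values")
    # repr() gives the shortest string that round-trips to the same Python float.
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def from_pgvector_literal(text: str) -> list[float]:
    """Parse pgvector's text output back into Python floats."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not a pgvector literal: {text!r}")
    return [float(part) for part in text[1:-1].split(",")]


def embedding_rows(
    pairs: Iterable[tuple[str, Sequence[float]]],
) -> Iterator[tuple[str, str]]:
    """Yield (code, vector literal) rows after checking the dimension.

    Each row could be bound to a statement such as
    INSERT INTO icd10cm_code_embedding (code, embedding) VALUES (%s, %s::vector)
    where the column names are placeholders.
    """
    for code, vector in pairs:
        if len(vector) != DIMENSIONS:
            raise ValueError(f"{code}: expected {DIMENSIONS} dimensions, got {len(vector)}")
        yield code, to_pgvector_literal(vector)
```

pgvector's `vector` type stores single-precision floats, so values read back from the database will not exactly equal the Python doubles that were written.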
Updated LQL query to reference icd10cm_code_embedding and icd10cm_code tables where the 74,260 embeddings are stored. Added icd10cm_code and icd10cm_code_embedding table definitions to schema and DataProvider.json. 30 E2E tests passing, RAG semantic search working with MedEmbed model. https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
Replaced Python embedding service with native C# ONNX Runtime:
- Added Microsoft.ML.OnnxRuntime and BERTTokenizers NuGet packages
- EncodeWithOnnx helper performs tokenization and mean pooling
- Updated SPEC.md with download instructions for ONNX model
- Model (127MB) excluded from git; download with optimum-cli

30 E2E tests passing; RAG search works without any Python dependency.

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
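EncodeWithOnnx is C#, but the mean-pooling step it performs is easy to show in Python. Assuming the ONNX model returns one hidden-state vector per token (shape `[tokens][dims]` for a single query) alongside the tokenizer's attention mask, pooling averages only the real tokens. The function names are hypothetical.

```python
import math
from typing import Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]], attention_mask: Sequence[int]) -> list[float]:
    """Average the vectors of real tokens (mask == 1); padding tokens (mask == 0) are skipped."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("need exactly one mask entry per token")
    if not token_embeddings:
        raise ValueError("no tokens to pool")
    totals = [0.0] * len(token_embeddings[0])
    count = 0
    for vector, mask in zip(token_embeddings, attention_mask):
        if mask:
            count += 1
            for i, value in enumerate(vector):
                totals[i] += value
    if count == 0:
        raise ValueError("attention mask contains no real tokens")
    return [total / count for total in totals]


def l2_normalize(vector: Sequence[float]) -> list[float]:
    """Scale to unit length so cosine similarity equals a plain dot product."""
    norm = math.sqrt(sum(v * v for v in vector))
    return list(vector) if norm == 0.0 else [v / norm for v in vector]
```

Whether EncodeWithOnnx also L2-normalises is not stated; `l2_normalize` is included because many sentence-embedding models expect it.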
The BERTTokenizers library requires vocabulary files in a Vocabularies directory. These files are needed for tokenizing query text before encoding with the ONNX model. https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
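The vocabulary matters because BERT-style tokenizers split each word into WordPiece sub-tokens by greedy longest-match against `vocab.txt`, and each token's line number becomes the ID fed to the model. The following is a simplified Python sketch, not the BERTTokenizers library's code; it skips the lower-casing, accent stripping and punctuation splitting a full BERT tokenizer runs first, and the function names are hypothetical.

```python
from typing import Iterable


def load_vocab(lines: Iterable[str]) -> dict[str, int]:
    """BERT-style vocab.txt: one token per line, and the line index is the token ID."""
    return {line.rstrip("\r\n"): index for index, line in enumerate(lines)}


def wordpiece(word: str, vocab: dict[str, int], unk: str = "[UNK]", max_chars: int = 100) -> list[str]:
    """Greedy longest-match-first split; continuation pieces carry the '##' prefix."""
    if len(word) > max_chars:
        return [unk]
    pieces: list[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            candidate = word[start:end] if start == 0 else "##" + word[start:end]
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # an unmatched remainder makes the whole word unknown
        pieces.append(match)
        start = end
    return pieces


def to_ids(word: str, vocab: dict[str, int]) -> list[int]:
    """Map WordPiece tokens to the integer IDs the ONNX model consumes."""
    return [vocab[piece] for piece in wordpiece(word, vocab)]
```

Without the vocabulary file there is no way to produce these IDs, which is why the directory has to ship alongside the model.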
Documents how to:
- Set up database and generate embeddings (one-time Python)
- Export ONNX model for C# runtime
- Run the API
- Run E2E tests
- Troubleshoot common issues

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
- Remove XML doc comments from top-level static functions (CS1587)
- Rewrite await using var as await using (...) {} to support ConfigureAwait(false) (CA2007)
- Update TargetFramework net9.0 -> net10.0 in Directory.Build.props and all project files
- Update package versions to 10.x equivalents (Microsoft.Data.Sqlite, Microsoft.AspNetCore.*, Microsoft.Extensions.Logging.Abstractions, System.Text.Json)
- Remove System.Text.Json and System.Data.Common explicit references (now inbox in .NET 10)
- Update CI DOTNET_VERSION to 10.0.x

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…atibility

The h5-compiler 24.x tool targets net9.0, which is not installed locally. Updated to 26.3.64893 (net10.0) in both root and Dashboard.Web local manifests. Updated h5.Target SDK and h5/h5.Core package references to 26.x accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TL;DR
Brief Details
- ICD-10 microservice (`Samples/ICD10/`): Full ICD-10-AM/ACHI code lookup API backed by PostgreSQL with pgvector. Includes a Python embedding service (`embedding-service/`) using MedEmbed for semantic RAG search, an import pipeline (`scripts/CreateDb/`) to seed codes and generate embeddings, and an interactive CLI (`ICD10.Cli/`).
- Docker stack (`Samples/docker/`): `docker-compose.yml` + `Dockerfile.app`/`Dockerfile.dashboard` + `nginx.conf` for running Clinical, Scheduling, ICD-10, Gatekeeper, Embedding Service, and Dashboard behind a single compose stack. Scripts reorganised under `Samples/scripts/`.
- CI (`.github/workflows/ci.yml`): Added `icd10-tests` job (pgvector Postgres + embedded Docker embedding service); `docker-build` job (validates app and dashboard container builds); Postgres service containers added to `gatekeeper-tests` and `sample-api-tests`; `Dashboard.Web.Tests` removed from sample matrix (replaced by `Dashboard.Integration.Tests`).
- LQL migrations: Clinical and Scheduling `.sql` queries replaced with equivalent `.lql` files; `filter_like` operator added to LQL grammar and tested across SQLite, Postgres, and SQL Server.
- Dashboard (`Dashboard.Web/`): `ClinicalCodingPage.cs` (1500+ lines) adds ICD-10 code browsing/search UI; `ApiClient.cs` centralises HTTP calls; new `Icons.cs` and CSS components.
- Tooling: 7 Claude Code skills added under `.claude/skills/`; `tasks.json` reorganised with labelled groups; `CLAUDE.md` updated with clearer rules.

How Do The Tests Prove This Works?
- `Samples/ICD10/ICD10.Api.Tests/` (new, ~2300 LOC):
  - `HealthEndpointTests` – asserts the `/health` endpoint returns 200 and a healthy status, confirming the API boots and DB migrations ran.
  - `ChapterEndpointTests` / `ChapterCategoryTests` – seed specific chapters/blocks/categories, then assert the hierarchy endpoints return the correct counts and codes, proving the schema and queries are correct.
  - `CodeLookupTests` – look up codes by exact code string, assert properties like description and billability; verifies `GetCodeByCode.lql` and the generated extension methods.
  - `SearchEndpointTests` – full-text and semantic search against seeded codes; asserts result ordering and relevance, exercising the pgvector cosine-similarity path in `SearchIcd10Codes.sql`.
  - `AchiEndpointTests` – same pattern for ACHI surgical codes, including block/chapter hierarchy.
- `Samples/ICD10/ICD10.Cli.Tests/CliE2ETests.cs` (new, ~1170 LOC):
- `Samples/Dashboard/Dashboard.Integration.Tests/Icd10E2ETests.cs` (new, ~588 LOC):
- `Lql/Lql.Tests/LqlFileBasedTests.BasicOperations.cs`: `filter_like` test case added with expected SQL output for all three DB targets (`SQLite/filter_like.sql`, `PostgreSql/filter_like.sql`, `SqlServer/filter_like.sql`), proving the new grammar rule transpiles correctly and platform-independently.