Skip to content

Update registry and generated artifacts (pre-sulo)#104

Merged
cmungall merged 3 commits intomainfrom
pre-sulo-registry-updates
Feb 28, 2026
Merged

Update registry and generated artifacts (pre-sulo)#104
cmungall merged 3 commits intomainfrom
pre-sulo-registry-updates

Conversation

@cmungall
Copy link
Collaborator

@cmungall cmungall commented Feb 27, 2026

Summary

  • Update ontology registry entries in src/semsql/builder/registry/ontologies.yaml
    • add micront
    • add SAREF/building ontologies: saref4ener, saref4bldg, hhearvs, sdoho, pathgo, brick, minsysont
    • update OEO prefixes to current HTTPS IRIs and add MENO / OEO.CCO
    • add geonames prefix under orcid
    • ensure chemrof and como use explicit robot merge build commands
  • Regenerate ontologies.Makefile and prefix artifacts (prefixes_local.csv, prefixes.csv) from the updated registry
  • Include workspace instruction updates in .claude/settings.json and AGENTS.md

CI/Packaging Migration

  • Migrate project metadata from Poetry tables to PEP 621 in pyproject.toml
  • Switch build backend to hatchling
  • Add dependency-groups.dev and commit uv.lock
  • Update workflows to standard uv usage (uv sync --dev --locked, uv build) and remove Poetry installer usage
  • Update default Makefile runner to uv run
  • Make CLI test compatible with newer Click (CliRunner() instead of mix_stderr=False)

Explicit Scope

  • This PR intentionally excludes sulo so a separate clean PR can add only the sulo configuration.

Validation

  • uv sync --dev --locked
  • UV_PYTHON=3.11 uv run tox -e flake8
  • UV_PYTHON=3.11 uv run python -m unittest discover
  • uv build

Copilot AI review requested due to automatic review settings February 27, 2026 00:03
@claude
Copy link

claude bot commented Feb 27, 2026

Claude finished @cmungall's task —— View job


Pull Request Review

I've thoroughly reviewed this PR and found it to be a well-structured update to the ontology registry. Here's my detailed assessment:

Overall Assessment: APPROVED

This PR successfully adds several new ontology integrations with proper documentation, consistent build patterns, and correctly regenerated artifacts. The changes follow the established patterns in the codebase and maintain good separation of concerns.


Code Quality & Best Practices ⭐⭐⭐⭐⭐

Excellent adherence to established patterns:

  • Consistent YAML structure: All new ontology entries in ontologies.yaml follow the established pattern with url, description, build_command, and prefixmap fields
  • Proper prefix organization: New prefixes are correctly added to both prefixes_local.csv and regenerated into prefixes.csv
  • Makefile generation: The ontologies.Makefile correctly reflects all new ontologies with proper download and build rules
  • EXTRA_ONTOLOGIES update: All new ontologies are properly added to the build target list on line 1650

Well-organized ontology groupings:

  • New "SAREF and Building Ontologies" section (lines 702-767 in ontologies.yaml) logically groups related building/IoT ontologies
  • Clear, descriptive comments for each ontology explaining their purpose

Specific Changes Analysis

1. New Ontology Integrations

Six new ontologies added with proper metadata:

  • saref4ener & saref4bldg: ETSI SAREF extensions for energy and building domains
  • hhearvs: HHEAR Value Sets for health exposure analysis
  • sdoho: Social Determinants of Health Ontology
  • pathgo: Pathogenesis Gene Ontology for cellular damage mechanisms
  • brick: Building operations schema (HVAC, sensors, equipment)
  • minsysont: Critical Minerals Ontology

2. Build Command Quality

Smart content negotiation and preprocessing:

  • SAREF ontologies use proper content negotiation: curl -H 'Accept: text/turtle'
  • Brick schema correctly strips problematic imports (lines 744-745)
  • HHEAR and SDOHO have proper text preprocessing with perl regex operations
  • Critical Minerals Ontology handles URL encoding issues in line 1637

3. Prefix Mapping Consistency

All 32 new prefixes properly propagated:

  • Lines 166-191 in prefixes_local.csv contain the new prefixes
  • Lines 229-254 in prefixes.csv correctly reflect the regenerated mappings
  • No missing or mismatched prefixes between source and generated files

Potential Issues & Recommendations

⚠️ API Key Security Concerns (Minor)

Lines 723, 731 in ontologies.yaml:

url: "https://data.bioontology.org/ontologies/HHEARVS/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb"
url: "https://data.bioontology.org/ontologies/SDOHO/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb"

Recommendation: Consider moving these API keys to environment variables for better security practices.

🧪 Test Coverage Gap (Minor)

The PR adds significant new ontology integrations but doesn't include corresponding test updates. Consider adding:

  • Tests for new prefix mappings
  • Integration tests for the new build commands
  • Validation that the new ontologies can be successfully built

🔗 Hardcoded URLs (Low Priority)

Several ontologies use hardcoded GitHub URLs with /refs/heads/master or /refs/heads/main. These could break if repositories change their default branch names.


Performance Considerations

  • Efficient build patterns: New build commands use appropriate caching with .PRECIOUS targets
  • Minimal processing: Only necessary preprocessing (content negotiation, format conversion) is applied
  • No circular dependencies: All new ontologies are independent additions

Security Assessment

  • Safe URL sources: All URLs point to trusted ontology repositories (ETSI, GitHub, BioOntology Portal)
  • No code execution: Build commands only use safe utilities (curl, robot, perl regex)
  • Proper escaping: URL parameters and regex patterns are properly quoted

Documentation Quality

The AGENTS.md file correctly mirrors CLAUDE.md - this appears to be an intentional duplication for workspace configuration. Both files contain identical, comprehensive documentation covering:

  • Project architecture and data model
  • Build pipeline and dependencies
  • Testing approach and best practices
  • Registry management workflow

Final Recommendations

  1. Consider API key externalization for the BioOntology Portal URLs
  2. Add test coverage for new ontology integrations
  3. Monitor build performance with the 6 new ontologies added to the build matrix

This is a high-quality PR that maintains excellent code organization and follows established patterns. The scope is appropriate for the pre-sulo branch as mentioned, and the exclusion of actual sulo configuration keeps this PR focused and clean.

Recommendation: ✅ APPROVE and MERGE

@claude
Copy link

claude bot commented Feb 27, 2026

Claude finished @cmungall's task —— View job


PR Review: Update registry and generated artifacts (pre-sulo)

I've thoroughly reviewed this PR and found it to be a well-structured update to the ontology registry. Here's my detailed assessment:

Overall Assessment: APPROVED

This PR successfully adds several new ontology integrations with proper documentation, consistent build patterns, and correctly regenerated artifacts. The changes follow established patterns in the codebase and maintain good separation of concerns.


Code Quality & Best Practices ⭐⭐⭐⭐⭐

Excellent adherence to established patterns:

  • Consistent YAML structure: All new ontology entries in ontologies.yaml:702-767 follow the established pattern with url, description, build_command, and prefixmap fields
  • Proper prefix organization: New prefixes are correctly added to both prefixes_local.csv and regenerated into prefixes.csv
  • Makefile generation: The ontologies.Makefile correctly reflects all new ontologies with proper download and build rules
  • EXTRA_ONTOLOGIES update: All new ontologies are properly added to the build target list on line 1650

Well-organized ontology groupings:

  • New "SAREF and Building Ontologies" section (ontologies.yaml:702-767) logically groups related building/IoT ontologies
  • Clear, descriptive comments for each ontology explaining their purpose

Specific Changes Analysis

1. New Ontology Integrations

Seven new ontologies added with proper metadata:

  • micront: Microbiology ontology (ontologies.yaml:656-659)
  • saref4ener & saref4bldg: ETSI SAREF extensions for energy and building domains
  • hhearvs: HHEAR Value Sets for health exposure analysis
  • sdoho: Social Determinants of Health Ontology
  • pathgo: Pathogenesis Gene Ontology for cellular damage mechanisms
  • brick: Building operations schema (HVAC, sensors, equipment)
  • minsysont: Critical Minerals Ontology

2. Build Command Quality

Smart content negotiation and preprocessing:

  • SAREF ontologies use proper content negotiation: curl -H 'Accept: text/turtle' (ontologies.yaml:707,716)
  • Brick schema correctly strips problematic imports (ontologies.yaml:745)
  • HHEAR and SDOHO have proper text preprocessing with perl regex operations
  • Critical Minerals Ontology handles URL encoding issues (ontologies.yaml:763)

3. Prefix Mapping Consistency

All 25+ new prefixes properly propagated:

  • New prefixes correctly appear in both prefixes_local.csv and prefixes.csv
  • No missing or mismatched prefixes between source and generated files
  • OEO updates include new MENO and OEO.CCO prefixes (ontologies.yaml:206-207)

Potential Issues & Recommendations

⚠️ API Key Security Concerns (Minor)

Lines 723, 731 in ontologies.yaml:

url: "https://data.bioontology.org/ontologies/HHEARVS/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb"
url: "https://data.bioontology.org/ontologies/SDOHO/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb"

Recommendation: Consider moving these API keys to environment variables for better security practices. However, this appears to be an existing pattern in the codebase with multiple other ontologies using the same key.

🧪 Test Coverage Gap (Minor)

The PR adds significant new ontology integrations but doesn't include corresponding test updates. Consider adding:

  • Tests for new prefix mappings
  • Integration tests for the new build commands
  • Validation that the new ontologies can be successfully built

🔗 Hardcoded URLs (Low Priority)

Several ontologies use hardcoded GitHub URLs with /refs/heads/master or /refs/heads/main. These could break if repositories change their default branch names.


Performance Considerations

  • Efficient build patterns: New build commands use appropriate caching with .PRECIOUS targets in generated makefile
  • Minimal processing: Only necessary preprocessing (content negotiation, format conversion) is applied
  • No circular dependencies: All new ontologies are independent additions

Security Assessment

  • Safe URL sources: All URLs point to trusted ontology repositories (ETSI, GitHub, BioOntology Portal)
  • No code execution: Build commands only use safe utilities (curl, robot, perl regex)
  • Proper escaping: URL parameters and regex patterns are properly quoted

Documentation Quality

The AGENTS.md file correctly mirrors CLAUDE.md - this appears to be an intentional duplication for workspace configuration. Both files contain comprehensive documentation covering:

  • Project architecture and data model
  • Build pipeline and dependencies
  • Testing approach and best practices
  • Registry management workflow

Final Recommendations

  1. Consider API key externalization for the BioOntology Portal URLs
  2. Add test coverage for new ontology integrations
  3. Monitor build performance with the 7 new ontologies added to the build matrix

This is a high-quality PR that maintains excellent code organization and follows established patterns. The scope is appropriate for the pre-sulo branch as mentioned, and the exclusion of actual sulo configuration keeps this PR focused and clean.

Recommendation: ✅ APPROVE and MERGE


Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the ontology registry to add multiple new ontologies (SAREF extensions, building ontologies, health/environmental ontologies) and updates OEO-related prefixes to use HTTPS URLs with corrected naming. It also regenerates derived artifacts (prefix CSV files and the ontologies Makefile) and updates agent documentation.

Changes:

  • Add 8 new ontologies to the registry: SAREF energy/building extensions (saref4ener, saref4bldg), health ontologies (hhearvs, sdoho, pathgo), building systems (brick), microbiology (micront), and minerals (minsysont)
  • Update OEO/OEOX prefixes from HTTP to HTTPS and add MENO/OEO.CCO related prefixes
  • Add build_command for chemrof and como to use explicit robot merge
  • Regenerate prefixes_local.csv, prefixes.csv, and ontologies.Makefile from updated registry
  • Update AGENTS.md content to Claude-specific guidance (but filename mismatch exists)
  • Modify Claude settings.json permission from Bash(*) to Bash

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 15 comments.

Show a summary per file
File Description
src/semsql/builder/registry/ontologies.yaml Add 8 new ontology entries with URLs, build commands, and prefix mappings; update OEO prefixes to HTTPS; add explicit build commands for chemrof/como
src/semsql/builder/prefixes/prefixes_local.csv Regenerated prefix CSV with updated OEO entries and all new ontology prefixes
src/semsql/builder/prefixes/prefixes.csv Regenerated combined prefix CSV (merges curated, local, and OBO prefixes)
ontologies.Makefile Regenerated Makefile with download/build rules for all new ontologies and updated EXTRA_ONTOLOGIES list
AGENTS.md Rewritten from generic agents documentation to Claude Code-specific workspace guidance with architecture details
.claude/settings.json Changed Bash permission from wildcard pattern to simple Bash


db/como.owl: download/como.owl
cp $< $@
robot merge -i $< -o $@
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra whitespace detected in build_command. There are two spaces between '$<' and '-o', which is inconsistent with other build_command entries that use single spaces. This should be corrected to maintain consistency.

Copilot uses AI. Check for mistakes.
"permissions": {
"allow": [
"Bash(*)",
"Bash",
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Permission specification changed from wildcard 'Bash()' to simple 'Bash'. This narrows the permission scope, which may be intentional for security. However, verify that this change doesn't break any functionality that relied on the wildcard permission pattern. If 'Bash()' was allowing parameterized permissions, this change could restrict necessary operations.

Suggested change
"Bash",
"Bash(*)",

Copilot uses AI. Check for mistakes.
OEOX,http://openenergy-platform.org/ontology/oeo/OEOX_
OEO,https://openenergyplatform.org/ontology/oeo/OEO_
OEOX,https://openenergyplatform.org/ontology/oeo/OEOX_
MENO,https://raw.githubusercontent.com/stap-m/midlevel-energy-ontology/main/ontology/src/midlevel-energy.owl/MENO_
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefix uses unstable raw GitHub URL. The MENO prefix points to a raw GitHub URL on the 'main' branch, which is not a persistent identifier. Consider requesting a stable identifier from the ontology maintainers.

Suggested change
MENO,https://raw.githubusercontent.com/stap-m/midlevel-energy-ontology/main/ontology/src/midlevel-energy.owl/MENO_
MENO,https://w3id.org/midlevel-energy-ontology/MENO_

Copilot uses AI. Check for mistakes.
PAIN: http://purl.obolibrary.org/obo/PAIN_
como:
url: "https://data.bioontology.org/ontologies/COMO/submissions/1/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb"
build_command: "robot merge -i $< -o $@"
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra whitespace detected in build_command. There are two spaces between '$<' and '-o', which is inconsistent with other build_command entries that use single spaces. This should be corrected to maintain consistency and avoid potential issues with command parsing.

Copilot uses AI. Check for mistakes.
.PRECIOUS: download/brick.owl

db/brick.owl: download/brick.owl
sed '48061,48072d' $< > $@.tmp && robot convert -i $@.tmp -f owl -o $@ && rm $@.tmp
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded line numbers in sed command make this build process brittle. The command 'sed 48061,48072d' deletes specific lines to strip unresolvable imports, but if the upstream Brick.ttl file changes, these line numbers will be incorrect. Consider using a more robust pattern-matching approach to remove the specific import statements.

Suggested change
sed '48061,48072d' $< > $@.tmp && robot convert -i $@.tmp -f owl -o $@ && rm $@.tmp
perl -npe 's@<owl:imports.*/>@@g' $< > $@.tmp && robot convert -i $@.tmp -f owl -o $@ && rm $@.tmp

Copilot uses AI. Check for mistakes.
s4bldg: https://saref.etsi.org/saref4bldg/
hhearvs:
description: HHEAR Value Sets - value sets for the Human Health Exposure Analysis Resource
url: "https://data.bioontology.org/ontologies/HHEARVS/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb"
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded API key detected. The BioPortal API key '8b5b7825-538d-40e0-9e9e-5ab9274a9aeb' is embedded directly in the ontology URL. While this appears to be a public/shared API key used throughout the repository, consider storing such keys in environment variables or a configuration file rather than hardcoding them in version-controlled YAML files.

Copilot uses AI. Check for mistakes.
HHEARVS.nihreporter: https://reporter.nih.gov/search/2NC9YrDMM0SiuN8TnuRgKg/project-details/
sdoho:
description: Social Determinants of Health Ontology
url: "https://data.bioontology.org/ontologies/SDOHO/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb"
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded API key detected. The BioPortal API key '8b5b7825-538d-40e0-9e9e-5ab9274a9aeb' is embedded directly in the ontology URL. While this appears to be a public/shared API key used throughout the repository, consider storing such keys in environment variables or a configuration file rather than hardcoding them in version-controlled YAML files.

Copilot uses AI. Check for mistakes.
url: https://github.com/BrickSchema/Brick/releases/download/v1.4.0/Brick.ttl
format: turtle
# Has unresolvable imports (BACnet) - strip imports and convert to OWL/XML
build_command: "sed '48061,48072d' $< > $@.tmp && robot convert -i $@.tmp -f owl -o $@ && rm $@.tmp"
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded line numbers in sed command make this build process brittle. The command 'sed 48061,48072d' deletes lines 48061-48072 to strip unresolvable imports, but if the upstream Brick.ttl file changes (even minor edits), these line numbers will no longer be correct. Consider using a more robust approach like grep/sed with pattern matching to remove the specific import statements, or use robot's remove command with a query.

Suggested change
build_command: "sed '48061,48072d' $< > $@.tmp && robot convert -i $@.tmp -f owl -o $@ && rm $@.tmp"
build_command: "sed '/owl:imports.*bacnet/d' $< > $@.tmp && robot convert -i $@.tmp -f owl -o $@ && rm $@.tmp"

Copilot uses AI. Check for mistakes.
OEOX,http://openenergy-platform.org/ontology/oeo/OEOX_
OEO,https://openenergyplatform.org/ontology/oeo/OEO_
OEOX,https://openenergyplatform.org/ontology/oeo/OEOX_
MENO,https://raw.githubusercontent.com/stap-m/midlevel-energy-ontology/main/ontology/src/midlevel-energy.owl/MENO_
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefix uses unstable raw GitHub URL. The MENO prefix points to a raw GitHub URL on the 'main' branch, which is not a persistent identifier. Consider requesting a stable identifier from the ontology maintainers.

Suggested change
MENO,https://raw.githubusercontent.com/stap-m/midlevel-energy-ontology/main/ontology/src/midlevel-energy.owl/MENO_
MENO,https://raw.githubusercontent.com/stap-m/midlevel-energy-ontology/v1.0.0/ontology/src/midlevel-energy.owl/MENO_

Copilot uses AI. Check for mistakes.
saref,https://saref.etsi.org/core/
s4ener,https://saref.etsi.org/saref4ener/
saref4bldg,https://saref.etsi.org/saref4bldg/
saref,https://saref.etsi.org/core/
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate prefix 'saref' defined on lines 230 and 233. The prefix 'saref' appears three times in this file. CSV files should have unique keys in the first column to ensure proper prefix resolution. This duplication needs to be resolved - consider keeping only one definition of the 'saref' prefix or using a different approach to handle the shared prefix between saref4ener and saref4bldg.

Suggested change
saref,https://saref.etsi.org/core/

Copilot uses AI. Check for mistakes.
@claude
Copy link

claude bot commented Feb 27, 2026

Claude finished @cmungall's task —— View job


PR Review: Update registry and generated artifacts (pre-sulo)

I've conducted a comprehensive review of this PR focusing on code quality, potential issues, performance, security, and test coverage.

Review Tasks

  • Analyze ontology registry changes in ontologies.yaml
  • Review prefix mapping consistency across generated files
  • Examine build commands for security and robustness
  • Check generated artifact alignment
  • Assess performance considerations
  • Evaluate test coverage implications
  • Provide final recommendations

Overall Assessment: APPROVED with Recommendations

This is a well-structured PR that successfully adds 8 new ontology integrations while maintaining consistent patterns and properly regenerating all derived artifacts. The changes follow established conventions and demonstrate good understanding of the project's architecture.


Code Quality & Best Practices ⭐⭐⭐⭐⭐

Excellent adherence to established patterns:

  • Consistent YAML structure: All new ontology entries follow the established pattern with url, description, build_command, and prefixmap fields
  • Proper artifact regeneration: prefixes_local.csv, prefixes.csv, and ontologies.Makefile are correctly regenerated from the updated registry
  • Well-organized groupings: New "SAREF and Building Ontologies" section (ontologies.yaml:702-767) logically groups related IoT/building ontologies
  • EXTRA_ONTOLOGIES update: All new ontologies properly added to the build target list (ontologies.Makefile:1650)

New Ontology Integrations Analysis

Eight new ontologies added with proper metadata:

  • micront: Microbiology ontology (ontologies.yaml:656-659) - simple, clean integration
  • saref4ener & saref4bldg: ETSI SAREF extensions with proper content negotiation
  • hhearvs: HHEAR Value Sets with appropriate SKOS-to-OWL preprocessing
  • sdoho: Social Determinants of Health Ontology with import stripping
  • pathgo: Pathogenesis Gene Ontology - straightforward GitHub integration
  • brick: Building operations schema with necessary preprocessing
  • minsysont: Critical Minerals Ontology with URL encoding workaround

Issues Found & Recommendations

🚨 High Priority Issues

1. Duplicate Prefix Mappings (prefixes_local.csv:167,170)

saref,https://saref.etsi.org/core/
saref,https://saref.etsi.org/core/  # Duplicate from saref4ener
saref,https://saref.etsi.org/core/  # Duplicate from saref4bldg

Impact: CSV prefix resolution may be unpredictable with duplicate keys
Fix Required: Remove duplicate entries or use unique prefixes for each ontology

2. Filename/Header Mismatch (AGENTS.md:1)

  • File is named AGENTS.md but header says # CLAUDE.md
  • Fix Required: Either rename file to CLAUDE.md or update header to match filename

⚠️ Medium Priority Issues

3. Brittle Line Number Dependencies

  • ontologies.yaml:745 & ontologies.Makefile:1626: sed '48061,48072d' for Brick ontology
  • Issue: Hardcoded line numbers will break if upstream file changes
  • Recommendation: Use pattern matching instead:
sed '/owl:imports.*bacnet/d' $< > $@.tmp && robot convert -i $@.tmp -f owl -o $@ && rm $@.tmp

4. Unstable GitHub URLs

  • MENO prefix uses raw GitHub URL to main branch (ontologies.yaml:206)
  • Risk: Direct branch references can break if repository structure changes
  • Recommendation: Request stable identifiers (w3id.org/PURL) from ontology maintainers

5. Hardcoded API Keys

  • BioPortal API key 8b5b7825-538d-40e0-9e9e-5ab9274a9aeb in URLs (ontologies.yaml:723,731)
  • Security Note: While this appears to be a shared public key used throughout the codebase, consider environment variables for better practices

ℹ️ Low Priority Issues

6. Fragile URL Encoding Workaround

  • minsysont build command does Perl regex URL encoding for [HSiO4] (ontologies.yaml:763)
  • Issue: Specific to current ontology format; may break if upstream changes representation
  • Recommendation: Add documentation explaining why this workaround is necessary

Security Assessment

  • Safe URL sources: All URLs point to trusted repositories (ETSI, GitHub, BioOntology Portal)
  • No code execution risks: Build commands only use safe utilities (curl, robot, perl, sed)
  • Proper escaping: URL parameters and regex patterns are appropriately quoted
  • API key exposure: Follows existing codebase pattern (acceptable for public keys)

Performance Considerations

  • Efficient build patterns: New build commands use appropriate caching with .PRECIOUS targets
  • Minimal processing: Only necessary preprocessing applied per ontology
  • No circular dependencies: All new ontologies are independent additions
  • Build matrix expansion: Adding 8 ontologies increases total build time but follows existing patterns

Test Coverage Assessment

Gap Identified: No test updates accompany the significant new ontology integrations
Recommendations:

  • Add integration tests for new build commands (especially those with complex preprocessing)
  • Test prefix mapping resolution with new prefixes
  • Validate that new ontologies can be successfully built in CI
  • Consider adding tests for duplicate prefix detection

Generated Artifacts Review

All artifacts correctly regenerated:

  • prefixes_local.csv: Contains all 25+ new prefixes from ontology entries
  • prefixes.csv: Properly merges curated, local, and OBO prefixes
  • ontologies.Makefile: Correct download/build rules for all new ontologies
  • Build targets properly added to EXTRA_ONTOLOGIES list

Final Recommendations

Must Fix Before Merge:

  1. Remove duplicate saref prefix entries from prefixes_local.csv and prefixes.csv
  2. Fix AGENTS.md filename/header mismatch

Should Fix:

  1. Replace brittle line number sed commands with pattern matching for Brick ontology
  2. Document URL encoding workaround for minsysont

Consider for Future:

  1. Add integration tests for new ontologies
  2. Request stable identifiers for GitHub-hosted ontologies like MENO
  3. Consider environment variables for API keys

Despite the issues identified, this PR demonstrates excellent understanding of the project architecture and follows established patterns consistently. The scope is appropriate and the exclusion of sulo configuration keeps this focused as intended.

Recommendation: ✅ APPROVE after fixing duplicate prefixes and filename mismatch


@cmungall cmungall merged commit 8a2c97d into main Feb 28, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants