{
"openapi": "3.1.0",
"info": {
"title": "Amp Admin API",
"description": "Administration API for Amp, a high-performance ETL system for blockchain data services on The Graph.\n\n## About\n\nThe Admin API provides a RESTful HTTP interface for managing Amp's ETL operations. This API serves as the primary administrative interface for monitoring and controlling the Amp data pipeline, allowing you to deploy datasets, trigger data extraction jobs, monitor job progress, manage distributed worker locations, configure external data providers, and perform operations on Parquet files and their metadata.\n\n## Key Capabilities\n\n### Dataset Management\nHandle the lifecycle of data extraction configurations and access dataset information:\n- List all registered datasets from the metadata database registry\n- Register new dataset configurations with versioning support\n- Trigger data extraction jobs for specific datasets or dataset versions\n- Retrieve dataset details including tables and active storage locations\n\n### Job Control\nControl and monitor data extraction and processing jobs:\n- List and retrieve job information with pagination\n- Trigger extraction jobs with optional end block configuration\n- Stop running jobs gracefully\n- Delete jobs in terminal states (Completed, Stopped, Failed)\n- Bulk cleanup operations for finalized jobs\n\n### Storage Management\nManage locations where dataset tables are stored:\n- Supports local filesystem, S3, GCS, and Azure Blob Storage\n- List storage locations and their associated files\n- Delete locations with comprehensive cleanup (removes files and metadata)\n- Query file information including Parquet metadata and statistics\n\n### Provider Configuration\nConfigure external blockchain data sources:\n- Create, retrieve, and delete provider configurations\n- Support for EVM RPC endpoints and Firehose streams\n- Providers are reusable across multiple dataset definitions\n- **Security Note**: Provider configurations may contain connection details; ensure sensitive information is properly managed\n\n### Schema Analysis\nValidate SQL queries and infer output schemas:\n- Validate queries against registered datasets without execution\n- Determine output schema using DataFusion's query planner\n- Useful for building dynamic query tools and validating dataset definitions\n\n## Pagination\n\nMost list endpoints use cursor-based pagination for efficient data retrieval:\n\n### Paginated Endpoints\nThe following endpoints support pagination:\n- Jobs: `/jobs`\n- Locations: `/locations`\n- Files: `/locations/{location_id}/files`\n\n### Non-Paginated Endpoints\nThe following endpoints return all results without pagination:\n- Datasets: `/datasets` (returns all datasets)\n- Dataset Versions: `/datasets/{name}/versions` (returns all versions for a dataset)\n\n### Query Parameters (Paginated Endpoints Only)\n- `limit`: Maximum items per page (default: 50, max: 1000)\n- `last_*_id`: Cursor from previous page's `next_cursor` field\n\n### Response Format\nPaginated responses include:\n- Array of items (e.g., `jobs`, `locations`, `files`)\n- `next_cursor`: Cursor for the next page (absent when no more results)\n\n### Usage Pattern\n\n**First Page Request:**\n```\nGET /jobs?limit=100\n```\n\n**First Page Response:**\n```json\n{\n \"jobs\": [...],\n \"next_cursor\": 12345\n}\n```\n\n**Next Page Request:**\n```\nGET /jobs?limit=100&last_job_id=12345\n```\n\n**Last Page Response:**\n```json\n{\n \"jobs\": [...]\n // No next_cursor field = end of results\n}\n```\n\n### Cursor Formats\n\nEndpoints use different cursor formats based on their data type:\n\n**Integer ID Cursors (64-bit integers):**\nMost paginated endpoints use simple integer IDs as cursors:\n- Jobs: `last_job_id=12345`\n- Locations: `last_location_id=67890`\n- Files: `last_file_id=54321`\n\n## Error Handling\n\nAll error responses follow a consistent format with:\n- `error_code`: Stable, machine-readable code (SCREAMING_SNAKE_CASE)\n- `error_message`: Human-readable error description\n\nError codes are stable across API versions and suitable for programmatic error handling. Messages may change and should only be used for display or logging.\n\n## Important Notes\n\n### Dataset Registration\nSupports two main scenarios:\n- **Derived datasets** (kind=\"manifest\"): Registered in both object store and metadata database\n- **SQL datasets** (other kinds): Dataset definitions stored in object store\n\n### Job Lifecycle\nJobs have the following terminal states that allow deletion:\n- **Completed**: Job finished successfully\n- **Stopped**: Job was manually stopped\n- **Failed**: Job encountered an error\n\nNon-terminal jobs (Scheduled, Running, StopRequested, Stopping) are protected from deletion.\n\n### Storage Locations\n- Locations can be active or inactive for queries\n- Deleting a location performs comprehensive cleanup including file removal from object store\n- Each location is associated with a specific dataset table and storage URL\n",
"license": {
"name": ""
},
"version": "1.0.0"
},
"paths": {
"/datasets": {
"get": {
"tags": [
"datasets"
],
"summary": "Handler for the `GET /datasets` endpoint",
"description": "Returns all registered datasets across all namespaces with their version information.\n\n## Response\n- **200 OK**: Successfully retrieved all datasets\n- **500 Internal Server Error**: Database query error\n\n## Error Codes\n- `LIST_ALL_DATASETS_ERROR`: Failed to list all datasets from dataset store\n\n## Behavior\nThis endpoint returns a comprehensive list of all datasets registered in the system,\ngrouped by namespace and name. For each dataset, it includes:\n- The latest semantic version (if any versions are tagged)\n- All available semantic versions in descending order\n\nThe response does not include special tags (\"latest\", \"dev\") as these are system-managed\nand can be queried via the versions endpoint for specific datasets.\n\nResults are ordered by namespace then by name (lexicographical).",
"operationId": "list_all_datasets",
"responses": {
"200": {
"description": "Successfully retrieved all datasets",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DatasetsResponse"
}
}
}
},
"500": {
"description": "Internal server error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
}
}
},
"post": {
"tags": [
"datasets"
],
"summary": "Handler for the `POST /datasets` endpoint",
"description": "Registers a new dataset configuration in the server's local registry. Accepts a JSON payload\ncontaining the dataset registration configuration.\n\n**Note**: This endpoint only registers datasets and does NOT schedule data extraction.\nTo extract data after registration, make a separate call to:\n- `POST /datasets/{namespace}/{name}/versions/dev/deploy` - for dev tag\n- `POST /datasets/{namespace}/{name}/versions/latest/deploy` - for latest tag\n- `POST /datasets/{namespace}/{name}/versions/{version}/deploy` - for specific version\n\n## Request Body\n- `dataset_name`: Name of the dataset to be registered (must be valid dataset name)\n- `version`: Optional version of the dataset to register. If omitted, only the \"dev\" tag is updated.\n- `manifest`: JSON string representation of the dataset manifest\n\n## Response\n- **201 Created**: Dataset successfully registered (or updated if version tag already exists)\n- **400 Bad Request**: Invalid dataset name, version, or manifest format\n- **500 Internal Server Error**: Database or object store error\n\n## Error Codes\n- `INVALID_PAYLOAD_FORMAT`: Request JSON is malformed or invalid\n- `INVALID_MANIFEST`: Manifest JSON parsing or structure error\n- `DEPENDENCY_VALIDATION_ERROR`: SQL queries are invalid or reference undeclared dependencies\n- `MANIFEST_REGISTRATION_ERROR`: Failed to register manifest in system\n- `MANIFEST_LINKING_ERROR`: Failed to link manifest to dataset\n- `MANIFEST_NOT_FOUND`: Manifest hash provided but manifest doesn't exist\n- `VERSION_TAGGING_ERROR`: Failed to tag the manifest with the version\n- `UNSUPPORTED_DATASET_KIND`: Dataset kind is not supported\n- `STORE_ERROR`: Failed to load or access dataset store\n\n## Behavior\nThis handler supports multiple dataset kinds for registration:\n- **Derived dataset** (kind=\"manifest\"): Registers a derived dataset manifest that transforms data from other datasets using SQL queries\n- **EVM-RPC dataset** (kind=\"evm-rpc\"): Registers a raw dataset that extracts blockchain data directly from Ethereum-compatible JSON-RPC endpoints\n- **Firehose dataset** (kind=\"firehose\"): Registers a raw dataset that streams blockchain data from StreamingFast Firehose protocol\n- **Eth Beacon dataset** (kind=\"eth-beacon\"): Registers a raw dataset that extracts Ethereum Beacon Chain data\n- **Legacy SQL datasets** are **not supported** and will return an error\n\n## Registration Process\nThe registration process involves two or three steps depending on whether a version is provided:\n1. **Register or validate manifest**: Either stores a new manifest in hash-based storage and creates\n a metadata database entry, or validates that a provided manifest hash exists in the system\n2. **Link manifest to dataset**: Links the manifest to the dataset namespace/name and automatically\n updates the \"dev\" tag to point to this manifest (performed in a transaction for atomicity)\n3. **Tag version** (optional): If a version is provided, associates the version identifier with the\n manifest hash, and updates the \"latest\" tag if this version is higher than the current latest\n\nThis approach enables:\n- Content-addressable storage by manifest hash\n- Deduplication of identical manifests\n- Separation of manifest storage, dataset linking, and version management\n- Development workflow: register without version to only update \"dev\" tag via linking\n- Release workflow: register with version to create semantic version tags and update \"latest\"\n- Reuse workflow: provide manifest hash to link existing manifest without re-registering it\n\nAll operations are idempotent:\n- **Manifest registration**: If the manifest already exists (same hash), the operation succeeds without changes\n- **Manifest linking**: If the manifest is already linked to the dataset, the operation succeeds without changes\n- **Dev tag update**: The dev tag is always updated to point to the linked manifest (last-write-wins)\n- **Version tag**: If the version tag doesn't exist, it is created; if it exists with the same hash, no changes;\n if it exists with a different hash, it is updated to point to the new hash\n- **Latest tag**: Automatically updated only if the new version is higher than the current latest version\n\nThe handler:\n- Validates dataset name and version format\n- Checks that dataset kind is supported\n- Registers/validates the manifest, links it to the dataset, and optionally tags it with a version\n- Returns appropriate status codes and error messages\n\n## Typical Workflow\nFor users wanting both registration and data extraction:\n1. `POST /datasets` - Register the dataset (this endpoint)\n2. `POST /datasets/{namespace}/{name}/versions/{version}/deploy` - Schedule data extraction",
"operationId": "datasets_register",
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/RegisterRequest"
}
}
},
"required": true
},
"responses": {
"201": {
"description": "Dataset successfully registered or updated"
},
"400": {
"description": "Invalid request format or manifest",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"500": {
"description": "Internal server error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
}
}
}
},
"/datasets/{namespace}/{name}": {
"delete": {
"tags": [
"datasets"
],
"summary": "Handler for the `DELETE /datasets/{namespace}/{name}` endpoint",
"description": "Removes all manifest links and version tags for a dataset.\n\n## Response\n- **204 No Content**: Dataset successfully deleted (or didn't exist)\n- **400 Bad Request**: Invalid path parameters\n- **500 Internal Server Error**: Database operation error\n\n## Error Codes\n- `INVALID_PATH`: Invalid namespace or name in path parameters\n- `UNLINK_DATASET_MANIFESTS_ERROR`: Failed to unlink dataset manifests from dataset store\n\n## Behavior\nThis endpoint deletes all metadata for a dataset including:\n- All manifest links in the dataset_manifests table\n- All version tags (cascaded automatically via foreign key constraint)\n- Orphaned manifest files (manifests not referenced by any other dataset)\n\nThis operation is fully idempotent - it returns 204 even if the dataset\ndoesn't exist. Manifests that are still referenced by other datasets are\npreserved.",
"operationId": "delete_dataset",
"parameters": [
{
"name": "namespace",
"in": "path",
"description": "Dataset namespace",
"required": true,
"schema": {
"type": "string"
}
},
{
"name": "name",
"in": "path",
"description": "Dataset name",
"required": true,
"schema": {
"type": "string"
}
}
],
"responses": {
"204": {
"description": "Dataset successfully deleted"
},
"400": {
"description": "Invalid path parameters",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"500": {
"description": "Internal server error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
}
}
}
},
"/datasets/{namespace}/{name}/versions": {
"get": {
"tags": [
"datasets"
],
"summary": "Handler for the `GET /datasets/{namespace}/{name}/versions` endpoint",
"description": "Returns all versions for a dataset with their metadata.\n\n## Response\n- **200 OK**: Successfully retrieved version list\n- **400 Bad Request**: Invalid path parameters\n- **500 Internal Server Error**: Database query error\n\n## Error Codes\n- `INVALID_PATH`: Invalid namespace or name in path parameters\n- `LIST_VERSION_TAGS_ERROR`: Failed to list version tags from dataset store\n- `RESOLVE_REVISION_ERROR`: Failed to resolve dev tag revision\n\n## Behavior\nThis endpoint returns comprehensive version information for a dataset:\n- All semantic versions sorted in descending order (newest first)\n- For each version: manifest hash, creation time, and last update time\n- Special tags: \"latest\" (if any semantic versions exist) and \"dev\" (if set)\n\nThe \"latest\" tag is automatically managed and always points to the highest\nsemantic version. The \"dev\" tag is explicitly managed via the registration\nendpoint and may point to any manifest hash.\n\nReturns an empty list if the dataset has no registered versions.",
"operationId": "list_dataset_versions",
"parameters": [
{
"name": "namespace",
"in": "path",
"description": "Dataset namespace",
"required": true,
"schema": {
"type": "string"
}
},
{
"name": "name",
"in": "path",
"description": "Dataset name",
"required": true,
"schema": {
"type": "string"
}
}
],
"responses": {
"200": {
"description": "Successfully retrieved versions",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/VersionsResponse"
}
}
}
},
"400": {
"description": "Invalid path parameters",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"500": {
"description": "Internal server error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
}
}
}
},
"/datasets/{namespace}/{name}/versions/{revision}": {
"get": {
"tags": [
"datasets"
],
"summary": "Handler for the `GET /datasets/{namespace}/{name}/versions/{revision}` endpoint",
"description": "Returns detailed dataset information for the specified revision.\n\n## Response\n- **200 OK**: Successfully retrieved dataset information\n- **400 Bad Request**: Invalid path parameters\n- **404 Not Found**: Dataset or revision not found\n- **500 Internal Server Error**: Database or dataset store error\n\n## Error Codes\n- `INVALID_PATH`: Invalid namespace, name, or revision in path parameters\n- `DATASET_NOT_FOUND`: The specified dataset or revision does not exist\n- `RESOLVE_REVISION_ERROR`: Failed to resolve revision to manifest hash\n- `GET_MANIFEST_PATH_ERROR`: Failed to query manifest path from metadata database\n- `READ_MANIFEST_ERROR`: Failed to read manifest file from object store\n- `PARSE_MANIFEST_ERROR`: Failed to parse manifest JSON\n\n## Behavior\nThis endpoint retrieves detailed information about a specific dataset revision.\nThe revision parameter supports four types:\n- Semantic version (e.g., \"1.2.3\")\n- Manifest hash (SHA256 hash)\n- \"latest\" - resolves to the highest semantic version\n- \"dev\" - resolves to the development version\n\nThe endpoint first resolves the revision to a manifest hash, then returns\nbasic dataset information including namespace, name, revision, manifest hash, and kind.",
"operationId": "get_dataset_by_revision",
"parameters": [
{
"name": "namespace",
"in": "path",
"description": "Dataset namespace",
"required": true,
"schema": {
"type": "string"
}
},
{
"name": "name",
"in": "path",
"description": "Dataset name",
"required": true,
"schema": {
"type": "string"
}
},
{
"name": "revision",
"in": "path",
"description": "Revision (version, hash, latest, or dev)",
"required": true,
"schema": {
"type": "string"
}
}
],
"responses": {
"200": {
"description": "Successfully retrieved dataset",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DatasetInfo"
}
}
}
},
"400": {
"description": "Invalid path parameters",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"404": {
"description": "Dataset or revision not found",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"500": {
"description": "Internal server error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
}
}
}
},
"/datasets/{namespace}/{name}/versions/{revision}/deploy": {
"post": {
"tags": [
"datasets"
],
"summary": "Handler for the `POST /datasets/{namespace}/{name}/versions/{revision}/deploy` endpoint",
"description": "Schedules a data extraction job for the specified dataset revision.\n\n## Response\n- **202 Accepted**: Job successfully scheduled\n- **400 Bad Request**: Invalid path parameters or request body\n- **404 Not Found**: Dataset or revision not found\n- **500 Internal Server Error**: Database or scheduler error\n\n## Error Codes\n- `INVALID_PATH`: Invalid path parameters (namespace, name, or revision)\n- `INVALID_BODY`: Invalid request body (malformed JSON or missing required fields)\n- `DATASET_NOT_FOUND`: The specified dataset or revision does not exist\n- `LIST_VERSION_TAGS_ERROR`: Failed to list version tags from dataset store\n- `RESOLVE_REVISION_ERROR`: Failed to resolve revision to manifest hash\n- `GET_DATASET_ERROR`: Failed to load dataset from store\n- `SCHEDULER_ERROR`: Failed to schedule extraction job\n\n## Behavior\nThis endpoint schedules a data extraction job for a dataset:\n1. Resolves the revision to find the corresponding version tag\n2. Loads the full dataset configuration from the dataset store\n3. Schedules an extraction job with the specified parameters\n4. Returns job ID for tracking\n\nThe revision parameter supports four types:\n- Semantic version (e.g., \"1.2.3\") - uses that specific version\n- \"latest\" - resolves to the highest semantic version\n- \"dev\" - resolves to the development version tag\n- Manifest hash (SHA256 hash) - finds the version that points to this hash\n\nJobs are executed asynchronously by worker nodes. Use the returned job ID\nto track progress via the jobs endpoints.",
"operationId": "deploy_dataset",
"parameters": [
{
"name": "namespace",
"in": "path",
"description": "Dataset namespace",
"required": true,
"schema": {
"type": "string"
}
},
{
"name": "name",
"in": "path",
"description": "Dataset name",
"required": true,
"schema": {
"type": "string"
}
},
{
"name": "revision",
"in": "path",
"description": "Revision (version, hash, latest, or dev)",
"required": true,
"schema": {
"type": "string"
}
}
],
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DeployRequest"
}
}
},
"required": true
},
"responses": {
"202": {
"description": "Job successfully scheduled",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/DeployResponse"
}
}
}
},
"400": {
"description": "Bad request (invalid parameters)",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"404": {
"description": "Dataset or revision not found",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"500": {
"description": "Internal server error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
}
}
}
},
"/datasets/{namespace}/{name}/versions/{revision}/manifest": {
"get": {
"tags": [
"datasets"
],
"summary": "Handler for the `GET /datasets/{namespace}/{name}/versions/{revision}/manifest` endpoint",
"description": "Retrieves the raw manifest JSON for the specified dataset revision.\n\n## Response\n- **200 OK**: Successfully retrieved manifest\n- **404 Not Found**: Dataset, revision, or manifest not found\n- **500 Internal Server Error**: Database or object store error\n\n## Error Codes\n- `INVALID_PATH`: Invalid namespace, name, or revision in path parameters\n- `DATASET_NOT_FOUND`: The specified dataset or revision does not exist\n- `MANIFEST_NOT_FOUND`: The manifest file was not found in object storage\n- `RESOLVE_REVISION_ERROR`: Failed to resolve revision to manifest hash\n- `GET_MANIFEST_PATH_ERROR`: Failed to query manifest path from metadata database\n- `READ_MANIFEST_ERROR`: Failed to read manifest file from object store\n- `PARSE_MANIFEST_ERROR`: Failed to parse manifest JSON\n\n## Behavior\nThis endpoint returns the raw manifest JSON document for a dataset revision.\nThe revision parameter supports four types:\n- Semantic version (e.g., \"1.2.3\")\n- Manifest hash (SHA256 hash)\n- \"latest\" - resolves to the highest semantic version\n- \"dev\" - resolves to the development version\n\nThe endpoint first resolves the revision to a manifest hash, then retrieves\nthe manifest JSON from object storage. Manifests are immutable and\ncontent-addressable, identified by their SHA256 hash.",
"operationId": "get_dataset_manifest",
"parameters": [
{
"name": "namespace",
"in": "path",
"description": "Dataset namespace",
"required": true,
"schema": {
"type": "string"
}
},
{
"name": "name",
"in": "path",
"description": "Dataset name",
"required": true,
"schema": {
"type": "string"
}
},
{
"name": "revision",
"in": "path",
"description": "Revision (version, hash, latest, or dev)",
"required": true,
"schema": {
"type": "string"
}
}
],
"responses": {
"200": {
"description": "Successfully retrieved manifest",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/Value"
}
}
}
},
"404": {
"description": "Dataset or revision not found",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"500": {
"description": "Internal server error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
}
}
}
},
"/datasets/{namespace}/{name}/versions/{version}": {
"delete": {
"tags": [
"datasets"
],
"summary": "Handler for the `DELETE /datasets/{namespace}/{name}/versions/{version}` endpoint",
"description": "Removes a semantic version tag from a dataset.\n\n## Response\n- **204 No Content**: Version successfully deleted (or didn't exist)\n- **400 Bad Request**: Invalid path parameters or attempting to delete the \"latest\" version\n- **500 Internal Server Error**: Database operation error\n\n## Error Codes\n- `INVALID_PATH`: Invalid namespace, name, or version in path parameters\n- `CANNOT_DELETE_LATEST_VERSION`: Cannot delete the version currently tagged as \"latest\"\n- `RESOLVE_LATEST_REVISION_ERROR`: Failed to resolve the \"latest\" tag to its manifest hash\n- `RESOLVE_VERSION_REVISION_ERROR`: Failed to resolve the requested version to its manifest hash\n- `DELETE_VERSION_TAG_ERROR`: Failed to delete version tag from dataset store\n\n## Behavior\nThis endpoint removes a semantic version tag from a dataset. The deletion follows this flow:\n\n1. **Check version existence**: Resolves the requested version to its manifest hash.\n If the version doesn't exist, returns 204 immediately (idempotent).\n\n2. **Check \"latest\" protection**: Resolves the \"latest\" tag to its manifest hash and compares\n with the requested version's hash. If they point to the same manifest, deletion is rejected\n with a 400 error. You must create a newer version first to update the \"latest\" tag.\n\n3. **Delete version tag**: Removes only the version tag from the database. The underlying\n manifest file is never deleted (manifests are content-addressable and may be referenced\n by other versions or datasets).\n\nThis operation is fully idempotent - it returns 204 even if the version doesn't exist.",
"operationId": "delete_dataset_version",
"parameters": [
{
"name": "namespace",
"in": "path",
"description": "Dataset namespace",
"required": true,
"schema": {
"type": "string"
}
},
{
"name": "name",
"in": "path",
"description": "Dataset name",
"required": true,
"schema": {
"type": "string"
}
},
{
"name": "version",
"in": "path",
"description": "Semantic version (e.g., 1.2.3)",
"required": true,
"schema": {
"type": "string"
}
}
],
"responses": {
"204": {
"description": "Version successfully deleted"
},
"400": {
"description": "Invalid path parameters",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"500": {
"description": "Internal server error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
}
}
}
},
"/files/{file_id}": {
"get": {
"tags": [
"files"
],
"summary": "Handler for the `GET /files/{file_id}` endpoint",
"description": "Retrieves and returns a specific file by its ID from the metadata database.\n\n## Path Parameters\n- `file_id`: The unique identifier of the file to retrieve (must be a positive integer)\n\n## Response\n- **200 OK**: Returns the file information as JSON\n- **400 Bad Request**: Invalid file ID format (not a number, zero, or negative)\n- **404 Not Found**: File with the given ID does not exist\n- **500 Internal Server Error**: Database connection or query error\n\n## Error Codes\n- `INVALID_FILE_ID`: The provided ID is not a valid positive integer\n- `FILE_NOT_FOUND`: No file exists with the given ID\n- `METADATA_DB_ERROR`: Internal database error occurred\n\nThis handler:\n- Validates and extracts the file ID from the URL path\n- Queries the metadata database for the file with location information\n- Returns appropriate HTTP status codes and error messages",
"operationId": "files_get",
"parameters": [
{
"name": "file_id",
"in": "path",
"description": "File ID",
"required": true,
"schema": {
"type": "integer",
"format": "int64"
}
}
],
"responses": {
"200": {
"description": "Successfully retrieved file information",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/FileInfo"
}
}
}
},
"400": {
"description": "Invalid file ID",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"404": {
"description": "File not found",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"500": {
"description": "Internal server error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
}
}
}
},
"/jobs": {
"get": {
"tags": [
"jobs"
],
"summary": "Handler for the `GET /jobs` endpoint",
"description": "Retrieves and returns a paginated list of jobs from the metadata database.\n\n## Query Parameters\n- `limit`: Maximum number of jobs to return (default: 50, max: 1000)\n- `last_job_id`: ID of the last job from previous page for cursor-based pagination\n\n## Response\n- **200 OK**: Returns paginated job data with next cursor\n- **400 Bad Request**: Invalid limit parameter (0, negative, or > 1000)\n- **500 Internal Server Error**: Database connection or query error\n\n## Error Codes\n- `INVALID_QUERY_PARAMETERS`: Invalid query parameters (malformed or unparseable)\n- `LIMIT_TOO_LARGE`: Limit exceeds maximum allowed value (>1000)\n- `LIMIT_INVALID`: Limit is zero\n- `LIST_JOBS_ERROR`: Failed to list jobs from scheduler (database error)",
"operationId": "jobs_list",
"parameters": [
{
"name": "limit",
"in": "query",
"description": "Maximum number of jobs to return (default: 50, max: 1000)",
"required": false,
          "schema": {
            "type": "integer",
            "minimum": 1,
            "maximum": 1000,
            "default": 50
          }
},
{
"name": "last_job_id",
"in": "query",
"description": "ID of the last job from the previous page for pagination",
"required": false,
"schema": {
"type": "string"
}
}
],
"responses": {
"200": {
"description": "Successfully retrieved jobs",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/JobsResponse"
}
}
}
},
"400": {
"description": "Invalid query parameters",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"500": {
"description": "Internal server error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
}
}
},
"delete": {
"tags": [
"jobs"
],
"summary": "Handler for the `DELETE /jobs?status=<filter>` endpoint",
"description": "Deletes jobs based on status filter. Supports deleting jobs by various status criteria.\n\n## Query Parameters\n- `status=terminal`: Delete all jobs in terminal states (Completed, Stopped, Failed)\n- `status=completed`: Delete all completed jobs\n- `status=stopped`: Delete all stopped jobs\n- `status=error`: Delete all failed jobs\n\n## Response\n- **204 No Content**: Operation completed successfully\n- **400 Bad Request**: Invalid or missing status query parameter\n- **500 Internal Server Error**: Database error occurred\n\n## Error Codes\n- `INVALID_QUERY_PARAM`: Invalid or missing status parameter\n- `DELETE_JOBS_BY_STATUS_ERROR`: Failed to delete jobs by status from scheduler (database error)\n\n## Behavior\nThis handler provides bulk job cleanup with the following characteristics:\n- Only jobs in terminal states (Completed, Stopped, Failed) are deleted\n- Non-terminal jobs are completely protected from deletion\n- Database layer ensures atomic bulk deletion\n- Safe to call even when no terminal jobs exist\n\n## Terminal States\nJobs are deleted when in these states:\n- Completed → Safe to delete\n- Stopped → Safe to delete\n- Failed → Safe to delete\n\nProtected states (never deleted):\n- Scheduled → Job is waiting to run\n- Running → Job is actively executing\n- StopRequested → Job is being stopped\n- Stopping → Job is in process of stopping\n- Unknown → Invalid state\n\n## Usage\nThis endpoint is typically used for:\n- Periodic cleanup of completed jobs\n- Administrative maintenance\n- Freeing up database storage",
"operationId": "jobs_delete_many",
"parameters": [
{
"name": "status",
"in": "query",
"description": "Status filter for jobs to delete",
"required": true,
          "schema": {
            "type": "string",
            "enum": [
              "terminal",
              "completed",
              "stopped",
              "error"
            ]
          }
}
],
"responses": {
"204": {
"description": "Jobs deleted successfully"
},
"400": {
"description": "Invalid query parameters",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"500": {
"description": "Internal server error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
}
}
}
},
"/jobs/{id}": {
"get": {
"tags": [
"jobs"
],
"summary": "Handler for the `GET /jobs/{id}` endpoint",
"description": "Retrieves and returns a specific job by its ID from the metadata database.\n\n## Path Parameters\n- `id`: The unique identifier of the job to retrieve (must be a valid JobId)\n\n## Response\n- **200 OK**: Returns the job information as JSON\n- **400 Bad Request**: Invalid job ID format (not parseable as JobId)\n- **404 Not Found**: Job with the given ID does not exist\n- **500 Internal Server Error**: Database connection or query error\n\n## Error Codes\n- `INVALID_JOB_ID`: The provided ID is not a valid job identifier\n- `JOB_NOT_FOUND`: No job exists with the given ID\n- `GET_JOB_ERROR`: Failed to retrieve job from scheduler (database error)",
"operationId": "jobs_get",
"parameters": [
{
"name": "id",
"in": "path",
"description": "Job ID",
"required": true,
          "schema": {
            "type": "integer",
            "format": "int64"
          }
}
],
"responses": {
"200": {
"description": "Successfully retrieved job information",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/JobInfo"
}
}
}
},
"400": {
"description": "Invalid job ID",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"404": {
"description": "Job not found",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"500": {
"description": "Internal server error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
}
}
},
"delete": {
"tags": [
"jobs"
],
"summary": "Handler for the `DELETE /jobs/{id}` endpoint",
"description": "Deletes a job by its ID if it's in a terminal state (Completed, Stopped, or Failed).\nThis is a safe, idempotent operation that only removes finalized jobs from the system.\n\n## Path Parameters\n- `id`: The unique identifier of the job to delete (must be a valid JobId)\n\n## Response\n- **204 No Content**: Job was successfully deleted or does not exist (idempotent)\n- **400 Bad Request**: Invalid job ID format (not parseable as JobId)\n- **409 Conflict**: Job exists but is not in a terminal state (cannot be deleted)\n- **500 Internal Server Error**: Database error occurred\n\n## Error Codes\n- `INVALID_JOB_ID`: The provided ID is not a valid job identifier\n- `JOB_CONFLICT`: Job exists but is not in a terminal state\n- `GET_JOB_ERROR`: Failed to retrieve job from scheduler (database error)\n- `DELETE_JOB_ERROR`: Failed to delete job from scheduler (database error)\n\n## Idempotent Behavior\nThis handler is idempotent - deleting a non-existent job returns 204 (success).\nThis allows clients to safely retry deletions without worrying about 404 errors.\n\n## Behavior\nThis handler provides safe job deletion with the following characteristics:\n- Only jobs in terminal states (Completed, Stopped, Failed) can be deleted\n- Non-terminal jobs are protected from accidental deletion\n- Non-existent jobs return success (idempotent behavior)\n- Database layer ensures atomic deletion\n\n## Terminal States\nJobs can only be deleted when in these states:\n- Completed → Safe to delete\n- Stopped → Safe to delete\n- Failed → Safe to delete\n\nProtected states (cannot be deleted):\n- Scheduled → Job is waiting to run\n- Running → Job is actively executing\n- StopRequested → Job is being stopped\n- Stopping → Job is in process of stopping\n- Unknown → Invalid state",
"operationId": "jobs_delete",
"parameters": [
{
"name": "id",
"in": "path",
"description": "Job ID",
"required": true,
"schema": {
"type": "integer",
"format": "int64"
}
}
],
"responses": {
"204": {
"description": "Job deleted successfully or does not exist (idempotent)"
},
"400": {
"description": "Invalid job ID",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"409": {
"description": "Job cannot be deleted (not in terminal state)",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"500": {
"description": "Internal server error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
}
}
}
},
"/jobs/{id}/stop": {
"put": {
"tags": [
"jobs"
],
"summary": "Handler for the `PUT /jobs/{id}/stop` endpoint",
        "description": "Stops a running job using the specified job ID. This is an idempotent\noperation that handles job termination requests safely.\n\n## Path Parameters\n- `id`: The unique identifier of the job to stop (must be a valid JobId)\n\n## Response\n- **200 OK**: Job stop request processed successfully, or job already in terminal state (idempotent)\n- **400 Bad Request**: Invalid job ID format (not parseable as JobId)\n- **404 Not Found**: Job with the given ID does not exist\n- **500 Internal Server Error**: Database connection or scheduler error\n\n## Error Codes\n- `INVALID_JOB_ID`: The provided ID is not a valid job identifier\n- `JOB_NOT_FOUND`: No job exists with the given ID\n- `STOP_JOB_ERROR`: Database error during stop operation execution\n- `UNEXPECTED_STATE_CONFLICT`: Internal state machine error (indicates a bug)\n\n## Idempotent Behavior\nThis handler is idempotent - stopping a job that's already in a terminal state returns success (200).\nThis allows clients to safely retry stop requests without worrying about conflict errors.\n\nThe desired outcome of a stop request is that the job is not running. If the job is already\nstopped, completed, or failed, this outcome is achieved, so we return success.\n\n## Behavior\nThis handler provides idempotent job stopping with the following characteristics:\n- Jobs already in terminal states (Stopped, Completed, Failed) return success (idempotent)\n- Only running/scheduled jobs transition to stop-requested state\n- Job lookup and stop request are performed atomically within a single transaction\n- Database layer validates state transitions and prevents race conditions\n\n## State Transitions\nValid stop transitions:\n- Scheduled → StopRequested (200 OK)\n- Running → StopRequested (200 OK)\n\nAlready terminal (idempotent - return success):\n- Stopped → no change (200 OK)\n- Completed → no change (200 OK)\n- Failed → no change (200 OK)\n\nThe handler:\n- Validates and extracts the job ID from the URL path\n- Delegates to scheduler for atomic stop operation (job lookup + stop + worker notification)\n- Returns success if job is already in terminal state (idempotent)\n- Returns appropriate HTTP status codes and error messages",
"operationId": "jobs_stop",
"parameters": [
{
"name": "id",
"in": "path",
"description": "Job ID",
"required": true,
"schema": {
"type": "integer",
"format": "int64"
}
}
],
"responses": {
"200": {
"description": "Job stop request processed successfully, or job already in terminal state (idempotent)"
},
"400": {
"description": "Invalid job ID",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"404": {
"description": "Job not found",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"500": {
"description": "Internal server error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
}
}
}
},
"/locations": {
"get": {
"tags": [
"locations"
],
"summary": "Handler for the `GET /locations` endpoint",
"description": "Retrieves and returns a paginated list of locations from the metadata database.\n\n## Query Parameters\n- `limit`: Maximum number of locations to return (default: 50, max: 1000)\n- `last_location_id`: ID of the last location from previous page for cursor-based pagination\n\n## Response\n- **200 OK**: Returns paginated location data with next cursor\n- **400 Bad Request**: Invalid limit parameter (0, negative, or > 1000)\n- **500 Internal Server Error**: Database connection or query error\n\n## Error Codes\n- `INVALID_REQUEST`: Invalid query parameters (limit out of range)\n- `METADATA_DB_ERROR`: Internal database error occurred\n\nThis handler:\n- Accepts query parameters for pagination (limit, last_location_id)\n- Validates the limit parameter (max 1000)\n- Calls the metadata DB to list locations with pagination\n- Returns a structured response with locations and next cursor",
"operationId": "locations_list",
"parameters": [
{
"name": "limit",
"in": "query",
"description": "Maximum number of locations to return (default: 50, max: 1000)",
"required": false,
          "schema": {
            "type": "integer",
            "minimum": 1,
            "maximum": 1000,
            "default": 50
          }
},
{
"name": "last_location_id",
"in": "query",
"description": "ID of the last location from the previous page for pagination",
"required": false,
"schema": {
"type": "string"
}
}
],
"responses": {
"200": {
"description": "Successfully retrieved locations",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/LocationsResponse"
}
}
}
},
"400": {
"description": "Invalid query parameters",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"500": {
"description": "Internal server error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
}
}
}
},
"/locations/{id}": {
"get": {
"tags": [
"locations"
],
"summary": "Handler for the `GET /locations/{id}` endpoint",
"description": "Retrieves and returns a specific location by its ID from the metadata database.\n\n## Path Parameters\n- `id`: The unique identifier of the location to retrieve (must be a positive integer)\n\n## Response\n- **200 OK**: Returns the location information as JSON\n- **400 Bad Request**: Invalid location ID format (not a number, zero, or negative)\n- **404 Not Found**: Location with the given ID does not exist\n- **500 Internal Server Error**: Database connection or query error\n\n## Error Codes\n- `INVALID_LOCATION_ID`: The provided ID is not a valid positive integer\n- `LOCATION_NOT_FOUND`: No location exists with the given ID\n- `METADATA_DB_ERROR`: Internal database error occurred\n\nThis handler:\n- Validates and extracts the location ID from the URL path\n- Queries the metadata database for the location\n- Returns appropriate HTTP status codes and error messages",
"operationId": "locations_get",
"parameters": [
{
"name": "id",
"in": "path",
"description": "Location ID",
"required": true,
"schema": {
"type": "integer",
"format": "int64"
}
}
],
"responses": {
"200": {
"description": "Successfully retrieved location information",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/LocationInfoWithDetails"
}
}
}
},
"400": {
"description": "Invalid location ID",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"404": {
"description": "Location not found",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
},
"500": {
"description": "Internal server error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/ErrorResponse"
}
}
}
}
}
},
"delete": {
"tags": [
"locations"
],
"summary": "Handler for the `DELETE /locations/{id}` endpoint",
"description": "Deletes a specific location by its ID from the metadata database.\n\n## Path Parameters\n- `id`: The unique identifier of the location to delete (must be a positive integer)\n\n## Query Parameters\n- `force`: (optional, default: false) Force deletion even if location is active\n\n## Response\n- **204 No Content**: Location successfully deleted\n- **400 Bad Request**: Invalid location ID format or invalid query parameters\n- **404 Not Found**: Location with the given ID does not exist\n- **409 Conflict**: Location is active (without force=true) or has an ongoing job\n- **500 Internal Server Error**: Database connection or query error\n\n## Error Codes\n- `INVALID_LOCATION_ID`: The provided ID is not a valid positive integer\n- `INVALID_QUERY_PARAMETERS`: The query parameters cannot be parsed\n- `LOCATION_NOT_FOUND`: No location exists with the given ID\n- `ACTIVE_LOCATION_CONFLICT`: Location is active and cannot be deleted without force=true\n- `ONGOING_JOB_CONFLICT`: Location has an ongoing job and cannot be deleted\n- `METADATA_DB_ERROR`: Internal database error occurred\n\n## Safety Checks\n- Active locations require `force=true` to be deleted\n- Locations with ongoing jobs cannot be deleted (even with force=true)\n- Users must stop active jobs before deleting associated locations\n\nThis handler:\n- Validates and extracts the location ID from the URL path\n- Validates optional query parameters (force flag)\n- Performs safety checks for active locations and ongoing jobs\n- Deletes associated files from object store\n- Deletes the location from the metadata database\n- Returns appropriate HTTP status codes and error messages",
"operationId": "locations_delete",
"parameters": [
{
"name": "id",
"in": "path",
"description": "Location ID",
"required": true,
"schema": {
"type": "integer",
"format": "int64"
}
},