API Reference¶

Eclair provides several APIs and tools for dataset discovery and management. This reference covers the available interfaces: the MCP tools, the REST API, and the Python SDK.

Model Context Protocol (MCP) Tools¶

Eclair implements the Model Context Protocol, providing standardized tools for AI agents to discover and work with datasets.

Available Tools¶

| Tool Name | Description | Parameters |
|---|---|---|
| `search-datasets` | Search for datasets using natural language | `query` (required) |
| `download-dataset` | Download a dataset to local storage | `collection`, `dataset` (both required) |
| `datasets-preview-url` | Get preview URL for dataset samples | `collection`, `dataset` (both required) |
| `serve-croissant` | Get Croissant metadata for a dataset | `collection`, `dataset` (both required) |
| `validate-croissant` | Validate Croissant metadata | `metadata_json` (required) |
| `help` | Get help information | No parameters |
| `ping` | Test server connectivity | No parameters |
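
The usage examples in this reference show only the `method` and `params` of each call. MCP messages are JSON-RPC 2.0, so a request sent over the wire also carries `jsonrpc` and `id` fields. The following is a minimal sketch of building such a request; the helper name `build_tool_call` is hypothetical and not part of Eclair.

```python
import itertools
import json

# Monotonic request ids; JSON-RPC 2.0 requests that expect a reply need an id.
_request_ids = itertools.count(1)


def build_tool_call(name, arguments=None):
    """Return a JSON-RPC 2.0 `tools/call` request as a dict (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


request = build_tool_call("search-datasets", {"query": "Fashion-MNIST"})
print(json.dumps(request, indent=2))
```

Tools without parameters, such as `ping` and `help`, can be called with `build_tool_call("ping")`, which sends an empty `arguments` object.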

Tool Schemas¶

search-datasets¶

Search for datasets using natural language queries.

{
  "name": "search-datasets",
  "description": "Search for datasets using a query string",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "title": "Query",
        "description": "Search query for finding datasets"
      }
    },
    "required": ["query"]
  }
}

Example Usage:

{
  "method": "tools/call",
  "params": {
    "name": "search-datasets", 
    "arguments": {
      "query": "Fashion-MNIST"
    }
  }
}

Response (Sample):

{
  "result": [
    {
      "document": {
        "collection_name": "Han-Xiao",
        "entity_name": "Fashion-MNIST",
        "entity_type": "dataset",
        "full_name": "dataset/Han-Xiao/Fashion-MNIST",
        "metadata": {
          "__type": "sc:Dataset",
          "conformsTo": "http://mlcommons.org/croissant/1.0",
          "name": "Fashion-MNIST",
          "description": "Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples...",
          "url": "https://www.openml.org/search?type=data&id=40996",
          "license": "Public"
        }
      },
      "highlight": {
        "metadata": {
          "description": {
            "matched_tokens": ["Fashion-MNIST"],
            "snippet": "<mark>Fashion-MNIST</mark> is a dataset of Zalando's article images"
          }
        }
      },
      "text_match": 1157451471441100800,
      "text_match_info": {
        "best_field_score": "2211897868288",
        "best_field_weight": 15,
        "fields_matched": 2,
        "score": "1157451471441100922",
        "tokens_matched": 1
      }
    }
  ]
}
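
Each search hit carries the names needed by the other tools. The sketch below pulls `(collection, dataset)` pairs out of a search response. It assumes that `collection_name` and `entity_name` correspond to the `collection` and `dataset` arguments, which matches the Fashion-MNIST sample but is not stated explicitly in this reference. The function name `dataset_refs` is hypothetical.

```python
def dataset_refs(search_response):
    """Yield (collection, dataset) pairs from a search-datasets response."""
    for hit in search_response.get("result", []):
        doc = hit.get("document", {})
        # Skip hits that are not datasets or lack the identifying fields.
        if doc.get("entity_type") != "dataset":
            continue
        collection = doc.get("collection_name")
        dataset = doc.get("entity_name")
        if collection and dataset:
            yield collection, dataset


# Trimmed copy of the sample response.
sample = {
    "result": [
        {
            "document": {
                "collection_name": "Han-Xiao",
                "entity_name": "Fashion-MNIST",
                "entity_type": "dataset",
                "full_name": "dataset/Han-Xiao/Fashion-MNIST",
            }
        }
    ]
}

refs = list(dataset_refs(sample))
print(refs)
```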

download-dataset¶

Download a dataset to local storage.

{
  "name": "download-dataset",
  "description": "Download a dataset",
  "inputSchema": {
    "type": "object",
    "properties": {
      "collection": {
        "type": "string",
        "title": "Collection",
        "description": "Dataset collection name"
      },
      "dataset": {
        "type": "string", 
        "title": "Dataset",
        "description": "Dataset identifier"
      }
    },
    "required": ["collection", "dataset"]
  }
}
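
Because every tool publishes its `required` list in `inputSchema`, a client can catch missing arguments before sending a call. This is a minimal sketch of that check. It only tests for presence, not types, and the helper name `missing_arguments` is hypothetical; a full check would use a JSON Schema validator.

```python
# The inputSchema of download-dataset, reduced to the parts used here.
DOWNLOAD_SCHEMA = {
    "type": "object",
    "properties": {
        "collection": {"type": "string"},
        "dataset": {"type": "string"},
    },
    "required": ["collection", "dataset"],
}


def missing_arguments(input_schema, arguments):
    """Return the required argument names that are absent from `arguments`."""
    return [name for name in input_schema.get("required", []) if name not in arguments]


print(missing_arguments(DOWNLOAD_SCHEMA, {"collection": "Han-Xiao"}))  # → ['dataset']
```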

Example Usage:

{
  "method": "tools/call",
  "params": {
    "name": "download-dataset",
    "arguments": {
      "collection": "Han-Xiao",
      "dataset": "Fashion-MNIST"
    }
  }
}

Response:

{
  "result": {
    "metadata": {
      "url": "https://www.openml.org/search?type=data&id=40996",
      "name": "Fashion-MNIST",
      "@type": "sc:Dataset",
      "sameAs": "https://github.com/zalandoresearch/fashion-mnist",
      "creator": {
        "url": "https://huggingface.co/Han-Xiao",
        "name": "Han-Xiao",
        "@type": "sc:Organization"
      },
      "license": "Public",
      "conformsTo": "http://mlcommons.org/croissant/1.0",
      "description": "Fashion-MNIST is a dataset of Zalando's article images..."
    },
    "asset_origin": "openml",
    "data_path": "Han-Xiao/Fashion-MNIST",
    "instructions": "# Install if necessary\n# !pip install datasets pandas\n\nfrom datasets import load_dataset\nimport pandas as pd\n\n# 1. Load Fashion-MNIST dataset from OpenML\ndataset = load_dataset(\"Han-Xiao/Fashion-MNIST\")\n\n# The dataset has different splits (train, test)\nprint(dataset)\n\n# 2. Take a look at a few examples\nprint(\"\\nFirst few training examples:\")\nprint(dataset[\"train\"].select(range(5)))\n\n# 3. Convert to a pandas DataFrame for easier exploration\ndf_train = pd.DataFrame(dataset[\"train\"])\n\nprint(\"\\nPandas DataFrame Head:\")\nprint(df_train.head())\n\n# 4. Simple exploration\nprint(\"\\nBasic info:\")\nprint(df_train.info())"
  }
}
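
The `instructions` field is a JSON-escaped Python snippet. The sketch below shows one way a client might unpack it, using a shortened copy of the response; the variable names are illustrative only.

```python
import json

# Shortened copy of the response, as it would arrive on the wire.
raw = r'''
{
  "result": {
    "asset_origin": "openml",
    "data_path": "Han-Xiao/Fashion-MNIST",
    "instructions": "# Install if necessary\n# !pip install datasets pandas\n\nfrom datasets import load_dataset\n"
  }
}
'''

result = json.loads(raw)["result"]

# json.loads turns the escaped \n sequences into real newlines,
# so the snippet can be printed or saved as-is.
snippet = result["instructions"]
print(f"origin: {result['asset_origin']}, path: {result['data_path']}")
print(snippet)
```

The snippet is plain text returned by the server, so it is safer to show it to the user or save it to a file than to execute it automatically.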

datasets-preview-url¶

Get a URL for previewing dataset samples.

{
  "name": "datasets-preview-url",
  "description": "Get a download url for a dataset preview",
  "inputSchema": {
    "type": "object",
    "properties": {
      "collection": {
        "type": "string",
        "title": "Collection"
      },
      "dataset": {
        "type": "string",
        "title": "Dataset" 
      }
    },
    "required": ["collection", "dataset"]
  }
}

Example Usage:

{
  "method": "tools/call",
  "params": {
    "name": "datasets-preview-url",
    "arguments": {
      "collection": "Han-Xiao",
      "dataset": "Fashion-MNIST"
    }
  }
}

Response:

{
  "result": "https://dock.jetty.io/api/v1/datasets/Han-Xiao/Fashion-MNIST/preview"
}
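
Since the result is a bare string, a client may want to confirm it is an absolute URL before handing it to a browser or HTTP client. A small sketch using only the standard library follows; `is_http_url` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def is_http_url(value):
    """Return True if `value` is a string holding an absolute http(s) URL."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


response = {"result": "https://dock.jetty.io/api/v1/datasets/Han-Xiao/Fashion-MNIST/preview"}
print(is_http_url(response["result"]))  # → True
```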

serve-croissant¶

Get Croissant metadata for a dataset.

{
  "name": "serve-croissant",
  "description": "Get the Croissant dataset metadata",
  "inputSchema": {
    "type": "object",
    "properties": {
      "collection": {
        "type": "string",
        "title": "Collection"
      },
      "dataset": {
        "type": "string",
        "title": "Dataset"
      }
    },
    "required": ["collection", "dataset"]  
  }
}

Example Usage:

{
  "method": "tools/call",
  "params": {
    "name": "serve-croissant",
    "arguments": {
      "collection": "Han-Xiao",
      "dataset": "Fashion-MNIST"
    }
  }
}

Response (Sample):

{
  "result": {
    "url": "https://www.openml.org/search?type=data&id=40996",
    "name": "Fashion-MNIST",
    "@type": "sc:Dataset",
    "sameAs": "https://github.com/zalandoresearch/fashion-mnist",
    "creator": {
      "url": "https://huggingface.co/Han-Xiao",
      "name": "Han-Xiao",
      "@type": "sc:Organization"
    },
    "license": "Public",
    "@context": {
      "cr": "http://mlcommons.org/croissant/",
      "sc": "https://schema.org/",
      "@vocab": "https://schema.org/"
    },
    "keywords": [
      "image-classification",
      "multi-class-image-classification",
      "expert-generated",
      "English",
      "Public",
      "10K - 100K",
      "Fashion-MNIST"
    ],
    "conformsTo": "http://mlcommons.org/croissant/1.0",
    "description": "Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples...",
    "recordSet": [
      {
        "@id": "fashion_mnist",
        "name": "Fashion-MNIST",
        "@type": "cr:RecordSet",
        "field": [
          {
            "@id": "fashion_mnist/image",
            "name": "fashion_mnist/image",
            "@type": "cr:Field",
            "dataType": "sc:ImageObject",
            "description": "28x28 grayscale image of fashion item."
          },
          {
            "@id": "fashion_mnist/label",
            "name": "fashion_mnist/label", 
            "@type": "cr:Field",
            "dataType": "sc:Integer",
            "description": "Fashion item class label (0-9).\nLabels:\n0: T-shirt/top, 1: Trouser, 2: Pullover, 3: Dress, 4: Coat, 5: Sandal, 6: Shirt, 7: Sneaker, 8: Bag, 9: Ankle boot"
          }
        ]
      }
    ],
    "distribution": [
      {
        "@id": "openml_repo",
        "name": "openml_repo",
        "@type": "cr:FileObject",
        "contentUrl": "https://www.openml.org/search?type=data&id=40996",
        "encodingFormat": "openml+arff"
      }
    ]
  }
}
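
The `recordSet` entries describe the columns a consumer can expect. As a rough sketch, the following walks the metadata and lists each field with its data type; `list_fields` is a hypothetical helper, and the input is a trimmed copy of the `result` object from the sample response.

```python
def list_fields(croissant):
    """Return (record_set_name, field_name, data_type) tuples from Croissant metadata."""
    rows = []
    for record_set in croissant.get("recordSet", []):
        # A record set may omit `field` entirely; treat that as no fields.
        for field in record_set.get("field", []):
            rows.append((record_set.get("name"), field.get("name"), field.get("dataType")))
    return rows


metadata = {
    "name": "Fashion-MNIST",
    "recordSet": [
        {
            "@id": "fashion_mnist",
            "name": "Fashion-MNIST",
            "field": [
                {"name": "fashion_mnist/image", "dataType": "sc:ImageObject"},
                {"name": "fashion_mnist/label", "dataType": "sc:Integer"},
            ],
        }
    ],
}

for row in list_fields(metadata):
    print(row)
```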

validate-croissant¶

Validate Croissant metadata for compliance.

{
  "name": "validate-croissant", 
  "description": "Validate a Croissant metadata file",
  "inputSchema": {
    "type": "object",
    "properties": {
      "metadata_json": {
        "type": "object",
        "title": "Metadata Json",
        "additionalProperties": true
      }
    },
    "required": ["metadata_json"]
  }
}

Example Usage:

{
  "method": "tools/call",
  "params": {
    "name": "validate-croissant",
    "arguments": {
      "metadata_json": {
        "conformsTo": "http://mlcommons.org/croissant/1.0",
        "contributor": {
          "name": ["Jock A. Blackard", "Dr. Denis J. Dean", "Dr. Charles W. Anderson"]
        },
        "dateCreated": "2014-04-23T13:14:37",
        "description": "**Covertype**\nPredicting forest cover type from cartographic variables only (no remotely sensed data)...",
        "distribution": [
          {
            "contentUrl": "https://data.openml.org/datasets/0000/0180/dataset_180.pq",
            "description": "Data file belonging to the dataset.",
            "encodingFormat": "application/x-parquet",
            "md5": "c741394174287c04331718c76be0336e",
            "name": "data-file"
          }
        ],
        "inLanguage": "en",
        "isAccessibleForFree": true,
        "keywords": ["Data Science", "Ecology", "Machine Learning", "study_10", "uci"],
        "license": "Public",
        "name": "covertype",
        "recordSet": [
          {
            "data": [
              {"enumerations/class/value": "Aspen"},
              {"enumerations/class/value": "Cottonwood_Willow"},
              {"enumerations/class/value": "Douglas_fir"},
              {"enumerations/class/value": "Krummholz"},
              {"enumerations/class/value": "Lodgepole_Pine"},
              {"enumerations/class/value": "Ponderosa_Pine"},
              {"enumerations/class/value": "Spruce_Fir"}
            ],
            "dataType": "sc:Enumeration",
            "description": "Possible values for class",
            "field": [
              {
                "dataType": "sc:Text",
                "description": "The value of class.",
                "name": "value"
              }
            ],
            "name": "class"
          }
        ],
        "url": "https://www.openml.org/d/180",
        "version": 1
      }
    }
  }
}

Response:

{
  "result": {
    "valid": true,
    "results": [
      {
        "test": "JSON Format Validation",
        "passed": true,
        "message": "The string is valid JSON.",
        "status": "pass"
      },
      {
        "test": "Croissant Schema Validation",
        "passed": true,
        "message": "The dataset passes Croissant validation.",
        "status": "pass"
      },
      {
        "test": "Records Generation Test",
        "passed": true,
        "message": "Record set 'enumerations/class' passed validation.",
        "status": "pass"
      }
    ]
  }
}

Note: The validation tool accepts well-formed Croissant 1.0 metadata. This example uses the OpenML Covertype dataset, whose metadata includes complete `distribution`, `recordSet`, and `field` definitions, so all three checks pass.
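
A validation response lists one entry per check, each with `test`, `passed`, `message`, and `status`. The sketch below collects the checks that did not pass, which is usually what a caller wants to report. The helper name `failed_checks` is hypothetical, and the input is a trimmed copy of the successful response.

```python
def failed_checks(validation_response):
    """Return (test, status, message) for every check that did not pass."""
    result = validation_response["result"]
    return [
        (check["test"], check["status"], check["message"])
        for check in result["results"]
        if not check["passed"]
    ]


ok_response = {
    "result": {
        "valid": True,
        "results": [
            {"test": "JSON Format Validation", "passed": True,
             "message": "The string is valid JSON.", "status": "pass"},
            {"test": "Croissant Schema Validation", "passed": True,
             "message": "The dataset passes Croissant validation.", "status": "pass"},
        ],
    }
}

print(failed_checks(ok_response))  # → []
```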

Validation with Incomplete Metadata¶

Some datasets may have schema-compliant but functionally incomplete metadata:

{
  "method": "tools/call",
  "params": {
    "name": "validate-croissant",
    "arguments": {
      "metadata_json": {
        "conformsTo": "http://mlcommons.org/croissant/1.0",
        "name": "Fashion-MNIST",
        "description": "Fashion-MNIST dataset...",
        "recordSet": [
          {
            "name": "data-file-description",
            "description": "The fields are omitted, because this dataset has too many."
          }
        ],
        "distribution": [
          {
            "contentUrl": "https://data.openml.org/datasets/0004/40996/dataset_40996.pq",
            "encodingFormat": "application/x-parquet",
            "name": "data-file"
          }
        ]
      }
    }
  }
}

Response:

{
  "result": {
    "valid": false,
    "results": [
      {
        "test": "JSON Format Validation",
        "passed": true,
        "message": "The string is valid JSON.",
        "status": "pass"
      },
      {
        "test": "Croissant Schema Validation", 
        "passed": true,
        "message": "The dataset passes Croissant validation.",
        "status": "pass"
      },
      {
        "test": "Records Generation Test",
        "passed": false,
        "message": "Record set failed due to generation error: TypeError: object of type 'NoneType' has no len()",
        "status": "warning"
      }
    ]
  }
}

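This shows that metadata can pass schema validation yet still fail record generation when field definitions are missing. Note that `valid` is `false` even though the failing check is reported with status `warning`.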
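
A client can spot this situation locally before calling the tool. The following is a heuristic sketch, not part of Eclair: it assumes that a record set with no `field` entries is what triggers the generation error, as the example suggests, and the helper name `record_sets_without_fields` is hypothetical.

```python
def record_sets_without_fields(metadata):
    """Return names of record sets that declare no `field` entries."""
    return [
        record_set.get("name", "<unnamed>")
        for record_set in metadata.get("recordSet", [])
        if not record_set.get("field")
    ]


# Trimmed copy of the metadata from the incomplete request.
incomplete = {
    "name": "Fashion-MNIST",
    "recordSet": [
        {
            "name": "data-file-description",
            "description": "The fields are omitted, because this dataset has too many.",
        }
    ],
}

print(record_sets_without_fields(incomplete))  # → ['data-file-description']
```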

help¶

Get comprehensive help information.

{
  "name": "help",
  "description": "Get help for the Eclair Dataset MCP server",
  "inputSchema": {
    "type": "object",
    "properties": {}
  }
}

Example Usage:

{
  "method": "tools/call",
  "params": {
    "name": "help",
    "arguments": {}
  }
}

Response:

{
  "result": "# Eclair Dataset MCP Server\n\nEclair provides access to a curated collection of datasets for machine learning and data science.\n\n## Available Tools\n\n### search-datasets\nSearch for datasets using natural language queries.\n\nParameters:\n- query (string, required): Search query to find relevant datasets\n\nExample:\n```\nmcp_client.call(\"search-datasets\", {\"query\": \"computer vision\"})\n```\n\n### datasets-preview-url\nGet a preview URL for a specific dataset.\n\nParameters:\n- collection (string, required): Collection name (e.g., \"Han-Xiao\")\n- dataset (string, required): Dataset name (e.g., \"Fashion-MNIST\")\n\nExample:\n```\nmcp_client.call(\"datasets-preview-url\", {\"collection\": \"Han-Xiao\", \"dataset\": \"Fashion-MNIST\"})\n```\n\n### serve-croissant\nRetrieve Croissant metadata for a dataset.\n\nParameters:\n- collection (string, required): Collection name\n- dataset (string, required): Dataset name\n\nExample:\n```\nmcp_client.call(\"serve-croissant\", {\"collection\": \"Han-Xiao\", \"dataset\": \"Fashion-MNIST\"})\n```\n\n### validate-croissant\nValidate Croissant metadata for compliance.\n\nParameters:\n- metadata_json (object, required): Croissant metadata to validate\n\nExample:\n```\nmcp_client.call(\"validate-croissant\", {\"metadata_json\": croissant_metadata})\n```\n\n### download-dataset\nDownload a dataset with metadata and usage instructions.\n\nParameters:\n- collection (string, required): Collection name\n- dataset (string, required): Dataset name\n\nExample:\n```\nmcp_client.call(\"download-dataset\", {\"collection\": \"Han-Xiao\", \"dataset\": \"Fashion-MNIST\"})\n```\n\n### ping\nTest server connectivity.\n\nParameters: None\n\nExample:\n```\nmcp_client.call(\"ping\", {})\n```\n\n## Getting Started\n\n1. Search for datasets: `search-datasets`\n2. Preview dataset: `datasets-preview-url`  \n3. Get metadata: `serve-croissant`\n4. Download dataset: `download-dataset`\n\n## Support\n\nFor more information, visit: https://github.com/jvanscho/eclair"
}
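
The help text is Markdown, with one level-3 heading per tool. As an illustration, the sketch below extracts the tool names from a shortened copy of that text. It relies on the heading layout seen in this sample, which may change between server versions.

```python
import re

# Shortened copy of the help text (after JSON decoding).
help_text = (
    "# Eclair Dataset MCP Server\n\n"
    "## Available Tools\n\n"
    "### search-datasets\nSearch for datasets using natural language queries.\n\n"
    "### ping\nTest server connectivity.\n\n"
    "## Getting Started\n"
)

# Level-3 headings ("### name") mark one tool each in this layout.
tool_names = re.findall(r"^### (\S+)[ \t]*$", help_text, flags=re.MULTILINE)
print(tool_names)  # → ['search-datasets', 'ping']
```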

ping¶

Test server connectivity and health.

{
  "name": "ping",
  "description": "Test that the Eclair server is working", 
  "inputSchema": {
    "type": "object",
    "properties": {}
  }
}

Example Usage:

{
  "method": "tools/call",
  "params": {
    "name": "ping",
    "arguments": {}
  }
}

Response:

{
  "content": [
    {
      "type": "text",
      "text": "Pong! Eclair MCP Server is running successfully."
    }
  ]
}
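
The `ping` sample returns a `content` list of typed items, while the other samples in this reference return a `result` value. A client that handles both might normalize them as in this sketch; `extract_payload` is a hypothetical helper, and the two shapes are taken only from the samples shown here.

```python
def extract_payload(response):
    """Return the payload from either response shape shown in this reference."""
    if "result" in response:
        return response["result"]
    # Content-style responses carry a list of typed items; join the text ones.
    texts = [
        item["text"]
        for item in response.get("content", [])
        if item.get("type") == "text"
    ]
    return "\n".join(texts)


ping_response = {
    "content": [
        {"type": "text", "text": "Pong! Eclair MCP Server is running successfully."}
    ]
}
print(extract_payload(ping_response))
```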