dataset-viewer

This repository provides an MCP server that interacts with the Hugging Face Dataset Viewer API, enabling users to browse, analyze, search, and access datasets hosted on the Hugging Face Hub.

Dataset Viewer MCP Server

This MCP server wraps the Hugging Face Dataset Viewer API so that clients can browse and analyze datasets hosted on the Hugging Face Hub. Datasets are addressed through the dataset:// URI scheme, which supports dataset configurations, splits, and paginated access. Private datasets can be reached by supplying a Hugging Face token for authentication.

Features

Resources

  • Access Hugging Face datasets via dataset:// URIs.
  • Supports dataset configurations and splits.
  • Provides paginated access, authentication, search, filtering, statistics, and analysis.
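
Paginated access can be illustrated with a small helper that splits a dataset split into (offset, length) windows. This is a minimal sketch, not the server's own code: the function name page_windows is hypothetical, and the 100-row cap reflects the limit the public Dataset Viewer rows endpoint applies to its length parameter, to the best of my knowledge.

```python
from typing import Iterator, Tuple

# Assumption: the public /rows endpoint accepts at most 100 rows per request.
MAX_PAGE_LENGTH = 100


def page_windows(total_rows: int, page_size: int = MAX_PAGE_LENGTH) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that together cover total_rows rows."""
    if not 1 <= page_size <= MAX_PAGE_LENGTH:
        raise ValueError(f"page_size must be between 1 and {MAX_PAGE_LENGTH}")
    offset = 0
    while offset < total_rows:
        # The last window may be shorter than page_size.
        length = min(page_size, total_rows - offset)
        yield offset, length
        offset += length


if __name__ == "__main__":
    for offset, length in page_windows(250):
        print(offset, length)
```

Each pair can be passed as the offset and length of one paginated request, so a client never asks for more rows than a single page allows.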

Tools

The server offers tools like:

  1. validate: Checks dataset accessibility.
  2. get_info: Retrieves dataset details.
  3. get_rows: Fetches paginated dataset content.
  4. get_first_rows: Retrieves the first rows of a dataset split.
  5. get_statistics: Provides dataset split statistics.
  6. search_dataset: Searches for text within a dataset.
  7. filter: Filters rows using SQL-like conditions.
  8. get_parquet: Downloads the dataset in Parquet format.
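
The tools above plausibly correspond to endpoints of the public Dataset Viewer API. The sketch below builds request URLs for them without sending anything over the network. The tool-to-endpoint mapping and the build_url helper are assumptions made for illustration; the source does not say how the server implements each tool.

```python
from urllib.parse import urlencode

BASE_URL = "https://datasets-server.huggingface.co"

# Assumed mapping from the tool names listed above to public API endpoints.
TOOL_ENDPOINTS = {
    "validate": "/is-valid",
    "get_info": "/info",
    "get_rows": "/rows",
    "get_first_rows": "/first-rows",
    "get_statistics": "/statistics",
    "search_dataset": "/search",
    "filter": "/filter",
    "get_parquet": "/parquet",
}


def build_url(tool: str, **params: object) -> str:
    """Return the request URL for a tool, skipping parameters set to None."""
    path = TOOL_ENDPOINTS[tool]  # raises KeyError for an unknown tool name
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{path}?{query}"


if __name__ == "__main__":
    print(build_url("validate", dataset="stanfordnlp/imdb"))
    print(build_url("get_rows", dataset="stanfordnlp/imdb", config="plain_text",
                    split="train", offset=0, length=100))
```

Keeping the mapping in one dictionary makes it easy to check which endpoint a tool would hit and to adjust the table if the real implementation differs.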

Installation

Requires Python 3.12+ and uv. Clone the repository, create a virtual environment, activate it, and install the package in development mode by running uv add -e . from the repository root (the trailing dot refers to the current directory).

Configuration

Set the HUGGINGFACE_TOKEN environment variable for private dataset access. Integrate with Claude Desktop by adding a configuration block to the claude_desktop_config.json file.
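
Both configuration steps can be sketched in a few lines. The first helper turns HUGGINGFACE_TOKEN into the bearer header that the Hugging Face APIs accept. The second prints a configuration block in the mcpServers layout that Claude Desktop reads. The command, its arguments, the server name, and the placeholder path are all assumptions for illustration; check the repository for the exact block it recommends.

```python
import json
import os
from typing import Dict, Mapping, Optional


def auth_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return a bearer Authorization header if HUGGINGFACE_TOKEN is set."""
    token = env.get("HUGGINGFACE_TOKEN")
    return {"Authorization": f"Bearer {token}"} if token else {}


def claude_desktop_entry(server_dir: str, token: Optional[str] = None) -> dict:
    """Build a hypothetical claude_desktop_config.json block for this server."""
    entry: dict = {
        # Assumed launch command: run the server through uv from its checkout.
        "command": "uv",
        "args": ["--directory", server_dir, "run", "dataset-viewer"],
    }
    if token:
        # Only needed for private datasets.
        entry["env"] = {"HUGGINGFACE_TOKEN": token}
    return {"mcpServers": {"dataset-viewer": entry}}


if __name__ == "__main__":
    # "/path/to/dataset-viewer" is a placeholder, not a real location.
    print(json.dumps(claude_desktop_entry("/path/to/dataset-viewer"), indent=2))
```

Leaving the token out of the block keeps public-dataset setups free of secrets; it is only added when a token is passed in explicitly.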

Usage Examples

Examples are provided for validating datasets, getting information, searching, filtering, and retrieving statistics.

Repository

privetin

privetin/dataset-viewer

Created

January 2, 2025

Updated

March 25, 2025

Language

Python

Category

AI