API Documentation

Our API provides programmatic access to datasets. You can use it to query and download data directly from your applications or analysis tools.

Authentication

Include your API key in all requests using one of these methods:

Header Authentication (recommended)

X-API-Key: your_api_key_here

URL Parameter

?api_key=your_api_key_here
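
The two methods can be sketched in Python as follows. This is a minimal illustration using only the standard library; the helper names auth_headers and with_api_key_param are hypothetical and not part of the API. Query strings often end up in server logs and browser history, which is a plausible reason the header form is the recommended one.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

API_KEY = "your_api_key_here"  # placeholder, as in the examples in this document


def auth_headers(api_key: str) -> dict:
    """Header authentication (recommended): send the key as X-API-Key."""
    return {"X-API-Key": api_key}


def with_api_key_param(url: str, api_key: str) -> str:
    """URL-parameter authentication: append api_key to the query string."""
    parts = urlsplit(url)
    # Keep any parameters already on the URL and add api_key at the end.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("api_key", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    url = "http://publicdatamarket.com/api/v1/datasets/owner/dataset_name/data?limit=100"
    print(auth_headers(API_KEY))
    print(with_api_key_param(url, API_KEY))
```

Either the returned headers or the returned URL can be handed to any HTTP client; only one of the two is needed per request.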

Endpoints

GET /api/v1/datasets/{dataset_owner}/{dataset_name}/data

Query and retrieve data from a dataset with optional filtering and pagination.

Path Parameters

  • dataset_owner - The owner of the dataset
  • dataset_name - The name of the dataset

Query Parameters

  • api_key - Your API key (if not using header)
  • limit - Maximum number of rows to return (default: 1000; maximum: 1000 for subscribers, 20 for non-subscribers). Setting limit to 0 removes the explicit row limit from the query; subscription limits still apply.
  • offset - Number of rows to skip, used for pagination (available to subscribers only)
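
As a rough sketch of how the path and query parameters above combine into a request URL, the helper below assembles one with the standard library. The function name build_data_url and the client-side rejection of negative values are assumptions made for illustration; the API's own handling of invalid values is not documented here.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://publicdatamarket.com/api/v1/datasets"


def build_data_url(owner, name, limit=None, offset=None):
    """Return the /data URL for a dataset, adding only the parameters given."""
    # Percent-encode the path parameters so unusual characters stay inside their segment.
    path = f"{BASE_URL}/{quote(owner, safe='')}/{quote(name, safe='')}/data"
    params = {}
    if limit is not None:
        if limit < 0:  # assumption: negative limits are rejected locally
            raise ValueError("limit must be 0 or a positive integer")
        params["limit"] = limit
    if offset is not None:
        if offset < 0:  # assumption: negative offsets are rejected locally
            raise ValueError("offset must not be negative")
        params["offset"] = offset
    return f"{path}?{urlencode(params)}" if params else path


if __name__ == "__main__":
    print(build_data_url("owner", "dataset_name", limit=100))
    print(build_data_url("owner", "dataset_name", limit=100, offset=100))
```

Leaving limit unset lets the server apply its default of 1000 rows; passing offset only makes sense for subscribers.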

Example Request

curl -H "X-API-Key: your_api_key_here" "http://publicdatamarket.com/api/v1/datasets/owner/dataset_name/data?limit=100"

Example Response

{
  "data": [
    {
      "column1_name": "value1",
      "column2_name": "value2",
      "column3_name": 123
      /* ... other columns */
    },
    {
      "column1_name": "another_value1",
      "column2_name": "another_value2",
      "column3_name": 456
      /* ... other columns */
    }
    /* ... more rows */
  ],
  "metadata": {
    "columns": [
      {
        "name": "column1_name",
        "type": "String",
        "description": "Description of column 1"
      },
      {
        "name": "column2_name",
        "type": "String",
        "description": "Description of column 2"
      },
      {
        "name": "column3_name",
        "type": "Int32",
        "description": "Description of column 3"
      }
      /* ... other columns */
    ],
    "total_rows": 1000,
    "returned_rows": 100,
    "is_sample": false,
    "pagination": {
      "limit": 100,
      "offset": 0,
      "next_offset": 100,
      "next_page": "http://publicdatamarket.com/api/v1/datasets/owner/dataset_name/data?limit=100&offset=100"
    }
  }
}
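
One way to read the metadata block of such a response is sketched below. The helpers column_types and rows_remaining are hypothetical names, and the second one assumes that total_rows counts every row in the dataset, which the example suggests but does not state. The sample dictionary is a trimmed copy of the response above without the elided parts.

```python
def column_types(payload: dict) -> dict:
    """Map each column name to its declared type from metadata.columns."""
    return {col["name"]: col["type"] for col in payload["metadata"]["columns"]}


def rows_remaining(payload: dict) -> int:
    """Rows left after this page (assumes total_rows covers the whole dataset)."""
    meta = payload["metadata"]
    fetched = meta["pagination"]["offset"] + meta["returned_rows"]
    return max(meta["total_rows"] - fetched, 0)


# Trimmed copy of the example response; only the fields used here are kept.
SAMPLE = {
    "data": [],
    "metadata": {
        "columns": [
            {"name": "column1_name", "type": "String"},
            {"name": "column2_name", "type": "String"},
            {"name": "column3_name", "type": "Int32"},
        ],
        "total_rows": 1000,
        "returned_rows": 100,
        "pagination": {"limit": 100, "offset": 0, "next_offset": 100},
    },
}

if __name__ == "__main__":
    print(column_types(SAMPLE))
    print(rows_remaining(SAMPLE))
```

The type names (String, Int32) are passed through unchanged; mapping them to local types is left to the caller.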

Notes on pagination:

  • When next_offset is present, there are more results available.
  • To fetch the next page, use the value of next_offset as the offset parameter in your next request.
  • The next_page field provides a direct URL to fetch the next page of results.
  • When next_offset is null and next_page is not present, you've reached the end of the dataset.
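
The rules above can be sketched as a small generator that keeps requesting pages until next_offset is null or missing. To stay runnable without network access, the HTTP call is replaced by a stand-in; iter_rows and fake_fetch are hypothetical names, and the guard against a non-advancing offset is a defensive assumption rather than documented behavior.

```python
from typing import Callable, Iterator


def iter_rows(fetch_page: Callable[[int], dict], start_offset: int = 0) -> Iterator[dict]:
    """Yield rows page by page, following next_offset until it is null or missing."""
    offset = start_offset
    while True:
        page = fetch_page(offset)
        yield from page["data"]
        pagination = page.get("metadata", {}).get("pagination", {})
        next_offset = pagination.get("next_offset")
        if next_offset is None:
            return  # end of the dataset
        if next_offset <= offset:  # assumption: offsets always move forward
            raise RuntimeError("next_offset did not advance")
        offset = next_offset


def fake_fetch(offset: int, total: int = 5, limit: int = 2) -> dict:
    """Stand-in for an HTTP call: serves `total` rows, `limit` rows per page."""
    rows = [{"id": i} for i in range(offset, min(offset + limit, total))]
    end = offset + len(rows)
    pagination = {
        "limit": limit,
        "offset": offset,
        "next_offset": end if end < total else None,
    }
    return {"data": rows, "metadata": {"pagination": pagination}}


if __name__ == "__main__":
    print([row["id"] for row in iter_rows(fake_fetch)])  # [0, 1, 2, 3, 4]
```

In real use, fetch_page would issue a GET with the given offset and return the parsed JSON body.
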

GET /api/v1/datasets/{dataset_owner}/{dataset_name}/download

Download a dataset as a CSV, JSON Lines, or Excel file.

Path Parameters

  • dataset_owner - The owner of the dataset
  • dataset_name - The name of the dataset

Query Parameters

  • api_key - Your API key (if not using header)
  • format - Output format (csv, jsonl, excel)
  • limit - Maximum number of rows to return (default: all rows for subscribers, 20 for non-subscribers). Setting limit to 0 explicitly requests all rows within subscription limits.

Example Request

curl -H "X-API-Key: your_api_key_here" "http://publicdatamarket.com/api/v1/datasets/owner/dataset_name/download?format=csv" --output data.csv
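
A Python counterpart to the curl command might look like the sketch below, which uses only the standard library. The names build_download_url and download are hypothetical, and the local check of the format value is an assumption; the server's response to an unsupported format is not documented here.

```python
import shutil
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "http://publicdatamarket.com/api/v1/datasets"
FORMATS = ("csv", "jsonl", "excel")  # values listed for the format parameter


def build_download_url(owner, name, fmt, limit=None):
    """Return the /download URL for a dataset in the requested format."""
    if fmt not in FORMATS:  # assumption: reject unknown formats before sending
        raise ValueError(f"format must be one of {FORMATS}")
    params = {"format": fmt}
    if limit is not None:
        params["limit"] = limit
    path = f"{BASE_URL}/{quote(owner, safe='')}/{quote(name, safe='')}/download"
    return f"{path}?{urlencode(params)}"


def download(url, api_key, dest_path, timeout=60):
    """Stream the response body to dest_path without holding it all in memory."""
    request = urllib.request.Request(url, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        with open(dest_path, "wb") as out:
            shutil.copyfileobj(response, out)


if __name__ == "__main__":
    # Only builds the URL; call download(url, key, "data.csv") to fetch the file.
    print(build_download_url("owner", "dataset_name", "csv"))
```

Streaming to disk matters mainly for subscribers, whose default download contains all rows.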

Code Examples

Python

import requests
import pandas as pd

# Configuration
API_URL = "http://publicdatamarket.com/api/v1/datasets/owner/dataset_name/data"
API_KEY = "your_api_key_here"

# Basic example - single request
def fetch_single_page():
    response = requests.get(
        API_URL,
        headers={"X-API-Key": API_KEY},
        params={"limit": 1000}
    )

    if response.status_code == 200:
        data = response.json()["data"]
        df = pd.DataFrame(data)
        print(f"Fetched {len(data)} rows")
        print(df.head())
        return df
    else:
        print(f"Error: {response.status_code} - {response.text}")
        return None

# Pagination example - fetch entire dataset
# (the offset parameter is only available to subscribers)
def fetch_all_data():
    all_data = []
    offset = 0
    limit = 1000  # Maximum allowed per request for subscribers
    
    while True:
        print(f"Fetching data with offset={offset}, limit={limit}")
        response = requests.get(
            API_URL,
            headers={"X-API-Key": API_KEY},
            params={"limit": limit, "offset": offset}
        )
        
        if response.status_code != 200:
            print(f"Error: {response.status_code} - {response.text}")
            break
            
        result = response.json()
        data = result["data"]
        all_data.extend(data)
        
        print(f"Retrieved {len(data)} records. Total so far: {len(all_data)}")
        
        # Check if there are more pages
        metadata = result.get("metadata", {})
        pagination = metadata.get("pagination", {})
        next_offset = pagination.get("next_offset")
        
        if next_offset is None:
            print("Reached end of dataset")
            break
            
        offset = next_offset
    
    # Create a DataFrame with all the data
    df = pd.DataFrame(all_data)
    print(f"Complete dataset contains {len(df)} rows")
    return df

# Call the function to fetch all data
# df = fetch_all_data()

R

library(httr)
library(jsonlite)

# Configuration
api_url <- "http://publicdatamarket.com/api/v1/datasets/owner/dataset_name/data"
api_key <- "your_api_key_here"

# Make API request
response <- GET(
  api_url,
  add_headers("X-API-Key" = api_key),
  query = list(limit = 1000)
)

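# Convert to dataframe
if (status_code(response) == 200) {
  # Parse the JSON body; the rows are in the "data" field.
  # fromJSON() is called directly because neither httr nor jsonlite provides %>%.
  data <- fromJSON(content(response, "text", encoding = "UTF-8"))
  df <- as.data.frame(data$data)
  head(df)
} else {
  cat("Error:", status_code(response), "-", content(response, "text", encoding = "UTF-8"), "\n")
}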