Using the REST API

You can access Datamart through a REST API. In addition to the documentation below, a Swagger UI is available for trying out the API.

There is also a Python client library for this API.

The API is versioned, and the current version is v1. The full path for an API request therefore looks like:

https://auctus.vida-nyu.org/api/v1/download/datamart.socrata.data-cityofnewyork-us.ht4t-wzcm
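
As a minimal sketch of how such versioned paths can be assembled, the snippet below joins path segments onto the API root. The host and version come from the example above; the helper name api_url is hypothetical and not part of any Datamart client.

```python
from urllib.parse import quote

# Root and version taken from the example path above.
API_ROOT = "https://auctus.vida-nyu.org/api"
API_VERSION = "v1"


def api_url(*segments):
    """Join path segments onto the versioned API root (hypothetical helper)."""
    # Percent-encode each segment so an identifier cannot alter the path.
    parts = [quote(str(segment).strip("/"), safe="") for segment in segments]
    return "/".join([API_ROOT, API_VERSION] + parts)


print(api_url("download", "datamart.socrata.data-cityofnewyork-us.ht4t-wzcm"))
```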

POST /search

Queries the Datamart system for datasets.

The Content-Type should be set to multipart/form-data to allow sending both the query description and the data file.

The following keys are accepted in the request body (you need to specify at least one of them):

This endpoint returns a JSON object, according to the query results specification.

POST /download

Downloads a dataset from Datamart.

The Content-Type should be set to multipart/form-data.

The following keys are accepted in the request body:

Additionally, you can use the format query parameter to get the result in a specific format, for example /download?format=d3m:

  • "csv": returns the dataset as a CSV file (application/octet-stream); this is the default option

  • "d3m": returns a ZIP file (application/zip) containing the dataset as a CSV file and its corresponding datasetDoc.json file

When using the d3m format, the structure for the ZIP file follows the D3M format:

dataset.zip
+-- datasetDoc.json
+-- tables
    +-- learningData.csv
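
A rough sketch of reading such an archive once it has been downloaded is shown below. It assumes that the two files sit at the root of the ZIP exactly as in the tree above (no extra top-level folder) and that both are UTF-8 encoded; the function name read_d3m_zip is hypothetical.

```python
import csv
import io
import json
import zipfile


def read_d3m_zip(zip_bytes):
    """Return (datasetDoc as a dict, CSV rows as lists) from a d3m-format ZIP."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Member names follow the tree above; UTF-8 is an assumption.
        doc = json.loads(archive.read("datasetDoc.json").decode("utf-8"))
        with archive.open("tables/learningData.csv") as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            rows = list(csv.reader(text))
    return doc, rows
```

The first element of rows is the CSV header line, if the file has one.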

GET /download/<id>

Downloads a dataset from Datamart, where <id> is the dataset identifier. It also accepts one query parameter, format, as specified above.
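
For illustration, a download URL with the optional format parameter could be built as follows. This is only a sketch: the base URL is taken from the example at the top of this page, and download_url is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode

# Base URL taken from the example path earlier on this page.
BASE_URL = "https://auctus.vida-nyu.org/api/v1"


def download_url(dataset_id, fmt=None):
    """Build the GET /download/<id> URL, adding ?format=... when requested."""
    url = BASE_URL + "/download/" + quote(dataset_id, safe="")
    if fmt is not None:
        # "csv" and "d3m" are the values described for the download endpoint.
        url += "?" + urlencode({"format": fmt})
    return url


print(download_url("datamart.socrata.data-cityofnewyork-us.ht4t-wzcm", fmt="d3m"))
```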

POST /augment

Augments a dataset.

The Content-Type should be set to multipart/form-data.

The accepted key/value pairs in the request body are the following:

  • data: path to a D3M dataset OR path to a CSV file OR CSV file contents

  • task: a JSON object that represents a query result, according to the query results specification

  • columns: a list of column indices from the Datamart dataset that will be added to data (optional)

  • destination: the location on disk where the new data will be saved (optional). Note that Datamart must have access to this path.

This endpoint also accepts the format query parameter, as specified for the download endpoint. However, it currently defaults to the d3m format.

POST /upload

Adds a dataset to the index. The file can be provided either via a URL or direct upload.

When providing a URL, make sure it is a direct link to a file in a supported format (CSV, Excel, SPSS, …) and not to an HTML page with a “download” button or GitHub page where the content is embedded (use the “raw” button).

The request returns the ID of the new dataset immediately. Profiling happens in the background, so the file will only appear in searches after a couple of minutes:

{"id": "datamart.upload.abcdef1234567890"}

POST /profile

Profiles a dataset without adding it to the index.

The computed metadata is returned, similar to using the Profiling library directly.

This endpoint expects one variable in the request body, data, which holds the contents of a file to be profiled in a supported file format (e.g. CSV, Excel, SPSS…).

In addition to the profile information, the returned JSON object contains a short string under the key token, which can be used instead of the full data when doing searches (provide it as data_profile).
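
The token handoff described above might look like the following sketch. It only handles the JSON text returned by POST /profile and produces the field to send with a later search; the function name and the error handling are assumptions, not part of the API.

```python
import json


def search_fields_from_profile(profile_body):
    """Turn a POST /profile response body into a data_profile field for searches."""
    profile = json.loads(profile_body)
    token = profile.get("token")
    if not isinstance(token, str) or not token:
        raise ValueError("profile response has no 'token' string")
    # The token stands in for the full data in subsequent searches.
    return {"data_profile": token}
```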

Embedding Datamart in your software

Rather than using the API and implementing your own UI for data search and augmentation, you can reuse our web frontend and collect results directly from Datamart into your system, so that the user does not have to download the data and then add it through your interface.

This can be done using the following 3 steps (4 steps for augmentation):

(optional) Step 0: Provide your input data if searching for augmentations

If you don’t have input data to provide, skip this step.

Issue a POST /profile request, providing your data, and read the string under the token key of the returned JSON.

Step 1: Create a session: POST /session/new

Issue a POST /session/new request with the following JSON input:

  • data_token: the token obtained from POST /profile, if searching for augmentations. Optional.

  • format: the desired format for datasets, as specified for the download endpoint. Options go in the format_options object. Optional, defaults to csv.

  • system_name: the name of your system. Optional, defaults to “TA3”. Will be shown on buttons (e.g. “Add to <system_name>”, “Join and add to <system_name>”).

The result is a JSON object containing the following:

  • session_id: a short string identifying the session. Use this later to retrieve results.

  • link_url: a link to our interface that you can present to the user (or embed, etc.)
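
A possible way to prepare this request and read its reply with the Python standard library is sketched below. It assumes that the endpoint lives under the same versioned base URL as the other examples and that the JSON input is sent as the request body with an application/json content type; both function names are hypothetical. The request object is only built here, not sent.

```python
import json
import urllib.request

# Assumption: the session endpoint sits under the same versioned base URL.
SESSION_NEW_URL = "https://auctus.vida-nyu.org/api/v1/session/new"


def build_session_request(data_token=None, fmt=None, format_options=None,
                          system_name=None):
    """Prepare (but do not send) the POST /session/new request."""
    body = {}
    # All keys are optional, so only include the ones that were provided.
    if data_token is not None:
        body["data_token"] = data_token
    if fmt is not None:
        body["format"] = fmt
    if format_options is not None:
        body["format_options"] = format_options
    if system_name is not None:
        body["system_name"] = system_name
    return urllib.request.Request(
        SESSION_NEW_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_session_response(response_body):
    """Return (session_id, link_url) from the JSON reply."""
    reply = json.loads(response_body)
    return reply["session_id"], reply["link_url"]
```

The prepared request can then be sent with urllib.request.urlopen, and the decoded response body passed to parse_session_response.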

Step 2: Direct the user to Datamart

Direct the user to the link_url obtained at step 1. Wait until they are done before moving on to step 3, or poll step 3 regularly.

The user will be able to use our interface as usual, including filters and related searches. The download buttons are replaced by “Add to <system_name>” buttons.

Step 3: Obtain the selected data from Datamart: GET /session/<id>

Issue a GET /session/<session_id> request, where <session_id> is the short string you obtained in step 1.

The result is an array of JSON objects, under a top-level key results. Each object has a url key, at which you can find the data that the user selected (in the format you selected at step 1), and a type key, whose value is either "download" (the result is a dataset from Datamart) or "join"/"union" (the result is the input data augmented with the dataset from Datamart).
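
The response described above could be processed along the following lines. This sketch only relies on the results, url and type keys named in this section; the function name is hypothetical, and entries with an unrecognized type are skipped as a defensive assumption.

```python
import json


def split_session_results(session_body):
    """Split GET /session/<id> results into plain downloads and augmentations."""
    payload = json.loads(session_body)
    downloads, augmentations = [], []
    for item in payload["results"]:
        if item["type"] == "download":
            # A dataset taken from Datamart as-is.
            downloads.append(item["url"])
        elif item["type"] in ("join", "union"):
            # The input data augmented with a Datamart dataset.
            augmentations.append(item["url"])
    return downloads, augmentations
```

Each returned URL points at data in the format chosen when the session was created.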