Documents API

Looking for a guided introduction to documents?
See Indexing Documents.

Did you know you can use the web crawler to index web content?
See App Search web crawler.

Create (index), update, delete, and display documents.

Authentication

For authentication, the Documents endpoint requires:

  1. The name of your Engine: [ENGINE]
  2. A Private API Key: [PRIVATE_API_KEY]

curl -X GET '<ENTERPRISE_SEARCH_BASE_URL>/api/as/v1/engines/[ENGINE]/documents' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer [PRIVATE_API_KEY]'
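
The same request can be assembled in a script. The following is a minimal sketch using only the Python standard library. The function name build_documents_request and the host https://search.example.com are hypothetical placeholders, not part of the API. The sketch builds the request without sending it, so it can be inspected first.

```python
import json
import urllib.request


def build_documents_request(base_url, engine, private_api_key, method="GET", body=None):
    """Build (but do not send) an authenticated request to the Documents endpoint."""
    url = f"{base_url.rstrip('/')}/api/as/v1/engines/{engine}/documents"
    # The request body, when present, is sent as UTF-8 encoded JSON.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {private_api_key}",
        },
    )


# Hypothetical base URL and placeholder key, for illustration only.
request = build_documents_request(
    "https://search.example.com",
    "national-parks-demo",
    "private-xxxxxxxxxxxxxxxxxxxx",
)
print(request.get_method(), request.full_url)
```

To actually send the request, pass the returned object to urllib.request.urlopen.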

Create or Update Documents

Documents are JSON objects with up to 64 key-value pairs. The key is the field name and the value is the content.

Requests, when successful, return a 200 response code. This indicates that the batch has been received, and indexing will begin.

Documents are indexed asynchronously. There will be a processing delay before they are available to your Engine.

Key points to remember when creating documents:

  • Documents are sent in an array, and each one is independently accepted and indexed, or rejected.
  • A 200 response and an empty errors array denote a successfully accepted document.
  • If no id is provided, a unique id will be generated.
  • A document is created each time content is received without an id - beware duplicates!
  • A document will be updated - not created - if its id already exists within the Engine.
  • If the Engine has not seen a field before, it will create a new field of type text.
  • There is a limit of 100 documents per request; each document must be smaller than 100 KB (102400 bytes).
  • An indexing request may not exceed 10 MB.
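
The per-request and per-document limits above can be checked on the client before sending. The following is a rough sketch, not an official client. The helper names are hypothetical, and the size check assumes the server measures a document roughly as its UTF-8 JSON serialization, which may not match exactly.

```python
import json

MAX_DOCS_PER_REQUEST = 100       # documents per request, from the list above
MAX_DOCUMENT_BYTES = 100 * 1024  # 102400 bytes, as quoted in the size error message


def document_size(doc):
    """Approximate size of a document: its UTF-8 JSON serialization, in bytes."""
    return len(json.dumps(doc, ensure_ascii=False).encode("utf-8"))


def oversized(docs):
    """Return the documents that would likely be rejected for being too large."""
    return [doc for doc in docs if document_size(doc) >= MAX_DOCUMENT_BYTES]


def batch_documents(docs, batch_size=MAX_DOCS_PER_REQUEST):
    """Yield consecutive slices of at most batch_size documents."""
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]


docs = [{"id": f"park_{n}", "title": f"Park {n}"} for n in range(250)]
print([len(batch) for batch in batch_documents(docs)])  # [100, 100, 50]
```

Each yielded batch can then be sent as the body of one POST request.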

Field name rules

Field names:

  • Must contain a lowercase letter and may only contain lowercase letters, numbers, and underscores.
  • Must not contain whitespace or have a leading underscore.
  • Must not contain more than 64 characters.
  • Must not be any of the following reserved words:

    • id
    • engine_id
    • search_index_id
    • highlight
    • any
    • all
    • none
    • or
    • and
    • not
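
The rules above are mechanical enough to check before indexing. The sketch below is one possible client-side check. The function name field_name_problems is hypothetical, and the server remains the authority on what it accepts. Note that id appears in the reserved list because it is the document identifier, so a real pre-flight check would skip that key.

```python
import re

RESERVED_WORDS = {
    "id", "engine_id", "search_index_id", "highlight",
    "any", "all", "none", "or", "and", "not",
}
ALLOWED_CHARACTERS = re.compile(r"[a-z0-9_]+")


def field_name_problems(name):
    """Return a list of rule violations for a field name (empty if it looks valid)."""
    problems = []
    if not ALLOWED_CHARACTERS.fullmatch(name):
        # Also catches whitespace and uppercase letters.
        problems.append("may only contain lowercase letters, numbers, and underscores")
    if not re.search(r"[a-z]", name):
        problems.append("must contain a lowercase letter")
    if name.startswith("_"):
        problems.append("must not have a leading underscore")
    if len(name) > 64:
        problems.append("must not contain more than 64 characters")
    if name in RESERVED_WORDS:
        problems.append("must not be a reserved word")
    return problems


for candidate in ["date_established", "World Heritage", "_internal", "highlight"]:
    print(candidate, "->", field_name_problems(candidate) or "ok")
```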

For schema design principles, consult the API Overview.

POST <ENTERPRISE_SEARCH_BASE_URL>/api/as/v1/engines/{ENGINE_NAME}/documents

Example - A POST request adding a document to the national-parks-demo Engine.

curl -X POST '<ENTERPRISE_SEARCH_BASE_URL>/api/as/v1/engines/national-parks-demo/documents' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer private-xxxxxxxxxxxxxxxxxxxx' \
-d '[
  {
    "description": "Death Valley is the hottest, lowest, and driest place in the United States. Daytime temperatures have topped 130 °F (54 °C) and it is home to Badwater Basin, the lowest elevation in North America. The park contains canyons, badlands, sand dunes, and mountain ranges, while more than 1000 species of plants grow in this geologic graben. Additional points of interest include salt flats, historic mines, and springs.",
    "nps_link": "https://www.nps.gov/deva/index.htm",
    "states": [
      "California",
      "Nevada"
    ],
    "title": "Death Valley",
    "visitors": "1296283",
    "world_heritage_site": "false",
    "location": "36.24,-116.82",
    "acres": "3373063.14",
    "square_km": "13650.3",
    "date_established": "1994-10-31T06:00:00Z",
    "id": "park_death-valley"
  }
]'

Example Response

[
  {
    "id": "park_death-valley",
    "errors": []
  }
]

Partial Update (PATCH)

Update specific document fields by id and field.

The id is required and new fields cannot be created using PATCH!

PATCH <ENTERPRISE_SEARCH_BASE_URL>/api/as/v1/engines/{ENGINE_NAME}/documents

Example - A PATCH providing partial updates to two existing documents and attempting to update one document that is missing its id.

curl -X PATCH '<ENTERPRISE_SEARCH_BASE_URL>/api/as/v1/engines/national-parks-demo/documents' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer private-xxxxxxxxxxxxxxxxxxxx' \
-d '[
  { "id": "park_yosemite" },
  { "title": "Everglades" },
  { "id": "park_wind-cave", "date_established": "1903-01-09T06:00:00Z" }
]'

Example Response

[
  {
    "id": "park_yosemite",
    "errors": []
  },
  {
    "id": "",
    "errors": [
      "Missing required key 'id'"
    ]
  },
  {
    "id": "park_wind-cave",
    "errors": []
  }
]
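
Because the response carries one entry per submitted document, a script can separate accepted documents from rejected ones. The following is a small sketch with a hypothetical helper name. It reports rejected documents by their position in the request, since a rejected document may come back with an empty id.

```python
def split_results(results):
    """Split a POST or PATCH response into accepted ids and rejected entries.

    Rejected entries are returned as (position, errors) pairs, where position
    is the zero-based index of the document in the request array.
    """
    accepted = [entry["id"] for entry in results if not entry["errors"]]
    rejected = [
        (position, entry["errors"])
        for position, entry in enumerate(results)
        if entry["errors"]
    ]
    return accepted, rejected


response = [
    {"id": "park_yosemite", "errors": []},
    {"id": "", "errors": ["Missing required key 'id'"]},
    {"id": "park_wind-cave", "errors": []},
]
accepted, rejected = split_results(response)
print(accepted)  # ['park_yosemite', 'park_wind-cave']
print(rejected)  # [(1, ["Missing required key 'id'"])]
```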

Delete Documents

Delete documents by id.

Returns an array of JSON objects indicating the deleted status for each document.

DELETE <ENTERPRISE_SEARCH_BASE_URL>/api/as/v1/engines/{ENGINE_NAME}/documents

Example - Using DELETE to remove two existing documents and attempting to remove one document that does not exist.

curl -X DELETE '<ENTERPRISE_SEARCH_BASE_URL>/api/as/v1/engines/national-parks-demo/documents' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer private-xxxxxxxxxxxxxxxxxxxx' \
-d '["park_zion", "park_yosemite", "does_not_exist"]'

Example Response

[
  {
    "id": "park_zion",
    "deleted": true
  },
  {
    "id": "park_yosemite",
    "deleted": true
  },
  {
    "id": "does_not_exist",
    "deleted": false
  }
]
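
A script can use the deleted flag to find ids that were not removed. The following is a minimal sketch. The helper name is hypothetical and the sample data mirrors the ids sent in the request above.

```python
def not_deleted(results):
    """Return the ids whose deleted flag is false in a DELETE response."""
    return [entry["id"] for entry in results if not entry["deleted"]]


response = [
    {"id": "park_zion", "deleted": True},
    {"id": "park_yosemite", "deleted": True},
    {"id": "does_not_exist", "deleted": False},
]
print(not_deleted(response))  # ['does_not_exist']
```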

Get Documents

Retrieves one or more documents by id. All field values are returned in string format. To return documents in their raw format, use the Search API.

A null entry appears in the response array when a requested document cannot be found.

GET <ENTERPRISE_SEARCH_BASE_URL>/api/as/v1/engines/{ENGINE_NAME}/documents

JSON Object

A paginated array of JSON objects representing documents.

Example - Looking for two specific ids using a JSON array in the request body: "park_zion" and "does_not_exist".

curl -X GET '<ENTERPRISE_SEARCH_BASE_URL>/api/as/v1/engines/national-parks-demo/documents' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer private-xxxxxxxxxxxxxxxxxxxx' \
-d '["park_zion", "does_not_exist"]'

Example Response

[
  {
    "description": "Located at the junction of the Colorado Plateau, Great Basin, and Mojave Desert, this park contains sandstone features such as mesas, rock towers, and canyons, including the Virgin River Narrows. The various sandstone formations and the forks of the Virgin River create a wilderness divided into four ecosystems: desert, riparian, woodland, and coniferous forest.",
    "nps_link": "https://www.nps.gov/zion/index.htm",
    "states": [
      "Utah"
    ],
    "title": "Zion",
    "visitors": "4295127",
    "world_heritage_site": "false",
    "location": "37.3,-113.05",
    "acres": "147237.02",
    "square_km": "595.8",
    "date_established": "1919-11-19T06:00:00Z",
    "id": "park_zion"
  },
  null
]

Query Parameters

A parameterized query.

Example - Looking for two specific ids using query parameters: "park_zion" and "does_not_exist".

curl -X GET '<ENTERPRISE_SEARCH_BASE_URL>/api/as/v1/engines/national-parks-demo/documents?ids%5B%5D=park_zion&ids%5B%5D=does_not_exist' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer private-xxxxxxxxxxxxxxxxxxxx'

Example Response

[
  {
    "description": "Located at the junction of the Colorado Plateau, Great Basin, and Mojave Desert, this park contains sandstone features such as mesas, rock towers, and canyons, including the Virgin River Narrows. The various sandstone formations and the forks of the Virgin River create a wilderness divided into four ecosystems: desert, riparian, woodland, and coniferous forest.",
    "nps_link": "https://www.nps.gov/zion/index.htm",
    "states": [
      "Utah"
    ],
    "title": "Zion",
    "visitors": "4295127",
    "world_heritage_site": "false",
    "location": "37.3,-113.05",
    "acres": "147237.02",
    "square_km": "595.8",
    "date_established": "1919-11-19T06:00:00Z",
    "id": "park_zion"
  },
  null
]
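
The encoded query string and the null handling can both be scripted. The sketch below uses the Python standard library. The helper names are hypothetical, and missing_ids assumes the response array follows the order of the requested ids, as the examples above suggest.

```python
from urllib.parse import urlencode


def ids_query(ids):
    """Build the ids[] query string; urlencode percent-encodes the brackets."""
    return urlencode([("ids[]", doc_id) for doc_id in ids])


def missing_ids(requested_ids, response):
    """Return requested ids whose slot in the response array is null (None).

    Assumes the response preserves the order of the requested ids.
    """
    return [doc_id for doc_id, doc in zip(requested_ids, response) if doc is None]


requested = ["park_zion", "does_not_exist"]
print(ids_query(requested))
# ids%5B%5D=park_zion&ids%5B%5D=does_not_exist

response = [{"id": "park_zion", "title": "Zion"}, None]
print(missing_ids(requested, response))  # ['does_not_exist']
```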

List Documents

Lists up to 10,000 documents. All field values are returned in string format. To return documents in their raw format, use the Search API.

GET <ENTERPRISE_SEARCH_BASE_URL>/api/as/v1/engines/{ENGINE_NAME}/documents/list

page (optional)
JSON object containing current and size, where current is the current page number and size is the page size. The maximum for size is 100; a larger requested size will be truncated to 100. The default is the first page of documents, with a page size of 100.

You have two options as to how you might send in your parameters:

JSON Object

A paginated array of JSON objects representing documents.

Example - Using JSON objects within the request body to see the second page of results, with a page size of 15 results per page.

curl -X GET '<ENTERPRISE_SEARCH_BASE_URL>/api/as/v1/engines/national-parks-demo/documents/list' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer private-xxxxxxxxxxxxxxxxxxxx' \
-d '{
  "page": {
    "current": 2,
    "size": 15
  }
}'

Example Response

{
  "meta": {
    "page": {
      "current": 2,
      "total_pages": 4,
      "total_results": 59,
      "size": 15
    }
  },
  "results": [
    {
      "description": "Death Valley is the hottest, lowest, and driest place in the United States. Daytime temperatures have topped 130 °F (54 °C) and it is home to Badwater Basin, the lowest elevation in North America. The park contains canyons, badlands, sand dunes, and mountain ranges, while more than 1000 species of plants grow in this geologic graben. Additional points of interest include salt flats, historic mines, and springs.",
      "nps_link": "https://www.nps.gov/deva/index.htm",
      "states": [
        "California",
        "Nevada"
      ],
      "title": "Death Valley",
      "visitors": "1296283",
      "world_heritage_site": "false",
      "location": "36.24,-116.82",
      "acres": "3373063.14",
      "square_km": "13650.3",
      "date_established": "1994-10-31T06:00:00Z",
      "id": "park_death-valley"
    },
    {
      "description": "Centered on Denali, the tallest mountain in North America, Denali is serviced by a single road leading to Wonder Lake. Denali and other peaks of the Alaska Range are covered with long glaciers and boreal forest. Wildlife includes grizzly bears, Dall sheep, caribou, and gray wolves.",
      "nps_link": "https://www.nps.gov/dena/index.htm",
      "states": [
        "Alaska"
      ],
      "title": "Denali",
      "visitors": "587412",
      "world_heritage_site": "false",
      "location": "63.33,-150.5",
      "acres": "4740911.16",
      "square_km": "19185.8",
      "date_established": "1917-02-26T06:00:00Z",
      "id": "park_denali"
    },
    # ... Truncated!
  ]
}

Query Parameters

A parameterized query.

Example - Using Query Parameters to see the second page of results, with a page size of 15 results per page.

curl -X GET '<ENTERPRISE_SEARCH_BASE_URL>/api/as/v1/engines/national-parks-demo/documents/list?page%5Bsize%5D=15&page%5Bcurrent%5D=2' \
-H 'Authorization: Bearer private-xxxxxxxxxxxxxxxxxxxx'

Example Response

{
  "meta": {
    "page": {
      "current": 2,
      "total_pages": 4,
      "total_results": 59,
      "size": 15
    }
  },
  "results": [
    {
      "description": "Death Valley is the hottest, lowest, and driest place in the United States. Daytime temperatures have topped 130 °F (54 °C) and it is home to Badwater Basin, the lowest elevation in North America. The park contains canyons, badlands, sand dunes, and mountain ranges, while more than 1000 species of plants grow in this geologic graben. Additional points of interest include salt flats, historic mines, and springs.",
      "nps_link": "https://www.nps.gov/deva/index.htm",
      "states": [
        "California",
        "Nevada"
      ],
      "title": "Death Valley",
      "visitors": "1296283",
      "world_heritage_site": "false",
      "location": "36.24,-116.82",
      "acres": "3373063.14",
      "square_km": "13650.3",
      "date_established": "1994-10-31T06:00:00Z",
      "id": "park_death-valley"
    },
    {
      "description": "Centered on Denali, the tallest mountain in North America, Denali is serviced by a single road leading to Wonder Lake. Denali and other peaks of the Alaska Range are covered with long glaciers and boreal forest. Wildlife includes grizzly bears, Dall sheep, caribou, and gray wolves.",
      "nps_link": "https://www.nps.gov/dena/index.htm",
      "states": [
        "Alaska"
      ],
      "title": "Denali",
      "visitors": "587412",
      "world_heritage_site": "false",
      "location": "63.33,-150.5",
      "acres": "4740911.16",
      "square_km": "19185.8",
      "date_established": "1917-02-26T06:00:00Z",
      "id": "park_denali"
    },
    # ... Truncated!
  ]
}
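
The meta.page object makes it possible to walk every page in a loop. The following is a sketch under stated assumptions. fetch_page is a hypothetical callable that sends the given request body to the list endpoint and returns the parsed JSON response, and the other helper names are likewise illustrative. Because the endpoint lists up to 10,000 documents, the loop cannot reach documents beyond that limit.

```python
import math

MAX_PAGE_SIZE = 100  # larger sizes are truncated by the API


def page_params(current, size):
    """Build the request body for one page, capping size at the documented maximum."""
    return {"page": {"current": current, "size": min(size, MAX_PAGE_SIZE)}}


def expected_total_pages(total_results, size):
    """Number of pages needed for total_results at the given page size."""
    return math.ceil(total_results / size)


def iter_documents(fetch_page, size=MAX_PAGE_SIZE):
    """Yield every listed document, requesting pages until total_pages is reached."""
    current = 1
    while True:
        response = fetch_page(page_params(current, size))
        yield from response["results"]
        if current >= response["meta"]["page"]["total_pages"]:
            break
        current += 1


# 59 results at 15 per page give 4 pages, matching the example response above.
print(expected_total_pages(59, 15))  # 4
```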

Errors

After making the request, a 200 will be returned assuming the request is not malformed.

To determine the status of each document, inspect the errors array, which describes any issues:

// The response code will be 200!
[
    {
        "id": null,
        "errors": [
            "Exceeds maximum allowed document size of 102400 bytes"
        ]
    }
]

Errors by Request Type

Each request type has a different indication that something is amiss.

POST and PATCH

A nested errors array is included as part of each document object...

[
  {
    "id": "123456",
    "errors": []
  }
]

The array may include one or more of the following errors:

Error Message

Solution

"Request exceeds maximum allowed limit of 100 documents"

Only 100 documents may be added per request. If you need to add a larger batch, consider writing a script that sends documents in sets of 100.

"Exceeds maximum allowed document size of 102400 bytes"

Each document must be less than 102400 bytes (102.4 kilobytes). The larger the request, the longer it takes to process. Consider breaking POST requests into smaller batches.

"JSON must be an array or hash"

Unstructured JSON data is not permitted. Ensure you have encapsulated your object in an array [] or a hash {}.

To pass an array within query parameters, use multiple parameters with the same name followed by []. For example, the following query string sets the ids parameter to [0, 1, 2]:

?ids[]=0&ids[]=1&ids[]=2

However, you must URL-encode the square brackets in the query string. The following example is the query string from above, encoded:

?ids%5B%5D=0&ids%5B%5D=1&ids%5B%5D=2

"Invalid field type: id must not be blank"

Any field type may be blank except for the id. If you want an id to be generated for you, you may omit the field. If you include the field, it must contain a value.

"Invalid field type: id must be less than 800 characters"

The id field is limited to 800 characters. That should be plenty!

"Name can only contain lowercase letters, numbers, and underscores"

Field names may only contain lowercase letters, numbers, and underscores. Rename the offending field so that it follows the field name rules described under Create or Update Documents.

"Invalid field value: Value [VALUE] cannot be parsed as a float"

When a field type is set to number, the value must be a single-precision, floating-point value (32 bits).

(If you need to represent a larger number, consider a text field as a workaround.)

"Invalid field value: Value [VALUE] cannot be parsed as a date (RFC 3339)"

When a field type is set to date, the value must be a string formatted according to RFC 3339. You may omit the time, but the date must be provided at minimum: YYYY-MM-DD.

"Invalid field value: Value [VALUE] cannot be parsed as a location"

When a field type is set to geolocation, the value string must be formatted according to lat/long: "37.7894758,-122.3940638"

"Can only update existing fields and [FIELD] is not included in the schema."

A PATCH cannot be used to create new fields, only to update existing ones. A POST may be used to create new fields.

"Missing required key id"

The document id must be included so that the API knows which document you are trying to use your PATCH against.

DELETE

The deleted field indicates request status: true for success and false for failure.

{
  "id": "does_not_exist",
  "deleted": false
}

GET

A null value will be returned within the response array when a document could not be found:

[
  {
    "description": "Located at the junction of the Colorado Plateau, Great Basin, and Mojave Desert, this park contains sandstone features such as mesas, rock towers, and canyons, including the Virgin River Narrows. The various sandstone formations and the forks of the Virgin River create a wilderness divided into four ecosystems: desert, riparian, woodland, and coniferous forest.",
    "nps_link": "https://www.nps.gov/zion/index.htm",
    "states": [
      "Utah"
    ],
    "title": "Zion",
    "visitors": "4295127",
    "world_heritage_site": "false",
    "location": "37.3,-113.05",
    "acres": "147237.02",
    "square_km": "595.8",
    "date_established": "1919-11-19T06:00:00Z",
    "id": "park_zion"
  },
  null
]

What’s Next?

You should now have a solid grip on indexing basics: adding data to your Engine, indexing, and then building a schema. You likely already know how to create, adjust, or destroy your Engines; if not, that is a great next step. Otherwise, Search is where you should venture next. Alternatively, adjusting your Schema to take advantage of new Field Types is a worthwhile next topic.