Custom sources indexing API reference

This is a technical API reference. Refer to the Custom API sources guide for a conceptual walkthrough.

Custom source API endpoints and operations

The Custom Source API supports traditional RESTful operations:

Creating or updating a document:

POST /api/ws/v1/sources/[ID]/documents/bulk_create

Deleting a document:

POST /api/ws/v1/sources/[ID]/documents/bulk_destroy

Authenticating requests with the custom source API

Workplace Search APIs support multiple methods of authentication.

For simplicity, the examples on this page use admin auth tokens.

The auth_token is your private API key. The id identifies the Custom Source whose documents will be indexed, updated, or deleted.

curl -X POST http://localhost:3002/api/ws/v1/sources/[ID]/documents/bulk_create \
-H "Authorization: Bearer [AUTH_TOKEN]" \
-H "Content-Type: application/json" \
-d '
  ...
'

Create a new Custom Source or navigate to the Details area of an existing Custom Source from the Workplace Search administrative dashboard to locate the access token and content source ID.

The auth_token is shared amongst all custom sources. The id value is a unique identifier for each Custom Source.
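
The same headers can be assembled programmatically. The following is a minimal Python sketch that builds, but does not send, an authenticated request using only the standard library. The helper name build_request and the placeholder values "my-source-id" and "my-auth-token" are illustrative assumptions; the base URL is taken from the curl examples on this page.

```python
import json
import urllib.request

# Assumed local deployment URL, copied from the curl examples on this page.
BASE_URL = "http://localhost:3002"


def build_request(source_id, auth_token, operation, payload):
    """Build (but do not send) an authenticated POST request for a Custom Source."""
    url = f"{BASE_URL}/api/ws/v1/sources/{source_id}/documents/{operation}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {auth_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder credentials; substitute the values from your own dashboard.
request = build_request("my-source-id", "my-auth-token", "bulk_create", [])
print(request.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it; that step is left out here so the sketch runs without a live deployment.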

Schema management and configuration

Every Custom Source has its own unique schema, allowing you to create document repositories that truly represent the nature of the information you want your team to access via Workplace Search. Read the Custom API sources guide for a walkthrough of the process.

The following guidelines may help you create a maintainable and scalable schema:

  1. A Custom Source schema can be configured with up to 64 fields.
  2. Always index a field with the same type it has in existing documents.

    • For example, an existing date field should not receive geolocation data.
  3. Arrays are supported, but nested objects are not.
  4. Fields cannot be deleted once they have been created.
  5. Reserved field names cannot be used:

    • external_id
    • source
    • content_source_id
    • updated_at
    • last_updated
    • highlight
    • any, all, none, or, and, not
    • engine_id
    • _allow_permissions and _deny_permissions
  6. A field name can only contain lowercase letters, numbers, and underscores.
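
Several of the guidelines above can be checked on the client before any document is sent. The following Python sketch is one possible pre-flight check, not part of the API itself; the function name validate_field_names and the wording of its messages are assumptions made for illustration.

```python
import re

MAX_FIELDS = 64
RESERVED_FIELDS = {
    "external_id", "source", "content_source_id", "updated_at", "last_updated",
    "highlight", "any", "all", "none", "or", "and", "not", "engine_id",
    "_allow_permissions", "_deny_permissions",
}
# Lowercase letters, numbers, and underscores only.
FIELD_NAME_PATTERN = re.compile(r"[a-z0-9_]+")


def validate_field_names(field_names):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(field_names) > MAX_FIELDS:
        problems.append(f"too many fields: {len(field_names)} > {MAX_FIELDS}")
    for name in field_names:
        if name in RESERVED_FIELDS:
            problems.append(f"reserved field name: {name}")
        elif not FIELD_NAME_PATTERN.fullmatch(name):
            problems.append(f"invalid characters in field name: {name}")
    return problems


print(validate_field_names(["title", "Body", "highlight"]))
```

The check covers only the field limit, the reserved names, and the naming rule. Type consistency across documents would need knowledge of the existing schema, which this sketch does not have.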

Schema data types

Custom Source fields can be one of four different types:

text Fields

Text fields are at the heart of search. They are analyzed fields and are used for full-text matching in information retrieval. Any group of characters or text that you want to search over should be text.

Example: A description of an object, the name of a product, the content of a review.

text is the default type for all new fields.

number Fields

number fields represent a single-precision (32-bit) floating-point value, such as 3.14 or 42.

Number fields enable fine-grained sorting, filtering, faceting, and boosting.

(If you need to represent a larger number, consider a text field as a workaround.)
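
To see why large values may need a workaround, the following Python sketch round-trips numbers through a 32-bit float using the standard struct module. It illustrates single-precision behavior in general; the helper name as_float32 is an assumption, and the sketch makes no claim about how Workplace Search handles out-of-range input.

```python
import struct


def as_float32(value):
    """Round-trip a number through a 32-bit float to see what survives."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


print(as_float32(42))        # 42.0
print(as_float32(16777216))  # 16777216.0
print(as_float32(16777217))  # 16777216.0 (the final digit is lost)
```

Whole numbers above 16,777,216 (2 to the power of 24) can no longer all be represented exactly, which matters for values such as long numeric identifiers.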

Example: A price, a review score, the number of visitors, or a size.

date Fields

Dates must be in ISO 8601 format, for example "2013-02-27T18:09:19Z" or "2013-02-27T17:09:19+01:00".
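
As a minimal sketch, a timezone-aware Python datetime can be turned into such a string with isoformat. The helper name to_iso8601 is an assumption; note that Python writes UTC as +00:00 rather than Z, a form that also appears in the indexing examples on this page.

```python
from datetime import datetime, timedelta, timezone


def to_iso8601(moment):
    """Format a timezone-aware datetime as an ISO 8601 string, to the second."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.isoformat(timespec="seconds")


utc_time = datetime(2013, 2, 27, 18, 9, 19, tzinfo=timezone.utc)
cet_time = utc_time.astimezone(timezone(timedelta(hours=1)))
print(to_iso8601(utc_time))  # 2013-02-27T18:09:19+00:00
print(to_iso8601(cet_time))  # 2013-02-27T19:09:19+01:00
```

Rejecting naive datetimes is a deliberate choice in this sketch: without an offset, the resulting string would be ambiguous about which instant it describes.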

Example: A product release or publish date, a birth date, an air date.

geolocation Fields

Geolocation fields are latitude-longitude pairs, representing locations.

Examples: A store where a product is located; the location of a venue.

Specify a geolocation using any of the following formats:

    • "location": "41.12,-71.34"
      Geo-point expressed as a string with the format: "lat,lon"

    • "location": "drm3btev3e86"
      Geo-point expressed as a geohash

    • "location": [ -71.34, 41.12 ]
      Geo-point expressed as an array with the format: [ lon, lat ]

    • "location": "POINT (-71.34 41.12)"
      Geo-point expressed as a well-known text POINT with the format: "POINT (lon lat)"

For more details, see Geo-point field type in the Elasticsearch documentation. Be aware that Enterprise Search supports only the formats shown above, which is fewer than Elasticsearch supports.
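
The order of latitude and longitude differs between these formats, which is an easy source of mistakes. The following Python sketch produces three of the representations from one pair of coordinates; the function name geo_formats is an assumption, and the geohash form is left out because it needs a dedicated encoder.

```python
def geo_formats(lat, lon):
    """Return the same point in three of the accepted representations."""
    return {
        "string": f"{lat},{lon}",       # "lat,lon"
        "array": [lon, lat],            # [ lon, lat ]
        "wkt": f"POINT ({lon} {lat})",  # "POINT (lon lat)"
    }


point = geo_formats(41.12, -71.34)
print(point["string"])  # 41.12,-71.34
print(point["array"])   # [-71.34, 41.12]
print(point["wkt"])     # POINT (-71.34 41.12)
```

Only the string form puts latitude first; the array and well-known text forms put longitude first.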

Indexing and updating documents

Index new objects into a Custom Source or update existing documents.

Request limits: Maximum 100 documents per request

POST /api/ws/v1/sources/[ID]/documents/bulk_create

id (required)

Unique ID for a Custom Source, provided upon creation of the Custom Source. Sent as [ID] in the request path.

auth_token (required)

Must be included in the HTTP Authorization header.

id (optional)

ID unique to a document, used to identify, modify, or delete the record at a later time. If you do not provide an id, a BSON id will be created for you. Learn more about document IDs in the Workplace Search API reference.

last_updated (optional)

The date and time the document was last updated. If omitted, it defaults to the current date and time, but only when the document is being created.

_allow_permissions (optional)

Used for document-level permissions. When a value is set within a document, only users with a matching permission will be able to view it.

_deny_permissions (optional)

Used for document-level permissions. When a value is set within a document, users with a matching permission will be unable to view it. Read the Document permissions for custom sources guide to learn more.
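
Because a request may carry at most 100 documents, larger sets have to be split before sending. The following Python sketch shows one way to do that; the function name chunk_documents and the sample documents are assumptions made for illustration.

```python
MAX_DOCUMENTS_PER_REQUEST = 100


def chunk_documents(documents, size=MAX_DOCUMENTS_PER_REQUEST):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(documents), size):
        yield documents[start:start + size]


# 250 made-up documents, split into request-sized batches.
documents = [{"id": n, "title": f"Document {n}"} for n in range(250)]
batches = list(chunk_documents(documents))
print([len(batch) for batch in batches])  # [100, 100, 50]
```

Each batch would then be sent as the body of its own bulk_create request.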

curl -X POST http://localhost:3002/api/ws/v1/sources/[ID]/documents/bulk_create \
-H "Authorization: Bearer [AUTH_TOKEN]" \
-H "Content-Type: application/json" \
-d '[
  {
    "_allow_permissions": ["permission1"],
    "_deny_permissions": [],
    "id" : 1234,
    "title" : "The Meaning of Time",
    "body" : "Not much. It is a made up thing.",
    "url" : "https://example.com/meaning/of/time",
    "created_at": "2019-06-01T12:00:00+00:00",
    "type": "list"
  },
  {
    "_allow_permissions": [],
    "_deny_permissions": ["permission2"],
    "id" : 1235,
    "title" : "The Meaning of Sleep",
    "body" : "Rest, recharge, and connect to the Ether.",
    "url" : "https://example.com/meaning/of/sleep",
    "created_at": "2019-06-01T12:00:00+00:00",
    "type": "list"
  },
  {
    "_allow_permissions": ["permission1"],
    "_deny_permissions": ["permission2"],
    "id" : 1236,
    "title" : "The Meaning of Life",
    "body" : "Be excellent to each other.",
    "url" : "https://example.com/meaning/of/life",
    "created_at": "2019-06-01T12:00:00+00:00",
    "type": "list"
  }
]'

{
  "results": [
    {
      "id": "1234",
      "errors": []
    },
    {
      "id": "1235",
      "errors": []
    },
    {
      "id": "1236",
      "errors": []
    }
  ]
}
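
Each entry in results carries its own errors array, so a request can succeed for some documents and fail for others. The following Python sketch collects the documents that reported errors. The function name failed_documents is an assumption, and the error text in the sample body is a made-up placeholder rather than a real API message.

```python
import json

# Response body in the shape shown above; the error text is a placeholder.
response_body = """
{
  "results": [
    {"id": "1234", "errors": []},
    {"id": "1235", "errors": ["placeholder error message"]}
  ]
}
"""


def failed_documents(body):
    """Map document id to its errors for every result that reported any."""
    results = json.loads(body)["results"]
    return {item["id"]: item["errors"] for item in results if item["errors"]}


print(failed_documents(response_body))  # {'1235': ['placeholder error message']}
```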

Deleting documents

Deleting documents by ID

Remove documents by ID from a Custom Source.

POST /api/ws/v1/sources/[ID]/documents/bulk_destroy

id (required)

Unique ID for a Custom Source, provided upon creation of the Custom Source. Sent as [ID] in the request path.

auth_token (required)

Must be included in the HTTP Authorization header.

ids (required)

An array of IDs of the documents to delete, sent as the request body.

curl -X POST http://localhost:3002/api/ws/v1/sources/[ID]/documents/bulk_destroy \
-H "Authorization: Bearer [AUTH_TOKEN]" \
-H "Content-Type: application/json" \
-d '[
  [DOCUMENT_ID_1], [DOCUMENT_ID_2]
]'

{
  "results": [
    {
      "id":1234,
      "success":true
    },
    {
      "id":1235,
      "success":true
    }
  ]
}
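
The request body is a bare JSON array of IDs, and the response reports success per document. The following Python sketch builds such a body and picks out any IDs that were not reported as deleted. Both function names are assumptions, and the sample response with a false success value is invented to exercise the check.

```python
import json


def build_destroy_body(document_ids):
    """Serialize document IDs as the JSON array sent in the request body."""
    return json.dumps(list(document_ids))


def undeleted_ids(response):
    """Return the IDs whose result did not report success."""
    return [item["id"] for item in response["results"] if not item.get("success")]


print(build_destroy_body([1234, 1235]))  # [1234, 1235]

# Invented response in the shape shown above, with one failure.
response = {"results": [{"id": 1234, "success": True}, {"id": 1235, "success": False}]}
print(undeleted_ids(response))  # [1235]
```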

Deleting documents by query

Remove documents by query from a Custom Source.

DELETE /api/ws/v1/sources/[ID]/documents

id (required)

Unique ID for a Custom Source, provided upon creation of the Custom Source. Sent as [ID] in the request path.

auth_token (required)

Must be included in the HTTP Authorization header.

filters (optional)

Query modifiers used to refine a query. A request without filters will delete all documents in the Custom Source.

    • field: Name of the field upon which to apply your filter, used as the key inside filters. Only last_updated is currently supported.
    • from (optional): Inclusive lower bound of the range. Required if to is not provided. Must be in RFC 3339 format.
    • to (optional): Exclusive upper bound of the range. Required if from is not provided. Must be in RFC 3339 format.

curl -X DELETE http://localhost:3002/api/ws/v1/sources/[ID]/documents \
-H "Authorization: Bearer [AUTH_TOKEN]" \
-H "Content-Type: application/json" \
-d '
{
  "filters" : {
    "last_updated" : {
      "from": "2020-06-01T12:00:00+00:00"
    }
  }
}
'

{
  "total" : 234,
  "deleted" : 234,
  "failures" : []
}
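
Since a request without filters deletes every document in the source, it can be worth building the filter payload through a helper that refuses to produce an empty one. The following Python sketch does that for the last_updated range; the function names are assumptions, and the guard against missing bounds is a client-side safety choice rather than API behavior.

```python
from datetime import datetime, timezone


def _rfc3339(moment):
    """Format a timezone-aware datetime with an explicit UTC offset."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.isoformat(timespec="seconds")


def build_delete_filters(start=None, end=None):
    """Build a filters payload on last_updated; at least one bound is required."""
    if start is None and end is None:
        raise ValueError("provide at least one of start or end")
    bounds = {}
    if start is not None:
        bounds["from"] = _rfc3339(start)  # inclusive lower bound
    if end is not None:
        bounds["to"] = _rfc3339(end)      # exclusive upper bound
    return {"filters": {"last_updated": bounds}}


payload = build_delete_filters(start=datetime(2020, 6, 1, 12, 0, tzinfo=timezone.utc))
print(payload)
# {'filters': {'last_updated': {'from': '2020-06-01T12:00:00+00:00'}}}
```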

Understanding document IDs

Each document within a content source must have a unique id. If you do not provide an id, a BSON id will be created for you. Two documents in two separate content sources may have the same id.

You can update an existing document by issuing a POST request to the bulk_create endpoint with that document's existing id.

If the id does not exist, a new document is created. It is up to you to maintain the integrity of your id for each document within each Custom API Source.

We recommend that you avoid SHAs or any identifier derived from the content of a document. Any modification of the original data will alter the value, making it difficult to identify the document in the search index. This can lead to record duplication.
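
The following Python sketch contrasts the two approaches. The function names, the "wiki" system label, and the record_key field are all hypothetical; the point is only that a content-derived id changes after an edit while an id built from a fixed key does not.

```python
import hashlib


def content_hash_id(document):
    """Anti-pattern: an id derived from content changes when the content does."""
    return hashlib.sha1(document["body"].encode("utf-8")).hexdigest()


def stable_id(system, record_key):
    """An id built from values that stay fixed for the life of the record."""
    return f"{system}-{record_key}"


# The same record before and after a small edit to its body.
original = {"record_key": 1234, "body": "Not much. It is a made up thing."}
edited = {"record_key": 1234, "body": "Not much. It is a made-up thing."}

same_hash = content_hash_id(original) == content_hash_id(edited)
same_key = stable_id("wiki", original["record_key"]) == stable_id("wiki", edited["record_key"])
print(same_hash)  # False
print(same_key)   # True
```

With the content-derived id, re-indexing the edited record would create a second document instead of updating the first.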

Synchronizing document-level permissions for custom sources

Custom sources allow you to define, at the document level, which users may or may not see a document in their search results. Two reserved fields (_allow_permissions and _deny_permissions) accept array-type values. Using proper user mapping, you can build sophisticated document access controls.

Deny permissions take precedence.
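
The following Python sketch models that precedence rule on the client side. It is an illustration of the stated behavior, not the server's implementation. The function name can_view is an assumption, and so is the treatment of a document with an empty allow list as unrestricted.

```python
def can_view(user_permissions, document):
    """Illustrative allow/deny evaluation in which deny is checked first."""
    user = set(user_permissions)
    denied = set(document.get("_deny_permissions", []))
    allowed = set(document.get("_allow_permissions", []))
    if user & denied:
        return False  # deny permissions take precedence
    if allowed:
        return bool(user & allowed)
    return True  # assumption: an empty allow list means unrestricted


document = {"_allow_permissions": ["permission1"], "_deny_permissions": ["permission2"]}
print(can_view(["permission1"], document))                 # True
print(can_view(["permission1", "permission2"], document))  # False
print(can_view([], document))                              # False
```

A user holding both permission1 and permission2 is refused, because the deny match is evaluated before the allow match.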

Read more in the Document permissions for custom sources guide.