@chcaa/strapi-text-search

v0.12.0

Published

4 months ago

Integrate Strapi with Elastic Search


donbjarkone

jedglow

pbvahlst

Strapi plugin text-search

Elastic Search for strapi using the text-search API. Index and search collection-types, with features such as highlighting, faceting, filters and a query language.

Installation

Requires strapi >= 4.1.12

In the root of the strapi project run:

npm install @chcaa/strapi-text-search

Configuration

In the root of the project add the files search-settings.js and search-settings-strapi.js with the following content (everything can also be included inside search-settings.js if preferred).

search-settings.js (see the text-search API for more details)

module.exports = {
    elasticSearch: {
        connection: {
            url: new URL('http://localhost:9200'), //required
        },
        maxRetries: 3,
        requestTimeout: 60000,
        paths: {
            dataDir: "PATH-TO-ES-DATA-DIR" // required
        },
        actions: {
            copyResourcesToDataDir: true // if set to false you must do this manually
        },
    },
    strapi: require('./search-settings-strapi')
};

search-settings-strapi.js


const { Search } = require('@chcaa/text-search');

module.exports = {
  projectName: process.env.PROJECT_NAME ?? 'strapi-text-search-dev', // required, should be unique as this is used as prefix for the indexes in ES
  deleteCollectionTypeIndexes: [],
  collectionTypeIndexes: [ // required
    {
      collectionTypeName: '', //  required
      fileToTextFields: [],
      includeExtraFields: [],
      previewAuthorizedUserRoles: [],
      optimizeRelationsInIndex: true,
      forceDropAndReindex: false, // force a reindex of everything, remember to disable again after use
      defaultQueryOptions: {
        find: {},
        findOne: {}
      },
      beforeIndex: (entry) => {
        // make changes to the entry here
      },
      schema: { // define the index schema for this collectionType. See https://www.npmjs.com/package/@chcaa/text-search for details
        language: Search.language.ENGLISH,
        fields: [ // the fields (attributes and relations to index)
          // {name: 'title', type: Search.fieldType.TEXT, sortable: true, highlightFragmentCount: 0, boost: 5, similarity: Search.fieldSimilarity.LENGTH_SMALL_IMPACT},
        ]
      }
    }
  ]
};

Configuration Details

projectName: string [required] - the name of this project. All indexes declared in collectionTypeIndexes will be prefixed with this name. When having multiple projects in the same ES installation the name should be unique for the project.
deleteCollectionTypeIndexes: string[] - an array of indexes to delete. Use this to remove unused indexes.
collectionTypeIndexes: object[] [required] - Mapping of each strapi collection-type which should be indexed for search.
- collectionTypeName: string [required] - the full name of the strapi collection-type e.g. api::movie.movie.
- fileToTextFields: string[] - an array of strapi field names of the type media which should have the content of the file indexed. Each field name must have a corresponding field in schema.fields named [FIELD_NAME].content. So if we add the field manuscriptFile to this array there should be a manuscriptFile.content field mapping in schema.fields.
- includeExtraFields: string[] - an array of strapi field names which should be included in the source object added to the index but which should not be searchable (not mapped in schema.fields).
- previewAuthorizedUserRoles: string[] - an array of strapi user roles which should be able to search and see entries that are in preview mode. For everyone to have access add the Public role.
- optimizeRelationsInIndex: boolean - To improve performance during editing of relational data, the IDs of the related data are added to the index by default. Set this to false to disable this feature (not recommended).
- forceDropAndReindex: boolean - force reindexing of all data for this index. Remember to disable again after use.
- defaultQueryOptions: object - default options to merge with the options from the client API. Options set in the client API will overwrite options defined here.
  - find: object - default options for find (used by the end-point /../entries/query)
  - findOne: object - default options for findOne (used by the end-point /../entries/query/:id)
- beforeIndex: function(entry) - a function which is called before the entry is indexed in Elastic Search. Changes to the entry can be made here. Return null to delete the entry from the index. In all cases the database entry remains unchanged.
- schema: object - the schema definition for the fields to index. See text-search for details.
  - language: string - the language to use when indexing entries. This is used for stemming etc.
  - fields: object[] - the fields to index. The field names should be present in the collection-type being indexed, either as an attribute of the collection-type or as a relation. The exception is fileToTextFields, where only the prefix should be the name of a media attribute (see fileToTextFields above).
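The fileToTextFields rule above (every listed media field needs a matching [FIELD_NAME].content entry in schema.fields) is easy to get wrong. The following is a minimal sketch of a sanity check that could be run against a collectionTypeIndexes entry before starting strapi. The helper findUnmappedFileFields and the posterFile field are hypothetical and not part of the plugin.

```javascript
// Hypothetical helper (not part of the plugin): returns the fileToTextFields
// entries that have no matching "<field>.content" mapping in schema.fields.
function findUnmappedFileFields(indexConfig) {
  const mappedNames = new Set(
    (indexConfig.schema?.fields ?? []).map((field) => field.name)
  );
  return (indexConfig.fileToTextFields ?? []).filter(
    (fieldName) => !mappedNames.has(`${fieldName}.content`)
  );
}

// Illustrative config: "posterFile" is deliberately left without a mapping.
const exampleIndexConfig = {
  collectionTypeName: 'api::movie.movie',
  fileToTextFields: ['manuscriptFile', 'posterFile'],
  schema: {
    fields: [{ name: 'title' }, { name: 'manuscriptFile.content' }]
  }
};

console.log(findUnmappedFileFields(exampleIndexConfig)); // [ 'posterFile' ]
```

An empty array means every media field listed in fileToTextFields has its content mapping.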

A full example of a very simple movie index could look like this:

const { Search } = require('@chcaa/text-search');

module.exports = {
  projectName: process.env.PROJECT_NAME ?? 'strapi-text-search-dev',
  deleteCollectionTypeIndexes: [],
  collectionTypeIndexes: [
    {
      collectionTypeName: 'api::movie.movie',
      fileToTextFields: ['manuscriptFile'],
      includeExtraFields: ['notes'],
      previewAuthorizedUserRoles: ['Author'],
      optimizeRelationsInIndex: true,
      forceDropAndReindex: false,
      defaultQueryOptions: {
        find: {},
        findOne: {}
      },
      beforeIndex: (entry) => undefined,  
      schema: {
        language: Search.language.ENGLISH,
        fields: [
          { name: 'title', type: Search.fieldType.TEXT, sortable: true, highlightFragmentCount: 0, boost: 5, similarity: Search.fieldSimilarity.LENGTH_SMALL_IMPACT },
          { name: 'budget', type: Search.fieldType.INTEGER, sortable: true, highlightFragmentCount: 0, generateFacets: { min: 10_000_000, max: 90_000_000, bucketCount: 8, includeOutOfRange: true } },
          { name: 'runtime', type: Search.fieldType.INTEGER, sortable: true, highlightFragmentCount: 0 },
          { name: 'manuscriptFile.content', type: Search.fieldType.TEXT, highlightFragmentCount: 3 },
        ]
      }
    }
  ]
};
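The beforeIndex hook in the example above does nothing. A hook that does some work might look like the sketch below. It assumes the collection-type has an internalOnly boolean attribute (hypothetical) and a title string. Following the configuration template, it changes the entry in place and returns nothing, and it returns null only when the entry should be removed from the index.

```javascript
// Sketch of a beforeIndex hook. "internalOnly" is a hypothetical attribute
// used for illustration; adapt the checks to the actual collection-type.
function beforeIndex(entry) {
  if (entry.internalOnly) {
    // Per the configuration details, returning null deletes the entry
    // from the index (the database entry is left unchanged).
    return null;
  }
  if (typeof entry.title === 'string') {
    // Normalize whitespace before the title is indexed.
    entry.title = entry.title.trim();
  }
  // No explicit return: the entry is modified in place.
}

module.exports = { beforeIndex };
```

The function can then be referenced from the collectionTypeIndexes entry as beforeIndex: beforeIndex.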

Indexing Data

Once configured, strapi-text-search automatically indexes entries on create, update and delete actions performed on the mapped content-types as well as on mapped relations. If data existed before strapi-text-search was configured, or if the plugin has been disabled, use the forceDropAndReindex flag in the settings to get everything in sync (remember to disable forceDropAndReindex afterwards).
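Because forceDropAndReindex has to be switched off again after use, one option is to drive it from an environment variable instead of editing the settings file. This is only a sketch: the variable name TEXT_SEARCH_FORCE_REINDEX and the helper readFlag are assumptions and are not defined by the plugin.

```javascript
// Hypothetical helper: reads a boolean flag from an environment-like object.
// Only the string "true" (case-insensitive) enables the flag.
function readFlag(env, name) {
  return String(env[name] ?? '').toLowerCase() === 'true';
}

// Fragment of a collectionTypeIndexes entry in search-settings-strapi.js.
// TEXT_SEARCH_FORCE_REINDEX is an assumed variable name.
const movieIndexSettings = {
  collectionTypeName: 'api::movie.movie',
  forceDropAndReindex: readFlag(process.env, 'TEXT_SEARCH_FORCE_REINDEX')
};

module.exports = { readFlag, movieIndexSettings };
```

Starting strapi once with TEXT_SEARCH_FORCE_REINDEX=true would then trigger the reindex, and a normal restart without the variable would turn it off again.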

Endpoints

Each collection-type will have the following end-points available, using the singular version of the collection-type name. Remember to set permissions for each end-point in the strapi admin panel to make them available for the intended client users (settings -> roles -> [ROLE] -> Text-search).

GET /api/text-search/[NAME]/entries/:id

Fetch the source object of the entry with the given id.

Example Request

GET /api/text-search/movie/entries/100004

Example Response

{
  "data": {
    "id": "100004",
    "title": "War and Peace",
    "budget": 72000000,
    "runtime": 98,
    "manuscriptFile": {
      "content": "..."
    }
  }
}
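A client call to this end-point could be sketched as follows. The base URL, the optional bearer token and the function names are assumptions about the deployment and are not part of the plugin. The global fetch used as default requires Node 18+ or a browser.

```javascript
// Builds the request for GET /api/text-search/[NAME]/entries/:id.
// baseUrl and token are deployment-specific assumptions.
function buildEntryRequest(baseUrl, name, id, token) {
  const path = `/api/text-search/${encodeURIComponent(name)}/entries/${encodeURIComponent(id)}`;
  const headers = token ? { Authorization: `Bearer ${token}` } : {};
  return {
    url: new URL(path, baseUrl).toString(),
    init: { method: 'GET', headers }
  };
}

// Fetches the entry and returns the source object found under "data".
// fetchImpl can be replaced with a stub in tests.
async function getEntry(baseUrl, name, id, token, fetchImpl = fetch) {
  const { url, init } = buildEntryRequest(baseUrl, name, id, token);
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const body = await response.json();
  return body.data;
}
```

For example, getEntry('http://localhost:1337', 'movie', '100004') would request the entry shown in the example response, provided the role in use has permission for the end-point.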

POST /api/text-search/[NAME]/entries/query

Search the index. For all possible options and query language see text-search.

Warning: The options authorization and queryMode cannot be set using the client API due to security restrictions.

Example Request

POST /api/text-search/movie/entries/query

{
  "query": "war and peace",
  "options": {
    "pagination": {
      "page": 1,
      "maxResults": 10
    },
    "highlight": {
      "source": true
    },
    "filters": [{
      "fieldName": "budget",
      "operator": "should",
      "range": [{ "from": "100000", "to": "" }]
    }],
    "facets": {
      "filters": [{
        "fieldName": "language.name",
        "value": ["en"]
      }]
    }
  }
}

Example Response

{
  "data": {
    "status": "success",
    "query": {
      "queryString": "war and peace",
      "queryMode": "standardWithFields",
      "tieBreaker": 0,
      "warnings": []
    },
    "pagination": {
      "page": 1,
      "maxResults": 10,
      "total": {
        "value": 661,
        "relation": "eq"
      },
      "nextSearchAfter": [
        175.80144,
        "100595"
      ]
    },
    "results": [
      {
        "id": "100004",
        "score": 175.80144,
        "highlight": {
          "title": {
            "value": [
              "<mark class=\"doc-significant-term\">War and Peace</mark>"
            ],
            "hasHighlight": true,
            "isFullValue": true,
            "isOriginArray": false
          },
          "_source": {
            "id": "100004",
            "title": "<mark class=\"doc-significant-term\">War and Peace</mark>",
            "budget": 72000000,
            "runtime": 98,
            "manuscriptFile": {
              "content": "..."
            }
          }
        }
      }
    ]
  }
}
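On the client side, the request body and the response shown above can be handled with two small helpers. This is a sketch, and both function names are hypothetical. The first drops authorization and queryMode before sending, since the warning above says they cannot be set from the client API (how the server reacts if they are sent anyway is not documented here). The second reduces a response to the id, score and first highlighted title fragment of each result.

```javascript
// Hypothetical helper: builds the POST body and leaves out the two options
// that cannot be set through the client API.
function buildQueryBody(query, options = {}) {
  const { authorization, queryMode, ...allowedOptions } = options;
  return { query, options: allowedOptions };
}

// Hypothetical helper: extracts a compact result list from a query response.
function summarizeResults(responseBody) {
  return (responseBody?.data?.results ?? []).map((result) => ({
    id: result.id,
    score: result.score,
    titleHtml: result.highlight?.title?.value?.[0] ?? null
  }));
}

const queryBody = buildQueryBody('war and peace', {
  pagination: { page: 1, maxResults: 10 },
  highlight: { source: true },
  queryMode: 'standardWithFields' // removed by buildQueryBody
});

console.log(JSON.stringify(queryBody));
```

The resulting body can be sent with POST to /api/text-search/movie/entries/query using a Content-Type of application/json.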

POST /api/text-search/[NAME]/entries/query/:id

Fetch the entry with the specific id. For all possible options and query language see text-search.

Warning: The options authorization and queryMode cannot be set using the client API due to security restrictions.

Example Request

POST /api/text-search/movie/entries/query/100004

{
  "query": "war and peace",
  "options": {
    "highlight": {
      "source": true
    }
  }
}

Example Response

{
  "data": {
    "status": "success",
    "query": {
      "queryString": "war and peace",
      "queryMode": "standardWithFields",
      "tieBreaker": 0,
      "warnings": []
    },
    "result": {
      "id": "100004",
      "score": 175.80144,
      "highlight": {
        "title": {
          "value": ["<mark class=\"doc-significant-term\">War and Peace</mark>"],
          "hasHighlight": true,
          "isFullValue": true,
          "isOriginArray": false
        },
        "_source": {
          "id": "100004",
          "title": "<mark class=\"doc-significant-term\">War and Peace</mark>",
          "budget": 72000000,
          "runtime": 98,
          "manuscriptFile": {
            "content": "..."
          }
        }
      }
    }
  }
}

GET /api/text-search/[NAME]/entries/query/validate

Validate the query string. The response will contain warnings about incorrect syntax which will be escaped during search.

Example Request

GET /api/text-search/movie/entries/query/validate?query=war%20and%20peace~300

Example Response

{
  "data": {
    "status": "warning",
    "queryString": "war and peace~300",
    "warnings": [{
      "level": "warning",
      "type": "operatorTildeIncorrectPositionOrSyntax",
      "index": 9,
      "endIndex": 11,
      "span": 2
    }]
  }
}
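Because the query string travels as a URL parameter here, it has to be URL-encoded. A minimal sketch of building the request URL and reading the warning types from the response follows. The base URL and both helper names are assumptions.

```javascript
// Builds the URL for GET /api/text-search/[NAME]/entries/query/validate.
// encodeURIComponent turns spaces into %20 and leaves "~" untouched.
function buildValidateUrl(baseUrl, name, queryString) {
  const path = `/api/text-search/${encodeURIComponent(name)}/entries/query/validate`;
  return `${baseUrl}${path}?query=${encodeURIComponent(queryString)}`;
}

// Hypothetical helper: lists the warning types found in a validate response.
function warningTypes(responseBody) {
  return (responseBody?.data?.warnings ?? []).map((warning) => warning.type);
}

console.log(buildValidateUrl('http://localhost:1337', 'movie', 'war and peace~300'));
// http://localhost:1337/api/text-search/movie/entries/query/validate?query=war%20and%20peace~300
```

Applied to the example response above, warningTypes returns an array containing operatorTildeIncorrectPositionOrSyntax.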

POST /api/text-search/[NAME]/entries/query/explain

Explain the query. Gives insight into how the different parts of the query are scored and can be used for e.g. fine-tuning boosts for each field. The same query as used in /../entries/query can be passed in.

Warning: This end-point exposes implementation details about authorization filtering and the full source-object (if requested), without filtering out internal fields. This information is in itself not harmful, but is probably not suited for the public.

Example Request

POST /api/text-search/movie/entries/query/explain

{
  "maxDepth": 5, // the maximum depth of the explain result, how many details should be included
  "query": "war and peace",
  "options": {
    "pagination": { "page": 1, "maxResults": 10 },
    "filters": [{
      "fieldName": "budget",
      "operator": "should",
      "range": [{ "from": "100000", "to": "" }]
    }],
    "facets": {
      "filters": [{
        "fieldName": "language.name",
        "value": ["en"]
      }]
    }
  }
}

Example Response

{
  "data": {
    "results": [{
      "_shard": "[movie-docs-3][0]",
      "_node": "6ZsvwLi8SsigtNS0HmtwZw",
      "_index": "movie-docs-3",
      "_type": "_doc",
      "_id": "100004",
      "_version": 1,
      "_score": 266.90988,
      "_source": {
        "id": "100004",
        "title": "War and Peace",
        "budget": 72000000,
        "runtime": 98,
        "manuscriptFile": {
          "content": "..."
        },
        "_permissions": {
          "public": true,
          "users": [],
          "groups": []
        }
      },
      "sort": [197.6724, "100004"],
      "_explanation": "197.6724, sum of:\n    89.26548, sum of:\n        89.26548, weight(title.exact:war in 592)..."
    }],
    "query": {
      "queryString": "war and peace",
      "queryStringExpanded": "+(((title.folded:war | title:war | (title.exact:war)^5.0 ..."
    }
  }
}

The _explanation property can be presented inside a <pre> tag, which preserves its line breaks and indentation.
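Since the _explanation string is plain text with line breaks and indentation, it should be HTML-escaped before being placed in markup. A minimal sketch, with hypothetical helper names:

```javascript
// Escapes the characters that would otherwise be interpreted as HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Wraps an _explanation string in a <pre> element so that its
// line breaks and indentation are kept when rendered.
function explanationToPre(explanation) {
  return `<pre>${escapeHtml(explanation)}</pre>`;
}

module.exports = { escapeHtml, explanationToPre };
```

Each entry in data.results of the explain response carries its own _explanation, so the helper would be applied per result.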

Strapi Service Direct Access

The service for each text-search collection-type can be accessed directly, bypassing the router and controller. This can be useful for internal use in strapi, where going through an end-point may not be desired in some cases. To get the service in e.g. bootstrap.js (or any other place where you have access to the strapi object) do the following.

let movieTextSearchService = strapi.plugin('text-search').service('movie'); // the name of the service is the singular name of the collection-type
let searchResult = await movieTextSearchService.find('War and Peace', { highlight: { source: true }});

Methods

async find(queryString, options) - search the index
async findOne(id, highlightQueryString, options)
async get(id, authorization)
validateQuery(queryString)
async explainQuery(queryString, options, [maxDepth=5])
getFieldsMetaData()
async listSynonyms():string[] - get the currently saved synonyms array
async updateSynonyms(synonyms) - synonyms should be an array of strings where each string is a comma-separated list of synonyms. e.g. ["tall, high", "fast, speedy"]
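The format expected by updateSynonyms (one comma-separated string per synonym group) can be produced from nested arrays. The sketch below assumes a service object obtained as shown above. The helpers toSynonymLines and saveSynonyms are hypothetical, and dropping groups with fewer than two terms is a choice made for this example, not a documented requirement.

```javascript
// Converts groups of terms into the string format described above,
// e.g. [['tall', 'high']] -> ['tall, high'].
function toSynonymLines(groups) {
  return groups
    .map((group) => group.map((term) => term.trim()).filter(Boolean))
    .filter((group) => group.length > 1) // a single term has no synonym
    .map((group) => group.join(', '));
}

// Saves the synonyms through the service and returns what is stored afterwards.
// "service" is e.g. strapi.plugin('text-search').service('movie').
async function saveSynonyms(service, groups) {
  await service.updateSynonyms(toSynonymLines(groups));
  return service.listSynonyms();
}

console.log(toSynonymLines([['tall', 'high'], ['fast', ' speedy ']]));
// [ 'tall, high', 'fast, speedy' ]
```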
