Import Data from MongoDB into Meilisearch

Background

MongoDB is a schema-less, document-oriented database that stores data in a JSON-like format.

Meilisearch is an open-source search engine written in Rust that provides “search-as-you-type” functionality (similar to Algolia), with fast indexing and low-latency search.

When building an index, Meilisearch is efficient and uses little memory. For example, indexing roughly 1.9 million documents from a 1.1GB uncompressed JSON collection takes about 20 minutes, uses less than 1GB of memory, and produces an index of around 8.5GB.

Meilisearch supports JSON and CSV data formats.

This guide explains how to import data from MongoDB into Meilisearch.

Prepare Data for Meilisearch (Add an ID Field)

Meilisearch requires every document to have a unique primary key, whose values may contain only alphanumeric characters (0-9, a-z, A-Z), hyphens (-), and underscores (_); other special characters are not allowed. If you do not set a primary key explicitly, Meilisearch infers one from the first document by picking a field whose name contains “id” (e.g., _id, user_id). This guide uses a string field called id.

In MongoDB, the _id field is an ObjectId, exported as:

{
  "_id": {"$oid": "623a9cdace6b4611493b8525"}
}

This cannot be used directly in Meilisearch: the value is a nested object, and the $oid key contains the forbidden $ character. Instead, create a string version of the ObjectId in an id field:

{
  "id": "623a9cdace6b4611493b8525"
}

To achieve this, create a view in MongoDB that removes _id and adds a string id field.

Execute this in the MongoDB shell (mongosh):

var pipeline = [
  { $addFields: { id: { $toString: "$_id" } } },
  { $project: { _id: 0 } }
];
db.createView("view_for_export", "sites", pipeline);

Here, sites is the collection name. This creates a view called view_for_export.
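
Optionally, spot-check the view before exporting. With the names used above, a document read from the view should now carry a string id and no _id field:

db.view_for_export.findOne()
// e.g. { id: "623a9cdace6b4611493b8525", ... other fields ... }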

Now, export the data:

mongoexport --port=27017 --db=db1 --collection=view_for_export --out=sites.json --jsonArray

Note: The data must be in jsonArray format for Meilisearch import. The format looks like:

[
  {},
  {},
  {}
]
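
As a quick sanity check before importing (assuming jq is installed and the file name from the export step), you can confirm the export is a single JSON array and count its documents:

jq 'length' sites.json
# prints the number of documents in the array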

Import Data into Meilisearch

Before importing, adjust Meilisearch’s payload size limit to handle large files.

nohup meilisearch --http-addr 0.0.0.0:7700 --master-key="abcd123123" --http-payload-size-limit=100Gb &
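
Meilisearch will create the sites index automatically during the import and infer id as the primary key, but you can also create the index up front with an explicit primary key. A minimal sketch, using the same index name and master key as above:

curl \
  -X POST 'http://localhost:7700/indexes' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer abcd123123' \
  --data-binary '{"uid": "sites", "primaryKey": "id"}'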

Import the data:

curl \
  -X POST 'http://localhost:7700/indexes/sites/documents' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer abcd123123' \
  --data-binary @sites.json
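
The documents endpoint does not wait for indexing to finish; it immediately returns an enqueued task. The response looks roughly like the following (values are illustrative), and the taskUid is what you query in the next step:

{
  "taskUid": 21,
  "indexUid": "sites",
  "status": "enqueued",
  "type": "documentAdditionOrUpdate",
  "enqueuedAt": "2022-09-22T03:03:48.818566287Z"
}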

Check the task status (replace 21 with your actual task UID):

curl \
  -H 'Authorization: Bearer abcd123123' \
  -X GET 'http://localhost:7700/tasks/21'

Example response:

{
  "uid": 21,
  "indexUid": "sites",
  "status": "succeeded",
  "type": "documentAdditionOrUpdate",
  "details": {
    "receivedDocuments": 1938716,
    "indexedDocuments": 1938716
  },
  "duration": "PT1189.588458063S",
  "enqueuedAt": "2022-09-22T03:03:48.818566287Z",
  "startedAt": "2022-09-22T03:03:48.846784313Z",
  "finishedAt": "2022-09-22T03:23:38.435242376Z"
}

In about 20 minutes, 1,938,716 documents (1.1GB of uncompressed JSON) were indexed into an index of roughly 8.5GB, with less than 1GB of memory used during the process.

Additional Tips

  • Ensure MongoDB is running and accessible on the specified port.
  • Install and run Meilisearch before importing (download from Meilisearch releases).
  • Use a secure master key in production; avoid hardcoding it.
  • Verify the index creation by querying Meilisearch after import (see the search example after this list).
  • For large datasets, monitor system resources (CPU, memory) during indexing.
  • If using CSV, the first line must be a header row listing the field names, and it should include an id field.
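
For example, a quick search against the new index (using the index name and master key from above) confirms that the documents are searchable:

curl \
  -X POST 'http://localhost:7700/indexes/sites/search' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer abcd123123' \
  --data-binary '{"q": "example", "limit": 3}'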
