Background
MongoDB is a document-oriented (JSON) database; it is schema-less and stores data as JSON-like documents.
Meilisearch is an open-source search engine written in Rust. It offers search-as-you-type (like Algolia) and is very fast at both building indexes and returning search results.
Meilisearch is also fast and frugal with memory when building an index. For example, building an index of roughly 2 million documents, where the total JSON (called a collection in MongoDB) is around 1.1 GB uncompressed, takes only about 20 minutes, and the resulting index size is around 8.5 GB.
Meilisearch supports JSON and CSV data formats.
This article walks through how to import data from MongoDB into Meilisearch.
Prepare data for Meilisearch (add an id field)
Meilisearch needs a unique field called id. Its value should only contain characters in [0-9a-zA-Z_] and no special characters. If an id field is not provided, any field whose name contains id will be used as the primary key instead, such as _id, user_id, etc.
In MongoDB, every document has an _id field. When exported, it looks like this:
{
"_id":{"$oid":"623a9cdace6b4611493b8525"}
}
This cannot be used in Meilisearch, because it contains the special character $.
We can still use the ObjectId as the id, just not directly: we need a field called id whose value is that ObjectId as a string, like this:
{
"id": "623a9cdace6b4611493b8525"
}
This is fine for Meilisearch.
To get there, we can create a view in MongoDB that drops _id and adds a field called id whose value is derived from _id.
Execute this in the mongosh shell:
// convert the ObjectId in _id to a string field named id, then drop _id
var pipeline = [{$addFields: {id: {"$toString": "$_id"}}}, {$project: {_id: 0}}]
db.createView("view_for_export", "sites", pipeline)
Here sites is the source collection name; this creates a new view called view_for_export.
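Before exporting, you can optionally spot-check the view from the shell (a quick sanity check; the port and database name here match the export command below):

$ mongosh --port=27017 db1 --eval 'db.view_for_export.findOne()'

The returned document should have an id string and no _id field. Then export the data: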
$ mongoexport --port=27017 --db=db1 --collection=view_for_export --out=sites.json --jsonArray
Note: the data format must be jsonArray for Meilisearch to import it. The jsonArray format looks like this:
[
{},
{},
{}
]
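A quick way to confirm the export really is a JSON array is to look at the first bytes of the file; it should start with [:

$ head -c 100 sites.json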
Import data to Meilisearch
Before importing the data, we should increase the payload size limit of Meilisearch, since the default is too small for an export of this size.
$ nohup meilisearch --http-addr 0.0.0.0:7700 --master-key="abcd123123" --http-payload-size-limit=100Gb &
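You can verify the instance is up before sending data; the /health endpoint needs no API key:

$ curl http://localhost:7700/health
{"status":"available"}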
Then import the data:
curl \
-X POST 'http://localhost:7700/indexes/sites/documents' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer abcd123123' \
--data-binary @sites.json
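Meilisearch accepts the documents asynchronously and immediately replies with a summarized task object. The response below is illustrative, and depending on your Meilisearch version the id field is named taskUid or uid:

{"taskUid":21,"indexUid":"sites","status":"enqueued","type":"documentAdditionOrUpdate","enqueuedAt":"2022-09-22T03:03:48.818566287Z"}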
View the task status (change the task id 21 to your own task id):
curl -H 'Authorization: Bearer abcd123123' -X GET 'http://localhost:7700/tasks/21'
{"uid":21,"indexUid":"sites","status":"succeeded","type":"documentAdditionOrUpdate","details":{"receivedDocuments":1938716,"indexedDocuments":1938716},"duration":"PT1189.588458063S","enqueuedAt":"2022-09-22T03:03:48.818566287Z","startedAt":"2022-09-22T03:03:48.846784313Z","finishedAt":"2022-09-22T03:23:38.435242376Z"}
Within 20 minutes, 1,938,716 documents (1.1 GB of uncompressed JSON) were indexed by Meilisearch. The total index size is 8.5 GB, and less than 1 GB of memory was used during the process.
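To confirm the index is actually searchable, run a quick query (the query string here is just a placeholder):

$ curl \
-X POST 'http://localhost:7700/indexes/sites/search' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer abcd123123' \
--data-binary '{"q": "some keyword"}'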
This is the end of the article.