Language Analyzers
Language-specific analyzers provide a convenient way to create indexes tailored to a particular language. Each language analyzer applies a built-in list of stop words and word-division rules based on that language's usage patterns.
Atlas Search offers the following language analyzers:
lucene.arabic | lucene.armenian | lucene.basque | lucene.bengali |
lucene.brazilian | lucene.bulgarian | lucene.catalan | lucene.chinese |
lucene.cjk [1] | lucene.czech | lucene.danish | lucene.dutch |
lucene.english | lucene.finnish | lucene.french | lucene.galician |
lucene.german | lucene.greek | lucene.hindi | lucene.hungarian |
lucene.indonesian | lucene.irish | lucene.italian | lucene.japanese |
lucene.korean | lucene.kuromoji [2] | lucene.latvian | lucene.lithuanian |
lucene.morfologik [3] | lucene.nori [4] | lucene.norwegian | lucene.persian |
lucene.portuguese | lucene.romanian | lucene.russian | lucene.smartcn [5] |
lucene.sorani | lucene.spanish | lucene.swedish | lucene.thai |
lucene.turkish | lucene.ukrainian |
[1] cjk is a generic Chinese, Japanese, and Korean analyzer.
[2] kuromoji is a Japanese analyzer.
[3] morfologik is a Polish analyzer.
[4] nori is a Korean analyzer.
[5] smartcn is a Chinese analyzer.
Example
The following example index definition specifies an index on the sujet field using the french analyzer:
{ "mappings": { "fields": { "sujet": { "type": "string", "analyzer": "lucene.french" } } } }
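As a minimal sketch of how this definition might be applied from application code, the helper below passes it to a collection's createSearchIndex method. It assumes a MongoDB Node.js driver version that exposes Collection.createSearchIndex; the helper name createFrenchIndex and the index name "default" are assumptions made for illustration, not part of the original example.

```javascript
// The index definition from the example, written as a JavaScript object.
const frenchIndexDefinition = {
  mappings: {
    fields: {
      sujet: { type: "string", analyzer: "lucene.french" },
    },
  },
};

// Hypothetical helper. Assumption: `collection` is a Node.js driver
// Collection, or any object with a compatible createSearchIndex method.
// The index name "default" is also an assumption.
function createFrenchIndex(collection, name = "default") {
  // Returns whatever createSearchIndex returns (a Promise with the driver).
  return collection.createSearchIndex({
    name,
    definition: frenchIndexDefinition,
  });
}
```

With a connected client, this could be called as `await createFrenchIndex(db.collection("voitures"))`.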
Consider a collection named voitures with the following documents:
{ "_id": 1, "sujet": "Mieux équiper nos voitures pour comprendre les causes d'un accident." }
{ "_id": 2, "sujet": "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." }
The following query uses the index on the sujet field to search for the string pour:
db.voitures.aggregate([ { $search: { "text": { "query": "pour", "path": "sujet" } } } ])
The above query returns no results when the index uses the french analyzer, because pour is a built-in stop word and is removed during analysis. With the standard analyzer, which does not treat pour as a stop word, the same query would return both documents.
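The effect of stop word removal can be sketched with a toy model in plain JavaScript. This is not Lucene's French analyzer: the tokenizer is a simple split on non-letter characters, FRENCH_STOP_WORDS is a small hypothetical subset chosen for this example, and the function names are illustrative. It only shows why a query made up entirely of stop words has nothing left to match.

```javascript
// Toy model only. Assumption: a tiny hand-picked stop word list,
// not the list that lucene.french actually ships with.
const FRENCH_STOP_WORDS = new Set(["pour", "le", "les", "de", "un", "que", "nos", "vous"]);

// Lowercase the text and split on anything that is not a letter or digit.
function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Tokenize, then drop every token found in the stop word set.
function analyze(text, stopWords = new Set()) {
  return tokenize(text).filter((token) => !stopWords.has(token));
}

// Return the _id of each document sharing at least one analyzed token
// with the analyzed query.
function search(docs, query, stopWords) {
  const queryTokens = analyze(query, stopWords);
  return docs
    .filter((doc) => {
      const docTokens = new Set(analyze(doc.sujet, stopWords));
      return queryTokens.some((token) => docTokens.has(token));
    })
    .map((doc) => doc._id);
}

const docs = [
  { _id: 1, sujet: "Mieux équiper nos voitures pour comprendre les causes d'un accident." },
  { _id: 2, sujet: "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." },
];

console.log(search(docs, "pour", FRENCH_STOP_WORDS));      // []
console.log(search(docs, "pour"));                         // [ 1, 2 ]
console.log(search(docs, "carburant", FRENCH_STOP_WORDS)); // [ 2 ]
```

Because the same filtering is applied to the query and to the indexed text, a stop word can never match, while a content word such as carburant still does.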
The following query searches for the string carburant in the sujet field:
db.voitures.aggregate([ { $search: { "text": { "query": "carburant", "path": "sujet" } } } ])
The above query returns the document with "_id": 2 from the collection:
{ "_id": 2, "sujet": "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." }