Language Analyzers

Language-specific analyzers provide a convenient way to create indexes tailored to a particular language. Each language analyzer has built-in stop words and word-division rules based on that language's usage patterns.

Atlas Search offers the following language analyzers:


`lucene.arabic`	`lucene.armenian`	`lucene.basque`	`lucene.bengali`
`lucene.brazilian`	`lucene.bulgarian`	`lucene.catalan`	`lucene.chinese`
`lucene.cjk` ¹	`lucene.czech`	`lucene.danish`	`lucene.dutch`
`lucene.english`	`lucene.finnish`	`lucene.french`	`lucene.galician`
`lucene.german`	`lucene.greek`	`lucene.hindi`	`lucene.hungarian`
`lucene.indonesian`	`lucene.irish`	`lucene.italian`	`lucene.japanese`
`lucene.korean`	`lucene.kuromoji` ²	`lucene.latvian`	`lucene.lithuanian`
`lucene.morfologik` ³	`lucene.nori` ⁴	`lucene.norwegian`	`lucene.persian`
`lucene.portuguese`	`lucene.romanian`	`lucene.russian`	`lucene.smartcn` ⁵
`lucene.sorani`	`lucene.spanish`	`lucene.swedish`	`lucene.thai`
`lucene.turkish`	`lucene.ukrainian`

¹ `lucene.cjk` is a generic analyzer for Chinese, Japanese, and Korean.

² `lucene.kuromoji` is a Japanese analyzer.

³ `lucene.morfologik` is a Polish analyzer.

⁴ `lucene.nori` is a Korean analyzer.

⁵ `lucene.smartcn` is a Chinese analyzer.

Example

The following example index definition specifies an index on the `sujet` field using the `lucene.french` analyzer:

{
  "mappings": {
    "fields": {
      "sujet": {
        "type": "string",
        "analyzer": "lucene.french"
      }
    }
  }
}

Consider a collection named `voitures` with the following documents:

{ "_id": 1, "sujet": "Mieux équiper nos voitures pour comprendre les causes d'un accident." }
{ "_id": 2, "sujet": "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." }

The following query uses the index to search for the string `pour` in the `sujet` field:

db.voitures.aggregate([
  {
    $search: {
      "text": {
        "query": "pour",
        "path": "sujet"
      }
    }
  }
])

This query returns no results with the `lucene.french` analyzer, because `pour` is a built-in stop word and is therefore not indexed. With the standard analyzer, the same query would return both documents.

The following query searches for the string `carburant` in the `sujet` field:

db.voitures.aggregate([
  {
    $search: {
      "text": {
        "query": "carburant",
        "path": "sujet"
      }
    }
  }
])

This query returns the document whose `_id` is `2`:

{ "_id": 2, "sujet": "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." }
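
To make the stop word behavior in the two queries above more concrete, the following is a minimal JavaScript sketch of stop word filtering. It is a toy model, not how Atlas Search or Lucene is implemented: the `analyze` and `search` helpers and the `FRENCH_STOP_WORDS` list are hypothetical, the list is only a small hand-picked subset, and real language analyzers also apply steps such as stemming and elision handling that are omitted here.

```javascript
// Hypothetical, hand-picked subset of French stop words (illustration only).
const FRENCH_STOP_WORDS = new Set(["pour", "le", "les", "de", "un", "que", "nos", "vous"]);

// Toy analysis: lowercase, split on anything that is not a letter or
// combining mark, then drop empty tokens and stop words.
function analyze(text, stopWords = new Set()) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{M}]+/u)
    .filter((token) => token.length > 0 && !stopWords.has(token));
}

// Toy search: a document matches if it shares at least one token with the query.
// Returns the _id values of the matching documents.
function search(docs, query, stopWords) {
  const queryTokens = analyze(query, stopWords);
  return docs
    .filter((doc) => {
      const docTokens = new Set(analyze(doc.sujet, stopWords));
      return queryTokens.some((token) => docTokens.has(token));
    })
    .map((doc) => doc._id);
}

const voitures = [
  { _id: 1, sujet: "Mieux équiper nos voitures pour comprendre les causes d'un accident." },
  { _id: 2, sujet: "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." },
];

console.log(search(voitures, "pour", FRENCH_STOP_WORDS));      // [] ("pour" is filtered out)
console.log(search(voitures, "pour", new Set()));              // [ 1, 2 ] (no stop word filtering)
console.log(search(voitures, "carburant", FRENCH_STOP_WORDS)); // [ 2 ]
```

Under these assumptions, a query made up only of stop words produces no tokens and therefore matches nothing, while a content word such as `carburant` still matches the document that contains it.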
