Tuesday, February 7, 2017

Azure Search: How do I ensure all combinations of gender and plurality are included in my results?


I am facing a business requirement for a French website: a search for a word must also match its masculine, feminine, singular, and plural forms. The easiest way to describe this is to show the requirement itself in this question.

Req 1 - search for chien (masculine/singular)

The following words should be included in the search results:

  • chien (masculine/singular)
  • chiens (masculine/plural)
  • chienne (feminine/singular)
  • chiennes (feminine/plural)

When I researched this requirement, I used the Analyze API with "fr.microsoft" analyzer to quickly test the various scenarios.
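For anyone who wants to reproduce these tests from a script, here is a minimal sketch of how the Analyze API call could be built in Python. The service name, index name, API key, and the `api_version` default are placeholders and assumptions, not values from my setup; the helper names `build_analyze_request` and `tokens_from_response` are hypothetical.

```python
import json
import urllib.request


def build_analyze_request(service, index, api_key, text,
                          analyzer="fr.microsoft", api_version="2016-09-01"):
    """Build (but do not send) a POST request for the Analyze API.

    service, index, api_key and api_version are placeholders; substitute
    the values for your own search service.
    """
    url = (
        f"https://{service}.search.windows.net"
        f"/indexes/{index}/analyze?api-version={api_version}"
    )
    # Same JSON body as the requests shown in this question.
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def tokens_from_response(payload):
    """Extract the token strings from a parsed Analyze API response."""
    return [entry["token"] for entry in payload.get("tokens", [])]


if __name__ == "__main__":
    request = build_analyze_request("my-service", "my-index", "my-key", "chiens")
    print(request.full_url)
    # To actually send it (requires a real service and key):
    # with urllib.request.urlopen(request) as response:
    #     print(tokens_from_response(json.load(response)))
```

The request is only constructed here, so the sketch runs without network access; sending it is left commented out.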

Request #1

{ "analyzer": "fr.microsoft", "text": "chien" }

Response #1

Request #2

{ "analyzer": "fr.microsoft", "text": "chiens" }

Response #2

  • chien
  • chiens

Request #3

{ "analyzer": "fr.microsoft", "text": "chienne" }

Response #3

  • chien
  • chienner
  • chienne

Request #4

{ "analyzer": "fr.microsoft", "text": "chiennes" }

Response #4

  • chien
  • chienner
  • chiennes

Req 2 - search for lecteur (masculine/singular)

The following words should be included in the search results:

  • lecteur (masculine/singular)
  • lecteurs (masculine/plural)
  • lectrice (feminine/singular)
  • lectrices (feminine/plural)

I again used the Analyze API with "fr.microsoft" analyzer to quickly test the various scenarios.

Request #1

{ "analyzer": "fr.microsoft", "text": "lecteur" }

Response #1

Request #2

{ "analyzer": "fr.microsoft", "text": "lecteurs" }

Response #2

  • lecteur
  • lecteurs

Request #3

{ "analyzer": "fr.microsoft", "text": "lectrice" }

Response #3

  • lecteur
  • lectrice

Request #4

{ "analyzer": "fr.microsoft", "text": "lectrices" }

Response #4

  • lecteur
  • lectrices

My Impressions and Questions

  • My initial impression is that searching "chiennes" would not match a document containing "chienne" because "chiennes" is only broken down to the following: chien, chienner, chiennes.

  • Is that impression correct? Or will searching "chiennes" still return a document containing "chienne", because the search term "chiennes" is tokenized to chien, chienner, chiennes while the document's "chienne" is tokenized to chien, chienner, chienne, so the two ultimately share a matching token?

  • Note that this may actually end up being a duplicate of my femme vs femmes S.O. question I posted earlier today: Azure Search: Searching for singular version of a word, but still include plural version in results
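To make the second possibility concrete, here is a small sketch that compares the token sets from the Analyze API responses quoted above. It rests on two assumptions that I have not confirmed: that the same analyzer runs at both indexing and query time, and that a document matches when it shares at least one token with the query. The names `ANALYZED`, `shared_tokens`, and `would_match` are hypothetical.

```python
# Token sets copied from the Analyze API responses quoted in this question.
ANALYZED = {
    "chiens": {"chien", "chiens"},
    "chienne": {"chien", "chienner", "chienne"},
    "chiennes": {"chien", "chienner", "chiennes"},
    "lectrice": {"lecteur", "lectrice"},
    "lectrices": {"lecteur", "lectrices"},
}


def shared_tokens(query_term, document_term):
    """Return the tokens emitted for both the query term and the document term."""
    return ANALYZED[query_term] & ANALYZED[document_term]


def would_match(query_term, document_term):
    """Simplified model (an assumption): any shared token counts as a match."""
    return bool(shared_tokens(query_term, document_term))


if __name__ == "__main__":
    pairs = [("chiennes", "chienne"), ("chiens", "chiennes"), ("lectrices", "lectrice")]
    for query, document in pairs:
        print(query, "vs", document, "->", sorted(shared_tokens(query, document)))
```

Under that simplified model, "chiennes" and "chienne" overlap on chien and chienner, and "lectrices" and "lectrice" overlap on lecteur. Whether the service actually ranks or returns such documents is exactly what I am asking.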

Please advise.

Thank you, Andres

