I have a business requirement for a French website: a search for one form of a word must also match its other masculine/feminine and singular/plural forms. The easiest way to describe this is to show the requirement itself.
Req 1 - search for chien (masculine/singular)
The following words should be included in the search results:
- chien (masculine/singular)
- chiens (masculine/plural)
- chienne (feminine/singular)
- chiennes (feminine/plural)
When I researched this requirement, I used the Analyze API with "fr.microsoft" analyzer to quickly test the various scenarios.
Request #1
{ "analyzer": "fr.microsoft", "text": "chien" }
Response #1
Request #2
{ "analyzer": "fr.microsoft", "text": "chiens" }
Response #2
- chien
- chiens
Request #3
{ "analyzer": "fr.microsoft", "text": "chienne" }
Response #3
- chien
- chienner
- chienne
Request #4
{ "analyzer": "fr.microsoft", "text": "chiennes" }
Response #4
- chien
- chienner
- chiennes
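For anyone who wants to repeat these checks, here is a minimal Python sketch of how such an Analyze API call could be built. The service name, index name, key and API version are hypothetical placeholders, and the response shape (a "tokens" array whose items have a "token" field) is an assumption based on the REST documentation rather than something shown in this question.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own service, index and key.
SERVICE_NAME = "my-search-service"
INDEX_NAME = "my-index"
API_KEY = "<admin-api-key>"
API_VERSION = "2020-06-30"  # assumed; use a version your service supports


def build_analyze_request(text, analyzer="fr.microsoft"):
    """Build (but do not send) a POST request for the Analyze API."""
    url = (
        f"https://{SERVICE_NAME}.search.windows.net"
        f"/indexes/{INDEX_NAME}/analyze?api-version={API_VERSION}"
    )
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": API_KEY}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def analyze(text, analyzer="fr.microsoft"):
    """Send the request and return the token strings from the response."""
    with urllib.request.urlopen(build_analyze_request(text, analyzer)) as resp:
        payload = json.load(resp)
    # Assumed response shape: {"tokens": [{"token": "...", ...}, ...]}
    return [item["token"] for item in payload["tokens"]]
```

Calling analyze("chiennes") against a real service should then give the same kind of token list as in the responses above.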
Req 2 - search for lecteur (masculine/singular)
The following words should be included in the search results:
- lecteur (masculine/singular)
- lecteurs (masculine/plural)
- lectrice (feminine/singular)
- lectrices (feminine/plural)
I again used the Analyze API with "fr.microsoft" analyzer to quickly test the various scenarios.
Request #1
{ "analyzer": "fr.microsoft", "text": "lecteur" }
Response #1
Request #2
{ "analyzer": "fr.microsoft", "text": "lecteurs" }
Response #2
- lecteur
- lecteurs
Request #3
{ "analyzer": "fr.microsoft", "text": "lectrice" }
Response #3
- lecteur
- lectrice
Request #4
{ "analyzer": "fr.microsoft", "text": "lectrices" }
Response #4
- lecteur
- lectrices
My Impressions and Questions
- My initial impression is that searching for "chiennes" would not match a document containing "chienne", because "chiennes" is only broken down into the following tokens: chien, chienner, chiennes.
- Is that impression correct? Or will searching for "chiennes" still return a document containing "chienne", because the search term "chiennes" gets tokenized to chien, chienner, chiennes while the document's "chienne" gets tokenized to chien, chienner, chienne, so there would ultimately be a match?
- Note that this may end up being a duplicate of the femme vs femmes Stack Overflow question I posted earlier today: Azure Search: Searching for singular version of a word, but still include plural version in results
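To make the second scenario concrete, here is a small Python sketch that compares the token lists from the Analyze API responses above. It is only a simplified model, not Azure Search's actual matching logic: it assumes the same analyzer runs at both index time and query time, and that a single shared token is enough for a match. The function names are made up for illustration.

```python
# Token sets copied from the Analyze API responses quoted in this question.
ANALYZED = {
    "chiens": {"chien", "chiens"},
    "chienne": {"chien", "chienner", "chienne"},
    "chiennes": {"chien", "chienner", "chiennes"},
    "lecteurs": {"lecteur", "lecteurs"},
    "lectrice": {"lecteur", "lectrice"},
    "lectrices": {"lecteur", "lectrices"},
}


def shared_tokens(query_term, document_term):
    """Return the tokens produced for both the query and the document term."""
    return ANALYZED[query_term] & ANALYZED[document_term]


def would_match(query_term, document_term):
    """Assumption: one shared token is sufficient for a match."""
    return bool(shared_tokens(query_term, document_term))


print(sorted(shared_tokens("chiennes", "chienne")))
print(would_match("lectrices", "lectrice"))
```

Under those assumptions, "chiennes" and "chienne" overlap on chien and chienner, and "lectrices" and "lectrice" overlap on lecteur, which is what the second scenario predicts.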
Please advise.
Thank you, Andres
Azure Search: How do I ensure all combinations of gender and plurality are included in my results?