Elasticsearch query_string wildcard does not consider length

I have some records on Elasticsearch that have the same first letters, such as: word, worda, wordab, wordabc, wordabcd.

I am using query_string with a wildcard:

{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "word*"
          }
        }
      ]
    }
  }
}

All hits have the same score ("_score" : 1.0), therefore the order is arbitrary. Is it possible to have a score considering how much the word actually matches the term? For instance, word matches the term 100%, worda matches the term 80%, and so on.

asked Jan 17 at 21:28

Mauricio Bertanha

elasticsearch

1 Answer

The reason you get a score of 1 for all matched docs is that wildcard and prefix queries are multi-term queries. To execute them, Elasticsearch first has to rewrite them into the actual terms that match.

There are several rewrite methods. The default one is called constant_score, and it assigns the same constant score (1.0) to every matching document.

Some of the other rewrite methods do produce unequal scores, but that scoring relies on the TF-IDF distribution of the terms (e.g. how often worda occurs in the matched document and how many documents in the whole index contain worda), not on how closely the term matches the pattern. As a starting point you could try top_terms_1000 and tweak it later.
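As a minimal sketch of what trying a different rewrite method could look like, the snippet below builds the question's request body in Python and adds the `rewrite` parameter to `query_string`. The helper name `wildcard_query` is hypothetical, and the value `top_terms_1000` is only the starting point suggested above, not a tuned setting.

```python
import json


def wildcard_query(pattern, rewrite="top_terms_1000"):
    """Build a query_string request body with an explicit rewrite method.

    `wildcard_query` is an illustrative helper, not part of any client library.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {"query_string": {"query": pattern, "rewrite": rewrite}}
                ]
            }
        }
    }


body = wildcard_query("word*")
# Print the JSON that would be sent as the search request body.
print(json.dumps(body, indent=2))
```

Even with this setting, the scores reflect term statistics, so shorter matches are not guaranteed to rank first.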

Unfortunately, there is no perfect out-of-the-box way to achieve the expected behaviour.

One possible way to mimic it is to adapt the Edge NGram tokenizer so that it produces tokens from wordabc as follows:

w, wo, wor, word, ...
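To make the token list concrete, here is a small Python emulation of edge n-gram tokenization. It only mimics the idea and is not Elasticsearch's implementation; the function name `edge_ngrams` and the `min_gram=1`, `max_gram=10` values are assumptions chosen for illustration.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the prefixes of `token` whose length is between min_gram and max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


tokens = edge_ngrams("wordabc")
print(tokens)
# ['w', 'wo', 'wor', 'word', 'worda', 'wordab', 'wordabc']
```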

In this case querying could produce a more meaningful score. For the exact outcome you expect, a percentage of the match, you would need to create a custom query and scoring mechanism.
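The "percentage of the match" from the question can be written down as the length of the query term divided by the length of the matched word. The sketch below only illustrates that arithmetic on the client side in Python; `match_ratio` is a hypothetical helper, and it is not an Elasticsearch scoring feature.

```python
def match_ratio(term, word):
    """Fraction of `word` covered by the prefix `term` (0.0 if it is not a prefix)."""
    if not word or not word.startswith(term):
        return 0.0
    return len(term) / len(word)


words = ["wordabcd", "worda", "word", "wordabc", "wordab"]
# Sort so that the closest match to the prefix "word" comes first.
ranked = sorted(words, key=lambda w: match_ratio("word", w), reverse=True)
print(ranked)
# ['word', 'worda', 'wordab', 'wordabc', 'wordabcd']
```

Here word scores 1.0 and worda scores 0.8, which matches the 100% and 80% in the question.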

answered 2 days ago

Mysterion
