Lucene.Net search for multiple terms across multiple fields that exist in document
I'm using the Royal Mail's sample PAF file. This data has been imported into a database, and the following fields are indexed via my own Lucene indexer console application:
...
var doc = new Document();
doc.Add(new Field("id", item.Id.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.Add(new Field("postcode", item.Postcode, Field.Store.YES, Field.Index.ANALYZED));
doc.Add(new Field("buildingname", item.BuildingName, Field.Store.YES, Field.Index.ANALYZED));
...
What I want to do now is provide either a partial or full postcode or building name and get matches back, as long as the search terms loosely match the document's postcode or buildingname fields. So if the postcode/buildingname was:
TE55 5TT Test Building
If I provided "TE55 Test" I'd expect that to come back.
My search code
    var fieldsToAnalyse = new[] { "postcode", "buildingname" };
    var finalQuery = new BooleanQuery();
    var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29, fieldsToAnalyse, _analyzer);
    string[] terms = searchTerm.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
    foreach (string term in terms)
    {
        var formattedTerm = term.Replace("~", "");
        var formattedTermWildcard = $"+{formattedTerm}~";
        finalQuery.Add(parser.Parse(formattedTermWildcard), Occur.MUST);
    }
    var searcher = new IndexSearcher(_indexDirectory, true);
    var hits = searcher.Search(finalQuery, 10);
    foreach (var hit in hits.ScoreDocs)
    {
        documents.Add(searcher.Doc(hit.Doc));
    }
    _analyzer.Close();
    searcher.Dispose();
    return documents;
What's actually happening
The value of finalQuery is:
{+(+(postcode:test~0.5 buildingname:test~0.5)) +(+(postcode:te55~0.5 buildingname:te55~0.5))}
I'm getting back addresses that have a postcode which contains "te55" but whose buildingname is empty. I need a document to have both a postcode that contains "te55" and a building name that contains the word "test".
Sidenote
If I only provide one search term, I get:
System.IndexOutOfRangeException: 'Index was outside the bounds of the array.'
which is also stumping me.
c# lucene
asked Jan 18 at 12:35, edited Jan 18 at 14:13
JsonStatham
1 Answer
I would recommend creating queries programmatically (not through parsing). Also, from the string version of your query I can see that the clauses inside each group are both SHOULD clauses (there are no signs in front of them).
As a reminder, the Lucene boolean syntax is as follows:
+ must clause
<empty> should clause
- not clause
In your case you have
postcode:te55~0.5 buildingname:te55~0.5
which requests a match on at least one of the clauses but does not force both.
You need a query like this:
+postcode:te55~0.5 +buildingname:te55~0.5
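A query of that shape can also be built in code rather than parsed. The following is only a sketch, assuming Lucene.Net 3.0.3 and an index-time analyzer that lower-cases tokens; `RequiredFuzzyQueryBuilder` and its `Build` method are hypothetical names introduced for illustration, not part of Lucene.Net. It adds one fuzzy clause per (field, term) pair and marks each as MUST, so every pair has to match.

```csharp
using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper, not part of Lucene.Net.
public static class RequiredFuzzyQueryBuilder
{
    // Builds a BooleanQuery with one MUST fuzzy clause per (field, term) pair.
    public static BooleanQuery Build(
        IEnumerable<KeyValuePair<string, string>> fieldTerms,
        float minimumSimilarity = 0.5f)
    {
        var query = new BooleanQuery();
        foreach (var pair in fieldTerms)
        {
            // Terms created in code are not run through the analyzer, so they are
            // lower-cased here (assumption: the indexed tokens are lower-case).
            var term = new Term(pair.Key, pair.Value.ToLowerInvariant());
            query.Add(new FuzzyQuery(term, minimumSimilarity), Occur.MUST);
        }
        return query;
    }
}
```

For the example in the question you would pass the pairs ("postcode", "TE55") and ("buildingname", "Test"). Calling ToString() on the result is a quick way to check that each clause carries a leading + sign.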
The underlying problem is that MultiFieldQueryParser creates SHOULD clauses by default. You need to call setDefaultOperator(AND_OPERATOR) on the parser beforehand to get the desired behaviour.
Related documentation for Lucene.Net 3.0.3: https://lucenenet.apache.org/docs/3.0.3/d6/d0b/class_lucene_1_1_net_1_1_query_parsers_1_1_multi_field_query_parser.html
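As a rough sketch of changing the default operator, assuming Lucene.Net 3.0.3, where (as far as I know) the setter is exposed as the `DefaultOperator` property; the 2.9.x port used a `SetDefaultOperator(...)` method instead. `AndParserExample` and `ParseAllTermsRequired` are hypothetical names used only for illustration.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Version = Lucene.Net.Util.Version;

// Hypothetical wrapper, not part of Lucene.Net.
public static class AndParserExample
{
    public static Query ParseAllTermsRequired(string searchTerm, Analyzer analyzer)
    {
        var fields = new[] { "postcode", "buildingname" };
        var parser = new MultiFieldQueryParser(Version.LUCENE_29, fields, analyzer);

        // Assumption: property form as in Lucene.Net 3.0.3.
        // The default is OR, which produces SHOULD clauses.
        parser.DefaultOperator = QueryParser.Operator.AND;

        // The whole search string is parsed once instead of term by term.
        return parser.Parse(searchTerm);
    }
}
```

It is worth printing the returned query with ToString() to confirm how the per-field clauses inside each term group are combined, and comparing that against the query shape you actually need.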
You can see I've tried to do that with "var formattedTermWildcard = $"+{formattedTerm}~";" but it's not working. What is the correct syntax if what I have tried isn't working? – JsonStatham Jan 18 at 16:59
@JsonStatham I've updated the answer – Mysterion yesterday
answered Jan 18 at 15:12, edited yesterday
Mysterion