Lucene.Net search for multiple terms across multiple fields that exist in document



























I'm using the Royal Mail's sample PAF file. This data has been imported into a database, and the following fields are indexed by my own Lucene indexer console application:



...

var doc = new Document();

doc.Add(new Field("id", item.Id.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.Add(new Field("postcode", item.Postcode, Field.Store.YES, Field.Index.ANALYZED));
doc.Add(new Field("buildingname", item.BuildingName, Field.Store.YES, Field.Index.ANALYZED));

...


What I want to do now is provide either a partial or full postcode or building name and get matches back, as long as either search term exists loosely in a document's postcode or buildingname field. So if the postcode/buildingname was:




TE55 5TT Test Building




If I provided "TE55 Test" I'd expect that to come back.



My search code



var fieldsToAnalyse = new[] { "postcode", "buildingname" };

var finalQuery = new BooleanQuery();
var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29, fieldsToAnalyse, _analyzer);

string[] terms = searchTerm.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);

foreach (string term in terms)
{
    // Strip any user-supplied "~", then mark the term as required ("+") and fuzzy ("~")
    var formattedTerm = term.Replace("~", "");
    var formattedTermWildcard = $"+{formattedTerm}~";

    finalQuery.Add(parser.Parse(formattedTermWildcard), Occur.MUST);
}

var searcher = new IndexSearcher(_indexDirectory, true);

var hits = searcher.Search(finalQuery, 10);

foreach (var hit in hits.ScoreDocs)
{
    documents.Add(searcher.Doc(hit.Doc));
}

_analyzer.Close();
searcher.Dispose();
return documents;


What's actually happening

The value of finalQuery is:




{+(+(postcode:test~0.5 buildingname:test~0.5)) +(+(postcode:te55~0.5 buildingname:te55~0.5))}




I'm getting back addresses whose postcode contains "te55" but whose buildingname is empty. I need results to have both a postcode that contains "te55" and a building name that contains the word "test".



Sidenote



If I provide only one search term, I get:




System.IndexOutOfRangeException: 'Index was outside the bounds of the array.'




This is also stumping me.











      c# lucene














      asked Jan 18 at 12:35 by JsonStatham (edited Jan 18 at 14:13)
























          1 Answer














          I would recommend building queries programmatically rather than through parsing. Also, from the string version of your query I can see that the clauses inside each group are both "should" clauses (they have no sign in front of them).



          As a reminder, the Lucene boolean syntax is as follows:



          + must clause
          <empty> should clause
          - not clause
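The same three markers correspond to the Occur values used when a BooleanQuery is built in code. A minimal sketch, assuming Lucene.Net 3.0.3; the class name, field values, and the "demo" term are made up for illustration:

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class OccurExample
{
    // Builds the equivalent of: +postcode:te55 buildingname:test -buildingname:demo
    public static BooleanQuery Build()
    {
        var query = new BooleanQuery();

        // "+" prefix: the clause must match
        query.Add(new TermQuery(new Term("postcode", "te55")), Occur.MUST);

        // no prefix: the clause is optional; next to a MUST clause it only affects scoring
        query.Add(new TermQuery(new Term("buildingname", "test")), Occur.SHOULD);

        // "-" prefix: the clause must not match
        query.Add(new TermQuery(new Term("buildingname", "demo")), Occur.MUST_NOT);

        return query;
    }
}
```

Calling query.ToString() on the result is a convenient way to compare it with the parsed form shown in the question.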


          In your case you have



          postcode:te55~0.5 buildingname:te55~0.5


          which asks for at least one of the clauses to match but does not force both.



          You need a query like this:



          +postcode:te55~0.5 +buildingname:te55~0.5
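Built in code rather than parsed, a query of that shape could look roughly like the sketch below. It assumes Lucene.Net 3.0.3 and an index built with a lowercasing analyzer such as StandardAnalyzer, because terms handed to FuzzyQuery are not analyzed. The class and method names are hypothetical.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class RequiredFieldQuery
{
    // Builds e.g. +postcode:te55~0.5 +buildingname:te55~0.5,
    // i.e. the term must fuzzily match in every listed field.
    public static BooleanQuery Build(string term, float minSimilarity, params string[] fields)
    {
        var query = new BooleanQuery();
        foreach (var field in fields)
        {
            // FuzzyQuery bypasses the analyzer, so lowercase the term by hand
            // (assumption: the indexed tokens were lowercased at index time).
            var fuzzy = new FuzzyQuery(new Term(field, term.ToLowerInvariant()), minSimilarity);
            query.Add(fuzzy, Occur.MUST);
        }
        return query;
    }
}
```

A call such as RequiredFieldQuery.Build("TE55", 0.5f, "postcode", "buildingname") can then be passed straight to searcher.Search(query, 10). To require a different term in each field (for example te55 in postcode and test in buildingname), add one MUST clause per field/term pair instead of looping over a single term.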


          The underlying problem is that MultiFieldQueryParser creates "should" clauses by default. You need to call setDefaultOperator(AND_OPERATOR) on the parser before parsing to get the desired behaviour.
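In Lucene.Net 3.0.3 the Java-style setter is, as far as I can tell, exposed as the DefaultOperator property. A minimal sketch, assuming StandardAnalyzer and the field names from the question; the class name is hypothetical:

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Version = Lucene.Net.Util.Version;

public static class AndParserExample
{
    public static Query Parse(string userInput)
    {
        var analyzer = new StandardAnalyzer(Version.LUCENE_29);
        var parser = new MultiFieldQueryParser(
            Version.LUCENE_29,
            new[] { "postcode", "buildingname" },
            analyzer);

        // Terms without an explicit operator are now combined with AND instead of OR.
        parser.DefaultOperator = QueryParser.Operator.AND;

        return parser.Parse(userInput);
    }
}
```

According to the MultiFieldQueryParser class documentation, with the AND operator an input of two terms becomes +(field1:term1 field2:term1) +(field1:term2 field2:term2). Each term is then required in at least one of the fields, not in one specific field, so pinning a term to a particular field still needs an explicit field prefix or a programmatically built query.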



          Related documentation for Lucene.Net 3.0.3: https://lucenenet.apache.org/docs/3.0.3/d6/d0b/class_lucene_1_1_net_1_1_query_parsers_1_1_multi_field_query_parser.html
































          • You can see I've tried to do that with "var formattedTermWildcard = $"+{formattedTerm}~";" but it's not working. What is the correct syntax if what I have tried is not working?

            – JsonStatham
            Jan 18 at 16:59











          • @JsonStatham I've updated the answer.

            – Mysterion
            yesterday











          answered Jan 18 at 15:12 by Mysterion (edited yesterday)












