Aricie
The DNN Expert for your web project
Aricie Blog

Providers in LuceneSearch: part 1 - standing on the shoulders of giants

Dec 19 2012

Hello all. This blog post has been in limbo for too long; fortunately, a forum question finally forced me to sit down and write it. I want to give our users an overview of LuceneSearch providers, which are the main extension points of our search module. In this first post, we will look at how providers can build upon ISearchable modules to add information to existing search data. In the second post, we will dive into providers that either replace a module's search behavior or implement it from scratch. The third post will cover providers that aren't linked to a specific module but can return search results spanning multiple data sources.

A bit of context

When a module wants its content to be included in the DotNetNuke search index, it has to implement the ISearchable interface. The module must then return a collection of SearchItemInfo objects, which DotNetNuke will index:

    public interface ISearchable
    {
        SearchItemInfoCollection GetSearchItems(ModuleInfo ModInfo);
    }

The SearchItemInfo class contains the following fields; the most important ones are the string-typed fields such as Title, Description and Content. These are enough to search for basic information, but some additional fields may be necessary if you need to fine-tune your search.

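    [Serializable]
    public class SearchItemInfo
    {
        public int SearchItemId { get; set; }
        public string Title { get; set; }
        public string Description { get; set; }
        public int Author { get; set; }
        public DateTime PubDate { get; set; }
        public int ModuleId { get; set; }
        public string SearchKey { get; set; }
        public string Content { get; set; }
        public string GUID { get; set; }
        public int ImageFileId { get; set; }
        public int HitCount { get; set; }
        public int TabId { get; set; }
    }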

Out of the box, the LuceneSearch engine adds a lot of information to the search items that will be indexed: page, module and user permissions, page information, module information, etc. This widens your search options: everything the module sent back to the DotNetNuke engine is kept, and the additional fields come on top of it. And yet, you may need very specific fields added for a given module. Enter the providers.

Using a provider to extend your search information

To use a provider as an extension to a module that is already ISearchable, you need to create a class that implements the Aricie.DNN.Modules.LuceneSearch.Business.ILuceneSearchableUpgrade interface from the LuceneSearch assembly. To implement this interface, the class must define a method with the following signature:

    void UpgradeSearchItem(int portalId, ref LuceneSearchItemInfo searchItem);

This method will be called once for each SearchItemInfo initially returned by your ISearchable module. You can then change values, add fields, and manipulate the searchItem in any way you need. Let's create a basic example of a provider. We will add a field containing the name of the machine on which the code is executed. We will also change the content and description that are indexed by replacing all mentions of "search" with "lucene".

    public class HTMLProvider : ILuceneSearchableUpgrade
    {
        public void UpgradeSearchItem(int portalId, ref Aricie.DNN.Modules.LuceneSearch.Business.LuceneSearchItemInfo searchItem)
        {
            // let's add the machine name as a field of our html modules search information
            searchItem.AdditionalFields.Add(FieldFactory.Instance.CreateField("ServerName", Environment.MachineName));

            // change the indexed content and description by replacing "search" with "lucene"
            // (string.Replace is case-sensitive, so "Search" is left untouched)
            searchItem.Description = searchItem.Description.Replace("search", "lucene");
            searchItem.Content = searchItem.Content.Replace("search", "lucene");
        }
    }
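One caveat about the sample above: string.Replace only matches the exact casing, and calling it on a null Description or Content throws a NullReferenceException. Here is a minimal sketch of a more forgiving helper, assuming you also want "Search" or "SEARCH" to be rewritten. IndexTextHelper and ReplaceIgnoreCase are hypothetical names of my own, not part of the LuceneSearch API.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; it is not part of LuceneSearch.
public static class IndexTextHelper
{
    // Replaces every occurrence of `find` in `text` with `replacement`, ignoring case.
    // Null or empty input is returned unchanged instead of throwing.
    public static string ReplaceIgnoreCase(string text, string find, string replacement)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        // Regex.Escape makes sure `find` is matched literally.
        // Doubling "$" keeps the replacement from being read as a group reference.
        return Regex.Replace(
            text,
            Regex.Escape(find),
            replacement.Replace("$", "$$"),
            RegexOptions.IgnoreCase);
    }
}
```

Inside UpgradeSearchItem, you would then assign IndexTextHelper.ReplaceIgnoreCase(searchItem.Content, "search", "lucene") back to searchItem.Content, and do the same for the description. Like the original sample, this matches substrings, so "searching" is rewritten as well.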

To install the provider, we need to copy its assembly to the bin folder of the website and edit the provider configuration file of the LuceneSearch module: go to the root folder of the Aricie.LuceneSearch module and edit the Aricie.LuceneSearchResults.config file. This file contains the XML configuration of all LuceneSearch providers on the website. To configure your provider, add the following entry inside the node that lists the providers:

    <luceneproviderconfig xsi:type="LuceneModuleProviderConfig">
        DNN HTML Provider
        true
        Upgrade provider for DNN HTML
        HTMLSampleProvider.HTMLProvider, HTMLSampleProvider
        HTML
        false
    </luceneproviderconfig>

Once this is done, we can test our provider: just edit an HTML module to include the word "search", reindex the content of the website, and search for the word "lucene" in the LuceneSearch module. The HTML module should appear in the results.

Using the Luke tool to read our Lucene index, we can also see that every HTML module has the field "ServerName" we defined in our provider, with the correct value.

It's very important to note that the provider does not change the content of the HTML module itself; only the indexed data is affected, so you can massage it as much as you need to get the search behavior you want.

Configuring the new fields

In the example, I used the LuceneSearch FieldFactory to create the field to add, but I didn't give any information about what type of field I wanted. Lucene lets you configure whether a field must be stored, whether it must be analyzed (i.e. cut into tokens), what weight it should have in the result score, etc. To configure all these details, your provider needs to give some information to the LuceneSearch engine through an interface called ILuceneFieldGlossary. By implementing this interface, a provider tells the LuceneSearch engine which fields it intends to use and how they are defined. Let's implement this in our sample provider, by telling LuceneSearch that the ServerName field must not be stored in the Lucene index.

    public class HTMLProvider : ILuceneSearchableUpgrade, ILuceneFieldGlossary
    {
        private static readonly string ServerNameField = "ServerName";

        public void UpgradeSearchItem(int portalId, ref Aricie.DNN.Modules.LuceneSearch.Business.LuceneSearchItemInfo searchItem)
        {
            // let's add the machine name as a field of our html modules search information
            searchItem.AdditionalFields.Add(FieldFactory.Instance.CreateField(ServerNameField, Environment.MachineName));

            // change the indexed content and description by replacing "search" with "lucene"
            // (string.Replace is case-sensitive, so "Search" is left untouched)
            searchItem.Description = searchItem.Description.Replace("search", "lucene");
            searchItem.Content = searchItem.Content.Replace("search", "lucene");
        }

        public IList<FieldDefinition> GetFieldDefinitions()
        {
            var FieldsAdded = new List<FieldDefinition>();
            FieldsAdded.Add(new FieldDefinition(ServerNameField, // the name of the field
                                                Lucene.Net.Documents.Field.Store.NO, // we do not need to store the value
                                                Lucene.Net.Documents.Field.Index.NOT_ANALYZED, // it's a name, it must not be tokenized by Lucene
                                                Lucene.Net.Documents.Field.TermVector.NO, // we don't need termvectors
                                                false)); // the field won't be localized
            return FieldsAdded;
        }
    }

After deploying the provider and re-indexing our content, we can see with Luke that the field is not stored anymore, but that the value can still be searched.
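To see why a server name is a good candidate for NOT_ANALYZED, here is a deliberately simplified model of the difference between the two indexing modes. NaiveIndexModel is a hypothetical stand-in of my own: it splits on non-alphanumeric characters and lowercases the parts, which only approximates what a real Lucene analyzer does and is not how LuceneSearch is implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical, simplified model for illustration; real Lucene analyzers are more subtle.
public static class NaiveIndexModel
{
    // NOT_ANALYZED: the whole value is indexed as one single term.
    public static IList<string> NotAnalyzed(string value)
    {
        return new List<string> { value };
    }

    // ANALYZED (simplified): the value is cut into lowercase tokens.
    public static IList<string> Analyzed(string value)
    {
        var terms = new List<string>();
        foreach (string part in Regex.Split(value, "[^A-Za-z0-9]+"))
        {
            if (part.Length > 0)
            {
                terms.Add(part.ToLowerInvariant());
            }
        }
        return terms;
    }
}
```

In this model, a machine name such as "WEB-SRV-01" stays one term when it is not analyzed, but becomes the three terms "web", "srv" and "01" when it is analyzed. Keeping it as a single term means a query has to match the exact name, which is usually what you want for an identifier.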

Conclusion

This first article on providers in our LuceneSearch module showed how you can enrich the existing search data of the modules you use by writing providers for them. We also saw how to configure the fields that your providers add to the search items they work on.

In the next article, we will study how a provider can be used to add search support to a module that is not searchable, or to replace the search behavior of a module when it does not match what you need. Until then, don't hesitate to ask any questions you may have about this part!
