Indexing Data Using a Filter

This topic demonstrates how to create a filter based on our template class, IFilterImpl. Included is example code for two new filters. The first, MyextFilter, handles the fictitious file type .MYEXT, an XML file. The second example, ItemFilter, is completely data driven by the Windows Search property system APIs. Both examples depend on IFilterImpl.

  • How to Create a Filter Using IFilterImpl Template
  • Example: MyextFilter
  • Example: ItemFilter
  • Related Topics

How to Create a Filter Using IFilterImpl Template

The IFilterImpl template is an ATL-style helper class that takes care of the details of implementing a filter, so you can concentrate on the filter's task: retrieving properties from the contents of a file. When you derive from the IFilterImpl template, you need to implement only two methods: OnInit() and GetNextChunkValue().

  1. Create an ATL COM object (using the code wizard).

  2. Add #include <filterimpl.h>

  3. Derive it from IFilterImpl<yourclassname>.

  4. IFilterImpl implements the IFilter, IPersistFile, and IPersistStream interfaces, so your QueryInterface must expose them. With ATL, you do this by adding the interfaces to the COM map:

    BEGIN_COM_MAP(CMyextFilter)
        COM_INTERFACE_ENTRY (IPersistFile)
        COM_INTERFACE_ENTRY (IPersistStream)
        COM_INTERFACE_ENTRY (IFilter)
    END_COM_MAP()
    
  5. Implement the OnInit() and GetNextChunkValue() methods.

  6. Test your filter COM object.

  7. When your filter COM object is ready, register it as the persistent handler for your file extension.
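
The division of labor described above, in which a base template owns the plumbing and the derived class supplies only OnInit() and GetNextChunkValue(), can be sketched in portable C++ without ATL or COM. This is only an illustration of the pattern and not the contents of filterimpl.h: FilterBase, Chunk, Status, and TwoChunkFilter are hypothetical stand-ins, and the real template works with HRESULT values, an IStream, and COM interfaces.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a chunk: one property name and one text value.
struct Chunk {
    std::string property;
    std::string value;
    void Clear() { property.clear(); value.clear(); }
};

// Hypothetical status codes; the real template returns HRESULT values.
enum Status { kOk, kEndOfChunks, kError };

// Base template in the ATL style: it owns the plumbing and calls the two
// methods that the derived class supplies.
template <typename Derived>
class FilterBase {
public:
    // Stores the content, then lets the derived class initialize itself.
    Status Load(std::string contents) {
        m_contents = std::move(contents);
        const Status status = static_cast<Derived*>(this)->OnInit();
        m_loaded = (status == kOk);
        return status;
    }

    // Drains the derived filter until it stops returning kOk.
    std::vector<Chunk> ReadAll() {
        assert(m_loaded);  // Load() must have succeeded first
        std::vector<Chunk> chunks;
        Chunk chunk;
        while (static_cast<Derived*>(this)->GetNextChunkValue(chunk) == kOk) {
            chunks.push_back(chunk);
        }
        return chunks;
    }

protected:
    std::string m_contents;

private:
    bool m_loaded = false;
};

// Derived filter: emits the whole content as one chunk, then its length.
class TwoChunkFilter : public FilterBase<TwoChunkFilter> {
public:
    Status OnInit() {
        m_state = 0;
        return m_contents.empty() ? kError : kOk;
    }

    Status GetNextChunkValue(Chunk& chunk) {
        chunk.Clear();
        switch (m_state++) {
        case 0:
            chunk.property = "body";
            chunk.value = m_contents;
            return kOk;
        case 1:
            chunk.property = "length";
            chunk.value = std::to_string(m_contents.size());
            return kOk;
        default:
            return kEndOfChunks;
        }
    }

private:
    int m_state = 0;
};
```

Calling Load() followed by ReadAll() on a TwoChunkFilter yields two chunks. The derived class never touches the loop that drives it, which is the same separation that IFilterImpl is described as providing.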

Example: MyextFilter

This sample uses the IFilterImpl template class. This filter handles a fictitious file type, .MYEXT, which is an XML file.

// THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF
// ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO
// THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
// PARTICULAR PURPOSE.
//
// Copyright (c) Microsoft Corporation. All rights reserved

// Implementation of CMyextFilter

#include "stdafx.h"
#include "MyextFilter.h"

//
// Called after the IStream stored in m_spStream is valid
// We will validate we can load it into the xml reader
//
HRESULT  CMyextFilter::OnInit()
{
    HRESULT hr = CreateXmlReader(IID_PPV_ARGS(&m_spReader), NULL);
    if (SUCCEEDED(hr))
    {
        hr = m_spReader->SetInput(m_spStream);
    }
    return hr;
}

//
// When GetNextChunkValue() is called, we fill in the ChunkValue by calling 
// SetXXXValue() with the property and value (and any other parameters that you want).
// Example: chunkValue.SetTextValue(PKEY_ItemName,L"text value here");
// Return FILTER_E_END_OF_CHUNKS when there are no more chunks.
//
HRESULT CMyextFilter::GetNextChunkValue(CChunkValue &chunkValue)
{
    HRESULT hr = S_OK;
    XmlNodeType nodeType;
    
    // clear out the chunkvalue
    chunkValue.Clear();

    // read through the stream
    while (S_OK == (hr = m_spReader->Read(&nodeType))) 
    {
        LPCWSTR pszName = NULL;
        LPCWSTR pszValue = NULL;

        switch (nodeType)
        {
            // if we have an element
        case XmlNodeType_Element:
            hr = m_spReader->GetLocalName(&pszName, NULL);
            if (FAILED(hr))
            {
                return hr;
            }

            // if it is the record
            if (wcscmp(pszName, L"myrecord") == 0)
            {
                // continue
                continue;
            }
            // if it is the title
            else if (wcscmp(pszName, L"mytitle") == 0)
            {
                // get the element text
                hr = GetElementText(&pszValue, NULL);
                if (SUCCEEDED(hr))
                {
                    // return this value chunk
                    chunkValue.SetTextValue(PKEY_Title, pszValue);
                    return S_OK;
                }
            }
            // if it is the keywords
            else if (wcscmp(pszName, L"mykeywords") == 0)
            {
                // get the element text
                hr = GetElementText(&pszValue, NULL);
                if (SUCCEEDED(hr))
                {
                    CString strValue = pszValue;
                    // indexer wants semicolons as separator between multi-valued strings
                    strValue.Replace(L",", L";"); 
                    
                    // return this value chunk
                    chunkValue.SetTextValue(PKEY_Keywords, strValue.GetString());
                    return S_OK;
                }
            }
            // if it is the author
            else if (wcscmp(pszName, L"Author") == 0)
            {
                // get the element text
                hr = GetElementText(&pszValue, NULL);
                if (SUCCEEDED(hr))
                {
                    // return this value chunk
                    chunkValue.SetTextValue(PKEY_ItemAuthors, pszValue);
                    return S_OK;
                }
            }
            // if it is the last-modified date
            else if (wcscmp(pszName, L"lastmodified") == 0)
            {
                // get the element text
                hr = GetElementText(&pszValue, NULL);
                if (SUCCEEDED(hr))
                {
                    // this arrives as a string such as 12/1/09, so parse it into a SYSTEMTIME
                    CString strValue = pszValue;
                    int pos = 0;
                    CString strMonth = strValue.Tokenize(L"/", pos);
                    CString strDay = strValue.Tokenize(L"/", pos);
                    CString strYear = strValue.Tokenize(L"/", pos);

                    SYSTEMTIME systime;
                    ZeroMemory(&systime, sizeof(systime));
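                    systime.wYear = _wtoi(strYear.GetString());
                    systime.wMonth = _wtoi(strMonth.GetString());
                    systime.wDay = _wtoi(strDay.GetString());
                    // Assumption: a two-digit year such as 09 means 2009;
                    // SYSTEMTIME expects the full four-digit year
                    if (systime.wYear < 100)
                    {
                        systime.wYear += 2000;
                    }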

                    // set the date modified
                    chunkValue.SetSystemTimeValue(PKEY_DateModified, systime);
                    return S_OK;
                }
            }
            // if it is the body
            else if (wcscmp(pszName, L"body") == 0)
            {
                LPCWSTR pszValue = NULL;
                hr = GetElementText(&pszValue, NULL);
                if (SUCCEEDED(hr))
                {
                    // This is the indexable body (it is not stored or retrieved 
                    // but just indexed over) we pass CHUNK_TEXT so that it is 
                    // treated as a stream of text, not a flat property string.
                    chunkValue.SetTextValue(PKEY_Search_Contents, pszValue, CHUNK_TEXT);
                    return S_OK;
                }
            }
            break;
        }
    }
    // Some properties may not be from the document, so we use the m_iEmitState 
    // to iterate through them. Each call will go to the next one.
    switch (m_iEmitState)
    {
    case EMITSTATE_FLAGSTATUS: 
        // we are using this to illustrate a numeric property
        chunkValue.SetIntValue(PKEY_FlagStatus, 1); 
        m_iEmitState++;
        return S_OK;

    case EMITSTATE_ISREAD:
        // we are using this to illustrate a bool property
        chunkValue.SetIntValue(PKEY_IsRead, true); 
        m_iEmitState++;
        return S_OK;
    }

    // if we get to here we are done with this document
    return FILTER_E_END_OF_CHUNKS;
}

// utility method for getting the contents of an element 
// example: <elementtag>test</elementtag> will return [test]
HRESULT CMyextFilter::GetElementText(LPCWSTR *ppszText, UINT *pcwchValue)
{
    XmlNodeType nodeType;
    HRESULT hr = S_OK;
    while (S_OK == (hr = m_spReader->Read(&nodeType))) 
    {
        if (nodeType == XmlNodeType_Text)
        {
            return m_spReader->GetValue(ppszText, pcwchValue);
        }
    }
    return hr;
}   
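
The two value conversions in the listing above (commas to semicolons for keywords, and an M/D/Y string to date fields) can be tried in isolation with a portable sketch. It uses std::string and sscanf in place of CString and _wtoi, and SimpleDate, ToIndexerKeywords, and ParseSlashDate are hypothetical names. Treating a two-digit year as 20xx is an assumption; the sample itself does not say how such years should be expanded.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical portable stand-in for the SYSTEMTIME fields the sample fills in.
struct SimpleDate {
    int year;
    int month;
    int day;
};

// Replaces commas with semicolons, the separator the sample's comment says
// the indexer wants between multi-valued strings.
std::string ToIndexerKeywords(std::string keywords) {
    for (char& ch : keywords) {
        if (ch == ',') {
            ch = ';';
        }
    }
    // Postcondition: no comma separators remain.
    assert(keywords.find(',') == std::string::npos);
    return keywords;
}

// Parses "M/D/Y" text such as "12/1/09".
// Returns false if the text does not match or a field is out of range.
bool ParseSlashDate(const std::string& text, SimpleDate& out) {
    int month = 0;
    int day = 0;
    int year = 0;
    if (std::sscanf(text.c_str(), "%d/%d/%d", &month, &day, &year) != 3) {
        return false;
    }
    // ASSUMPTION: two-digit years belong to the 2000s.
    if (year >= 0 && year < 100) {
        year += 2000;
    }
    // 1601 is the earliest year that SYSTEMTIME documents as valid.
    if (year < 1601 || month < 1 || month > 12 || day < 1 || day > 31) {
        return false;
    }
    out.year = year;
    out.month = month;
    out.day = day;
    return true;
}
```

Validating the fields before building the date lets a filter skip a malformed element instead of handing the indexer a date it cannot convert.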

Example: ItemFilter

This sample uses the IFilterImpl template class. This filter is driven entirely by the Windows Search 3.x property system APIs. To see how it works, create an XML file with the .ITEM extension in which the XML namespace and the name of each element together refer to an existing property in the Windows Vista property schema.
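
The name lookup that drives this filter can be sketched in portable C++. The listing below joins each element's namespace URI and local name and passes the result to PSGetPropertyDescriptionByName; here a small table stands in for the property system. ValueKind, KnownProperties, MakePropertyName, and LookUpProperty are hypothetical, and the namespace URI "System." is an assumption about how a .ITEM file might be written so that the joined string is a canonical property name such as System.Title.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical value kinds, standing in for the VARTYPE that the property
// system reports for a property.
enum class ValueKind { Text, DateTime, Integer };

// Hypothetical lookup table standing in for the property system.
// Only three illustrative entries are listed.
const std::map<std::string, ValueKind>& KnownProperties() {
    static const std::map<std::string, ValueKind> table = {
        {"System.Title", ValueKind::Text},
        {"System.DateModified", ValueKind::DateTime},
        {"System.Size", ValueKind::Integer},
    };
    return table;
}

// Joins the namespace URI and the element name with nothing in between,
// mirroring the "%s%s" format string used in the listing.
std::string MakePropertyName(const std::string& namespaceUri,
                             const std::string& elementName) {
    assert(!elementName.empty());  // an element always has a local name
    return namespaceUri + elementName;
}

// Returns true and sets kind when the name is known. The filter skips
// elements whose names are not known.
bool LookUpProperty(const std::string& name, ValueKind& kind) {
    const auto it = KnownProperties().find(name);
    if (it == KnownProperties().end()) {
        return false;
    }
    kind = it->second;
    return true;
}
```

Because the value type comes from the lookup and not from the file, the filter can choose the right conversion for each element without any per-property code.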

// THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF
// ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO
// THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
// PARTICULAR PURPOSE.
//
// Copyright (c) Microsoft Corporation. All rights reserved

// Implementation of CItemFilter

#include "stdafx.h"
#include "ItemFilter.h"
#include <propsys.h>

//
// Called after the IStream is valid
// We will validate we can load it into the xml reader
//
HRESULT  CItemFilter::OnInit()
{
    // create our XmlLite reader (requires Internet Explorer 7 or Windows Vista)
    HRESULT hr = CreateXmlReader(IID_PPV_ARGS(&m_spReader), NULL);
    if (SUCCEEDED(hr))
    {
        hr = m_spReader->SetInput(m_spStream);
    }
    return hr;
}

//
// When GetNextChunkValue() is called, we fill in the ChunkValue by calling 
// SetXXXValue() with the property and value (and any other parameters that you want).
// Example:  chunkValue.SetTextValue(PKEY_ItemName,L"text value here");
// Return FILTER_E_END_OF_CHUNKS when there are no more chunks.
//
HRESULT CItemFilter::GetNextChunkValue(CChunkValue &chunkValue)
{
    HRESULT hr = S_OK;
    XmlNodeType nodeType;
    
    // clear out the chunkvalue
    chunkValue.Clear();

    // read through the stream
    while (S_OK == (hr = m_spReader->Read(&nodeType))) 
    {

        switch (nodeType)
        {
            // if we have an element
        case XmlNodeType_Element:
            {
                LPCWSTR pszElementName = NULL;
                LPCWSTR pszNamespaceName = NULL;

                // Get the URI for the namespace
                hr = m_spReader->GetNamespaceUri(&pszNamespaceName, NULL);
                if (FAILED(hr))
                {
                    return hr;
                }

                // get the element name
                hr = m_spReader->GetLocalName(&pszElementName, NULL);
                if (FAILED(hr))
                {
                    return hr;
                }

                if ((pszNamespaceName == NULL) || (pszElementName == NULL) || 
                    (wcscmp(L"item", pszElementName) == 0))
                {
                    // skip this element
                    continue;
                }
                
                CString strNamespace = pszNamespaceName;
                CString strPropertyName;
                strPropertyName.Format(L"%s%s", strNamespace.GetString(), pszElementName);

                CComPtr<IPropertyDescription> spProperty;
                hr = PSGetPropertyDescriptionByName(strPropertyName.GetString(), IID_PPV_ARGS(&spProperty));
                if (FAILED(hr))
                {
                    // must be invalid
                    continue;
                }

                LPCWSTR pszValue = NULL;
                
                // get the element text
                hr = GetElementText(&pszValue, NULL);
                if (FAILED(hr))
                {
                    return hr;
                }

                VARTYPE vt = VT_NULL;
                if (FAILED(spProperty->GetPropertyType(&vt)))
                {
                    // skip this element
                    continue; 
                }

                PROPERTYKEY propkey;
                if (FAILED(spProperty->GetPropertyKey(&propkey)))
                {
                    // skip this element
                    continue;
                }

                switch (vt)
                {
                case VT_LPWSTR:
                    // return this value chunk
                    chunkValue.SetTextValue(propkey, pszValue);
                    return S_OK;
                
                case VT_FILETIME:
                    {
                        // convert from xs:dateTime --> VT_FILETIME
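                        // zero the structure so that fields the parse does not set
                        // (day of week, milliseconds) are not left uninitialized
                        SYSTEMTIME systime;
                        ZeroMemory(&systime, sizeof(systime));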
                        int wYear = 0;
                        int wMonth = 0;
                        int wDay = 0;
                        int wHour = 0;
                        int wMinute = 0;
                        int wSecond = 0;
                        swscanf_s(pszValue, L"%d-%d-%dT%d:%d:%d", &wYear, &wMonth, &wDay, &wHour, &wMinute, &wSecond);
                        systime.wYear = wYear;
                        systime.wMonth = wMonth;
                        systime.wDay = wDay;
                        systime.wHour = wHour;
                        systime.wMinute = wMinute;
                        systime.wSecond = wSecond;

                        // TODO: parse the -08:00 timezone information!

                        // set the time property
                        chunkValue.SetSystemTimeValue(propkey, systime);
                    }
                    return S_OK;

                case VT_I2:
                case VT_I4:
                    {
                        int iVal = (int)_wtoi64(pszValue);
                        chunkValue.SetIntValue(propkey, iVal);
                    }
                    return S_OK;

                case VT_UI2:
                case VT_UI4:
                    {
                        unsigned int iVal= (unsigned int)_wtoi64(pszValue);
                        chunkValue.SetIntValue(propkey, iVal);
                    }
                    return S_OK;

                case VT_UI8:
                case VT_I8:
                    {
                        __int64 iVal= _wtoi64(pszValue);
                        chunkValue.SetInt64Value(propkey, iVal);
                    }
                    return S_OK;
                // TODO: VT_VECTOR and INT arrays need to be handled as well
                // but are out of the scope of this sample

                }
            }
        }
    }

    // If we get to here, we are done with this document
    return FILTER_E_END_OF_CHUNKS;
}

// Utility method for getting the contents of an element 
// Example: <elementtag>test</elementtag> will return [test]
HRESULT CItemFilter::GetElementText(LPCWSTR *ppszText, UINT *pcwchValue)
{
    XmlNodeType nodeType;
    HRESULT hr = S_OK;
    while (S_OK == (hr = m_spReader->Read(&nodeType))) 
    {
        if (nodeType == XmlNodeType_Text)
        {
            return m_spReader->GetValue(ppszText, pcwchValue);
        }
    }
    return hr;
} 
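
The listing above leaves the time zone part of an xs:dateTime value as a TODO. One way to handle it is sketched below in portable C++. It is an illustration, not part of the sample: ParseXsDateTimeUtc and DaysFromCivil are hypothetical names, the result is expressed as seconds since 1970-01-01 UTC instead of a SYSTEMTIME, and treating a value with no offset as UTC is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

// Days between 1970-01-01 and the given Gregorian date (a well-known
// civil-date calculation; negative for earlier dates).
long long DaysFromCivil(long long year, int month, int day) {
    assert(month >= 1 && month <= 12);
    year -= (month <= 2) ? 1 : 0;
    const long long era = (year >= 0 ? year : year - 399) / 400;
    const long long yearOfEra = year - era * 400;
    const long long dayOfYear = (153 * (month + (month > 2 ? -3 : 9)) + 2) / 5 + day - 1;
    const long long dayOfEra = yearOfEra * 365 + yearOfEra / 4 - yearOfEra / 100 + dayOfYear;
    return era * 146097 + dayOfEra - 719468;
}

// Parses text such as "2009-12-01T10:30:00-08:00" and returns the moment
// in UTC as seconds since 1970-01-01. Returns false on malformed input.
bool ParseXsDateTimeUtc(const std::string& text, long long& secondsSinceEpoch) {
    int year = 0, month = 0, day = 0, hour = 0, minute = 0, second = 0;
    int consumed = 0;
    if (std::sscanf(text.c_str(), "%d-%d-%dT%d:%d:%d%n", &year, &month, &day,
                    &hour, &minute, &second, &consumed) != 6) {
        return false;
    }
    if (month < 1 || month > 12 || day < 1 || day > 31 ||
        hour < 0 || hour > 23 || minute < 0 || minute > 59 ||
        second < 0 || second > 59) {
        return false;
    }

    // Skip optional fractional seconds; this sketch does not keep them.
    std::size_t pos = static_cast<std::size_t>(consumed);
    if (pos < text.size() && text[pos] == '.') {
        ++pos;
        while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos]))) {
            ++pos;
        }
    }

    // ASSUMPTION: a value with no offset, or with "Z", is already in UTC.
    int offsetMinutes = 0;
    if (pos < text.size() && text[pos] != 'Z') {
        char sign = 0;
        int offsetHours = 0;
        int offsetMins = 0;
        if (std::sscanf(text.c_str() + pos, "%c%d:%d", &sign, &offsetHours, &offsetMins) != 3 ||
            (sign != '+' && sign != '-')) {
            return false;
        }
        offsetMinutes = (sign == '-' ? -1 : 1) * (offsetHours * 60 + offsetMins);
    }

    const long long localSeconds = DaysFromCivil(year, month, day) * 86400LL +
                                   hour * 3600LL + minute * 60LL + second;
    // Local time is UTC plus the offset, so subtract the offset to get UTC.
    secondsSinceEpoch = localSeconds - offsetMinutes * 60LL;
    return true;
}
```

With this approach, 10:30 at -08:00 and 18:30 at Z produce the same result, so items written in different time zones sort consistently once indexed.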

Related Topics

  • Developing Filters for Windows Search