IFILTER_INIT enumeration

[Indexing Service is no longer supported as of Windows XP and is unavailable for use as of Windows 8. Instead, use Windows Search for client-side search and Microsoft Search Server Express for server-side search.]

Flags that control the filtering process.

Syntax


typedef enum tagIFILTER_INIT { 
  IFILTER_INIT_CANON_PARAGRAPHS         = 1,
  IFILTER_INIT_HARD_LINE_BREAKS         = 2,
  IFILTER_INIT_CANON_HYPHENS            = 4,
  IFILTER_INIT_CANON_SPACES             = 8,
  IFILTER_INIT_APPLY_INDEX_ATTRIBUTES   = 16,
  IFILTER_INIT_APPLY_OTHER_ATTRIBUTES   = 32,
  IFILTER_INIT_APPLY_CRAWL_ATTRIBUTES   = 256,
  IFILTER_INIT_INDEXING_ONLY            = 64,
  IFILTER_INIT_SEARCH_LINKS             = 128,
  IFILTER_INIT_FILTER_OWNED_VALUE_OK    = 512,
  IFILTER_INIT_FILTER_AGGRESSIVE_BREAK  = 1024,
  IFILTER_INIT_DISABLE_EMBEDDED         = 2048,
  IFILTER_INIT_EMIT_FORMATTING          = 4096
} IFILTER_INIT;

Constants

IFILTER_INIT_CANON_PARAGRAPHS

Paragraph breaks should be marked with the Unicode PARAGRAPH SEPARATOR (0x2029).

IFILTER_INIT_HARD_LINE_BREAKS

Soft returns, such as the newline character in Word, should be replaced by hard returns, that is, LINE SEPARATOR (0x2028). Existing hard returns can be doubled. A carriage return (0x000D), a line feed (0x000A), or a carriage return and line feed in combination should be considered a hard return. The intent is to let pattern-expression matching work against the line breaks as they appear in the document.

IFILTER_INIT_CANON_HYPHENS

Various word-processing programs have forms of hyphens that are not represented in the host character set, such as optional hyphens (appearing only at the end of a line) and nonbreaking hyphens. This flag indicates that optional hyphens are to be converted to nulls, and nonbreaking hyphens are to be converted to normal hyphens (0x2010) or hyphen-minuses (0x002D).

IFILTER_INIT_CANON_SPACES

Just as the IFILTER_INIT_CANON_HYPHENS flag standardizes hyphens, this one standardizes spaces. All special space characters, such as nonbreaking spaces, are converted to the standard space character (0x0020).

IFILTER_INIT_APPLY_INDEX_ATTRIBUTES

Indicates that the client wants text split into chunks representing internal value-type properties.

IFILTER_INIT_APPLY_OTHER_ATTRIBUTES

Any properties not covered by the IFILTER_INIT_APPLY_INDEX_ATTRIBUTES and IFILTER_INIT_APPLY_CRAWL_ATTRIBUTES flags should be emitted.

IFILTER_INIT_APPLY_CRAWL_ATTRIBUTES

Indicates that the client wants text split into chunks representing properties determined during the indexing process.

IFILTER_INIT_INDEXING_ONLY

Optimizes IFilter for indexing because the client calls the IFilter::Init method only once and does not call IFilter::BindRegion. This eliminates the possibility of accessing a chunk both before and after accessing another chunk.

IFILTER_INIT_SEARCH_LINKS

The text extraction process must recursively search all linked objects within the document. If a link is unavailable, the IFilter::GetChunk call that would have obtained the first chunk of the link should return FILTER_E_LINK_UNAVAILABLE.

IFILTER_INIT_FILTER_OWNED_VALUE_OK

The content indexing process can return property values set by the filter.

IFILTER_INIT_FILTER_AGGRESSIVE_BREAK

TBD

IFILTER_INIT_DISABLE_EMBEDDED

TBD

IFILTER_INIT_EMIT_FORMATTING

TBD

Remarks

Generally, text output by the IFilter::GetText method should match the actual text of the document exactly. However, to achieve maximum interoperability, some standardization of common features is desirable. These features include paragraph breaks, line breaks, hyphens, and spaces. IFilter interface servers can also embed null characters in text, which clients largely ignore: Unicode character 0x0000 is ignored completely, and 0x0001 is treated as a word break.
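The handling of embedded 0x0000 and 0x0001 characters described above can be illustrated from the client side. The following is a minimal sketch, not the behavior of any particular word breaker: the namespace null_sketch and the function SplitWords are hypothetical names, and treating the space character (0x0020) as an additional word break is an assumption made only so the example has something to split on.

```cpp
#include <string>
#include <vector>

namespace null_sketch {

// Hypothetical client-side helper (not part of Filter.h).
// Splits UTF-16 text returned by a filter into words:
//   0x0000 is dropped entirely,
//   0x0001 ends the current word,
//   0x0020 also ends the current word (assumption for this sketch; real
//   word breakers are language-specific).
inline std::vector<std::u16string> SplitWords(const std::u16string &text) {
    std::vector<std::u16string> words;
    std::u16string current;
    for (char16_t ch : text) {
        if (ch == 0x0000) {
            continue;  // completely ignored
        }
        if (ch == 0x0001 || ch == 0x0020) {
            if (!current.empty()) {
                words.push_back(current);
                current.clear();
            }
            continue;  // word break
        }
        current.push_back(ch);
    }
    if (!current.empty()) {
        words.push_back(current);
    }
    return words;
}

}  // namespace null_sketch
```

With this sketch, a 0x0000 inside a word does not split it, whereas a 0x0001 does, which is why a filter can use 0x0000 as a harmless placeholder for removed characters.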

Four flags control text standardization: IFILTER_INIT_CANON_PARAGRAPHS, IFILTER_INIT_HARD_LINE_BREAKS, IFILTER_INIT_CANON_HYPHENS, and IFILTER_INIT_CANON_SPACES.
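As an illustration of two of these flags, the following sketch shows how a filter might canonicalize hyphens and spaces in UTF-16 text before returning it. It is a sketch under stated assumptions rather than a reference implementation: the namespace canon_sketch and the function Canonicalize are hypothetical, the flag values are copied from the Syntax listing so the code builds without Filter.h, the optional hyphen is assumed to be SOFT HYPHEN (0x00AD), the nonbreaking hyphen is assumed to be 0x2011, and only NO-BREAK SPACE (0x00A0) is handled as a special space. Nonbreaking hyphens are mapped to 0x002D here; 0x2010 would be equally valid according to the flag description.

```cpp
#include <cstdint>
#include <string>

namespace canon_sketch {

// Values copied from the Syntax listing; real code would include Filter.h.
constexpr std::uint32_t IFILTER_INIT_CANON_HYPHENS = 4;
constexpr std::uint32_t IFILTER_INIT_CANON_SPACES  = 8;

// Code points assumed for this sketch.
constexpr char16_t kNull              = 0x0000;
constexpr char16_t kSpace             = 0x0020;
constexpr char16_t kHyphenMinus       = 0x002D;
constexpr char16_t kNoBreakSpace      = 0x00A0;
constexpr char16_t kSoftHyphen        = 0x00AD;  // "optional" hyphen
constexpr char16_t kNonBreakingHyphen = 0x2011;

// Hypothetical helper: returns a copy of text with the requested
// canonicalizations applied. Characters are replaced one for one, so the
// length of the text does not change.
inline std::u16string Canonicalize(std::u16string text, std::uint32_t grfFlags) {
    const bool hyphens = (grfFlags & IFILTER_INIT_CANON_HYPHENS) != 0;
    const bool spaces  = (grfFlags & IFILTER_INIT_CANON_SPACES) != 0;
    for (char16_t &ch : text) {
        if (hyphens && ch == kSoftHyphen) {
            ch = kNull;         // optional hyphen -> null
        } else if (hyphens && ch == kNonBreakingHyphen) {
            ch = kHyphenMinus;  // nonbreaking hyphen -> hyphen-minus
        } else if (spaces && ch == kNoBreakSpace) {
            ch = kSpace;        // special space -> standard space
        }
    }
    return text;
}

}  // namespace canon_sketch
```

When neither flag is set, the text is returned unchanged, which matches the general rule that filter output should mirror the document unless the client asks for standardization.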

Different clients of the IFilter interface want different views of an object. Three flags, IFILTER_INIT_APPLY_INDEX_ATTRIBUTES, IFILTER_INIT_APPLY_CRAWL_ATTRIBUTES, and IFILTER_INIT_APPLY_OTHER_ATTRIBUTES, control the set of properties that should be applied to chunks. In addition, specific properties can be requested in calls to the IFilter::Init method through the aAttributes array, whose size is given by cAttributes.
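Because the flags are distinct bit values, a client combines them with bitwise OR and a filter tests them with bitwise AND. The sketch below shows both sides under stated assumptions: the namespace init_flags_sketch, the PropertyScope structure, and the functions IndexingClientFlags and DecodePropertyScope are hypothetical, and the enumerators are local copies of the values in the Syntax listing so the code builds without Filter.h. The particular combination chosen for an indexing client is only an example.

```cpp
#include <cstdint>

namespace init_flags_sketch {

// Local copies of values from the Syntax listing; real code would include
// Filter.h and use its definitions instead.
enum : std::uint32_t {
    IFILTER_INIT_CANON_PARAGRAPHS       = 1,
    IFILTER_INIT_HARD_LINE_BREAKS       = 2,
    IFILTER_INIT_CANON_HYPHENS          = 4,
    IFILTER_INIT_CANON_SPACES           = 8,
    IFILTER_INIT_APPLY_INDEX_ATTRIBUTES = 16,
    IFILTER_INIT_APPLY_OTHER_ATTRIBUTES = 32,
    IFILTER_INIT_INDEXING_ONLY          = 64,
    IFILTER_INIT_APPLY_CRAWL_ATTRIBUTES = 256
};

// Hypothetical client-side helper: one mask an indexing client might pass as
// the flags argument of IFilter::Init.
constexpr std::uint32_t IndexingClientFlags() {
    return IFILTER_INIT_CANON_PARAGRAPHS |
           IFILTER_INIT_HARD_LINE_BREAKS |
           IFILTER_INIT_CANON_HYPHENS |
           IFILTER_INIT_CANON_SPACES |
           IFILTER_INIT_APPLY_INDEX_ATTRIBUTES |
           IFILTER_INIT_INDEXING_ONLY;
}

// Hypothetical filter-side view of which property groups were requested.
struct PropertyScope {
    bool index;  // IFILTER_INIT_APPLY_INDEX_ATTRIBUTES was set
    bool crawl;  // IFILTER_INIT_APPLY_CRAWL_ATTRIBUTES was set
    bool other;  // IFILTER_INIT_APPLY_OTHER_ATTRIBUTES was set
};

// Hypothetical filter-side helper: decodes the three property flags from the
// mask received in Init.
constexpr PropertyScope DecodePropertyScope(std::uint32_t grfFlags) {
    return PropertyScope{
        (grfFlags & IFILTER_INIT_APPLY_INDEX_ATTRIBUTES) != 0,
        (grfFlags & IFILTER_INIT_APPLY_CRAWL_ATTRIBUTES) != 0,
        (grfFlags & IFILTER_INIT_APPLY_OTHER_ATTRIBUTES) != 0
    };
}

}  // namespace init_flags_sketch
```

Here IndexingClientFlags() evaluates to 95 (0x5F), and decoding that mask reports that only the index properties were requested.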

IFilter interface implementations need to store some chunk information when operations other than content indexing occur. IFILTER_INIT_INDEXING_ONLY tells the filter that the client is only indexing, so the filter can be optimized for that case.

For viewing purposes, it can be desirable to search across links as well as in the document and any objects it embeds. IFILTER_INIT_SEARCH_LINKS specifies recursively searching all links.

Certain IFilter interface implementations might generate property values during the content indexing process, and IFILTER_INIT_FILTER_OWNED_VALUE_OK indicates that it is OK to return these values.

Requirements

Minimum supported client

Windows 2000 Professional (desktop apps only)

Minimum supported server

Windows 2000 Server (desktop apps only)

End of client support

Windows 7

End of server support

Windows Server 2008 R2

Header

Filter.h

See also

IFilter

Build date: 9/10/2012
