org.apache.solr.handler.extraction

Interface ExtractingParams

    • Field Detail

      • MAP_PREFIX

        static final String MAP_PREFIX
        The param prefix for mapping Tika metadata to Solr fields.

        To map a field, add a name like:

        fmap.title=solr.title
        In this example, the Tika "title" metadata value will be added to a Solr field named "solr.title".
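
        The naming convention above can be sketched in plain Java. This is only an illustration, not the handler's code: the class FieldMappingSketch and its methods are hypothetical, the literal "fmap." is inferred from the example above, and falling back to the Tika name when no mapping exists is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr: builds and reads "fmap." parameters.
final class FieldMappingSketch {
    // Assumed literal value of MAP_PREFIX, inferred from the fmap.title example.
    static final String MAP_PREFIX = "fmap.";

    /** Turns Tika-name to Solr-field pairs into parameters such as fmap.title=solr.title. */
    static Map<String, String> toParams(Map<String, String> tikaToSolr) {
        Map<String, String> params = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : tikaToSolr.entrySet()) {
            params.put(MAP_PREFIX + e.getKey(), e.getValue());
        }
        return params;
    }

    /** Returns the mapped Solr field, or the Tika name itself when unmapped (assumption). */
    static String resolve(Map<String, String> params, String tikaName) {
        return params.getOrDefault(MAP_PREFIX + tikaName, tikaName);
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("title", "solr.title");
        Map<String, String> params = toParams(mapping);
        System.out.println(params);                    // {fmap.title=solr.title}
        System.out.println(resolve(params, "title"));  // solr.title
        System.out.println(resolve(params, "author")); // author
    }
}
```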
        See Also:
        Constant Field Values
      • BOOST_PREFIX

        static final String BOOST_PREFIX
        The boost value for the name of the field. The boost can be specified by a name mapping.

        For example

         fmap.title=solr.title
         boost.solr.title=2.5
         
        will boost the solr.title field for this document by 2.5.
        See Also:
        Constant Field Values
      • LITERALS_OVERRIDE

        static final String LITERALS_OVERRIDE
        By default, literal field values override other values such as metadata and content. Set this to false to revert to the pre-4.0 behaviour.
        See Also:
        Constant Field Values
      • CAPTURE_ELEMENTS

        static final String CAPTURE_ELEMENTS
        Capture the specified fields (and everything included below them that isn't captured by some other capture field) separately from the default. This is different from the case of passing in an XPath expression.

        The capture field is based on the localName returned to the SolrContentHandler by Tika, not to be confused with the mapped field. The field name can then be mapped into the index schema.

        For instance, a Tika document may look like:

          <html>
            ...
            <body>
              <p>some text here.  <div>more text</div></p>
              Some more text
            </body>
         
        By passing in the p tag, you could capture all P tags separately from the rest of the text. Thus, in the example, the capture of the P tag would be: "some text here. more text".
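
        The capture behaviour described above can be approximated with a small SAX handler. This is a sketch under stated assumptions, not the SolrContentHandler implementation: the class CaptureSketch is hypothetical, the sample document is the one above with a closing html tag added so that it parses as XML, and whitespace is collapsed when the text is read back.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: routes text to a per-element buffer when the element's
// localName is in the capture set, and to the default buffer otherwise.
final class CaptureSketch extends DefaultHandler {
    private final Set<String> captureNames;
    private final Map<String, StringBuilder> captured = new HashMap<>();
    private final StringBuilder defaultText = new StringBuilder();
    // The top of the stack is the buffer that currently receives character data.
    private final Deque<StringBuilder> stack = new ArrayDeque<>();

    CaptureSketch(Set<String> captureNames) {
        this.captureNames = captureNames;
        stack.push(defaultText);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (captureNames.contains(localName)) {
            stack.push(captured.computeIfAbsent(localName, k -> new StringBuilder()));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (captureNames.contains(localName)) {
            stack.pop();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        stack.peek().append(ch, start, length);
    }

    /** Text captured for the given element name, with whitespace collapsed. */
    String captured(String name) {
        StringBuilder sb = captured.get(name);
        return sb == null ? "" : sb.toString().replaceAll("\\s+", " ").trim();
    }

    /** Text that was not captured by any capture element, with whitespace collapsed. */
    String defaultText() {
        return defaultText.toString().replaceAll("\\s+", " ").trim();
    }

    static CaptureSketch parse(String xml, Set<String> names) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed so that localName is populated
            CaptureSketch handler = new CaptureSketch(names);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        String doc = "<html><body><p>some text here.  <div>more text</div></p>"
                + "Some more text</body></html>";
        CaptureSketch h = parse(doc, Collections.singleton("p"));
        System.out.println(h.captured("p"));  // some text here. more text
        System.out.println(h.defaultText());  // Some more text
    }
}
```

        The nested div is not in the capture set, so its text stays with the enclosing p capture; adding "div" to the set would route it to its own buffer instead.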
        See Also:
        Constant Field Values
      • UNKNOWN_FIELD_PREFIX

        static final String UNKNOWN_FIELD_PREFIX
        Optional. If specified, the prefix will be prepended to all metadata, so that a dynamic field can be set up to capture it automatically.
        See Also:
        Constant Field Values
      • DEFAULT_FIELD

        static final String DEFAULT_FIELD
        Optional. If specified and the name of a potential field cannot be determined, the specified default field will be used instead.
        See Also:
        Constant Field Values
      • PASSWORD_MAP_FILE

        static final String PASSWORD_MAP_FILE
        Optional. If specified, loads the file as a source for password lookups for Tika encrypted documents.

        The file is in Java properties format, with one key=value pair per line. The key is evaluated as a regex against the file name, and the value is the password. The rules are evaluated top to bottom, i.e. the first match is used. If you want a fallback password that is always used, supply a .*=<defaultmypassword> rule at the end.
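
        The first-match rule evaluation can be sketched as follows. This is a simplified illustration rather than the handler's actual lookup code: the class PasswordRulesSketch, the sample rules, and the passwords are hypothetical, lines are split literally at the first "=" without properties-style escape handling, and a rule is assumed to apply only when the regex matches the whole file name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: ordered regex rules, first match wins.
final class PasswordRulesSketch {
    /** One rule: a file-name regex and the password to use when it matches. */
    static final class Rule {
        final Pattern pattern;
        final String password;

        Rule(Pattern pattern, String password) {
            this.pattern = pattern;
            this.password = password;
        }
    }

    /** Parses key=value lines in file order; blank lines and # comments are skipped. */
    static List<Rule> parse(String content) {
        List<Rule> rules = new ArrayList<>();
        for (String line : content.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            int eq = trimmed.indexOf('=');
            if (eq < 0) {
                continue; // not a key=value line
            }
            String regex = trimmed.substring(0, eq).trim();
            String password = trimmed.substring(eq + 1).trim();
            rules.add(new Rule(Pattern.compile(regex), password));
        }
        return rules;
    }

    /** Returns the password of the first rule matching the file name, or null if none match. */
    static String lookup(List<Rule> rules, String fileName) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(fileName).matches()) {
                return rule.password;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String file = String.join("\n",
                "# hypothetical rules",
                ".*\\.pdf=pdfSecret",
                "report-.*=reportSecret",
                ".*=fallbackSecret");
        List<Rule> rules = parse(file);
        System.out.println(lookup(rules, "report-2020.pdf"));  // pdfSecret
        System.out.println(lookup(rules, "report-2020.docx")); // reportSecret
        System.out.println(lookup(rules, "notes.txt"));        // fallbackSecret
    }
}
```

        Because "report-2020.pdf" matches both of the first two rules, the example shows why rule order matters: the more specific rule must come first, and the catch-all rule last.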

        See Also:
        Constant Field Values