I’ve been encountering clients who have developed various processes to ingest content into their Google Search Appliance, all of which involve creating GSA feed XML. We have been replatforming some of these clients to Lucidworks Fusion. Fusion provides components that make ingesting this content easy, but they do require some configuration.
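For readers who have not looked inside one of these files, here is a minimal sketch of what a GSA content feed contains and how its records can be read. The sample feed, its URLs, and the `parse_gsa_feed` helper are made up for illustration and are not part of Fusion or the GSA. Real feeds normally also start with an XML declaration and a DOCTYPE line, which I left out to keep the sample short.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed: a header naming the datasource and feed type,
# followed by a group of records that each carry a URL, a MIME type and content.
SAMPLE_FEED = """<gsafeed>
  <header>
    <datasource>sample</datasource>
    <feedtype>full</feedtype>
  </header>
  <group>
    <record url="http://www.example.com/doc1.html" mimetype="text/html">
      <content>Hello from document one</content>
    </record>
    <record url="http://www.example.com/doc2.html" mimetype="text/plain">
      <content>Hello from document two</content>
    </record>
  </group>
</gsafeed>
"""


def parse_gsa_feed(xml_text):
    """Return the feed header values and a list of record dictionaries."""
    root = ET.fromstring(xml_text)
    records = []
    # Records live inside one or more <group> elements.
    for record in root.iter("record"):
        records.append({
            "url": record.get("url"),
            "mimetype": record.get("mimetype"),
            "content": (record.findtext("content") or "").strip(),
        })
    return {
        "datasource": root.findtext("header/datasource"),
        "feedtype": root.findtext("header/feedtype"),
        "records": records,
    }


if __name__ == "__main__":
    feed = parse_gsa_feed(SAMPLE_FEED)
    print(feed["datasource"], feed["feedtype"])
    for rec in feed["records"]:
        print(rec["url"], rec["mimetype"], rec["content"])
```

Each `record` element is effectively one document, which is why the feed XML has to survive parsing intact if later pipeline stages are going to work with individual records.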
Step 1 – You need a connector
Fusion comes with connectors out of the box. Two datasources are likely candidates here: the Local File System or Push Content (link). The Push Content datasource simply creates an endpoint that receives the content and places it into a defined indexing pipeline. For my demo, I used the Local File System.
Step 2 – Creating an indexing pipeline
The Fusion documentation site has some guidance on how to ingest XML documents (link). I chose this pipeline:
- Tika Transformation
  - Return Parsed Content as XML or HTML [X]
  - Return Original XML and HTML Instead of Tika XML Output [X]
- Field Mapping (Default)
- Solr Dynamic Field Name Mapping (Default)
- Solr Indexer (Default)