swish-e/conf/example9.config

# ----- Example 9 - Filtering PDF with "prog" -------
#
#  Please see the swish-e documentation for
#  information on configuration directives.
#  Documentation is included with the swish-e
#  distribution, and also can be found on-line
#  at http://swish-e.org
#
#
#  This example demonstrates how to use swish's
#  "prog" document source feature to filter documents.
#
#  The "prog" document source feature allows
#  an external program to feed documents to
#  swish, one after another.  This allows you
#  to index documents from any source (e.g. web, DBMS)
#  and to filter and adjust the content before swish
#  indexes the content.
#
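#  As a rough illustration, each document the program prints to its
#  standard output (which swish reads) is a block of headers, a blank
#  line, and then exactly Content-Length bytes of content.  This sketch
#  assumes the header names described in the swish-e "prog" documentation.
#  The path and numbers below are made up:
#
#     Path-Name: ./docs/report.pdf
#     Content-Length: 1534
#     Last-Mtime: 1009843200
#
#     ...1534 bytes of document content...
#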
#  Using the "prog" method to filter documents requires more
#  work to set up than using the "filters" described in
#  example8.config because you must write a program to retrieve
#  the documents and feed them to swish.
#
#  On the other hand, the "prog" method should be faster than the
#  filter method in example8.config because swish doesn't need to fork
#  itself and run an external program for each document it filters.
#  This can be significant if you use a Perl script as a filter, since
#  the Perl script must be compiled each time it runs.  The "prog" method
#  avoids that overhead.
#
#  This example uses the example9.pl program, which is very similar to
#  the DirTree.pl program included in the prog-bin directory.  It simply
#  reads files from the file system and passes their content on to swish
#  if they are of the correct type.  PDF files are converted to XML by
#  the prog-bin/pdf2xml.pm module.
#
#  The PDF info fields (e.g. author) are placed in XML tags,
#  which allows indexing the PDF info as MetaNames.
#  By specifying MetaNames in a query you can limit searches to this PDF info.
#
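#  For instance, assuming pdf2xml wraps the PDF "Author" field in an
#  <author> tag (the tag names and values here are hypothetical; check
#  "perldoc pdf2xml" for the real ones), a converted PDF might look
#  roughly like:
#
#     <all>
#       <author>Jane Smith</author>
#       <title>Annual Report</title>
#       ...extracted text...
#     </all>
#
#  and a search limited to that MetaName would be (add -f if your
#  index is not the default index.swish-e):
#
#     swish-e -w "author=smith"
#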
#  For this example, you will need the xpdf package.
#  Type "perldoc pdf2xml" from the prog-bin directory for
#  more information.
#
#  Run this example as:
#
#     swish-e -S prog -c example9.config
# 
#---------------------------------------------------

# Include our site-wide configuration settings:
IncludeConfigFile example4.config


# Define the program to run
IndexDir ./example9.pl


# Pass in the top-level directory to index
# (here we specify the current directory)
SwishProgParameters .
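
# These parameters are passed to example9.pl on its command line, so a
# quick way to check what swish will receive is to run the program by
# hand with the same argument (a shell command, not a config directive):
#
#     ./example9.pl . | less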


# Swish can index a number of different types of documents.
# .config files are indexed as text, and .pdf files (which example9.pl
# converts to XML) are parsed as XML:
IndexContents TXT .config
IndexContents XML .pdf


# Since the pdf2xml module generates XML tags for the PDF info fields and
# for the PDF content, let's use MetaNames.
# Instead of specifying each MetaName, let swish create them automatically.
UndefinedMetaTags auto
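
# If you would rather control exactly which tags become MetaNames, you
# could list them and index any other tags as plain text instead of
# using "auto".  The tag names below are hypothetical; use the ones
# pdf2xml actually generates:
#
#     MetaNames author title subject
#     UndefinedMetaTags index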


# Show what's happening

IndexReport 3


# end of example