NAME
The Swish-e README File

What is Swish-e?
Swish-e is Simple Web Indexing System for Humans - Enhanced. Swish-e can
quickly and easily index directories of files or remote web sites and
search the generated indexes.

Swish-e is extremely fast in both indexing and searching, highly
configurable, and can be seamlessly integrated with existing web sites
to maintain a consistent design. Swish-e can index web pages, but can
just as easily index text files, mailing list archives, or data stored
in a relational database.

Swish-e version 2.2 represents a major rewrite of the code and the
addition of many new features. Memory requirements for indexing have
been reduced, and indexing speed is significantly improved from previous
versions. New features allow more control over indexing, better document
parsing, improved indexing and searching logic, better filter code, and
the ability to index from any data source.

Swish-e is not a "turn-key" indexing and searching solution. The Swish-e
distribution contains most of the parts to create such a system, but you
need to put the parts together as best meets your needs. You will need
to configure Swish-e to index your documents, create an index by running
Swish-e, and set up an interface such as a CGI script (a script is
included). Swish-e uses helper programs to index documents of types that
it cannot natively index. These programs may need to be installed
separately from Swish-e.

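The configure, index, and search steps described above can be sketched roughly as follows. This is only an illustration: the ./docs directory, the index and configuration file names, and the write_config helper are made up for the example. The IndexDir, IndexFile, and IndexOnly directives and the -c, -f, and -w switches are described in the SWISH-CONFIG and SWISH-RUN pages, which remain the authority for your version.

```shell
#!/bin/sh
# write_config CONF DOCS INDEX: write a minimal configuration file.
# (Hypothetical helper; all paths passed to it are placeholders.)
write_config() {
    cat > "$1" <<EOF
# Directory tree to index
IndexDir $2
# Where to write the index
IndexFile $3
# Only index files with these extensions
IndexOnly .html .htm .txt
EOF
}

conf=${SWISH_CONF:-swish-example.conf}
write_config "$conf" ./docs ./index.swish-e

# Only run swish-e when it is installed and there is something to index.
if command -v swish-e >/dev/null 2>&1 && [ -d ./docs ]; then
    swish-e -c "$conf"                        # build the index
    swish-e -f ./index.swish-e -w 'install'   # search it for one word
else
    echo "swish-e or ./docs not found; wrote $conf only"
fi
```

The search interface (for example the included CGI script) is then pointed at the same index file.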
Swish-e is an Open Source (see http://opensource.org) program
supported by developers and a large group of users. Please take time to
join the Swish-e discussion list at http://Swish-e.org.

Key features

* Quickly index a large number of documents in different formats
  including text, HTML, and XML

* Use "filters" to index other types of files such as PDF, gzip, or
  Postscript.

* Includes a web spider for indexing remote documents over HTTP.
  Follows Robots Exclusion Rules (including META tags).

* Use an external program to supply documents to Swish-e, such as an
  advanced spider for your web server or a program to read and format
  records from a relational database.

* Document "properties" (some subset of the source document, usually
  defined as META or XML elements) may be stored in the index and
  returned with search results

* Document summaries can be returned with each search

* Word stemming, soundex, metaphone, and double-metaphone indexing for
  "fuzzy" searching

* Phrase searching and wildcard searching

* Limit searches to HTML links

* Use powerful Regular Expressions to select documents for indexing or
  exclusion

* Easily limit searches to parts or all of your web site

* Results can be sorted by relevance or by any number of properties in
  ascending or descending order

* Limit searches to parts of documents such as certain HTML tags
  (META, TITLE, comments, etc.) or to XML elements.

* Can report structural errors in your XML and HTML documents

* Index file is portable between platforms.

* A Swish-e library is provided to allow embedding Swish-e into your
  applications. A Perl module is available that provides a standard
  API for accessing Swish-e.

* Includes example search scripts

* Swish-e is fast.

* It's open source and FREE! You can customize Swish-e and you can
  contribute your fancy new features to the project.

* Supported by on-line user and developer groups

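A few of the search features listed above (boolean, phrase, and wildcard searching, limiting a search to the title, and sorting by a property) might look like this on the command line. Treat it as a sketch: the index path and the run_search helper are invented for the example, and the -w, -t, -s, and -m switches should be checked against the SWISH-RUN page for your version.

```shell
#!/bin/sh
# Index file to search (placeholder path; override with SWISH_INDEX).
index=${SWISH_INDEX:-./index.swish-e}

# run_search ARGS...: print the command line, then run it only when
# swish-e is installed and the index file exists. (Hypothetical helper.)
run_search() {
    echo "swish-e -f $index $*"
    if command -v swish-e >/dev/null 2>&1 && [ -f "$index" ]; then
        swish-e -f "$index" "$@"
    fi
}

run_search -w 'web AND index'                  # boolean search
run_search -w '"simple web indexing"'          # phrase search
run_search -w 'index*'                         # wildcard search
run_search -w 'swish' -t t                     # limit the search to titles
run_search -w 'swish' -s swishtitle asc -m 10  # sort by title, at most 10 results
```

Note that the echoed line is for display only and does not reproduce shell quoting.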
Where do I get Swish-e?
The current version of Swish-e can be found at:

    http://Swish-e.org

Please make sure you use a current version of Swish-e.

Information about Windows binary distributions can also be found at this
site.

How Do I Install Swish-e?
Read the INSTALL page.

Building from source is recommended. On most platforms Swish-e should
build without problems. Information on building for VMS and Win32 can be
found in sub-directories of the "src" directory. Check the Swish-e site
for information about binary distributions (such as for Windows).

In addition to the INSTALL page, make sure you read the SWISH-FAQ page
if you have any questions, or to get an idea of questions that you might
someday ask.

Problems or questions about installing Swish-e should be directed to the
Swish-e discussion list (see the Swish-e web site at
http://Swish-e.org).

The Swish-e Documentation
Documentation is provided in the Swish-e distribution package in two
forms: POD (Plain Old Documentation) and HTML. The POD documentation
is in the pod directory, and the HTML documentation is in the html
directory, of course.

If your system includes the required support files and programs, the
distribution make files can also generate the documentation in these
formats:

    Postscript
    PDF (Adobe Acrobat)
    system man pages

You may also build a "split" version of the documentation where each
topic heading is a separate web page. Building the split version also
creates a Swish-e index of the documentation that makes the
documentation searchable via the included Perl CGI program.

Building these other forms of documentation requires additional helper
applications. Most modern Linux distributions will include all that's
needed (at least mine does...). You shouldn't have a problem if you have
kept your Perl and Perl libraries up to date.

Online documentation can be found at the Swish-e web site listed above.

See INSTALL for information on creating the PDF and Postscript versions
of the documentation, and for information on installing the SWISH-*
documentation as Unix man(1) pages.

How do I read the Swish-e documentation?

The Swish-e documentation included with the distribution is in POD and
HTML formats. The POD documentation can be found in the pod directory,
and the HTML documentation can be found in the html directory.

To view the HTML documentation point your browser to the html/index.html
file.

The POD documentation is displayed by the "perldoc" command that is
included with every Perl installation. For example, to view the Swish-e
installation documentation page called "INSTALL", type

    perldoc pod/INSTALL

or to make life easier,

    cd pod
    perldoc INSTALL
    perldoc SWISH-RUN

Complain to your system administrator if the "perldoc" command is not
available on your machine.

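If "perldoc" may or may not be present, a small wrapper along these lines can fall back to showing the raw POD source, which is still readable as plain text. This is just a sketch: the view_doc name is made up, and it assumes the pages are stored as pod/NAME.pod and that it is run from the top-level directory of the distribution.

```shell
#!/bin/sh
# view_doc PAGE: show pod/PAGE with perldoc when available, otherwise
# page through the raw POD file. (Hypothetical helper; assumes the
# pages live in pod/PAGE.pod under the current directory.)
view_doc() {
    page=$1
    if command -v perldoc >/dev/null 2>&1; then
        perldoc "pod/$page"
    elif [ -f "pod/$page.pod" ]; then
        ${PAGER:-more} "pod/$page.pod"
    else
        echo "cannot find pod/$page.pod; try html/index.html in a browser" >&2
        return 1
    fi
}

# Example, from the top-level directory of the distribution:
#   view_doc INSTALL
#   view_doc SWISH-RUN
```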
Included Documentation

The following documentation is included in this Swish-e distribution.

If you are new to Swish-e read the INSTALL page to get Swish-e installed
and tested. Work through the example shown in the INSTALL page, and
the examples in the conf directory. Also review the SWISH-FAQ.

* README - This file

* INSTALL - Installation and basic usage instructions

* SWISH-CONFIG - Configuration File Directives

* SWISH-RUN - Running Swish and Command Line Switches

* SWISH-SEARCH - All about Searching with Swish-e

* SWISH-FAQ - Common questions, and some answers

* SWISH-LIBRARY - Interface to the Swish-e C library

* SWISH-PERL - Instructions for using the Perl library

* CHANGES - List of feature changes and bug fixes

* SWISH-BUGS - List of known bugs in the release

Document Generation

The Swish-e documentation in HTML format was created with
Pod::HtmlPsPdf, a package of Perl modules written and/or modified by
Stas Bekman to automate the conversion of documents in pod format (see
perldoc perlpod) to HTML, Postscript, and PDF. A slightly modified
version of this package is included with the Swish-e distribution and
used for building the HTML. As distributed, Swish-e contains only the
pod and HTML documentation. See INSTALL for instructions on creating
man(1), Postscript, and PDF formats.

Thanks, Stas, for your help!

What's included in the Swish-e distribution?
Here's an overview of the directories included in the Swish-e
distribution:

conf/
    Example Swish-e configuration setups to help you get started. After
    reading the INSTALL page, and its included example, review the sample
    configuration in this directory.

conf/stopwords
    In the "conf/stopwords" sub-directory are a number of stopword files
    for different languages. Use of stopwords is not required with
    Swish-e.

doc/
    Contains files required for building the HTML, PDF, and Postscript
    documentation.

example/
    This contains a sample CGI script (swish.cgi) for searching with
    Swish-e. Documentation for using swish.cgi is included within the
    script. Type:

        perldoc example/swish.cgi

    from the top-level directory where the Swish-e distribution was
    unpacked.

filter-bin/
    Sample programs to use with Swish-e's "filters". Examples include
    PDF, MS Word, and binary strings filters. Filters often require
    installing separate document conversion programs.

html/
    The documentation in HTML format.

perl/
    The Perl interface to the Swish-e C library. This Perl module
    provides direct access to Swish-e from within your Perl programs. See
    the perl/README file for more information.

pod/
    The source for all documentation in perldoc (pod) format.

prog-bin/
    Example programs and modules to use with the "prog" document source
    access method. Examples include a web spider, a program to index
    directly from a MySQL database, and a program to recurse a directory
    tree. Example Perl modules are provided for converting PDF and
    MS-Word documents into a format usable by Swish-e. See
    prog-bin/README for an overview of the programs and modules, and
    check each file for included documentation.

    The prog-bin/spider.pl program is a web spider program with many
    features. It contains its own documentation. Type:

        perldoc prog-bin/spider.pl

    from the top-level directory where the Swish-e distribution was
    unpacked.

    The "prog" document source feature is very powerful, but can be a
    challenge to set up when first using Swish-e. Please contact the
    Swish-e discussion list if you have any questions.

src/
    This directory contains the source code for Swish-e. OS-specific
    directories are also found here.

tests/
    The documents used for running "make test".

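To give a feel for the "prog" method mentioned under prog-bin/ above: Swish-e runs an external program and reads documents from its standard output, each one preceded by a few header lines. The sketch below assumes the Path-Name and Content-Length headers followed by a blank line and then the document body; please confirm the exact header set against the SWISH-RUN page for your version. The emit_doc function name is made up for this example.

```shell
#!/bin/sh
# emit_doc FILE: write one document to stdout as headers, a blank
# line, and then the file contents. (Hypothetical helper.)
emit_doc() {
    file=$1
    # Byte count of the body; the arithmetic expansion strips any
    # padding that some wc implementations add.
    size=$(( $(wc -c < "$file") ))
    printf 'Path-Name: %s\n' "$file"
    printf 'Content-Length: %s\n' "$size"
    printf '\n'
    cat "$file"
}

# Emit every file named on the command line, one after another.
for f in "$@"; do
    emit_doc "$f"
done
```

A script like this would typically be named in IndexDir in a configuration file, with Swish-e then run using -S prog and that configuration file.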
Where do I get help with Swish-e?
If you need help with installing or using Swish-e please subscribe to
the Swish-e mailing list. Visit the Swish-e web site listed above for
information on subscribing to the mailing list.

Before posting any questions please read QUESTIONS AND TROUBLESHOOTING
in the INSTALL documentation page.

Speling mistakes
Please contact the Swish-e list with corrections to this documentation.
Any help in cleaning up the docs will be appreciated!

Any patches should be made against the .pod files, not the .html files.

Swish-e Development
Swish-e is currently being developed as an open source project on
SourceForge (http://sourceforge.net).

Contact the Swish-e list for questions.

Swish-e's History
SWISH was created by Kevin Hughes to fill the need of the growing number
of Web administrators on the Internet - many of the indexing systems
were not well documented, were hard to use and install, and were too
complex for their own good. The system was widely used for several
years, long enough to collect some bug fixes and requests for
enhancements.

In Fall 1996, The Library of UC Berkeley received permission from Kevin
Hughes to implement bug fixes and enhancements to the original binary.
The result is Swish-enhanced or Swish-e, brought to you by the Swish-e
Development Team.

Document Info
Each document in the Swish-e distribution contains this section. It
refers only to the specific page it's located in, and not to the Swish-e
program or the documentation as a whole.

$Id: README.pod,v 1.11 2002/08/20 22:24:08 whmoseley Exp $