=head1 NAME

The Swish-e README File

=head1 What is Swish-e?

Swish-e is B<S>imple B<W>eb B<I>ndexing B<S>ystem for B<H>umans - B<E>nhanced.
Swish-e can quickly and easily index directories of files or remote web sites
and search the generated indexes.

Swish-e is extremely fast in both indexing and searching, highly configurable,
and can be seamlessly integrated with existing web sites to maintain a consistent design.
Swish-e can index web pages, but can just as easily index text files, mailing list archives,
or data stored in a relational database.

Swish-e version 2.2 represents a major rewrite of the code and the
addition of many new features. Memory requirements for indexing have
been reduced, and indexing speed is significantly improved from previous versions.
New features allow more control over indexing, better document parsing, improved indexing
and searching logic, better filter code, and the ability to index from any data source.

Swish-e is not a "turn-key" indexing and searching solution. The Swish-e distribution contains
most of the parts needed to create such a system, but you need to put the parts together as best meets your needs.
You will need to configure Swish-e to index your documents, create an index by running Swish-e,
and set up an interface such as a CGI script (a script is included). Swish-e uses helper programs
to index documents of types that Swish-e cannot natively index. These programs may need to be installed
separately from Swish-e.

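
The configure, index, and search steps described above can be sketched as a
short shell script. This is only a minimal sketch: the paths F<./docs>,
F<./swish.conf>, and F<./index.swish-e> and the sample document are
hypothetical, and the directives (C<IndexDir>, C<IndexFile>, C<IndexOnly>)
and switches (C<-c>, C<-f>, C<-w>) should be checked against the
SWISH-CONFIG and SWISH-RUN pages for your version.

```shell
#!/bin/sh
# Hypothetical locations; adjust them for your own site.
DOCS=./docs
CONF=./swish.conf
INDEX=./index.swish-e

# Create one small sample document so there is something to index.
mkdir -p "$DOCS"
printf '%s\n' '<html><head><title>Hello</title></head><body>An example page.</body></html>' \
    > "$DOCS/hello.html"

# Step 1: write a minimal configuration file.
cat > "$CONF" <<EOF
IndexDir $DOCS
IndexFile $INDEX
IndexOnly .html .htm .txt
EOF

# Steps 2 and 3 run only if a swish-e binary is on the PATH.
if command -v swish-e >/dev/null 2>&1; then
    swish-e -c "$CONF"                 # build the index from the config
    swish-e -f "$INDEX" -w 'example'   # search the index for one word
else
    echo "swish-e not found; wrote $CONF only"
fi
```

A search interface such as the included CGI script would take the place of
the final search command.
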

Swish-e is an Open Source (see http://opensource.org) program supported by developers and a large group of users.
Please take time to join the Swish-e discussion list at http://Swish-e.org.


=head2 Key features

=over 4

=item *

Quickly index a large number of documents in different formats
including text, HTML, and XML

=item *

Use "filters" to index other types of files such as PDF, gzip, or
Postscript.

=item *

Includes a web spider for indexing remote documents over HTTP.
Follows Robots Exclusion Rules (including META tags).

=item *

Use an external program to supply documents to Swish-e, such as an
advanced spider for your web server or a program to read and format
records from a relational database.

=item *

Document "properties" (some subset of the source document, usually defined
as META tags or XML elements) may be stored in the index and returned with
search results

=item *

Document summaries can be returned with each search

=item *

Word stemming, soundex, metaphone, and double-metaphone indexing for "fuzzy" searching

=item *

Phrase searching and wildcard searching

=item *

Limit searches to HTML links

=item *

Use powerful Regular Expressions to select documents for indexing or exclusion

=item *

Easily limit searches to parts or all of your web site

=item *

Results can be sorted by relevance or by any number of properties
in ascending or descending order

=item *

Limit searches to parts of documents such as certain HTML tags
(META, TITLE, comments, etc.) or to XML elements.

=item *

Can report structural errors in your XML and HTML documents

=item *

Index file is portable between platforms.

=item *

A Swish-e library is provided to allow embedding Swish-e into your applications.
A Perl module is available that provides a standard API for accessing Swish-e.

=item *

Includes example search scripts

=item *

Swish-e is fast.

=item *

It's open source and FREE! You can customize Swish-e and you can
contribute your fancy new features to the project.

=item *

Supported by on-line user and developer groups

=back



=head1 Where do I get Swish-e?

The current version of Swish-e can be found at:

    http://Swish-e.org

Please make sure you use a current version of Swish-e.

Information about Windows binary distributions can also be found at
this site.

=head1 How Do I Install Swish-e?

Read the L<INSTALL|INSTALL> page.

Building from source is recommended. On most platforms Swish-e should build without problems.
Information on building for VMS and Win32 can be found in sub-directories of the C<src> directory.
Check the Swish-e site for information about binary distributions (such as for Windows).

In addition to the INSTALL page, make sure you read the
L<SWISH-FAQ|SWISH-FAQ> page if you have any questions, or to get an idea
of questions that you might someday ask.

Problems or questions about installing Swish-e should be directed to the Swish-e discussion list (see the
Swish-e web site at http://Swish-e.org).



=head1 The Swish-e Documentation

Documentation is provided in the Swish-e distribution package in two forms:
POD (Plain Old Documentation) and HTML. The POD documentation
is in the F<pod> directory, and the HTML documentation is in the F<html>
directory, of course.

If your system includes the required support files and programs, the
distribution make files can also generate the documentation in these
formats:

    Postscript
    PDF (Adobe Acrobat)
    system man pages

You may also build a "split" version of the documentation where each
topic heading is a separate web page. Building the split version also
creates a Swish-e index of the documentation that makes the documentation
searchable via the included Perl CGI program.

Building these other forms of documentation requires additional helper
applications -- most modern Linux distributions will include all that's
needed (at least mine does...). You shouldn't have a problem if you have
kept your Perl and Perl libraries up to date.

Online documentation can be found at the Swish-e web site listed above.

See L<INSTALL|INSTALL> for information on creating the PDF and Postscript
versions of the documentation, and for information on installing the
SWISH-* documentation as Unix man(1) pages.



=head2 How do I read the Swish-e documentation?

The Swish-e documentation included with the distribution is in POD and
HTML formats. The POD documentation can be found in the F<pod> directory,
and the HTML documentation can be found in the F<html> directory.

To view the HTML documentation, point your browser to the
F<html/index.html> file.

The POD documentation is displayed by the "perldoc" command that is
included with every Perl installation. For example, to view the Swish-e
installation documentation page called "INSTALL", type

    perldoc pod/INSTALL

or to make life easier,

    cd pod
    perldoc INSTALL
    perldoc SWISH-RUN

Complain to your system administrator if the C<perldoc> command is not
available on your machine.


=head2 Included Documentation

The following documentation is included in this Swish-e distribution.

If you are new to Swish-e, read the L<INSTALL|INSTALL> page to get Swish-e installed
and tested. Work through the example shown in the L<INSTALL|INSTALL> page, and
the examples in the F<conf> directory. Also review the L<SWISH-FAQ|SWISH-FAQ>.

=over 4

=item *

L<README|README> - This file

=item *

L<INSTALL|INSTALL> - Installation and basic usage instructions

=item *

L<SWISH-CONFIG|SWISH-CONFIG> - Configuration File Directives

=item *

L<SWISH-RUN|SWISH-RUN> - Running Swish and Command Line Switches

=item *

L<SWISH-SEARCH|SWISH-SEARCH> - All about Searching with Swish-e

=item *

L<SWISH-FAQ|SWISH-FAQ> - Common questions, and some answers

=item *

L<SWISH-LIBRARY|SWISH-LIBRARY> - Interface to the Swish-e C library

=item *

L<SWISH-PERL|SWISH-PERL> - Instructions for using the Perl library

=item *

L<CHANGES|CHANGES> - List of feature changes and bug fixes

=item *

L<SWISH-BUGS|SWISH-BUGS> - List of known bugs in the release

=back


=head2 Document Generation

The Swish-e documentation in HTML format was created with Pod::HtmlPsPdf,
a package of Perl modules written and/or modified by Stas Bekman to
automate the conversion of documents in pod format (see perldoc perlpod)
to HTML, Postscript, and PDF. A slightly modified version of this package
is included with the Swish-e distribution and used for building the HTML.
As distributed, Swish-e contains only the pod and HTML documentation.
See L<INSTALL|INSTALL> for instructions on creating man(1), Postscript,
and PDF formats.

Thanks, Stas, for your help!


=head1 What's included in the Swish-e distribution?

Here's an overview of the directories included in the Swish-e
distribution:

=over 3

=item conf/

Example Swish-e configuration setups to help you get started.
After reading the L<INSTALL|INSTALL> page, and its included example, review
the sample configuration in this directory.

=item conf/stopwords

In the C<conf/stopwords> sub-directory are a number of stopword files for different
languages. Use of stopwords is not required with Swish-e.

=item doc/

Contains files required for building the HTML, PDF, and Postscript
documentation.

=item example/

This contains a sample CGI script (F<swish.cgi>) for searching with Swish-e.
Documentation for using F<swish.cgi> is included within the script. Type:

    perldoc example/swish.cgi

from the top-level directory where the Swish-e distribution was unpacked.

=item filter-bin/

Sample programs to use with Swish-e's "filters". Examples include PDF,
MS Word, and binary strings filters.
Filters often require installing separate document conversion programs.

=item html/

The documentation in HTML format.

=item perl/

The Perl interface to the Swish-e C library. This Perl module provides direct access to
Swish-e from within your Perl programs. See the F<perl/README> file for more information.

=item pod/

The source for all documentation in perldoc (pod) format.

=item prog-bin/

Example programs and modules to use with the "prog" document source
access method. Examples include a web spider, a program to index directly from
a MySQL database, and a program to recurse a directory tree.
Example Perl modules are provided for converting PDF and MS-Word documents
into a format usable by Swish-e. See F<prog-bin/README> for an overview of the
programs and modules, and check each file for included documentation.

The F<prog-bin/spider.pl> program is a web spider program with many features.
It contains its own documentation. Type:

    perldoc prog-bin/spider.pl

from the top-level directory where the Swish-e distribution was unpacked.

The "prog" document source feature is very powerful, but can be a challenge to
set up when first using Swish-e. Please contact the Swish-e discussion list
if you have any questions.

=item src/

This directory contains the source code for Swish-e. OS-specific
directories are also found here.

=item tests/

The documents used for running C<make test>.


=back

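
As a rough illustration of the "prog" document source method mentioned under
F<prog-bin/>, the sketch below shows a shell function that writes one document
to standard output as a few header lines, a blank line, and then the document
body. It assumes the C<Path-Name> and C<Content-Length> headers described in
the SWISH-RUN page; the function name C<emit_doc> and the demo path are
hypothetical, so check the header names and any additional headers against
SWISH-RUN before relying on it.

```shell
#!/bin/sh
# emit_doc PATH FILE
# Print FILE as one document: headers, a blank line, then the body.
# (emit_doc is a hypothetical helper, not part of the distribution.)
emit_doc() {
    path=$1
    file=$2
    # Content-Length is the byte count of the body that follows the headers.
    size=$(wc -c < "$file" | tr -d ' ')
    printf 'Path-Name: %s\n' "$path"
    printf 'Content-Length: %s\n' "$size"
    printf '\n'
    cat "$file"
}

# Demo: feed a single temporary file through emit_doc.
tmp=$(mktemp)
printf 'hello swish\n' > "$tmp"
emit_doc "demo/hello.txt" "$tmp"
rm -f "$tmp"
```

A script like this would be named as the input program when indexing with the
"prog" method (for example C<swish-e -S prog -i ./feed.sh>, where F<feed.sh>
is a hypothetical file name); see SWISH-RUN for the exact switches.
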


=head1 Where do I get help with Swish-e?

If you need help with installing or using Swish-e please subscribe to
the Swish-e mailing list. Visit the Swish-e web site listed above
for information on subscribing to the mailing list.

Before posting any questions please read
L<QUESTIONS AND TROUBLESHOOTING|INSTALL/"QUESTIONS AND TROUBLESHOOTING">
in the L<INSTALL|INSTALL> documentation page.

=head1 Speling mistakes

Please contact the Swish-e list with corrections to this documentation.
Any help in cleaning up the docs will be appreciated!

Any patches should be made against the .pod files, not the .html files.

=head1 Swish-e Development

Swish-e is currently being developed as an open source project on
SourceForge http://sourceforge.net.

Contact the Swish-e list for questions.

=head1 Swish-e's History

SWISH was created by Kevin Hughes to fill the need of the growing number
of Web administrators on the Internet - many of the indexing systems were
not well documented, were hard to use and install, and were too complex
for their own good. The system was widely used for several years, long
enough to collect some bug fixes and requests for enhancements.

In Fall 1996, The Library of UC Berkeley received permission from
Kevin Hughes to implement bug fixes and enhancements to the original
binary. The result is Swish-enhanced or Swish-e, brought to you by the
Swish-e Development Team.

=head1 Document Info

Each document in the Swish-e distribution contains this section.
It refers only to the specific page it's located in, and not to the
Swish-e program or the documentation as a whole.

    $Id: README.pod,v 1.11 2002/08/20 22:24:08 whmoseley Exp $

.