=head1 NAME

The Swish-e README File

=head1 What is Swish-e?

Swish-e is B<S>imple B<W>eb B<I>ndexing B<S>ystem for B<H>umans - B<E>nhanced.
Swish-e can quickly and easily index directories of files or remote web sites
and search the generated indexes.

Swish-e is extremely fast in both indexing and searching, highly configurable,
and can be seamlessly integrated with existing web sites to maintain a consistent design.
Swish-e can index web pages, but can just as easily index text files, mailing list archives,
or data stored in a relational database.

Swish-e version 2.2 represents a major rewrite of the code and the
addition of many new features. Memory requirements for indexing have
been reduced, and indexing speed is significantly improved from previous versions.
New features allow more control over indexing, better document parsing, improved indexing
and searching logic, better filter code, and the ability to index from any data source.

Swish-e is not a "turn-key" indexing and searching solution. The Swish-e distribution contains
most of the parts to create such a system, but you need to put the parts together as best meets your needs.
You will need to configure Swish-e to index your documents, create an index by running Swish-e,
and set up an interface such as a CGI script (a script is included). Swish-e uses helper programs
to index documents of types that Swish-e cannot natively index. These programs may need to be installed
separately from Swish-e.

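
As a rough illustration of the configure, index, and search steps just
described, here is a minimal sketch. The directory F<docs>, the file
F<hello.html>, and the names F<swish.conf> and F<index.swish-e> are
made up for this example. The directives (C<IndexDir>, C<IndexOnly>,
C<IndexFile>) and switches (C<-c>, C<-f>, C<-w>) are covered in the
L<SWISH-CONFIG|SWISH-CONFIG> and L<SWISH-RUN|SWISH-RUN> pages, which
remain the authority on their exact behavior.

```shell
#!/bin/sh
# Illustrative names only: ./docs, hello.html, swish.conf, index.swish-e.
mkdir -p docs
printf '<html><head><title>Hello</title></head><body>Swish-e test page</body></html>\n' > docs/hello.html

# Step 1: write a minimal configuration file.
cat > swish.conf <<'EOF'
# Directory (or files) to index
IndexDir ./docs
# Only index files with these suffixes
IndexOnly .html .htm .txt
# Name of the index file to create
IndexFile ./index.swish-e
EOF

# Steps 2 and 3 need the swish-e binary, so they are skipped if it is missing.
if command -v swish-e >/dev/null 2>&1; then
    swish-e -c swish.conf                 # build the index from the config
    swish-e -f ./index.swish-e -w test    # search the index for the word "test"
else
    echo "swish-e not found in PATH; only swish.conf was written" >&2
fi
```

A real setup will almost certainly need more directives than this, and
the third step (a search interface such as the included CGI script) is
not shown here.
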

Swish-e is an Open Source (see: http://opensource.org ) program supported by developers and a large group of users.
Please take time to join the Swish-e discussion list at http://Swish-e.org.

=head2 Key features

=over 4

=item *

Quickly index a large number of documents in different formats
including text, HTML, and XML

=item *

Use "filters" to index other types of files such as PDF, gzip, or
Postscript.

=item *

Includes a web spider for indexing remote documents over HTTP.
Follows Robots Exclusion Rules (including META tags).

=item *

Use an external program to supply documents to Swish-e, such as an
advanced spider for your web server or a program to read and format
records from a relational database.

=item *

Document "properties" (some subset of the source document, usually defined
as META or XML elements) may be stored in the index and returned with
search results

=item *

Document summaries can be returned with each search

=item *

Word stemming, soundex, metaphone, and double-metaphone indexing for "fuzzy" searching

=item *

Phrase searching and wildcard searching

=item *

Limit searches to HTML links

=item *

Use powerful Regular Expressions to select documents for indexing or exclusion

=item *

Easily limit searches to parts or all of your web site

=item *

Results can be sorted by relevance or by any number of properties
in ascending or descending order

=item *

Limit searches to parts of documents such as certain HTML tags
(META, TITLE, comments, etc.) or to XML elements.

=item *

Can report structural errors in your XML and HTML documents

=item *

Index file is portable between platforms.

=item *

A Swish-e library is provided to allow embedding Swish-e into your applications.
A Perl module is available that provides a standard API for accessing Swish-e.

=item *

Includes example search scripts

=item *

Swish-e is fast.

=item *

It's open source and FREE! You can customize Swish-e and you can
contribute your fancy new features to the project.

=item *

Supported by on-line user and developer groups

=back


=head1 Where do I get Swish-e?

The current version of Swish-e can be found at:

    http://Swish-e.org

Please make sure you use a current version of Swish-e.

Information about Windows binary distributions can also be found at
this site.

=head1 How Do I Install Swish-e?

Read the L<INSTALL|INSTALL> page.

Building from source is recommended. On most platforms Swish-e should build without problems.
Information on building for VMS and Win32 can be found in sub-directories of the C<src> directory.
Check the Swish-e site for information about binary distributions (such as for Windows).

In addition to the INSTALL page, make sure you read the
L<SWISH-FAQ|SWISH-FAQ> page if you have any questions, or to get an idea
of questions that you might someday ask.

Problems or questions about installing Swish-e should be directed to the Swish-e discussion list (see the
Swish-e web site at http://Swish-e.org).


=head1 The Swish-e Documentation

Documentation is provided in the Swish-e distribution package in two forms,
POD (Plain Old Documentation), and in HTML format. The POD documentation
is in the F<pod> directory, and the HTML documentation is in the F<html>
directory, of course.

If your system includes the required support files and programs, the
distribution make files can also generate the documentation in these
formats:

    Postscript
    PDF (Adobe Acrobat)
    system man pages

You may also build a "split" version of the documentation where each
topic heading is a separate web page. Building the split version also
creates a Swish-e index of the documentation that makes the documentation
searchable via the included Perl CGI program.

Building these other forms of documentation requires additional helper
applications -- most modern Linux distributions will include all that's
needed (at least mine does...). You shouldn't have a problem if you have
kept your Perl and Perl libraries up to date.

Online documentation can be found at the Swish-e web site listed above.

See L<INSTALL|INSTALL> for information on creating the PDF and Postscript
versions of the documentation, and for information on installing the
SWISH-* documentation as Unix man(1) pages.

=head2 How do I read the Swish-e documentation?

The Swish-e documentation included with the distribution is in POD and
HTML formats. The POD documentation can be found in the F<pod> directory,
and the HTML documentation can be found in the F<html> directory.

To view the HTML documentation point your browser to the
F<html/index.html> file.

The POD documentation is displayed by the "perldoc" command that is
included with every Perl installation. For example, to view the Swish-e
installation documentation page called "INSTALL", type

    perldoc pod/INSTALL

or to make life easier,

    cd pod
    perldoc INSTALL
    perldoc SWISH-RUN

Complain to your system administrator if the C<perldoc> command is not
available on your machine.


=head2 Included Documentation

The following documentation is included in this Swish-e distribution.

If you are new to Swish-e read the L<INSTALL|INSTALL> page to get Swish-e installed
and tested. Work through the example shown in the L<INSTALL|INSTALL> page, and
the examples in the F<conf> directory. Also review the L<SWISH-FAQ|SWISH-FAQ>.

=over 4

=item *

L<README|README> - This file

=item *

L<INSTALL|INSTALL> - Installation and basic usage instructions

=item *

L<SWISH-CONFIG|SWISH-CONFIG> - Configuration File Directives

=item *

L<SWISH-RUN|SWISH-RUN> - Running Swish and Command Line Switches

=item *

L<SWISH-SEARCH|SWISH-SEARCH> - All about Searching with Swish-e

=item *

L<SWISH-FAQ|SWISH-FAQ> - Common questions, and some answers

=item *

L<SWISH-LIBRARY|SWISH-LIBRARY> - Interface to the Swish-e C library

=item *

L<SWISH-PERL|SWISH-PERL> - Instructions for using the Perl library

=item *

L<CHANGES|CHANGES> - List of feature changes and bug fixes

=item *

L<SWISH-BUGS|SWISH-BUGS> - List of known bugs in the release

=back

=head2 Document Generation

The Swish-e documentation in HTML format was created with Pod::HtmlPsPdf,
a package of Perl modules written and/or modified by Stas Bekman to
automate the conversion of documents in pod format (see perldoc perlpod)
to HTML, Postscript, and PDF. A slightly modified version of this package
is included with the Swish-e distribution and used for building the HTML.
As distributed, Swish-e contains only the pod and HTML documentation.
See L<INSTALL|INSTALL> for instructions on creating man(1), Postscript,
and PDF formats.

Thanks, Stas, for your help!


=head1 What's included in the Swish-e distribution?

Here's an overview of the directories included in the Swish-e
distribution:

=over 3

=item conf/

Example Swish-e configuration setups to help you get started.
After reading the L<INSTALL|INSTALL> page, and its included example, review
the sample configuration in this directory.

=item conf/stopwords

In the C<conf/stopwords> sub-directory are a number of stopword files for different
languages. Use of stopwords is not required with Swish-e.

=item doc/

Contains files required for building the HTML, PDF, and Postscript
documentation.

=item example/

This contains a sample CGI script (F<swish.cgi>) for searching with Swish-e.
Documentation for using F<swish.cgi> is included within the script. Type:

    perldoc example/swish.cgi

from the top-level directory where the Swish-e distribution was unpacked.

=item filter-bin/

Sample programs to use with Swish-e's "filters". Examples include PDF,
MS Word, and binary strings filters.
Filters often require installing separate document conversion programs.

=item html/

The documentation in HTML format.

=item perl/

The Perl interface to the Swish-e C library. This Perl module provides direct access to
Swish-e from within your Perl programs. See the F<perl/README> file for more information.

=item pod/

The source for all documentation in perldoc (pod) format.

=item prog-bin/

Example programs and modules to use with the "prog" document source
access method. Examples include a web spider, a program to index directly from
a MySQL database, and a program to recurse a directory tree.
Example Perl modules are provided for converting PDF and MS-Word documents
into a format usable by Swish-e. See F<prog-bin/README> for an overview of the
programs and modules, and check each file for included documentation.

The F<prog-bin/spider.pl> program is a web spider program with many features.
It contains its own documentation. Type:

    perldoc prog-bin/spider.pl

from the top-level directory where the Swish-e distribution was unpacked.

The "prog" document source feature is very powerful, but can be a challenge to
set up when first using Swish-e. Please contact the Swish-e discussion list
if you have any questions.

=item src/

This directory contains the source code for Swish-e. OS-specific
directories are also found here.

=item tests/

The documents used for running C<make test>.

=back

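
To give a feel for the "prog" document source method mentioned under
F<prog-bin/>, here is a minimal sketch of an external program that hands
documents to Swish-e. It assumes the header-plus-body output format as
described in the L<SWISH-RUN|SWISH-RUN> page (a C<Path-Name> header, a
C<Content-Length> header giving the body size in bytes, a blank line,
then the document itself). Check that page for the exact headers your
version expects. The function name C<emit_doc> and the F<demo> files are
invented for this example and are not part of the distribution.

```shell
#!/bin/sh
# emit_doc FILE: print one document as headers, a blank line, then the body.
# Header names are assumed from the SWISH-RUN page; verify before relying on them.
emit_doc() {
    file=$1
    size=$(wc -c < "$file" | tr -d ' ')    # body length in bytes
    printf 'Path-Name: %s\n' "$file"
    printf 'Content-Length: %s\n' "$size"
    printf '\n'                            # blank line ends the headers
    cat "$file"
}

# Demo input (hypothetical file name and content).
mkdir -p demo
printf 'hello from a generated record\n' > demo/record1.txt

# Emit every demo document on standard output.
for f in demo/*.txt; do
    emit_doc "$f"
done
```

Saved as, say, F<my-docs.sh> (a hypothetical name), a script like this
would typically be run through Swish-e with the C<-S prog> switch and
named with C<-i>, along with your configuration file. The programs in
F<prog-bin/> are far more complete and are the better starting point.
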

=head1 Where do I get help with Swish-e?

If you need help with installing or using Swish-e please subscribe to
the Swish-e mailing list. Visit the Swish-e web site listed above
for information on subscribing to the mailing list.

Before posting any questions please read
L<QUESTIONS AND TROUBLESHOOTING|INSTALL/"QUESTIONS AND TROUBLESHOOTING">
in the L<INSTALL|INSTALL> documentation page.

=head1 Speling mistakes

Please contact the Swish-e list with corrections to this documentation.
Any help in cleaning up the docs will be appreciated!

Any patches should be made against the .pod files, not the .html files.

=head1 Swish-e Development

Swish-e is currently being developed as an open source project on
SourceForge (http://sourceforge.net).

Contact the Swish-e list for questions.

=head1 Swish-e's History

SWISH was created by Kevin Hughes to fill the need of the growing number
of Web administrators on the Internet - many of the indexing systems were
not well documented, were hard to use and install, and were too complex
for their own good. The system was widely used for several years, long
enough to collect some bug fixes and requests for enhancements.

In Fall 1996, the Library of UC Berkeley received permission from
Kevin Hughes to implement bug fixes and enhancements to the original
binary. The result is Swish-enhanced or Swish-e, brought to you by the
Swish-e Development Team.

=head1 Document Info

Each document in the Swish-e distribution contains this section.
It refers only to the specific page it's located in, and not to the
Swish-e program or the documentation as a whole.

    $Id: README.pod,v 1.11 2002/08/20 22:24:08 whmoseley Exp $

.