NAME
The Swish-e README File

What is Swish-e?
Swish-e is the Simple Web Indexing System for Humans - Enhanced. Swish-e
can quickly and easily index directories of files or remote web sites
and search the generated indexes.

Swish-e is extremely fast in both indexing and searching, highly
configurable, and can be seamlessly integrated with existing web sites
to maintain a consistent design. Swish-e can index web pages, but can
just as easily index text files, mailing list archives, or data stored
in a relational database.

Swish-e version 2.2 represents a major rewrite of the code and the
addition of many new features. Memory requirements for indexing have
been reduced, and indexing speed is significantly improved from previous
versions. New features allow more control over indexing, better document
parsing, improved indexing and searching logic, better filter code, and
the ability to index from any data source.

Swish-e is not a "turn-key" indexing and searching solution. The Swish-e
distribution contains most of the parts to create such a system, but you
need to put the parts together as best meets your needs. You will need
to configure Swish-e to index your documents, create an index by running
Swish-e, and set up an interface such as a CGI script (a script is
included). Swish-e uses helper programs to index documents of types that
it cannot natively index. These programs may need to be installed
separately from Swish-e.

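The configure, index, and search steps described above might look roughly
like the following. This is a minimal sketch, not a recommended setup: the
temporary directory, the sample document, and the file names swish.conf
and index.swish-e are placeholders chosen for illustration. The
configuration directives and command line switches are documented in
SWISH-CONFIG and SWISH-RUN.

```shell
#!/bin/sh
# Sketch only: every path and file name here is a placeholder.
workdir=$(mktemp -d)
mkdir "$workdir/docs"
printf '<html><head><title>Hello</title></head><body>Swish-e test page</body></html>\n' \
    > "$workdir/docs/hello.html"

# Step 1: configure. Say what to index, where to write the index,
# and which file suffixes to accept.
cat > "$workdir/swish.conf" <<EOF
IndexDir $workdir/docs
IndexFile $workdir/index.swish-e
IndexOnly .html .txt
EOF

if command -v swish-e >/dev/null 2>&1; then
    # Step 2: create the index by running Swish-e with the config file.
    swish-e -c "$workdir/swish.conf"
    # Step 3: search the index for the word "test".
    swish-e -f "$workdir/index.swish-e" -w test
else
    echo "swish-e not found; configuration written to $workdir/swish.conf"
fi
```

A search interface such as the included CGI script would run the same kind
of search against the index file on behalf of a web form.
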
Swish-e is an Open Source (see http://opensource.org) program supported
by developers and a large group of users. Please take time to join the
Swish-e discussion list at http://Swish-e.org.

Key features

* Quickly index a large number of documents in different formats
  including text, HTML, and XML

* Use "filters" to index other types of files such as PDF, gzip, or
  Postscript.

* Includes a web spider for indexing remote documents over HTTP.
  Follows Robots Exclusion Rules (including META tags).

* Use an external program to supply documents to Swish-e, such as an
  advanced spider for your web server or a program to read and format
  records from a relational database.

* Document "properties" (some subset of the source document, usually
  defined as META or XML elements) may be stored in the index and
  returned with search results

* Document summaries can be returned with each search

* Word stemming, soundex, metaphone, and double-metaphone indexing for
  "fuzzy" searching

* Phrase searching and wildcard searching

* Limit searches to HTML links

* Use powerful Regular Expressions to select documents for indexing or
  exclusion

* Easily limit searches to parts or all of your web site

* Results can be sorted by relevance or by any number of properties in
  ascending or descending order

* Limit searches to parts of documents such as certain HTML tags
  (META, TITLE, comments, etc.) or to XML elements.

* Can report structural errors in your XML and HTML documents

* Index file is portable between platforms.

* A Swish-e library is provided to allow embedding Swish-e into your
  applications. A Perl module is available that provides a standard
  API for accessing Swish-e.

* Includes example search scripts

* Swish-e is fast.

* It's open source and FREE! You can customize Swish-e and you can
  contribute your fancy new features to the project.

* Supported by on-line user and developer groups

Where do I get Swish-e?
The current version of Swish-e can be found at:

    http://Swish-e.org

Please make sure you use a current version of Swish-e.

Information about Windows binary distributions can also be found at this
site.

How Do I Install Swish-e?
Read the INSTALL page.

Building from source is recommended. On most platforms Swish-e should
build without problems. Information on building for VMS and Win32 can be
found in sub-directories of the "src" directory. Check the Swish-e site
for information about binary distributions (such as for Windows).

In addition to the INSTALL page, make sure you read the SWISH-FAQ page
if you have any questions, or to get an idea of questions that you might
someday ask.

Problems or questions about installing Swish-e should be directed to the
Swish-e discussion list (see the Swish-e web site at
http://Swish-e.org).

The Swish-e Documentation
Documentation is provided in the Swish-e distribution package in two
forms: POD (Plain Old Documentation) and HTML. The POD documentation is
in the pod directory, and the HTML documentation is in the html
directory.

If your system includes the required support files and programs, the
distribution make files can also generate the documentation in these
formats:

    Postscript
    PDF (Adobe Acrobat)
    system man pages

You may also build a "split" version of the documentation where each
topic heading is a separate web page. Building the split version also
creates a Swish-e index of the documentation that makes the
documentation searchable via the included Perl CGI program.

Building these other forms of documentation requires additional helper
applications. Most modern Linux distributions will include all that's
needed (at least mine does...). You shouldn't have a problem if you have
kept your Perl and Perl libraries up to date.

Online documentation can be found at the Swish-e web site listed above.

See INSTALL for information on creating the PDF and Postscript versions
of the documentation, and for information on installing the SWISH-*
documentation as Unix man(1) pages.

How do I read the Swish-e documentation?

The Swish-e documentation included with the distribution is in POD and
HTML formats. The POD documentation can be found in the pod directory,
and the HTML documentation can be found in the html directory.

To view the HTML documentation point your browser to the html/index.html
file.

The POD documentation is displayed by the "perldoc" command that is
included with every Perl installation. For example, to view the Swish-e
installation documentation page called "INSTALL", type

    perldoc pod/INSTALL

or to make life easier,

    cd pod
    perldoc INSTALL
    perldoc SWISH-RUN

Complain to your system administrator if the "perldoc" command is not
available on your machine.

Included Documentation

The following documentation is included in this Swish-e distribution.

If you are new to Swish-e, read the INSTALL page to get Swish-e
installed and tested. Work through the example shown in the INSTALL
page, and the examples in the conf directory. Also review the SWISH-FAQ.

* README - This file

* INSTALL - Installation and basic usage instructions

* SWISH-CONFIG - Configuration File Directives

* SWISH-RUN - Running Swish and Command Line Switches

* SWISH-SEARCH - All about Searching with Swish-e

* SWISH-FAQ - Common questions, and some answers

* SWISH-LIBRARY - Interface to the Swish-e C library

* SWISH-PERL - Instructions for using the Perl library

* CHANGES - List of feature changes and bug fixes

* SWISH-BUGS - List of known bugs in the release

Document Generation

The Swish-e documentation in HTML format was created with
Pod::HtmlPsPdf, a package of Perl modules written and/or modified by
Stas Bekman to automate the conversion of documents in pod format (see
perldoc perlpod) to HTML, Postscript, and PDF. A slightly modified
version of this package is included with the Swish-e distribution and
used for building the HTML. As distributed, Swish-e contains only the
pod and HTML documentation. See INSTALL for instructions on creating
man(1), Postscript, and PDF formats.

Thanks, Stas, for your help!

What's included in the Swish-e distribution?
Here's an overview of the directories included in the Swish-e
distribution:

conf/
    Example Swish-e configuration setups to help you get started. After
    reading the INSTALL page, and its included example, review the
    sample configuration in this directory.

conf/stopwords
    In the "conf/stopwords" sub-directory are a number of stopword files
    for different languages. Use of stopwords is not required with
    Swish-e.

doc/
    Contains files required for building the HTML, PDF, and Postscript
    documentation.

example/
    This contains a sample CGI script (swish.cgi) for searching with
    Swish-e. Documentation for using swish.cgi is included within the
    script. Type:

        perldoc example/swish.cgi

    from the top-level directory where the Swish-e distribution was
    unpacked.

filter-bin/
    Sample programs to use with Swish-e's "filters". Examples include
    PDF, MS Word, and binary strings filters. Filters often require
    installing separate document conversion programs.

html/
    The documentation in HTML format.

perl/
    The Perl interface to the Swish-e C library. This Perl module
    provides direct access to Swish-e from within your Perl programs.
    See the perl/README file for more information.

pod/
    The source for all documentation in perldoc (pod) format.

prog-bin/
    Example programs and modules to use with the "prog" document source
    access method. Examples include a web spider, a program to index
    directly from a MySQL database, and a program to recurse a directory
    tree. Example Perl modules are provided for converting PDF and
    MS-Word documents into a format usable by Swish-e. See
    prog-bin/README for an overview of the programs and modules, and
    check each file for included documentation.

    The prog-bin/spider.pl program is a web spider program with many
    features. It contains its own documentation. Type:

        perldoc prog-bin/spider.pl

    from the top-level directory where the Swish-e distribution was
    unpacked.

    The "prog" document source feature is very powerful, but can be a
    challenge to set up when first using Swish-e. Please contact the
    Swish-e discussion list if you have any questions.

src/
    This directory contains the source code for Swish-e. OS-specific
    directories are also found here.

tests/
    The documents used for running "make test".

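As a rough illustration of the "prog" document source mentioned under
prog-bin/ above, an external program writes each document to standard
output as a few header lines, a blank line, and then the document body.
The sketch below is an assumption-laden example rather than one of the
programs shipped in prog-bin: the function name emit_doc and the two
records are made up, and only the two headers needed for a minimal
document are shown. See SWISH-RUN and prog-bin/README for the
authoritative description.

```shell
#!/bin/sh
# emit_doc NAME CONTENT
# Print one document as headers, a blank line, then the body.
emit_doc() {
    name=$1
    content=$2
    # Content-Length is the number of bytes in the body that follows.
    length=$(printf '%s' "$content" | wc -c | tr -d ' ')
    printf 'Path-Name: %s\n' "$name"
    printf 'Content-Length: %s\n' "$length"
    printf '\n'
    printf '%s' "$content"
}

# Hypothetical records; a real program would read these from a
# database, a spider, or a directory tree.
emit_doc "record/1" "<html><head><title>First</title></head><body>alpha</body></html>"
emit_doc "record/2" "<html><head><title>Second</title></head><body>beta</body></html>"
```

Saved as an executable script (say ./records.sh, a made-up name) and named
in the IndexDir directive of a configuration file, a program like this
could be indexed with "swish-e -S prog -c swish.conf".
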
Where do I get help with Swish-e?
If you need help with installing or using Swish-e please subscribe to
the Swish-e mailing list. Visit the Swish-e web site listed above for
information on subscribing to the mailing list.

Before posting any questions please read QUESTIONS AND TROUBLESHOOTING
in the INSTALL documentation page.

Speling mistakes
Please contact the Swish-e list with corrections to this documentation.
Any help in cleaning up the docs will be appreciated!

Any patches should be made against the .pod files, not the .html files.

Swish-e Development
Swish-e is currently being developed as an open source project on
SourceForge (http://sourceforge.net).

Contact the Swish-e list with any questions.

Swish-e's History
SWISH was created by Kevin Hughes to meet the needs of the growing
number of Web administrators on the Internet. Many of the indexing
systems were not well documented, were hard to use and install, and were
too complex for their own good. The system was widely used for several
years, long enough to collect some bug fixes and requests for
enhancements.

In fall 1996, the Library of UC Berkeley received permission from Kevin
Hughes to implement bug fixes and enhancements to the original binary.
The result is Swish-enhanced or Swish-e, brought to you by the Swish-e
Development Team.

Document Info
Each document in the Swish-e distribution contains this section. It
refers only to the specific page it's located in, and not to the Swish-e
program or the documentation as a whole.

$Id: README.pod,v 1.11 2002/08/20 22:24:08 whmoseley Exp $