NAME
The Swish-e README File

What is Swish-e?
Swish-e is the Simple Web Indexing System for Humans - Enhanced. Swish-e
can quickly and easily index directories of files or remote web sites
and search the generated indexes.

Swish-e is extremely fast in both indexing and searching, highly
configurable, and can be seamlessly integrated with existing web sites
to maintain a consistent design. Swish-e can index web pages, but can
just as easily index text files, mailing list archives, or data stored
in a relational database.

Swish-e version 2.2 represents a major rewrite of the code and the
addition of many new features. Memory requirements for indexing have
been reduced, and indexing is significantly faster than in previous
versions. New features allow more control over indexing, better document
parsing, improved indexing and searching logic, better filter code, and
the ability to index from any data source.

Swish-e is not a "turn-key" indexing and searching solution. The Swish-e
distribution contains most of the parts needed to create such a system,
but you need to assemble them to best meet your needs. You will need to
configure Swish-e to index your documents, create an index by running
Swish-e, and set up an interface such as a CGI script (a script is
included). Swish-e uses helper programs to index document types that it
cannot index natively. These programs may need to be installed
separately from Swish-e.

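The steps above can be sketched as a short shell script. This is a minimal sketch, not a recommended setup: it assumes the swish-e binary is on your PATH and uses a hypothetical ./docs directory and swish.conf file name. IndexDir, IndexFile, and IndexOnly are configuration directives (see SWISH-CONFIG), and -c, -f, and -w are command-line switches (see SWISH-RUN); check those pages for the exact behavior of your version. A command-line search stands in for the CGI interface.

```shell
#!/bin/sh
# Sketch only: configure, index, and search a hypothetical ./docs directory.
# Assumes swish-e is installed; see SWISH-CONFIG and SWISH-RUN for details.

mkdir -p docs

# 1. Configure: tell Swish-e what to index and where to write the index.
cat > swish.conf <<'EOF'
# Directory to index (hypothetical path)
IndexDir ./docs
# Name of the index file to create
IndexFile ./index.swish-e
# Only index files with these extensions
IndexOnly .html .htm .txt
EOF

# 2. Build the index and 3. search it, but only if swish-e is available.
if command -v swish-e >/dev/null 2>&1; then
    swish-e -c swish.conf                  # index using swish.conf
    swish-e -f ./index.swish-e -w install  # search the index for "install"
else
    echo "swish-e not found on PATH; wrote swish.conf only" >&2
fi
```

Keeping the settings in a configuration file rather than on the command line makes it easy to rebuild the index later with the same options.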
Swish-e is an Open Source program (see http://opensource.org) supported
by developers and a large group of users. Please take time to join the
Swish-e discussion list; see http://Swish-e.org for details.

Key features

* Quickly index a large number of documents in different formats
  including text, HTML, and XML

* Use "filters" to index other types of files such as PDF, gzip, or
  Postscript.

* Includes a web spider for indexing remote documents over HTTP.
  Follows Robots Exclusion Rules (including META tags).

* Use an external program to supply documents to Swish-e, such as an
  advanced spider for your web server or a program to read and format
  records from a relational database.

* Document "properties" (some subset of the source document, usually
  defined as META tags or XML elements) may be stored in the index and
  returned with search results

* Document summaries can be returned with each search

* Word stemming, soundex, metaphone, and double-metaphone indexing for
  "fuzzy" searching

* Phrase searching and wildcard searching

* Limit searches to HTML links

* Use powerful Regular Expressions to select documents for indexing or
  exclusion

* Easily limit searches to parts or all of your web site

* Results can be sorted by relevance or by any number of properties in
  ascending or descending order

* Limit searches to parts of documents such as certain HTML tags
  (META, TITLE, comments, etc.) or to XML elements.

* Can report structural errors in your XML and HTML documents

* Index file is portable between platforms.

* A Swish-e library is provided to allow embedding Swish-e into your
  applications. A Perl module is available that provides a standard
  API for accessing Swish-e.

* Includes example search scripts

* Swish-e is fast.

* It's open source and FREE! You can customize Swish-e and you can
  contribute your fancy new features to the project.

* Supported by on-line user and developer groups

Where do I get Swish-e?
The current version of Swish-e can be found at:

    http://Swish-e.org

Please make sure you use a current version of Swish-e.

Information about Windows binary distributions can also be found at this
site.

How Do I Install Swish-e?
Read the INSTALL page.

Building from source is recommended. On most platforms Swish-e should
build without problems. Information on building for VMS and Win32 can be
found in sub-directories of the "src" directory. Check the Swish-e site
for information about binary distributions (such as for Windows).

In addition to the INSTALL page, make sure you read the SWISH-FAQ page
if you have any questions, or to get an idea of questions that you might
someday ask.

Problems or questions about installing Swish-e should be directed to the
Swish-e discussion list (see the Swish-e web site at
http://Swish-e.org).

The Swish-e Documentation
Documentation is provided in the Swish-e distribution package in two
forms: POD (Plain Old Documentation) and HTML. The POD documentation is
in the pod directory, and the HTML documentation is in the html
directory.

If your system includes the required support files and programs, the
distribution makefiles can also generate the documentation in these
formats:

    Postscript
    PDF (Adobe Acrobat)
    system man pages

You may also build a "split" version of the documentation where each
topic heading is a separate web page. Building the split version also
creates a Swish-e index of the documentation that makes the
documentation searchable via the included Perl CGI program.

Building these other forms of documentation requires additional helper
applications; most modern Linux distributions will include all that's
needed (at least mine does...). You shouldn't have a problem if you have
kept your Perl and Perl libraries up to date.

Online documentation can be found at the Swish-e web site listed above.

See INSTALL for information on creating the PDF and Postscript versions
of the documentation, and for information on installing the SWISH-*
documentation as Unix man(1) pages.

How do I read the Swish-e documentation?

The Swish-e documentation included with the distribution is in POD and
HTML formats. The POD documentation can be found in the pod directory,
and the HTML documentation can be found in the html directory.

To view the HTML documentation, point your browser to the
html/index.html file.

The POD documentation is displayed by the "perldoc" command that is
included with every Perl installation. For example, to view the Swish-e
installation documentation page called "INSTALL", type:

    perldoc pod/INSTALL

or, to make life easier:

    cd pod
    perldoc INSTALL
    perldoc SWISH-RUN

Complain to your system administrator if the "perldoc" command is not
available on your machine.

Included Documentation

The following documentation is included in this Swish-e distribution.

If you are new to Swish-e, read the INSTALL page to get Swish-e installed
and tested. Work through the example shown in the INSTALL page and the
examples in the conf directory. Also review the SWISH-FAQ.

* README - This file

* INSTALL - Installation and basic usage instructions

* SWISH-CONFIG - Configuration File Directives

* SWISH-RUN - Running Swish-e and Command Line Switches

* SWISH-SEARCH - All about Searching with Swish-e

* SWISH-FAQ - Common questions, and some answers

* SWISH-LIBRARY - Interface to the Swish-e C library

* SWISH-PERL - Instructions for using the Perl library

* CHANGES - List of feature changes and bug fixes

* SWISH-BUGS - List of known bugs in the release

Document Generation

The Swish-e documentation in HTML format was created with
Pod::HtmlPsPdf, a package of Perl modules written and/or modified by
Stas Bekman to automate the conversion of documents in pod format (see
perldoc perlpod) to HTML, Postscript, and PDF. A slightly modified
version of this package is included with the Swish-e distribution and
used for building the HTML. As distributed, Swish-e contains only the
pod and HTML documentation. See INSTALL for instructions on creating
man(1), Postscript, and PDF formats.

Thanks, Stas, for your help!

What's included in the Swish-e distribution?
Here's an overview of the directories included in the Swish-e
distribution:

conf/
    Example Swish-e configuration setups to help you get started. After
    reading the INSTALL page, and its included example, review the sample
    configuration in this directory.

conf/stopwords
    In the "conf/stopwords" sub-directory are a number of stopword files
    for different languages. Use of stopwords is not required with
    Swish-e.

doc/
    Contains files required for building the HTML, PDF, and Postscript
    documentation.

example/
    This contains a sample CGI script (swish.cgi) for searching with
    Swish-e. Documentation for using swish.cgi is included within the
    script. Type:

        perldoc example/swish.cgi

    from the top-level directory where the Swish-e distribution was
    unpacked.

filter-bin/
    Sample programs to use with Swish-e's "filters". Examples include
    PDF, MS Word, and binary strings filters. Filters often require
    installing separate document conversion programs.

html/
    The documentation in HTML format.

perl/
    The Perl interface to the Swish-e C library. This Perl module
    provides direct access to Swish-e from within your Perl programs. See
    the perl/README file for more information.

pod/
    The source for all documentation in perldoc (pod) format.

prog-bin/
    Example programs and modules to use with the "prog" document source
    access method. Examples include a web spider, a program to index
    directly from a MySQL database, and a program to recurse a directory
    tree. Example Perl modules are provided for converting PDF and
    MS-Word documents into a format usable by Swish-e. See
    prog-bin/README for an overview of the programs and modules, and
    check each file for included documentation.

    The prog-bin/spider.pl program is a web spider program with many
    features. It contains its own documentation. Type:

        perldoc prog-bin/spider.pl

    from the top-level directory where the Swish-e distribution was
    unpacked.

    The "prog" document source feature is very powerful, but can be a
    challenge to set up when first using Swish-e. Please contact the
    Swish-e discussion list if you have any questions.

src/
    This directory contains the source code for Swish-e. OS-specific
    directories are also found here.

tests/
    The documents used for running "make test".

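The "prog" document source described under prog-bin/ works by running an external program and reading documents from its standard output. As we understand the format, each document is a block of header lines (Path-Name and Content-Length are required, Last-Mtime is optional), a blank line, and then exactly Content-Length bytes of document text; treat prog-bin/README and the included programs as the authoritative reference. The sketch below, a hypothetical print-docs.sh, emits each regular file named on its command line in that format.

```shell
#!/bin/sh
# print-docs.sh (hypothetical name): a sketch of a "prog" document source.
# Assumed format per document: header lines, a blank line, then exactly
# Content-Length bytes of content.

emit_doc() {
    file=$1
    # Size in bytes (not characters); tr removes the padding some wc versions add.
    size=$(wc -c < "$file" | tr -d ' ')
    # Modification time in seconds since the epoch (GNU stat, else BSD stat).
    mtime=$(stat -c %Y "$file" 2>/dev/null || stat -f %m "$file")

    printf 'Path-Name: %s\n' "$file"
    printf 'Content-Length: %s\n' "$size"
    printf 'Last-Mtime: %s\n' "$mtime"
    printf '\n'      # blank line ends the headers
    cat "$file"      # the document body itself
}

for f in "$@"; do
    # Skip directories and anything else that is not a regular file.
    if [ -f "$f" ]; then
        emit_doc "$f"
    fi
done
```

Running it by hand (for example, sh print-docs.sh conf/*) is an easy way to inspect the headers before pointing Swish-e at it. Content-Length is computed with wc -c because it must count bytes: a character count would be wrong for multibyte text and would throw off where the next document is assumed to begin.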
Where do I get help with Swish-e?
If you need help with installing or using Swish-e, please subscribe to
the Swish-e mailing list. Visit the Swish-e web site listed above for
information on subscribing to the mailing list.

Before posting any questions, please read QUESTIONS AND TROUBLESHOOTING
in the INSTALL documentation page.

Speling mistakes
Please contact the Swish-e list with corrections to this documentation.
Any help in cleaning up the docs will be appreciated!

Any patches should be made against the .pod files, not the .html files.

Swish-e Development
Swish-e is currently being developed as an open source project on
SourceForge (http://sourceforge.net).

Contact the Swish-e list for questions.

Swish-e's History
SWISH was created by Kevin Hughes to fill the needs of the growing number
of Web administrators on the Internet; many of the indexing systems
were not well documented, were hard to use and install, and were too
complex for their own good. The system was widely used for several
years, long enough to collect some bug fixes and requests for
enhancements.

In Fall 1996, the Library of UC Berkeley received permission from Kevin
Hughes to implement bug fixes and enhancements to the original binary.
The result is Swish-enhanced or Swish-e, brought to you by the Swish-e
Development Team.

Document Info
Each document in the Swish-e distribution contains this section. It
refers only to the specific page it's located in, and not to the Swish-e
program or the documentation as a whole.

$Id: README.pod,v 1.11 2002/08/20 22:24:08 whmoseley Exp $