| 1 |
adcroft |
1.1 |
NAME |
The Swish-e README File

What is Swish-e?
Swish-e is Simple Web Indexing System for Humans - Enhanced. Swish-e can
quickly and easily index directories of files or remote web sites and
search the generated indexes.

Swish-e is extremely fast in both indexing and searching, highly
configurable, and can be seamlessly integrated with existing web sites
to maintain a consistent design. Swish-e can index web pages, but can
just as easily index text files, mailing list archives, or data stored
in a relational database.

Swish-e version 2.2 represents a major rewrite of the code and the
addition of many new features. Memory requirements for indexing have
been reduced, and indexing speed is significantly improved from previous
versions. New features allow more control over indexing, better document
parsing, improved indexing and searching logic, better filter code, and
the ability to index from any data source.

Swish-e is not a "turn-key" indexing and searching solution. The Swish-e
distribution contains most of the parts to create such a system, but you
need to put the parts together as best meets your needs. You will need
to configure Swish-e to index your documents, create an index by running
Swish-e, and set up an interface such as a CGI script (a script is
included). Swish-e uses helper programs to index documents of types that
it cannot natively index. These programs may need to be installed
separately from Swish-e.
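
The configure, index, and search steps described above can be sketched
roughly as follows. This is a minimal illustration, not a recommended
setup: the ./docs directory, the file names, and the search word are all
hypothetical, and the directives and switches shown should be checked
against SWISH-CONFIG and SWISH-RUN for your version.

```shell
#!/bin/sh
# Minimal sketch: write a config file, build an index, run a search.
# ./docs, the file names, and the search word are hypothetical.

cat > swish-example.conf <<'EOF'
# Index everything under ./docs (hypothetical directory)
IndexDir ./docs
# Only index files with these suffixes
IndexOnly .html .htm .txt
# Where to write the index
IndexFile ./example.index
EOF

if command -v swish-e >/dev/null 2>&1; then
    # -c names the configuration file to use while indexing
    swish-e -c swish-example.conf
    # -w gives the search words, -f the index file, -m caps the result count
    swish-e -w 'install' -f ./example.index -m 10
else
    echo "swish-e not found on PATH; skipping indexing and search" >&2
fi
```

The script only writes the configuration file when Swish-e is not
installed, so it is safe to run before the build is finished.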

Swish-e is an Open Source (see: http://opensource.org ) program
supported by developers and a large group of users. Please take time to
join the Swish-e discussion list at http://Swish-e.org.

Key features

* Quickly index a large number of documents in different formats
  including text, HTML, and XML

* Use "filters" to index other types of files such as PDF, gzip, or
  Postscript.

* Includes a web spider for indexing remote documents over HTTP.
  Follows Robots Exclusion Rules (including META tags).

* Use an external program to supply documents to Swish-e, such as an
  advanced spider for your web server or a program to read and format
  records from a relational database.

* Document "properties" (some subset of the source document, usually
  defined as META tags or XML elements) may be stored in the index and
  returned with search results

* Document summaries can be returned with each search

* Word stemming, soundex, metaphone, and double-metaphone indexing for
  "fuzzy" searching

* Phrase searching and wildcard searching

* Limit searches to HTML links

* Use powerful Regular Expressions to select documents for indexing or
  exclusion

* Easily limit searches to parts or all of your web site

* Results can be sorted by relevance or by any number of properties in
  ascending or descending order

* Limit searches to parts of documents such as certain HTML tags
  (META, TITLE, comments, etc.) or to XML elements.

* Can report structural errors in your XML and HTML documents

* Index file is portable between platforms.

* A Swish-e library is provided to allow embedding Swish-e into your
  applications. A Perl module is available that provides a standard
  API for accessing Swish-e.

* Includes example search scripts

* Swish-e is fast.

* It's open source and FREE! You can customize Swish-e and you can
  contribute your fancy new features to the project.

* Supported by on-line user and developer groups
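
As an illustration of the "filters" feature listed above, a filter is
simply a program that reads a file Swish-e cannot parse and prints
indexable text on standard output. The sketch below is an assumption
about how such a filter might look for gzip files; the function name and
sample file are hypothetical. A filter is attached to a file suffix with
the FileFilter directive; see SWISH-CONFIG for the exact syntax in your
version.

```shell
#!/bin/sh
# Sketch of a filter: read the file named in $1 and write plain text to
# stdout. Saved as its own script, the function body would be the whole
# program. The name gz_filter is hypothetical.
gz_filter() {
    gzip -dc "$1"
}

# Try the filter by hand on a small sample file
printf 'hello swish\n' > sample.txt
gzip -c sample.txt > sample.txt.gz
gz_filter sample.txt.gz
```

Running a filter by hand like this, before wiring it into the
configuration, makes it easier to tell filter problems apart from
indexing problems.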

Where do I get Swish-e?
The current version of Swish-e can be found at:

    http://Swish-e.org

Please make sure you use a current version of Swish-e.

Information about Windows binary distributions can also be found at this
site.

How Do I Install Swish-e?
Read the INSTALL page.

Building from source is recommended. On most platforms Swish-e should
build without problems. Information on building for VMS and Win32 can be
found in sub-directories of the "src" directory. Check the Swish-e site
for information about binary distributions (such as for Windows).

In addition to the INSTALL page, make sure you read the SWISH-FAQ page
if you have any questions, or to get an idea of questions that you might
someday ask.

Problems or questions about installing Swish-e should be directed to the
Swish-e discussion list (see the Swish-e web site at
http://Swish-e.org).

The Swish-e Documentation
Documentation is provided in the Swish-e distribution package in two
forms: POD (Plain Old Documentation) and HTML. The POD documentation is
in the pod directory, and the HTML documentation is in the html
directory, of course.

If your system includes the required support files and programs, the
distribution make files can also generate the documentation in these
formats:

    Postscript
    PDF (Adobe Acrobat)
    system man pages

You may also build a "split" version of the documentation where each
topic heading is a separate web page. Building the split version also
creates a Swish-e index of the documentation that makes the
documentation searchable via the included Perl CGI program.

Building these other forms of documentation requires additional helper
applications -- most modern Linux distributions will include all that's
needed (at least mine does...). You shouldn't have a problem if you have
kept your Perl and Perl libraries up to date.

Online documentation can be found at the Swish-e web site listed above.

See INSTALL for information on creating the PDF and Postscript versions
of the documentation, and for information on installing the SWISH-*
documentation as Unix man(1) pages.

How do I read the Swish-e documentation?

The Swish-e documentation included with the distribution is in POD and
HTML formats. The POD documentation can be found in the pod directory,
and the HTML documentation can be found in the html directory.

To view the HTML documentation point your browser to the html/index.html
file.

The POD documentation is displayed by the "perldoc" command that is
included with every Perl installation. For example, to view the Swish-e
installation documentation page called "INSTALL", type

    perldoc pod/INSTALL

or to make life easier,

    cd pod
    perldoc INSTALL
    perldoc SWISH-RUN

Complain to your system administrator if the "perldoc" command is not
available on your machine.

Included Documentation

The following documentation is included in this Swish-e distribution.

If you are new to Swish-e read the INSTALL page to get Swish-e installed
and tested. Work through the example shown in the INSTALL page, and
the examples in the conf directory. Also review the SWISH-FAQ.

* README - This file

* INSTALL - Installation and basic usage instructions

* SWISH-CONFIG - Configuration File Directives

* SWISH-RUN - Running Swish and Command Line Switches

* SWISH-SEARCH - All about Searching with Swish-e

* SWISH-FAQ - Common questions, and some answers

* SWISH-LIBRARY - Interface to the Swish-e C library

* SWISH-PERL - Instructions for using the Perl library

* CHANGES - List of feature changes and bug fixes

* SWISH-BUGS - List of known bugs in the release

Document Generation

The Swish-e documentation in HTML format was created with
Pod::HtmlPsPdf, a package of Perl modules written and/or modified by
Stas Bekman to automate the conversion of documents in pod format (see
perldoc perlpod) to HTML, Postscript, and PDF. A slightly modified
version of this package is included with the Swish-e distribution and
used for building the HTML. As distributed, Swish-e contains only the
pod and HTML documentation. See INSTALL for instructions on creating
man(1), Postscript, and PDF formats.

Thanks, Stas, for your help!

What's included in the Swish-e distribution?
Here's an overview of the directories included in the Swish-e
distribution:

conf/
    Example Swish-e configuration setups to help you get started. After
    reading the INSTALL page, and its included example, review the sample
    configuration in this directory.

conf/stopwords
    In the "conf/stopwords" sub-directory are a number of stopword files
    for different languages. Use of stopwords is not required with
    Swish-e.

doc/
    Contains files required for building the HTML, PDF, and Postscript
    documentation.

example/
    This contains a sample CGI script (swish.cgi) for searching with
    Swish-e. Documentation for using swish.cgi is included within the
    script. Type:

        perldoc example/swish.cgi

    from the top-level directory where the Swish-e distribution was
    unpacked.

filter-bin/
    Sample programs to use with Swish-e's "filters". Examples include
    PDF, MS Word, and binary strings filters. Filters often require
    installing separate document conversion programs.

html/
    The documentation in HTML format.

perl/
    The Perl interface to the Swish-e C library. This Perl module
    provides direct access to Swish-e from within your Perl programs. See
    the perl/README file for more information.

pod/
    The source for all documentation in perldoc (pod) format.

prog-bin/
    Example programs and modules to use with the "prog" document source
    access method. Examples include a web spider, a program to index
    directly from a MySQL database, and a program to recurse a directory
    tree. Example Perl modules are provided for converting PDF and
    MS-Word documents into a format usable by Swish-e. See
    prog-bin/README for an overview of the programs and modules, and
    check each file for included documentation.

    The prog-bin/spider.pl program is a web spider program with many
    features. It contains its own documentation. Type:

        perldoc prog-bin/spider.pl

    from the top-level directory where the Swish-e distribution was
    unpacked.

    The "prog" document source feature is very powerful, but can be a
    challenge to set up when first using Swish-e. Please contact the
    Swish-e discussion list if you have any questions.

src/
    This directory contains the source code for Swish-e. OS-specific
    directories are also found here.

tests/
    The documents used for running "make test".
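
To give a feel for the "prog" document source mentioned under prog-bin/,
here is a minimal sketch of an external program that supplies documents
to Swish-e. It assumes the header format described in SWISH-RUN (a
Path-Name and Content-Length header, a blank line, then the document
body); the function name and the ./notes directory are hypothetical, so
treat this as a starting point rather than a reference implementation.

```shell
#!/bin/sh
# Sketch of a "prog" document source: print each document as a small
# header block, a blank line, then the document body.
# emit_doc and ./notes are hypothetical names.

emit_doc() {
    path=$1
    # Content-Length is the byte count of the body that follows
    size=$(wc -c < "$path" | tr -d ' ')
    printf 'Path-Name: %s\n' "$path"
    printf 'Content-Length: %s\n' "$size"
    printf '\n'
    cat "$path"
}

# Emit every .txt file in a hypothetical ./notes directory
for f in ./notes/*.txt; do
    if [ -f "$f" ]; then
        emit_doc "$f"
    fi
done
```

Saved as an executable script, a program like this would typically be
named on the command line with something like "swish-e -S prog -i
./my-source.sh -c swish.conf" (the script and config names are
placeholders). Check SWISH-RUN for the headers and switches your version
supports.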

Where do I get help with Swish-e?
If you need help with installing or using Swish-e please subscribe to
the Swish-e mailing list. Visit the Swish-e web site listed above for
information on subscribing to the mailing list.

Before posting any questions please read QUESTIONS AND TROUBLESHOOTING
in the INSTALL documentation page.

Speling mistakes
Please contact the Swish-e list with corrections to this documentation.
Any help in cleaning up the docs will be appreciated!

Any patches should be made against the .pod files, not the .html files.

Swish-e Development
Swish-e is currently being developed as an open source project on
SourceForge (http://sourceforge.net).

Contact the Swish-e list for questions.

Swish-e's History
SWISH was created by Kevin Hughes to fill the need of the growing number
of Web administrators on the Internet - many of the indexing systems
were not well documented, were hard to use and install, and were too
complex for their own good. The system was widely used for several
years, long enough to collect some bug fixes and requests for
enhancements.

In Fall 1996, the Library of UC Berkeley received permission from Kevin
Hughes to implement bug fixes and enhancements to the original binary.
The result is Swish-enhanced or Swish-e, brought to you by the Swish-e
Development Team.

Document Info
Each document in the Swish-e distribution contains this section. It
refers only to the specific page it's located in, and not to the Swish-e
program or the documentation as a whole.

$Id: README.pod,v 1.11 2002/08/20 22:24:08 whmoseley Exp $

.