NAME
    The Swish-e README File

What is Swish-e?
    Swish-e is Simple Web Indexing System for Humans - Enhanced. Swish-e can
    quickly and easily index directories of files or remote web sites and
    search the generated indexes.

    Swish-e is extremely fast in both indexing and searching, highly
    configurable, and can be seamlessly integrated with existing web sites
    to maintain a consistent design. Swish-e can index web pages, but can
    just as easily index text files, mailing list archives, or data stored
    in a relational database.

    Swish-e version 2.2 represents a major rewrite of the code and the
    addition of many new features. Memory requirements for indexing have
    been reduced, and indexing speed is significantly improved from previous
    versions. New features allow more control over indexing, better document
    parsing, improved indexing and searching logic, better filter code, and
    the ability to index from any data source.

    Swish-e is not a "turn-key" indexing and searching solution. The Swish-e
    distribution contains most of the parts to create such a system, but you
    need to put the parts together as best meets your needs. You will need
    to configure Swish-e to index your documents, create an index by running
    Swish-e, and set up an interface such as a CGI script (a script is
    included). Swish-e uses helper programs to index documents of types that
    it cannot natively index. These programs may need to be installed
    separately from Swish-e.
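
    The three steps above (configure, index, search) can be sketched
    roughly as follows. This is an illustration only and is not part of
    the distribution: the file names swish.conf and index.swish-e, the
    ./docs directory, and the Python helper names are assumptions. The
    IndexDir, IndexFile, and IndexOnly directives and the -c, -f, and -w
    switches are covered in SWISH-CONFIG and SWISH-RUN; check those pages
    against your version.

```python
import shlex


def render_config(index_dir, index_file, suffixes):
    """Return the text of a small Swish-e configuration file.

    Uses only the IndexDir, IndexFile and IndexOnly directives.
    """
    lines = [
        f"IndexDir {index_dir}",
        f"IndexFile {index_file}",
        "IndexOnly " + " ".join(suffixes),
    ]
    return "\n".join(lines) + "\n"


def index_command(config_path):
    """Command line that builds an index from a configuration file."""
    return ["swish-e", "-c", config_path]


def search_command(index_file, words):
    """Command line that searches an existing index for the given words."""
    return ["swish-e", "-f", index_file, "-w", words]


if __name__ == "__main__":
    # Paths below are placeholders; nothing is written or executed here.
    print(render_config("./docs", "./index.swish-e", [".html", ".txt"]), end="")
    print(shlex.join(index_command("swish.conf")))
    print(shlex.join(search_command("./index.swish-e", "install")))
```

    The sketch only prints the configuration text and the two command
    lines. Saving the text as swish.conf and running the commands by hand
    is a reasonable first test before wiring up a CGI script.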

    Swish-e is an Open Source (see: http://opensource.org ) program
    supported by developers and a large group of users. Please take time to
    join the Swish-e discussion list at http://Swish-e.org.

  Key features

    *   Quickly index a large number of documents in different formats
        including text, HTML, and XML

    *   Use "filters" to index other types of files such as PDF, gzip, or
        Postscript.

    *   Includes a web spider for indexing remote documents over HTTP.
        Follows Robots Exclusion Rules (including META tags).

    *   Use an external program to supply documents to Swish-e, such as an
        advanced spider for your web server or a program to read and format
        records from a relational database.

    *   Document "properties" (some subset of the source document, usually
        defined as META or XML elements) may be stored in the index and
        returned with search results

    *   Document summaries can be returned with each search

    *   Word stemming, soundex, metaphone, and double-metaphone indexing for
        "fuzzy" searching

    *   Phrase searching and wildcard searching

    *   Limit searches to HTML links

    *   Use powerful regular expressions to select documents for indexing or
        exclusion

    *   Easily limit searches to parts or all of your web site

    *   Results can be sorted by relevance or by any number of properties in
        ascending or descending order

    *   Limit searches to parts of documents such as certain HTML tags
        (META, TITLE, comments, etc.) or to XML elements.

    *   Can report structural errors in your XML and HTML documents

    *   Index file is portable between platforms.

    *   A Swish-e library is provided to allow embedding Swish-e into your
        applications. A Perl module is available that provides a standard
        API for accessing Swish-e.

    *   Includes example search scripts

    *   Swish-e is fast.

    *   It's open source and FREE! You can customize Swish-e and you can
        contribute your fancy new features to the project.

    *   Supported by on-line user and developer groups
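
    A few of the search features listed above (phrases, wildcards,
    limiting to a metaname, sorting, and capping the result count) can be
    sketched as command lines. This is a rough illustration, not a
    reference: the helper names are invented here, "author" is a
    hypothetical metaname that would have to be defined in your
    configuration, and the property used for sorting must exist in your
    index. The query syntax is described in SWISH-SEARCH and the -w, -s,
    and -m switches in SWISH-RUN.

```python
def phrase(words):
    """Wrap words in double quotes so they are searched as a phrase."""
    return '"' + " ".join(words) + '"'


def in_metaname(name, expression):
    """Limit an expression to one metaname, for example author=(...)."""
    return f"{name}=({expression})"


def search_args(index_file, query, sort_by=None, descending=False,
                max_results=None):
    """Build an argument list for a swish-e search.

    sort_by names a property stored in the index; max_results caps the
    number of results returned.
    """
    args = ["swish-e", "-f", index_file, "-w", query]
    if sort_by:
        args += ["-s", sort_by, "desc" if descending else "asc"]
    if max_results is not None:
        args += ["-m", str(max_results)]
    return args


if __name__ == "__main__":
    # A trailing * is a wildcard: matches library, libraries, and so on.
    print(search_args("index.swish-e", "librar*", max_results=10))
    # Phrase search limited to the hypothetical "author" metaname.
    print(search_args("index.swish-e",
                      in_metaname("author", phrase(["kevin", "hughes"])),
                      sort_by="swishtitle"))
```

    Passing the arguments as a list (for example to subprocess.run)
    avoids shell quoting problems with the double quotes and the
    asterisk.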

Where do I get Swish-e?
    The current version of Swish-e can be found at:

    http://Swish-e.org

    Please make sure you use a current version of Swish-e.

    Information about Windows binary distributions can also be found at this
    site.

How Do I Install Swish-e?
    Read the INSTALL page.

    Building from source is recommended. On most platforms Swish-e should
    build without problems. Information on building for VMS and Win32 can be
    found in sub-directories of the "src" directory. Check the Swish-e site
    for information about binary distributions (such as for Windows).

    In addition to the INSTALL page, make sure you read the SWISH-FAQ page
    if you have any questions, or to get an idea of questions that you might
    someday ask.

    Problems or questions about installing Swish-e should be directed to the
    Swish-e discussion list (see the Swish-e web site at
    http://Swish-e.org).

The Swish-e Documentation
    Documentation is provided in the Swish-e distribution package in two
    forms: POD (Plain Old Documentation) and HTML. The POD documentation
    is in the pod directory, and the HTML documentation is in the html
    directory.

    If your system includes the required support files and programs, the
    distribution make files can also generate the documentation in these
    formats:

        Postscript
        PDF (Adobe Acrobat)
        system man pages

    You may also build a "split" version of the documentation where each
    topic heading is a separate web page. Building the split version also
    creates a Swish-e index of the documentation that makes the
    documentation searchable via the included Perl CGI program.

    Building these other forms of documentation requires additional helper
    applications -- most modern Linux distributions will include all that's
    needed (at least mine does...). You shouldn't have a problem if you have
    kept your Perl and Perl libraries up to date.

    Online documentation can be found at the Swish-e web site listed above.

    See INSTALL for information on creating the PDF and Postscript versions
    of the documentation, and for information on installing the SWISH-*
    documentation as Unix man(1) pages.

  How do I read the Swish-e documentation?

    The Swish-e documentation included with the distribution is in POD and
    HTML formats. The POD documentation can be found in the pod directory,
    and the HTML documentation can be found in the html directory.

    To view the HTML documentation point your browser to the html/index.html
    file.

    The POD documentation is displayed by the "perldoc" command that is
    included with every Perl installation. For example, to view the Swish-e
    installation documentation page called "INSTALL", type

       perldoc pod/INSTALL

    or to make life easier,

       cd pod
       perldoc INSTALL
       perldoc SWISH-RUN

    Complain to your system administrator if the "perldoc" command is not
    available on your machine.

  Included Documentation

    The following documentation is included in this Swish-e distribution.

    If you are new to Swish-e read the INSTALL page to get Swish-e installed
    and tested. Work through the example shown in the INSTALL page, and
    the examples in the conf directory. Also review the SWISH-FAQ.

    *   README - This file

    *   INSTALL - Installation and basic usage instructions

    *   SWISH-CONFIG - Configuration File Directives

    *   SWISH-RUN - Running Swish-e and Command Line Switches

    *   SWISH-SEARCH - All about Searching with Swish-e

    *   SWISH-FAQ - Common questions, and some answers

    *   SWISH-LIBRARY - Interface to the Swish-e C library

    *   SWISH-PERL - Instructions for using the Perl library

    *   CHANGES - List of feature changes and bug fixes

    *   SWISH-BUGS - List of known bugs in the release

  Document Generation

    The Swish-e documentation in HTML format was created with
    Pod::HtmlPsPdf, a package of Perl modules written and/or modified by
    Stas Bekman to automate the conversion of documents in pod format (see
    perldoc perlpod) to HTML, Postscript, and PDF. A slightly modified
    version of this package is included with the Swish-e distribution and
    used for building the HTML. As distributed, Swish-e contains only the
    pod and HTML documentation. See INSTALL for instructions on creating
    man(1), Postscript, and PDF formats.

    Thanks, Stas, for your help!

What's included in the Swish-e distribution?
    Here's an overview of the directories included in the Swish-e
    distribution:

    conf/
       Example Swish-e configuration setups to help you get started. After
       reading the INSTALL page, and its included example, review the sample
       configuration in this directory.

    conf/stopwords
       In the "conf/stopwords" sub-directory are a number of stopword files
       for different languages. Use of stopwords is not required with
       Swish-e.

    doc/
       Contains files required for building the HTML, PDF, and Postscript
       documentation.

    example/
       This contains a sample CGI script (swish.cgi) for searching with
       Swish-e. Documentation for using swish.cgi is included within the
       script. Type:

           perldoc example/swish.cgi

       from the top-level directory where the Swish-e distribution was
       unpacked.

    filter-bin/
       Sample programs to use with Swish-e's "filters". Examples include
       PDF, MS Word, and binary strings filters. Filters often require
       installing separate document conversion programs.

    html/
       The documentation in HTML format.

    perl/
       The Perl interface to the Swish-e C library. This Perl module
       provides direct access to Swish-e from within your Perl programs. See
       the perl/README file for more information.

    pod/
       The source for all documentation in perldoc (pod) format.

    prog-bin/
       Example programs and modules to use with the "prog" document source
       access method. Examples include a web spider, a program to index
       directly from a MySQL database, and a program to recurse a directory
       tree. Example Perl modules are provided for converting PDF and
       MS-Word documents into a format usable by Swish-e. See
       prog-bin/README for an overview of the programs and modules, and
       check each file for included documentation.

       The prog-bin/spider.pl program is a web spider program with many
       features. It contains its own documentation. Type:

           perldoc prog-bin/spider.pl

       from the top-level directory where the Swish-e distribution was
       unpacked.

       The "prog" document source feature is very powerful, but can be a
       challenge to set up when first using Swish-e. Please contact the
       Swish-e discussion list if you have any questions.
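
       As a rough illustration of what a "prog" program does, the sketch
       below writes each document as a small block of headers, a blank
       line, and then the content. It is not one of the programs shipped
       in prog-bin: the function names, the db/record/N path scheme, and
       the sample rows are made up, and the header names (Path-Name,
       Content-Length, Document-Type) are given as understood from
       SWISH-RUN, which remains the authority on the exact format.

```python
import sys


def prog_record(path_name, body, doc_type=None):
    """Format one document: headers, a blank line, then the content.

    Content-Length is the size of the content in bytes, so the text is
    encoded before it is measured.
    """
    data = body.encode("utf-8")
    headers = [f"Path-Name: {path_name}", f"Content-Length: {len(data)}"]
    if doc_type:
        # Optional hint telling Swish-e which parser to use.
        headers.append(f"Document-Type: {doc_type}")
    head = "\n".join(headers) + "\n\n"
    return head.encode("utf-8") + data


def emit(rows, out=None):
    """Write one record per (key, text) row to a binary stream."""
    out = out or sys.stdout.buffer
    for key, text in rows:
        out.write(prog_record(f"db/record/{key}", text, doc_type="TXT"))


if __name__ == "__main__":
    # Hypothetical rows standing in for the result of a database query.
    emit([(1, "first record\n"), (2, "second record\n")])
```

       Swish-e would run such a program when started with -S prog and the
       program named by -i (or by IndexDir in the configuration file). See
       SWISH-RUN for the details before relying on this sketch.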

    src/
       This directory contains the source code for Swish-e. OS-specific
       directories are also found here.

    tests/
       The documents used for running "make test".

Where do I get help with Swish-e?
    If you need help with installing or using Swish-e please subscribe to
    the Swish-e mailing list. Visit the Swish-e web site listed above for
    information on subscribing to the mailing list.

    Before posting any questions please read QUESTIONS AND TROUBLESHOOTING
    in the INSTALL documentation page.

Speling mistakes
    Please contact the Swish-e list with corrections to this documentation.
    Any help in cleaning up the docs will be appreciated!

    Any patches should be made against the .pod files, not the .html files.

Swish-e Development
    Swish-e is currently being developed as an open source project on
    SourceForge (http://sourceforge.net).

    Contact the Swish-e list for questions.

Swish-e's History
    SWISH was created by Kevin Hughes to fill the need of the growing number
    of Web administrators on the Internet. Many of the indexing systems
    were not well documented, were hard to use and install, and were too
    complex for their own good. The system was widely used for several
    years, long enough to collect some bug fixes and requests for
    enhancements.

    In Fall 1996, the Library of UC Berkeley received permission from Kevin
    Hughes to implement bug fixes and enhancements to the original binary.
    The result is Swish-enhanced or Swish-e, brought to you by the Swish-e
    Development Team.

Document Info
    Each document in the Swish-e distribution contains this section. It
    refers only to the specific page it's located in, and not to the Swish-e
    program or the documentation as a whole.

    $Id: README.pod,v 1.11 2002/08/20 22:24:08 whmoseley Exp $

    .