Contents of /mitgcm.org/devel/buildweb/pkg/swish-e/pod/INSTALL.pod

Revision 1.1.1.1 (vendor branch)
Fri Sep 20 19:47:29 2002 UTC by adcroft
Importing web-site building process.

1 =head1 NAME
2
3 INSTALL - Swish-e Installation Instructions
4
5 =head1 OVERVIEW
6
7 This document describes how to download, build and install Swish-e.
8 Also described is how to build Swish-e with optional, yet recommended libraries that
9 extend and enhance Swish-e.
10
11 This document also provides instructions on how to get help installing
12 and using Swish-e (and the important information you should provide when asking for help).
13
14 Also, below is a basic overview of using Swish-e to index documents, with pointers to
15 other more advanced examples.
16
17 For those in a hurry, see L<"Quick Start for the Impatient">.
18
19 =head1 SYSTEM REQUIREMENTS
20
21 Swish-e 2.x is written in C, and, up to this time, it has been tested on
22 Solaris 2.6, AIX 4.3.2, OpenVMS 7.2-1 AXP, RedHat Linux 6.2 (and other
23 Linux distributions) and Win32 platforms.
24
25 Unless you are using the Win32 binary distribution, a C compiler is needed.
26 Pretty much any standard compiler should do, although you will probably
27 have best luck with a current version of gcc. If you are using something
28 else (such as the native compilers on HP-UX or AIX) you may see more warnings during the build
29 process. Report any problems to the Swish-e discussion list
30 after searching the list archives.
31
32 B<libxml2>
33
34 http://www.xmlsoft.org/
35
36 Swish-e 2.2 can (and probably should) use the libxml2 library for parsing
37 HTML and XML files. Instructions for installing and enabling the library
38 are described below.
39
40 Currently, the libxml2 library is not required, but is a much better
41 parser than the tired old Swish-e html parser (html.c). Please see
42 the Swish-e FAQ L<SWISH-FAQ|SWISH-FAQ> for more discussion of the use
43 of libxml2.
44
45 Swish-e's old xml.c parser has been rewritten to use James Clark's Expat
46 library (included with the Swish-e distribution), but Swish-e's old
47 html.c code is still broken in a number of ways. Libxml2 is comparable to
48 Expat, but offers a much better HTML parser than Swish-e's html.c parser.
49 Use libxml2 if possible for parsing HTML and XML.
50
51 Currently, setting a content type
52 (L<IndexContents|SWISH-CONFIG/"item_IndexContents"> or L<DefaultContents|SWISH-CONFIG/"item_DefaultContents">)
53 of "HTML" uses Swish-e's html.c parser, whereas a setting of "HTML2" uses libxml2's HTML parser.
54 Likewise, a setting of "XML" uses the included Expat library, whereas "XML2"
55 uses libxml2 for parsing XML. All this may change in future releases.
56
57 B<zlib compression>
58
59 http://www.gzip.org/zlib/
60
61 Swish-e can make use of zlib to compress document properties. This is recommended
62 if you are using L<StoreDescription|SWISH-CONFIG/"item_StoreDescription">.
63
64 A Swish-e program built with zlib will read an index from a version of Swish-e that
65 was not built with zlib. But, if you are searching an index that was compressed with
66 zlib then you will need to use a version of Swish-e built with zlib. Therefore, it's
67 recommended to always include zlib support.
68
69
70 B<Memory>
71
72 Swish needs quite a bit of memory while indexing. How much depends
73 on what you are indexing. The index is portable between platforms,
74 so you can index on a machine that has lots of memory available and
75 move the index files to another machine for searching. Use the C<-e>
76 switch if you are short on memory.
77
78 B<Perl modules>
79
80 http://www.cpan.org
81
82 http://search.cpan.org
83
84 Swish-e uses a perl script for spidering web sites. The script
85 requires the LWP bundle of modules (see http://search.cpan.org/search?dist=libwww-perl ).
86 (Note: depending on your perl installation, you might need to install additional modules required
87 by LWP; for requirements and downloads check http://www.cpan.org
88 or http://search.cpan.org). The Perl helper script was tested with
89 perl 5.005, 5.6.0, and 5.6.1 although it should probably work with any version 5 release.
90 Do note that the LWP, HTTP, and HTML modules are updated often for bug
91 fixes and such -- do check for upgrades, and don't expect that your system admin
92 has been keeping up with bug fixes.
93
94
95 =head2 Platform Specific Information
96
97 A C<configure> script is used to determine platform specific details
98 for building swish. Please contact the Swish-e discussion list if you
99 notice any platform specific problems while building Swish-e.
100
101 Specific information for various platforms can be found in subdirectories
102 of the C<src> directory. For example, the Win32 files can be found
103 in C<src/win32>, and instructions for building under VMS can be found
104 in C<src/vms>.
105
106 The Windows binary is distributed as a separate package from the source
107 distribution. See http://Swish-e.org for download information.
108
109 =head1 INSTALLATION
110
111 Instructions below are for installing Swish-e from source.
112 Installing from source is recommended, but you should also check
113 the Swish-e web site for binary distributions for your platform.
114
115 Windows binary distributions are available from the Swish-e site.
116
117 =head2 Brief Instructions
118
119 ./configure
120 make
121 make test
122 su root
123 make install
124
125 Swish uses a F<configure> script to generate a Makefile for your platform.
126 The F<configure> script should detect and use optional libraries if found on
127 your system.
128
129 =head2 Using libxml2 parser library (optional, but recommended)
130
131 Daniel Veillard's libxml2 is a well supported library for working with
132 HTML and XML documents. As of version 2.2 Swish-e can use libxml2 to parse HTML and
133 XML documents.
134
135 Installing the libxml2 library is not required at this time, but is
136 recommended, especially if you are parsing HTML. As mentioned above,
137 the XML parser that is included with swish uses James Clark's Expat
138 library and works well. The HTML parser in Swish-e has been in use for
139 years, but the parser provided by libxml2 is preferred. The libxml2
140 HTML parser offers more features (and more features for parsing XML), and
141 is more accurate. If you are running Linux it may already be installed
142 (look for libxml2.so.2.4.5 or higher).
143
144 The library can be downloaded from http://www.xmlsoft.org/. Installation
145 directions are included in the INSTALL file in the libxml2 package.
146 Uncompressing, building, and installation of libxml2 is very similar to
147 the way Swish-e is built.
148
149 Many Linux distributions provide libxml2 packages directly via RPM or
150 the Debian package system. Check your distribution's web site for
151 more information, as this is a very easy way to install this library.
152
153 If libxml2 complains during compilation that it can not find zlib then
154 you may need to specify the location of zlib. This happens (on Solaris)
155 when the ./configure script finds the zlib header files, but the compiler
156 and linker do not know to look in /usr/local/lib for the library.
157 You may see an error like:
158
159 ld: fatal: library -lz: not found
160 ld: fatal: File processing errors. No output written to .libs/libxml2.so.2.4.5
161 *** Error code 1
162
163 In this case, try specifying where zlib can be found. For example,
164 if libz was located in /usr/local/lib you would use this when building
165 B<libxml2>:
166
167 # building libxml2 (not swish)
168 ./configure --with-zlib=/usr/local
169
170 Swish-e doesn't use libxml2's decompression features, so you *should*
171 be able to disable zlib when building B<libxml2>:
172
173 ./configure --without-zlib
174
175 B<NOTE:> But, that doesn't seem to work at this time (as of version
176 libxml2-2.4.5).
177
178 If you do not have root access you can specify a prefix when building B<libxml2>:
179
180 ./configure --prefix=$HOME/local
181
182 This will install the headers and library files in F<$HOME/local/include>
183 and F<$HOME/local/lib>. You will need to inform the Swish-e build
184 process of this non-standard directory location (explained below).
185
186 Once you run the libxml2 F<configure> script you build and install the library
187 as the libxml2 F<INSTALL> page instructs:
188
189 make
190 make install
191
192
193 B<Building Swish-e with libxml2>
194
195 Swish will try to detect if libxml2 is installed in the standard library locations.
196
197 If libxml2 is installed in your system and you do B<not> want to build with libxml2:
198
199 ./configure --without-libxml2
200
201 If libxml2 was installed in a non-standard location then specify the
202 path where libxml2 was installed. For example,
203
204 ./configure --with-libxml2=$HOME/local
205
206 If libxml2 is installed in a non-standard location, Swish-e needs to know
207 where that library is at run time. There are several ways
208 to do this. First, you can set the environment variable C<LD_RUN_PATH>
209 *before* running make to create Swish-e. This will add the path directly
210 to the Swish-e executable file.
211
212 For example, under Bourne type shells:
213
214 LD_RUN_PATH=$HOME/local/lib make
215
216 Other shells (like csh and tcsh) may require:
217
218 setenv LD_RUN_PATH $HOME/local/lib
219 make
220
221 Another option is to use the C<LD_LIBRARY_PATH> environment variable.
222 This is a list of directories to search for libraries when a program
223 is run. See the ld.so(8) man page (or your platform's equivalent) for more info.
224
225 Note that libxml2 will be linked as a shared library on many platforms, so once you
226 compile Swish-e to use the library, the libxml2 library must not be
227 deleted or moved.
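
One way to confirm which libxml2 the finished binary will load is to inspect it with C<ldd>, on platforms that provide that tool (Linux and Solaris do). The following is only a sketch: C<check_shared_lib> is a made-up helper name, and F<src/swish-e> assumes you are in the top-level build directory.

```shell
#!/bin/sh
# check_shared_lib <binary> <library name fragment>
# Hypothetical helper: prints the ldd line for the library and returns
#   0 if the library is listed and resolved,
#   1 if the binary is not linked against it (or ldd could not read the binary),
#   2 if the library is listed but the run-time linker cannot find it.
check_shared_lib() {
    line=$(ldd "$1" 2>/dev/null | grep "$2") || return 1
    echo "$line"
    case $line in
        *"not found"*) return 2 ;;
    esac
    return 0
}

# Example: only meaningful after "make" has produced src/swish-e.
if [ -x src/swish-e ]; then
    check_shared_lib src/swish-e libxml2 ||
        echo "libxml2 is not linked or cannot be located; check LD_RUN_PATH or LD_LIBRARY_PATH" >&2
fi
```

A return value of 2 usually means the library directory still has to be supplied through C<LD_RUN_PATH> at build time or C<LD_LIBRARY_PATH> at run time.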
228
229 =head2 Building Swish-e with zlib
230
231 Building with zlib is similar to the instructions for building Swish-e
232 with libxml2 above. The F<configure> script will attempt to detect if zlib is
233 installed in your system and if found link Swish-e with the zlib library.
234
235 zlib is common on many systems, but may be out of date, and versions prior to 1.1.4
236 have a known security issue. You should run
237 at least version 1.1.4. To link with zlib in a non-standard location use,
238 for example:
239
240 ./configure --with-zlib=$HOME/zlib
241
242 Again, as with compiling libxml2, you may need to use the C<LD_RUN_PATH>
243 or C<LD_LIBRARY_PATH> variables. See above for more details.
244
245
246 =head2 Downloading and unpacking and building Swish-e
247
248 If you are reading this INSTALL document, then you probably already have
249 downloaded and unpacked the distribution. But just in case...
250
251 Make sure you are using the current release from
252 http://Swish-e.org. If you have any questions about which version to use, please
253 ask on the Swish-e discussion list.
254
255 How you download Swish-e is up to you: lynx, lwp-download, and
256 wget are all common methods.
257
258 =over 3
259
260 =item 1 Uncompress the distribution file
261
262
263 gzip -dc swish-e.x.x.tar.gz | tar xof -
264
265 or on some versions of tar, simply
266
267 tar -zxof swish-e.x.x.tar.gz
268
269 Uncompressing should create the following directories:
270
271 swish-e-x.x/ configure script and top-level Makefile
272 swish-e-x.x/pod/ Swish-e documentation
273 swish-e-x.x/html/ HTML version of the documentation
274 swish-e-x.x/src/ source code
275 swish-e-x.x/conf/ example configuration files and stopword files
276 swish-e-x.x/example/ working example CGI scripts
277 swish-e-x.x/filter-bin/ filter samples
278 swish-e-x.x/prog-bin/ -S prog a web spider and other examples
279 swish-e-x.x/perl/ perl interface to the Swish-e C library
280 swish-e-x.x/src/expat/ James Clark's Expat XML parser
281 swish-e-x.x/src/win32/ win32 binary and build files
282 swish-e-x.x/src/vms/ files required for building under VMS
283 swish-e-x.x/tests/ tests used for running "make test"
284 swish-e-x.x/doc/ directory used for building the documentation
285
286
287 =item 2 Make any needed changes in F<src/config.h>
288
289 Compile-time configuration settings are adjusted in the file
290 F<src/config.h>. Most of the settings may also be specified in the
291 configuration file used during indexing.
292
293 You probably will B<not> need to change this file, but it's helpful
294 to become familiar with the default compiled-in settings.
295
296 =item 3 Build Swish-e
297
298 Building Swish-e on most systems is a simple procedure. In the
299 swish-e-x.x/ top-level directory type the following commands:
300
301 ./configure
302 make
303 make test
304
305 You should build swish as a normal user (i.e. not as "root").
306
307 Note: If you wish to use libxml2 or zlib please see the previous section
308 for the required configure options.
309
310 The above will create the Swish-e executable F<src/swish-e> and test
311 that the executable is working correctly. C<make test> will generate
312 an index file in the F<tests> directory and run a number of searches
313 against this index. At this time, the tests really just make sure that swish-e
314 was compiled correctly and runs.
315
316 You may optionally "build" the F<swish-search> executable. This is
317 a version of Swish-e that cannot write to the index file. This
318 version may provide somewhat improved security in a CGI environment.
319 The binaries F<swish-e> and F<swish-search> are the same files -- the
320 additional security is enabled when the binary is named I<swish-search>.
321 F<swish-search> is not a substitute for good file system and CGI security.
322 Please review the many CGI security papers available on-line.
323
324 Again, this is an optional step:
325
326 make swish-search
327
328 which simply copies the file F<swish-e> to F<swish-search>.
329
330 =item 4 Install Swish-e
331
332 Move the F<swish-e> (and/or F<swish-search>) executable to its final
333 location (normally /usr/local/bin). You may simply copy the program
334 anywhere you see fit, or you may use the C<make install> command to
335 install it to the location defined by the F<configure> script.
336
337 You may need superuser privileges:
338
339 su root
340 make install
341 exit
342
343 B<IMPORTANT:> Do not run swish-e as the superuser (root).
344
345 The bin directory may be set when first running F<./configure>. For example:
346
347 ./configure --bindir=$HOME/bin
348
349 sets the installation directory to F<$HOME/bin> and C<make install>
350 will install the program in that location.
351
352 =back
353
354 =head2 Join the Swish-e discussion list
355
356 The Swish-e discussion list is the place to ask questions about installing
357 and using Swish-e, see or post bug fixes or security announcements, and
358 a place where B<you> can offer help to others.
359
360 The list is typically I<very low traffic>, so it won't overload your
361 inbox. Please take time to subscribe. See http://Swish-e.org.
362
363 If you are using Swish-e on a public site, please let the list know so
364 it can be added to the list of sites that use Swish-e!
365
366 Please review L<QUESTIONS AND TROUBLESHOOTING|QUESTIONS AND TROUBLESHOOTING> before posting
367 a question to the Swish-e list.
368
369 =head2 Installing the Swish-e C Library (optional)
370
371 Swish 2.2 creates the C library F<libswish-e.a> during the build.
372 Install this library if you wish to embed Swish-e into another
373 application. For example, the library should be installed
374 before using the high level Perl SWISH modules located on
375 CPAN. http://search.cpan.org/search?mode=module&query=SWISH
376
377 This is an *optional* step. Most users will not need to install the library.
378
379 To install the library issue the following commands (again, you may need
380 to su root)
381
382 su root
383 make install-lib
384 exit
385
386 By default this will install the library in /usr/local/lib, but this
387 directory can be set when running ./configure with the --libdir option.
388 For example:
389
390 ./configure --bindir=$HOME/bin --libdir=$HOME/lib
391
392 So C<make install> will install the F<swish-e> binary in F<$HOME/bin>
393 and C<make install-lib> will install the F<libswish-e.a> library in
394 F<$HOME/lib>.
395
396 Note: You may wish to run C<make realclean> before running ./configure again.
397
398 =head2 Creating PDF and Postscript documentation (optional)
399
400 The Swish-e documentation in HTML format was created with Pod::HtmlPsPdf,
401 a package of Perl modules written and/or modified by Stas Bekman to automate
402 the conversion of documents in pod format (see perldoc perlpod) to HTML,
403 Postscript, and PDF. A slightly modified version of this package is
404 included with the Swish-e distribution and used for building the HTML.
405
406 If your system has the B<necessary tools> to build Postscript and the
407 converter ps2pdf installed, you may be able to build the Postscript
408 and PDF versions of the documentation. After you have run ./configure,
409 type from the top-level directory of the distribution:
410
411 make pdf
412
413 And with any luck you will end up with these two files in the top-level directory:
414
415 swish-e_documentation.pdf
416 swish-e_documentation.ps
417
418 Most people find reading the documentation in HTML most convenient.
419
420 =head2 Installing the Swish-e documentation as man(1) pages (optional)
421
422 Part of the included Swish-e documentation can be installed as system
423 man(1) pages. Only the reference related pages are installed (it's
424 assumed that you don't need to install the README or INSTALL documents as
425 man pages). You must have the pod2man program installed on your system
426 (which you probably do if you have Perl).
427
428 To build the man pages and install them into your system, type from the
429 top-level directory (after running ./configure):
430
431 su root
432 make install-man
433 exit
434
435 You will need to C<su root> if you do not have write access to the man directory.
436
437 The man pages are installed in the system man directory. This directory
438 is determined by running ./configure and can be set by passing the
439 directory when running ./configure.
440
441 For example,
442
443 ./configure --mandir=/usr/local/doc/man
444
445 Information on running ./configure can be found by typing:
446
447 ./configure --help
448
449 The pod source files used to create the man files were written running
450 under perl 5.6.1. Older versions of Perl may complain slightly about the
451 formatting of the pod files. This shouldn't be a problem, but please
452 let the Swish-e list know if otherwise. Then upgrade your version of perl. ;)
453
454 =head1 QUESTIONS AND TROUBLESHOOTING
455
456 Please search the Swish-e list archive before posting a question, and
457 check the L<SWISH-FAQ|SWISH-FAQ> to see if your question has already
458 been asked.
459
460 Support for installation, configuration and usage is available via the
461 Swish-e discussion list. Visit http://swish-e.org for information.
462 Do not contact developers directly for help -- always post your question
463 to the list.
464
465 Before posting, use the available tools to narrow down the problem.
466
467 Swish-e has the -T, -v, and -k switches that may help resolve issues.
468 If possible find a single document that shows the problem, then index
469 with -T INDEXED_WORDS and watch the exact words that are indexed.
470 Use -H 9 when searching and look at C<Parsed Words:> to make sure you
471 are searching the correct words.
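
The two debugging steps above might be scripted roughly as follows. This is a sketch under assumptions: the C<-i> switch (name the file to index) is not described in this document, F<debug.index> is an arbitrary scratch index name, and the helper names and the C<SWISH> variable are invented for illustration.

```shell
#!/bin/sh
# Path to the swish-e binary; adjust for your installation (assumption).
SWISH=${SWISH:-../src/swish-e}
# Scratch index used only for debugging (arbitrary name).
DEBUG_INDEX=${DEBUG_INDEX:-debug.index}

# trace_indexed_words <document>: index one document and list each word as it is indexed.
trace_indexed_words() {
    "$SWISH" -i "$1" -f "$DEBUG_INDEX" -T INDEXED_WORDS
}

# show_parsed_query <query>: search the scratch index with full headers,
# so the "Parsed Words:" header can be compared with the words indexed.
show_parsed_query() {
    "$SWISH" -f "$DEBUG_INDEX" -w "$1" -H 9
}

# Run only when a binary and a document (first script argument) are available.
if [ -x "$SWISH" ] && [ -f "${1:-}" ]; then
    trace_indexed_words "$1"
    show_parsed_query "${2:-sample}"
fi
```

Keeping the scratch index separate from your real index means the experiment cannot disturb a working search setup.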
472
473 You can also use programs like C<gdb> to help find segfaults and other
474 run-time errors, and programs like C<truss> or C<strace> can often
475 provide interesting information, if you are adventurous.
476
477 =head2 When posting please provide the following information:
478
479 =over 4
480
481 =item *
482
483 The exact version of Swish-e that you are using. Running Swish-e with the
484 C<-V> switch will print the version number. Also, supply the output from
485 C<uname -a> or similar command that identifies the operating system you
486 are running on. If you are running an old version of swish be prepared
487 for a response to your question of "upgrade."
488
489 =item *
490
491 A summary of the problem. This should include the commands issued
492 (e.g. for indexing or searching) and their output, and why you don't
493 think it's working correctly. Please cut-n-paste the exact commands
494 and their output instead of retyping to avoid errors.
495
496 =item *
497
498 Include a copy of the configuration file you are using, if any. Swish-e
499 has reasonable defaults so in many cases you can run it without using
500 a configuration file. But, if you need to use a configuration file,
501 reduce it down to the absolute minimum number of commands required to
502 demonstrate your problem. Again, cut-n-paste.
503
504 =item *
505
506 A small copy of a source document that demonstrates the problem.
507
508 If you are having problems spidering a web server, use lwp-download or
509 wget to copy the file locally to make sure you can index the document
510 using the file system method.
511
512 If you do need help with spidering, don't post fake URLs, as it makes it
513 impossible to help. If you don't want to expose your web page to the
514 people on the Swish-e list, find some other site to test spidering on.
515 If that works, but you still cannot spider your own site then post your
516 real URL if you want help.
517
518 =item *
519
520 If you are having trouble building Swish-e please cut-n-paste the output
521 from make (or from ./configure if that's where the problem is).
522
523
524 =back
525
526 =head1 BASIC CONFIGURATION AND USAGE
527
528 This section should give you a basic overview of indexing and searching
529 with B<Swish-e>. Other examples can be found in the F<conf> directory, which will
530 step you through a number of different configurations.
531 Also, please review the L<SWISH-FAQ|SWISH-FAQ>.
532
533 Swish-e reads a configuration file (see L<SWISH-CONFIG|SWISH-CONFIG>)
534 for directives that control what and how Swish-e indexes files.
535 Then running Swish-e is controlled by command line arguments (see
536 L<SWISH-RUN|SWISH-RUN>).
537
538 Swish-e does not require a configuration file, but
539 most people need to change the default behavior by placing settings
540 in a configuration file.
541
542 To try the examples below change to the F<tests> subdirectory of the
543 distribution. The tests will use the *.html files in this directory when
544 creating the test index. You may wish to review these *.html files to
545 get an idea of the various native file formats that Swish-e supports.
546
547 =head2 Step 1: Create a Configuration File
548
549 The configuration file controls what and how Swish-e indexes. The
550 configuration file consists of directives, comments, and blank lines.
551 The configuration file can be any name you like.
552
553 This example will work with the documents in the F<tests> directory.
554 You may wish to review the F<tests/test.config> configuration file used
555 for the C<make test> tests.
556
557 For example, a simple configuration file (F<swish-e.conf>):
558
559 # Example Swish-e Configuration file
560
561 # Define *what* to index
562 # IndexDir can point to directories and/or files
563
564 # Here it's pointing to the current directory
565 IndexDir .
566
567 # But only index the .html files
568 IndexOnly .html
569
570 # Show basic info while indexing
571 IndexReport 1
572
573 And that's a simple configuration file. It says to index all the
574 .html files in the current directory, and provide some basic output
575 while indexing.
576
577 The complete list of all configuration file directives is described
578 in L<SWISH-CONFIG|SWISH-CONFIG>.
579
580 =head2 Step 2: Index your Files
581
582 Now, make sure you are in the F<tests> directory and save the above
583 example configuration file as F<swish-e.conf>. Then run Swish-e using
584 the C<-c> switch to specify the name of the configuration file.
585
586 ../src/swish-e -c swish-e.conf
587
588 Indexing Data Source: "File-System"
589 Indexing "."
590 Removing very common words...
591 no words removed.
592 Writing main index...
593 Sorting words ...
594 Sorting 55 words alphabetically
595 Writing header ...
596 Writing index entries ...
597 Writing word text: Complete
598 Writing word hash: Complete
599 Writing word data: Complete
600 55 unique words indexed.
601 Writing file list ...
602 Property Sorting complete.
603 Writing sorted index ...
604 5 files indexed. 1252 total bytes.
605 Elapsed time: 00:00:00 CPU time: 00:00:00
606 Indexing done!
607
608 This created the index file F<index.swish-e>. This is the default
609 index file name unless the B<IndexFile> directive is specified in the
610 configuration file:
611
612 IndexFile ./website.index
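
When the index is not called F<index.swish-e>, searches need to name it with the C<-f> switch. A small sketch, assuming the index was written to F<./website.index> as in the directive above; the C<search_index> helper and the C<SWISH> variable are illustrative names only.

```shell
#!/bin/sh
# Path to the swish-e binary; adjust for your installation (assumption).
SWISH=${SWISH:-../src/swish-e}

# search_index <words> <index file>: hypothetical helper that searches a named index.
search_index() {
    "$SWISH" -w "$1" -f "$2"
}

# Only run the example search if the binary and the index are present.
if [ -x "$SWISH" ] && [ -f ./website.index ]; then
    search_index sample ./website.index
fi
```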
613
614 =head2 Step 3: Search
615
616 You specify your search terms with the C<-w> switch. For example, to find
617 the files that contain the word B<sample> you would issue the command:
618
619 ../src/swish-e -w sample
620
621 This example assumes that you are in the F<tests> directory, and the
622 Swish-e binary is in the F<../src> directory. Swish-e returns in response
623 to that command the following:
624
625 ../src/swish-e -w sample
626
627 # SWISH format: 2.2
628 # Search words: sample
629 # Number of hits: 2
630 # Search time: 0.000 seconds
631 # Run time: 0.005 seconds
632 1000 ./test_xml.html "If you are seeing this, the METATAG XML search was successful!" 159
633 1000 ./test.html "If you are seeing this, the test was successful!" 437
634 .
635
636 So the word B<sample> was found in two documents. The first number
637 shown is the relevance or rank of the search term, followed by the file
638 containing the search term, the title of the document, and finally the
639 length of the document.
640
641 The period (".") alone at the end marks the end of results.
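
Because each result line follows the fixed layout described above, it is easy to pick apart in a script. The following sketch assumes the default output format shown in this section (rank, path, quoted title, size) and a title that does not itself contain double quotes; C<parse_results> is a made-up helper name.

```shell
#!/bin/sh
# parse_results: read swish-e search output on stdin and print one
# "rank=... path=... size=... title=..." line per hit.
# Header lines (starting with "#") do not match the pattern and are skipped;
# the lone "." that ends the results stops processing.
parse_results() {
    sed -n \
        -e '/^\.$/q' \
        -e 's/^\([0-9][0-9]*\) \(.*\) "\(.*\)" \([0-9][0-9]*\)$/rank=\1 path=\2 size=\4 title=\3/p'
}

# Demonstration using the sample output from this section.
parse_results <<'EOF'
# SWISH format: 2.2
# Search words: sample
# Number of hits: 2
1000 ./test_xml.html "If you are seeing this, the METATAG XML search was successful!" 159
1000 ./test.html "If you are seeing this, the test was successful!" 437
.
EOF
```

With a real index, the same function could read from a pipe, for example C<../src/swish-e -w sample | parse_results>.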
642
643 Much more information may be retrieved while searching by using
644 the C<-x> and C<-H> switches (see L<SWISH-RUN|SWISH-RUN>)
645 and by using Document Properties (see L<SWISH-CONFIG|SWISH-CONFIG>).
646
647 =head2 Phrase Searching
648
649 To search for a phrase in a document use double-quotes to delimit your
650 search terms. (The phrase delimiter is set in src/swish.h.)
651
652 You must protect the quotes from the shell.
653
654 For example, under Unix:
655
656 swish-e -w '"this is a phrase" or (this and that)'
657 swish-e -w 'meta1=("this is a phrase") or (this and that)'
658
659 Or, under the Windows F<command.com> shell:
660
661 swish-e -w \"this is a phrase\" or (this and that)
662
663 The phrase delimiter can be set with the C<-P> switch.
664
665 =head2 Boolean Searching
666
667 You can use the Boolean operators B<and>, B<or>, or B<not> in searching.
668 Without these Boolean operators, Swish-e will assume you're B<and>ing the words together.
669
670 Here are some examples:
671
672 ../src/swish-e -w 'apples oranges'
673 ../src/swish-e -w 'apples and oranges' ( Same thing )
674
675 ../src/swish-e -w 'apples or oranges'
676
677 ../src/swish-e -w 'apples or oranges not juice' -f myIndex
678
679 retrieves first the files that contain either of the words "apples" or "oranges";
680 then among those the ones that do not contain the word "juice".
681
682 A few others to ponder:
683
684 ../src/swish-e -w 'apples and oranges or pears'
685 ../src/swish-e -w '(apples and oranges) or pears' ( Same thing )
686 ../src/swish-e -w 'apples and (oranges or pears)' ( Not the same thing )
687
688 See L<SWISH-SEARCH|SWISH-SEARCH> for more information.
689
690
691 =head2 Context Searching
692
693 The C<-t> option in the search command line allows you to search for
694 words that exist only in specific HTML tags. Each character in the
695 string you specify in the argument to this option represents a different
696 tag in which the word is searched; that is you can use any combinations
697 of the following characters:
698
699 H means all <HEAD> tags
700 B stands for <BODY> tags
701 t is all <TITLE> tags
702 h is <H1> to <H6> (header) tags
703 e is emphasized tags (this may be <B>, <I>, <EM>, or <STRONG>)
704 c is HTML comment tags (<!-- ... -->)
705
706 For example:
707
708 # Find only documents with the word "linux" in the <TITLE> tags.
709 ./swish-e -w linux -t t
710
711 # Find the word "apple" in titles or comments
712 ./swish-e -w apple -t tc
713
714
715 =head2 META Tags
716
717 For the last example we will instruct Swish-e to use META tags to define
718 I<fields> in your documents.
719
720 META names are a way to define "fields" in your documents. You can
721 use the META names in your queries to limit the search to just the words
722 contained in that META name of your document. For example, you might have
723 a META tagged field in your documents called C<subjects> and then you can
724 search your documents for the word "foo" but only return documents where
725 "foo" is within the C<subjects> META tag.
726
727 Document I<Properties> are somewhat related to meta tags: Properties
728 allow the contents of a META tag in a source document to be stored within
729 the index, and that text to be returned along with search results.
730
731 META tags can have two formats in your documents.
732
733 <META NAME="keyName" CONTENT="some Content">
734
735 And in XML format
736
737 <keyName>
738 Some Content
739 </keyName>
740
741 If using libxml2, you can optionally use a non-html tag as a metaname:
742
743 <html>
744 <body>
745 Hello swish users!
746 <keyName>
747 this is meta data
748 </keyName>.
749 </body>
750
751 This, of course, is invalid HTML.
752
753 To continue with our sample F<swish-e.conf> file, add the following lines:
754
755 # Define META tags
756 MetaNames meta1 meta2 meta3
757
758 Reindex to include the changes:
759
760 ../src/swish-e -c swish-e.conf
761
762 Now search, but this time limit your search to META tag "meta1":
763
764 ../src/swish-e -w 'meta1=metatest1'
765
766 Again, please see L<SWISH-RUN|SWISH-RUN> and L<SWISH-CONFIG|SWISH-CONFIG>
767 for complete documentation of the various indexing and searching options.
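
To try META name searching on a document of your own rather than the supplied test files, something like the following could be used. It is only a sketch: the file names F<meta_demo.html> and F<meta_demo.conf>, the temporary directory, and the assumption that C<swish-e> is on your C<PATH> are all illustrative choices, not part of the distribution.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in the distribution is touched.
workdir=$(mktemp -d)

# A minimal document whose META tag "meta1" holds the word "metatest1".
cat > "$workdir/meta_demo.html" <<'EOF'
<html>
<head>
<title>META name demo</title>
<META NAME="meta1" CONTENT="metatest1">
</head>
<body>Plain body text.</body>
</html>
EOF

# A configuration that indexes the .html files here and declares the META name.
cat > "$workdir/meta_demo.conf" <<'EOF'
IndexDir .
IndexOnly .html
MetaNames meta1
EOF

# Index and search only if a swish-e binary is available on the PATH.
if command -v swish-e >/dev/null 2>&1; then
    ( cd "$workdir" &&
      swish-e -c meta_demo.conf &&
      swish-e -w 'meta1=metatest1' ) ||
        echo "swish-e run failed; see the output above" >&2
fi

echo "Demo files are in $workdir (remove the directory when finished)."
```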
768
769 =head2 Additional Examples
770
771 The above example indexes local files using the file system access method
772 C<-S fs>. You may also index files that are located on a local or remote
773 web server by using the HTTP access method C<-S http>, or via the prog
774 input method C<-S prog>. These are described in L<SWISH-RUN|SWISH-RUN>
775 and example configuration files for using these methods can be found in
776 the F<conf> directory of the Swish-e distribution.
777
778 The C<-S prog> access method can be used to index any type of document,
779 such as documents stored in a database (RDBMS), or documents that need
780 to be processed before they can be indexed. Examples for using the
781 C<-S prog> method are shown in the F<prog-bin> directory.
782
783 Swish-e can also use I<filters> to convert documents as they are
784 processed by Swish-e. For example, MS-Word or PDF documents can be
785 converted and indexed by Swish-e by using filters. See the section on
786 filters in L<SWISH-CONFIG|SWISH-CONFIG>, and the examples shown in the
787 C<filter-bin> directory.
788
789 =head1 QUICK START FOR THE IMPATIENT
790
791 Here's I<one> example of the steps to install Swish-e, index documents by spidering, and
792 how to search using the included CGI script.
793
794 These steps are on Linux, and assume
795 that you have the libraries libxml2 and zlib installed in the system, you have a current version of Perl
796 and current versions of the LWP, HTML::*, and HTTP::* modules installed, and Apache is installed and operating.
797
798 If you have any trouble with these instructions, please read the detailed installation instructions above
799 and see the documentation included with the F<swish.cgi> and F<spider.pl> programs.
800 Please don't ask for help without reading the "real" documentation first.
801
802 Not all output is included below. You should carefully watch for errors while building Swish-e.
803
804 =over 4
805
806 =item 1 Download and build Swish-e
807
808 ~ $ wget http://swish-e.org/<path to current swish-e version>.tar.gz
809 ~ $ tar zxof <path to current swish-e version>.tar.gz
810 ~ $ cd swish-e-2.2 (this directory will depend on the version of Swish-e)
811
812 ~/swish-e-2.2 $ ./configure
813 ~/swish-e-2.2 $ make
814 ~/swish-e-2.2 $ make test
815 ...
816 ** All tests completed! **
817
818 =item 2 Make a working directory and copy files
819
820 ~/swish-e-2.2 $ mkdir ~/swishtest
821 ~/swish-e-2.2 $ cd ~/swishtest
822
823 ~/swishtest $ cp ~/swish-e-2.2/src/swish-e .
824 ~/swishtest $ cp ~/swish-e-2.2/prog-bin/spider.pl .
825 ~/swishtest $ cp ~/swish-e-2.2/example/swish.cgi .
826 ~/swishtest $ cp -rp ~/swish-e-2.2/example/modules/ .
827 ~/swishtest $ chmod 755 swish.cgi spider.pl
828 ~/swishtest $ chmod 644 modules/*
829
830 =item 3 Create the index
831
832 You must create a swish configuration file and a spider configuration
833 file.
834
835 ~/swishtest $ cat swish.conf
836
837 # Program to read documents
838 IndexDir ./spider.pl
839
840 # Define the config file for the spider to use
841 SwishProgParameters spider.conf
842
843 # Use libxml2 for parsing documents
844 DefaultContents HTML2
845 IndexContents TXT2 txt
846
847 # Cache document contents in the index for context display
848 StoreDescription HTML2 <body>
849
850
851 ~/swishtest $ cat spider.conf
852
853 # Example spider configuration file to index the
854 # split version of the swish-e documentation
855
856 @servers = (
857 {
858
859 base_url => 'http://swish-e.org/2.2/docs/split/index.html',
860 same_hosts => [ qw/www.swish-e.org/ ],
861 email => 'swish-impatient@domain.invalid',
862 delay_min => .0001,
863
864 # Define call-back functions to fine-tune the spider
865
866 test_url => sub {
867 my $uri = shift;
868
869 # Skip requesting files that are probably not text
870 return if $uri->path =~ m[\.(?:gif|jpeg|png)$]i;
871
872
873 # Limit spidering to the /2.2/docs/split/ path
874 return unless $uri->path =~ m[/2\.2/docs/split/];
875
876 return 1; # otherwise, ok to search
877 },
878
879
880 # Only index text/html or text/plain
881 test_response => sub {
882 my ( $uri, $server, $response ) = @_;
883
884 return $response->content_type =~ m[(?:text/html|text/plain)];
885 },
886 },
887 );
888 1;
889
890 Now begin indexing:
891
892 ~/swishtest $ ./swish-e -S prog -c swish.conf -v 2
893 Indexing Data Source: "External-Program"
894 Indexing "./spider.pl"
895 ./spider.pl: Reading parameters from 'spider.conf'
896 Processing http://swish-e.org/2.2/docs/split/index.html...
897 Processing http://swish-e.org/2.2/docs/split/index_long.html...
898 Processing http://swish-e.org/2.2/docs/split/search.cgi..
899 ...
900 2566 unique words indexed.
901 5 properties sorted.
902 155 files indexed. 609775 total bytes. 49962 total words.
903 Elapsed time: 00:00:33 CPU time: 00:00:01
904 Indexing done!
905
906 =item 4 Test swish-e from the command line
907
908 ~/swishtest $ ./swish-e -w foo -m 1
909 # SWISH format: 2.1-dev-25
910 # Search words: foo
911 # Number of hits: 18
912 # Search time: 0.000 seconds
913 # Run time: 0.038 seconds
914 1000 http://swish-e.org/2.2/docs/split/SWISH-CONFIG/Document_Contents_Directives.html "SWISH-CONFIG/Document Contents Directives" 57466
915 .
916
917
918 =item 5 Test the CGI script from the command line
919
920 ~/swishtest $ ./swish.cgi | head
921 Content-Type: text/html; charset=ISO-8859-1
922
923 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
924 <html>
925 <head>
926 <title>
927 Search our site
928 </title>
929 </head>
930 <body>
931
932 Refer to the swish.cgi documentation if you have any problems with running the CGI script.
933
934 =item 6 Configure Apache
935
936 ~/swishtest $ su -c "ln -s $HOME/swishtest /usr/local/apache/htdocs/swishdocs"
937 Password: *********
938
939 ~/swishtest $ cat .htaccess
940 # Deny everything by default
941 Deny From All
942
943 # But allow just the CGI script
944 <files swish.cgi>
945 Options ExecCGI
946 Allow From All
947 SetHandler cgi-script
948 </files>
949
950 =item 7 Test the CGI script through Apache from the command line
951
952 ~/swishtest $ GET http://localhost/swishdocs/swish.cgi?query=install | head
953 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
954 <html>
955 <head>
956 <title>
957 43 Results for [install]
958 </title>
959 </head>
960 <body>
961
962 =back
963
964 Now you are ready to search.
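
Step 3 shows the contents of F<swish.conf> but not how the file is created. One way to do it, sketched here, is a shell here-document. The directives are copied from step 3. The scratch directory and the C<workdir> variable are assumptions for illustration, and you would normally write the file into F<~/swishtest> instead.

```shell
#!/bin/sh
# Sketch only: write the swish.conf from step 3 with a here-document.
# workdir is a hypothetical scratch location; use ~/swishtest in practice.
workdir=$(mktemp -d)

# The quoted 'EOF' keeps the shell from expanding anything in the body.
cat > "$workdir/swish.conf" <<'EOF'
# Program to read documents
IndexDir ./spider.pl

# Define the config file for the spider to use
SwishProgParameters spider.conf

# Use libxml2 for parsing documents
DefaultContents HTML2
IndexContents TXT2 txt

# Cache document contents in the index for context display
StoreDescription HTML2 <body>
EOF

echo "wrote $workdir/swish.conf"
```

The same approach works for F<spider.conf> and F<.htaccess>. Quoting the delimiter matters for F<spider.conf>, since its Perl variables such as C<$uri> would otherwise be expanded by the shell.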
965
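The header lines that Swish-e prints before the results in step 4 are easy to pick apart in a script. The sketch below extracts the hit count from search output read on standard input. The helper name C<hit_count> is hypothetical, and the sample lines are copied from step 4.

```shell
#!/bin/sh
# Sketch only: pull the hit count out of swish-e search output on stdin.
# hit_count is a hypothetical helper name.
hit_count() {
    sed -n 's/^# Number of hits: \([0-9][0-9]*\).*$/\1/p'
}

# Sample header lines copied from step 4.
hit_count <<'EOF'
# SWISH format: 2.1-dev-25
# Search words: foo
# Number of hits: 18
# Search time: 0.000 seconds
EOF
```

With a real index, the equivalent call would be C<./swish-e -w foo -m 1 | hit_count>.
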
966 =head1 Document Info
967
968 $Id: INSTALL.pod,v 1.19 2002/05/31 23:37:22 whmoseley Exp $
969
970 .
