Annotation of /mitgcm.org/devel/buildweb/pkg/swish-e/pod/SWISH-RUN.pod

Revision 1.1.1.1 (vendor branch)
Fri Sep 20 19:47:29 2002 UTC (22 years, 10 months ago) by adcroft
Branch: Import, MAIN
CVS Tags: baseline, HEAD
Changes since 1.1: +0 -0 lines
Importing web-site building process.

1 adcroft 1.1 =head1 NAME
2    
3     SWISH-RUN - Running Swish-e and Command Line Switches
4    
5     =head1 OVERVIEW
6    
7     The Swish-e program is controlled by command line arguments (called
8     I<switches>). Often, it is run manually from a shell (command
9     prompt), or from a program such as a CGI script that passes the command
10     line arguments to swish.
11    
12     Note: A number of the command line switches may be specified in the
13     Swish-e configuration file specified with the C<-c> command line argument.
14     Please see L<SWISH-CONFIG|SWISH-CONFIG> for a complete description of
15     available configuration file directives.
16    
17     There are two basic operating modes of Swish-e: indexing and searching.
18     There are command line arguments that are unique to each mode, and
19     others that apply to both (yet may have different meaning depending on
20     the operating mode). These command line arguments are listed below,
21     grouped by:
22    
23     L<INDEXING|/"INDEXING> -- describes the command line arguments used
24     while indexing.
25    
26     L<SEARCHING|/"SEARCHING> -- lists the command line arguments used while
27     searching.
28    
29     L<OTHER SWITCHES|/"OTHER SWITCHES> -- lists switches that don't apply
30     to searching or indexing.
31    
32     Beginning with Swish-e version 2.1, you may embed its search engine into
33     your applications. Please see L<SWISH-LIBRARY|SWISH-LIBRARY>.
34    
35    
36     =head1 INDEXING
37    
38     Swish-e indexing is initiated by passing I<command line arguments> to
39     swish. The command line arguments used for I<searching> are described
40     in L<SEARCHING|/"SEARCHING>. Also, see L<SWISH-SEARCH|SWISH-SEARCH>
41     for examples of searching with Swish-e.
42    
43     Swish-e usage:
44    
45     swish-e [-i dir file ... ] [-c file] [-f file] [-l] \
46     [-v (num)] [-S method(fs|http|prog)] [-N path]
47    
48     The C<-h> switch (help) will list the available Swish-e command line
49     arguments:
50    
51     swish-e -h
52    
53     Typically, most if not all indexing settings are placed in a configuration
54     file (specified with the C<-c> switch). Once the configuration file is
55     set up, indexing is started with:
56    
57     swish-e -c /path/to/config/file
58    
59     See L<SWISH-CONFIG|SWISH-CONFIG> for information on the configuration
60     file.
61    
62     Security Note: If the swish binary is named F<swish-search> then swish
63     will not allow any operation that would cause swish to write to the
64     index file.
65    
66     When indexing it may be advisable to index to a temporary file, and
67     then after indexing has successfully completed rename the file to the
68     final location. This is especially important when replacing an index
69     that is currently in use.
70    
71     swish-e -c swish.config -f index.tmp
72     [check return code from swish or look for err: output]
73     mv index.tmp index.swish-e
74    
75    
76     =head2 Indexing Command Line Arguments
77    
78     =over 4
79    
80     =item -i *directories and/or files* (input file)
81    
82     This specifies the directories and/or files to index. Directories will be
83     indexed recursively. This is typically specified in the L<configuration
84     file|SWISH-CONFIG> with the B<IndexDir> directive instead of on the
85     command line. Use of this switch overrides the configuration file
86     settings.
87    
88     =item -S [fs|http|prog] (document source/access mode)
89    
90     This specifies the method to use for accessing documents to index.
91     Can be either C<fs> for local indexing via the file system (the default),
92     C<http> for spidering, or C<prog> for reading documents from an external program.
93    
94     Located in the C<conf> directory are example configuration files that demonstrate
95     indexing with the different document source methods.
96    
97     See the L<SWISH-FAQ|SWISH-FAQ> for a discussion on the different indexing methods, and the difference
98     between spidering with the http method vs. using the file system method.
99    
100     =over 4
101    
102     =item fs - file system
103    
104     The C<fs> method simply reads files from a local (or networked) drive. This is the default
105     method if the C<-S> switch is not specified.
106     See L<SWISH-CONFIG|SWISH-CONFIG> for configuration
107     directives specific to the C<fs> method.
108    
109     =item http - spider a web server
110    
111     The C<http> method is used to spider web servers. It uses an included helper
112     program called F<swishspider> located in the F<src> directory. Swish needs to be able to locate
113     this program when using the C<http> method. See L<SWISH-CONFIG|SWISH-CONFIG> for configuration
114     directives specific to the C<http> method.
115    
116     By default, swish looks in the current directory for the F<swishspider> program, or in the directory
117     specified by the C<SwishSpiderDir> directive. The first line of the F<swishspider> program
118     (the "shebang" line) must point to the location of the Perl program (if your operating system uses it).
119    
120     Security Note: Under Windows swish passes the URLs fetched from remote documents through the shell (swish
121     uses the system() command for running F<swishspider> under Windows), and this may be considered
122     an additional security risk.
123    
124     The C<http> method is deprecated. Consider using
125     the C<prog> method described below for spidering. There's a spider program available in the
126     F<prog-bin> directory for use with the C<prog> method.
127    
128     =item prog - general purpose access method
129    
130     The C<prog> method is new to Swish-e version 2.2. It's designed as a general
131     purpose method to feed documents to swish from an external program.
132    
133     For example, the external program can read a database (e.g. MySQL), spider a web
134     server, or convert documents from one format to another (e.g. pdf to html). Or
135     you can simply use it to read files from the file system (like C<-S fs>) while keeping
136     full control over which files are indexed.
137    
138     The external program name to run is passed to swish either by the L<IndexDir|SWISH-CONFIG/"item_IndexDir"> directive,
139     or via the C<-i> option. Additional parameters may be passed to the external program
140     via the L<SwishProgParameters|SWISH-CONFIG/"item_SwishProgParameters"> directive.
141    
142     A special name "stdin" may be used with C<-i> or L<IndexDir|SWISH-CONFIG/"item_IndexDir">
143     which tells swish to read from standard input instead of from an external program. See example below.
144    
145     The external program prints to standard output (which swish captures)
146     a set of headers followed by the content of the file to index. The output looks similar to
147     an email message or a HTTP document returned by a web server in that it includes name/value pairs
148     of headers, a blank line, and the content.
149    
150     The content length is determined by a content-length header
151     supplied to swish by the program; there is no "end of record" character or flag sent between documents.
152     Therefore, it is critical that the content-length header is correct. This is a common source of errors.
153    
154     One advantage of this method (over using filters, for example) is that the external program is run only once
155     for the entire indexing job, instead of once for every document. This avoids forking and creating
156     a new process for every document, and makes a huge difference when your external program is something like
157     perl that has a large startup cost.
158    
159     Here's a simple example written in Perl:
160    
161     #!/usr/local/bin/perl -w
162     use strict;
163    
164     # Build a document
165     my $doc = <<EOF;
166     <html>
167     <head>
168     <title>Document Title</title>
169     </head>
170     <body>
171     This is the text.
172     </body>
173     </html>
174     EOF
175    
176    
177     # Prepare the headers for swish
178     my $path = 'Example.file';
179     my $size = length $doc;
180     my $mtime = time;
181    
182     # Output the document (to swish)
183     print <<EOF;
184     Path-Name: $path
185     Content-Length: $size
186     Last-Mtime: $mtime
187     Document-Type: HTML
188    
189     EOF
190    
191     print $doc;
192    
193     The external program must pass to swish the C<Path-Name:> and C<Content-Length:> headers.
194     The optional C<Last-Mtime:> parameter is the last modification time of the file, and must
195     be a time stamp (seconds since the Epoch on your platform). You may override swish's
196     determination of document type (C<IndexContents>) by using the C<Document-Type:> header.
197    
198     The above program only returns one document and exits, which is not very useful. Normally,
199     your program would read data from some source, such as files or a database, format as
200     XML, HTML, or text, and pass them to swish, one after another. The C<Content-Length:> header
201     tells swish where each document ends -- there is not any special "end of record" character or
202     marker.
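
The same protocol can be sketched for several documents in plain shell. This is an illustration under stated assumptions, not part of the swish-e distribution: the function names C<emit_doc> and C<emit_all> are invented here, only the two required headers are sent, and the content length is taken to be a byte count.

```shell
#!/bin/sh
# emit_doc FILE
# Print FILE in the header / blank line / content layout described above.
emit_doc() {
    # wc -c counts bytes, which is what the content length has to measure.
    size=$(wc -c < "$1" | tr -d ' ')
    printf 'Path-Name: %s\n' "$1"
    printf 'Content-Length: %s\n' "$size"
    printf '\n'
    cat "$1"
}

# emit_all FILE...
# Emit several documents back to back; nothing separates them except the
# byte count announced in each Content-Length header.
emit_all() {
    for f in "$@"; do
        emit_doc "$f"
    done
}
```

Because the size is measured from the file itself right before it is sent, the header and the content cannot drift apart, which avoids the most common error with this method.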
203    
204     To index with the above example, make sure that the program is executable
205     (and that the path to perl is correct), then call swish, telling it to run in C<prog>
206     mode and giving it the name of the program to use for input.
207    
208     % chmod 755 example.pl
209     % ./swish-e -S prog -i ./example.pl
210    
211     Programs can and should be tested prior to running swish. For example:
212    
213     % ./example.pl > test.out
214    
215     A few more useful example programs are provided in the swish-e distribution
216     located in the F<prog-bin> directory. Some include documentation:
217    
218     % cd prog-bin
219     % perldoc spider.pl
220    
221     Others are small examples that include comments:
222    
223     % cd prog-bin
224     % less DirTree.pl
225    
226     The F<spider.pl> program can be used as a replacement for the C<-S http> method.
227    
228     If you use the special program name "stdin" with C<-i> or L<IndexDir|SWISH-CONFIG/"item_IndexDir">
229     then swish-e will read from standard input instead of from a program. For example:
230    
231     % ./example.pl /path/to/data --count=1000 | ./swish-e -S prog -i stdin
232    
233     This is basically the same as using a swish-e configuration file of:
234    
235     SwishProgParameters /path/to/data --count=1000
236     IndexDir ./example.pl
237    
238     and running:
239    
240     % ./swish-e -S prog -c swish.conf
241    
242     This gives an easy way to run swish without a configuration file
243     with a C<-S prog> program that requires parameters.
244    
245     Using "stdin" might also be useful for programs that call swish (instead of swish calling the
246     program).
247    
248     (The reason "stdin" is used instead of the more common "-" dash is due to the rotten way
249     swish parses the command line. This should be fixed in the future.)
250    
251     The C<prog> method bypasses some of the configuration parameters available
252     to the file system method -- settings such as
253     C<IndexOnly>, C<FileRules>, C<FileMatch> and C<FollowSymLinks>
254     are ignored when using the C<prog> method. It's expected that these operations
255     are better accomplished in the external program before passing the document onto swish. In
256     other words, when using the C<prog> method, only send the documents to swish
257     that you want indexed.
258    
259     You may use swish's filter feature with the C<prog> method, but performance will be better if you
260     run filtering programs from within your external program.
261    
262     B<Notes when using -S prog on MS Windows>
263    
264     Windows does not use the shebang (#!) line of a program to determine the program to run. So, when running,
265     for example, a perl program you will need to specify the perl.exe binary as the program, and use the
266     C<SwishProgParameters> to name the file.
267    
268     IndexDir e:/perl/bin/perl.exe
269     SwishProgParameters read_database.pl
270    
271     Swish will replace the forward slashes with backslashes before running the command specified with
272     C<IndexDir>. Swish uses the popen(3) function, which passes the command through the shell.
273    
274    
275     =back
276    
277    
278     =item -f *indexfile* (index file)
279    
280     If you are indexing, this specifies the file to save the generated index in,
281     and you can only specify one file. See also B<IndexFile> in the L<configuration file|SWISH-CONFIG>.
282    
283     If you are searching, this specifies the index
284     files (one or more) to search from. The default index file is index.swish-e in the current directory.
285    
286     =item -c *file ...* (configuration files)
287    
288     Specify the configuration file(s) to use for indexing. This file contains many directives that
289     control how Swish-e proceeds.
290     See L<SWISH-CONFIG|SWISH-CONFIG> for a complete listing of configuration file directives.
291    
292    
293    
294     Example:
295    
296     swish-e -c docs.conf
297    
298    
299     If you specify a directory to index, an index file, or the verbose option on the command-line,
300     these values will override any specified in the configuration file.
301    
302     You can specify multiple configuration files. For example, you may have one configuration file
303     that has common site-wide settings, and another for a specific index.
304    
305     Examples:
306    
307     1) swish-e -c swish-e.conf
308     2) swish-e -i /usr/local/www -f index.swish-e -v -c swish-e.conf
309     3) swish-e -c swish-e.conf stopwords.conf
310    
311     =over 3
312    
313     =item 1
314    
315     The settings in the configuration file will be used to index a site.
316    
317     =item 2
318    
319     These command-line options will override anything in the configuration file.
320    
321     =item 3
322    
323     The variables in swish-e.conf will be read, then the variables in stopwords.conf will be read.
324     Note that if the same variable occurs in both files, the earlier value may be overwritten.
325    
326     =back
327    
328     =item -e (economy mode)
329    
330     For large sites indexing may require more RAM than is available. The C<-e> switch tells swish to use
331     disk space to store data structures while indexing, saving memory. This option is recommended if
332     swish uses so much RAM that the computer begins to swap excessively, and you cannot increase available
333     memory. The trade-off is longer indexing times, and a busy disk drive.
334    
335     =item -l (symbolic links)
336    
337     Specifying this option tells swish to follow symbolic links when indexing.
338     The configuration file value B<FollowSymLinks> will override the command-line value.
339    
340     The default is not to follow symlinks. A small improvement in indexing time may result
341     from enabling FollowSymLinks since swish does not need to stat every directory and file
342     processed to determine if it is a symbolic link.
343    
344     =item -N path (index only newer files)
345    
346     The C<-N> option takes a path to a file, and only files I<newer> than the specified
347     file will be indexed. This is helpful for creating incremental indexes -- that is,
348     indexes that contain only the files added or modified since the last full index was created.
349    
350     Example (bad example)
351    
352     swish-e -c config.file -N index.swish-e -f index.new
353    
354     This will index as normal, but only files with a modified date newer
355     than F<index.swish-e> will be indexed.
356    
357     This is a bad example because it uses F<index.swish-e> which one might assume
358     was the date of last indexing. The problem is that files might have been added
359     between the time indexing read the directory and when the F<index.swish-e> file
360     was created -- which can be quite a bit of time for very large indexing jobs.
361    
362     The only solution is to prevent any new file additions while full indexing is running.
363     If this is impossible then it will be slightly better to do this:
364    
365     Full indexing:
366    
367     touch indexing_time.file
368     swish-e -c config.file -f index.tmp
369     mv index.tmp index.full
370    
371     Incremental indexing:
372    
373     swish-e -c config.file -N indexing_time.file -f index.tmp
374     mv index.tmp index.incremental
375    
376     Then search with
377    
378     swish-e -w foo -f index.full index.incremental
379    
380     or merge the indexes
381    
382     swish-e -M index.full index.incremental index.tmp
383     mv index.tmp index.swish-e
384     swish-e -w foo
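
Before running an incremental index it can help to preview which files are newer than the timestamp file. The sketch below uses C<find -newer>, which compares modification times. It is assumed, not confirmed, that swish applies the same comparison for C<-N>. The function name C<newer_files> is made up for this example.

```shell
#!/bin/sh
# newer_files DIR STAMP
# List regular files under DIR whose modification time is later than
# that of the file STAMP.
newer_files() {
    find "$1" -type f -newer "$2" | sort
}
```

For the example above, C<newer_files /path/to/docs indexing_time.file> would list the candidates for the incremental run, where F</path/to/docs> stands for whatever directory the configuration file indexes.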
385    
386    
387     =item -v [0|1|2|3] (verbosity level)
388    
389     The C<-v> option can take a numerical value from 0 to 3.
390     Specify 0 for completely silent operation and 3 for detailed reports.
391    
392     If no value is given then 1 is assumed.
393     See also B<IndexReport> in the L<configuration file|SWISH-CONFIG>.
394    
395     Warnings and errors are reported regardless of the verbosity level. In addition,
396     all errors and warnings are written to standard output. This is for historical reasons (many
397     scripts exist that parse standard output for error messages).
398    
399     =back
400    
401     =head1 SEARCHING
402    
403     The following command line arguments are available when searching with Swish-e. These switches are used
404     to select the index to search, what fields to search, and how and what to print as results.
405    
406     This section just lists the available command line arguments and their usage.
407     Please see L<SWISH-SEARCH|SWISH-SEARCH> for detailed searching instructions.
408    
409     B<Warning>: If using Swish-e via a CGI interface, please see L<CGI Danger!|SWISH-SEARCH/"CGI Danger!">
410    
411     Security Note: If the swish binary is named F<swish-search> then swish will not allow any operation that
412     would cause swish to write to the index file.
413    
414     =head2 Searching Command Line Arguments
415    
416     =over 4
417    
418     =item -w *word1 word2 ...* (query words)
419    
420     This performs a case-insensitive search using a number of keywords.
421     If no index file to search is specified (via the C<-f> switch), swish-e will try to search a file called
422     index.swish-e in the current directory.
423    
424     swish-e -w word
425    
426     Phrase searching is accomplished by placing the quote delimiter (a double-quote by default) around
427     the search phrase.
428    
429     swish-e -w 'word or "this phrase"'
430    
431     Search words should be protected from the shell by quotes. Typically, this is single quotes when
432     running under Unix.
433    
434     Under Windows F<command.com> you may not need to use quotes, but you will need to
435     backslash the quotes used to delimit phrases:
436    
437     swish-e -w \"a phrase\"
438    
439     The phrase delimiter can be set with the C<-P> switch.
440    
441     The search may be limited to a I<MetaName>.
442     For example:
443    
444     swish-e -w meta1=(foo or baz)
445    
446     will only search within the B<meta1> tag.
447    
448     Please see L<SWISH-SEARCH|SWISH-SEARCH> for a description of MetaNames.
449    
450    
451    
452     =item -f *file1 file2 ...* (index files)
453    
454     Specifies the index file(s) used while searching. More than one file may be listed, and each
455     file will be searched. If no C<-f> switch is specified then the file F<index.swish-e> in the current
456     directory will be used as the index file.
457    
458     =item -m *number* (max results)
459    
460     While searching, this specifies the maximum number of results to return.
461     The default is to return all results.
462    
463     This switch is often used in conjunction with the C<-b> switch to return results one
464     page at a time (strongly recommended for large indexes).
465    
466     =item -b *number* (beginning result)
467    
468     Sets the I<beginning> search result to return (records are numbered from 1). This switch can be used
469     with the C<-m> switch to return results in groups or pages.
470    
471     Example:
472    
473     swish-e -w 'word' -b 1 -m 20 # first 'page'
474     swish-e -w 'word' -b 21 -m 20 # second 'page'
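
Since records are numbered from 1, the C<-b> value for a given page follows from the page number and the page size. A small sketch of that arithmetic, where the function name C<page_args> is invented for illustration:

```shell
#!/bin/sh
# page_args PAGE PER_PAGE
# Print the -b and -m switches for a 1-based page number.
page_args() {
    begin=$(( ($1 - 1) * $2 + 1 ))
    printf '%s\n' "-b $begin -m $2"
}
```

A caller could then write C<swish-e -w 'word' $(page_args 2 20)>, leaving the substitution unquoted so that it expands to separate arguments.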
475    
476     =item -t HBthec (context searching)
477    
478     The C<-t> option allows you to search for words that exist only
479     in specific HTML tags. Each character in the string you
480     specify in the argument to this option represents a
481     different tag in which to search for the word. H means all HEAD
482     tags, B stands for BODY tags, t is all TITLE tags, h is H1
483     to H6 (header) tags, e is emphasized tags (this may be B, I,
484     EM, or STRONG), and c is HTML comment tags.
485    
486     To search only in header (<H*>) tags:
487    
488     swish-e -w word -t h
489    
490     =item -d *string* (delimiter)
491    
492     Set the delimiter used when printing results. By default, Swish-e separates the output fields by a
493     space, and places double-quotes around the document title. This output may be hard to parse, so it
494     is recommended to use C<-d> to specify a character or string used as a separator between fields.
495    
496     The string C<dq> means "double-quotes".
497    
498     swish-e -w word -d , # single char
499     swish-e -w word -d :: # string
500     swish-e -w word -d '"' # double quotes under Unix
501     swish-e -w word -d \" # double quotes under Windows
502     swish-e -w word -d dq # double quotes
503    
504     The following control characters may also be specified: C<\t \r \n \f>.
505    
506     =item -P *character*
507    
508     Sets the delimiter used for phrase searches. The default is double quotes C<">.
509    
510     Some examples under bash (be careful about your shell's metacharacters):
511    
512     swish-e -P ^ -w 'title=^words in a phrase^'
513     swish-e -P \' -w "title='words in a phrase'"
514    
515    
516     =item -p *property1 property2 ...* (display properties)
517    
518     This causes swish to print the listed property in the search results. The properties
519     are returned in the order they are listed in the C<-p> argument.
520    
521     Properties are defined by the B<PropertyNames> directive in the configuration file (see L<SWISH-CONFIG|SWISH-CONFIG>)
522     and properties must also be defined in B<MetaNames>. Swish stores the text of the meta name as a I<property>, and
523     then will return this text while searching if this option is used.
524    
525     Properties are very useful for returning data included in a source document without having to re-read
526     the source document while searching. For example, this could be used to return a short document description.
527     See also B<Document Summaries> and L<PropertyNames|SWISH-CONFIG/"item_PropertyNames"> in L<SWISH-CONFIG|SWISH-CONFIG>.
528    
529     To return the subject and category properties while searching:
530    
531     swish-e -w word -p subject category
532    
533     Properties are returned in double quotes. If a property contains a double quote it is HTML escaped (&quot;).
534     See the C<-x> switch for a more advanced method of returning a list of properties.
535    
536    
537     NOTE: it is necessary to have indexed with the proper
538     PropertyNames directive in the user config file in order to
539     use this option.
540    
541     =item -s *property [asc|desc] ...* (sort)
542    
543     Normally, search results are printed out in order of relevancy, with the most relevant listed first.
544     The C<-s> sort switch allows you to sort results in order of a specified I<property>, where a I<property>
545     was defined using the B<MetaNames> and B<PropertyNames> directives during indexing
546     (see L<SWISH-CONFIG|SWISH-CONFIG>).
547    
548     The string passed can include the strings C<asc> and C<desc> to specify the sort order, and more than
549     one property may be specified to sort on more than one key.
550    
551     Examples:
552    
553     sort by title property ascending order
554    
555     -s title
556    
557     sort descending by title, ascending by name
558    
559     -s title desc name asc
560    
561     =item -L limit to a range of property values (Limit)
562    
563     B<This is an experimental feature!>
564    
565     The C<-L> switch can be used to limit search results to a range of property values.
566    
567     Example:
568    
569     swish-e -w foo -L swishtitle a m
570    
571     finds all documents that contain the word C<foo>, and where the
572     document's title is in the range of C<a> to C<m>, inclusive.
573     By default, the case of the property is ignored, but this can be
574     changed by using the L<PropertyNamesCompareCase|SWISH-CONFIG/"item_PropertyNamesCompareCase">
575     configuration directive.
576    
577     Limiting may be done with user-defined properties, as well.
578    
579     For example, if you indexed documents that contain a created timestamp in a meta tag:
580    
581     <meta name="created_on" content="982648324">
582    
583     Then you tell Swish that you have a property called C<created_on>, and that
584     it's a timestamp.
585    
586     PropertyNamesDate created_on
587    
588     After indexing you will be able to limit documents to a range of timestamps:
589    
590     -w foo -L created_on 946684800 949363199
591    
592     will find documents containing the word foo and that have a created_on
593     date from the start of Jan 1, 2000 to the end of Jan 31, 2000.
594    
595     Note: swish currently does not parse dates; Unix timestamps must be used.
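
Because the range is inclusive, the upper bound is the last second of the period, not the first second of the next one. The arithmetic behind the January example can be sketched as follows. The function name C<range_end> is made up, and the start value is the one given above (midnight UTC on Jan 1, 2000).

```shell
#!/bin/sh
# range_end START DAYS
# Print the last second of a period of DAYS whole days that begins at the
# Unix timestamp START.
range_end() {
    printf '%s\n' "$(( $1 + $2 * 86400 - 1 ))"
}

start=946684800               # start of Jan 1, 2000, taken from the example above
end=$(range_end "$start" 31)  # January has 31 days
printf '%s\n' "-L created_on $start $end"
```

The printed switches match the ones shown in the example, so the same calculation can be reused for other periods once the starting timestamp is known.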
596    
597     Two special formats can be used:
598    
599     -L swishtitle <= m
600     -L swishtitle >= m
601    
602     Finds titles less than or equal to, or greater than or equal to, the letter C<m>.
603    
604     This feature will not work with C<swishrank> or C<swishdbfile> properties.
605    
606     This feature takes advantage of the pre-sorted tables built by swish during indexing to
607     make limiting fast while searching.
608     You should see in the indexing output a line such as:
609    
610     6 properties sorted.
611    
612     That indicates that six pre-sorted tables were built during indexing.
613     By default, all properties are presorted while indexing.
614     What properties are pre-sorted can be controlled by the configuration parameter C<PreSortedIndex>.
615    
616     Using the C<-L> switch on a property that was not pre-sorted will still work, but may be I<much>
617     slower during searching.
618    
619     This is an experimental feature, and its use and interface are subject to change.
620    
621     =item -x formatstring (extended output format)
622    
623     The C<-x> switch defines the output format string.
624     The format string can contain plain text and property names (including swish-defined internal property names)
625     and is used to generate the output for every result.
626     In addition, the output format of the property name can be controlled with C-like printf format strings.
627     This feature overrides the command line switches C<-d> and C<-p>,
628     and a warning will be generated if C<-d> or C<-p> are used with C<-x>.
629    
630     For example, to return just the title, one per line, in the search results:
631    
632     swish-e -w ... -x '<swishtitle>\n' ...
633    
634     Note: the C<\n> may need to be protected from your shell.
635    
636     See also L<ResultExtFormatName|SWISH-CONFIG/"item_ResultExtFormatName"> for a way to define I<named>
637     format strings in the swish configuration file.
638    
639     B<Format of "formatstring":>
640    
641     "text<propertyname>text<propertyname fmt=propfmtstr>text..."
642    
643    
644     Where B<propertyname> is:
645    
646     =over 4
647    
648     =item *
649    
650     the name of a user property as specified with the config file
651     directive "PropertyNames"
652    
653     =item *
654    
655     the name of a swish Auto property (see below). These properties are
656     defined automatically by swish -- you do not need to specify them
657     with PropertyNames directive. (This may change in the future.)
658    
659     =back
660    
661     Property names must be placed within "E<lt>" and "E<gt>".
662    
663     B<User properties:>
664    
665     Swish-e allows you to specify certain META tags within your documents that can be used as B<document properties>.
666     The contents of any META tag that has been identified as a document property can be returned as
667     part of the search results. Document properties must be defined while indexing using the B<PropertyNames>
668     configuration directive (see L<SWISH-CONFIG|SWISH-CONFIG/"item_PropertyNames">).
669    
670     Examples of user-defined PropertyNames:
671    
672     <keywords>
673     <author>
674     <deliveredby>
675     <reference>
676     <id>
677    
678    
679     B<Auto properties:>
680    
681     Swish defines a number of "Auto" properties for each document indexed.
682     These are available for output when using the C<-x> format.
683    
684     Name               Type     Contents
685     -----------------  -------  ----------------------------------------------
686     swishreccount      Integer  Result record counter
687     swishtitle         String   Document title
688     swishrank          Integer  Result rank for this hit
689     swishdocpath       String   URL or filepath to document
690     swishdocsize       Integer  Document size in bytes
691     swishlastmodified  Date     Last modified date of document
692     swishdescription   String   Description of document (see: StoreDescription)
693     swishdbfile        String   Path of swish database indexfile
694    
695     The Auto properties can also be specified using shortcuts:
696    
697     Shortcut  Property Name
698     --------  -----------------
699     %c        swishreccount
700     %d        swishdescription
701     %D        swishlastmodified
702     %I        swishdbfile
703     %p        swishdocpath
704     %r        swishrank
705     %l        swishdocsize
706     %t        swishtitle
707    
708     For example, these are equivalent:
709    
710     -x '<swishrank>:<swishdocpath>:<swishtitle>\n'
711     -x '%r:%p:%t\n'
712    
713     Use a double percent sign "%%" to enter a literal percent sign in the output.
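
A fixed format makes the results easy to consume from a script. The sketch below assumes that C<\t> is accepted in a format string the same way it is for C<-d>, and that the invocation shown in the comment is how the input would be produced. The function name C<read_results> is invented for illustration.

```shell
#!/bin/sh
# Assumed invocation (not run here):
#   swish-e -w word -x '%r\t%p\t%t\n'
# read_results reads such lines from standard input and prints one summary
# line per result. Lines without a tab (for example any status or header
# lines) are skipped.
read_results() {
    tab=$(printf '\t')
    while IFS=$tab read -r rank path title; do
        [ -n "$path" ] || continue
        printf 'rank=%s path=%s title=%s\n' "$rank" "$path" "$title"
    done
}
```

A tab is used as the separator here because document paths may be URLs, which already contain colons, and titles often contain spaces.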
714    
715    
716     B<Formatstrings of properties:>
717    
718     Properties listed in an C<-x> format string can include format control strings.
719     These "propertyformats" are used to control how the contents of the associated property are printed.
720     Property formats are used like C-language printf formats.
721     The property format is specified by including the attribute "fmt" within the property tag.
722    
723     Format strings cannot be used with the "%" shortcuts described above.
724    
725     General syntax:
726    
727     -x '<propertyname fmt="propfmtstr">'
728    
729     where C<propfmtstr> controls the output format of C<propertyname>.
730    
731     Examples of property format strings:
732    
733     date type: <swishlastmodified fmt="%d.%m.%Y">
734     string type: <swishtitle fmt="%-40.35s">
735     integer type: <swishreccount fmt=/%8.8d/>
736    
737     Please see the manual pages for strftime(3) and sprintf(3) for an explanation of
738     format strings. Note: some versions of strftime do not offer the %s format string
739     (number of seconds since the Epoch), so swish provides a special format string "%ld"
740     to display the number of seconds since the Epoch.
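
To get a feel for what the example property formats produce, the same format strings can be tried in Python, whose "%" operator and C<time.strftime> follow the C printf and strftime conventions for these codes. This is a sketch of the formatting rules only; it does not call swish, and the sample title and values are made up for illustration.

```python
import time

# Sample values standing in for swishtitle, swishreccount and swishlastmodified.
title = "An unusually long document title that exceeds thirty-five characters"
record_count = 7
last_modified = time.gmtime(0)  # the Epoch, used here only as a fixed sample date

# "%-40.35s": truncate to 35 characters, then left-justify in a 40-character field.
padded_title = "%-40.35s" % title

# "%8.8d": at least 8 digits in an 8-character field, so small numbers are zero-padded.
padded_count = "%8.8d" % record_count

# "%d.%m.%Y": day.month.year, as in the date example above.
date_text = time.strftime("%d.%m.%Y", last_modified)

print(repr(padded_title))
print(padded_count)
print(date_text)
```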
741    
742     The first character of a property format string defines the delimiter for the format string.
743     For example,
744    
745     -x "<author fmt=[%20s]> ...\n"
746     -x "<author fmt='%20s'> ...\n"
747     -x "<author fmt=/%20s/> ...\n"
748    
749    
750     B<Standard predefined formats:>
751    
752     If you omit the property format string, the following formats are used:
753    
754     String type: "%s" (like printf char *)
755     Integer type: "%d" (like printf int)
756     Float type: "%f" (like printf double)
757     Date type: "%Y-%m-%d %H:%M:%S" (like strftime)
758    
759    
760     B<Text in "formatstring" or "propfmtstr":>
761    
762     Text will be output as-is in format strings (and property format strings).
763     Special characters can be escaped with a backslash.
764     To start a new line for each result hit, include
765     the newline escape "\n" at the end of the format string.
766    
767     -x "<swishreccount>|<swishrank>|<swishdocpath>\n"
768     -x "Count=<swishreccount>, Rank=<swishrank>\n"
769     -x "Title=\<b\><swishtitle>\</b\>"
770     -x 'Date: <swishlastmodified fmt="%m/%d/%Y">\n'
771     -x 'Date in seconds: <swishlastmodified fmt=/%ld/>\n'
772    
773     B<Control/Escape characters:>
774    
775     You can use C-like control escapes in the format string:
776    
777     known controls: \a, \b, \f, \n, \r, \t, \v
778     digit escapes: \xhexdigits \0octaldigits
779     character escapes: \anychar
780    
781     Example:
782    
783     swish -x "%c\t%r\t%p\t\"<swishtitle fmt=/%40s/>\"\n"
784    
785     B<Examples of -x format strings:>
786    
787     -x "%c|%r|%p|%t|%D|%d\n"
788     -x "%c|%r|%p|%t|<swishlastmodified fmt=/%A, %d. %B %Y/>|%d\n"
789     -x "<swishrank>\t<swishdocpath>\t<swishtitle>\t<keywords>\n"
790     -x "xml_out: \<title\><swishtitle>\</title\>\n"
791     -x "xml_out: <swishtitle fmt='<title>%s</title>'>\n"
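
A delimited format such as C<-x "%c|%r|%p|%t|%D|%d\n"> is convenient for a calling program to split apart. The Python sketch below shows one way a wrapper script might do that. It makes several assumptions that are not stated in this document: that any header lines begin with "#", that a line consisting of a single "." ends the results, and that only the last field (the description) may itself contain "|". The names C<Hit> and C<parse_hits> are hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    reccount: int       # %c  swishreccount
    rank: int           # %r  swishrank
    path: str           # %p  swishdocpath
    title: str          # %t  swishtitle
    last_modified: str  # %D  swishlastmodified, kept as text
    description: str    # %d  swishdescription


def parse_hits(output: str) -> List[Hit]:
    """Split '|'-delimited result lines into Hit records."""
    hits = []
    for line in output.splitlines():
        # Assumption: skip blank lines, '#' header lines and a closing '.' line.
        if not line or line.startswith("#") or line == ".":
            continue
        # Split on the first five delimiters only, so the description
        # may contain '|' characters of its own.
        count, rank, path, title, modified, description = line.split("|", 5)
        hits.append(Hit(int(count), int(rank), path, title, modified, description))
    return hits
```

If titles or paths can contain the delimiter, a character that cannot occur in the data (for example "\t") is a safer choice than "|".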
792    
793     =item -H [0|1|2|3|<n>] (header output verbosity)
794    
795     The C<-H n> switch generates extended I<header> output. This is most useful when searching more than one
796     index file at a time, either by specifying more than one index file with the C<-f> switch, or when searching
797     a merged index file. In these cases, C<-H 2> will generate a set of headers specific to each index file.
798     This gives access to the settings used to generate each index file.
799    
800     Even when searching a single index file, C<-H n> will provide additional information about the index file,
801     how it was indexed, and how swish is interpreting the query.
802    
803     -H 0 : print no header information, output only search result entries.
804     -H 1 : print standard result header (default).
805     -H 2 : print additional header information for each searched index file.
806     -H 3 : enhanced header output (e.g. print stopwords).
807     -H 9 : print diagnostic information in the header of the results (changed from: C<-v 4>)
808    
809    
810     =back
811    
812    
813     =head1 OTHER SWITCHES
814    
815     =over 4
816    
817     =item -V (version)
818    
819     Print the current version.
820    
821     =item -k *letter* (print out keywords)
822    
823     The C<-k> switch is used for testing and will cause swish to print out all keywords
824     in the index beginning with that letter. You may enter C<-k '*'> to generate a list of all words indexed
825     by swish.
826    
827     =item -D *index file* (debug index)
828    
829     The -D option is no longer supported in version 2.2.
830    
831     =item -T *options* (trace/debug swish)
832    
833     The -T option is used to print out information that may be helpful when debugging swish-e's
834     operation. This option replaced the C<-D> option of previous versions.
835    
836     Running C<-T help> will print out a list of available *options*.
837    
838    
839     =back
840    
841     =head1 Merging Index Files
842    
843     At times it can be useful to merge different index files into one file for searching.
844     This could be because you want to keep separate site indexes and a common one for a global search, or
845     because your site is very large and Swish-e runs out of memory if you try to index it directly.
846    
847     You can merge only indexes that were indexed with common settings
848     (e.g. don't mix stemming and non-stemming indexes, or indexes with different WordCharacter settings, etc.).
849    
850     usage: swish-e [-v (num)] [-c file] -M index1 index2 ... outputfile
851    
852     Due to the structure of the swish-e index, merging may or may not require less memory than indexing
853     all files at one time.
854    
855    
856     =over 4
857    
858     =item -M *file file ...* (merge)
859    
860     This allows you to merge two or more index files - the last file you specify on the
861     list will be the output file.
862    
863     Merging removes all redundant file and word data. To estimate how much memory the operation will need,
864     sum up the sizes of the files to be merged and divide by two.
865     That's about the maximum amount of memory that will be used.
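
The rule of thumb above (sum the sizes of the files to be merged, then divide by two) is easy to compute before starting a merge. The following Python sketch does only that arithmetic; the function names are hypothetical, and the result is the rough upper bound described above, not a measured figure.

```python
import os
from typing import Iterable


def estimate_from_sizes(sizes_in_bytes: Iterable[int]) -> int:
    """Rule of thumb: half the combined size of the indexes to be merged."""
    return sum(sizes_in_bytes) // 2


def estimate_merge_memory(index_paths: Iterable[str]) -> int:
    """Apply the same rule to index files on disk."""
    return estimate_from_sizes(os.path.getsize(path) for path in index_paths)


# Two indexes of 100 MB and 300 MB suggest roughly 200 MB for the merge.
example_bytes = estimate_from_sizes([100 * 1024 * 1024, 300 * 1024 * 1024])
```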
866    
867     You can use the C<-v> option to produce feedback while merging and the C<-c> option with a
868     configuration file to include new administrative information in the new index file.
869    
870     =item -c *configuration file*
871    
872     Specify a configuration file while merging to add administrative information to the output index file.
873    
874     =back
875    
876     =head1 Document Info
877    
878     $Id: SWISH-RUN.pod,v 1.23 2002/08/22 22:58:39 whmoseley Exp $
879    
880     .
