Contents of /mitgcm.org/devel/buildweb/pkg/swish-e/pod/SWISH-RUN.pod

Revision 1.1.1.1 (vendor branch)
Fri Sep 20 19:47:29 2002 UTC (22 years, 10 months ago) by adcroft
Branch: Import, MAIN
CVS Tags: baseline, HEAD
Changes since 1.1: +0 -0 lines
Importing web-site building process.

1 =head1 NAME
2
3 SWISH-RUN - Running Swish-e and Command Line Switches
4
5 =head1 OVERVIEW
6
7 The Swish-e program is controlled by command line arguments (called
8 I<switches>). Often, it is run manually from a shell (command
9 prompt), or from a program such as a CGI script that passes the command
10 line arguments to swish.
11
12 Note: A number of the command line switches may instead be set in the
13 Swish-e configuration file named by the C<-c> command line argument.
14 Please see L<SWISH-CONFIG|SWISH-CONFIG> for a complete description of
15 available configuration file directives.
16
17 There are two basic operating modes of Swish-e: indexing and searching.
18 There are command line arguments that are unique to each mode, and
19 others that apply to both (yet may have different meaning depending on
20 the operating mode). These command line arguments are listed below,
21 grouped by:
22
23 L<INDEXING|/"INDEXING"> -- describes the command line arguments used
24 while indexing.
25
26 L<SEARCHING|/"SEARCHING"> -- lists the command line arguments used while
27 searching.
28
29 L<OTHER SWITCHES|/"OTHER SWITCHES"> -- lists switches that don't apply
30 to searching or indexing.
31
32 Beginning with Swish-e version 2.1, you may embed its search engine into
33 your applications. Please see L<SWISH-LIBRARY|SWISH-LIBRARY>.
34
35
36 =head1 INDEXING
37
38 Swish-e indexing is initiated by passing I<command line arguments> to
39 swish. The command line arguments used for I<searching> are described
40 in L<SEARCHING|/"SEARCHING">. Also, see L<SWISH-SEARCH|SWISH-SEARCH>
41 for examples of searching with Swish-e.
42
43 Swish-e usage:
44
45 swish-e [-i dir file ... ] [-c file] [-f file] [-l] \
46 [-v (num)] [-S method(fs|http|prog)] [-N path]
47
48 The C<-h> switch (help) will list the available Swish-e command line
49 arguments:
50
51 swish-e -h
52
53 Typically, most if not all indexing settings are placed in a configuration
54 file (specified with the C<-c> switch). Once the configuration file is
55 set up, indexing is initiated as:
56
57 swish-e -c /path/to/config/file
58
59 See L<SWISH-CONFIG|SWISH-CONFIG> for information on the configuration
60 file.
61
62 Security Note: If the swish binary is named F<swish-search> then swish
63 will not allow any operation that would cause swish to write to the
64 index file.
65
66 When indexing it may be advisable to index to a temporary file, and
67 then after indexing has successfully completed rename the file to the
68 final location. This is especially important when replacing an index
69 that is currently in use.
70
71 swish-e -c swish.config -f index.tmp
72 [check return code from swish or look for err: output]
73 mv index.tmp index.swish-e
74
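The bracketed step above can be made explicit. The following is a minimal sketch, not part of the swish-e distribution: C<reindex_safely> is a hypothetical helper, C<SWISH> is an assumed variable naming the binary, and the script assumes that a failed run either exits with a non-zero status or prints a line containing C<err:>.

```shell
#!/bin/sh
# Sketch: build the index in a temporary file, then rename it into place.
# SWISH is an assumed variable naming the indexer binary (default: swish-e).
SWISH=${SWISH:-swish-e}

# reindex_safely CONFIG FINAL_INDEX   (hypothetical helper)
# Returns non-zero and leaves FINAL_INDEX untouched if indexing fails.
reindex_safely() (
    config=$1
    final=$2
    tmp="$final.tmp"
    log="$final.log"

    # Capture everything swish prints: errors are written to standard output.
    if ! "$SWISH" -c "$config" -f "$tmp" > "$log" 2>&1; then
        echo "indexing failed: non-zero exit status (see $log)" >&2
        rm -f "$tmp"
        exit 1
    fi

    # Also look for "err:" in the captured output.
    if grep -q 'err:' "$log"; then
        echo "indexing failed: err: found in $log" >&2
        rm -f "$tmp"
        exit 1
    fi

    # Only now replace the index that searches are using.
    mv "$tmp" "$final"
)
```

Called as C<reindex_safely swish.config index.swish-e>, this keeps the live index in place whenever indexing fails. If your swish-e version writes companion files next to the index, rename those as well (an assumption to check against your installation).
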
75
76 =head2 Indexing Command Line Arguments
77
78 =over 4
79
80 =item -i *directories and/or files* (input file)
81
82 This specifies the directories and/or files to index. Directories will be
83 indexed recursively. This is typically specified in the L<configuration
84 file|SWISH-CONFIG> with the B<IndexDir> directive instead of on the
85 command line. Use of this switch overrides the configuration file
86 settings.
87
88 =item -S [fs|http|prog] (document source/access mode)
89
90 This specifies the method to use for accessing documents to index.
91 Can be either C<fs> for local indexing via the file system (the default),
92 C<http> for spidering, or C<prog> for reading documents from an external program.
93
94 Located in the C<conf> directory are example configuration files that demonstrate
95 indexing with the different document source methods.
96
97 See the L<SWISH-FAQ|SWISH-FAQ> for a discussion on the different indexing methods, and the difference
98 between spidering with the http method vs. using the file system method.
99
100 =over 4
101
102 =item fs - file system
103
104 The C<fs> method simply reads files from a local (or networked) drive. This is the default
105 method if the C<-S> switch is not specified.
106 See L<SWISH-CONFIG|SWISH-CONFIG> for configuration
107 directives specific to the C<fs> method.
108
109 =item http - spider a web server
110
111 The C<http> method is used to spider web servers. It uses an included helper
112 program called F<swishspider> located in the F<src> directory. Swish needs to be able to locate
113 this program when using the C<http> method. See L<SWISH-CONFIG|SWISH-CONFIG> for configuration
114 directives specific to the C<http> method.
115
116 By default, swish looks in the current directory for the F<swishspider> program, or in the directory
117 specified by the C<SwishSpiderDir> directive. The first line of the F<swishspider> program
118 (the "shebang" line) must point to the location of the Perl program (if your operating system uses it).
119
120 Security Note: Under Windows swish passes the URLs fetched from remote documents through the shell (swish
121 uses the system() command for running F<swishspider> under Windows), and this may be considered
122 an additional security risk.
123
124 The C<http> method is deprecated (or at least not very well appreciated). Consider using
125 the C<prog> method described below for spidering. There's a spider program available in the
126 F<prog-bin> directory for use with the C<prog> method.
127
128 =item prog - general purpose access method
129
130 The C<prog> method is new to Swish-e version 2.2. It's designed as a general
131 purpose method to feed documents to swish from an external program.
132
133 For example, the external program can read a database (e.g. MySQL), spider a web
134 server, or convert documents from one format to another (e.g. pdf to html). Or
135 you can simply use it to read files from the file system (like C<-S fs>) while
136 keeping full control over which files are indexed.
137
138 The external program name to run is passed to swish either by the L<IndexDir|SWISH-CONFIG/"item_IndexDir"> directive,
139 or via the C<-i> option. Additional parameters may be passed to the external program
140 via the L<SwishProgParameters|SWISH-CONFIG/"item_SwishProgParameters"> directive.
141
142 A special name "stdin" may be used with C<-i> or L<IndexDir|SWISH-CONFIG/"item_IndexDir">
143 which tells swish to read from standard input instead of from an external program. See example below.
144
145 The external program prints to standard output (which swish captures)
146 a set of headers followed by the content of the file to index. The output looks similar to
147 an email message or an HTTP document returned by a web server in that it includes name/value pairs
148 of headers, a blank line, and the content.
149
150 The content length is determined by a content-length header
151 supplied to swish by the program; there is no "end of record" character or flag sent between documents.
152 Therefore, it is critical that the content-length header is correct. This is a common source of errors.
153
154 One advantage of this method (over using filters, for example) is that the external program is run only once
155 for the entire indexing job, instead of once for every document. This avoids forking and creating
156 a new process for every document, and makes a huge difference when your external program is something like
157 perl that has a large startup cost.
158
159 Here's a simple example written in Perl:
160
161 #!/usr/local/bin/perl -w
162 use strict;
163
164 # Build a document
165 my $doc = <<EOF;
166 <html>
167 <head>
168 <title>Document Title</title>
169 </head>
170 <body>
171 This is the text.
172 </body>
173 </html>
174 EOF
175
176
177 # Prepare the headers for swish
178 my $path = 'Example.file';
179 my $size = length $doc;
180 my $mtime = time;
181
182 # Output the document (to swish)
183 print <<EOF;
184 Path-Name: $path
185 Content-Length: $size
186 Last-Mtime: $mtime
187 Document-Type: HTML
188
189 EOF
190
191 print $doc;
192
193 The external program must pass to swish the C<Path-Name:> and C<Content-Length:> headers.
194 The optional C<Last-Mtime:> header is the last modification time of the file, and must
195 be a time stamp (seconds since the Epoch on your platform). You may override swish's
196 determination of document type (C<IndexContents>) by using the C<Document-Type:> header.
197
198 The above program only returns one document and exits, which is not very useful. Normally,
199 your program would read data from some source, such as files or a database, format each item as
200 XML, HTML, or text, and pass the documents to swish, one after another. The C<Content-Length:> header
201 tells swish where each document ends -- there is no special "end of record" character or
202 marker.
203
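Because the stream has no record separator, the length is best computed from the data actually sent. The sketch below shows one way to emit several documents from a shell script. C<emit_doc> is a hypothetical helper name, and the sketch assumes that swish counts the content length in bytes.

```shell
#!/bin/sh
# Sketch of a -S prog feeder: print each file as headers, a blank line,
# and the content. emit_doc is a hypothetical helper name.

# emit_doc FILE [DOCTYPE]
emit_doc() (
    file=$1
    doctype=${2:-}

    # Count bytes of the file that is about to be sent (assumption: the
    # header is a byte count). The arithmetic expansion strips the padding
    # that some wc implementations print.
    size=$(( $(wc -c < "$file") ))

    printf 'Path-Name: %s\n' "$file"
    printf 'Content-Length: %s\n' "$size"
    # Document-Type is optional; send it only when the caller asked for it.
    if [ -n "$doctype" ]; then
        printf 'Document-Type: %s\n' "$doctype"
    fi
    printf '\n'
    cat "$file"
)
```

A loop such as C<for f in docs/*.html; do emit_doc "$f" HTML; done> then produces one record after another on standard output.
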
204 To index with the above example you need to make sure that the program is executable
205 (and that the path to perl is correct), and then call swish, telling it to run in C<prog>
206 mode and giving the name of the program to use for input.
207
208 % chmod 755 example.pl
209 % ./swish-e -S prog -i ./example.pl
210
211 Programs can and should be tested prior to running swish. For example:
212
213 % ./example.pl > test.out
214
215 A few more useful example programs are provided in the swish-e distribution
216 located in the F<prog-bin> directory. Some include documentation:
217
218 % cd prog-bin
219 % perldoc spider.pl
220
221 Others are small examples that include comments:
222
223 % cd prog-bin
224 % less DirTree.pl
225
226 The F<spider.pl> program can be used as a replacement for the C<-S http> method.
227
228 If you use the special program name "stdin" with C<-i> or L<IndexDir|SWISH-CONFIG/"item_IndexDir">
229 then swish-e will read from standard input instead of from a program. For example:
230
231 % ./example.pl /path/to/data --count=1000 | ./swish-e -S prog -i stdin
232
233 This is basically the same as putting
234
235 SwishProgParameters /path/to/data --count=1000
236 IndexDir ./example.pl
237
238 in a config file and running
239
240 % ./swish-e -S prog -c swish.conf
241
242 This gives an easy way to run swish without a configuration file
243 with a C<-S prog> program that requires parameters.
244
245 Using "stdin" might also be useful for programs that call swish (instead of swish calling the
246 program).
247
248 (The reason "stdin" is used instead of the more common "-" dash is due to the rotten way
249 swish parses the command line. This should be fixed in the future.)
250
251 The C<prog> method bypasses some of the configuration parameters available
252 to the file system method -- settings such as
253 C<IndexOnly>, C<FileRules>, C<FileMatch> and C<FollowSymLinks>
254 are ignored when using the C<prog> method. It's expected that these operations
255 are better accomplished in the external program before passing the document on to swish. In
256 other words, when using the C<prog> method, only send the documents to swish
257 that you want indexed.
258
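Since the file selection directives are ignored, the selection has to happen in the feeding program. The following is a small sketch of that idea, in which C<feed_html_tree> is a hypothetical helper: it sends only regular F<*.html> files and does not follow symbolic links. File names containing newlines are not handled.

```shell
#!/bin/sh
# Sketch: do the file selection in the feeder itself, because -S prog
# ignores IndexOnly, FileRules, FileMatch and FollowSymLinks.
# feed_html_tree is a hypothetical helper name.

# feed_html_tree DIR
# Emits only regular *.html files below DIR; everything else is skipped.
feed_html_tree() {
    # -type f matches regular files only, so symbolic links are not followed.
    find "$1" -type f -name '*.html' | sort | while IFS= read -r file; do
        size=$(( $(wc -c < "$file") ))
        printf 'Path-Name: %s\nContent-Length: %s\n\n' "$file" "$size"
        cat "$file"
    done
}
```

The output can be piped to swish with C<feed_html_tree /path/to/docs | ./swish-e -S prog -i stdin>.
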
259 You may use swish's filter feature with the C<prog> method, but performance will be better if you
260 run filtering programs from within your external program.
261
262 B<Notes when using -S prog on MS Windows>
263
264 Windows does not use the shebang (#!) line of a program to determine the program to run. So, when running,
265 for example, a perl program you will need to specify the perl.exe binary as the program, and use
266 C<SwishProgParameters> to name the script file.
267
268 IndexDir e:/perl/bin/perl.exe
269 SwishProgParameters read_database.pl
270
271 Swish will replace the forward slashes with backslashes before running the command specified with
272 C<IndexDir>. Swish uses the popen(3) function, which passes the command through the shell.
273
274
275 =back
276
277
278 =item -f *indexfile* (index file)
279
280 If you are indexing, this specifies the file to save the generated index in,
281 and you can only specify one file. See also B<IndexFile> in the L<configuration file|SWISH-CONFIG>.
282
283 If you are searching, this specifies the index
284 files (one or more) to search from. The default index file is index.swish-e in the current directory.
285
286 =item -c *file ...* (configuration files)
287
288 Specify the configuration file(s) to use for indexing. This file contains many directives that
289 control how Swish-e proceeds.
290 See L<SWISH-CONFIG|SWISH-CONFIG> for a complete listing of configuration file directives.
291
292
293
294 Example:
295
296 swish-e -c docs.conf
297
298
299 If you specify a directory to index, an index file, or the verbose option on the command-line,
300 these values will override any specified in the configuration file.
301
302 You can specify multiple configuration files. For example, you may have one configuration file
303 that has common site-wide settings, and another for a specific index.
304
305 Examples:
306
307 1) swish-e -c swish-e.conf
308 2) swish-e -i /usr/local/www -f index.swish-e -v -c swish-e.conf
309 3) swish-e -c swish-e.conf stopwords.conf
310
311 =over 3
312
313 =item 1
314
315 The settings in the configuration file will be used to index a site.
316
317 =item 2
318
319 These command-line options will override anything in the configuration file.
320
321 =item 3
322
323 The directives in swish-e.conf will be read first, then the directives in stopwords.conf.
324 Note that if the same directive occurs in both files, the value read later may overwrite the earlier one.
325
326 =back
327
328 =item -e (economy mode)
329
330 For large sites indexing may require more RAM than is available. The C<-e> switch tells swish to use
331 disk space to store data structures while indexing, saving memory. This option is recommended if
332 swish uses so much RAM that the computer begins to swap excessively, and you cannot increase available
333 memory. The trade-off is longer indexing times, and a busy disk drive.
334
335 =item -l (symbolic links)
336
337 Specifying this option tells swish to follow symbolic links when indexing.
338 The configuration file value B<FollowSymLinks> will override the command-line value.
339
340 The default is not to follow symlinks. A small improvement in indexing time may result
341 from enabling FollowSymLinks since swish does not need to stat every directory and file
342 processed to determine if it is a symbolic link.
343
344 =item -N path (index only newer files)
345
346 The C<-N> option takes a path to a file, and only files I<newer> than the specified
347 file will be indexed. This is helpful for creating incremental indexes -- that is,
348 indexes that contain just the files added or modified since the last full index was created.
349
350 Example (bad example)
351
352 swish-e -c config.file -N index.swish-e -f index.new
353
354 This will index as normal, but only files with a modified date newer
355 than F<index.swish-e> will be indexed.
356
357 This is a bad example because it uses F<index.swish-e> which one might assume
358 was the date of last indexing. The problem is that files might have been added
359 between the time indexing read the directory and when the F<index.swish-e> file
360 was created -- which can be quite a bit of time for very large indexing jobs.
361
362 The only solution is to prevent any new file additions while full indexing is running.
363 If this is impossible then it will be slightly better to do this:
364
365 Full indexing:
366
367 touch indexing_time.file
368 swish-e -c config.file -f index.tmp
369 mv index.tmp index.full
370
371 Incremental indexing:
372
373 swish-e -c config.file -N indexing_time.file -f index.tmp
374 mv index.tmp index.incremental
375
376 Then search with
377
378 swish-e -w foo -f index.full index.incremental
379
380 or merge the indexes
381
382 swish-e -M index.full index.incremental index.tmp
383 mv index.tmp index.swish-e
384 swish-e -w foo
385
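The full and incremental steps above can be wrapped in two small functions. This is only a sketch under stated assumptions: C<SWISH>, C<CONFIG> and C<STAMP> are illustrative shell variables, not swish-e settings, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch of the full/incremental scheme described above.
# SWISH, CONFIG and STAMP are assumed names, not swish-e settings.
SWISH=${SWISH:-swish-e}
CONFIG=${CONFIG:-config.file}
STAMP=${STAMP:-indexing_time.file}

# Full run: record the start time first, so that files added while
# indexing is in progress are newer than the stamp and are picked up
# by the next incremental run.
full_index() {
    touch "$STAMP" &&
    "$SWISH" -c "$CONFIG" -f index.tmp &&
    mv index.tmp index.full
}

# Incremental run: index only files newer than the stamp file.
incremental_index() {
    "$SWISH" -c "$CONFIG" -N "$STAMP" -f index.tmp &&
    mv index.tmp index.incremental
}
```

Touching the stamp before the full run errs on the side of indexing a file twice instead of missing it, which is why the text calls this only slightly better.
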
386
387 =item -v [0|1|2|3] (verbosity level)
388
389 The C<-v> option can take a numerical value from 0 to 3.
390 Specify 0 for completely silent operation and 3 for detailed reports.
391
392 If no value is given then 1 is assumed.
393 See also B<IndexReport> in the L<configuration file|SWISH-CONFIG>.
394
395 Warnings and errors are reported regardless of the verbosity level. In addition,
396 all errors and warnings are written to standard output. This is for historical reasons (many
397 scripts exist that parse standard output for error messages).
398
399 =back
400
401 =head1 SEARCHING
402
403 The following command line arguments are available when searching with Swish-e. These switches are used
404 to select the index to search, what fields to search, and how and what to print as results.
405
406 This section just lists the available command line arguments and their usage.
407 Please see L<SWISH-SEARCH|SWISH-SEARCH> for detailed searching instructions.
408
409 B<Warning>: If using Swish-e via a CGI interface, please see L<CGI Danger!|SWISH-SEARCH/"CGI Danger!">
410
411 Security Note: If the swish binary is named F<swish-search> then swish will not allow any operation that
412 would cause swish to write to the index file.
413
414 =head2 Searching Command Line Arguments
415
416 =over 4
417
418 =item -w *word1 word2 ...* (query words)
419
420 This performs a case-insensitive search using a number of keywords.
421 If no index file to search is specified (via the C<-f> switch), swish-e will try to search a file called
422 index.swish-e in the current directory.
423
424 swish-e -w word
425
426 Phrase searching is accomplished by placing the quote delimiter (a double-quote by default) around
427 the search phrase.
428
429 swish-e -w 'word or "this phrase"'
430
431 Search words should be protected from the shell by quotes. Typically, this is single quotes when
432 running under Unix.
433
434 Under Windows F<command.com> you may not need to use quotes, but you will need to
435 backslash the quotes used to delimit phrases:
436
437 swish-e -w \"a phrase\"
438
439 The phrase delimiter can be set with the C<-P> switch.
440
441 The search may be limited to a I<MetaName>.
442 For example:
443
444 swish-e -w 'meta1=(foo or baz)'
445
446 will only search within the B<meta1> tag.
447
448 Please see L<SWISH-SEARCH|SWISH-SEARCH> for a description of MetaNames.
449
450
451
452 =item -f *file1 file2 ...* (index files)
453
454 Specifies the index file(s) used while searching. More than one file may be listed, and each
455 file will be searched. If no C<-f> switch is specified then the file F<index.swish-e> in the current
456 directory will be used as the index file.
457
458 =item -m *number* (max results)
459
460 While searching, this specifies the maximum number of results to return.
461 The default is to return all results.
462
463 This switch is often used in conjunction with the C<-b> switch to return results one
464 page at a time (strongly recommended for large indexes).
465
466 =item -b *number* (beginning result)
467
468 Sets the I<beginning> search result to return (records are numbered from 1). This switch can be used
469 with the C<-m> switch to return results in groups or pages.
470
471 Example:
472
473 swish-e -w 'word' -b 1 -m 20 # first 'page'
474 swish-e -w 'word' -b 21 -m 20 # second 'page'
475
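The arithmetic behind the two example lines is C<begin = (page - 1) * size + 1>. As a small sketch, a hypothetical helper C<page_args> can compute both switches for any page:

```shell
#!/bin/sh
# page_args PAGE SIZE   (hypothetical helper; pages are numbered from 1)
# Prints the -b and -m switches that select that page of results.
page_args() {
    page=$1
    size=$2
    # Records are numbered from 1, so page 1 starts at record 1.
    begin=$(( (page - 1) * size + 1 ))
    printf '%s\n' "-b $begin -m $size"
}
```

Used unquoted so the shell splits the words, C<swish-e -w 'word' $(page_args 2 20)> passes C<-b 21 -m 20>, the second page from the example above.
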
476 =item -t HBthec (context searching)
477
478 The C<-t> option allows you to search for words that exist only
479 in specific HTML tags. Each character in the string you
480 specify in the argument to this option represents a
481 different tag in which to search for the word. H means all HEAD
482 tags, B stands for BODY tags, t is all TITLE tags, h is H1
483 to H6 (header) tags, e is emphasized tags (this may be B, I,
484 EM, or STRONG), and c is HTML comment tags.
485
486 For example, to search only in header (<H*>) tags:
487
488 swish-e -w word -t h
489
490 =item -d *string* (delimiter)
491
492 Set the delimiter used when printing results. By default, Swish-e separates the output fields by a
493 space, and places double-quotes around the document title. This output may be hard to parse, so it
494 is recommended to use C<-d> to specify a character or string used as a separator between fields.
495
496 The string C<dq> means "double-quotes".
497
498 swish-e -w word -d , # single char
499 swish-e -w word -d :: # string
500 swish-e -w word -d '"' # double quotes under Unix
501 swish-e -w word -d \" # double quotes under Windows
502 swish-e -w word -d dq # double quotes
503
504 The following control characters may also be specified: C<\t \r \n \f>.
505
506 =item -P *character*
507
508 Sets the delimiter used for phrase searches. The default is double quotes C<">.
509
510 Some examples under bash (be careful about your shell's metacharacters):
511
512 swish-e -P ^ -w 'title=^words in a phrase^'
513 swish-e -P \' -w "title='words in a phrase'"
514
515
516 =item -p *property1 property2 ...* (display properties)
517
518 This causes swish to print the listed property in the search results. The properties
519 are returned in the order they are listed in the C<-p> argument.
520
521 Properties are defined by the B<PropertyNames> directive in the configuration file (see L<SWISH-CONFIG|SWISH-CONFIG>)
522 and properties must also be defined in B<MetaNames>. Swish stores the text of the meta name as a I<property>, and
523 then will return this text while searching if this option is used.
524
525 Properties are very useful for returning data included in a source document without having to re-read
526 the source document while searching. For example, this could be used to return a short document description.
527 See also B<Document Summaries> and L<PropertyNames|SWISH-CONFIG/"item_PropertyNames"> in L<SWISH-CONFIG|SWISH-CONFIG>.
528
529 To return the subject and category properties while searching:
530
531 swish-e -w word -p subject category
532
533 Properties are returned in double quotes. If a property contains a double quote it is HTML escaped (&quot;).
534 See the C<-x> switch for a more advanced method of returning a list of properties.
535
536
537 NOTE: it is necessary to have indexed with the proper
538 PropertyNames directive in the user config file in order to
539 use this option.
540
541 =item -s *property [asc|desc] ...* (sort)
542
543 Normally, search results are printed out in order of relevancy, with the most relevant listed first.
544 The C<-s> sort switch allows you to sort results in order of a specified I<property>, where a I<property>
545 was defined using the B<MetaNames> and B<PropertyNames> directives during indexing
546 (see L<SWISH-CONFIG|SWISH-CONFIG>).
547
548 The string passed can include the strings C<asc> and C<desc> to specify the sort order, and more than
549 one property may be specified to sort on more than one key.
550
551 Examples:
552
553 sort by title property ascending order
554
555 -s title
556
557 sort descending by title, ascending by name
558
559 -s title desc name asc
560
561 =item -L limit to a range of property values (Limit)
562
563 B<This is an experimental feature!>
564
565 The C<-L> switch can be used to limit search results to a range of property values.
566
567 Example:
568
569 swish-e -w foo -L swishtitle a m
570
571 finds all documents that contain the word C<foo>, and where the
572 document's title is in the range of C<a> to C<m>, inclusive.
573 By default, the case of the property is ignored, but this can be
574 changed by using the L<PropertyNamesCompareCase|SWISH-CONFIG/"item_PropertyNamesCompareCase">
575 configuration directive.
576
577 Limiting may be done with user-defined properties, as well.
578
579 For example, if you indexed documents that contain a created timestamp in a meta tag:
580
581 <meta name="created_on" content="982648324">
582
583 Then you tell Swish that you have a property called C<created_on>, and that
584 it's a timestamp.
585
586 PropertyNamesDate created_on
587
588 After indexing you will be able to limit documents to a range of timestamps:
589
590 -w foo -L created_on 946684800 949363199
591
592 will find documents containing the word foo and that have a created_on
593 date from the start of Jan 1, 2000 to the end of Jan 31, 2000.
594
595 Note: swish currently does not parse dates; Unix timestamps must be used.
596
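Since the range must be given as Unix timestamps, it helps to compute them instead of typing them by hand. The sketch below uses only shell arithmetic and works in UTC. Both function names are hypothetical helpers. It expects plain integers (C<1>, not C<01>) and years from 1970 onwards.

```shell
#!/bin/sh
# Sketch: compute Unix timestamps (UTC) for a -L date range.
# epoch_utc and month_range are hypothetical helper names.

# epoch_utc YEAR MONTH DAY -> seconds since the Epoch at 00:00:00 UTC
epoch_utc() (
    y=$1 m=$2 d=$3
    # Count the year from March, so January and February become the last
    # two months of the previous year and the leap day falls at its end.
    if [ "$m" -le 2 ]; then
        y=$(( y - 1 ))
        mp=$(( m + 9 ))
    else
        mp=$(( m - 3 ))
    fi
    era=$(( y / 400 ))                      # 400-year Gregorian cycle
    yoe=$(( y - era * 400 ))                # year within the cycle
    doy=$(( (153 * mp + 2) / 5 + d - 1 ))   # day within the shifted year
    doe=$(( yoe * 365 + yoe / 4 - yoe / 100 + doy ))
    days=$(( era * 146097 + doe - 719468 )) # days since 1970-01-01
    echo $(( days * 86400 ))
)

# month_range YEAR MONTH -> "start end" covering the whole month
month_range() (
    y=$1 m=$2
    start=$(epoch_utc "$y" "$m" 1)
    if [ "$m" -eq 12 ]; then
        next=$(epoch_utc $(( y + 1 )) 1 1)
    else
        next=$(epoch_utc "$y" $(( m + 1 )) 1)
    fi
    # The range is inclusive, so end one second before the next month.
    echo "$start $(( next - 1 ))"
)
```

C<month_range 2000 1> prints C<946684800 949363199>, the two values used in the example above, so the search can be written as C<swish-e -w foo -L created_on $(month_range 2000 1)>.
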
597 Two special formats can be used:
598
599 -L swishtitle <= m
600 -L swishtitle >= m
601
602 Finds titles less than or equal to, or greater than or equal to, the letter C<m>.
603
604 This feature will not work with C<swishrank> or C<swishdbfile> properties.
605
606 This feature takes advantage of the pre-sorted tables built by swish during indexing,
607 which keeps limiting fast while searching.
608 You should see in the indexing output a line such as:
609
610 6 properties sorted.
611
612 That indicates that six pre-sorted tables were built during indexing.
613 By default, all properties are presorted while indexing.
614 Which properties are pre-sorted can be controlled by the configuration parameter C<PreSortedIndex>.
615
616 Using the C<-L> switch on a property that was not pre-sorted will still work, but may be I<much>
617 slower during searching.
618
619 This is an experimental feature, and its use and interface are subject to change.
620
621 =item -x formatstring (extended output format)
622
623 The C<-x> switch defines the output format string.
624 The format string can contain plain text and property names (including swish-defined internal property names)
625 and is used to generate the output for every result.
626 In addition, the output format of the property name can be controlled with C-like printf format strings.
627 This feature overrides the cmdline switches C<-d> and C<-p>,
628 and a warning will be generated if C<-d> or C<-p> are used with C<-x>.
629
630 For example, to return just the title, one per line, in the search results:
631
632 swish-e -w ... -x '<swishtitle>\n' ...
633
634 Note: the C<\n> may need to be protected from your shell.
635
636 See also L<ResultExtFormatName|SWISH-CONFIG/"item_ResultExtFormatName"> for a way to define I<named>
637 format strings in the swish configuration file.
638
639 B<Format of "formatstring":>
640
641 "text<propertyname>text<propertyname fmt=propfmtstr>text..."
642
643
644 Where B<propertyname> is:
645
646 =over 4
647
648 =item *
649
650 the name of a user property as specified with the config file
651 directive "PropertyNames"
652
653 =item *
654
655 the name of a swish Auto property (see below). These properties are
656 defined automatically by swish -- you do not need to specify them
657 with PropertyNames directive. (This may change in the future.)
658
659 =back
660
661 Property names must be placed within "E<lt>" and "E<gt>".
662
663 B<User properties:>
664
665 Swish-e allows you to specify certain META tags within your documents that can be used as B<document properties>.
666 The contents of any META tag that has been identified as a document property can be returned as
667 part of the search results. Document properties must be defined while indexing using the B<PropertyNames>
668 configuration directive (see L<SWISH-CONFIG|SWISH-CONFIG/"item_PropertyNames">).
669
670 Examples of user-defined PropertyNames:
671
672 <keywords>
673 <author>
674 <deliveredby>
675 <reference>
676 <id>
677
678
679 B<Auto properties:>
680
681 Swish defines a number of "Auto" properties for each document indexed.
682 These are available for output when using the C<-x> format.
683
684 Name               Type     Contents
685 -----------------  -------  ----------------------------------------------
686 swishreccount      Integer  Result record counter
687 swishtitle         String   Document title
688 swishrank          Integer  Result rank for this hit
689 swishdocpath       String   URL or filepath to document
690 swishdocsize       Integer  Document size in bytes
691 swishlastmodified  Date     Last modified date of document
692 swishdescription   String   Description of document (see: StoreDescription)
693 swishdbfile        String   Path of swish database indexfile
694
695 The Auto properties can also be specified using shortcuts:
696
697 Shortcut  Property Name
698 --------  -----------------
699 %c        swishreccount
700 %d        swishdescription
701 %D        swishlastmodified
702 %I        swishdbfile
703 %p        swishdocpath
704 %r        swishrank
705 %l        swishdocsize
706 %t        swishtitle
707
708 For example, these are equivalent:
709
710 -x '<swishrank>:<swishdocpath>:<swishtitle>\n'
711 -x '%r:%p:%t\n'
712
713 Use a double percent sign "%%" to enter a literal percent sign in the output.
714
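A format built from the shortcuts is easy to consume from a script. The sketch below reads results produced with C<-H 0 -x '%r\t%p\t%t\n'>, that is, rank, path and title separated by tabs. C<print_hits> is a hypothetical helper, and skipping lines that lack three fields is an assumption about any non-result lines that may appear in the output.

```shell
#!/bin/sh
# Sketch: read results produced with  -H 0 -x '%r\t%p\t%t\n'
# print_hits is a hypothetical helper name.
print_hits() {
    tab=$(printf '\t')
    # Split each line on tabs only; the title keeps its embedded spaces.
    while IFS=$tab read -r rank path title; do
        # Skip anything that is not a complete rank/path/title record.
        [ -n "$rank" ] && [ -n "$path" ] && [ -n "$title" ] || continue
        printf '%s (rank %s): %s\n' "$title" "$rank" "$path"
    done
}
```

A typical call would be C<swish-e -w 'word' -H 0 -x '%r\t%p\t%t\n' | print_hits>. The single quotes keep the shell from touching the backslash escapes.
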
715
716 B<Formatstrings of properties:>
717
718 Properties listed in an C<-x> format string can include format control strings.
719 These "propertyformats" are used to control how the contents of the associated property are printed.
720 Property formats are used like C-language printf formats.
721 The property format is specified by including the attribute "fmt" within the property tag.
722
723 Format strings cannot be used with the "%" shortcuts described above.
724
725 General syntax:
726
727 -x '<propertyname fmt="propfmtstr">'
728
729 where C<propfmtstr> controls the output format of C<propertyname>.
730
731 Examples of property format strings:
732
733 date type: <swishlastmodified fmt="%d.%m.%Y">
734 string type: <swishtitle fmt="%-40.35s">
735 integer type: <swishreccount fmt=/%8.8d/>
736
737 Please see the manual pages for strftime(3) and sprintf(3) for an explanation of
738 format strings. Note: some versions of strftime do not offer the %s format string
739 (number of seconds since the Epoch), so swish provides a special format string "%ld"
740 to display the number of seconds since the Epoch.
741
742 The first character of a property format string defines the delimiter for the format string.
743 For example,
744
745 -x "<author fmt=[%20s]> ...\n"
746 -x "<author fmt='%20s'> ...\n"
747 -x "<author fmt=/%20s/> ...\n"
748
749
750 B<Standard predefined formats:>
751
752 If you omit the property format, the following formats are used:
753
754 String type: "%s" (like printf char *)
755 Integer type: "%d" (like printf int)
756 Float type: "%f" (like printf double)
757 Date type: "%Y-%m-%d %H:%M:%S" (like strftime)
758
759
760 B<Text in "formatstring" or "propfmtstr":>
761
762 Text will be output as-is in format strings (and property format strings).
763 Special characters can be escaped with a backslash.
764 To get a new line for each result hit, you have to include
765 the newline character "\n" at the end of "formatstring".
766
767 -x "<swishreccount>|<swishrank>|<swishdocpath>\n"
768 -x "Count=<swishreccount>, Rank=<swishrank>\n"
769 -x "Title=\<b\><swishtitle>\</b\>"
770 -x 'Date: <swishlastmodified fmt="%m/%d/%Y">\n'
771 -x 'Date in seconds: <swishlastmodified fmt=/%ld/>\n'
772
773 B<Control/Escape characters:>
774
775 You can use C-like control escapes in the format string:
776
777 known controls: \a, \b, \f, \n, \r, \t, \v,
778 digit escapes: \xhexdigits \0octaldigits
779 character escapes: \anychar
780
781 Example,
782
783 swish-e -x "%c\t%r\t%p\t\"<swishtitle fmt=/%40s/>\"\n"
784
785 B<Examples of -x format strings:>
786
787 -x "%c|%r|%p|%t|%D|%d\n"
788 -x "%c|%r|%p|%t|<swishlastmodified fmt=/%A, %d. %B %Y/>|%d\n"
789 -x "<swishrank>\t<swishdocpath>\t<swishtitle>\t<keywords>\n"
790 -x "xml_out: \<title\><swishtitle>\</title\>\n"
791 -x "xml_out: <swishtitle fmt='<title>%s</title>'>\n"
792
793 =item -H [0|1|2|3|<n>] (header output verbosity)
794
795 The C<-H n> switch generates extended I<header> output. This is most useful when searching more than one
796 index file at a time, either by specifying more than one index file with the C<-f> switch, or when searching
797 a merged index file. In these cases, C<-H 2> will generate a set of headers specific to each index file.
798 This gives access to the settings used to generate each index file.
799
800 Even when searching a single index file, C<-H n> will provide additional information about the index file,
801 how it was indexed, and how swish is interpreting the query.
802
803 -H 0 : print no header information, output only search result entries.
804 -H 1 : print standard result header (default).
805 -H 2 : print additional header information for each searched index file.
806 -H 3 : enhanced header output (e.g. print stopwords).
807 -H 9 : print diagnostic information in the header of the results (changed from: C<-v 4>)
808
809
810 =back
811
812
813 =head1 OTHER SWITCHES
814
815 =over 4
816
817 =item -V (version)
818
819 Print the current version.
820
821 =item -k *letter* (print out keywords)
822
823 The C<-k> switch is used for testing and will cause swish to print out all keywords
824 in the index beginning with that letter. You may enter C<-k '*'> to generate a list of all words indexed
825 by swish.
826
827 =item -D *index file* (debug index)
828
829 The -D option is no longer supported in version 2.2.
830
831 =item -T *options* (trace/debug swish)
832
833 The -T option is used to print out information that may be helpful when debugging swish-e's
834 operation. This option replaced the C<-D> option of previous versions.
835
836 Running C<-T help> will print out a list of available *options*.
837
838
839 =back
840
841 =head1 Merging Index Files
842
843 At times it can be useful to merge different index files into one file for searching.
844 This could be because you want to keep separate site indexes and a common one for a global search, or
845 because your site is very large and Swish-e runs out of memory if you try to index it directly.
846
847 You can merge only indexes that were built with common settings
848 (e.g. don't mix stemming and non-stemming indexes, or indexes with different WordCharacter settings, etc.).
849
850 usage: swish-e [-v (num)] [-c file] -M index1 index2 ... outputfile
851
852 Due to the structure of the swish-e index, merging may or may not require less memory than indexing
853 all files at one time.
854
855
856 =over 4
857
858 =item -M *file file ...* (merge)
859
860 This allows you to merge two or more index files - the last file you specify on the
861 list will be the output file.
862
863 Merging removes all redundant file and word data. To estimate how much memory the operation will need,
864 sum up the sizes of the files to be merged and divide by two.
865 That's about the maximum amount of memory that will be used.
866
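The rule of thumb above is simple enough to script. In this sketch C<merge_memory_estimate> is a hypothetical helper that adds up the sizes of the index files named on its command line and halves the total. It is only as accurate as the rule itself.

```shell
#!/bin/sh
# merge_memory_estimate INDEX...   (hypothetical helper)
# Applies the rule of thumb above: sum the index sizes, divide by two.
# Prints the estimate in bytes.
merge_memory_estimate() (
    total=0
    for index in "$@"; do
        # wc -c counts bytes; the arithmetic strips any padding it prints.
        total=$(( total + $(wc -c < "$index") ))
    done
    echo $(( total / 2 ))
)
```

For the merge shown earlier, C<merge_memory_estimate index.full index.incremental> gives a rough upper bound in bytes before C<-M> is run.
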
867 You can use the C<-v> option to produce feedback while merging and the C<-c> option with a
868 configuration file to include new administrative information in the new index file.
869
870 =item -c *configuration file*
871
872 Specify a configuration file while merging to add administrative information to the output index file.
873
874 =back
875
876 =head1 Document Info
877
878 $Id: SWISH-RUN.pod,v 1.23 2002/08/22 22:58:39 whmoseley Exp $
879
880 .
