
Annotation of /mitgcm.org/devel/buildweb/pkg/swish-e/conf/example6.config



Revision 1.1.1.1 (vendor branch)
Fri Sep 20 19:47:30 2002 UTC by adcroft
Branch: Import, MAIN
CVS Tags: baseline, HEAD
Changes since 1.1: +0 -0 lines
Importing web-site building process.

# ----- Example 6 - Spider using "prog" feature -------
#
# Please see the swish-e documentation for
# information on configuration directives.
# Documentation is included with the swish-e
# distribution, and also can be found on-line
# at http://swish-e.org
#
#
# This example demonstrates how to use the
# new (as of 2.2) "prog" document source feature
# to spider a webserver.
#
# The "prog" document source feature allows
# an external program to feed documents to
# swish, one after another. This allows you
# to index documents from any source (e.g. web, DBMS)
# and to filter and adjust the content before swish
# indexes the content.
#
# This example uses the provided spider.pl program
# to spider a remote web server. This spider offers
# more features than the "http" spider method shown
# in example7.config.
#
# ** Please don't test with this exact config **
# Spider your own web server instead.
#
# Indexing (spidering) is started with the following
# command issued from the "conf" directory:
#
# swish-e -S prog -c example6.config
#
# Note: You should have the current Bundle::LWP bundle
# of Perl modules installed. This was tested with:
# libwww-perl-5.53
# Run "perldoc spider.pl" in the prog-bin directory for
# more information.
#
# ** Do not spider a web server without permission **
#
#---------------------------------------------------

# Include our site-wide configuration settings:

IncludeConfigFile example4.config

# Specify the program to run
IndexDir ../prog-bin/spider.pl


# When running under the "prog" document source method you can
# pass a list of parameters to the program (specified with -i or IndexDir).

# If a parameter is passed to spider.pl, it will use that as the configuration
# file.

# As a special case, you can pass the word "default" followed by one or more URLs.
# In this case the spider will use default settings to spider the provided URLs.

SwishProgParameters default http://swish-e.org

# Note: the default configuration file used by spider.pl is SwishSpiderConfig.pl.
# See prog-bin/SwishSpiderConfig.pl for examples
# that include filtering PDF and MS Word documents.

# Tell swish how to parse the content
DefaultContents HTML
IndexContents HTML .htm .html
IndexContents TXT .txt .conf



# Just to make it interesting, let's modify the URL that gets indexed:
# replace http://swish-e.org/ => http://localhost/

ReplaceRules replace swish-e.org localhost


# end of example
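
The comments in the example describe a second way to drive spider.pl: passing a spider configuration file instead of the word "default" followed by URLs. The fragment below is a minimal sketch of that variation, not part of the original file. The file name my_spider_config.pl is hypothetical, and the ReplaceRules line is only an assumption about how one might rewrite the full URL prefix rather than the host name alone.

```
# Sketch only: "my_spider_config.pl" is a hypothetical file name.
IncludeConfigFile example4.config

# Program that feeds documents to swish
IndexDir ../prog-bin/spider.pl

# Pass a spider configuration file instead of "default <URL>"
SwishProgParameters my_spider_config.pl

# Same parsing rules as in the example
DefaultContents HTML
IndexContents HTML .htm .html
IndexContents TXT .txt .conf

# Rewrite the full URL prefix instead of only the host name
ReplaceRules replace http://swish-e.org/ http://localhost/
```

A file holding this fragment would presumably be run the same way as the example, with swish-e -S prog -c followed by the file's name, from the "conf" directory.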
