Plan 9 from Bell Labs’s /usr/web/sources/contrib/steve/root/sys/src/cmd/seft/README

Copyright © 2021 Plan 9 Foundation.
Distributed under the MIT License.


============================
seft (search engine for text)
============================

Usage: seft [OPTIONS] "query terms" text_files

Seft takes a set of query terms and a set of files as arguments and, using
a locality-based similarity heuristic, identifies the word locations within
the files that are most relevant to the query.  The user is then presented
with a sequence of windows of text: the first window surrounds the most
relevant location, the second surrounds the next most relevant location,
and so on.  Both the number of windows presented and the size of each
window can be specified as options to seft.  In addition, the user can
specify whether to apply case-folding and/or stemming to the query terms
and the text files.
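One way to picture a locality-based heuristic is the following sketch: every
occurrence of a query term spreads a contribution over the neighbouring word
positions, and each position is ranked by the sum of the contributions it
receives.  This is an illustration only.  It assumes a triangular
contribution function, a fixed spread and caller-supplied term weights; the
contribution shapes and weights seft actually uses are described in the
1999 paper listed under References and may differ.  The names
locality_scores, best_location and SPREAD are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical spread (in word positions) of one term occurrence. */
#define SPREAD 8

/*
 * Sketch only: for every word position i, sum the contributions of nearby
 * query-term occurrences.  An occurrence of term t at position p adds
 * weights[t] * (SPREAD - |i - p|) / SPREAD to each position i with
 * |i - p| < SPREAD (a triangular contribution function).
 */
void locality_scores(const char **words, int nwords,
                     const char **terms, const double *weights, int nterms,
                     double *score)
{
    for (int i = 0; i < nwords; i++)
        score[i] = 0.0;
    for (int p = 0; p < nwords; p++) {
        for (int t = 0; t < nterms; t++) {
            if (strcmp(words[p], terms[t]) != 0)
                continue;
            /* Clamp the affected range to the bounds of the text. */
            int lo = p - SPREAD + 1 < 0 ? 0 : p - SPREAD + 1;
            int hi = p + SPREAD - 1 >= nwords ? nwords - 1 : p + SPREAD - 1;
            for (int i = lo; i <= hi; i++)
                score[i] += weights[t] * (SPREAD - abs(i - p)) / (double)SPREAD;
        }
    }
}

/* Return the highest-scoring position (the first one, on ties). */
int best_location(const double *score, int nwords)
{
    int best = 0;
    for (int i = 1; i < nwords; i++)
        if (score[i] > score[best])
            best = i;
    return best;
}
```

With the words "the computer industry is big computer" and equal weights
for "computer" and "industry", position 2 ("industry") scores highest,
because it lies close to all three query-term occurrences.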

[OPTIONS]

    -f query_file       A text file containing query terms
    -m max_windows      Specifies the maximum number of windows to display
                        (default = 5)
    -w window_size      Specifies the number of lines within a window
                        (default = 3)
    -x                  Turns off highlighting of query term locations
    -n                  Suppress output
    -s [0|1|2]          0 = casefolding off, stemming off
                        1 = casefolding on,  stemming off
                        2 = casefolding on,  stemming on
                        (default = 2)
    -p                  Print a formfeed character after every window.
                        Useful when piping output through a pager such as
                        more.
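The -s levels can be read as a normalisation applied identically to query
terms and to words of the text, so that the two still match after folding
or stemming.  The sketch below assumes exactly that.  The function name
normalise is hypothetical, and its "stemmer" is a deliberately crude
stand-in that only strips a trailing plural "s"; this README does not say
which stemming algorithm seft uses.

```c
#include <ctype.h>
#include <string.h>

/*
 * Normalise a word in place according to an -s level:
 *   0: leave unchanged
 *   1: fold to lower case
 *   2: fold to lower case, then stem
 * The stemming step is a hypothetical placeholder: it removes a final "s"
 * (but not "ss") from words longer than three characters.
 */
void normalise(char *word, int level)
{
    if (level >= 1)
        for (char *c = word; *c != '\0'; c++)
            *c = (char)tolower((unsigned char)*c);
    if (level >= 2) {
        size_t n = strlen(word);
        if (n > 3 && word[n - 1] == 's' && word[n - 2] != 's')
            word[n - 1] = '\0';
    }
}
```

Under this sketch, "Computers" stays "Computers" at level 0, becomes
"computers" at level 1 and "computer" at level 2, while "Class" becomes
"class" at level 2.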

Examples of usage:
------------------

Suppose the text file Query contains the words "computer industry".  Then
the following two commands are equivalent:

seft -f Query ~oldk/News/*

seft "computer industry" ~oldk/News/*

Either command searches a user's News folder (here, that of user oldk) for
articles relating to "computer" and "industry", and displays windows of
text surrounding the most relevant locations.

Window merging:
---------------

If highly ranked query locations lie close together, seft would be likely
to display either windows with identical contents (when the highly ranked
query terms occur on the same line) or windows that partially overlap.  To
limit this, the current version of seft does not display a window whose
centre line has already been displayed anywhere within a previous window.
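The skipping rule can be sketched as follows, assuming the candidate centre
lines arrive already sorted by relevance and that a window of window_size
lines is placed around its centre line, clamped at the start and end of the
file.  The function select_windows and its parameters are hypothetical;
the values 5 and 3 used below are the README's defaults for -m and -w.

```c
#include <stdbool.h>

/*
 * Choose windows to display from candidate centre lines given in rank
 * order (most relevant first).  A candidate is skipped if its centre line
 * already appeared in an earlier window.  shown[] must have nlines entries,
 * all initially false.  The chosen centre lines are written to chosen[]
 * and their count is returned.
 */
int select_windows(const int *ranked, int nranked, int nlines,
                   int window_size, int max_windows,
                   bool *shown, int *chosen)
{
    int nchosen = 0;
    for (int r = 0; r < nranked && nchosen < max_windows; r++) {
        int centre = ranked[r];
        if (shown[centre])
            continue;               /* centre already displayed: skip */
        int start = centre - (window_size - 1) / 2;
        int end = start + window_size - 1;
        if (start < 0)
            start = 0;
        if (end > nlines - 1)
            end = nlines - 1;
        for (int l = start; l <= end; l++)
            shown[l] = true;        /* mark every line of this window */
        chosen[nchosen++] = centre;
    }
    return nchosen;
}
```

Because only the centre line is checked, two displayed windows can still
share lines: with a window size of 3, the ranked centres 5, 5, 6, 12, 4, 3,
15 yield windows centred on 5, 12, 3 and 15, and the windows centred on 5
and 3 both contain line 4.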


Further work:
-------------

    - Piping from stdin:

        The current implementation of seft cannot read the text to be
        searched from standard input, so a pipeline such as the following
        is not supported:

        cat ~oldk/News/* | seft -q "computer industry"

    - Document delimiters:

        Currently, the character "^B" (Control-B, ASCII 0x02) is hard-wired
        as the document delimiter.  The delimiter could instead be set by a
        command-line argument or in a resource file.
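Splitting an input buffer into documents at the ^B delimiter might look like
the sketch below.  The names split_documents and DOC_DELIM are
hypothetical, and skipping empty pieces (between consecutive delimiters or
after a final one) is a choice made for this sketch, not documented seft
behaviour.  Making the delimiter configurable would amount to turning
DOC_DELIM into a parameter.

```c
#include <stddef.h>

#define DOC_DELIM '\002'    /* ^B (Control-B, ASCII 0x02) */

/*
 * Split buf[0..len) into documents separated by DOC_DELIM, recording the
 * offset and length of each non-empty document in start[] and size[]
 * (at most maxdocs entries).  Returns the number of documents recorded.
 */
size_t split_documents(const char *buf, size_t len,
                       size_t *start, size_t *size, size_t maxdocs)
{
    size_t ndocs = 0, begin = 0;
    for (size_t i = 0; i <= len && ndocs < maxdocs; i++) {
        if (i < len && buf[i] != DOC_DELIM)
            continue;               /* still inside the current document */
        if (i > begin) {            /* skip empty pieces */
            start[ndocs] = begin;
            size[ndocs] = i - begin;
            ndocs++;
        }
        begin = i + 1;              /* next document starts after ^B */
    }
    return ndocs;
}
```

For the input "one^Btwo^B^Bthree^B" this sketch reports three documents,
"one", "two" and "three".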

References:
-----------

    A detailed discussion of seft can be found in:

    @inproceedings{dm00:acsc,
        author = "O. de~Kretser and A. Moffat",
        title = "Needles and Haystacks: A Search Engine for Personal
                 Information Collections",
        booktitle = "Proc. 23rd Australasian Computer Science Conference",
        year = 2000,
        note = "To appear",
    }

    The locality-based ranking heuristic used by seft is described in:

    @inproceedings{dm99:adc,
        author = "O. de~Kretser and A. Moffat",
        title = "Effective Document Presentation with a Locality-Based
                 Similarity Heuristic",
        booktitle = "Proceedings of the Twenty-Second Annual International
                     ACM-SIGIR Conference on Research and Development in
                     Information Retrieval",
        pages = "113-120",
        month = aug,
        year = 1999,
        address = "San Francisco, CA",
        editor = "M. Hearst and F. Gey and R. Tong",
    }

Owen de Kretser, 1/2000
