Brevity Document Summarizer Toolkit

Function Index

brCreateSummarizer Creates the Brevity Summarizer object. This is used by all other Brevity functions and must be called before calling any other Brevity function.
brDeleteSummarizer Deletes the Brevity Summarizer object. Call this when you are finished using Brevity.
brSetDictionary Selects the dictionary that Brevity uses to determine what is significant within your text. This dictionary can be one of the ones supplied with Brevity or a dictionary generated from documents similar to the documents you wish to summarize.
brSummarizeFile Summarizes an ASCII text file. This will not work properly on files that contain embedded tags or formatting codes.
brSummarizeBuffer Summarizes text passed in a buffer.
brGetSummary Returns the summary of your text as a text buffer. You can specify the size of the summary by either the number of words or number of characters.
brGetOffsets Returns the summary of your text as a series of offsets into your original text. Use this if you wish to use Brevity to highlight significant sentences in your text.

Toolkit Overview

The Brevity toolkit is designed to be both flexible and simple to use. As the list of functions shows, integrating Brevity into your project takes only a handful of calls. The two functions every program needs are brCreateSummarizer and brDeleteSummarizer, which create and delete the Brevity Summarizer object. Call brCreateSummarizer before any other Brevity function and brDeleteSummarizer when you are finished. Brevity will not work without them.
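The create-first, delete-last rule can be sketched as follows. This is only an illustration of the call order: demo_create, demo_delete, demo_summarize and the DemoSummarizer type are hypothetical stand-ins, because the real handle type and the real signatures of brCreateSummarizer and brDeleteSummarizer are given on the Functions and Types pages and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* HYPOTHETICAL stand-ins. The real Brevity handle type and function
   signatures come from the Brevity headers, not from this sketch. */
typedef struct DemoSummarizer { int live; } DemoSummarizer;

int demo_live_objects = 0;  /* counts undeleted objects in this sketch only */

/* Plays the role of brCreateSummarizer. */
DemoSummarizer *demo_create(void) {
    DemoSummarizer *s = malloc(sizeof *s);
    if (s != NULL) {
        s->live = 1;
        demo_live_objects++;
    }
    return s;
}

/* Plays the role of brDeleteSummarizer. */
void demo_delete(DemoSummarizer *s) {
    if (s != NULL) {
        demo_live_objects--;
        assert(demo_live_objects >= 0);
        free(s);
    }
}

/* Plays the role of any call made between create and delete.
   Returns 0 on success, -1 if the object or the text is missing. */
int demo_summarize(DemoSummarizer *s, const char *text) {
    return (s != NULL && s->live && text != NULL) ? 0 : -1;
}

/* Create first, delete last, and delete on every exit path. */
int summarize_once(const char *text) {
    int rc;
    DemoSummarizer *s = demo_create();
    if (s == NULL) {
        return -1;
    }
    rc = demo_summarize(s, text);
    demo_delete(s);  /* runs whether or not the summarize step succeeded */
    return rc;
}
```

In a long-running program you would more likely create the object once at startup and delete it at shutdown, rather than once per document as summarize_once does.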

What is significant within a document depends on the type of document you are looking at and the type of information you are looking for. Brevity works by comparing a document to a set of similar documents. For instance, if you were summarizing a feed of political news, you would want to compare your text to political news stories, not physics papers. Likewise, lawyers summarizing legal papers would want to compare them to other legal documents of the same type. Brevity stores this document information in a Summary Dictionary. We supply several dictionaries with Brevity, each designed for a general category of documents. For more specialized needs, we supply a utility that generates a dictionary from a collection of documents you provide. To specify the dictionary Brevity will use, call brSetDictionary.

There are two ways to summarize text. You can tell Brevity to summarize a text file on disk (brSummarizeFile) or pass it a buffer of text held in memory (brSummarizeBuffer). Choose whichever is easier for your project. In most cases it is simplest to pass a memory buffer. Either way, be sure to remove any embedded tags or formatting codes from the text first.
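How you remove formatting depends on your source data, and the toolkit does not prescribe a method. As a minimal sketch, assuming HTML-like markup in which everything between '<' and '>' is a tag, a buffer could be cleaned like this before it is handed to Brevity. The function name strip_angle_tags is hypothetical and is not part of the Brevity API. It does not decode character entities or handle '<' characters that appear in ordinary text.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src into dst, dropping every '<', every character inside a tag,
   and the '>' that closes it. A '>' outside a tag is copied unchanged.
   Output is truncated to fit dst_size and is always NUL-terminated.
   Returns the number of characters written, not counting the terminator. */
size_t strip_angle_tags(const char *src, char *dst, size_t dst_size) {
    size_t out = 0;
    int in_tag = 0;

    assert(src != NULL && dst != NULL && dst_size > 0);

    for (; *src != '\0' && out + 1 < dst_size; src++) {
        if (*src == '<') {
            in_tag = 1;            /* tag opens: stop copying */
        } else if (*src == '>' && in_tag) {
            in_tag = 0;            /* tag closes: resume copying */
        } else if (!in_tag) {
            dst[out++] = *src;     /* ordinary text */
        }
    }
    dst[out] = '\0';
    return out;
}
```

The cleaned buffer, not the original, is what you would then pass to the summarizer.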

Brevity can return your summary in two different ways. The easiest and most common way is to have Brevity generate a paragraph of text that it returns in a buffer (brGetSummary). The summary can be as short or as long as you wish, and you can specify its size as either a number of words or a number of characters. The right length depends on the types of texts you are summarizing and how long they are. In general, start with a summary of 200 words and adjust this up or down based on your own data. Once you have settled on a length that meets your needs, you can hard-code it. We find that about 100 words works well for news stories of the kind found on the Internet or in newspapers, and that about 200 words works best for technical articles of the kind found in most journals.

The second way Brevity can summarize your text is by returning a series of offsets into your text (brGetOffsets). Each offset is the location of a significant sentence in your text. You can use these offsets to highlight the sentences in your document that your users are likely to find significant. You might then let the user click on a highlighted sentence to jump to it in the original text. This lets you not only summarize a text but also help the user move around within the original document. It also lets you retrieve formatting information from your original text for display, if necessary.
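Highlighting from offsets might look like the sketch below. It rests on assumptions that this page does not confirm: that each offset is the character position where a significant sentence starts, that offsets arrive in ascending order, and that a sentence ends at the first '.', '!' or '?' after its start. The function names sentence_end and highlight are hypothetical and are not part of the Brevity API. Check the brGetOffsets page for the actual form of the returned data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the index one past the end of the sentence starting at `start`.
   ASSUMPTION: a sentence ends at the first '.', '!' or '?'. */
size_t sentence_end(const char *text, size_t start) {
    size_t len = strlen(text);
    size_t i = start;
    while (i < len && text[i] != '.' && text[i] != '!' && text[i] != '?') {
        i++;
    }
    return i < len ? i + 1 : len;
}

/* Copies text into dst, wrapping each flagged sentence in "[[" and "]]".
   ASSUMPTION: offsets are sentence start positions in ascending order.
   Returns 0 on success, -1 if dst is too small or an offset is invalid. */
int highlight(const char *text, const size_t *offsets, size_t count,
              char *dst, size_t dst_size) {
    size_t len, pos = 0, out = 0, k;

    assert(text != NULL && dst != NULL);
    len = strlen(text);

    /* Worst case: four marker characters per offset plus the terminator. */
    if (dst_size < len + 4 * count + 1) {
        return -1;
    }
    for (k = 0; k < count; k++) {
        size_t start = offsets[k];
        size_t end;
        if (start < pos || start >= len) {
            return -1;  /* out of order, overlapping, or past the end */
        }
        end = sentence_end(text, start);
        memcpy(dst + out, text + pos, start - pos);   /* text before it */
        out += start - pos;
        memcpy(dst + out, "[[", 2);
        out += 2;
        memcpy(dst + out, text + start, end - start); /* the sentence */
        out += end - start;
        memcpy(dst + out, "]]", 2);
        out += 2;
        pos = end;
    }
    memcpy(dst + out, text + pos, len - pos);         /* remaining text */
    out += len - pos;
    dst[out] = '\0';
    return 0;
}
```

In a real viewer the "[[" and "]]" markers would be replaced by whatever your display layer uses for highlighting or hyperlinks. Because the offsets index the original text, the original formatting stays available for display.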

 


Copyright 2000 Lextek International