Onix Text Retrieval Toolkit
API Reference


Onix Manual


You are looking at the manual for one of the fastest full-text indexing engines available.  Onix is now in use by a variety of document management and imaging systems as well as web crawlers.  Users of Onix have been impressed by its indexing speed, its flexibility, and the ease with which it can be integrated into projects.  Onix can efficiently manage text databases from as small as 10-20K to as large as approximately one terabyte.  Onix was designed with flexibility and efficiency in mind, and we hope these efforts pay off as you integrate Onix into your projects.

There are two different APIs for Onix.  One is aimed at small CD-ROM publishing projects; the other is aimed at text databases that are either very large or require periodic updating of the index.  Each has its own license and set of functionality.  As you integrate Onix, if there is functionality you would like to see, or functions that would make your job easier, please let us know and we will see what we can do to accommodate your request.

Please be sure to read the following sections before you start to integrate Onix.  They will answer your most immediate questions and give you valuable guidance along the way:

About Words
About Records
Sample Calling Sequences
Support
 
 

About Words

Onix allows you to define what a word is in the text you index.  A word can be, for example, simply a sequence of the characters a-z; it can contain upper ASCII/ANSI extended characters; it can be Unicode; or it can be any other sequence of binary data.  Onix is completely character-set independent.  During the indexing process you simply specify where the binary data is and how long it is, and Onix does the rest.

Some applications "normalize" words before they are indexed.  A common way to do this is to use a stemming algorithm, which reduces all forms of a word to a standardized form -- which may or may not be a real word.  Onix currently includes the Porter Stemmer for the English language as part of its toolkit.  The Porter stemming algorithm is considered by many to be one of the best stemming algorithms developed.  Stemming has its share of advantages and disadvantages, and only your application can dictate whether it is best to stem words before you index them.  Keep in mind, however, that if you stem words before indexing them, you must also stem the search terms before conducting a search.  For a more detailed discussion of stemming, see the documentation for ixStemEnglishWord().
 

About Records

Onix expects you to divide the text you index into "records."  A record is like a page in a book: just as a book's index refers to the pages on which a word occurs, the indexes Onix generates refer to the records in which a word appears.  A record can be almost any size.  It may be only a sentence or two in length (such as a verse from the Bible or Koran), a paragraph (as you might choose for a piece of literature), or a whole file (as you might choose for a document management system or a web crawler).

Choosing how large a record should be is an important decision when building your application -- though the circumstances usually make it fairly easy to decide how to divide up your text.  Remember that certain operators, such as the boolean operators "AND", "OR", and "NOT", operate at the record level, returning the records that match your boolean expression.  The smaller a record is, the more specific your index will be; the larger a record is, the less specific your index will be.  Furthermore, if file size is a consideration, note that smaller records produce a larger index and larger records produce a smaller one.

During the indexing phase, it is common to write a pointer file which stores information on how to find each record indexed.  This pointer file can be as simple as a list of 4-byte integers giving the offset at which each record begins within a file, or as complex as two separate files -- one holding a variable-length field (such as a file name) and the other recording how far into the first file each variable-length field begins.  What you store in your pointer file, and how, depends entirely on your application.  Onix has some of this logic built in, which you can optionally take advantage of.  Three functions let you store this pointer information in your index and retrieve the relevant record information: ixStoreRecordData(), ixRetrieveRecordData(), and ixRetrieveMoreRecordData().  This information is stored optionally; to take advantage of it, you must create the index with ixCreateIndexEx(), which gives you more control over the index creation process than ixCreateIndex().

To visualize how it all fits together, please look at the diagram below: the user submits a query to the index; the index returns a list of record numbers that match the query; the pointer table is then consulted to find where each record's text is located; finally, the text itself is retrieved and (usually) displayed to the end user.
 

Sample Calling Sequences

These calling sequences should help get you up and running, and demonstrate how some of the Onix function calls are used.
 

Creating, Opening, and Closing an Index

ixCreateIndexManager()
ixCreateIndex()
ixOpenIndex()
ixCloseIndex()
ixDeleteIndexManager()
 
 

Opening an Index and Indexing a Series of Files

Each file is one record

ixCreateIndexManager()
ixOpenIndex()
ixStartIndexingSession()

while(There are still files to index) {

    for (Every Word In Document) {
        ixIndexWord()
    }
    
    // Optionally store data for the record  
    // (In this case the file name)

    ixStoreRecordData(FileName)

    if(There are more files to index) {
         ixIncrementRecord()
    }
}

ixEndIndexingSession()
ixCloseIndex()
 
 

Opening an Index and Conducting a Query

ixOpenIndex()
ixStartRetrievalSession()
printf("\nWhat would you like to search for? : ");
fgets(QueryString, sizeof(QueryString), stdin);
ixConvertQuery()
QueryResults = ixProcessQuery()
// Now you can use the following six functions to navigate the results
ixNumHits()
ixCurrentHit()
ixNextHit()
ixPreviousHit()
ixNextRecord()
ixPreviousRecord()
// You can also use the following two functions if you have stored record data in your index.
ixRetrieveRecordData()
ixRetrieveMoreRecordData()
ixEndRetrievalSession()
ixCloseIndex()
 
 

Deleting a Record

Note: You must have a retrieval session in progress to delete a record.
ixOpenIndex()
ixStartRetrievalSession()
ixDeleteRecordNum()
ixEndRetrievalSession()
ixCloseIndex()
 
 
