Profiling Engine SDK
API Reference

Application Programming Interface

This is the manual for the Lextek Profiling Engine's API. We have separated the discussion of the API from that of the query language. The API consists of the calls you make from your own program to integrate the toolkit into your project. The query language is the internal language the Profiling Engine uses to analyze text; you can think of it as an interpreted language specialized for document analysis.

To learn how to integrate the SDK into your project, we suggest starting with the Tutorial. It briefly walks through the API and then shows how to use the query language to analyze documents. After the Tutorial, we suggest reading the API Function Reference, which lists each function in the API, describes what it does and how to use it, and lists the arguments it takes. Finally, we suggest reading the Query Language Manual, which includes an overview of the query language and a reference for all of its commands.


About the API

The API provides several easy-to-use function calls for creating and deleting the Profiling Engine, indexing documents, and querying the indexed documents. The following functions are available in the API:

prCreateProfilingEngine
prDeleteProfilingEngine
prDeleteResultVector
prEndZone
prGetErrorMsg
prGetWordVector
prIncrementRecord
prIndexWord
prIndexWordSpecial
prNumHits
prProcessQuery
prResetProfiler
prResetProfilerQueryParser
prSetParams
prSortVector
prStartZone
prVectorCurrentHit
prVectorNextHit
prVectorNextRecord

The main component of the Profiling Engine is the profiling object, a variable of type ProfilingEngineT. This object coordinates all of your calls to the Profiling Engine and is passed to nearly every function. Before you can use the Profiling Engine, you must create the object with prCreateProfilingEngine. When you have finished using the Profiling Engine, you must delete the object with prDeleteProfilingEngine.
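
In C terms, this is the familiar opaque-handle pattern: one object owns all the state, every call receives it, and it is created once and destroyed once. The sketch below illustrates only that pattern. ToyEngine and the toy_* functions are hypothetical stand-ins invented for this example; they are not the SDK's definition of ProfilingEngineT, and they do not reflect the real signatures of prCreateProfilingEngine or prDeleteProfilingEngine, which are given in the API Function Reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a handle such as ProfilingEngineT.
   NOT the SDK's definition; it only models "one object that owns
   all engine state and is passed to every call". */
typedef struct ToyEngine {
    long record;  /* number of the record currently being filled */
    long words;   /* total words indexed so far */
} ToyEngine;

/* Plays the role of a create call: allocate a zero-initialized
   handle. Returns NULL if memory cannot be allocated. */
ToyEngine *toy_create(void) {
    return calloc(1, sizeof(ToyEngine));
}

/* Every operation receives the handle explicitly, so all state
   lives in the object rather than in globals. */
void toy_index_word(ToyEngine *engine, const char *word) {
    assert(engine != NULL && word != NULL);
    (void)word; /* a real engine would store the word; this sketch only counts it */
    engine->words++;
}

void toy_increment_record(ToyEngine *engine) {
    assert(engine != NULL);
    engine->record++;
}

/* Plays the role of a delete call: release everything the handle
   owns. Passing NULL is harmless, as with free(). */
void toy_delete(ToyEngine *engine) {
    free(engine);
}
```

A caller would typically create the handle once, check it for NULL before use, pass it to each call, and delete it exactly once on every exit path.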

To index a document, you call prIndexWord once for each word you want indexed. Because you decide which words are sent to the indexer, you can eliminate stop words or even index only the words that appear in your queries; you have as much control over the indexing process as you want.

The Profiling Engine divides every index into a series of records. Records are the default scope over which all query operators work; for instance, you can find two words that occur in the same record. To start adding words to a new record, call prIncrementRecord. Where you place record breaks is up to you: the most common approach is to start a new record for each document, but many people start new records at document sections or paragraphs. When you have finished with a collection of records, you can reset the profiler and start adding new documents.
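
To make the record model concrete, the following sketch keeps a tiny in-memory index in which each word is stored with its record number and a position. The ToyIndex type and the toy_* functions are hypothetical and are not the SDK's API; they merely stand in for the roles of prIndexWord, prIncrementRecord, and resetting the profiler. Numbering positions from 0 within each record, the fixed capacities, and the five-word stop list are assumptions made for brevity. Note that stop-word filtering happens in the caller, before anything reaches the index.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_POSTINGS 256
#define TOY_MAX_WORD 32

/* One indexed occurrence. Counting positions from 0 within each
   record is an assumption of this sketch. */
typedef struct {
    char word[TOY_MAX_WORD];
    long record;
    long position;
} ToyPosting;

typedef struct {
    ToyPosting postings[TOY_MAX_POSTINGS];
    size_t count;        /* postings stored so far */
    long record;         /* record currently receiving words */
    long next_position;  /* position of the next word in this record */
} ToyIndex;

/* Hypothetical analogue of indexing one word into the current record.
   Returns 0 on success, -1 if the word is too long or the index is full. */
int toy_index_word(ToyIndex *ix, const char *word) {
    assert(ix != NULL && word != NULL);
    if (strlen(word) >= TOY_MAX_WORD || ix->count == TOY_MAX_POSTINGS)
        return -1;
    ToyPosting *p = &ix->postings[ix->count++];
    strcpy(p->word, word); /* safe: length checked above */
    p->record = ix->record;
    p->position = ix->next_position++;
    return 0;
}

/* Hypothetical analogue of starting a new record. */
void toy_increment_record(ToyIndex *ix) {
    ix->record++;
    ix->next_position = 0;
}

/* Hypothetical analogue of a reset: discard all records. */
void toy_reset(ToyIndex *ix) {
    memset(ix, 0, sizeof *ix);
}

/* The caller, not the index, decides what gets indexed.
   This tiny stop list is illustrative only. */
static int is_stop_word(const char *w) {
    static const char *const stop[] = {"a", "an", "and", "of", "the"};
    for (size_t i = 0; i < sizeof stop / sizeof stop[0]; i++)
        if (strcmp(w, stop[i]) == 0)
            return 1;
    return 0;
}

/* Index one record's worth of words, skipping stop words, then close
   the record so the next call starts a new one. A real caller would
   check toy_index_word's return value. */
void index_record(ToyIndex *ix, const char *const words[], size_t n) {
    for (size_t i = 0; i < n; i++)
        if (!is_stop_word(words[i]))
            toy_index_word(ix, words[i]);
    toy_increment_record(ix);
}

/* Records are the default scope: report whether any single record
   contains both words. */
int toy_same_record(const ToyIndex *ix, const char *a, const char *b) {
    for (size_t i = 0; i < ix->count; i++) {
        if (strcmp(ix->postings[i].word, a) != 0)
            continue;
        for (size_t j = 0; j < ix->count; j++)
            if (ix->postings[j].record == ix->postings[i].record &&
                strcmp(ix->postings[j].word, b) == 0)
                return 1;
    }
    return 0;
}
```

For example, indexing "the engine indexes words" and then "a query and the words" as two records stores five words (the stop words never reach the index), after which engine and words are found in the same record but engine and query are not.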

After you have added words to the index, you can begin analyzing your results. You perform a query by passing a query string to prProcessQuery, which returns a list of hits, called a vector, in an object of type OnixQueryVectorT. Each hit consists of its location (its word number and record number) and its weight, or rank. You can iterate through all the hits in a vector with a few simple function calls. Because queries can define persistent named queries, called functions, the API also provides a function for clearing all the information held by the query processor.

The typical use of the Profiling Engine is to load large queries that define categories before indexing any documents. As you load each document, you then process queries that use those categories, in effect testing each document against the preloaded queries. This differs from most indexes, where you load all the documents first and then run queries against them.
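
The sketch below models a result vector as an array of hits plus a cursor, to illustrate the kind of iteration described above. ToyHit, ToyVector, and the toy_* functions are hypothetical; they only loosely echo the roles suggested by names such as prNumHits, prVectorCurrentHit, prVectorNextHit, and prVectorNextRecord, whose actual signatures and semantics are documented in the API Function Reference. The assumptions that hits arrive ordered by record and then word number, and that "next record" means "skip to the first hit of a later record", are ours.

```c
#include <assert.h>
#include <stddef.h>

/* One hit: where it occurred and how strongly it matched.
   Field names and types are assumptions of this sketch. */
typedef struct {
    long record;   /* record number of the hit */
    long word;     /* word number of the hit */
    double weight; /* weight or rank of the hit */
} ToyHit;

/* A result vector with a cursor. Hits are assumed to be ordered
   by record, then by word number. */
typedef struct {
    const ToyHit *hits;
    size_t n;
    size_t cur;
} ToyVector;

size_t toy_num_hits(const ToyVector *v) { return v->n; }

/* Current hit, or NULL once the cursor has run past the end. */
const ToyHit *toy_current_hit(const ToyVector *v) {
    return v->cur < v->n ? &v->hits[v->cur] : NULL;
}

/* Advance one hit; returns 1 if a hit remains, 0 at the end. */
int toy_next_hit(ToyVector *v) {
    if (v->cur < v->n)
        v->cur++;
    return v->cur < v->n;
}

/* Skip the remaining hits in the current record; returns 1 if the
   cursor now sits on the first hit of a later record, 0 at the end. */
int toy_next_record(ToyVector *v) {
    if (v->cur >= v->n)
        return 0;
    long rec = v->hits[v->cur].record;
    while (v->cur < v->n && v->hits[v->cur].record == rec)
        v->cur++;
    return v->cur < v->n;
}

/* Example consumer: rewind, then sum hit weights per record while
   walking the vector hit by hit. Writes up to max (record, score)
   pairs and returns how many were written. */
size_t toy_record_scores(ToyVector *v, long records[], double scores[],
                         size_t max) {
    size_t out = 0;
    v->cur = 0;
    while (out < max && toy_current_hit(v) != NULL) {
        long rec = toy_current_hit(v)->record;
        double sum = 0.0;
        for (const ToyHit *h = toy_current_hit(v);
             h != NULL && h->record == rec;
             toy_next_hit(v), h = toy_current_hit(v))
            sum += h->weight;
        records[out] = rec;
        scores[out] = sum;
        out++;
    }
    return out;
}
```

Summing weights per record, as toy_record_scores does, is one simple way a caller might turn hits into a per-record score when testing each document against preloaded category queries; this page does not say whether the engine's own weights are meant to be combined this way.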