IBM CMOD Composite Indexes

From CMOD.wiki
Jump to: navigation, search

Disambiguation

You might be looking for information on getting metadata from documents to be loaded into Content Manager OnDemand Applications using ACIF - see the IBM CMOD Indexing Reference: v10.5   v10.1   v9.5   v9.0   v8.5   v8.4

Background

In IBM Content Manager OnDemand, an Application Group's field type can be set to Filter or Index. Setting the type to 'Index' tells CMOD that it should instruct the database to build an index for that field. Setting the value to Filter means that the field is used to filter the results after the index search is completed.

Database indexes make access to data faster by storing a value with the location of the data in the index -- similar to how the index at the back of a reference book works. The advantage is that if the data being indexed is mostly unique ("high cardinality") the indexes can be used to make searches very fast. In the OnDemand Application Group configuration window, in the Field Information tab, fields that are set to 'Index' should also have the 'Required' checkbox selected in the Folder configuration, Field Information Tab. This forces end users to enter a value into the field in order to perform a search - reducing the potential for end users to submit queries that will perform poorly (consuming a disproportionate amount of server I/O bandwidth, RAM, and CPU), as well as reducing the possibility that one of these 'rogue queries' will impact the performance of other user's searches or retrievals.

For those who like analogies: Using 'Filter' Field Type is like trying to find a word by searching every page in a very long book. Using 'Index' is like using the index at the back of a book, telling you where to find each occurrence of the word by page number. In fact, this naming convention is used in databases -- indexes point to database pages (like pages in a book), which contain records ("words"). For this reason, you want indexes on fields that are the most unique like 'customer number' which should uniquely identify a customer account, but not 'zip code', which could encompass tens of thousands of customers in large cities. To continue the book analogy, the index in the back of your book doesn't contain a list of the pages that the words "and", "the", "a" or "to" -- because those words are likely to appear on every page of the book, and are not just useless, but also consume a huge amount of space while being useless.

Composite Indexes are database indexes that are created by adding the values of two or more fields together. If the fields selected for the composite indexes are mostly unique (high specificity or cardinality) the index may be very large, but it should be able to identify a single record -- and be extraordinarily efficient in the process. In order to be effective at reducing the duration of queries, ALL of the fields that appear in the composite index must also be set to be 'Required' fields in the Folder configuration.

Improving IBM CMOD Performance by Requiring Indexed Fields

Caveats

As mentioned above, it's important to use indexes on fields carefully, to ensure maximum benefit at minimum cost in terms of storage space and time. Also, the more Content Manager OnDemand Application Group fields that have indexing enabled, the more time and effort it takes to create, update, and maintain them, possibly slowing down the load process, and extending maintenance windows for arsmaint.

Usage

In solutions where Content Manager OnDemand must process a large number of queries in a short period of time (like tax season) creating indexes on a few single fields might not be enough to ensure that all queries are completed quickly. In these situations, IBM CMOD can be configured to have a 'composite index' - a single index which contains more than one field - like CUST_NUM+STMT_DATE to combine 'customer number' and 'statement date' data into a single index record. This means that a customer who searches for their customer number for a particular date should be able to find their statement immediately, because the index tells the database where to find the rest of their information in the database table.

Composite indexes are especially effective when building line-of-business front-ends through the ODWEK Java API, where you have strict control over the fields used to search for specific documents, like investment statements.

Image: Advanced Indexing Parameters window & configured composite index.

Selecting fields

When selecting which Application Group field to add to the composite index, the order of the fields is extremely important. Use the field that has the highest cardinality (the most unique values or, alternately the fewest repeated values) generally followed by the date. This combination allows searches to identify a specific customer's documents nearly instantly, and also zero-in on a specific date range as well. Optimizing the use of Content Manager OnDemand composite indexes requires research into the types of queries that users are performing in OnDemand. Your Database Administrator responsible for IBM CMOD will have access to utilities to help determine how best to achieve better Content Manager OnDemand performance, or you can rely on the experts at CMOD.Cloud.

Impact on Storage

Composite indexes will require that you store the contents of both columns, along with a list of database page identifiers. If your CMOD Application Group is especially large, this can cause the database tables for that specific App Group to consume anywhere from 10 to 50% more space. Check with your Database Administrators and System Administrators to ensure that you have enough space to complete the index creation.