Difference between revisions of "IBM CMOD Composite Indexes"

Jump to navigation Jump to search
m
Several updates and added section on CMOD Multiple Field Index cost in terms of I/O bandwidth.
m (Added more analogies about CMOD index fields, and caveats for choosing which fields should be CMOD indexes.)
m (Several updates and added section on CMOD Multiple Field Index cost in terms of I/O bandwidth.)
Line 1: Line 1:
{{TOCright}}
{{TOCright}}
= Disambiguation =
= Disambiguation =
Also known as "Content Manager OnDemand Multi Field Indexes".
You might be looking for information on getting metadata from documents to be loaded into Content Manager OnDemand Applications using ACIF - see the IBM CMOD Indexing Reference:  [https://www.ibm.com/support/knowledgecenter/SSEPCD_10.5.0/com.ibm.ondemand.ir.doc/dodnt000.htm v10.5]   [https://cmod.wiki/dox/CMODv10.1/IndexingReference.pdf v10.1]   [https://cmod.wiki/dox/CMODv9.5/IndexingReference.pdf v9.5]   [https://cmod.wiki/dox/CMODv9.0/IndexingReference.pdf v9.0]   [https://cmod.wiki/dox/CMODv8.5/Indexing_Reference.pdf v8.5]   [https://cmod.wiki/dox/CMODv8.4/Indexing_Reference.pdf v8.4]
You might be looking for information on getting metadata from documents to be loaded into Content Manager OnDemand Applications using ACIF - see the IBM CMOD Indexing Reference:  [https://www.ibm.com/support/knowledgecenter/SSEPCD_10.5.0/com.ibm.ondemand.ir.doc/dodnt000.htm v10.5]   [https://cmod.wiki/dox/CMODv10.1/IndexingReference.pdf v10.1]   [https://cmod.wiki/dox/CMODv9.5/IndexingReference.pdf v9.5]   [https://cmod.wiki/dox/CMODv9.0/IndexingReference.pdf v9.0]   [https://cmod.wiki/dox/CMODv8.5/Indexing_Reference.pdf v8.5]   [https://cmod.wiki/dox/CMODv8.4/Indexing_Reference.pdf v8.4]


Line 6: Line 8:
In IBM Content Manager OnDemand, an Application Group's field type can be set to Filter or Index.  Setting the type to 'Index' tells CMOD that it should instruct the database to build an index for that field.  Setting the value to Filter means that the field is used to filter the results after the index search is completed.
In IBM Content Manager OnDemand, an Application Group's field type can be set to Filter or Index.  Setting the type to 'Index' tells CMOD that it should instruct the database to build an index for that field.  Setting the value to Filter means that the field is used to filter the results after the index search is completed.


Database indexes make access to data faster by storing a value with the location of the data in the index -- similar to how the index at the back of a reference book works.  The advantage is that if the data being indexed is mostly unique ("high cardinality") the indexes can be used to make searches very fast.  In the OnDemand Application Group configuration window, in the Field Information tab, fields that are set to 'Index' should also have the 'Required' checkbox selected in the Folder configuration, Field Information Tab.  This forces end users to enter a value into the field in order to perform a search - reducing the potential for end users to submit queries that will perform poorly (consuming a disproportionate amount of server I/O bandwidth, RAM, and CPU), as well as reducing the possibility that one of these 'rogue queries' will impact the performance of other user's searches or retrievals.
Database indexes make access to data faster by storing a list of metadata stored inside the database table alongside the location of the data in the index -- similar to how the index at the back of a reference book works.  The advantage is that if the data being indexed is mostly unique ("high cardinality" is the term to use when talking to your Database Administrator) the indexes can be used to make searches very fast.  In the OnDemand Application Group configuration window, in the Field Information tab, fields that are set to 'Index' should also have the 'Required' checkbox selected in the Folder configuration, Field Information Tab.  This forces end users to enter a value into the field in order to perform a search - reducing the potential for end users to submit queries that will perform poorly (consuming a disproportionate amount of server I/O bandwidth, RAM, and CPU), as well as reducing the possibility that one of these 'rogue queries' will impact the performance of other user's searches or retrievals.


For those who like analogies:  Using 'Filter' Field Type is like trying to find a word by searching every page in a very long book.  Using 'Index' is like using the index at the back of a book, telling you where to find each occurrence of the word by page number.  In fact, this naming convention is used in databases -- indexes point to database pages (like pages in a book), which contain records ("words").  For this reason, you want indexes on fields that are the '''most unique''' like 'customer number' which should uniquely identify a customer account, but not 'zip code', which could encompass tens of thousands of customers in large cities.  To continue the book analogy, the index in the back of your book doesn't contain a list of the pages that the words "and", "the", "a" or "to" -- because those words are likely to appear on every page of the book, and are not just useless, but also consume a huge amount of space while being useless.
For those who like analogies:  Using 'Filter' Field Type is like trying to find a word by searching every page in a very long book.  Using 'Index' is like using the index at the back of a book, telling you where to find each occurrence of the word by page number.  In fact, this naming convention is used in databases -- indexes point to database pages (like pages in a book), which contain records ("words").  For this reason, you want indexes on fields that are the '''most unique''' like 'customer number' which should uniquely identify a customer account, but not 'zip code', which could encompass tens of thousands of customers in large cities.  To continue the book analogy, the index in the back of your book doesn't contain a list of the pages that the words "and", "the", "a" or "to" -- because those words are likely to appear on every page of the book, and are not just useless, but also consume a huge amount of space while being useless.


Composite Indexes are database indexes that are created by adding the values of two or more fields together.  If the fields selected for the composite indexes are mostly unique (high specificity or cardinality) the index may be very large, but it should be able to identify a single record -- and be extraordinarily efficient in the process.  In order to be effective at reducing the duration of queries, ALL of the fields that appear in the composite index must also be set to be 'Required' fields in the Folder configuration.
Composite Indexes are database indexes that are created by adding the values of two or more fields together.  If the fields selected for the composite indexes are mostly unique (high specificity or cardinality) the index may be very large, but it should be able to identify a single record -- and be extraordinarily efficient in the process.  In order to be effective at reducing the duration of queries, ALL of the fields that appear in the composite index must also be set to be 'Required' fields in the Folder configuration.
Finally the '''order''' that the fields appear in the OnDemand Multiple Field Index appears in is also important - the highest cardinality ("most unique") field must appear first for maximum performance.


[[File:RequiringIndexedFields.png|800px|Improving IBM CMOD Performance by Requiring Indexed Fields ]]
[[File:RequiringIndexedFields.png|800px|Improving IBM CMOD Performance by Requiring Indexed Fields ]]
Line 31: Line 35:
= Impact on Storage =
= Impact on Storage =
Composite indexes will require that you store the contents of both columns, along with a list of database page identifiers.  If your CMOD Application Group is especially large, this can cause the database tables for that specific App Group to consume anywhere from 10 to 50% more space.  Check with your Database Administrators and System Administrators to ensure that you have enough space to complete the index creation.
Composite indexes will require that you store the contents of both columns, along with a list of database page identifiers.  If your CMOD Application Group is especially large, this can cause the database tables for that specific App Group to consume anywhere from 10 to 50% more space.  Check with your Database Administrators and System Administrators to ensure that you have enough space to complete the index creation.
= Impact on I/O =
You may think to yourself, "Storage is cheap, let's index everything!" but there's different cost -- storage bandwidth.  In order for indexes to be the most effective, indexes must be '''compact''' - the fewest number of rows in the index with a relatively short list of records following after it.  Building Multi Field Indexes in CMOD with a large number of fields could effectively DOUBLE the size of the database table -- making the task of searching it just as slow as not having an index at all.

Navigation menu