Difference between revisions of "Table Segmentation in Content Manager OnDemand"

m
Minor updates to IBM CMOD Table Segmentation article.
(Added explanation for how tweaking segment size works.)
m (Minor updates to IBM CMOD Table Segmentation article.)
Line 1: Line 1:
== Introduction ==
== Introduction ==


In order to keep database queries fast, OnDemand uses a concept called "database table segmentation".    The term 'table segmentation' refers to splitting extremely large tables into 'segments' of smaller tables for performance and/or ease of maintenance.  Although the latest versions of database engines can do this natively, at the time CMOD was created, there was no built-in support for this functionality, so it still uses the old style segmentation to achieve the scalability and speed that customers require.
In order to keep database queries fast, OnDemand uses a concept called "database table segmentation".    The term 'table segmentation' refers to splitting extremely large tables into 'segments' of smaller tables for performance and/or ease of maintenance.  Although the latest versions of database engines can do this natively, at the time Content Manager OnDemand was created, there was no built-in support for this functionality, so it still uses the old style segmentation to achieve the scalability and speed that customers require.


When an end user performs a search, CMOD performs the search on one or more tables, based on the date range contained in individual tables.  This 'date range' is called the 'segment date'.
When an end user performs a search, IBM CMOD performs the search on one or more tables, based on the date range contained in individual tables.  This 'date range' is called the 'segment date'.


Before DB2 supported its own table segmentation natively, the Content Manger OnDemand developers decided to split index data into tables of 10 million rows each. Using this method keeps search performance linear, as only the tables containing documents in the date range you're looking for ( for example, 3 months, or 1 year) are actually searched.
Before DB2 supported its own table segmentation natively, the Content Manger OnDemand developers decided to split index data into tables of 10 million rows each. Using this method keeps search performance linear, as only the tables containing documents in the date range you're looking for ( for example, 3 months, or 1 year) are actually searched.
Line 23: Line 23:
=== How does optimizing segment size work? ===
=== How does optimizing segment size work? ===


Let's say you're a large Content Manager OnDemand customer.  You have 40 million customers, and about 25 million receive an invoice every month.  When you load those invoices into CMOD with a default 'Max Rows' value of 10 million, you probably create two or three tables a month.  But you've been in business for 5 years already.  It's not unreasonable that you might have more than 100 tables full of customer invoice data.  If an employee searches for a customer's invoices over a range of the last 12 months by default, the database must search between 24 and 36 individual tables to satisfy a single request.  Most of the tables that the databases searches won't find anything. When you multiply the work required to satisfy ONE request across an entire company, the default 'Max Rows' parameter can cause an OnDemand server to quickly become overloaded.
Let's say you're a large IBM Content Manager OnDemand user.  You have 40 million customers, and about 25 million receive an invoice every month.  When you load those invoices into IBM CMOD with a default 'Max Rows' value of 10 million, you probably create two or three tables a month.  But you've been in business for 5 years already.  It's not unreasonable that you might have more than 100 tables full of customer invoice data.  If an employee searches for a customer's invoices over a range of the last 12 months by default, the database must search between 24 and 36 individual tables to satisfy a single request.  Somewhere between 16 and 24 tables that are searched won't find anything at all! When you multiply the work required to satisfy ONE request across an entire company, the default 'Max Rows' parameter can cause an OnDemand server to quickly become overloaded.


In the same situation, when Content Manager OnDemand is configured with an optimized segment size, one search hits 12 tables, and each table returns a result -- a much better work-to-result ratio, which requires less disk-related I/O and less CPU time.  It's better in almost every way.
In the same situation, when Content Manager OnDemand is configured with an optimized segment size, one search hits 12 tables, and each table returns a result -- a much better work-to-result ratio, which requires less disk-related I/O and less CPU time.  It's better in almost every way.