Migrating From Filenet to Content Manager OnDemand
These are some recommendations and tips from my experience migrating Filenet data to Content Manager OnDemand. I'm not a Filenet admin, so Filenet itself is outside the scope of this article. It deals only with the OnDemand-specific tips and tricks that will make your migration easier.
"Abandon all hope, ye who enter here" -- Dante's Inferno
Large migration projects are always painful. They take more time & money than anticipated (or, more importantly, budgeted) because of one simple, honest truth: you are reliving all of the mistakes, problems, errors, and previously undiscovered issues in potentially decades of information. A lot of the 'institutional knowledge' surrounding specific data, reports, or use-cases is long gone. Especially frustrating are the conversations that end in "So, why do you think they did THAT?!", because it's not always obvious what the situation was when critical (and possibly bad) decisions were made.
Patience is a virtue, and any sufficiently large migration project will test your determination, and at times, your sanity. The advice below comes from one of the largest migration projects I've ever participated in.
Terminology is the worst part of migrating between Filenet and CMOD, because some terms are used in both systems but have different contexts and meanings. The process is complex enough without the added headaches of misunderstandings brought about by ambiguous terms.
Filenet Nomenclature
- Document Class (aka "DocClass")
  - Defines the metadata used to find individual reports.
Content Manager OnDemand Nomenclature
- Application Group (aka "App Group" or simply "AG")
  - A way of combining many different reports into a single group of data, organized by business need: accounting reports for accounting, and operations reports for your operations teams. Reports that are bundled into an Application Group need to have the same index fields, storage hierarchy, and retention (i.e., expiration) handling.
- Application (aka "App")
  - Defines the type of document (AFP, Line data, PDF, Image) and how to automatically extract the metadata (aka 'indexes') that is stored in the database and used as search criteria by end users.
- Folder
  - Can span multiple Applications of any data type (e.g., AFP for customer statements, Line data reports generated by a mainframe, special letters or notices in PDF format, and incoming faxes stored as TIFF images).
  - The Folder in OnDemand abstracts the internal complexities of Application Groups and Applications. It presents users with the fields they can search (which were populated into the database by the Application) and sets limits on their queries (maximum number of returned hits, fields required for searches, etc.).
Initial Considerations
Consider the following items before starting your migration. Getting it right at this stage will mean a faster, easier, and cheaper transition to CMOD.
Converting Document Classes
IBM Content Manager OnDemand ("CMOD") has an entirely different architecture from the Filenet products. In IBM CMOD vernacular, an 'Application' is analogous to an individual report, but the top of the hierarchy is the 'Application Group': a grouping of Applications (aka 'reports') whose index fields, storage, and retention requirements are all the same. A properly defined Application Group can have multiple Applications (again, 'reports') belonging to it. The most rational way to design Application Groups is to combine reports that fulfill a specific business need. Human Resources reports shouldn't intermingle with Accounts Payable reports, even if they have the same index fields. Keep them logically separate by placing them in separate Application Groups.
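To make the grouping rule concrete, here is a minimal sketch that turns an inventory of Document Classes into candidate Application Groups. Each Document Class becomes a candidate Application, and classes are grouped only when business area, index fields, storage, and retention all match. It assumes you've already built an inventory spreadsheet of your Document Classes. The `DocClass` record and its field names are hypothetical and are not a Filenet or CMOD API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DocClass:
    """One row of a hypothetical Document Class inventory."""
    name: str
    business_area: str          # e.g. "HR", "Accounts Payable"
    index_fields: frozenset     # names of the metadata fields
    storage: str                # storage policy label (assumption)
    retention_years: int


def propose_app_groups(doc_classes):
    """Group Document Classes that could share one Application Group.

    Classes land in the same group only if business area, index fields,
    storage, and retention are identical, so HR and Accounts Payable stay
    apart even when their index fields match.
    """
    groups = defaultdict(list)
    for dc in doc_classes:
        key = (dc.business_area, dc.index_fields, dc.storage, dc.retention_years)
        groups[key].append(dc.name)
    return {key: sorted(names) for key, names in groups.items()}
```

Treat the output as a starting point for conversations with the business, not as a final design.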
Quantify index usage
OnDemand doesn't like having indexes defined in the Application Group without a corresponding value appearing in the reports it processes, and unused indexes also waste space in the database. It seems common in the Filenet world to assign a report to a Document Class whose configured indexes simply don't exist anywhere in the report. Yes, you can assign default values to the empty fields to get ACIF to stop complaining, but if you want to do this right, you'll want to look into your index usage. Look not just at which fields you populate most often, but also at which fields your end users actually search on. Eliminating unused fields from Application Group definitions will streamline indexing, reduce storage costs, and reduce complaints from end users.
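A quick way to start the index-usage review is to measure how often each field is actually populated. The sketch below assumes you can export a Document Class's metadata to a CSV file with one column per field. The column names in the example are made up. It covers only the "how often populated" half of the question. Which fields users search on has to come from whatever search auditing you have.

```python
import csv
import io


def field_fill_rates(csv_file):
    """Return {field_name: fraction of rows with a non-blank value}."""
    reader = csv.DictReader(csv_file)
    fields = reader.fieldnames or []
    filled = dict.fromkeys(fields, 0)
    rows = 0
    for row in reader:
        rows += 1
        for name in fields:
            value = row.get(name)  # None if the row was short
            if value is not None and value.strip():
                filled[name] += 1
    return {name: (filled[name] / rows if rows else 0.0) for name in fields}


# Example with a tiny in-memory export (hypothetical columns).
sample = io.StringIO("doc_id,account,branch\n1,100,\n2,200,\n3,300,NYC\n")
rates = field_fill_rates(sample)
```

Fields with fill rates near zero are the first candidates to drop from the Application Group definition.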
Perform quality checks on existing metadata
Over the years, a substantial amount of junk can accumulate inside database tables. Bad data that passed a very rudimentary check at load time piles up in the database and is often discovered only during the migration. A common example is a bad date value: someone meant to type a date in 2013 but entered 2031, or mixed up the month and day fields (February 12th as 02/12 vs. December 2nd as 12/02). You need an additional 'sanity check' on the data before performing the migration, to identify and correct these issues before importing them into the new IBM CMOD server.
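Here is one way to automate part of that sanity check. It is a sketch that assumes each document also has a trusted reference date, such as the date it was added to Filenet, which you can export alongside the report date. The function compares each date against that reference. It flags dates that fall after the document was added (the 2013 vs. 2031 typo) and dates where swapping month and day lands much closer to the reference date. The 31-day threshold and the MM/DD/YYYY input format are assumptions to tune for your data.

```python
from datetime import date, datetime


def check_date(raw, added, fmt="%m/%d/%Y", earliest=date(1970, 1, 1), max_gap_days=31):
    """Return a list of suspected problems with one date string (empty if none).

    `added` is a trusted reference date for the same document (assumption:
    e.g. the date it was loaded into Filenet).
    """
    try:
        value = datetime.strptime(raw, fmt).date()
    except ValueError:
        return ["unparseable"]
    problems = []
    if value > added:
        problems.append("later than the date it was added")
    if value < earliest:
        problems.append("earlier than any plausible date")
    # Only dates whose day is 1-12 can be month/day swaps.
    if value.day <= 12 and value.day != value.month:
        swapped = value.replace(month=value.day, day=value.month)
        gap = abs((value - added).days)
        swapped_gap = abs((swapped - added).days)
        if gap > max_gap_days and swapped_gap <= max_gap_days:
            problems.append("month and day probably swapped")
    return problems
```

Anything flagged should go to a human for review rather than being "fixed" automatically.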
Check for field re-use
As is human nature, some shortcuts may have been taken in the distant past that are now your problem to resolve. One common annoyance is the 're-use' of an existing field for a new type of data. Putting a value into a pre-existing field with an unrelated name saves time, so expect to see this shortcut again and again. For American companies, using Social Security Numbers ("SSNs") to identify customer or employee records used to be common. With the advent of large-scale identity fraud, companies ripped SSNs out of their databases and replaced them with different numbering systems. To save time and money, many companies simply re-used their old "Social" field for new "Employee" numbers, often causing chaos when new systems try to do any validation. Also, watch out for real metadata squeezed into generic fields like 'Description', 'Comment', or 'Notes'. You'll have to determine exactly what that data is, and where it truly belongs, to complete the migration successfully.
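Re-used fields usually give themselves away through a mix of value formats in one column. The sketch below counts how many values in a field match each known pattern. It assumes a hypothetical employee-number format of "E" plus six digits. Substitute the numbering schemes your company actually uses.

```python
import re
from collections import Counter

# Illustrative patterns only; replace with your organization's real formats.
PATTERNS = {
    "ssn": re.compile(r"\d{3}-?\d{2}-?\d{4}"),
    "employee_id": re.compile(r"E\d{6}"),  # hypothetical employee-number format
}


def classify_values(values):
    """Count how many values in one field match each known pattern."""
    counts = Counter()
    for value in values:
        value = value.strip()
        if not value:
            counts["blank"] += 1
            continue
        for label, pattern in PATTERNS.items():
            if pattern.fullmatch(value):
                counts[label] += 1
                break
        else:
            counts["unrecognized"] += 1
    return counts
```

A "Social" column that reports both `ssn` and `employee_id` hits has almost certainly been re-used. A large `unrecognized` bucket in a 'Notes' field deserves a closer look.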
Transfer in Original Formats
For some Filenet installations, upstream servers (or intermediate file transfer systems) convert report data from EBCDIC to ASCII and change the formatting of the report. IBM Content Manager OnDemand doesn't need any data transformation. It can ingest EBCDIC reports (with fixed-length, variable-length, or stream records) directly, without conversion. Some conversion tools (I'm looking at you, MQ Series File Transfer Edition) can be configured to change the report so drastically that IBM CMOD can't properly index it.
Wherever possible, remove any data conversion and deliver report data to OnDemand in its original format.
If the historical data in Filenet was converted but new deliveries arrive in their original format, you may require two different Applications ("Report Definitions") for each report. You would need one for the report in its original format (EBCDIC) and one for the converted version (ASCII). For this reason alone, you should always define an Application ID Field in EVERY Application Group.
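When you end up with both EBCDIC and ASCII copies of the same report, you need a cheap way to route each file to the right Application. Here is a heuristic sketch, assuming report data contains plenty of spaces. The space character is byte 0x40 in EBCDIC and 0x20 in ASCII, so whichever byte dominates a sample is a good hint. The example uses the `cp037` (US EBCDIC) codec. Your mainframe may use a different code page.

```python
def guess_encoding(sample: bytes) -> str:
    """Guess whether a report sample is EBCDIC or ASCII.

    Heuristic only: counts EBCDIC spaces (0x40) against ASCII spaces (0x20).
    """
    ebcdic_spaces = sample.count(0x40)
    ascii_spaces = sample.count(0x20)
    if ebcdic_spaces == ascii_spaces:
        return "unknown"
    return "ebcdic" if ebcdic_spaces > ascii_spaces else "ascii"


# Usage: read the first few KB of a file and decide which Application gets it.
# with open(path, "rb") as f:
#     kind = guess_encoding(f.read(4096))
```

Files that come back "unknown" (or are mostly binary, like AFP or images) should be checked by hand.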
Image Overlays for Reports
If a report has a graphic "overlay" (such as boxes around columns, shaded bars, or graphic logos), document it as early in the process as possible. For these overlays to be displayed on all platforms, line data reports will need to be converted to AFP. Any overlay graphics not already in AFP format will also need to be converted. That process can take a considerable amount of time, especially if there is no one available to do the translation 'in-house'.
Review Report Types & Audience
There's no better time to review the contents of reports and confer with end users to determine which reports should be stored, indexed, managed, and disposed of in the same manner. Put on your Business Analyst cap and strap on your most comfortable telephone headset, because this is the most time-consuming and manual part of the whole process.
If you're using annotations in Filenet, you'll need to extract them and integrate them into IBM CMOD. Annotations can be added to individual documents in OnDemand at load time through the Generic Index files.
To reconcile the documents after the migration is complete, you'll want a unique identifier for each document that needs to be moved out of FileNet and into CMOD. Thankfully, FileNet provides a unique 'Document Identifier' or "DocID", which can be loaded into IBM CMOD by adding a corresponding field to the Application Group definition. The DocID field doesn't need to be added to OnDemand folders, so it can remain invisible to end users. Alternatively, it can be exposed in a separate folder for use with existing tools that rely on it.
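Once the DocIDs are in both systems, reconciliation comes down to set arithmetic. This sketch assumes you have already pulled two lists of DocIDs, one from the Filenet export and one from the loaded CMOD data, by whatever query or export method you prefer. It reports what's missing, what's unexpected, and what was loaded more than once.

```python
from collections import Counter


def reconcile(exported_ids, loaded_ids):
    """Compare Filenet DocIDs that were exported against DocIDs found in CMOD."""
    loaded_counts = Counter(loaded_ids)
    exported = set(exported_ids)
    loaded = set(loaded_counts)
    return {
        "missing_from_cmod": sorted(exported - loaded),
        "unexpected_in_cmod": sorted(loaded - exported),
        "loaded_more_than_once": sorted(i for i, n in loaded_counts.items() if n > 1),
        "matched": len(exported & loaded),
    }
```

Duplicates usually mean a load was retried without the previous attempt being unloaded first.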
Don't create unnecessary directories
When exporting data from Filenet, don't create a directory unless it's absolutely necessary. A number of empty directories will slowly drive you mad, as you try to determine if the directories are **supposed** to be empty, or if some data was lost, or if they were missed during a file copy or move operation.
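If empty directories do sneak in, find them before anyone has to wonder about them. The sketch below walks an export tree bottom-up with `os.walk(topdown=False)`. It reports every directory that contains no files at any depth, so each one can be deleted or investigated.

```python
import os


def find_empty_dirs(root):
    """Return directories under root (inclusive) with no files at any depth."""
    empty = []
    has_files = {}
    # Bottom-up walk: children are visited before their parents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        contains = bool(filenames) or any(
            has_files.get(os.path.join(dirpath, d), False) for d in dirnames
        )
        has_files[dirpath] = contains
        if not contains:
            empty.append(dirpath)
    return sorted(empty)
```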
Differences in Functionality
In Filenet systems, metadata fields can be blank -- including date fields. In the OnDemand world, fields are NEVER allowed to be blank -- the rationale is that you can't find a document that doesn't have all of the index values properly populated in the database. In the cases where fields are empty, there are a few options:
- Remove the field from the Application Group definition
- 'Retrofit' the index file with the missing data from another source
- Discard the metadata from Filenet and run the reports through one of the CMOD Indexers like ACIF.
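The 'retrofit' option can be partly automated when another system holds the missing values. This sketch assumes the exported metadata is a list of dictionaries keyed by DocID. It also assumes you can build a lookup table `{doc_id: {field: value}}` from the secondary source. Both shapes are hypothetical. Records that are still incomplete afterwards need one of the other two options.

```python
def retrofit(records, required, lookup, key="doc_id"):
    """Fill blank required fields from a secondary source.

    Returns (complete, still_blank): records that are now fully populated,
    and records that still have at least one blank required field.
    """
    complete, still_blank = [], []
    for record in records:
        fixed = dict(record)  # don't modify the caller's data
        extra = lookup.get(record.get(key), {})
        for field in required:
            if not str(fixed.get(field) or "").strip():
                fixed[field] = extra.get(field, "")
        missing = [f for f in required if not str(fixed.get(f) or "").strip()]
        (still_blank if missing else complete).append(fixed)
    return complete, still_blank
```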
PDFs are indexed differently in CMOD than in Filenet. Filenet breaks PDF documents into bundles of pages, but the entire PDF document remains available to the end user. In Content Manager OnDemand, a PDF that is indexed is broken into individual documents, so a single PDF file becomes multiple PDF files. There is no clear way to reverse this process in CMOD.
Ingesting the exported data
During the export process, make sure you capture the information you'll need to get the exported data into Content Manager OnDemand quickly and easily.
Provide report names
To get specific reports into OnDemand, you need to provide the name of the report (likely as an Application). Make life easier for yourself by including the name of the report in the file names you output.
Group reports in chronological order
Due to the way table segmentation works in OnDemand, you'll want to load the data in chronological order. When you name files, include a date field in YYYY-MM-DD format, so that sorting the file names alphabetically also sorts them chronologically at load time. This ensures that end users get fast database queries when the production server goes live.
Use Expiration Type of "Load"
In Content Manager OnDemand, there are three main options for expiring data loaded into Application Groups:
- Document: examines each row of metadata to determine if it's eligible for expiration, and deletes individual rows from the database
- Load: examines the arsload table to find loads in which ALL documents are eligible for expiration, and performs an 'unload'
- Segment: examines each database table (also known as a "Segment") to determine if ALL the documents in the table are eligible, then simply drops the database table.
For historical migrations, using the 'Load' expiration type, minimizing the date range inside a single load file, and loading in chronological order will prevent issues with the deletion or disposition of data in the future.
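Here's why a narrow date range per load matters. With 'Load' expiration, a load can only be unloaded once every document in it is eligible. That means the newest document sets the date for the whole load. This simplified model assumes retention is counted in days from each document's date. It shows when a load becomes eligible and how many extra days its oldest document is kept as a result.

```python
from datetime import date, timedelta


def load_expiration(doc_dates, retention_days):
    """Return (date the whole load becomes eligible, extra days the oldest document is kept).

    Simplified model: each document expires `retention_days` after its own
    date, and the load expires only when its newest document does.
    """
    newest, oldest = max(doc_dates), min(doc_dates)
    eligible = newest + timedelta(days=retention_days)
    overkept = (newest - oldest).days
    return eligible, overkept


# A load spanning three years keeps its oldest documents ~3 years too long;
# a load spanning one month keeps them at most a few weeks too long.
wide = load_expiration([date(2010, 1, 1), date(2012, 12, 31)], 7 * 365)
narrow = load_expiration([date(2013, 2, 1), date(2013, 2, 28)], 7 * 365)
```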
Name output files in AG.Date.App format
Write your report files with the following prefix: ApplicationGroup.YYYY-MM-DD.Application
This will allow you to load by Application Group, Date, and Application easily when you sort the files. Having the date as the second 'field' in the file name makes loading in chronological order easy and straightforward.
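Here is a minimal sketch of turning those file names into a load order. It assumes Application Group and Application names contain no dots of their own and that the date is written as YYYY-MM-DD. The `LoadFile`, `parse_name`, and `load_order` names are hypothetical helpers, not part of any CMOD tool.

```python
from datetime import date
from typing import NamedTuple


class LoadFile(NamedTuple):
    app_group: str
    day: date
    application: str
    name: str


def parse_name(name):
    """Split 'AppGroup.YYYY-MM-DD.Application[.ext]' into its parts."""
    parts = name.split(".")
    if len(parts) < 3:
        raise ValueError(f"unexpected file name: {name}")
    # fromisoformat rejects anything that isn't a valid YYYY-MM-DD date.
    return LoadFile(parts[0], date.fromisoformat(parts[1]), parts[2], name)


def load_order(names):
    """Order files by Application Group, then date, then Application."""
    files = (parse_name(n) for n in names)
    return sorted(files, key=lambda f: (f.app_group, f.day, f.application))
```

Parsing the date, instead of trusting the string, catches badly named files before they reach `arsload`.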
Concatenating reports together means fewer loads (and less overhead, as each load can represent up to 10k in metadata). It also means you'll get better compression for storage. Depending on the volume of data for a particular report, you may be able to group reports together by month -- and this also works perfectly with the point above, keeping groups of data with similar dates together inside database tables.
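Before concatenating anything, it helps to see which files would end up together. This sketch groups file names that follow the AppGroup.YYYY-MM-DD.Application convention by Application Group, month, and Application. It only plans the batches. Byte-level concatenation makes sense for formats like line data, not for formats like PDF. That decision is yours per report type.

```python
from collections import defaultdict


def monthly_batches(names):
    """Group 'AppGroup.YYYY-MM-DD.Application...' names by AG, month and Application."""
    batches = defaultdict(list)
    for name in names:
        app_group, day, application = name.split(".")[:3]
        batches[(app_group, day[:7], application)].append(name)  # day[:7] -> "YYYY-MM"
    # Sorted keys keep the batches in chronological order per Application Group.
    return {key: sorted(files) for key, files in sorted(batches.items())}
```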
Produce output in manageable batches
When producing output, remember that you'll likely need to transfer this data between systems, possibly across the network and onto different operating systems. Different archiving and compression tools have limitations (65,535 entries for classic, non-Zip64 .zip archives, and 2GB file size limits for older versions of gzip and bzip2), and you don't want to lose too much time or effort if a file transfer is interrupted. It's best to produce manageable, similarly-sized batches that you can use to develop Applications, test loads, and promote from your Development server to your Quality Assurance ("QA") and Production servers.
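A simple greedy packer is usually enough to produce similarly-sized batches. This sketch takes (name, size) pairs in the order you intend to load them. It starts a new batch whenever adding the next file would exceed either a byte limit or a file-count limit. The limits are parameters you choose to stay well under your tools' limits, not values taken from any particular product.

```python
def make_batches(files, max_bytes, max_files):
    """Greedily pack (name, size) pairs, in order, into batches under both limits."""
    batches, current, current_bytes = [], [], 0
    for name, size in files:
        if size > max_bytes:
            raise ValueError(f"{name} alone exceeds max_bytes")
        if current and (current_bytes + size > max_bytes or len(current) >= max_files):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

Because the input order is preserved, feeding it chronologically sorted files keeps each batch chronological too.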
Order of Operations
In order to find problems with reports as quickly as possible, follow these steps in order:
- Build a test / development Content Manager OnDemand Server
- Make sure you have some extra temporary storage space to queue up incoming report data
- Begin delivering duplicates of the report data to CMOD, in its original format.
- Make some test Application Groups, get some practice indexing these reports, and figure out any strange or non-standard report types.
- Do your research: everything under 'Initial Considerations'.
- Document your new structure - create new Application Groups, select the reports that will belong to them, and identify your sample data.