Glossary

From CMOD.wiki

Acronyms & Initialisms

AFP
Advanced Function Presentation, a print-stream format made popular by IBM. AFP Streams can contain 'resources' - data that defines the size of a page, segments, fonts, and graphics like logos and bar-codes. IBM Content Manager OnDemand saves these AFP Resources separately, and compresses the remaining data, with compression ratios that regularly exceed 90%. When Content Manager OnDemand retrieves an individual document, the resources are added to the data file, to create a document that can be recreated with perfect fidelity.
AIX
Advanced Interactive eXecutive. IBM's UNIX operating system for servers containing POWER processors. AIX was one of the first platforms supported by IBM Content Manager OnDemand.
ARS
The prefix for all CMOD commands. Rumoured to stand for 'Archive Retrieval System'. You can find information on IBM Content Manager OnDemand commands like arssockd and arsdoc elsewhere on the wiki.
AWS
Amazon Web Services. Probably the world's largest Cloud Computing provider. Allows access to Servers, Storage, and Software for a monthly fee. (See 'S3' below.)
CMOD
IBM Content Manager OnDemand, IBM's Enterprise Report Management software. CMOD has consistently been the #1 ERM software product as ranked by Gartner since shortly after its release. It offers the highest compression ratios and highest reliability for print streams, XML, and PDF data.
COLD
Computer Output to Laser Disc - a generic term that referred to data storage systems that stored digital files on Laser Disc, for example, Magneto-Optical or WORM discs. This acronym has been successively replaced by "Enterprise Content Management" and "Enterprise Report Management" over the years. The concept has remained the same -- store print data where it can be compressed and stored efficiently, and retrieved easily via an organization's internal network, or over the Internet through the IBM Content Manager OnDemand Web Enablement Kit.
DBA(s)
DataBase Administrator(s). These are the people (or team) that are responsible for ensuring the usability and security of information stored in database products like IBM DB2, Microsoft SQL Server, or Oracle. You should refer to them when you have questions about the underlying database used by IBM Content Manager OnDemand, or request their assistance when it comes to critical tasks like taking backups and testing restores.
DMS
Database Managed Space - an older method for allocating storage to databases. In DB2, a system administrator would allocate a raw device (a 'logical volume' in AIX, a 'slice' in Solaris, or a 'partition' in Linux) for DB2's exclusive use. Databases now largely use the SMS method for the increased reliability offered by modern file systems, with features like journaling, redundancy, compression, and expandability.
ECM
Enterprise Content Management - a newer, more generic term than COLD, recently giving way to the more descriptive and accurate "Enterprise Report Management" or ERM. ECM broadly refers to a variety of content management solutions, all with different features (Workflow, Records Management, Archiving, etc.)
ERM
Enterprise Report Management - specifically, the storage of reports and records for an organization. The narrower focus of ERM is the long-term archiving, retention, searching, retrieval, and eventual disposition or deletion of documents. This is done primarily for legal or regulatory compliance, but increasingly also to improve customer experience and reduce support costs by allowing customers to access their information through web-based portals or smartphone applications.
FORMDEF / PAGEDEF
Form Definition / Page Definition. Components of an AFP data stream that define the parameters of a form or page. FORMDEFs and PAGEDEFs can either be included inside an AFP data stream, or stored in a 'library' on the CMOD server or high-volume printers.
FTP
File Transfer Protocol. A standard method for transferring files across TCP/IP networks. FTP is insecure, and should only be used on protected internal networks. Everywhere else, use a secure method for file transfers, like SFTP, NDM, or MQ Series File Transfer Edition.
ICN
IBM Content Navigator - a highly flexible, customizable, and modern front-end for many IBM Products in the ECM space, including FileNet P8, Content Manager v8, and Content Manager OnDemand. IBM Content Navigator is built to be modular, to allow for programmers to easily reuse sections of the interface in their own code, or for the Content Navigator interface to be customized and 'branded' with customer logos and colors.
GIF
Graphic Interchange Format - a rather old image format with limited colors (256) and relatively poor compression by today's standards. As long as an image contains no more than 256 colors, the image compression is lossless, unlike JPEG.
JPEG
Joint Photographic Experts Group - a method of 'lossy' compression, primarily used for full colour photographs. JPEG saves storage space by discarding information in a way intended to be imperceptible to humans. The compression ratio can be adjusted to keep more data (at the cost of larger files) or save more space, which results in visible 'compression artifacts' leading to a loss of detail.
JES
Job Entry Subsystem. In CMOD, JES is a source of mainframe output data from reports (the "JES spool"). This output data can be downloaded directly from a mainframe using arsjesd. arsjesd has become rather unpopular as a method for transferring files, as it does not support encryption to keep data private, is not restartable, and offers no ability to report on the success or failure of loading the received file.
LDAP
Lightweight Directory Access Protocol - a centralized system for authenticating users to a variety of systems, and conveying entitlements to those users through group membership. IBM Content Manager OnDemand supports authenticating users via LDAP, and provides entitlements via membership in groups. Starting in Content Manager OnDemand v10.1.0.2, CMOD supports LDAP Sync - which offers the ability to synchronize users and groups from an Enterprise LDAP server to CMOD's internal User & Group tables.
NAS
Network Attached Storage - a class of storage devices, primarily accessed over a standard network connection. Often, NAS devices are referred to as "Cheap and Deep" storage, as they tend to use larger capacity, but slower SATA hard drives in the 2TB to 14TB range. This allows for high storage density at the cost of speed and reliability.
PDF
Portable Document Format. Adobe's popular format for creating, viewing, and printing documents that maintain their fidelity (look and feel) across many different types of platforms (e.g. Mac, Windows, Linux, UNIX).
IBM Content Manager OnDemand's PDF Indexer now has the ability to 'de-duplicate' the contents of PDFs, in much the same way as Content Manager OnDemand does for AFP documents with the ACIF indexer.
POWER
Performance Optimized With Enhanced RISC. IBM's custom-designed server-grade Central Processing Units (CPUs). IBM's POWER CPUs focus on providing the maximum amount of computing power, at the cost of energy efficiency and therefore heat dissipation. IBM's POWER servers are known for being power hungry and requiring substantial cooling -- meaning they are often very loud.
PSEG
Page Segment. A component of an AFP data stream, defining a portion of a page. Page Segments are AFP Resources, and as such, are extracted from AFP data streams and grouped into a 'resource bundle', which is one of the ways IBM Content Manager OnDemand saves storage space through de-duplication.
RISC
Reduced Instruction Set Computing. Refers to a concept in processor design where the CPU has simpler individual instructions, so a given task takes more of them. Opposite of CISC, Complex Instruction Set Computing. The current generation of CPUs is a combination of both RISC and CISC -- including large numbers of parallel RISC-style cores, plus 'extensions' that provide fast CISC functions useful for intensive tasks like data compression and encryption, or processing audio & video.
S3
Simple Storage Service. A product of Amazon Web Services (See 'AWS' Above), the Simple Storage Service allows software to store virtually unlimited amounts of data into the 'cloud' quickly and easily for a relatively low price per gigabyte. IBM Content Manager OnDemand S3 support was enabled in v9.5.0.4. See ars.cfg for information on configuring Content Manager OnDemand for Cloud Storage with Amazon S3.
SFTP
SSH File Transfer Protocol, often expanded as Secure File Transfer Protocol. Similar to FTP only in name. SFTP assures strong authentication ("I am who I say I am"), privacy (contents are encrypted in transit), and integrity (contents are verified for fidelity). All current systems should remove FTP and replace it with SFTP or another secure file transfer method. Authentication can be performed through the use of passwords (which are sent only inside the encrypted session) or through public key (asymmetric) cryptography - a method by which two keys (one private, and one public) are used to exchange information (like authentication credentials) securely.
SMS
System Managed Space - in DB2, SMS Tablespaces exist on regular filesystems. Your operating system may provide features like journals, compression, redundancy, or caching, making SMS preferable to the older DMS method. See DMS above for the older storage method, Database Managed Space.
TCP/IP
Transmission Control Protocol / Internet Protocol. The network protocol used in the overwhelming majority of organizations, and the internet at large. TCP/IP version 4 is common and extremely popular. TCP/IP v6 was standardized in the late 1990s, but adoption of v6 has been slow due to the perceived complexity and incompatibility with legacy software. IBM Content Manager OnDemand operates by default on TCP port 1445 for unencrypted communications, and administrators frequently use TCP port 1446 for SSL/TLS encrypted connections.
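A plain TCP connection test is a quick way to confirm that a server is reachable on a given port. The sketch below is a minimal example, not an official CMOD tool: the function name is made up for illustration, and it only checks that something accepts connections on the port. It does not speak the OnDemand protocol or verify TLS.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `port_is_open("cmod.example.com", 1445)` would test the default unencrypted port on a hypothetical host named cmod.example.com.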
TIFF
Tagged Image File Format - a graphic image format primarily used to store black and white or greyscale scanned images. Originally developed by Aldus (which Adobe later acquired), TIFF has become an international standard. TIFF is actually a container format, which means that it can use many different image compression methods to store documents efficiently. The latest standards are JBIG and JBIG2, but due to patent issues, these standards are not widely supported across platforms and tools.
TSM
Tivoli Storage Manager (or TSM) is used to connect OnDemand to a wide variety of storage technologies, such as tape, tape libraries, optical drives, optical jukeboxes, or proprietary devices like EMC Centera. TSM has three main uses - Archive, Backup, and HSM. CMOD utilizes the 'Archive' component to provide long-term management of data loaded into CMOD. In 2017, Tivoli Storage Manager was renamed "Spectrum Protect" after version v7.1.4. The features, functionality, and compatibility are preserved across versions.
WORM
Write-Once, Read-Many - a type of unalterable, permanent storage, primarily optical disks in the years prior to 2005, but also magnetic tape and magnetic hard disc drives. WORM storage is often required by regulatory agencies and by many local and national government regulations. The main criticism of WORM storage is that once the data on the device has expired, the device cannot be re-used, and must often be certifiably destroyed to comply with privacy laws. Since IBM Content Manager OnDemand v10.1 supports both document encryption and cryptographic hashing, the requirement for WORM storage devices may be relaxed, as any tampering with the encrypted data would render it irrecoverable, and any alteration of the underlying document would be detected through a change to the cryptographic hash at retrieval time.
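The tamper-detection idea can be illustrated with a few lines of Python. This is a generic sketch of hash-based verification, not CMOD's internal implementation: SHA-256 and the function names are assumptions chosen for the example.

```python
import hashlib
import hmac


def fingerprint(document: bytes) -> str:
    """Return the SHA-256 digest of a document as a hex string."""
    return hashlib.sha256(document).hexdigest()


def is_unaltered(document: bytes, stored_digest: str) -> bool:
    """Compare the digest recorded at load time with one computed at retrieval."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(fingerprint(document), stored_digest)


original = b"Statement for account 12345"
digest_at_load = fingerprint(original)            # recorded when the document is stored
print(is_unaltered(original, digest_at_load))         # prints True
print(is_unaltered(original + b"!", digest_at_load))  # prints False
```

Changing even one byte of the document produces a different digest, which is why an alteration shows up at retrieval time.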
XML
eXtensible Markup Language. A flexible data structure for storing data. See the IBM CMOD XML Indexer presentation on ODUG (Login Required, new registrations approved within 48 hours, use your corporate eMail account). Not only can XML data be quickly and easily ingested into IBM Content Manager OnDemand, the XML data can be displayed in different ways for different devices or users, by altering the style sheets. Content Manager OnDemand de-duplicates and compresses XML data in the same way it stores AFP and PDF data, leading to reduced storage costs for customers.

Terms Specific to Content Manager OnDemand

Application
An Application defines a specific report or type of document to Content Manager OnDemand. The definition includes things like the format of the data (Line data, AFP, PDF, or an Image like JPEG / GIF / TIFF / PNG), how it should be displayed (code page for text, rotation of the page, or definitions for headers and columns in line data reports), and how CMOD should obtain the index metadata to load into the database - ACIF (the default Indexer for AFP and Line data), ARSPDOCI (the PDF Indexer), or the Generic Indexer. The ACIF and PDF indexers interpret the data inside the report itself, finding defined fields to create an index file with. The Generic Indexer, primarily used for graphic formats or other types of data that can't be read directly, reads a formatted index file to collect the metadata. Applications must belong to an Application Group.
Application Group
A Group of Applications. An Application Group ('AppGroup' or 'AG') defines index, storage management, retention, and expiration criteria. All Applications that belong to a specific Application Group will have the same metadata fields used for searching for documents, have their Application data managed in the same way (how long they are cached, and where they will be stored after the caching has expired), and expire in the same fashion. As implied by the name, an Application Group can contain more than one type of data - Applications can be defined for various types of data (for example AFP for customer statements, PDF for letters sent to customers, and TIFF images for scanning incoming mail from customers) but belong to a single Application Group -- making accessing all types of documents for a customer fast and easy.
Application Groups and Applications are usually structured to combine data with the same business use case (Customer Service, Human Resources, Accounts Payable, etc.), followed closely by the metadata, indexing, storage management, and disposition requirements. Defining an Application Group for each Application is not recommended -- binding too many Application Groups into a single folder negatively affects performance.
Folder
A Folder determines how the data inside Application Groups will be presented to the user or a custom-built web application through the Java API. It allows an administrator to define search fields, the order that those fields appear in, any default values and/or required fields for searches, and the fields that are displayed when a search is completed, and allows you to assign user-friendly names to index fields (like "Customer Number" rather than "Cust_Num").
Multiple Application Groups can be added to a single folder, allowing users to search across multiple reports or data types with one search. Relying too heavily on this feature can lead to poor performance, as each query on the folder turns into a separate query for each connected Application Group. If a connected Application Group contains a large number of documents, performance suffers further, because one query is performed per table.
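The fan-out described above can be estimated with simple arithmetic. The sketch below is illustrative only: the Application Group names and table counts are invented, and it models the worst case where every table in every connected Application Group has to be queried.

```python
def queries_per_search(tables_per_app_group: dict[str, int]) -> int:
    """Worst-case number of table queries for one folder search.

    Each Application Group in the folder is queried separately, and an
    Application Group spread over several tables needs one query per table.
    """
    return sum(tables_per_app_group.values())


# Hypothetical folder: three Application Groups with different table counts.
folder = {"STATEMENTS": 12, "LETTERS": 3, "SCANNED_MAIL": 1}
print(queries_per_search(folder))  # prints 16
```

A single user search on this folder could therefore trigger up to 16 database queries, which is why folders that bind many large Application Groups tend to feel slow.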
Storage Set
A Storage Set connects Content Manager OnDemand to storage nodes using a variety of storage technologies. Application Groups are required to have a Storage Set defined so that CMOD can manage the archived data.
Before IBM CMOD v9.5.0.6, Tivoli Storage Manager ("TSM") or Object Access Method ("OAM") were the only supported storage managers. Tivoli Storage Manager was popular on the Content Manager OnDemand for Multiplatforms systems, for the purpose of providing access to long-term archival storage devices like Magneto-Optical or WORM optical discs, tape libraries, or pools of hard disc storage.
Starting in Content Manager OnDemand v9.5.0.6, new Storage Set Types were supported, including Amazon Simple Storage Service ("S3"), Hadoop Distributed File System (HDFS), and OpenStack Swift. In IBM CMOD v10 and higher, Content Manager OnDemand supports IBM Cloud Object Storage, and Filesystem (aka Network-Attached-Storage or NAS). For more information about configuring IBM Content Manager OnDemand Cloud Storage, see IBM CMOD Cloud Storage Options.
Cabinet
A Cabinet is a group of folders.
Cabinets are useful for providing easy access to a number of folders to users. In Content Manager OnDemand, cabinets are often created to help end users perform a specific job function where they need access to different types of data.
In the case of an employee of a Customer Service department, users may need access to monthly statements, correspondence, internal notes or call recordings which may be stored in different folders. In the case of an employee of a corporate Accounts Payable department, they may only need access to one folder, whereas an internal auditor may need access to Folders for Accounts Receivables, Accounts Payable, as well as Financial Reports, Inventory, and Human Resources.
Printer
A printer definition is used to provide access to server-side printing. Very few customers actually define these sorts of printers, instead choosing to redirect print jobs to electronic formats that can be displayed on computer monitors, smartphones and tablets.
In the context of IBM CMOD, printers are usually large industrial-scale laser printers fed by large spools of paper.
Historically, line data could be sent to large "chain printers" to produce reports on 'greenbar' paper.
User
A user is, well, a CMOD user. Users must be defined in order to log into the CMOD server.
Starting in IBM CMOD v8.5, password authentication for users defined in Content Manager OnDemand could be performed by Lightweight Directory Access Protocol, or LDAP.
In OnDemand v10 and higher, Content Manager OnDemand now supports synchronizing Users and Groups with LDAP servers, adding more flexibility, and releasing CMOD admins from duplicating the effort of managing CMOD users and groups.
Group
A Group is a collection of users. Groups can define access permissions to specific Application Groups (including limiting access to searches in Application Groups with Query Restrictions), Folders, Cabinets, or Printers.
As mentioned above, CMOD v10 now supports LDAP Sync.
Cache
"Cache Storage" is a local filesystem of fast magnetic disk (hard drives) for the purpose of storing the most frequently accessed data, so that retrievals are extremely fast.
CMOD data that isn't stored in the cache is stored in nodes, defined in Storage Sets. Data stored inside a Storage Node is generally a MINIMUM of 10x slower to retrieve, due to the overhead of communicating with the various storage methods, and the slower devices they often use for "Cheap and Deep" storage.
Segment
In order to keep database queries fast, OnDemand uses database table segmentation. Table segmentation organizes data into separate tables, and in the case of CMOD, segments are determined by the number of rows configured at the Application Group level. CMOD tracks segments by the minimum and maximum dates found inside each database segment. By requiring each query contain a date, OnDemand can quickly eliminate tables from being searched by excluding those tables that don't contain the dates being searched for.
Before DB2 supported table segmentation natively, the Content Manager OnDemand developers decided to split index data into tables of 10 million rows each. Using this method keeps search performance linear, as only the tables containing documents in the date range you're looking for (for example, 3 months, or 1 year) are actually searched.
In some of the largest IBM Content Manager OnDemand customer sites, they've modified the Application Group definitions to include up to 250 million rows per database table segment in order to keep performance brisk.
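The date-based elimination of segments amounts to a range-overlap test. The following is a minimal sketch of that idea, not CMOD's actual code: the `Segment` class, the table names, and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Segment:
    table_name: str   # hypothetical table name
    min_date: date    # earliest document date in the segment
    max_date: date    # latest document date in the segment


def segments_to_search(segments, start: date, end: date):
    """Keep only segments whose date range overlaps the searched range."""
    return [s for s in segments if s.min_date <= end and s.max_date >= start]


segments = [
    Segment("AG1", date(2022, 1, 1), date(2022, 6, 30)),
    Segment("AG2", date(2022, 7, 1), date(2022, 12, 31)),
    Segment("AG3", date(2023, 1, 1), date(2023, 6, 30)),
]
hits = segments_to_search(segments, date(2022, 11, 1), date(2023, 1, 31))
print([s.table_name for s in hits])  # prints ['AG2', 'AG3']
```

The first segment is skipped without being queried at all, because its newest document is older than the start of the searched range.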
Resources
In the context of AFP or PDF data, the category of data used to render the data on the page. For example, fonts, images, overlays, lines, boxes, or barcodes. OnDemand splits resources from data in order to maximize the compression of actual report data.
By creating 'resource bundles' from data being loaded, CMOD de-duplicates the data, optimizing storage. These resource bundles can be re-used if data loaded in the future contains identical sets of resources in an Application Group. For example, a credit card company might load statements into CMOD in AFP format. Items like logos, graphics, fonts, and page definitions are likely to stay the same from month to month - meaning that a resource bundle may be used for millions or billions of documents - saving storage space each time!
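The storage saving from re-using identical resource bundles can be shown with a toy content-addressed store. This is only a sketch of the general de-duplication technique: identifying bundles by a SHA-256 digest and the `ResourceStore` class are assumptions for illustration, not a description of how CMOD compares resources internally.

```python
import hashlib


class ResourceStore:
    """Toy store that keeps one copy of each distinct resource bundle."""

    def __init__(self):
        self._bundles = {}  # digest -> bundle bytes

    def add(self, bundle: bytes) -> str:
        """Store the bundle if it is new and return its identifier."""
        digest = hashlib.sha256(bundle).hexdigest()
        self._bundles.setdefault(digest, bundle)  # no-op if already stored
        return digest

    def stored_bytes(self) -> int:
        """Total bytes actually held, after de-duplication."""
        return sum(len(b) for b in self._bundles.values())


store = ResourceStore()
logo_and_fonts = b"LOGO+FONTS+PAGEDEF" * 1000  # stand-in for a real bundle
ids = [store.add(logo_and_fonts) for _ in range(12)]  # twelve monthly loads
print(len(set(ids)), store.stored_bytes())  # prints 1 18000
```

Twelve loads that share the same resources consume the space of one bundle rather than twelve.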
User Exit
A User Exit is a way for advanced users to customize the behavior of OnDemand, by writing custom programs ("exits") that modify data as it flows through CMOD.
User Defined Data Type
Content Manager OnDemand supports several types of data natively -- AFP, PDF, XML, Line Data, and Images. If you want to store a different type of data, you can configure CMOD to use a "User Defined" data type. User Defined data types allow OnDemand to store almost anything.
In the vast majority of cases where User Defined Data Types are used (to store things like recordings of phone calls to an organization's customer service centre) the Generic Index type is used to provide metadata to CMOD.
When creating Applications with a User Defined data type, pay special attention to the compression settings. Some data, like MP3 audio files, are already compressed, and the Compression setting should be set to 'disable'.
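The reason for disabling compression on already-compressed data is easy to demonstrate. The sketch below uses Python's zlib as a stand-in compressor (an assumption; it is not the algorithm CMOD uses) and generates made-up report lines, then compresses the result a second time to imitate feeding compressed data back into a compressor.

```python
import random
import zlib

rng = random.Random(42)  # fixed seed so the run is repeatable

# Stand-in for line data: structured text with varying account numbers.
raw = "".join(
    f"ACCT {rng.randrange(10**8):08d} BALANCE {rng.uniform(0, 99999):10.2f}\n"
    for _ in range(20000)
).encode("ascii")

once = zlib.compress(raw)    # first pass shrinks the text substantially
twice = zlib.compress(once)  # second pass finds almost nothing left to remove

print(f"raw={len(raw)} once={len(once)} twice={len(twice)}")
```

The second pass spends CPU time without saving meaningful space, which is the situation an MP3 or similar pre-compressed file puts the server in.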

OnDemand Indexing Tools

ACIF
ACIF stands for AFP Control and Indexing Facility. Originally a mainframe tool, ACIF can convert Line Data to AFP for storage inside CMOD, or extract index data automatically from reports.
Generic Index
For data types that can't be indexed automatically, Content Manager OnDemand supports 'Generic' index files, which are specially-formatted text files that contain the index values for individual documents. Generic Index files are used primarily for loading Image format files (TIFF, PNG, GIF, JPEG), but can be used in combination with User Defined data types to load almost any type of digital data into CMOD.
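A Generic Index file is simple enough to generate with a short script. The sketch below builds one from a list of documents, using the keyword layout as commonly documented for the Generic Indexer (CODEPAGE, GROUP_FIELD_NAME, GROUP_FIELD_VALUE, GROUP_OFFSET, GROUP_LENGTH, GROUP_FILENAME). Treat the exact keywords, the code page value, and the use of 0 for offset and length to mean "the whole file" as assumptions to check against the IBM documentation for your version. The field names and file name are hypothetical.

```python
def build_generic_index(documents, codepage: int = 819) -> str:
    """Return Generic Index text for (fields, filename) pairs.

    Assumption: offset 0 and length 0 tell the loader to treat the
    entire file as a single document.
    """
    lines = [f"CODEPAGE:{codepage}"]
    for fields, filename in documents:
        for name, value in fields.items():
            lines.append(f"GROUP_FIELD_NAME:{name}")
            lines.append(f"GROUP_FIELD_VALUE:{value}")
        lines.append("GROUP_OFFSET:0")
        lines.append("GROUP_LENGTH:0")
        lines.append(f"GROUP_FILENAME:{filename}")
    return "\n".join(lines) + "\n"


# Hypothetical index fields and image file for one scanned document.
docs = [({"account": "000123", "scan_date": "02/15/24"}, "scan_000123.tif")]
print(build_generic_index(docs), end="")
```

The field names must match the database field names defined in the Application Group, and date values must use the format configured in the Application.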
PDF Indexer
In order to load PDF files automatically, the PDF Indexer defines areas inside PDF files that contain the index values for documents.
XML Indexer
A new feature in CMOD v9.5, the XML indexer allows you to select data from inside XML data, and load it into CMOD.
Full Text Indexer
Introduced in OnDemand v9.0, the Full Text Indexer is a component that creates full text indexes for data stored in CMOD.

OnDemand Command-Line Tools

arsacif
The ACIF Indexer, which parses input files for index information, and optionally converts line data into AFP format.
arsadmin
A tool used primarily for unloading data inserted into the OnDemand server by arsload. Other features allow you to compress/decompress data, or save/retrieve individual storage objects.
arsjesd
A daemon that listens for mainframe output via a TCP/IP socket. This is a fast, efficient way to transfer mainframe reports or print streams directly into files on the CMOD server, to be loaded at a later date by arsload.
arsload
The utility that loads data into Content Manager OnDemand. 'arsload' gets configuration data from the Application definition, and indexes, compresses, and stores index, data, and resources.
arsmaint
The maintenance utility for OnDemand. 'arsmaint' can expire data and indexes, perform cache maintenance, or reorganize database tables.
arssockd
This is the main OnDemand process -- if arssockd isn't running, then CMOD isn't available for end users, or loading data.
arstblsp
A utility for altering table spaces within Content Manager OnDemand.
arsxml
'arsxml' allows you to add, update, or export configuration data inside CMOD.

General Information Technology Terms Related to Content Manager OnDemand

Centera
EMC's WORM solution using custom software and commodity hard drives. WORM is enforced by software, not by any irreversible physical change, which allows the re-use of storage space once data has reached the end of its retention period and been deleted.
Content Management
A generic term referring to the storage and retrieval of data with centralized control and fixed methodology.
Magneto Optical
A storage technology that uses both lasers and magnetic fields to create an optical disc that is both re-writable, and impervious to magnetic fields. Magneto-Optical discs have been known to survive catastrophic events like floods and smoke damage.
SnapLock & SmartLock
Proprietary technologies that lock files from modification on NAS devices by setting the modification timestamp on a file to a point in the future.
Tape
Magnetic Tape Storage - similar to Audio Cassettes and Video Tape, Tape drives use a long, thin ribbon of plastic coated in magnetic particles to store data. Tape has traditionally had much higher storage capacity than hard drives or optical storage. Around 2005, long-term storage on hard drives became more appealing as the capacity of a single device increased, and power consumption decreased.