SuperLuminate -- What is a Data Dictionary

What is a Data Dictionary

Editor's Note: The following extended definition of what a data dictionary is will seem repetitious to those who already "grok" what a data dictionary is and why it is useful. The basic concept of a data dictionary, and why every organization needs one, is not simple to begin with but becomes very clear as you start to pick it up. The repetition in this chapter is intended to help drive home the concepts.
To grok is to share the same reality or line of thinking with another physical or conceptual entity. Robert A. Heinlein coined the term in his best-selling 1961 book "Stranger in a Strange Land".
Metadata has become one of the hottest arenas in information technology today. Corporations have realized the value of metadata and the absolute need for it in order for their business to thrive in markets that are becoming more and more competitive.
David Marco, Cutter Consortium
Author of: "Building and Managing the Metadata Repository"
One of enterprises' most-common concerns (and frustrations) is the lack of viable solutions to assist them with maintaining the consistency and gaining an understanding of the metadata stored in multiple formats and locations. Fewer than 30 percent of enterprises have implemented a metadata management solution that addresses this issue, and those that do generally only address a portion of the metadata.
Gartner Group

What is a Data Dictionary

A data dictionary is a collection of data definitions categorized by subject. Data dictionaries also describe the data's life cycle: by whom, when, and how the data was created, modified, or deleted.
A data dictionary is an organized, formal description of data files that describes physical file attributes, such as record lengths and file types, as well as logical file attributes, such as column names and display formats.
A data dictionary is a vehicle for specifying data collection standards. Insofar as a data dictionary records such standards, it is a useful and necessary tool for supporting Sarbanes-Oxley (SOX) compliance, and it essentially describes the meaning of the information to be collected.
Data dictionaries consist of object and attribute definitions that include:

Descriptions - what is it
Context - who uses it and why
Data domains - what is the range of possible values
Guide for use - which one of the possible values should I use
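
The four parts of a definition listed above can be pictured as fields of a single record. The following is a minimal sketch only, not SuperLuminate's actual data model; the class name DictionaryEntry, its field names, and the "order_status" example are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    """One object or attribute definition in a data dictionary (illustrative)."""
    name: str
    description: str      # Description: what is it
    context: str          # Context: who uses it and why
    domain: frozenset     # Data domain: the range of possible values
    guide_for_use: dict = field(default_factory=dict)  # value -> when to use it

    def is_valid(self, value: str) -> bool:
        """Return True when the value falls inside the data domain."""
        return value in self.domain


# Hypothetical entry; the attribute name and values are invented.
order_status = DictionaryEntry(
    name="order_status",
    description="Current processing state of a customer order.",
    context="Used by order fulfilment and customer service to track progress.",
    domain=frozenset({"OPEN", "SHIPPED", "CANCELLED"}),
    guide_for_use={
        "OPEN": "Order accepted but not yet shipped.",
        "SHIPPED": "Order has left the warehouse.",
        "CANCELLED": "Order withdrawn before shipment.",
    },
)

print(order_status.is_valid("SHIPPED"))   # True
print(order_status.is_valid("RETURNED"))  # False
```

Keeping the guide for use keyed by domain value makes it easy to check that every permitted value has usage guidance.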

When Used by a Database Administrator
A data dictionary is a repository for information about data. It resembles a database catalog, which stores information about the database such as table names, column names, and data types, but it is capable of storing information about a much larger range of topics.
When Used by a Data Administrator
A data dictionary is a repository for information about data design and business processes, e.g. entity and attribute descriptions, business rules, and the relationships that entities have to other entities.
When Used by a Business Manager
A data dictionary is a repository to store information about business related items, e.g. glossary terms, reports, processes, and Key Performance Indicators (KPI).

What is Metadata

These valuable assets stored in a data dictionary are called metadata -- data about data -- and describe critical factors about your systems and applications, such as where a particular data source is located and the types of data that are used by these systems and applications. Metadata plays a key role in reacting quickly to new technologies, and is required to remain competitive.
Metadata is the set of data that describes the locations of data sources, the data types used within applications, and dictionary-like descriptions of the data being used. For example, the entry for "manufacturing product number" states that it is the unique identifier for products produced by a manufacturer and records its data type, size, format, and so on.
Put simply, metadata is data that tells us about the data we use.
The demands of the business user are changing the nature of metadata. Its previous role of describing and cataloging the multitude of relational database tables in the data warehouse is expanding to include providing information to users about the content, meaning, accuracy and quality of the data they are using to make business decisions.
Business Metadata
Business metadata is data that describes information assets in business terms. Business metadata is stored in the data dictionary and accessed by users to find and understand the information they need. For example, business metadata for a report would contain a description (in business terms) of what the report does and what calculations it contains.
Business metadata plays a critical role in Information Systems because it connects the business user with the relevant data in the enterprise. It supports the business users' perspective of Information Systems by using common business terms and by describing the data in terms of context, understandability, searchability, and usability, rather than in terms of infrastructure and database technology. Business metadata educates users about the origins of information and ensures that they apply the information correctly by describing the rules that govern data use and by identifying valid ways in which data can be leveraged and combined with other data. Without business metadata, business users would have to rely on ad hoc inquiries to Information Systems staff to provide the data definitions and explain the logic behind database tables in order to conduct their analyses.
Business metadata can also inspire user confidence in the completeness of the data. It describes the source systems, files, and any transformation logic in textual, non-technical terms that the business user can understand.
Examples of business metadata include: who maintains the data, what is the confidence level of the data and its quality, what algorithm is used to create the values, what is the definition of the data, and what reports are available.
Technical Metadata
Technical metadata is used by the Information Systems staff, e.g. systems analysts, data warehouse managers and database administrators. Technical metadata provides a detailed technical blueprint or "wiring diagram" of the data systems that can be used to assist Information Systems expansion and maintenance. Technical metadata traces the flow of data, providing information such as what sources data is extracted from, when the data was extracted, which target it was loaded into and what technical and business rule transformations were applied to that data as it moved from source to target location.
Examples of technical metadata include: what is the system of record for a specific piece of data, what transformations were performed on the source data to produce the target data, what is the structure of the tables and columns in the data warehouse, what is used to reconcile the data with the source system, and when was the last date and time the data was loaded into the target system, e.g. data warehouse.
MAJOR NOTE:
A mistake many organizations make is assuming that technical metadata can help business end users navigate Information Systems. Business users need a less technical and more user-friendly way to access and analyze metadata. They need to understand and deal with the data in a different way than technical users.

Why Is Metadata Important

Metadata can provide descriptive information about an information system, e.g. an analyst can use metadata to better understand what information is available and how it is calculated. Metadata provides a detailed analysis of where the information came from and can give a confidence factor for describing the data's validity.
Metadata acts as a road map to the information in your business. Without metadata, business and technical users can access data but not information in context that helps them make business decisions with confidence. Currently, the best practices for new business system development mandate having a metadata strategy that makes the system easy to update and use.
Metadata helps us understand our data and our systems, but more than documentation about how the system runs, it tells us where the system is running and where the physical resources being used by the system are located. With a properly maintained data dictionary, applications become easier to maintain, and, if necessary, replace.
The question of why metadata matters is extremely timely given the recent focus on the Sarbanes-Oxley Act (a U.S. federal law governing corporate financial reporting and internal controls). Answering even basic questions about business systems has become significantly more difficult because of the lack of available, accurate business systems definitions -- metadata.

Why Use a Data Dictionary

Many business systems today use a simple spreadsheet to capture source and target data mappings and conversions. But beyond the project's initial requirements phase, this information quickly becomes outdated and inaccessible to most users who need to interact with data.
Data Dictionary for the Data Warehouse
According to Bill Inmon, the "Father of Data Warehousing", metadata is "the description of the structure, content, keys, indexes, etc. of data" (Managing the Data Warehouse, John Wiley & Sons, 1996). More specifically, in a data warehouse environment, metadata can be information about data in the data warehouse, information about how to get a piece of data out of the data warehouse, or information about the quality of data in the data warehouse. Metadata can even give information about how to run warehouse tools to perform different tasks. Metadata about a data warehouse includes information about systems, processes, source and target databases, data transformations, data cleansing, data access, datamarts, and Business Intelligence tools.
For example, a user looking at a regional sales report might think that Total Sales includes all discounts and shipping and handling charges. But Total Sales might not include any or all of these elements. To help the end user, the data warehouse needs metadata that ties the reports' columns to data transformations, data queries, field calculations, and source database tables and columns.
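The Total Sales example can be sketched as a small lineage record that ties a report column to its calculation and source columns. This is an illustrative sketch under assumptions: the ColumnLineage class, the explain helper, the formula, and the order_line table and column names are all hypothetical and do not come from any particular warehouse or from SuperLuminate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnLineage:
    """Ties one report column to its calculation and source columns."""
    report: str
    column: str
    calculation: str      # business-readable formula
    sources: tuple        # (table, column) pairs in the source database
    excludes: tuple = ()  # items a reader might wrongly assume are included


# Hypothetical metadata; table and column names are invented.
LINEAGE = {
    ("Regional Sales", "Total Sales"): ColumnLineage(
        report="Regional Sales",
        column="Total Sales",
        calculation="SUM(quantity * unit_price)",
        sources=(("order_line", "quantity"), ("order_line", "unit_price")),
        excludes=("discounts", "shipping and handling"),
    ),
}


def explain(report: str, column: str) -> str:
    """Render the lineage of a report column as a short description."""
    item = LINEAGE[(report, column)]
    sources = ", ".join(f"{table}.{col}" for table, col in item.sources)
    text = f"{item.column} = {item.calculation}; sources: {sources}"
    if item.excludes:
        text += "; excludes: " + ", ".join(item.excludes)
    return text


print(explain("Regional Sales", "Total Sales"))
```

A report viewer could show this text next to the column heading so that the reader does not have to guess what the total includes.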
Data Dictionary for Application Integration
An emerging area that is heavily reliant upon the availability of metadata is application integration, and leveraging the power of the data dictionary within your systems and applications will ease your company's foray into it. Moving forward, metadata can significantly increase your ability to deliver personalized data to customers and business partners. In the age of e-commerce, one of the defining factors is the ability to customize delivery of a single set of information to multiple recipients in a variety of formats.
For example, for a large bank to integrate its investment systems with its retail banking systems, it is necessary to understand the data, data types, and data sources for both systems. Through integration of these systems, customers can be provided with a consolidated statement instead of two separate statements from the same bank. Or, in the case of a Web interface, the consolidated statement can simplify navigation by not requiring users to view their checking and investment account information separately. In both of these cases, it is the underlying metadata that drives the integration, which in turn facilitates the personalized delivery of information and gives the customer a more professional impression of the bank.
Enterprise-level decisions are especially dependent upon consistent definitions. The recent proliferation of independent datamarts (departmental reporting databases) has underscored the importance of ensuring data consistency. Organizations that have not considered key data (e.g. customer account identifiers) while implementing data systems are likely to find themselves with large amounts of disparate data that can't be shared or combined. Some organizations that have built these "stovepipe" decision support systems are finding that they are incapable of making accurate decisions across the enterprise because they have no way to consistently define the data. Documenting key data while implementing independent datamarts can prevent this problem.
Data Dictionary for Process Reuse
Additionally, the same metadata that drives personalization of information can drive data reuse. When a company has a thorough understanding of the data it has, it can then intelligently decide the data's overall benefit to the company. More importantly, when the metadata is made available to all corporate personnel, new and innovative ways to use that data can emerge. For example, if the Information Systems department is the only group that has access to the metadata, innovation can be constrained. However, when the business manager for new account development has access to a source of well-defined metadata, then that person is empowered to devise a new campaign for attracting new customers.
Of note, reuse also leads to lower costs for software development, implementation, and maintenance, and increases the opportunity for standardization of information across the company. The latter point is extremely important for companies looking to optimize their internal processes or to create straight-through processing.

Making Metadata Accessible

Part of a company's commitment to capturing metadata requires two additional decisions. The first is where the metadata will be stored, and the second is how the metadata will be made available to those who need it.
In terms of storage and access, the most obvious answer is to use a metadata repository -- a data dictionary like SuperLuminate. This is a specialized database application designed to provide the infrastructure and support for storage of interrelated components of information. Very few, if any, metadata components stand on their own. Metadata repositories not only help capture information about singular metadata components, but also about the relationships between individual components. Metadata repositories also provide important functionality for searching and browsing the available metadata, and they deliver one of the more important functions: producing impact analyses.
Impact analysis identifies all the resources that rely on a particular system component and therefore helps define all the resources that would be affected by a change to that component. Producing these types of reports, however, requires entering and maintaining the necessary information in a data dictionary.
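Impact analysis of this kind amounts to walking the recorded relationships outward from the component being changed. Below is a minimal sketch using a breadth-first traversal. It does not show how SuperLuminate implements the feature, and the DEPENDENTS mapping and every component name in it are hypothetical.

```python
from collections import deque

# Hypothetical dependency metadata: each key is a component and each value
# lists the components that rely on it directly.
DEPENDENTS = {
    "customer.account_id": ["etl_load_customers", "billing_service"],
    "etl_load_customers": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["regional_sales_report"],
    "billing_service": ["monthly_statement"],
}


def impact_of(component: str) -> list:
    """Return every component affected, directly or indirectly, by a change."""
    seen = set()
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for dependent in DEPENDENTS.get(current, []):
            if dependent not in seen:  # also guards against cycles
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


print(impact_of("customer.account_id"))
```

The result is only as complete as the relationships that have been recorded, which is why the dictionary has to be maintained for the report to be trustworthy.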

SuperLuminate

SuperLuminate (the open source business data dictionary) is a fully functional, web-based, thin-client, open source data dictionary application developed and distributed by SuperLuminate (the company).
Information Model
The structure of the metadata stored in SuperLuminate is defined by the Information Model. The Information Model in SuperLuminate encompasses both the taxonomy and the schemas (a.k.a. Information Models).

Taxonomy
The taxonomy enables the SuperLuminate administrator to define a high level standard system of classification for the organization. The SuperLuminate taxonomy comprises a four tier hierarchy from category to subject to class and finally to type. The four tier taxonomy was designed specifically to categorize Data Administration metadata.

By way of comparison, in the biological sciences the taxonomy for classifying organisms is based mainly on physical similarities and comprises a hierarchy of seven primary levels. They are, from top to bottom: Kingdom, Phylum, Class, Order, Family, Genus, Species.

Schemas (a.k.a. Information Models)
Schemas define the metadata in terms of object types and their relationships. The schema is thus the language for describing the metadata the repository will store.
SuperLuminate provides a default "out-of-the-box" taxonomy and schemas, although users can define their own or modify the ones provided to meet their specific metadata needs.
NOTE: The highest level in the taxonomy (Category) serves as a high-level partition, a way to separate large groupings of information. Subject name is synonymous with schema name: the term "schema" is too technical for the front end, so SuperLuminate calls it Subject. For example, the Subject named "Application" is the "Application" schema, and all of the Class names (Classes) within the "Application" Subject form the entities in that schema. In this way SuperLuminate can be used to manage multiple schemas.
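The relationship between the four-tier taxonomy and schemas can be sketched as follows. This is an illustration only: the sample rows and the entities_by_schema function are assumptions, not SuperLuminate's default taxonomy or code. It shows how grouping Class names under each Subject yields the entities of the schema with the same name.

```python
from collections import defaultdict

# Each classified item carries the four tiers named in the text:
# category, subject, class, type. The values are hypothetical.
ITEMS = [
    ("Technical", "Application", "Program", "Batch Job"),
    ("Technical", "Application", "Program", "Web Service"),
    ("Technical", "Application", "Screen", "Entry Form"),
    ("Business", "Glossary", "Term", "Acronym"),
]


def entities_by_schema(items):
    """Group Class names under their Subject, treating a Subject as a schema."""
    schemas = defaultdict(set)
    for _category, subject, class_name, _type in items:
        schemas[subject].add(class_name)
    return {subject: sorted(classes) for subject, classes in schemas.items()}


print(entities_by_schema(ITEMS))
# {'Application': ['Program', 'Screen'], 'Glossary': ['Term']}
```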

Extensibility
Extensibility is an important feature that enables users to add information to SuperLuminate for topics specific to their organization. The SuperLuminate administrator can extend the Information Model that the repository uses. If your organization needs to store information that is not part of the base Information Model, you can extend the model to add new objects and corresponding relationships. SuperLuminate administrators can also add new properties to track information for existing objects.
Versioning
Versioning is an important feature of SuperLuminate, and is fairly new to metadata repositories generally. With versioning, as updates happen to objects in the data dictionary, those changes are captured, and the data dictionary maintains a history of those changes.
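One simple way to picture versioning is an append-only history kept alongside each definition. The sketch below makes that assumption; the VersionedDefinition class, its methods, and the sample definitions are hypothetical and do not describe how SuperLuminate stores versions.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedDefinition:
    """Keeps every past value of a definition alongside the current one."""
    name: str
    history: list = field(default_factory=list)  # (version, text, changed_by)

    def update(self, text: str, changed_by: str) -> int:
        """Record a new version and return its version number."""
        version = len(self.history) + 1
        self.history.append((version, text, changed_by))
        return version

    @property
    def current(self) -> str:
        """Text of the most recent version."""
        return self.history[-1][1]

    def at_version(self, version: int) -> str:
        """Text as it stood at the given version number (1-based)."""
        return self.history[version - 1][1]


# Hypothetical definition and editors, invented for illustration.
total_sales = VersionedDefinition("Total Sales")
total_sales.update("Sum of invoiced amounts.", changed_by="analyst_a")
total_sales.update("Sum of invoiced amounts, excluding discounts.", changed_by="analyst_b")

print(total_sales.current)        # Sum of invoiced amounts, excluding discounts.
print(total_sales.at_version(1))  # Sum of invoiced amounts.
```

Because earlier versions are never overwritten, a reader can see what a definition said at the time a past report was produced.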
Metadata Interchange
SuperLuminate is built on a relational database and uses an open, industry-standard physical metaschema (IRDS, the Information Resource Dictionary System) based on the Unified Modeling Language (UML). As a result, users can share metadata across multiple tools from multiple vendors.
Secure and Flexible Administration
Because some metadata may be sensitive for the organization, you must be careful about how it is distributed. Even if it is relevant to a set of data that a department bases decisions on, it may not be in the best interest of the enterprise to make certain metadata available to all who are interested in it. Security in SuperLuminate is maintained at the individual user and/or group level.

Web Encryption via OpenSSL
User Authentication - login User ID and Password
User Authorization - Administrator, User, Group for Record Access and Maintenance, Reporting, and Tool use
User Profile Management

Powerful Data Analytics

Browse All Application Metadata in Real-Time - Absolute confidence that you are looking at the current version of the truth
Advanced Search Capabilities - Speed investigation of application and data issues with detailed and flexible search facilities
Data Lineage Analysis - Understand the source and destination of data as it flows through your applications
Dependency Analysis (Impact Analysis) - Track the impact of systems changes across the enterprise
Mapping View Analysis - Uncover how data is transformed as it moves between systems
Reporting - Create and save custom reports for easy repeat access to frequently used data sets
Versioning - Track changes to your business rules and corporate standards

SuperLuminate is Based on Modern Industry Standards

SuperLuminate is compliant with the LAMP (Web Based Thin-Client) architectural standard
SuperLuminate is metaschema compliant with IRDS standards
SuperLuminate is compliant with XML metadata interchange based import and export standards (future feature)
SuperLuminate has multi-language support
SuperLuminate can be run on any operating system supported by PHP
SuperLuminate can be run on any relational database supported by PHP

SuperLuminate is based on an industry-standard physical architecture (IRDS) that enables your company to extend the functionality of SuperLuminate by modifying the out-of-the-box classification taxonomy and schemas (a.k.a. Information Model), or by adding one or more of your own schemas. All extensions to SuperLuminate can be accomplished by modifying the SuperLuminate control data, with no need to change the application or the physical database.
To borrow a tag line from Salesforce.com, "No Software" (a Salesforce.com registered trademark): all extensions to SuperLuminate (your company's specific requirements and standards) can be implemented by your data dictionary administrator through internal configuration settings alone.
With SuperLuminate "no coding is required."
Because SuperLuminate does not require physical extensions, your company can easily upgrade to new versions of SuperLuminate as they are rolled out.
