Enterprise repositories have always been a tough sell. At their most basic level, repositories are database applications that contain meta data, or data about data. In the context of repositories, meta data refers to information about an organization's IS assets: everything from component definitions and COBOL copybooks to information about online corporate data warehouses and data marts. Repositories typically also contain database schema information, business rules, and corporate coding and naming conventions. You might think of a repository as an exhaustive, cross-indexed list of an organization's resources, a giant card catalog. Building such a list is the goal of the enterprise repository, so you probably won't be surprised that repositories are sometimes called data dictionaries or encyclopedias. If this description sounds too abstract, think of the Windows Registry as an everything-you-need-to-know-about-a-Windows-system repository.
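To make the "cross-indexed card catalog" idea concrete, here is a minimal in-memory sketch. It is not how any commercial repository stores its data. The class names, the asset kinds, and the sample assets (CUSTREC, CUSTOMER, BILLING) are all hypothetical. The sketch keeps each asset's meta data and maintains two indexes: one by kind of asset, and one reverse index that records which assets reference a given asset.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Asset:
    """One catalog entry: data *about* an IS asset, not the asset itself (hypothetical schema)."""
    name: str
    kind: str                                     # e.g. "copybook", "table", "program"
    attributes: dict = field(default_factory=dict)
    related: set = field(default_factory=set)     # names of assets this one references


class MetadataRepository:
    """Toy repository that stores assets plus two cross-indexes."""

    def __init__(self):
        self._assets = {}                  # name -> Asset
        self._by_kind = defaultdict(set)   # kind -> names of assets of that kind
        self._used_by = defaultdict(set)   # name -> names of assets that reference it

    def register(self, asset):
        if asset.name in self._assets:
            raise ValueError(f"{asset.name} is already registered")
        self._assets[asset.name] = asset
        self._by_kind[asset.kind].add(asset.name)
        for target in asset.related:
            self._used_by[target].add(asset.name)

    def get(self, name):
        return self._assets[name]

    def of_kind(self, kind):
        return sorted(self._by_kind.get(kind, set()))

    def impact_of(self, name):
        """Assets that directly reference `name`, i.e., what might break if it changes."""
        return sorted(self._used_by.get(name, set()))


repo = MetadataRepository()
repo.register(Asset("CUSTREC", "copybook", {"language": "COBOL"}))
repo.register(Asset("CUSTOMER", "table", {"dbms": "SQL Server"}))
repo.register(Asset("BILLING", "program", related={"CUSTREC", "CUSTOMER"}))
print(repo.impact_of("CUSTOMER"))   # ['BILLING']
```

The reverse index is what turns a plain list into something closer to a card catalog. It lets you ask "what uses this?" without scanning every entry.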
If an enterprise repository sounds like something that would be great to have but probably a pain to set up and maintain, you have the right idea. That brings me back to my opening point: repositories have always been a hard sell. However, repositories are making a comeback, and now is a good time to start investigating what they can do for you.
A Less-Than-Illustrious Past
Repositories aren't new. You might associate them with CASE tools, especially the grandiose upper CASE tools that were blueprints for enterprise development. Often mainframe-based, upper CASE tools addressed all the stages of the waterfall development model, from business modeling and enterprise architecture to code generation and maintenance. So-called lower CASE tools, in contrast, focused more narrowly on requirement analysis, program design, and code generation.
One example of an upper CASE tool that uses repositories is Texas Instruments' (TI's) Information Engineering Facility (IEF), which TI later renamed Composer by IEF, and which Sterling Software acquired in 1997 and renamed COOL:Gen. Two other examples of upper CASE products that use repositories are KnowledgeWare's Application Development Workbench (ADW—which Sterling Software also acquired) and Digital Equipment's COHESION.
Some tool vendors designed repositories for storing information that relates to an aspect of the software development process. These repositories might contain source code, version history, business rules, and project management information.
The current repository market is small and not particularly dynamic. One sign that most administrators aren't interested in repositories: as of late January, Microsoft's repository forum (http://msnews.microsoft.com) had only about 20 postings.
In Cutter Information's January 12, 1999, The Cutter Edge electronic briefing, Curt Hall, editor of Cutter's Data Management Strategies and Intelligent Software Strategies newsletters, listed five reasons why the enterprise repository has met with only limited success. First, he said, enterprise repositories use proprietary formats and provide only limited synchronization capabilities. Second, they don't provide all the meta data that administrators and end users need, and the meta data they do provide isn't in the formats users want. Third, repositories are expensive to build and maintain. Fourth, repositories aren't easy to scale because they require an all-or-nothing implementation. Finally, centralized repositories can't fully satisfy the needs of distributed environments.
Nevertheless, increasing numbers of administrators are taking an interest in repositories, largely because of repositories' role in the exploding fields of data warehousing and online analytical processing (OLAP) applications. Repository technology is useful in data warehousing and OLAP applications because it lets you store information about a data warehouse's or OLAP server's source data and about the extraction, cleansing, and aggregation rules associated with building and maintaining the data warehouse or OLAP database.
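As an illustration of the kind of warehouse meta data the paragraph above describes, the following sketch records, for each target column, the source columns it comes from and a free-text description of the extraction, cleansing, or aggregation rule. It then traces a warehouse column back to its original operational sources. This is a hypothetical example and does not reflect the format of ETI, Prism, Carleton, or any other vendor's repository. The `Mapping` class and the column names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """How one target column is built: its source columns plus the rule applied (hypothetical)."""
    target: str
    sources: tuple
    rule: str   # free-text extraction, cleansing, or aggregation rule


def trace_sources(mappings, column):
    """Follow mappings backward to the columns that no mapping produces, i.e., the original sources."""
    by_target = {m.target: m for m in mappings}
    found, stack, seen = set(), [column], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        mapping = by_target.get(current)
        if mapping is None:          # nothing builds this column, so treat it as a source
            found.add(current)
        else:
            stack.extend(mapping.sources)
    return sorted(found)


mappings = [
    Mapping("stage.orders.amount_usd", ("oltp.orders.amount", "oltp.fx.rate"),
            "convert amount to USD using the daily rate"),
    Mapping("dw.sales.monthly_total", ("stage.orders.amount_usd",),
            "SUM by calendar month"),
]
print(trace_sources(mappings, "dw.sales.monthly_total"))
# ['oltp.fx.rate', 'oltp.orders.amount']
```

With this kind of lineage stored, an administrator can answer "where does this number come from?" by querying the repository rather than reading the load scripts.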
Historically, data-warehousing tool vendors have created proprietary repositories to store and manage their applications' data. Notably, industry leaders such as Evolutionary Technologies International (ETI—http://www.evtech.com); Prism Solutions, which Ardent Software (http://www.ardentsoftware.com) acquired in late 1998; and Carleton (http://www.carleton.com) have developed repositories to support their data-warehousing products. Some of these data-warehousing repository products let traditional end users of information browse repository data, but others treat IS staffers as their end users and provide only the IS department access to the repository.
Other factors are also contributing to the repository renaissance. One of these factors is the momentum behind components and component-based application development. Repositories are logical structures that you can harness to keep track of distributed components and promote component reusability. If developers store their components in the repository, other developers can use those components in other projects. Another factor driving interest in repository technology is its potential for helping with Year 2000 (Y2K) remediation. And Microsoft's vocal championing of Microsoft Repository 2.1 has drawn publicity to repositories.
The problem with repositories is that no one wants to spend the time that setting them up and maintaining them requires, and administrators aren't eager to pay for them. Administrators would like repositories to be built into their network infrastructure. Microsoft seems to understand this, which is why I think Microsoft Repository will dominate the repository market in coming years.
History. The first time I heard about a Microsoft repository was at an Enterprise Days briefing about Object Linking and Embedding (OLE) strategy back in May 1995. At that briefing, TI and Microsoft shared the stage to describe a strategy for a component repository that would help developers with application development. My notes refer to the repository as an answer to component anarchy and outline Microsoft's strategy of moving toward cooperating components.
A year and a half later, I heard about a Microsoft repository again. In December 1996, Microsoft released the Microsoft Repository 1.0 software development kit (SDK). Microsoft shipped the first version of Microsoft Repository in March 1997 as a Visual Basic (VB) add-on. The repository wasn't exactly an instant hit; Microsoft admitted that it designed Microsoft Repository 1.0 mainly for independent software vendors (ISVs), not programmers.
When TI sold its software division in July 1997, Microsoft enlisted PLATINUM Technology (http://www.platinum.com) as its new enterprise partner in designing the type of high-end standalone repository that Fortune 500 companies pay six figures for. PLATINUM Technology was a good choice because that company was, and still is, one of the leading enterprise repository vendors. (Viasoft is the other leading enterprise repository vendor; it produces the Rochade repository—see http://www.viasoft.com/rochade.)
PLATINUM sells its own mainframe Repository/MVS and UNIX-based Repository/Open Enterprise Edition (OEE). However, the company is also porting Microsoft Repository to non-Windows platforms and non-SQL Server NT databases.
Functionality. At the same time that Microsoft partnered with PLATINUM Technology, it announced its Open Information Model (OIM), an extensible COM-based object model that defines the structure of the objects that OIM-based tools share. You can think of Microsoft Repository as having two parts. The first is the repository engine, which Microsoft built on top of a SQL database (initially either a Microsoft Access or a SQL Server database). The second is the OIM, a meta-meta model (i.e., the model that defines the other information models, the meta models). The OIM supports a variety of information model extensions, including database and OLAP models.
Microsoft Repository's first information model supports Unified Modeling Language (UML), an analysis and design-modeling language that has gained widespread industry support. With Microsoft Repository 1.0, you can create a VB program, use an optional download called Visual Modeler (a subset of Rational Software's Rational Rose product; for information about Rational Rose, see http://www.rational.com/rose) to reverse-engineer the program, and then export the design into the repository. At that point, the UML version of your VB program is available to other repository-aware tools.
Because Microsoft derived the OIM from UML, the OIM possesses UML behaviors. Each level of the OIM inherits behaviors from the level above it. For example, the SQL Server model (Sql) inherits behavior from the Database model (Dbm), which in turn inherits behavior from the OIM. Table 1, page 102, describes the OIM information models. ISVs and developers can base their own custom models on the information that these models inherit from other portions of the OIM.
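The layering described above can be pictured as an ordinary class hierarchy. The sketch below is an analogy only. The real OIM is defined through Microsoft Repository's type information and COM interfaces, not Python classes. Every class and attribute here (`OimElement`, `DbmTable`, `SqlTable`, `AuditedSqlTable`, `filegroup`, `owner`) is hypothetical. The sketch shows each level reusing behavior from the level above it, with a custom extension of the kind an ISV might build on the Sql layer.

```python
class OimElement:
    """Hypothetical stand-in for behavior that every OIM-based model shares."""

    def __init__(self, name):
        self.name = name
        self.properties = {}

    def describe(self):
        return f"{type(self).__name__} {self.name}"


class DbmTable(OimElement):
    """Hypothetical Database-model (Dbm) layer: generic relational concepts."""

    def __init__(self, name, columns):
        super().__init__(name)
        self.columns = list(columns)


class SqlTable(DbmTable):
    """Hypothetical SQL Server-model (Sql) layer: adds product-specific detail."""

    def __init__(self, name, columns, filegroup="PRIMARY"):
        super().__init__(name, columns)
        self.filegroup = filegroup


class AuditedSqlTable(SqlTable):
    """A custom extension, as an ISV might define it on top of the Sql layer."""

    def __init__(self, name, columns, owner):
        super().__init__(name, columns)
        self.owner = owner

    def describe(self):
        # Reuse the inherited description and add the extension's own detail.
        return f"{super().describe()} (owner: {self.owner})"


t = AuditedSqlTable("Invoices", ["id", "amount"], owner="finance")
print(t.describe())   # AuditedSqlTable Invoices (owner: finance)
print(t.filegroup)    # PRIMARY
```

The extension never redefines columns or names. It inherits them from the Dbm and OIM layers, which is the reuse the OIM's design is meant to enable.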
Last fall, Microsoft began sounding its familiar call to action to developers. In a new "Building Distributed Applications with Visual Studio" course and at the October 1998 Professional Developers Conference, Microsoft began showing developers how they can use Microsoft Repository to store objects such as COM components and Microsoft Transaction Server (MTS) packages. The one Professional Developers Conference session that Microsoft devoted to Microsoft Repository reportedly drew more than 300 attendees. It will be interesting to see how many TechEd 99 sessions Microsoft devotes to Microsoft Repository. ("More sessions" was as much as Microsoft Repository team member Steve Murchie would commit to at press time.)
UIs. Several popular Microsoft products now use Microsoft Repository. Microsoft Repository 2.0 ships with Visual Studio 6.0 and the professional and enterprise editions of VB 6.0. Microsoft Repository 2.1 ships with all versions of SQL Server 7.0. And you can download the Microsoft Repository 2.1 SDK from http://msdn.microsoft.com/repository.
As Figure 1 demonstrates, Microsoft expects nearly every user to be able to interact with the repository via browser tools such as Visual Component Manager, which ships with VB 6.0. Screen 1 shows Visual Component Manager's user interface (UI). As you can see in Screen 1, Microsoft Repository is basically a set of folders waiting for users to fill them with information. Mike Budd, the editor of "Ovum Evaluates: CASE Products" for Ovum, an independent telecommunications, new media, and information technology analyst group in London (http://www.ovum.com), conjured up a Winnie the Pooh connection in a December email conversation we had about Microsoft Repository: "It's a useful pot for putting things in."
Database programmers and users of SQL Server 7.0's new Data Transformation Services (DTS) are more likely to interact with the repository via SQL Server's Enterprise Manager than via a browser tool. Screen 2 shows the SQL Server Enterprise Manager UI. The msdb database that ships with SQL Server contains Microsoft Repository, and SQL Server 7.0 stores the DTS packages you define in the repository.
Component builders are likely to use Microsoft Repository in conjunction with Microsoft's Visual Modeler, a subset of Rational Rose 98. Visual Modeler (msvm.exe) ships with Visual Studio. After I installed Visual Studio, I found Visual Modeler in my computer's \Program Files\Microsoft Visual Studio\Common\Tool\VS-Ent98\VModeler directory. Screen 3 shows the Visual Modeler UI through which component builders can access Microsoft Repository. The application in Screen 3, Microsoft's ExplorationAir, is part of the "Developing Enterprise Applications with Visual Studio" course. (Notice the three-tier architecture that Microsoft is promoting for enterprise applications.)
Finally, ISVs and corporate programmers who want to create their own custom information models will need to use the Microsoft Repository 2.1 SDK. The SDK weighs in at about 8MB. It contains several utilities that you can use with SQL Server 7.0 and a slew of sample information models. Screen 4 shows a sample VB application in the SDK's UI.
Microsoft doesn't produce the only repository on the market. In fact, Microsoft's OIM isn't even the only meta-meta model available. The Object Management Group (OMG—http://www.omg.org) supports a meta-meta model called the Meta Object Facility (MOF).
IBM's current repository, like Microsoft's, is more tools-oriented than the Viasoft or PLATINUM repositories. IBM's repository and enterprise architecture initiatives date back to the late 1980s. IBM shipped its first host-based repository in 1990 as part of AD/Cycle, a grandiose but unsuccessful attempt to centralize management of mainframe application development. Since then, IBM's repository technology has evolved through the Configuration Management and Version Control (CMVC) product to the current VisualAge TeamConnection (http://www.software.ibm.com/ad/teamcon), which has been available since 1995.
VisualAge TeamConnection Enterprise Server 3.0 is a combination software-configuration management and repository product that is especially attractive to enterprises that use IBM's VisualAge tools. TeamConnection is an open tool with a published API, and it's source-code compliant (i.e., interoperable with Microsoft's Visual SourceSafe and other version-control products). Although TeamConnection originally used Object Design's ObjectStore (an object-oriented database system) as its data store, it now uses IBM's DB2 Universal Database (UDB). IBM provides consulting services to help TeamConnection customers migrate their existing ObjectStore databases to DB2.
Oracle Repository has always been part of Oracle Designer/2000 (which Oracle originally named Oracle CASE and renamed Oracle Designer in 1998). Repository 6.0 is the Oracle repository product currently on the market, but Oracle announced its Repository 7.0 last July, promising to ship the product in mid-1999. Repository 7.0 will leverage the Oracle8 and Oracle8i databases and will support Java extensions to the repository—which, for example, will let the repository store Enterprise JavaBeans. Oracle's Repository 7.0 also promises to offer better versioning than Microsoft Repository offers.
Two other repository products merit mention. First, Unisys developed the MOF-based Universal Repository (UREP—http://www.marketplace.unisys.com/urep), which Sybase has licensed. Second, Softlab offers the Enabler repository (http://www.softlab.com).
When I asked Ovum's Mike Budd whether he thought the software market could tolerate several repository products, he answered, "Given that all the main purposes of a repository require that tools share a common meta model and that meta models are difficult to learn, understand, and port information between, the market will find tolerating more than one meta model difficult. Because of its position in the software development and related OS market, Microsoft seems to be in a strong position to establish itself as the definer and controller of this meta model, and thereby to win the repository wars by making competing with Microsoft Repository expensive and difficult." I doubt that the 800 or so firms that have invested in the Viasoft and PLATINUM enterprise repositories will dump those products in favor of Microsoft Repository. But I expect that the Viasoft and PLATINUM products will have to interoperate with Microsoft Repository.
Many vendors of products that include repositories recognize that their customers need those repositories to share meta data, so these vendors have launched a variety of industry-specific meta data-exchange efforts. For example, CASE tool vendors have defined CASE Data Interchange Format (CDIF—http://www.cdif.org) as a vehicle for sharing model data.
The OMG recently ratified an Extensible Markup Language (XML) Metadata Interchange Format (XMI), which its developers based on XML and UML. (For information about XML, see Ken Spencer, "Using XML to Build Internet Solutions," page 123.) Oracle, IBM, and Unisys actively support XMI. Oracle's Repository 7.0 will support XMI and the OMG's Stream-based Model Interchange Format (SMIF) standards.
However, the XMI specification is likely to face serious competition from XML Interchange Format (XIF), a specification that Microsoft and its allies are backing. Microsoft announced XIF in December 1998 and promoted the specification as an open, industry-standard model that accommodates meta data for software development and data-warehousing tools. Not surprisingly, XIF works hand-in-glove with the OIM.
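Both XMI and XIF rest on the same basic idea: a tool writes its meta data out as XML, and another tool reads it back in. The sketch below shows only that round trip, using Python's standard `xml.etree.ElementTree` module. The `Table` and `Column` element names are hypothetical and do not follow the XMI or XIF schemas. A real interchange format also specifies which meta model (UML, MOF, or OIM) the elements conform to.

```python
import xml.etree.ElementTree as ET


def export_table(name, columns):
    """Write one table definition as XML (hypothetical vocabulary, not XMI or XIF)."""
    root = ET.Element("Table", name=name)
    for col_name, col_type in columns:
        ET.SubElement(root, "Column", name=col_name, type=col_type)
    return ET.tostring(root, encoding="unicode")


def import_table(xml_text):
    """Read the XML produced by export_table back into (name, [(column, type), ...])."""
    root = ET.fromstring(xml_text)
    columns = [(col.get("name"), col.get("type")) for col in root.findall("Column")]
    return root.get("name"), columns


doc = export_table("Customers", [("id", "int"), ("name", "varchar(50)")])
print(doc)
print(import_table(doc))   # ('Customers', [('id', 'int'), ('name', 'varchar(50)')])
```

The hard part of meta data interchange is not the XML syntax. It is getting both tools to agree on the vocabulary and the underlying meta model, which is why the competition between XMI and XIF matters.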
Microsoft also announced last December that it is joining the Meta Data Coalition (MDC), an industry group that attempts to bridge the gaps between proprietary meta data stores. The MDC has produced Meta Data Interface Standard (MDIS) 1.1, which promises to be a valuable basis for interoperability. (For information about MDIS, see http://www.mdcinfo.com.) IBM, which was a charter member of the MDC and instrumental in developing MDIS, pulled out of the MDC in late 1998. Muddying the very political waters of repository standards in late 1998 was Oracle's announcement of its proprietary Common Warehouse Metadata (CWM) standard, which the OMG is supposedly incorporating into the XMI initiative.
Other organizations are taking on more targeted projects for sharing meta data. The Federal Geographic Data Committee (FGDC—http://www.fgdc.gov) is spearheading a project that will result in a framework for storing data about geographic information systems (GISs). And the Warwick Framework and Dublin Core (http://www.ukoln.ac.uk/metadata/resources/dc.html) have created a meta data framework for digital libraries.
Time to Think Pooh
Although repositories might not affect your life this month—or even this year—starting to investigate this technology is probably a good idea. Visit the Web sites that "Related Reading Online" lists, and if you're in a Microsoft shop, look into Microsoft Repository 2.1 and its related tools, SQL Server's DTS, and Visual Studio's Visual Component Manager and Visual Modeler.
As you investigate repositories, keep their purpose in mind. Repositories are tools that help you manage computer systems and networks. The ideal repository is distributed, open, and extensible. It is largely self-managing and can interoperate with meta data sets that come from different sources. You can interrogate it through open, standard, and well-defined interfaces. Think Pooh. Repositories are nice pots to put things in.
Corrections to this Article:
- The term "Virtual Storage Access Method" in Figure 1 was incorrectly identified as "Visual Storage Access Method." We apologize for any inconvenience this might have caused.