Data Warehouse Design and Development Approaches

There are a number of different possible architectures and design approaches for the development of the Data Warehouse (DW). Widely used approaches include the top down Corporate Information Factory architecture (Inmon, 1995), the bottom up dimensional Data Mart approach (Kimball et al., 2008), and the Data Vault approach (Linstedt et al., 2010).

Inmon (2005) defines a DW as a collection of integrated databases designed to support the DSS function, with an architecture which is, or should be, almost the same as the source system. As shown in Figure 1, the Inmon DW also has data marts, or aggregated tables that are used for reporting and querying purposes.

 

Figure 1: Simplified view of Business Intelligence based on Inmon’s DW approach

 

Linstedt et al., (2010) proposes very similar concept for the Data Vault approach. Differentiation is only in the context of modelling and storing information inside the data marts. In the Data Vault approach data is loaded from the source system as is, without any checks or manipulation (Linstedt et al., 2010). The Data Vault approach is characterised by Hubs, Links and Satellites (Jovanović, Subotić & Mrdalj, 2014). Hubs represent source system business keys in the master table, links are associations between hubs with validity periods (from/ to date), and satellites point to the links containing attributes of transaction with the validity period (Orlov, 2014). As the structure of the data is highly normalized (4NF+), this approach to implementing the data warehouse is not adequate for direct reporting and requires additional dimensional data marts to enable reporting or querying (Orlov, 2014). Because of the complexity of the design, which includes very large amounts of historical data and complex joins, a direct query to a DW database based on the Data Vault approach would be highly demanding in time and CPU resources. Thus, DMs, as a form of focused and highly optimized database, are used in the Data Vault approach as an additional stage to support reporting.

A different approach was proposed by Kimball et al. (2008) who argue that a DW should be seen as a collection of the data marts which are used for querying and reporting and are connected used conformed dimensions. Conformed dimensions are standardized master data tables that describe the dimension, but which are intended to be used by more than one fact table, and/or by other dimensions for further detailing of existing attributes. Kimball argues that there is no need to replicate all the data from the source system, but only the data needed by the business. The Kimball approach is shown in Figure 2.

 

 

Figure 2: Simplified view of Business Intelligence based on Inmon’s DW approach

 

By removing the “Data Warehouse Database” component from Inmon’s approach shown in Figure 1, Kimball’s concept (Figure 2) based on conformed dimensions would be produced.

The Inmon and Kimball strategies agree that no change to the data, master (dimensional) or transactional, should be made in the conceptual database/data marts that represents the DW. Any such changes could lead to consistency problems, as discussed further in section 2.6. However, transformation and extractions based on existing data are allowed in the conceptual database/data marts. It is accepted that change to master data and any correction of transactional data must done in source system and then sent to DW, rather than changes being made at DW level.

 

References:

Inmon, B.W., 1995. Building the Data Warehouse, New York: Wiley.

Kimball, R., Ross, M., Thorthwaite, W., Becker, B., & Mundy, J., 2008. The Data Warehouse Lifecycle Toolkit. Wiley India Pvt. Limited.

Linstedt, D., Graziano, K., & Hultgren, H., 2010. The New Business Supermodel, The Business of Data Vault modeling. Lulu. com (2nd ed.). Lulu.com.

Jovanović, V., Subotić, D., & Mrdalj, S., 2014. Data modeling styles in data warehousing. In 2014 37th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2014 – Proceedings (pp. 1458–1463). Opatija, Croatia: MIPRO. Available at: http://doi.org/10.1109/MIPRO.2014.6859796

Orlov, V., 2014. Data Warehouse Architecture: Inmon CIF, Kimball Dimensional or Linstedt Data Vault? Retrieved February 20, 2015, from http://blog.westmonroepartners.com/data-warehouse-architecture-inmon-cif-kimball-dimensional-or-linstedt-data-vault/

Inmon, B.W., 2005. Building the Data Warehouse (4th ed.), Indianapolis: John Wiley & Sons.