Decision support systems are usually built on a data warehouse infrastructure. A data warehouse (DW) architecture has two major areas: the staging area and the presentation area. This article presents the staging area. First, the sources from which data will be extracted and loaded into the DW are systematically identified. The documentation of the source database schemas is then reviewed in order to design the data extraction logic. The quality of that documentation, and of the data structures of the sources, influences how difficult the extraction logic is to design. The extracted data are loaded into the staging area, which consists either of simple files or of database tables. The staging area may have several phases. Extracting data from the sources, transforming them into new structures, and loading them into the DW, a process known as ETL, takes place in the staging area.
Extraction requires identifying the source relational tables and fields from which the data are extracted (as noted above, the documentation of these structures is of crucial importance for the design). The design of the extraction includes:
Frequency of data extraction
Extraction method (for example, changes only) and technology (for example, partial database replication)
The database instance or file into which the data are first loaded in the staging area
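As an illustration of the "changes only" method listed above, here is a minimal sketch in Python. It assumes the source table carries a last-modified timestamp; the table names customers and stg_customers, the column updated_at, and the function extract_changes are hypothetical and are not taken from any particular system. Each run copies only the rows changed since the previous run into the staging table.

```python
import sqlite3

def extract_changes(source, staging, last_watermark):
    """Copy rows changed after last_watermark into staging; return the new watermark."""
    rows = source.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    staging.executemany(
        "INSERT INTO stg_customers (id, name, updated_at) VALUES (?, ?, ?)", rows
    )
    staging.commit()
    # If nothing changed, keep the old watermark for the next run.
    return rows[-1][2] if rows else last_watermark

# Hypothetical source system and staging database (both in memory for the demo).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT, updated_at TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Alpha", "2006-01-01"), (2, "Beta", "2006-01-05"), (3, "Gamma", "2006-01-09")],
)
staging = sqlite3.connect(":memory:")
staging.execute("CREATE TABLE stg_customers (id INTEGER, name TEXT, updated_at TEXT)")

# Only rows modified after 2006-01-03 are extracted.
watermark = extract_changes(source, staging, "2006-01-03")
staged_rows = staging.execute("SELECT COUNT(*) FROM stg_customers").fetchone()[0]
```

The returned watermark would be stored between runs, so the extraction frequency can be changed without re-reading the whole source table. Sources without a reliable timestamp would need another technique, such as replication or a full comparison.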
Moreover, the volume of data to be extracted is estimated in order to plan the computational and storage capacity. Estimation sheets, known as "volume sheets", are developed with the following information for each source field:
Extraction frequency
Estimated volume
Standardization and transformation rules to be applied (if any)
DW database field into which the data are loaded
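The per-field information listed above can be kept in a small structure and summed to get a rough capacity figure. The sketch below is only an illustration: the class FieldEstimate, the field names, and all the numbers are invented for the example, and a real estimate would also have to allow for indexes, history, and growth.

```python
from dataclasses import dataclass

@dataclass
class FieldEstimate:
    source_field: str          # table and field in the source system
    extractions_per_day: int   # extraction frequency
    rows_per_extraction: int   # estimated volume per run
    avg_bytes_per_value: int   # assumed average size of one value
    rule: str                  # standardization/transformation rule, if any
    target_field: str          # DW field that receives the data

    def bytes_per_day(self) -> int:
        return (self.extractions_per_day
                * self.rows_per_extraction
                * self.avg_bytes_per_value)

# Hypothetical sheet with two source fields.
sheet = [
    FieldEstimate("crm.customers.name", 1, 50_000, 40,
                  "trim, upper-case", "dim_customer.name"),
    FieldEstimate("erp.contracts.amount", 4, 2_000, 8,
                  "", "fact_contract.amount"),
]

total_per_day = sum(f.bytes_per_day() for f in sheet)   # bytes per day
yearly_mb = total_per_day * 365 / 1_000_000              # megabytes per year
```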
In many cases, the data quality assessment and data cleansing phases also take place in the staging area. The design and implementation of the automated ETL process often accounts for a large part of the human effort in developing a DW (international statistics indicate that it can exceed 70% of the total). The DW staging area is often implemented on a separate server (staging server), which adds complexity and cost. However, this approach has some advantages, such as:
Isolation of the raw data extracted from the sources, and of their processing, from the data that are accessible to business analysts
Greater security and quality of the data processing, since DW users have no access to this area
Load balancing, since "data processing" tasks and DW query workloads are handled by separate systems
Development of a central metadata repository, which keeps records for all the systems involved: the operational systems (data sources), the ETL process, the data warehouse, the business intelligence tools, and the predefined reports
Different types of raw data processing take place in the staging area:
Data standardization: transforming data into a unified format, if necessary
Sorting records
Matching and merging of data sets for the same entity that come from different sources (for example, records of the same customer from different operational systems), after standardization
Calculation of derived data (figures computed from detailed financial data, for example, the total value of a contract)
Management of surrogate keys, which replace the primary keys of the operational systems
Completion of records with default values, if necessary
The production of aggregate data, if necessary
Conversion of data according to the technology platform used by the DW (DBMS, OS)
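A few of the processing steps listed above (standardization, default values, matching and merging, and surrogate keys) can be sketched together in Python. This is a simplified illustration under stated assumptions: the functions standardize and assign_surrogate_keys and the sample records are hypothetical, and matching on an exact (name, country) pair is a deliberately naive rule; real matching usually needs fuzzier comparison.

```python
def standardize(record):
    """Bring a raw record into a unified format and fill in default values."""
    return {
        "name": (record.get("name") or "").strip().upper(),
        "country": (record.get("country") or "UNKNOWN").strip().upper(),
        # Keep the operational key so it can be mapped to a surrogate key.
        "source_key": (record["system"], record["id"]),
    }

def assign_surrogate_keys(records):
    """Merge records matching on (name, country); one surrogate key per entity."""
    key_by_entity = {}  # (name, country) -> surrogate key
    key_map = {}        # (system, id) in the source -> surrogate key
    for rec in sorted(records, key=lambda r: (r["name"], r["country"])):
        entity = (rec["name"], rec["country"])
        surrogate = key_by_entity.setdefault(entity, len(key_by_entity) + 1)
        key_map[rec["source_key"]] = surrogate
    return key_by_entity, key_map

# Hypothetical raw records for the same customer from two operational systems.
raw = [
    {"system": "crm", "id": 17, "name": " Acme Ltd ", "country": "gr"},
    {"system": "billing", "id": "A-9", "name": "ACME LTD", "country": "GR"},
    {"system": "crm", "id": 18, "name": "Borealis", "country": None},
]

entities, key_map = assign_surrogate_keys([standardize(r) for r in raw])
```

Here both Acme records end up under one surrogate key, while the key map preserves the link back to each operational key, so later loads can resolve incoming records to the same DW row.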
The ETL process is handled by software that automatically and periodically updates the DW.
Copyright 2006 – Kostis Panayotakis