IN THE DESIGN OF THE BIG DATA ACTIVE DATA WAREHOUSE (ADW) CONSIDER:
Framework:
The architect draws the map: what the individual components are, how they fit together, and who owns which parts. The architect also sets the priorities.
DATA: Normalized (entity/relationship, E/R) or Dimensional (a large fact table joined to small dimension tables, i.e., a star schema). If users need detail, choose dimensional.
INFRASTRUCTURE: data sources, staging, bandwidth, desktop tool access, software distribution, monitoring, size, scalability, and flexibility.
TECHNICAL: driven by the Meta Data catalog; covers natural vs. surrogate keys, staging and ETL, security, operations, and SLAs.
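To make the dimensional option and the natural vs. surrogate key choice concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the warehouse database. All table and column names (dim_product, dim_date, fact_sales, sku, and so on) and the sample rows are hypothetical. The fact table stores only warehouse-assigned surrogate keys; the natural key from the source system (sku) lives in the dimension and is translated during the load.

```python
import sqlite3


def build_star_schema(conn: sqlite3.Connection) -> None:
    """Create a tiny star schema: one fact table pointing at two dimension tables."""
    conn.executescript("""
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,   -- surrogate key assigned by the warehouse
            sku         TEXT NOT NULL UNIQUE,  -- natural key from the source system
            category    TEXT NOT NULL
        );
        CREATE TABLE dim_date (
            date_key  INTEGER PRIMARY KEY,     -- date key in YYYYMMDD form
            full_date TEXT NOT NULL UNIQUE
        );
        CREATE TABLE fact_sales (
            product_key INTEGER NOT NULL REFERENCES dim_product (product_key),
            date_key    INTEGER NOT NULL REFERENCES dim_date (date_key),
            quantity    INTEGER NOT NULL,
            amount      REAL NOT NULL
        );
    """)


def product_key_for(conn: sqlite3.Connection, sku: str, category: str) -> int:
    """Map a natural key (sku) to its surrogate key, adding the dimension row if it is new."""
    row = conn.execute(
        "SELECT product_key FROM dim_product WHERE sku = ?", (sku,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO dim_product (sku, category) VALUES (?, ?)", (sku, category)
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
conn.execute("INSERT INTO dim_date VALUES (20240131, '2024-01-31')")

# Hypothetical source rows: (natural key, category, quantity, amount)
source_rows = [
    ("A-100", "widgets", 2, 10.0),
    ("B-200", "gadgets", 1, 30.0),
    ("A-100", "widgets", 3, 15.0),
]
for sku, category, quantity, amount in source_rows:
    key = product_key_for(conn, sku, category)
    conn.execute(
        "INSERT INTO fact_sales VALUES (?, 20240131, ?, ?)", (key, quantity, amount)
    )

# Typical star-join query: aggregate the facts, label them from a dimension.
totals = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(totals)  # [('gadgets', 30.0), ('widgets', 25.0)]
```

Because facts reference surrogate keys, a change to the source system's natural key format only touches the dimension lookup, not the fact table.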
Flexibility and Maintainability:
·Quick addition of operational data sources.
·Flexible interface standards that allow plug and play.
·A model and metadata that allow impact analysis and single-point changes.
·Hardware that can be added on the fly (scalability).
·Minimal DBA activity.
·No restructuring of the physical model over time.
·Self-reorganizing storage (no manual reorgs).
·Automatic distribution of new data.
·Partitioning capability.
·Performance monitoring.
·Easy data compression.
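Impact analysis depends on the model and metadata recording lineage, that is, which elements are derived from which. A minimal sketch, assuming the metadata catalog can be exported as a simple source-to-target graph; the element names in LINEAGE and the function impacted_by are hypothetical. A breadth-first traversal lists everything downstream of a changed element, which is the set a single-point change would need to be checked against.

```python
from collections import deque

# Hypothetical lineage metadata: each element maps to the elements derived from it.
LINEAGE = {
    "src.orders.amount": ["stg.orders.amount"],
    "stg.orders.amount": ["dw.fact_sales.amount"],
    "dw.fact_sales.amount": ["mart.finance.revenue_v", "mart.sales.daily_totals_v"],
    "src.customers.region": ["stg.customers.region"],
    "stg.customers.region": ["dw.dim_customer.region"],
    "dw.dim_customer.region": ["mart.sales.daily_totals_v"],
}


def impacted_by(element: str, lineage: dict[str, list[str]]) -> set[str]:
    """Return every downstream element reachable from `element` (breadth-first)."""
    seen: set[str] = set()
    queue = deque(lineage.get(element, []))
    while queue:
        current = queue.popleft()
        if current in seen:  # shared targets are reached more than once
            continue
        seen.add(current)
        queue.extend(lineage.get(current, []))
    return seen


print(sorted(impacted_by("src.orders.amount", LINEAGE)))
# ['dw.fact_sales.amount', 'mart.finance.revenue_v', 'mart.sales.daily_totals_v', 'stg.orders.amount']
```

The `seen` set keeps the traversal finite even when several paths lead to the same target, as with mart.sales.daily_totals_v here.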
Fast Development:
Warehouse developers must be able to understand the data warehouse process quickly, in particular:
·The Meta Data – a detailed understanding of the data definitions. This is the first and most important step.
·The Movement/Transformation of Data – source/destination information and how the rules change the data as it moves, including DDL (e.g., data types, names) and DML.
·The Business rules – conversions, definitions, derivations, related items, validation, and hierarchy information (e.g., versions, dates).
·The Tool-specific information – how data is accessed and presented through graphical displays.
·The Security rules – authentication and authorization for all user access.
·The Operations information – SLAs, data load job scheduling, dependencies, job triggers, notification, and reliability information (e.g., host redirects and load balancing).
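The operations information above includes load-job dependencies, which determine the order in which jobs may run. A minimal sketch using the TopologicalSorter class from Python's standard graphlib module (Python 3.9+); the job names and the load_waves helper are hypothetical, and a production scheduler would also handle triggers, notification, and retries. Jobs are grouped into waves: every job in a wave depends only on jobs in earlier waves, so a wave can run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical load jobs, each mapped to the set of jobs it depends on.
JOB_DEPENDENCIES = {
    "extract_orders": set(),
    "extract_customers": set(),
    "stage_orders": {"extract_orders"},
    "stage_customers": {"extract_customers"},
    "load_dim_customer": {"stage_customers"},
    "load_fact_sales": {"stage_orders", "load_dim_customer"},
    "refresh_mart_views": {"load_fact_sales"},
}


def load_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into waves that can each run in parallel once earlier waves finish."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies contain a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished so dependents become ready
    return waves


for number, wave in enumerate(load_waves(JOB_DEPENDENCIES), start=1):
    print(number, wave)
```

Detecting cycles up front in prepare() means a bad dependency definition fails before any load job starts, rather than stalling partway through a batch.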
Management & Communications Tools:
Set expectations, set direction, set scope, set roles, set responsibilities, and set requirements for stakeholders.
Coordinate parallel efforts:
Several relatively independent efforts can converge successfully, but be aware that data marts need an architecture or they become simple stovepipes. These data marts should be virtual data marts (views) built on the physical warehouse, which ensures “one view of the truth”.
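A virtual data mart is a view, not a copy. A minimal sketch, again using Python's sqlite3 module as a stand-in for the warehouse database; the table dw_sales, the views mart_east_sales and mart_product_totals, and the data are hypothetical. Both marts read the same physical table, so a correction made once in the warehouse is visible in every mart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Physical warehouse table: the single stored copy of the data.
    CREATE TABLE dw_sales (region TEXT, product TEXT, amount REAL);
    INSERT INTO dw_sales VALUES
        ('EAST', 'widgets', 100.0),
        ('EAST', 'gadgets', 50.0),
        ('WEST', 'widgets', 70.0);

    -- Virtual data marts: views over the same table, so no data is copied.
    CREATE VIEW mart_east_sales AS
        SELECT product, amount FROM dw_sales WHERE region = 'EAST';
    CREATE VIEW mart_product_totals AS
        SELECT product, SUM(amount) AS total FROM dw_sales GROUP BY product;
""")

# A correction applied once to the warehouse table...
conn.execute(
    "UPDATE dw_sales SET amount = 80.0 WHERE region = 'WEST' AND product = 'widgets'"
)

# ...is seen by every mart on its next query.
print(conn.execute("SELECT * FROM mart_product_totals ORDER BY product").fetchall())
# [('gadgets', 50.0), ('widgets', 180.0)]
print(conn.execute("SELECT SUM(amount) FROM mart_east_sales").fetchone())
# (150.0,)
```

A physically copied mart would still report the old WEST figure until its own reload ran, which is exactly the drift that produces competing versions of the truth.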