Tuesday, March 2, 2010

Facts and Dimensions

The data in a data warehouse is divided into "facts" and "dimensions". Facts are tangible events that also carry inherent characteristics. Dimensions are any data elements that may affect the behavior of these facts.

- In a star schema, the facts sit at the center of the star. In this example, the facts are individual sales transactions.
- Around the facts are five dimensions:
1) Customer Loyalty Dimension
2) Geographic Dimension
3) Product Dimension
4) HR Dimension
5) Time Dimension

- Further defining the Geographic dimension are two sub-dimensions, also known as “snowflake” dimensions because of the shape they give to the star:
1) Tax Snowflake Dimension, which depends on the geographic location and the time when the fact occurred
2) Weather Snowflake Dimension, which also depends on the geographic location and the time when the fact occurred
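
The star described above can be sketched as a set of tables. This is only a rough illustration using Python's built-in sqlite3 module: the post names the dimensions but not their contents, so every table name and column below (dim_geographic, store_id, tax_rate and so on) is a made-up assumption.

```python
import sqlite3

# Hypothetical table and column names; only the dimension names come from the post.
SCHEMA = """
CREATE TABLE dim_customer_loyalty (customer_id INTEGER PRIMARY KEY, postal_code TEXT);
CREATE TABLE dim_geographic (store_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_hr (employee_id INTEGER PRIMARY KEY, territory TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, sale_date TEXT);

-- Snowflake tables hang off the geographic dimension and are keyed by place and time.
CREATE TABLE snow_tax (
    store_id INTEGER REFERENCES dim_geographic(store_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    tax_rate REAL,
    PRIMARY KEY (store_id, time_id)
);
CREATE TABLE snow_weather (
    store_id INTEGER REFERENCES dim_geographic(store_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    temperature_c REAL,
    PRIMARY KEY (store_id, time_id)
);

-- The fact table sits at the center: one row per sales transaction,
-- with one foreign key for each of the five dimensions.
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer_loyalty(customer_id),
    store_id INTEGER REFERENCES dim_geographic(store_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    employee_id INTEGER REFERENCES dim_hr(employee_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    amount REAL
);
"""


def build_star_schema():
    """Create the illustrative schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


conn = build_star_schema()
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```

The fact table carries the measurements (here just an amount), while each dimension table carries the descriptive attributes one would filter or group by.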

Having divided the data into facts and dimensions, one can mine the data for trends. In a retail environment, one could ask questions such as:
• What distance will the average loyalty card holding customer travel from their home to one of the company retail stores?
• Is there a correlation between the distance and the frequency of the visits?
• If a promotion flyer was distributed by mail to a given postal code, what was the loyalty card holder response?
• In the spring season, at what average temperature do customers purchase more cold drinks, like fruit juices, than hot drinks, like coffee?
• If a customer bought a product in the “salty snack” category, what is the probability that they would also buy one or more cold drinks?
• Is there a typical “basket of goods” purchased on certain weekdays?
• What is the profile of the employees with the best sales?
• If a sales education course was provided for employees of a given territory, can the results be measured?
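
To make one of these questions concrete, here is a small sketch of how the "salty snack" question might be estimated. The baskets are invented sample data, not real sales, and the function name conditional_probability is just an illustrative choice; in practice the baskets would come from joining the sales facts to the product dimension.

```python
# Hypothetical baskets: each set holds the product categories in one sales transaction.
baskets = [
    {"salty snack", "cold drink"},
    {"salty snack"},
    {"salty snack", "cold drink", "hot drink"},
    {"hot drink"},
    {"cold drink"},
]


def conditional_probability(baskets, given, target):
    """Estimate P(target in basket | given in basket) from observed baskets."""
    with_given = [b for b in baskets if given in b]
    if not with_given:
        return None  # no basket contains the 'given' category, so no estimate
    both = sum(1 for b in with_given if target in b)
    return both / len(with_given)


p = conditional_probability(baskets, "salty snack", "cold drink")
print(f"{p:.2f}")  # prints 0.67 for the sample data above
```

Three of the sample baskets contain a salty snack, and two of those also contain a cold drink, which gives the estimate of two in three.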

Beyond answering questions that marketers may be curious about, the secondary aim of the data warehouse star schema is to enable “data mining”. Effectively, data mining is the use of software to uncover hitherto unknown trends, or trends that are not easily visible otherwise.
