Data Warehouse and OLAP

A short and comprehensive guide to data warehousing and OLAP operations

What exactly is a data warehouse? Simply put, it is a decision support database that is maintained separately from the organization’s operational database. It is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site. A data warehouse can also be viewed as a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information an enterprise needs to make strategic decisions.

Let’s take a closer look at the key features of a data warehouse: it is subject-oriented, integrated, time-variant, and nonvolatile.

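A data warehouse is kept separate from operational databases mainly because the two serve different workloads: operational databases are tuned for many short, concurrent transactions, while a data warehouse serves complex, mostly read-only analytical queries, often over historical data consolidated from several sources.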

Data warehouses and OLAP tools are based on a multidimensional data model, which views data in the form of a data cube. A data cube allows data to be modeled and viewed in multiple dimensions, and it is defined by dimensions and facts: dimensions are the perspectives along which records are organized (for example, time or item), and facts are the numeric measures being analyzed.

The multidimensional model of a data warehouse can be modeled in the form of a star schema, a snowflake schema, or a fact constellation schema.

- Star schema: A fact table in the middle, connected to a set of dimension tables.

- Snowflake schema: A refinement of the star schema in which some dimensional hierarchies are normalized into sets of smaller dimension tables, forming a shape similar to a snowflake.

- Fact constellation schema: Multiple fact tables share dimension tables. Because it can be viewed as a collection of stars, it is also called a galaxy schema.

Figure: star schema, snowflake schema, and fact constellation schema
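To make the star schema concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and rows (`fact_sales`, `dim_item`, `dim_time`) are made up for illustration and do not come from any real data set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "who/what/when" of each fact.
CREATE TABLE dim_item (item_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);

-- The fact table sits in the middle: one foreign key per dimension plus the measure.
CREATE TABLE fact_sales (
    item_id INTEGER REFERENCES dim_item(item_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    amount  INTEGER
);

INSERT INTO dim_item VALUES (1, 'laptop', 'computers'), (2, 'phone', 'mobile');
INSERT INTO dim_time VALUES (1, 2009, 'Q1'), (2, 2010, 'Q1');
INSERT INTO fact_sales VALUES (1, 1, 100), (2, 1, 50), (1, 2, 120), (2, 2, 80);
""")

# A typical star-join query: join the fact table to its dimensions, then aggregate.
rows = conn.execute("""
    SELECT t.year, i.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_item AS i ON i.item_id = f.item_id
    JOIN dim_time AS t ON t.time_id = f.time_id
    GROUP BY t.year, i.category
    ORDER BY t.year, i.category
""").fetchall()

for row in rows:
    print(row)
```

In a snowflake schema, `dim_item.category` would move into its own table referenced by a key. In a fact constellation, a second fact table would reuse the same `dim_item` and `dim_time` tables.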

A data cube measure is a numeric function that can be evaluated at each point in the data cube space. A measure value is computed for a given point by aggregating the data corresponding to the respective dimension–value pairs defining the given point. Measures can be organized into three categories, distributive, algebraic, and holistic, based on the kind of aggregate function used.
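The difference between the three categories is easiest to see when the data behind one cell is split into partitions. The sketch below uses made-up numbers: a sum (distributive) can be combined from partition sums, an average (algebraic) can be derived from a fixed number of distributive results, and a median (holistic) generally needs the full data.

```python
from statistics import median

# Hypothetical measure values for one cube cell, stored in two partitions.
part_a = [1, 2, 9]
part_b = [3, 4]
all_values = part_a + part_b

# Distributive: the overall sum is just the sum of the per-partition sums.
total = sum(part_a) + sum(part_b)

# Algebraic: the average is computed from two distributive results (sum and count).
count = len(part_a) + len(part_b)
average = total / count

# Holistic: combining per-partition medians does not give the true median.
median_of_medians = median([median(part_a), median(part_b)])
true_median = median(all_values)

print(total, average, median_of_medians, true_median)
```

This is why distributive and algebraic measures are cheap to precompute at several levels of a cube, while holistic measures are usually computed on demand or approximated.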

Since OLAP servers are based on a multidimensional view of data, they support a set of typical operations on multidimensional data, such as roll-up, drill-down, slice, dice, and pivot.
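As a rough illustration of what such operations do, the sketch below models a tiny cube as a dictionary in pure Python and implements roll-up and pivot. The cell values and the helper `roll_up` are made up for this example. Drill-down is simply the reverse direction, going from the coarser view back to the finer one.

```python
from collections import defaultdict

# Hypothetical cube cells keyed by (year, quarter, category) -> amount.
cells = {
    (2009, "Q1", "a"): 10,
    (2009, "Q2", "a"): 20,
    (2009, "Q1", "l"): 5,
    (2010, "Q1", "a"): 30,
    (2010, "Q1", "l"): 15,
}


def roll_up(cells, keep):
    """Sum away every dimension whose position is not listed in `keep`."""
    out = defaultdict(int)
    for key, amount in cells.items():
        out[tuple(key[i] for i in keep)] += amount
    return dict(out)


# Roll up the time dimension from quarter to year, keeping category.
by_year_category = roll_up(cells, keep=(0, 2))

# Roll up further so that only the year remains.
by_year = roll_up(cells, keep=(0,))

# Pivot: rotate the (year, category) view into (category, year).
pivoted = {(cat, year): value for (year, cat), value in by_year_category.items()}

print(by_year_category)
print(by_year)
print(pivoted)
```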

To get a better understanding of these concepts, a Python implementation of OLAP operations is described below.

Let’s move on to practice with Cubes, a lightweight Python framework and set of tools for the development of reporting and analytical applications, Online Analytical Processing (OLAP), multidimensional analysis, and browsing of aggregated data.

Data Preparation

We can now load the data, create a table, and populate it with the contents of the CSV file.
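The loading step can be sketched with only the standard library (csv and sqlite3). This is not the Cubes helper for loading CSV files. The table name `balance`, the column names, and the sample rows are assumptions made up for illustration.

```python
import csv
import io
import sqlite3

# Made-up sample data standing in for the CSV file; a real script would
# use open(path, newline="") instead of io.StringIO.
CSV_TEXT = """\
category,subcategory,line_item,year,amount
a,da,Derivative Assets,2009,100
a,dfb,Due from Banks,2009,50
a,da,Derivative Assets,2010,120
l,b,Borrowings,2009,90
l,b,Borrowings,2010,110
"""

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE balance (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        category TEXT,
        subcategory TEXT,
        line_item TEXT,
        year INTEGER,
        amount INTEGER
    )
""")

# Convert the numeric columns explicitly; csv yields every field as a string.
reader = csv.DictReader(io.StringIO(CSV_TEXT))
rows = [
    (r["category"], r["subcategory"], r["line_item"], int(r["year"]), int(r["amount"]))
    for r in reader
]
conn.executemany(
    "INSERT INTO balance (category, subcategory, line_item, year, amount) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM balance").fetchone()[0]
print(count, "rows loaded")
```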

Creating a data cube

Everything in Cubes happens in an `analytical workspace`. It contains cubes, maintains connections to the data stores (with cube data), provides the connection to external cubes, and more. The workspace properties are specified in a configuration file, slicer.ini by default. The first thing we have to do is to specify a data store that will host the cube’s data:

Now we can create a data cube based on the above data cube model and data table:
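A data cube model is a JSON description of the cube's dimensions, measures, and aggregates. The sketch below builds one as a Python dictionary and serializes it. The overall layout (`dimensions`, `cubes`, `levels`, `measures`, `aggregates`, `mappings`) follows the Cubes model format as I understand it and should be checked against the Cubes documentation. The cube name `balance` and all column names are hypothetical.

```python
import json

# Assumed model layout; verify the keys against the Cubes documentation.
model = {
    "dimensions": [
        {
            # A hierarchical dimension: category -> subcategory -> line item.
            "name": "item",
            "levels": [
                {"name": "category", "attributes": ["category"]},
                {"name": "subcategory", "attributes": ["subcategory"]},
                {"name": "line_item", "attributes": ["line_item"]},
            ],
        },
        # A flat dimension with a single level.
        {"name": "year", "role": "time"},
    ],
    "cubes": [
        {
            "name": "balance",  # hypothetical cube and table name
            "dimensions": ["item", "year"],
            "measures": [{"name": "amount"}],
            "aggregates": [
                {"name": "amount_sum", "function": "sum", "measure": "amount"},
                {"name": "record_count", "function": "count"},
            ],
            # Map logical attribute names to physical column names.
            "mappings": {
                "item.category": "category",
                "item.subcategory": "subcategory",
                "item.line_item": "line_item",
            },
        }
    ],
}

model_json = json.dumps(model, indent=2)
print(model_json)
```

Writing `model_json` to a file gives a model that a workspace could then import. The dimensions and aggregates named here are what the aggregation and drilldown queries refer to.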

Aggregations and OLAP operations

Let us make a browser object for the data cube. A browser is an object that performs the actual aggregations and other data queries for a cube.

We can now compute the aggregates of the data cube as specified by the data cube model. For computing the total count of records:

If we want the results aggregated by year, we have to use the drilldown operation:
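To show what these two aggregations compute, here is the same logic expressed as plain SQL through sqlite3. This is a stand-in for the Cubes browser, not its API. The `balance` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balance (category TEXT, subcategory TEXT, year INTEGER, amount INTEGER)"
)
# Made-up rows: (category, subcategory, year, amount).
conn.executemany(
    "INSERT INTO balance VALUES (?, ?, ?, ?)",
    [
        ("a", "da", 2009, 100),
        ("a", "dfb", 2009, 50),
        ("a", "da", 2010, 120),
        ("l", "b", 2009, 90),
        ("l", "b", 2010, 110),
    ],
)

# Whole-cube summary: record count and total amount over all cells.
record_count, amount_sum = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM balance"
).fetchone()
print(record_count, amount_sum)

# Drilldown by year: the same aggregates, broken down per year.
by_year = conn.execute(
    "SELECT year, COUNT(*), SUM(amount) FROM balance GROUP BY year ORDER BY year"
).fetchall()
for year, n, total in by_year:
    print(year, n, total)
```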

Now you can obtain the following results.

Slicing and dicing operations on the data cube

We can also perform slicing and dicing operations on the data cube. In Cubes, a slice can be specified either as a “point cut”, which selects a single value of an attribute in a given dimension (created with cubes.PointCut()), or as a “range cut”, which selects a range of values for a given dimension. A range cut is created with cubes.RangeCut(), which takes the dimension name, the lower bound of the range, and the upper bound of the range.
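The idea behind the two kinds of cut can be sketched in pure Python. The classes `PointFilter` and `RangeFilter` below are hypothetical stand-ins written for this example and are not the Cubes classes. The inclusive range bounds are an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointFilter:
    """Keeps rows where one dimension equals a single value."""
    dimension: str
    value: object

    def matches(self, row):
        return row[self.dimension] == self.value


@dataclass(frozen=True)
class RangeFilter:
    """Keeps rows where one dimension lies in [low, high] (inclusive here)."""
    dimension: str
    low: object
    high: object

    def matches(self, row):
        return self.low <= row[self.dimension] <= self.high


def select(rows, cuts):
    """Return the rows that satisfy every cut."""
    return [row for row in rows if all(cut.matches(row) for cut in cuts)]


# Made-up records.
rows = [
    {"year": 2008, "category": "a", "amount": 70},
    {"year": 2009, "category": "a", "amount": 100},
    {"year": 2009, "category": "l", "amount": 90},
    {"year": 2010, "category": "a", "amount": 120},
]

point = select(rows, [PointFilter("year", 2009)])
ranged = select(rows, [RangeFilter("year", 2009, 2010)])

print([r["amount"] for r in point])
print([r["amount"] for r in ranged])
```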

To select only entries from the year 2009, we perform a slicing operation on the data cube and display the aggregates by item category.
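In SQL terms, this slice is a filter on one dimension followed by a group-by on another. The sketch below uses sqlite3 with a made-up `balance` table rather than the Cubes API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balance (category TEXT, subcategory TEXT, year INTEGER, amount INTEGER)"
)
# Made-up rows: (category, subcategory, year, amount).
conn.executemany(
    "INSERT INTO balance VALUES (?, ?, ?, ?)",
    [
        ("a", "da", 2009, 100),
        ("a", "dfb", 2009, 50),
        ("a", "da", 2010, 120),
        ("l", "b", 2009, 90),
        ("l", "b", 2010, 110),
    ],
)

# Slice: fix year = 2009 (a point cut), then aggregate per item category.
sliced = conn.execute(
    "SELECT category, SUM(amount) FROM balance "
    "WHERE year = ? GROUP BY category ORDER BY category",
    (2009,),
).fetchall()
print(sliced)
```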

Then we can obtain the following results.

Here, we perform a dicing operation to select records where the year is 2009 and the item category is “a” (corresponding to assets), and show aggregates at the subcategory level.
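The dice adds a second condition on another dimension and drills one level deeper into the item hierarchy. Again, this is a plain sqlite3 sketch over made-up rows, not the Cubes API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balance (category TEXT, subcategory TEXT, year INTEGER, amount INTEGER)"
)
# Made-up rows: (category, subcategory, year, amount).
conn.executemany(
    "INSERT INTO balance VALUES (?, ?, ?, ?)",
    [
        ("a", "da", 2009, 100),
        ("a", "dfb", 2009, 50),
        ("a", "da", 2010, 120),
        ("l", "b", 2009, 90),
        ("l", "b", 2010, 110),
    ],
)

# Dice: fix values on two dimensions (year and category),
# then aggregate at the next level of the item hierarchy (subcategory).
diced = conn.execute(
    "SELECT subcategory, SUM(amount) FROM balance "
    "WHERE year = ? AND category = ? "
    "GROUP BY subcategory ORDER BY subcategory",
    (2009, "a"),
).fetchall()
print(diced)
```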

Then we can obtain the following results.

Dicing is similar to slicing but works a little differently. Slicing filters the cube on a single value of one dimension, whereas dicing selects a sub-cube by fixing specific values on two or more dimensions.

In this article, we covered the concepts of a data warehouse and how data warehouses are modeled, including data cubes and OLAP operations.

I hope you enjoyed the blog and came away with a clearer picture of data warehousing and OLAP. Feel free to post your feedback in the comments section.
