segunda-feira, 9 de março de 2015

Data Preparation #1


How to use Tableau with your hierarchical data

Sometimes you need step back a little bit and prepare your data before use it with your visualization tools.
Otherwise you may face difficulties that might lead to to blame the tool instead of the poor data preparation.

Here I will show you an example of how to prepare a hierarchical (and normalized) data to be used with your visualization tool.

Let's imagine you own a company.
In your company, you have many sales person, and each sales person may (or may not) be the boss of other sales person.
A boss can have other boss, and so on.
Anyone of your company can sell: a sales person, the boss or even the boss's boss.

You want to analyze the sales per sales person.
Not only the individual sales, but include the sales of the hierarchy.
E.g: B is the boss of F, and F is the boss of P.
When you see the B sales, you want to have summarized the sales of B, F and P
When you see F sales, you want to have summarized the sales of F and P
When you see P sales, you want to have only the sales of P, because P is not a boss.

This is your desired output:
Desired output.jpg

In your database, you have 2 tables:
+  Sales
Which has the sales per employee

+  Boss
Which indicate who is boss of who.

Sample records:
Sales table
Employee IDSales
J59
F122
J55
F140
J72
J24
B150
B160
I67
K138
K88
K69
P95

Boss table
EmployeeBoss
FB
II
JI
KJ
PF


How do you achieve your desired output with those tables?
You will need to use recursion in your database.
What does that means? Well, that means you need to create a recursive view in your database, which will have a loop inside.
And then you can use Tableau (connected to your recursive view) to helps you to create wonderful visualizations.

You can use this approach in many databases (Teradata, IBM DB2, etc...).
In this example, we will use Teradata sintax.

So, lets start.

Step 1 - create the recursive view

CREATE RECURSIVE VIEW boss_view (employee, seller) AS
(
     SELECT           main.employee, main.employee as Seller
     FROM
         boss AS main
UNION ALL
     SELECT
          indirect.employee, direct.seller
     FROM           boss_view  AS direct,  boss AS indirect
     WHERE
          direct.employee = indirect.boss
     AND
          indirect.employee <> indirect.boss
);
But, attention: to use the recursive view, you must to have a breakpoint, otherwise your view will perform an infinity loop. In this case, the loop stops when boss.boss = boss.employee.

Step 2 - Mapping the tables into Tableau
Connect to your database and join your view (boss_view) and your sales table, using employee.
join.png

Now you can use Tableau to your analysis, such as:

+  Who is my best seller?
Desired output.jpg
+  Comparing F to others Employee by Seller Boss (A and B):
Analysing a Employee.jpg
Employee "F" appears in Seller A and B, because F belongs to B, and B belongs to A.
Also, the Seller includes it self. You can note that for the Seller B, it also shows B as employee.


Visit the follow links for more details about recursive views:
Hierarchical Queries with DB2 Connect By

Nenhum comentário:

Postar um comentário