Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
174 views
in Technique[技术] by (71.8m points)

Represent on data studio entries duplicated by joins

I am working on a project building the ETL process and dashboard to control some KPI metrics. I have created a table in BigQuery where, once a month, I save some key values calculated by aggregating data extracted from other table. I am measuring emails sent by employees, so in order to calculate one of those key values I need to read from two different tables and perform a left join to match, from each of the company's working areas existing in the aggregation (left table), how many employees that area has (right join).

This is a simplification of my tables:

Sent emails, grouped by area

|  Area Id  |  Service  |  Bad employees  | ...
|     1     |   Gmail   |      3416       | ...
|     2     |   Gmail   |     10782       | ...
|     2     |   Groups  |      9267       | ...

Total number of employees, grouped by area

|  Area Id  |  Total employees  | ...
|     1     |       34124       | ...
|     2     |       82561       | ...
|     3     |       49472       | ...

The problem comes here: as you can see, the first table (sent emails) has a field which does not appear on the second one; I am talking about Service. For this reason, when I join both tables I will get duplicated values for the Total employees field:

Joined table

|  Area Id  |  Service  |  Bad employees  |  Total employees  |
|     1     |   Gmail   |      3416       |       34124       |
|     2     |   Gmail   |     10782       |       82561       |
|     2     |   Groups  |      9267       |       82561       |

This final table will be used to create a report in Data Studio. I want to keep the Service field in my final table as I want to give the users the option of filtering by it. I can not edit the employees table schema and add a Service field to its entries because that information is unique from the emails table, it represents the service from which the email was sent and has nothing to do with the employees table.

I am struggling to get a valid data modeling option for this problem; if go with this solution and I want to represent on Data Studio, let's say, the Total number of employees per selected areas, I will get the wrong value for those areas containing multiple services:

  • Total employees area 1: 34.124
  • Total employees area 2: 82.561 + 82.561 = 165.122
  • Total employees: 34.124 + 165.122 = 199.246
  • Expected value: 34.124 + 82.561 = 116.685

This will affect any metric using the total employees value.

How can I keep the Service field of my joined table and still represent on Data Studio the correct value for Total employees?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
Waitting for answers

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...