Grouping By with missing data - email

[Image: sample data and desired result]
I'm trying to aggregate volunteer hours from a Google spreadsheet for a non-profit I volunteer for. We collect each volunteer's e-mail address and the time they have contributed, but each volunteer only enters their e-mail the first time. I've found examples online of how to send e-mails, but I'm having trouble aggregating the data. I think the problem might be that not every row has an e-mail address associated with it.
I've been able to get the sum of hours worked by each volunteer using QUERY(data, "select A, sum(C) Group By A", ), but I can't figure out how to get the e-mail associated with each individual.

Thanks for the advice! The VLOOKUP and ArrayFormula functions were new to me. Here's how I solved it:
QUERY(data, "select A, B where B <>'' ", -1)
This allowed me to get the Key-Value pair (Name, Email) for each volunteer (solving the problem of people who volunteered multiple times, but only left their e-mail once). From there, I was able to generate the 'Name:Hours Worked' table off to the right with:
QUERY(data, "select A, sum(C) Group By A", ).
Then, I used VLOOKUP to query my Name-Email table to get the desired result of:
Name-Email-aggregatedHours
Thanks!
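For anyone who wants to check the logic outside of Sheets, here is a minimal Python sketch of the same three steps: keep the non-empty e-mail for each name, sum hours per name, then look up the e-mail for each total. The sample rows and the function name hours_with_email are hypothetical, and it assumes each volunteer always uses the same spelling of their name.

```python
from collections import defaultdict

# Hypothetical sample rows: (name, email, hours); email is "" after the first visit.
rows = [
    ("Ann", "ann@example.org", 2.0),
    ("Bob", "bob@example.org", 1.5),
    ("Ann", "", 3.0),
    ("Bob", "", 0.5),
]

def hours_with_email(rows):
    """Return {name: (email, total_hours)}, mirroring the two QUERYs plus VLOOKUP."""
    # Step 1: like QUERY(data, "select A, B where B <> ''"): one e-mail per name.
    email_by_name = {}
    for name, email, _ in rows:
        if email and name not in email_by_name:
            email_by_name[name] = email
    # Step 2: like QUERY(data, "select A, sum(C) Group By A"): total hours per name.
    hours_by_name = defaultdict(float)
    for name, _, hours in rows:
        hours_by_name[name] += hours
    # Step 3: like VLOOKUP into the Name-Email table; "" if no e-mail was ever given.
    return {name: (email_by_name.get(name, ""), total)
            for name, total in hours_by_name.items()}

print(hours_with_email(rows))
```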

You can't achieve this with QUERY alone, but you could apply VLOOKUP to a sorted table:
=ArrayFormula(VLOOKUP(UNIQUE(FILTER(A2:A,A2:A<>"")),SORT(A2:B,2,0),2,0))
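to get the e-mail for each unique name. Sorting column B in descending order (the 0 in SORT(A2:B,2,0)) puts the rows that have an e-mail ahead of the blank ones, so VLOOKUP's first exact match returns the e-mail.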
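A rough Python sketch of why the sort matters: VLOOKUP with an exact match returns the first matching row, so after sorting by e-mail in descending order the row carrying an e-mail is found before the blank ones. The sample rows and first_match_emails are hypothetical, and Python's string ordering only approximates how Sheets sorts.

```python
# Hypothetical rows: (name, email); repeat visits have a blank e-mail.
rows = [("Ann", "ann@example.org"), ("Ann", ""), ("Bob", ""), ("Bob", "bob@example.org")]

def first_match_emails(rows):
    """Emulate VLOOKUP(UNIQUE(names), SORT(A2:B, 2, 0), 2, 0)."""
    # "" sorts before any non-empty string, so descending order puts e-mails first.
    sorted_rows = sorted(rows, key=lambda r: r[1], reverse=True)
    # UNIQUE(FILTER(A2:A, A2:A<>"")): distinct non-empty names, first-seen order.
    unique_names = list(dict.fromkeys(name for name, _ in rows if name))
    # Exact-match VLOOKUP returns the first row whose name matches.
    return {name: next(email for n, email in sorted_rows if n == name)
            for name in unique_names}

print(first_match_emails(rows))
```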

First, clean up your data. Make sure that at least one column has no typos and correctly identifies which rows belong to each volunteer. This is called a key. One way to get there (but not the only one) is to fill in the missing values on every row. If that would be hard, then:
Create a volunteer list without missing data.
Calculate the time contributed by each volunteer. If you were able to fill in the missing values, you can use QUERY; in that case the QUERY formula should group by both name and e-mail. If not, use SUMIF.
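A minimal Python sketch of the two routes in this answer, assuming rows of (name, email, hours) and consistently spelled names: either fill each blank e-mail from the same name's known e-mail and then group by name and e-mail, or skip the filling and sum per name the way SUMIF does. The functions and data are illustrative only, not Sheets APIs.

```python
from collections import defaultdict

# Hypothetical rows: (name, email, hours).
rows = [("Ann", "ann@example.org", 2), ("Ann", "", 3), ("Bob", "bob@example.org", 1)]

def fill_missing_emails(rows):
    """Fill each blank e-mail with the e-mail the same name used on another row."""
    known = {name: email for name, email, _ in rows if email}
    return [(name, email or known.get(name, ""), hours) for name, email, hours in rows]

def group_by_name_and_email(rows):
    """Like QUERY(data, "select A, B, sum(C) group by A, B") on the filled data."""
    totals = defaultdict(int)
    for name, email, hours in rows:
        totals[(name, email)] += hours
    return dict(totals)

def sumif(rows, volunteer):
    """Like SUMIF(A:A, volunteer, C:C); works even if e-mails were never filled in."""
    return sum(hours for name, _, hours in rows if name == volunteer)

print(group_by_name_and_email(fill_missing_emails(rows)))
print(sumif(rows, "Ann"))
```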

Related

Power BI: Filter sales table with multiple locations on respective start dates from different table

I've tried to find a similar thread on this, but have not been able to. I'm pretty new to Power BI, so I might not know what I'm looking for. I could really use some advice.
I have a sales table ('SalesTable') that contains all the sales from different store locations. The table includes all the sales from each store beginning in January 2021, but the stores were incorporated on different dates in 2021, so I need a filter that only returns the sales for each store from the date that store was incorporated.
Simplified, the tables look like this:
[Table image: 'SalesTable']
[Table image: 'Stores']
The two tables are joined on StoreID. SalesTable is also connected to a DAX-created Calendar table. The Stores table is not connected to the Calendar table (maybe it should be?).
I need to be able to filter the report so that it only returns sales dated on or after the respective incorporateddate.
Like this:
[Table image: 'Desired output']
I am not sure what the optimal way to go about this is: whether I should make a calculated table from SalesTable, or whether a measure is enough to filter the report. Any suggestions, tips or solutions would be highly appreciated :)
You can use this measure (here inc is the stores table and IncSale is the sales table; substitute 'Stores' and 'SalesTable'):
sumIncorp =
VAR __maxIncorp =
    CALCULATE(
        MAX(inc[IncorporatedDate]),
        FILTER(inc, inc[StoreID] = SELECTEDVALUE(IncSale[StoreID]))
    )
RETURN
    CALCULATE(
        SUM(IncSale[Amount]),
        FILTER(IncSale, IncSale[Date] >= __maxIncorp)
    )
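To make the filtering rule concrete, here is a small Python sketch (not DAX) of what the measure is meant to compute: a sales row counts only if its date is on or after the incorporation date of its own store. The data and sum_incorporated are hypothetical. One point worth checking in the real model: the measure reads a single store through SELECTEDVALUE, so on a total row that spans several stores it may not apply per-store dates; comparing each row against its own store, as this sketch does, is the behaviour you would want there.

```python
from datetime import date

# Hypothetical data mirroring 'Stores' (StoreID -> IncorporatedDate) and 'SalesTable'.
incorporated = {1: date(2021, 1, 1), 2: date(2021, 6, 1)}
sales = [
    (1, date(2021, 2, 1), 100),
    (2, date(2021, 3, 1), 50),   # before store 2 was incorporated: excluded
    (2, date(2021, 7, 1), 70),
]

def sum_incorporated(sales, incorporated, store_id=None):
    """Sum amounts dated on or after each row's own store incorporation date.

    store_id=None totals over all stores; otherwise only that store is summed.
    """
    return sum(amount for sid, day, amount in sales
               if (store_id is None or sid == store_id) and day >= incorporated[sid])

print(sum_incorporated(sales, incorporated))
print(sum_incorporated(sales, incorporated, store_id=2))
```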

Tableau Fixed Calculation Summing too much data

I have volume data for specific customers. The customer names come from Salesforce and the volume comes from another table. When I add both in Tableau, I get a nice table that seems to be working.
We can see that there are 19 values (~500). My ultimate goal is to sum these based on filters.
One way I found to do that is to use the syntax
{ FIXED [Account Id]: SUM([Volume]) }
But when I do that, I get:
[Image of the result]
When I change the function to COUNT([Volume]), I get a count of all joined rows (~250k).
My question is: how do I make this respect individual entries in the database rather than all the joined rows? If there were a way to sum over distinct timestamps in another field, that would also work. Any other advice from Tableau experts would be helpful.
Thanks!
I think I got it. In the database table I was trying to calculate from, there were 20 rows that needed to be summed. When the data was joined in SF, the rows were duplicated. The trick here was to take the sum of the max for each primary key:
SUM({ FIXED [Pk], [Name1] : MAX([Volume]) })
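A minimal Python sketch of why summing a per-key MAX fixes the over-counting: the join repeats each source row, so a plain sum counts the same volume several times, while reducing to one value per primary key first removes the duplicates. The rows are made up, and the sketch assumes duplicated rows carry identical Volume values.

```python
# Hypothetical joined rows: (pk, name, volume); the join repeated some rows.
joined = [
    (1, "Acme", 500), (1, "Acme", 500), (1, "Acme", 500),
    (2, "Globex", 480), (2, "Globex", 480),
]

def naive_sum(rows):
    """What SUM([Volume]) sees after the join: every duplicate counts again."""
    return sum(volume for _, _, volume in rows)

def sum_of_max_per_key(rows):
    """Equivalent of SUM({ FIXED [Pk], [Name1] : MAX([Volume]) })."""
    max_by_key = {}
    for pk, name, volume in rows:
        key = (pk, name)
        max_by_key[key] = max(volume, max_by_key.get(key, volume))
    return sum(max_by_key.values())

print(naive_sum(joined), sum_of_max_per_key(joined))
```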

How to get all missing days between two dates

I will try to explain the problem on an abstract level first:
I have X amount of data as input, which always has a DATE field. Previously, the input dates (after some processing) were put in a table as output. Now I am asked to output both the input dates and every date between the minimum date received and one year after it. If there was originally no input for a given day between these two dates, all of its fields must be 0, or equivalent.
For example, I have two inputs, one with '18/03/2017' and the other with '18/03/2018'. I now need to create output data for all the missing dates between '18/03/2017' and '18/04/2017'. So, output '19/03/2017' with every field set to 0, and the same for the 20th, the 21st, and so on.
I know how to do this programmatically, but not in PowerCenter. I've been told to do the following (which I have done, but I would like to know of a better method):
Get the minimum date, day0. Then, with an aggregator, create 365 fields holding day0+1, day0+2, and so on, to create an artificial year.
After that we apply several transformations, such as sorting the dates and a union between them, to get the data ready for a joiner. The joiner does a full outer join between the original data and the all-zero data generated by the previous aggregator.
Then a router sends the rows that had actual dates (fields without nulls) to one group and the rows where all fields are null to another; those null fields are set to 0 and finally written to a table.
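The intended result of the aggregator, joiner and router steps can be sketched in a few lines of Python (an illustration of the logic, not PowerCenter code): build one row per day from the minimum date onward, use the real row where one exists and a zeroed row otherwise. fill_missing_days, the dict-per-date input shape and days=365 are assumptions, and all input records are assumed to share the same fields.

```python
from datetime import date, timedelta

def fill_missing_days(records, days=365):
    """records: {date: {field: value}} (hypothetical shape).

    Returns (date, fields) for day0 .. day0 + days, zero-filled where there was no input.
    """
    day0 = min(records)
    field_names = next(iter(records.values())).keys()
    zero_row = {f: 0 for f in field_names}
    output = []
    for offset in range(days + 1):
        day = day0 + timedelta(days=offset)
        # "Outer join" + "router" in one step: the real row if present, otherwise zeros.
        output.append((day, dict(records.get(day, zero_row))))
    return output

records = {date(2017, 3, 18): {"amount": 5}, date(2017, 3, 20): {"amount": 7}}
for row in fill_missing_days(records, days=3):
    print(row)
```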
I am wondering how this can be achieved, for starters, without having to add 365 days to a date one field at a time. If I had to do the same process for 10 years instead of one, the task would get ridiculous really quickly.
I was wondering about an XOR-type operation, or some other function that would cut the number of steps needed for what I (maybe wrongly) feel is a simple task. Currently I need 5 steps just to find which dates are missing between two dates: a minimum and one year after it.
I have tried to be as clear as possible, but if I failed at any point please let me know!
I'm not sure what the aggregator is supposed to do.
The same goes for the 'full outer' join: a normal join on a constant port is fine :)
Can you calculate the needed number of duplicates before the joiner? In that case, a lookup configured to return all rows, with a less-than-or-equal condition, can make the mapping much more readable.
In any case, you will need a helper table (or file) with a sequence of numbers from 1 to the number of potential duplicates (or more).
I use the time dimension in our warehouse, which has one row per day starting at 1753-01-01 for the next 200,000 days, and an integer primary key column with values from 1 upward.
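The helper-table idea can be illustrated in Python: a lookup that returns all rows with n <= the number of days needed turns one input row into that many rows, each of which becomes day0 + (n - 1). The numbers list stands in for the hypothetical helper table, which must hold at least as many integers as the rows you want.

```python
from datetime import date, timedelta

# Hypothetical helper table: integers 1..N, like the time dimension's integer key.
numbers = list(range(1, 1001))

def expand_with_helper(day0, needed_rows, numbers):
    """Mimic a lookup returning ALL helper rows where n <= needed_rows.

    Each matching n becomes one output date, day0 + (n - 1).
    """
    matches = [n for n in numbers if n <= needed_rows]  # the <= lookup condition
    return [day0 + timedelta(days=n - 1) for n in matches]

dates = expand_with_helper(date(2017, 3, 18), 366, numbers)
print(dates[0], dates[-1], len(dates))
```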
You've said you know how to do this programmatically, and to be fair this problem is better suited to that sort of solution. That doesn't rule out PowerCenter, though: just feed the two dates into a Java transformation and add code that produces every date between them, outputting one record for each. The Java transformation is ideal for record generation.
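What a Java transformation would do here is plain record generation. The same idea as a short Python sketch (generate_dates is a hypothetical name, not a PowerCenter API) yields one record per date between the two input dates, so a 10-year span needs no more steps than a single year.

```python
from datetime import date, timedelta
from typing import Iterator

def generate_dates(start: date, end: date) -> Iterator[date]:
    """Yield every date from start to end inclusive: one output record per date."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)

print(list(generate_dates(date(2017, 3, 18), date(2017, 3, 21))))
```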
Ok... so you could override your source qualifier to achieve this in the selection query itself (I'm giving an Oracle-based example, as it's what I'm used to, and I'm assuming your input comes from a table). I looked up the CONNECT BY syntax here:
SQL to generate a list of numbers from 1 to 100
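-- Oracle: compute the minimum date once, then add 0..364 days to it
-- (Oracle does not accept AS before a table alias, and MIN() cannot be mixed with ungrouped columns)
SELECT m.min_date + l.n - 1 AS Port1
FROM (SELECT MIN(DATEFIELD) AS min_date FROM tablea) m,
     (SELECT LEVEL AS n FROM DUAL CONNECT BY LEVEL <= 365) l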
(Check whether the query works for you; I don't have access to a PC to test it at the minute.)

Tableau - Multiple data into one graph, with double dimension on the x-axis

I'm very new to Tableau, and (maybe because of that) I'm struggling with a graph setting. I need to plot a simple line graph showing the ratio between the number of users who registered x days ago and have returned since, and the total number of users who registered x days ago (whether or not they returned). To do this, I have two tables: TableA with (simplifying) USER_ID and DATE_REGISTRATION, and TableB with USER_ID and VISIT_DATE. Both tables are joined on USER_ID.
I'm able, of course, to plot each one individually (e.g. count distinct of USER_ID with DATE_REGISTRATION on the x-axis to get the number of new users registered per day), but not to combine them. I guess the problem is that I'm using either DATE_REGISTRATION or VISIT_DATE on the x-axis, so I get one piece of information or the other, but not both combined.
Ultimately, I would like to have, for each date, both the number of users visiting and the number of users who registered.
Thanks a lot in advance.
Raffaele
Well, the problem is that your data isn't structured for this kind of analysis. Your table is user_id-oriented, meaning you can do lots of analysis centered on user_id. For date-oriented analysis you need a table like:
Date        User_id  Type of event
01/01/2014  1234     Registration
02/01/2014  1234     Visit
Then you can drag Date and Type of event to Columns, and COUNTD(User_id) to rows, to get a bar chart that will show, for each day, how many people registered that day and how many people visited that same day.
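A minimal Python sketch of the reshaping this answer suggests: stack TableA's registrations and TableB's visits into one (date, user_id, event type) table, then count distinct users per date and event type, which is what COUNTD(User_id) shows on that layout. The sample rows and function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical extracts: TableA (USER_ID, DATE_REGISTRATION), TableB (USER_ID, VISIT_DATE).
table_a = [(1234, date(2014, 1, 1)), (5678, date(2014, 1, 2))]
table_b = [(1234, date(2014, 1, 2)), (1234, date(2014, 1, 2))]  # two visits, same day

def to_events(table_a, table_b):
    """Union both tables into long format: (date, user_id, event_type) rows."""
    events = [(day, uid, "Registration") for uid, day in table_a]
    events += [(day, uid, "Visit") for uid, day in table_b]
    return events

def countd_by_date_and_type(events):
    """COUNTD(User_id) per (Date, Type of event)."""
    users = defaultdict(set)
    for day, user_id, event_type in events:
        users[(day, event_type)].add(user_id)
    return {key: len(ids) for key, ids in users.items()}

print(countd_by_date_and_type(to_events(table_a, table_b)))
```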
Additionally, you can still join this table with the one you have, to have the registration date for each user_id. That way you can, for instance, calculate how many days have passed since registration.
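The join mentioned here can also be sketched in Python: attach each visit to the user's registration date and compute the day offset. return_ratio is only one possible reading of the ratio asked about in the question (users who visited exactly x days after registering, divided by all registered users); that definition and all data are assumptions.

```python
from datetime import date

# Hypothetical data: TableA as {USER_ID: DATE_REGISTRATION}, TableB as (USER_ID, VISIT_DATE).
registrations = {1234: date(2014, 1, 1), 5678: date(2014, 1, 2)}
visits = [(1234, date(2014, 1, 2)), (1234, date(2014, 1, 5))]

def days_since_registration(visits, registrations):
    """Join visits to registrations on user_id; return (user_id, visit_date, days)."""
    return [(user_id, visit_day, (visit_day - registrations[user_id]).days)
            for user_id, visit_day in visits if user_id in registrations]

def return_ratio(visits, registrations, x):
    """Assumed definition: share of registered users with a visit exactly x days later."""
    returned = {u for u, _, d in days_since_registration(visits, registrations) if d == x}
    return len(returned) / len(registrations) if registrations else 0.0

print(days_since_registration(visits, registrations))
print(return_ratio(visits, registrations, 1))
```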

How to retrieve certain entries only

This is what I'm trying to achieve:
On a button click, I want to populate a text field in table A with all records from table B that have a certain value.
Something like:
"Catch all records in B where field XYZ is 99 and put them in this text field."
Thanks for your help! I am using FM13!
Edit:
I have an "orders" table where the order-id is a serial number. I use it to write bills from the exact order to another table.
In certain cases an order is too small for its own bill, so for those I create entries in another table called "smalljobs".
The smalljobs table has a field where I can enter the order-id the job should be billed on, in my case say "5".
Let's say orders "10", "12" and "13" are small jobs. I have one big order, say order "5", that I can bill the small jobs onto.
All I want to do is add a script that populates a certain text field with the entries from "smalljobs" whose billing order-id matches my current order-id ("5"), and gets Title, Costs, etc.
I was wrong from the beginning. I had to relate "smalljobs" to a specific order-id and get the information I wanted via GetNthRecord(fieldName;recordNumber).
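For readers unfamiliar with GetNthRecord, here is a rough Python analogue of the final approach: the relationship selects the smalljobs whose billing order-id matches the current order, and a loop over record numbers 1..n collects Title and Costs into one text value. The tuple layout, sample data and related_jobs_text are hypothetical; in FileMaker itself this would be a script or calculation calling GetNthRecord on the related fields.

```python
# Hypothetical smalljobs rows: (job_order_id, billed_on_order_id, title, costs).
smalljobs = [
    (10, 5, "Replace cable", 40.0),
    (12, 5, "Update driver", 25.0),
    (13, 5, "Clean printer", 15.0),
    (14, 7, "Other job", 99.0),
]

def related_jobs_text(smalljobs, current_order_id):
    """Build one text block from related records, like looping GetNthRecord(field; n)."""
    # The relationship: small jobs billed onto the current order.
    related = [job for job in smalljobs if job[1] == current_order_id]
    lines = []
    for n in range(1, len(related) + 1):  # FileMaker record numbers start at 1
        _, _, title, costs = related[n - 1]
        lines.append(f"{title}: {costs:.2f}")
    return "\n".join(lines)

print(related_jobs_text(smalljobs, 5))
```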