SumDistinct Within a Group in SSRS - tsql

I have an SSRS report displaying data on a per-day basis in a matrix. I am using the left side (up-to-down) to display the total of all entities grouped by day. I am using the top side (left-to-right) to display a break-down of the different types of entries that are summed up in that row.
eg:
dataset:
day typ cnt amount exp
Mon 1 3 001000 400
Tue 1 4 000200 400
Wed 1 0 000000 400
Thu 1 1 000020 400
Fri 1 5 002100 400
Mon 2 2 001000 200
Tue 2 0 000000 200
Wed 2 2 005000 200
Thu 2 0 000000 200
Fri 2 20 250000 200
Output:
|| day cnt amount exp || typ cnt amount typ cnt amount
|| Mon 5 002000 600 || 1 3 001000 2 2 001000
up Tue 4 000200 600 up 1 4 000200 2 0 000000
dwn Wed 2 005000 600 dwn 1 0 000000 2 2 005000
|| Thu 1 000020 600 || 1 1 000020 2 0 000000
|| Fri 25 252100 600 || 1 5 002100 2 20 250000
The caveat is that I want to essentially sum-by-distinct-type the exp column (expected amount).
Normally I would sum/group everything in my query, but one requirement of this report is to display each individual entry on a Detail page (in addition to the output I described above), and the query is already prohibitively heavy.
Hopefully my formatted output is not too hard to decipher; the left side (surrounded by the || || up dwn || || markers) is grouped on day: sum(cnt), sum(amount), sumDistinct(exp). The right side is the "matrix", grouped on typ. The sumDistinct(exp) (distinct by the typ column) is the part I am having trouble with.

If I have something like this, I define a "Subreport" at the end of the first report, separated by a page break. I'm not sure which program you are using to build your reports, but it should allow you to create a report with all the details, save it, and add it to your main report as a sub-report.

Is there a simple way in PySpark to find out the number of promotions it took to convert someone into a customer?

I have a date-level promotion data frame that looks something like this:
ID  Date    Promotions  Converted to customer
1   2-Jan    2          0
1   10-Jan   3          1
1   14-Jan   3          0
2   10-Jan  19          1
2   10-Jan   8          0
2   10-Jan  12          0
Now I want to see how many promotions it took to convert someone into a customer.
For example, it took (2 + 3) = 5 promotions to convert ID 1 into a customer and 19 to convert ID 2.
E.g.:
ID  Promotions
1    5
2   19
I am unable to think of a way to solve it. Can you please help me?
Corralien and mozway have helped with the solution in Python (pandas), but I am unable to implement it in PySpark, which I need because of the huge DataFrame size (>1 TB).
You can use:
prom = (df.groupby('ID')['Promotions'].cumsum()
          .where(df['Converted to customer'].eq(1))
          .dropna().astype(int))
out = df.loc[prom.index, ['ID', 'Date']].assign(Promotion=prom)
print(out)
# Output
   ID    Date  Promotion
1   1  10-Jan          5
3   2  10-Jan         19
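The question specifically asks for PySpark, since the real DataFrame (>1 TB) won't fit in pandas. The same cumulative-sum idea can be sketched with a Spark window function; this is only a sketch, assuming df is a Spark DataFrame with the question's columns and that the Date column (or a substitute) orders the rows correctly within each ID (the sample strings like 2-Jan do not sort chronologically as plain text):

from pyspark.sql import functions as F, Window

# running total of Promotions within each ID, in row order
w = (Window.partitionBy("ID")
           .orderBy("Date")
           .rowsBetween(Window.unboundedPreceding, Window.currentRow))

out = (df.withColumn("Promotion", F.sum("Promotions").over(w))
         .where(F.col("Converted to customer") == 1)
         .select("ID", "Date", "Promotion"))
out.show()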
Use one groupby to generate a mask to hide the rows, then one groupby.sum for the sum:
mask = (df.groupby('ID', group_keys=False)['Converted to customer']
          .apply(lambda s: s.eq(1).shift(fill_value=False).cummax())
       )
out = df[~mask].groupby('ID')['Promotions'].sum()
Output:
ID
1     5
2    19
Name: Promotions, dtype: int64
Alternative output:
df[~mask].groupby('ID', as_index=False).agg(**{'Number': ('Promotions', 'sum')})
Output:
   ID  Number
0   1       5
1   2      19
If you potentially have groups without a conversion to customer, you might want to also aggregate the "Converted to customer" column as an indicator:
mask = (df.groupby('ID', group_keys=False)['Converted to customer']
          .apply(lambda s: s.eq(1).shift(fill_value=False).cummax())
       )
out = (df[~mask]
       .groupby('ID', as_index=False)
       .agg(**{'Number': ('Promotions', 'sum'),
               'Converted': ('Converted to customer', 'max')})
      )
Output:
   ID  Number  Converted
0   1       5          1
1   2      19          1
2   3      39          0
Alternative input:
ID Date Promotions Converted to customer
0 1 2-Jan 2 0
1 1 10-Jan 3 1
2 1 14-Jan 3 0
3 2 10-Jan 19 1
4 2 10-Jan 8 0
5 2 10-Jan 12 0
6 3 10-Jan 19 0 # this group has
7 3 10-Jan 8 0 # no conversion
8 3 10-Jan 12 0 # to customer
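Since the question ultimately needs this in PySpark for a >1 TB DataFrame, the mask idea above translates fairly directly to window functions. Again just a sketch, assuming df is a Spark DataFrame with the question's columns and an ordering column that sorts correctly within each ID:

from pyspark.sql import functions as F, Window

w = Window.partitionBy("ID").orderBy("Date")
cum = w.rowsBetween(Window.unboundedPreceding, Window.currentRow)

out = (
    df
    # shift(fill_value=False): did the previous row already convert?
    .withColumn("prev_conv", F.lag("Converted to customer", 1, 0).over(w))
    # cummax(): flag every row that comes after the first conversion
    .withColumn("after_conv", F.max("prev_conv").over(cum))
    # ~mask: keep rows up to and including the first conversion
    .where(F.col("after_conv") == 0)
    .groupBy("ID")
    .agg(F.sum("Promotions").alias("Number"),
         F.max("Converted to customer").alias("Converted"))
)
out.show()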
You want to compute something per ID, so a groupby on ID seems appropriate, e.g.
data.groupby("ID").apply(agg_fct)
Now write a separate function agg_fct which computes the result for a dataframe consisting of only one ID.
Assuming the data are ordered by Date, I guess that
def agg_fct(df):
    # position of the first conversion row for this ID
    index_of_conv = df["Converted to customer"].argmax()
    # sum the promotions up to and including the conversion row
    return df.iloc[0:index_of_conv + 1, df.columns.get_loc("Promotions")].sum()
would be fine. You might want to make some adjustments for the case of a customer who has never been converted.
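For reference, wiring this up on the question's sample rows gives the expected 5 and 19 (a quick check, assuming the sample is loaded into a pandas DataFrame called data in chronological order):

out = data.groupby("ID").apply(agg_fct)
print(out)
# ID
# 1     5
# 2    19
# dtype: int64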

Indexing a Structure in MATLAB

I was under the impression that structures in MATLAB were similar to query tables in SQL, but I have a feeling I might be wrong.
I have a rather large dataset consisting of many entries and many fields. Ideally I want to index the structure, pulling out only the data I am interested in. Here is an example of the dataset:
Cond Type Stime ETime
2 10 1 900
2 10 1 900
2 10 1 900
3 1 901 1800
3 1 901 1800
4 1 1801 2700
8 1 901 1800
8 1 901 1800
9 1 901 1800
9 1 901 1800
12 1 901 1800
12 1 901 1800
13 10 1 900
13 10 1 900
13 10 1 900
16 1 901 1800
16 1 901 1800
17 10 1 900
17 10 1 900
17 10 1 900
19 10 1 900
19 10 1 900
19 10 1 900
20 10 1 900
20 10 1 900
20 10 1 900
22 1 901 1800
22 1 901 1800
25 10 1 900
25 10 1 900
25 10 1 900
27 1 901 1800
27 1 901 1800
28 1 901 1800
28 1 901 1800
30 1 1801 2700
31 1 901 1800
31 1 901 1800
32 10 1 900
32 10 1 900
32 10 1 900
35 10 1 900
35 10 1 900
35 10 1 900
What I want to do is pull specific data entries for analysis; for example, all entries where Type is equal to 10, or all Cond from 1:20 that have ETime == 900.
I can do this with the following:
idx = find([stats.Type] == 10);
[stats(idx).Stime]
but for multiple types I need a for loop as trying to use a vector throws an error.
idx = find([stats.Type] == 1:10); % Does not work
% must use this
temp = [];
for aa = 1:10
    idx = find([stats.Type] == aa);
    temp = horzcat(idx, temp);
end
[stats(temp).Stime]
Is this the wrong way to use structures? Is there an easier method to index a structure to pull data of interest?
This answer proposes using table indexing instead of struct indexing, which is a bit of a side-step to directly answering the question. However, my comments on this post were deemed useful, so I've formalised them as an answer.
If you use struct2table then you can interact with it as a table, which is generally much more intuitive.
Structures are useful if your fields have different numbers of elements (i.e. you couldn't form a consistent height table). In almost all other areas, I find tables are easier to use.
With tables you can use:
Logical indexing
Sorting (including sortrows by column name)
The family of "join" operations
Dot notation for accessing table columns by name, as you do for accessing struct fields, or selecting multiple columns by name using myTable(:, {'col1','col2'}). You don't need weird syntactic tricks like [stats.Type] to group outputs; you can just do stats.Type.
I would then use ismember to compare multiple items against a table column...
idx = ismember( stats.Type, 1:10 );
Unless you need the indices, you can skip using find for speed, and just directly index using idx.

Find exact word in order in SphinxQL

I've got an indexed column that contains some numbers (ids), and I need to extract rows that match exactly some numbers in a given order.
For example
"Give me rows that contains 1 followed by 1 followed by 25 followed by 30"
1 1 1 2 2 25 25 26 30 31 => is valid
1 1 1 2 2 25 25 26 31 32 => is not valid
1 1 1 2 2 2 2 2 25 30 30 => is valid
I'm trying with 1 >> 1 >> 2 >> 2 but it does not work (I think because it matches "1" as a single character and not as a "word").
The strict order operator is <<, so
1 << 1 << 25 << 30
should work.
Matching part-words/single characters (as opposed to whole words) would only work if you specifically enabled it, e.g. with min_infix_len=1, and would probably only match if you have expand_keywords=1 (unless Sphinx is old enough to have enable_star=0).

Reporting in Crystal Reports with Formulas

I want to create a formula for a table that looks like this, but I don't know what formulas to use to scan these fields.
ID UDOfficer DATE
1 6 Jan
1 7 Jan
1 9 Jan
2 6 June
3 6 April
4 6 May
5 5 Dec
6 7 Nov
7 6 April
7 4 April
What I want to create:
A formula to put in a crosstab's column to capture UDOfficer = 6 and all others, but if an ID has UDOfficer values in 6, 7 and 9, the "all others" bucket must not count an ID that was already counted for UDOfficer 6.
Output crosstab:
DATE UDOFF6 UDOFFOTHER
JAN 1 0
APR 2 0
MAY 1 0
JUN 1 0
NOV 0 1
DEC 0 1
You can use grouping as well as if-else to make it work as per your requirement.
First create a group on Date.
Then create two formulas and place them in the Details section to count the occurrences.
Create a formula @UDOFF6 and write the code below:
if UDOfficer = 6
then 1
else 0
Create another formula @UDOFFOTHER and write the code below:
if UDOfficer <> 6
then 1
else 0
Take the sum of both formulas in the Date group footer and your problem will be solved.

joining three tables each with an incomplete set of each others' keys

I have three tables (actually temp tables, each the result of other queries) with very similar data sets; I need to "condense" them, for lack of a better term, and my limited SQL knowledge is stopping me.
For example, we have Budgets by Code, Estimates by Code, and Actuals by Code. Not all possible values for Code exist in any of the three, nor even in another accessible table.
Budgets
1 $13
2 $22
4 $44
7 $71
Estimates
1 $14
4 $49
5 $55
Actuals
2 $21
3 $33
5 $57
7 $70
What I want:
Code Bgt Est Act
1 13 14 0
2 22 0 21
3 0 0 33
4 44 49 0
5 0 55 57
7 71 0 70
(I don't have to have 0 when there's no value, that's just for illustrative purposes.)
I just have no idea how to approach this - any help appreciated!
Try using a FULL OUTER JOIN. In your case the query will look like this:
SELECT ISNULL(Bgt.Code, ISNULL(Est.Code, Act.Code)) AS Code,
       ISNULL(Bgt.Budget, 0)   AS Bgt,
       ISNULL(Est.Estimate, 0) AS Est,
       ISNULL(Act.Actual, 0)   AS Act
FROM Budgets Bgt
FULL OUTER JOIN Estimates Est
    ON Est.Code = Bgt.Code
-- join on whichever Code has been seen so far; otherwise codes missing
-- from Budgets (e.g. 5) would never match their Actuals row
FULL OUTER JOIN Actuals Act
    ON Act.Code = ISNULL(Bgt.Code, Est.Code)