Variable rows and columns in SSRS Matrix - ssrs-2008

(SSRS 2008)
I have a dataset with results looking like this:
FUNCTION | EMP-NMB
------------------
A | 100
A | 101
A | 103
B | 102
I want to display this data in my report in this way:
A | B
------------
100 | 102
101 |
103 |
I managed to display it this way:
A | B
------------
100 |
101 |
103 |
| 102
But that table becomes very large with more data.
The number of employees and functions can vary. For now I am using a Matrix, but I don't know how to configure it to work the way I want.

I think the problem is that you are probably using EMP-NMB as your Row Group grouping.
Since you want the report to display different employees on the same line, you need to group on something else. Unfortunately, there isn't anything in the data you list, but you can add a ROW_NUMBER() to the query:
SELECT [FUNCTION], [EMP-NMB], ROW_NUMBER() OVER(PARTITION BY [FUNCTION] ORDER BY [EMP-NMB]) AS ROW_NUM
FROM ...
Then change the tablix Row Group Group On to use the new ROW_NUM field.
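The effect of the ROW_NUMBER() trick can be sketched outside SSRS. Below is a minimal SQLite demo (table and column names are made up for the demo; in the real report the Matrix column group does the pivot, which is emulated here with CASE expressions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emps (func TEXT, emp INTEGER);
INSERT INTO emps VALUES ('A',100),('A',101),('A',103),('B',102);
""")

# ROW_NUMBER() restarts at 1 for each function, so employees of different
# functions that share a row number land on the same matrix row.
rows = conn.execute("""
    SELECT row_num,
           MAX(CASE WHEN func = 'A' THEN emp END) AS a,
           MAX(CASE WHEN func = 'B' THEN emp END) AS b
    FROM (SELECT func, emp,
                 ROW_NUMBER() OVER (PARTITION BY func ORDER BY emp) AS row_num
          FROM emps)
    GROUP BY row_num
    ORDER BY row_num
""").fetchall()
print(rows)  # one line per row number, functions side by side
```

Grouping the tablix rows on ROW_NUM instead of EMP-NMB produces exactly this compact layout.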

Related

Recursive CTE PostgreSQL Connecting Multiple IDs with Additional Logic for Other Fields

Within my PostgreSQL database, I have an id column that shows each unique lead that comes in. I also have a connected_lead_id column which shows whether accounts are related to each other (i.e. husband and wife, parents and children, a group of friends, a group of investors, etc.).
When we count the number of ids created during a time period, we want to see the number of unique "groups" of connected_ids during a period. In other words, we wouldn't want to count both the husband and wife pair, we would only want to count one since they are truly one lead.
We want to be able to create a view that only has the "first" id based on the "created_at" date and then contains additional columns at the end for "connected_lead_id_1", "connected_lead_id_2", "connected_lead_id_3", etc.
We want to add in additional logic so that we take the "first" id's source, unless that is null, then take the "second" connected_lead_id's source unless that is null and so on. Finally, we want to take the earliest on_boarded_date from the connected_lead_id group.
id  | created_at    | connected_lead_id | on_boarded_date | source
2   | 9/24/15 23:00 | 8                 |                 |
4   | 9/25/15 23:00 | 7                 |                 | event
7   | 9/26/15 23:00 | 4                 |                 |
8   | 9/26/15 23:00 | 2                 |                 | referral
11  | 9/26/15 23:00 | 336               | 7/1/17          | online
142 | 4/27/16 23:00 | 336               |                 |
336 | 7/4/16 23:00  | 11                | 9/20/18         | referral
End Goal:
id | created_at    | on_boarded_date | source
2  | 9/24/15 23:00 |                 | referral
4  | 9/25/15 23:00 |                 | event
11 | 9/26/15 23:00 | 7/1/17          | online
Ideally, we would also have a variable number of extra columns at the end to show each connected_lead_id that is attached to the base id.
Thanks for the help!
Ok the best I can come up with at the moment is to first build maximal groups of related IDs, and then join back to your table of leads to get the rest of the data (See this SQL Fiddle for the setup, full queries and results).
To get the maximal groups you can use a recursive common table expression to first grow the groups, followed by a query to filter the CTE results down to just the maximal groups:
with recursive cte(grp) as (
  select case when l.connected_lead_id is null then array[l.id]
              else array[l.id, l.connected_lead_id]
         end
  from leads l
  union all
  select grp || l.id
  from leads l
  join cte
    on l.connected_lead_id = any(grp)
   and not l.id = any(grp)
)
select * from cte c1
The CTE above outputs several similar groups as well as intermediary groups. The query predicate below prunes out the non-maximal groups and limits the results to just one permutation of each possible group:
where not exists (select 1 from cte c2
                  where c1.grp && c2.grp
                    and ((not c1.grp @> c2.grp)
                         or (c2.grp < c1.grp
                             and c1.grp @> c2.grp
                             and c1.grp <@ c2.grp)));
Results:
| grp |
|------------|
| 2,8 |
| 4,7 |
| 14 |
| 11,336,142 |
| 12,13 |
Next join the final query above back to your leads table and use window functions to get the remaining column values, along with the distinct operator to prune it down to the final result set:
with recursive cte(grp) as (
  ...
)
select distinct
       first_value(l.id) over (partition by grp order by l.created_at) id
     , first_value(l.created_at) over (partition by grp order by l.created_at) create_at
     , first_value(l.on_boarded_date) over (partition by grp order by l.created_at) on_boarded_date
     , first_value(l.source) over (partition by grp
                                   order by case when l.source is null then 2 else 1 end
                                          , l.created_at) source
     , grp CONNECTED_IDS
from cte c1
join leads l
  on l.id = any(grp)
where not exists (select 1 from cte c2
                  where c1.grp && c2.grp
                    and ((not c1.grp @> c2.grp)
                         or (c2.grp < c1.grp
                             and c1.grp @> c2.grp
                             and c1.grp <@ c2.grp)));
Results:
| id | create_at | on_boarded_date | source | connected_ids |
|----|----------------------|-----------------|----------|---------------|
| 2 | 2015-09-24T23:00:00Z | (null) | referral | 2,8 |
| 4 | 2015-09-25T23:00:00Z | (null) | event | 4,7 |
| 11 | 2015-09-26T23:00:00Z | 2017-07-01 | online | 11,336,142 |
| 12 | 2015-09-26T23:00:00Z | 2017-07-01 | event | 12,13 |
| 14 | 2015-09-26T23:00:00Z | (null) | (null) | 14 |
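For small data sets, the maximal groups can also be double-checked outside the database. This is a hypothetical Python union-find sketch over the sample leads from the question (the dictionary of connections is copied from the data above; names like `find`/`union` are just for the demo):

```python
# Connections copied from the question: id -> connected_lead_id
leads = {2: 8, 4: 7, 7: 4, 8: 2, 11: 336, 142: 336, 336: 11}

parent = {}

def find(x):
    """Return the group representative of x (union-find with path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# Unite each lead with the lead it is connected to.
for lead, connected in leads.items():
    union(lead, connected)

# Collect the connected components.
groups = {}
for lead in leads:
    groups.setdefault(find(lead), set()).add(lead)

print(sorted(map(sorted, groups.values())))
# -> [[2, 8], [4, 7], [11, 142, 336]]
```

The connected components are exactly the grp arrays the recursive CTE produces, which makes this a handy cross-check when tuning the pruning predicate.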
demo:db<>fiddle
Main idea - sketch:
Loop through the ordered set. Get all ids that haven't been seen before in any connected_lead_id (cli). These are your starting points for the recursion.
The problem is your number 142, which hasn't been seen before but is in the same group as 11 because of its cli. So it would be better to get the clis of the unseen ids. With these values it's much simpler to calculate the ids of the groups later in the recursion part. Because of the loop, a function/stored procedure is necessary.
The recursion part: The first step is to get the ids of the starting clis, calculating the first referring id by using the created_at timestamp. After that a simple tree recursion over the clis can be done.
1. The function:
CREATE OR REPLACE FUNCTION filter_groups() RETURNS int[] AS $$
DECLARE
    _seen_values int[];
    _new_values int[];
    _temprow record;
BEGIN
    FOR _temprow IN
        -- 1:
        SELECT array_agg(id ORDER BY created_at) as ids, connected_lead_id
        FROM groups
        GROUP BY connected_lead_id
        ORDER BY MIN(created_at)
    LOOP
        -- 2:
        IF array_length(_seen_values, 1) IS NULL
           OR (_temprow.ids || _temprow.connected_lead_id) && _seen_values = FALSE THEN
            _new_values := _new_values || _temprow.connected_lead_id;
        END IF;
        _seen_values := _seen_values || _temprow.ids;
        _seen_values := _seen_values || _temprow.connected_lead_id;
    END LOOP;
    RETURN _new_values;
END;
$$ LANGUAGE plpgsql;
1. Group all ids that refer to the same cli.
2. Loop through the id arrays. If no element of the array has been seen before, add the referred cli to the output variable (_new_values). In both cases add the ids and the cli to the variable that stores all ids seen so far (_seen_values).
3. Return the clis.
The result so far is {8, 7, 336} (which is equivalent to the ids {2,4,11,142}!)
2. The recursion:
-- 1:
WITH RECURSIVE start_points AS (
    SELECT unnest(filter_groups()) as ids
),
filtered_groups AS (
    -- 3:
    SELECT DISTINCT
        1 as depth,                                     -- 3
        first_value(id) OVER w as id,                   -- 4
        ARRAY[(MIN(id) OVER w)] as visited,             -- 5
        MIN(created_at) OVER w as created_at,
        connected_lead_id,
        MIN(on_boarded_date) OVER w as on_boarded_date, -- 6
        first_value(source) OVER w as source
    FROM groups
    WHERE connected_lead_id IN (SELECT ids FROM start_points)
    -- 2:
    WINDOW w AS (PARTITION BY connected_lead_id ORDER BY created_at)

    UNION

    SELECT
        fg.depth + 1,
        fg.id,
        array_append(fg.visited, g.id),                 -- 8
        LEAST(fg.created_at, g.created_at),
        g.connected_lead_id,
        LEAST(fg.on_boarded_date, g.on_boarded_date),   -- 9
        COALESCE(fg.source, g.source)                   -- 10
    FROM groups g
    JOIN filtered_groups fg
        -- 7
        ON fg.connected_lead_id = g.id AND NOT (g.id = ANY(visited))
)
SELECT DISTINCT ON (id)                                 -- 11
    id, created_at, on_boarded_date, source
FROM filtered_groups
ORDER BY id, depth DESC;
1. The WITH part gives out the results from the function. unnest() expands the id array into one row per id.
2. Creating a window: the window function groups all values by their clis and orders the window by the created_at timestamp. In your example all values are in their own window except 11 and 142, which are grouped together.
3. This is a helper variable to get the latest rows later on.
4. first_value() gives the first value of the ordered window frame. Assuming 142 had a smaller created_at timestamp, the result would have been 142; but it's 11 nevertheless.
5. A variable is needed to record which ids have already been visited. Without this information an infinite loop would be created: 2-8-2-8-2-8-...
6. The minimum date of the window is taken (same thing here: if 142 had a smaller date than 11, that would be the result).
Now the starting query of the recursion is calculated. The following describes the recursion part:
7. Join the table (the original function results) against the previous recursion result. The second condition stops the infinite loop mentioned above.
8. Append the currently visited id to the visited variable.
9. If the current on_boarded_date is earlier, it is taken.
10. COALESCE gives the first NOT NULL value, so the first NOT NULL source is saved throughout the whole recursion.
11. After the recursion, which returns all recursion steps, we only want the deepest visit for every starting id. DISTINCT ON (id) gives out the row with the first occurrence of an id; to get the last one, the whole set is ordered descending by the depth variable.
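DISTINCT ON is Postgres-specific; the same "deepest row per id" selection can be emulated portably with ROW_NUMBER(). A small SQLite sketch with made-up sample steps (ids, depths, and sources are illustrative only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE steps (id INTEGER, depth INTEGER, source TEXT);
INSERT INTO steps VALUES (2,1,NULL),(2,2,'referral'),(4,1,NULL),(4,2,'event');
""")

# Number the rows per id from deepest to shallowest and keep rank 1 --
# equivalent to DISTINCT ON (id) ... ORDER BY id, depth DESC.
rows = conn.execute("""
    SELECT id, source
    FROM (SELECT id, source,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY depth DESC) AS rn
          FROM steps)
    WHERE rn = 1
    ORDER BY id
""").fetchall()
print(rows)
```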

Efficiently selecting from a large table using floor() in Postgres

I have two tables: One with squares with columns x and y over the natural numbers, and another with points on this grid created by the first table. Example schema:
Grid Table
id | x | y
------------
123 | 1 | 1
234 | 1 | 2
345 | 2 | 1
456 | 2 | 2
Then, the points table:
id | x | y
----------------
12 | 1.23 | 1.23
23 | 2.89 | 1.55
Currently, using this query:
SELECT g.* FROM grid as g, points as p
WHERE p.id=23 AND floor(p.x)=g.x AND floor(p.y)=g.y;
I get the expected result, which is the grid square in which the point with id 23 resides (grid with id 345); However, when the table grid has 10,000,000 rows (the current situation I'm in), this query is incredibly slow, i.e. on the order of a few seconds.
I've found a workaround for this, but it's ugly:
SELECT g.* FROM grid as g, points as p
WHERE p.id=23 AND (p.x-.5)::integer=g.x AND (p.y-.5)::integer=g.y;
I get the expected result again, and in 11ms, but this feels hacky. Are there cleaner ways to do this? Any help is appreciated!
You can use a CTE, as it is evaluated once only.
WITH p2 AS (
    SELECT floor(p.x) x,
           floor(p.y) y
    FROM points p
    WHERE p.id = 23
)
SELECT g.*
FROM grid g
INNER JOIN p2
    ON p2.x = g.x AND p2.y = g.y
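A quick way to sanity-check the CTE approach is SQLite, using the sample rows from the question (CAST stands in for floor(), which is equivalent for the non-negative grid coordinates here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grid (id INTEGER, x INTEGER, y INTEGER);
INSERT INTO grid VALUES (123,1,1),(234,1,2),(345,2,1),(456,2,2);
CREATE TABLE points (id INTEGER, x REAL, y REAL);
INSERT INTO points VALUES (12,1.23,1.23),(23,2.89,1.55);
""")

# CAST truncates toward zero, which matches floor() for positive values.
rows = conn.execute("""
    WITH p2 AS (SELECT CAST(x AS INTEGER) AS x,
                       CAST(y AS INTEGER) AS y
                FROM points
                WHERE id = 23)
    SELECT g.* FROM grid g JOIN p2 ON p2.x = g.x AND p2.y = g.y
""").fetchall()
print(rows)  # the grid square containing point 23
```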

How to divide two values from the same column but at different rows

I have a table like this:
postcode | value | uns
AA | 10 | 51
AB | 20 | 78
AA | 20 | 78
AB | 50 | 51
and I want to get a result like:
AA | 0.5
AB | 2.5
where the new values are the division for the same postcode between the value with uns = 51 and the value with uns = 78.
How can I do that with Postgres? I already checked window functions and partitions but I am not sure how to do it.
If (postcode, uns) is unique, all you need is a self-join:
select postcode, uns51.value / nullif(uns78.value, 0)
from t uns51
join t uns78 using (postcode)
where uns51.uns = 51
and uns78.uns = 78
If the rows with either t.uns = 51 or t.uns = 78 may be missing, you could use a full join instead (with possibly coalesce() to provide default values for missing rows).
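The self-join can be tried almost verbatim on SQLite with the question's sample data (value is declared REAL so the division isn't integer division; in Postgres you would cast or use numeric for the same reason):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (postcode TEXT, value REAL, uns INTEGER);
INSERT INTO t VALUES ('AA',10,51),('AB',20,78),('AA',20,78),('AB',50,51);
""")

# Self-join: one alias per uns value, matched on postcode.
rows = conn.execute("""
    SELECT postcode, uns51.value / NULLIF(uns78.value, 0) AS ratio
    FROM t uns51
    JOIN t uns78 USING (postcode)
    WHERE uns51.uns = 51
      AND uns78.uns = 78
    ORDER BY postcode
""").fetchall()
print(rows)
```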
pozs' solution is nice and simple, nothing wrong with it. Just adding two alternatives:
1. Correlated subquery
SELECT postcode
, value / (SELECT NULLIF(value, 0) FROM t WHERE postcode = uns51.postcode AND uns = 78)
FROM t uns51
WHERE uns = 51;
For only one or a few rows.
2. Conditional aggregate
SELECT postcode
, min(value) FILTER (WHERE uns = 51) / NULLIF(min(value) FILTER (WHERE uns = 78), 0)
FROM t
GROUP BY postcode;
May be faster when processing most or all of the table.
Can also deal with duplicates per (postcode, uns), use an aggregate function of your choice to pick the right value from each group. For just one row in each group, min() is just as good as max() or sum().
About the aggregate FILTER:
Aggregate columns with additional (distinct) filters

crosstab in PostgreSQL, count

Crosstab function returns error:
No function matches the given name and argument types
I have a table with clients, dates, and client types.
Example:
CLIENT_ID | DATE | CLI_TYPE
1234 | 201601 | F
1236 | 201602 | P
1234 | 201602 | F
1237 | 201601 | F
I would like to get the number of distinct clients per date, and additionally count clients by client type (with the types P and F as columns, counting the clients that are P or F).
Something like this:
DATE | COUNT_CLIENT | P | F
201601 | 2 | 0 | 2
201602 | 2 | 1 | 1
SELECT date
, count(DISTINCT client_id) AS count_client
, count(*) FILTER (WHERE cli_type = 'P') AS p
, count(*) FILTER (WHERE cli_type = 'F') AS f
FROM clients
GROUP BY date;
This counts distinct clients per day, and total rows for client_types 'P' and 'F'. It's undefined how you want to count multiple types for the same client (or whether that's even possible).
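The query can be verified with the sample data on SQLite, which also supports the FILTER clause on aggregates since version 3.30 (table and column names mirror the question):

```python
import sqlite3  # bundled SQLite must be >= 3.30 for FILTER on aggregates

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (client_id INTEGER, date INTEGER, cli_type TEXT);
INSERT INTO clients VALUES
    (1234,201601,'F'),(1236,201602,'P'),(1234,201602,'F'),(1237,201601,'F');
""")

rows = conn.execute("""
    SELECT date,
           COUNT(DISTINCT client_id) AS count_client,
           COUNT(*) FILTER (WHERE cli_type = 'P') AS p,
           COUNT(*) FILTER (WHERE cli_type = 'F') AS f
    FROM clients
    GROUP BY date
    ORDER BY date
""").fetchall()
print(rows)
```

The result matches the desired layout: one row per date, with P and F as pivoted count columns.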
About aggregate FILTER:
Postgres COUNT number of column values with INNER JOIN
crosstab() might make it faster, but it's pretty unclear what you want exactly.
About crosstab():
PostgreSQL Crosstab Query

Is there a way to see details for groups in a query?

I need to make a report in SSRS that will output data in this format:
Person | DocumentID | Data1 | Data2 | .....
----------------------------------------------
Mr. Smith | | | |
| #123021312 | 01 | 04 | .....
| #132145681 | 07 | 00 | .....
Mr. Black | | | |
| #912205112 | 11 | 08 | .....
| #131135810 | 03 | 05 | .....
..............................................
So, there is a kind of a hierarchy to the query. There are detail records (data about documents) and group records (persons). If I would do just GROUP BY, I would be able to only see group records, and display some aggregate information, like, Max of Data1, or Count of Document ID. Instead, I want to be able to see both aggregate and detail rows.
I tried googling and couldn't find any information about whether this is possible in T-SQL (or SSRS, for that matter). Is it?
Yes it is possible....
Flat Data
DECLARE @T TABLE (Person VARCHAR(25), DocumentID VARCHAR(25), Data1 VARCHAR(25), Data2 VARCHAR(25))

INSERT INTO @T (Person, DocumentID, Data1, Data2) VALUES
('Mr. Smith','#12345678A','01','04'),
('Mr. Smith','#98765432A','02','05'),
('Mr. Black','#12345678B','03','06'),
('Mr. Black','#98765432B','04','07')

SELECT *
FROM @T
Tablix Setup Steps
On the tablix that contains your fields in SSRS, highlight the data row.
Right-click the now-visible row header with the 3 lines.
Select Add Group > Parent Group.
In the Group by drop-down, select Person, then click OK.
The report will now be grouped by the Person column.
Bonus: if you don't want the Person column showing to the right of the grouping, simply delete the column.