In my data set, I am looking for values that have both positive and negative results under the amount category. For example, an entity can be a bank account with money coming in (positive numbers) and money going out (negative numbers).
SELECT description, account_subtype_id, subcategory_id, (case when amount > 0 then 1 end) AS amount_p, (case when amount < 0 then 0 end) AS amount_n
FROM mx.transactions
LIMIT 100
;
This approach doesn't help much because now my data looks like:
bank_A 1 null
bank_A null 0
But I really want to get something like:
bank_A 1 0
because this will be really helpful for my analysis.
Actually, if there is a way to do this, it would be even better:
For example, an entity has
Bank_A $500 -$300 -- (these two results both are from the amount column)
If you want just one row per description, you need to group by description and use aggregate functions. A clean way to check whether there's any positive or negative amount would be to check whether min(amount) and max(amount) are less/greater than 0:
SELECT
description,
min(amount) < 0 AS amount_n,
max(amount) > 0 AS amount_p
FROM mx.transactions
GROUP BY description
These tests will give you true and false values, but you can use them in your CASE/IF statements if you want something else. Or to get the actual values rather than testing against 0, just use min and max directly.
It looks like you've got multiple columns potentially acting as your bank_A identifier. If that's the case, you can GROUP BY all of them.
SELECT
description,
account_subtype_id,
subcategory_id,
min(amount),
max(amount)
FROM mx.transactions
GROUP BY description, account_subtype_id, subcategory_id
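A minimal sketch of the grouped min/max idea, run against an in-memory SQLite database so it can be tried without the real data. The table here is a plain `transactions` table (no `mx` schema) with made-up rows, and the column aliases `has_positive`, `has_negative`, `largest` and `smallest` are names chosen for illustration only. The CASE wrappers turn the true/false tests into 1/0 flags:

```python
import sqlite3

# Hypothetical sample data; the real table is mx.transactions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (description TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("bank_A", 500), ("bank_A", -300), ("bank_B", 120), ("bank_B", 80)],
)

# One row per description: flags for "any positive" / "any negative",
# plus the actual extreme values.
rows = conn.execute(
    """
    SELECT description,
           CASE WHEN MAX(amount) > 0 THEN 1 ELSE 0 END AS has_positive,
           CASE WHEN MIN(amount) < 0 THEN 1 ELSE 0 END AS has_negative,
           MAX(amount) AS largest,
           MIN(amount) AS smallest
    FROM transactions
    GROUP BY description
    ORDER BY description
    """
).fetchall()

for row in rows:
    print(row)
```

Here bank_A comes back with both flags set, while bank_B (deposits only) has has_negative = 0, so filtering on both flags would isolate the entities with money moving in both directions.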
I have a field customer_id and I need to track the number of unique users and repeat users. For example the table is as below:
customer_id
11
22
33
11
44
22
Here, the number of unique users is 4 (11, 22, 33, 44) and the number of repeat users is 2 (11, 22).
I am calculating unique users as COUNTD([customer_id]).
How can I calculate repeat users? It is basically the distinct count of the values which appear more than once. I tried with the following expression:
COUNTD(IF COUNT([customer_id]) > 1
THEN [customer_id]
END)
but I'm getting an error: Cannot mix aggregate and non-aggregate arguments comparisons or results in IF expressions
How else can I calculate the repeat users?
Thanks in advance.
Depending on your filter needs, you can rely on an LOD expression using FIXED/INCLUDE:
{ FIXED [Customer Id] : if sum({ FIXED [Customer Id] : COUNT([Customer Id])}) > 1 then 1 end }
Basically, in the inner LOD you count the occurrences per customer, and then you only take into consideration the records that have 2 or more (>1) of them.
A simple alternative to Fabio's answer can also do the job. Just create a calculated field
COUNT([customer id]) >1
and add this to filter shelf.
You can then filter out the False values to drop the one-time customers and keep only the returning ones.
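For anyone who wants to sanity-check the numbers outside Tableau, the same logic can be sketched in SQL: the repeat-user count is the number of customer_id groups that have more than one row. This is only an illustration using Python's sqlite3 with the sample ids from the question; the table name `orders` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER)")  # hypothetical table name
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [(11,), (22,), (33,), (11,), (44,), (22,)],
)

# Unique users: distinct ids (the COUNTD equivalent).
unique_users = conn.execute(
    "SELECT COUNT(DISTINCT customer_id) FROM orders"
).fetchone()[0]

# Repeat users: ids whose group contains more than one row.
repeat_users = conn.execute(
    """
    SELECT COUNT(*) FROM (
        SELECT customer_id
        FROM orders
        GROUP BY customer_id
        HAVING COUNT(*) > 1
    ) AS repeats
    """
).fetchone()[0]

print(unique_users, repeat_users)
```

The inner GROUP BY plays the role of the FIXED LOD (count per customer), and the outer COUNT plays the role of the distinct count over the customers that pass the > 1 test.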
I have a task that I have been racking my brain over.
So I have this table transactions and it has 2 columns, bonus and type, like:
bonus | type
20 1
15 -1
What I want is a query that splits the bonus column into two columns, bonus_left and bonus_spent, according to type.
It should probably look like this one:
bonus_left | bonus_spent
20 15
I know I can duplicate the table and join the copies with a where clause, but is there any way I can do this in a single query?
In vanilla SQL you would use conditional aggregation. We use the user_id column which indicates who the bonus belongs to and I've used SUM for aggregation to allow for there being more than one of each type of bonus:
SELECT user_id,
SUM(CASE WHEN type = 1 THEN bonus ELSE 0 END) AS bonus_left,
SUM(CASE WHEN type = -1 THEN bonus ELSE 0 END) AS bonus_spent
FROM transactions
GROUP BY user_id
Output:
user_id bonus_left bonus_spent
1 20 15
Demo on dbfiddle
I agree with Nick and you should mark that answer correct IMHO. For completeness and some Knex:
knex('users AS u')
.join('transactions AS t', 'u.id', 't.user_id')
.select('u.id', 'u.name')
.sum(knex.raw('CASE WHEN t.type = 1 THEN t.bonus ELSE 0 END AS bonus_left'))
.sum(knex.raw('CASE WHEN t.type = -1 THEN t.bonus ELSE 0 END AS bonus_spent'))
Note that, lacking your table schema, this is untested. It'll look roughly like this though. You could also just embed the two SUMs as knex.raw in the select list, but this is perhaps a little more organised.
Consider creating the type as a Postgres enum. This would allow you to avoid having to remember what a 'magic number' is in your table, instead writing comparisons like:
CASE WHEN type = 'bonus_left'
It also stops you from accidentally entering some other integer, like 99, because Postgres will type-check the insertion.
I have a nagging concern that having bonus 'left' vs 'spent' in the same table reflects a wider problem with the schema (for example, why isn't the total amount of bonus remaining the only value we need to track?) but perhaps that's just my paranoia!
Thanks in advance for any advice you can offer! I'm building a Tableau dashboard to explore housing affordability and school quality in different neighborhoods in my area. A user will select their occupation and see a graph of neighborhoods plotted based on school quality and housing affordability. To explore housing affordability, I'm using county level assessor data with the valuation of every property matched to neighborhoods.
The goal is to display the percentage of homes in an area that are affordable given the median occupational wages for the job a user selected. Right now, I'm trying to use a calculated field with COUNT([Parcels]<[Occupation])/COUNT([Parcels]), but I need to find a way to count the number of properties in each specific neighborhood below the cut off value.
Does anyone know of a way to count elements of a particular group in this way in Tableau?
I'm on a Mac, using Tableau Desktop, and doing the back end analysis work in R. Thank you!
You seem to misunderstand what the function COUNT() does. You are certainly not alone. Count() behaves in Tableau almost identically to how it does with SQL.
Count([some field]) returns the number of data rows where the value for [some field] is not null. It does not return the number of rows where [some field] evaluates to true, or a positive number, or anything else.
If [some field] always has a non-null value, then Count([some field]) is the same as SUM([Number of Records]). If [some field] is always null, then Count([some field]) is zero. Count() is not like Excel's CountIf function.
If you want to count data rows that meet a condition, you could try COUNT(if [condition] then 1 end). Since the missing ELSE case defaults to null values, that expression will count rows where [condition] is true.
So one way to get the percentage of affordable homes is count(if [affordable] then 1 end) / count(1), assuming each data row represents a home. Then format your field to display as a percentage. Another option is to learn to use quick table calcs.
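Since COUNT behaves almost the same way in SQL, the count-if pattern can be tried out there. The sketch below uses Python's sqlite3 with made-up data: the `parcels` table, its columns, the neighborhood names and the 300000 cutoff are all assumptions for illustration, not the asker's real fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (neighborhood TEXT, valuation INTEGER)")
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?)",
    [
        ("Eastside", 250000), ("Eastside", 400000),
        ("Eastside", 280000), ("Eastside", 350000),
        ("Westside", 500000), ("Westside", 200000),
        ("Westside", 450000), ("Westside", 600000),
    ],
)

cutoff = 300000  # hypothetical affordability threshold

# CASE without ELSE yields NULL for non-matching rows, and COUNT skips NULLs,
# so the first COUNT only sees the affordable parcels.
rows = conn.execute(
    """
    SELECT neighborhood,
           COUNT(CASE WHEN valuation < ? THEN 1 END) AS affordable,
           COUNT(*) AS total,
           1.0 * COUNT(CASE WHEN valuation < ? THEN 1 END) / COUNT(*) AS share
    FROM parcels
    GROUP BY neighborhood
    ORDER BY neighborhood
    """,
    (cutoff, cutoff),
).fetchall()

for row in rows:
    print(row)
```

The GROUP BY stands in for putting the neighborhood dimension on the view; the ratio is then computed once per neighborhood.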
If you want to display the number of rows in a given visualized table, you could also use SIZE().
Source, official docs:
https://help.tableau.com/current/pro/desktop/en-us/functions_functions_tablecalculation.htm#size
I'm quite new to the Tableau environment.
I have one column with reps. The second column contains the values 'Yes' and 'No'. The third contains customer names.
I want to compute this for each rep: 'Yes' clients / all clients
I created calculating field: COUNT(IF [C2]='Yes' THEN [C2] ELSE NULL END]/COUNT [C2]
but it doesn't work: I get a value of 1 for every rep.
How can I fix it?
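Tableau's COUNT skips NULLs, just as in SQL, so the idea behind your field is sound. As written, though, the expression has a stray ] after END and is missing the parentheses around [C2] in the second COUNT. A form that doesn't rely on NULL handling at all is:
SUM(IF [C2]='Yes' THEN 1 ELSE 0 END)/COUNT([C2])
This way only the 'Yes' rows add to the numerator.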
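To see how the SUM-of-flags form compares with a COUNT over a CASE that yields NULL, here is a small SQL sketch run through Python's sqlite3 (COUNT in Tableau behaves almost identically to SQL, as the earlier answer about COUNT points out). The `visits` table, the rep names and the rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (rep TEXT, c2 TEXT, customer TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        ("Ann", "Yes", "c1"), ("Ann", "No", "c2"),
        ("Ann", "Yes", "c3"), ("Ann", "Yes", "c4"),
        ("Bob", "No", "c5"), ("Bob", "Yes", "c6"),
    ],
)

rows = conn.execute(
    """
    SELECT rep,
           -- SUM of 1/0 flags: counts the 'Yes' rows explicitly
           1.0 * SUM(CASE WHEN c2 = 'Yes' THEN 1 ELSE 0 END) / COUNT(c2) AS share_sum,
           -- COUNT over a CASE that is NULL for 'No': NULLs are skipped
           1.0 * COUNT(CASE WHEN c2 = 'Yes' THEN c2 END) / COUNT(c2) AS share_count
    FROM visits
    GROUP BY rep
    ORDER BY rep
    """
).fetchall()

for row in rows:
    print(row)
```

Both columns come out the same per rep, which is what you would expect if NULLs are not counted.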
I'm completely rewriting my question to simplify it. Sorry if you read the prior version. (The previous version of this question included a very complex query example that created a distraction from what I really need.) I'm using SQL Express.
I have a table of lessons.
LessonID StudentID StudentName LengthInMinutes
1 1 Chuck 120
2 2 George 60
3 2 George 30
4 1 Chuck 60
5 1 Chuck 10
These would be ordered by date. (Of course the actual table is thousands of records with dates and other lesson-related data but this is a simplification.)
I need to query this table such that I get all rows (or a subset of rows by a date range or by student), but I need my query to add a new column we might call PriorLessonMinutes. That is, the sum of all minutes of all lessons for the same student in lessons of PRIOR dates only.
So the query would return:
LessonID StudentID StudentName LengthInMinutes PriorLessonMinutes
1 1 Chuck 120 0
2 2 George 60 0
3 2 George 30 60 (The sum Length from row 2 only)
4 1 Chuck 60 120 (The sum Length from row 1 only)
5 1 Chuck 10 180 (The sum of Length from rows 1 and 4)
In essence, I need a running tally of the sum of prior lesson minutes for each student. Ideally the tally shouldn't include the current row, but if it does, no big deal as I can do subtraction in the code that receives the query.
Further, (and this is important) if I retrieve only a subset of records, (for example by a date range) PriorLessonMinutes must be a sum that considers rows that are NOT returned.
My first idea was to use SUM() and to GROUP BY Student, but that isn't right because unless I'm mistaken it would include a sum of minutes for all rows for each student, including rows that come after the row which aren't relevant to the sum I need.
OPTIONS I'M REJECTING: I could scan through all rows in my code that receives it, (although this would force me to retrieve all rows unnecessarily) but that's obviously inefficient. I could also put a real data field in there and populate it, but this too presents problems when other records are deleted or altered.
I have no idea how to put such a query together. Any guidance?
This is a great opportunity to use windowed aggregates. The catch is that you need SQL Server 2012 Express or later. If you can get it, then this is the query you are looking for:
select *,
sum(LengthInMinutes)
over (partition by StudentId order by LessonId
rows between unbounded preceding and 1 preceding)
as PriorLessonMinutes
from Lessons
Note that it returns NULLs instead of 0s (zeroes) for each student's first lesson. If you insist on zeroes, wrap the sum in the COALESCE function to turn NULLs into zeroes.
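A quick way to check the windowed sum with COALESCE against the sample table from the question is to run it in SQLite through Python. This is a sketch under the assumption that your SQLite build is 3.25 or newer (the first version with window functions); the frame clause is the same as in the SQL Server query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Lessons ("
    "LessonID INTEGER PRIMARY KEY, StudentID INTEGER, "
    "StudentName TEXT, LengthInMinutes INTEGER)"
)
conn.executemany(
    "INSERT INTO Lessons VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Chuck", 120),
        (2, 2, "George", 60),
        (3, 2, "George", 30),
        (4, 1, "Chuck", 60),
        (5, 1, "Chuck", 10),
    ],
)

# The frame stops at "1 preceding", so the current row is excluded.
# COALESCE turns the NULL for each student's first lesson into 0.
rows = conn.execute(
    """
    SELECT LessonID, StudentName, LengthInMinutes,
           COALESCE(
               SUM(LengthInMinutes) OVER (
                   PARTITION BY StudentID
                   ORDER BY LessonID
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
               ), 0) AS PriorLessonMinutes
    FROM Lessons
    ORDER BY LessonID
    """
).fetchall()

for row in rows:
    print(row)
```

The PriorLessonMinutes column matches the expected output in the question: 0, 0, 60, 120, 180.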
I suggest using a nested query to limit the number of rows returned:
select * from
(
select *,
sum(LengthInMinutes)
over (partition by StudentId order by LessonId
rows between unbounded preceding and 1 preceding)
as PriorLessonMinutes
from Lessons
) as NestedLessons
where LessonId > 3 -- this is an example of a filter
This way the filter is applied after the aggregation is complete.
Now, if you want to apply a filter that doesn't affect the aggregation (like only querying data for a certain student), you should apply the filter to the inner query, as pruning the rows that don't affect the computation early (like data for other students) will improve the performance.
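To illustrate why the placement of the filter matters, the sketch below runs both variants on the question's sample rows using Python's sqlite3 (assuming SQLite 3.25+ for window functions). The variable names `outer_filtered` and `inner_filtered` are just for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Lessons ("
    "LessonID INTEGER PRIMARY KEY, StudentID INTEGER, "
    "StudentName TEXT, LengthInMinutes INTEGER)"
)
conn.executemany(
    "INSERT INTO Lessons VALUES (?, ?, ?, ?)",
    [(1, 1, "Chuck", 120), (2, 2, "George", 60), (3, 2, "George", 30),
     (4, 1, "Chuck", 60), (5, 1, "Chuck", 10)],
)

window_sum = """
    COALESCE(SUM(LengthInMinutes) OVER (
        PARTITION BY StudentID ORDER BY LessonID
        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0)
"""

# Filter in the outer query: the running sum still sees lessons 1-3.
outer_filtered = conn.execute(
    f"""
    SELECT LessonID, PriorLessonMinutes FROM (
        SELECT LessonID, {window_sum} AS PriorLessonMinutes FROM Lessons
    ) AS NestedLessons
    WHERE LessonID > 3
    ORDER BY LessonID
    """
).fetchall()

# Filter in the same query as the window: WHERE runs first,
# so the earlier lessons are gone before the sum is computed.
inner_filtered = conn.execute(
    f"""
    SELECT LessonID, {window_sum} AS PriorLessonMinutes
    FROM Lessons
    WHERE LessonID > 3
    ORDER BY LessonID
    """
).fetchall()

print(outer_filtered)
print(inner_filtered)
```

With the outer filter, lessons 4 and 5 keep their full history (120 and 180 prior minutes). With the filter applied alongside the window, the history is truncated to the returned rows, which is exactly what the question wants to avoid.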
I think the following code will serve your purpose. Check it:
select Students.StudentID ,Students.First, Students.Last,sum(Lessons.LengthInMinutes)
as TotalPriorMinutes from lessons,students
where Lessons.StartDateTime < getdate()
and Lessons.StudentID = Students.StudentID
and StartDateTime >= '20090130 00:00:00' and StartDateTime < '20790101 00:00:00'
group by Students.StudentID ,Students.First, Students.Last