KDB - Filter List Column Based on Another Column

I'm struggling with eliminating data from my query. I have attached a picture with my data results (data itself is too large and has customer info so I can't include). I have two tables that I'm joining by SKU to show when we enter a SKU into the system and when we sell it. We reuse SKUs based on vendors which isn't the best practice but is currently a necessity. What I'd like to do is eliminate the InvoiceDates where InvoiceDate < TransferDate. So in the InvoiceDate column it would only show the highlighted yellow dates for the first few rows.
Please let me know if you have any questions and thanks for the help!

This would work:
q) update InvoiceDate:{x where x >= y}'[InvoiceDate;TransferDate] from tbl
Explanation:
The query uses the each-both iterator (') to walk the InvoiceDate and TransferDate columns pairwise, which in effect means row by row. Each pair is passed to the lambda as x and y, and the lambda keeps only the elements of x (InvoiceDate) that are >= y (TransferDate).
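For readers less familiar with q, the same pairwise filter can be sketched in Python. This is only an illustration of what the lambda does per row; the sample dates and the names keep_on_or_after, invoice_col and transfer_col are made up for the example and are not part of the original table.

```python
from datetime import date

def keep_on_or_after(invoice_dates, transfer_date):
    """Keep the invoice dates that are >= the row's transfer date."""
    return [d for d in invoice_dates if d >= transfer_date]

# Hypothetical rows: each row holds a list of invoice dates and one transfer date.
invoice_col = [
    [date(2019, 1, 5), date(2019, 3, 1), date(2019, 6, 9)],
    [date(2018, 12, 1), date(2019, 2, 2)],
]
transfer_col = [date(2019, 2, 1), date(2019, 1, 1)]

# zip pairs the two columns row by row, much like each-both does in q.
filtered = [keep_on_or_after(inv, tr) for inv, tr in zip(invoice_col, transfer_col)]
```

Each inner list shrinks independently, so rows can end up with different numbers of invoice dates, just as in the q result.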

Your question is cut off, but I'm guessing you want to filter on whether a particular date is in your invoiceDate lists. You can do this as follows:
q)select from tbl where in[2019.01.01;] each invoiceDate
If this isn't what you are looking for, please clarify above with an example.

Related

Tableau Calculated Field to Display Specific Rows

I'm trying to find a way to display only certain rows of my data based off a very specific criteria. I will try to explain it the best way I can. Let's start with a screenshot here:
Picture of part of the Tableau sheet as-is
What I'm trying to do is create a way to display only the values of "Order: Sales Order #" that have a value filled in for "Item: Connected Product Category". As you see on the screenshot, order number 15589543 has one Connected Product Category that displays "Connectable".
Since this order number has at least one non-null value for the Connected Product Category, I would like ALL of its rows (even the blank ones) to be displayed for order # 15589543. If an order # has NO rows that have "connectable" displayed in them (orders 10305573, 15573299, 15699578, etc.), I would like these orders to be filtered out.
This is a screenshot of just a small part of the data. Basically, if an order has a "connectable" field in it, I need all of the rows for that order # to be displayed.
I tried to do logic such as IF [Item: Connected Product Category] = "Connectable" THEN [Order: Sales Order #] ELSE NULL END but this only displays the rows that literally contain "connectable" in them, not all of the rows for that order number.
Any assistance would be greatly appreciated. After extensive research I'm not sure if this is even possible. Thanks
It is simple. Create a calculated field to use as the filter, defined as:
{FIXED [Order: Sales Order #] : SUM(
IF [Item: Connected Product Category] = 'Connectable' THEN 1 ELSE 0 END
)} > 0
This calculated field evaluates to TRUE/FALSE, and filtering on TRUE keeps every row of each order that has at least one "Connectable" row.
Try this. Good luck
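To see why the FIXED expression keeps the blank rows too, here is a rough Python sketch of the same two-step logic. The sample rows are invented for illustration (only the order numbers come from the question), and the variable names are hypothetical.

```python
# Hypothetical rows: (order number, connected product category or None).
rows = [
    ("15589543", "Connectable"),
    ("15589543", None),
    ("15589543", None),
    ("10305573", None),
    ("15573299", None),
]

# Step 1: per order, count the rows marked "Connectable" (the FIXED ... SUM part).
connectable_count = {}
for order, category in rows:
    hit = 1 if category == "Connectable" else 0
    connectable_count[order] = connectable_count.get(order, 0) + hit

# Step 2: keep every row whose order has a count above zero (the "> 0" filter).
kept = [(order, category) for order, category in rows if connectable_count[order] > 0]
```

The count is attached to the order, not to the individual row, which is why the rows with a null category survive the filter.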

DAX: Distinct and then aggregate twice

I'm trying to create a Measure in Power BI using DAX that achieves the below.
The data set has four columns, Name, Month, Country and Value. I have duplicates so first I need to dedupe across all four columns, then group by Month and sum up the value. And then, I need to average across the Month to arrive at a single value. How would I achieve this in DAX?
I figured it out. The reply by #OscarLar was very close, but nested SUMMARIZE causes problems because it cannot aggregate values calculated dynamically within the query itself (https://www.sqlbi.com/articles/nested-grouping-using-groupby-vs-summarize/).
I kept the inner SUMMARIZE from #OscarLar's answer and replaced the outer SUMMARIZE with GROUPBY. Here's the code that worked:
AVERAGEX(GROUPBY(SUMMARIZE(Data, Data[Name], Data[Month], Data[Country], Data[Value]), Data[Month], "Month_Value", sumx(CURRENTGROUP(), Data[Value])), [Month_Value])
Not sure I completely understood the question, since you didn't provide example data or any DAX code you've already tried. Please do so next time.
I'm assuming parts of this cannot (for reasons) be done in Power Query, so you have to use DAX. In that case I think this will do what you described.
Create a temporary data table called Data_reduced in which duplicate rows have been removed.
Data_reduced =
SUMMARIZE(
'Data';
[Name];
[Month];
[Country];
[Value]
)
Then create the averaging measure like this
AveragePerMonth =
AVERAGEX(
SUMMARIZE(
'Data_reduced';
'Data_reduced'[Month];
"Sum_month"; SUM('Data_reduced'[Value])
);
[Sum_month]
)
Where Data is the name of the table.
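The three steps discussed in this thread (dedupe across all four columns, sum per month, average the monthly sums) can be sanity-checked on toy data. The Python sketch below uses invented rows and hypothetical variable names; it mirrors the logic only and says nothing about how DAX evaluates the measure internally.

```python
from collections import defaultdict

# Hypothetical rows: (Name, Month, Country, Value); the second row is an exact duplicate.
data = [
    ("Ann", "Jan", "US", 10),
    ("Ann", "Jan", "US", 10),
    ("Bob", "Jan", "UK", 20),
    ("Ann", "Feb", "US", 40),
]

# 1. Dedupe across all four columns (the role of the inner SUMMARIZE).
unique_rows = set(data)

# 2. Sum Value per Month.
month_totals = defaultdict(int)
for _name, month, _country, value in unique_rows:
    month_totals[month] += value

# 3. Average the monthly totals (the AVERAGEX step).
average_per_month = sum(month_totals.values()) / len(month_totals)
```

With these rows the duplicate is counted once, so January sums to 30 and February to 40, and the average of the two months is 35.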

Identifying next closest record by date in tableau

I have a table of users and another table of transactions.
The transactions all have a date against them. What I am trying to ascertain for each user is the average time between transactions.
User | Transaction Date
-----+-----------------
A | 2001-01-01
A | 2001-01-10
A | 2001-01-12
Consider the above transactions for user A. I am basically looking for the distance from one transaction to the next chronologically to determine the distances.
There are 9 days between transactions one and two, and 2 days between transactions two and three. The average of these is 5.5, so I would want to identify the average time between user A's transactions as 5.5 days.
Any idea of how to achieve this in Tableau?
I am trying to create a calculated field for each transaction to identify the date of the "next" transaction but I am struggling.
{ FIXED [user id] : MIN(IF [Transaction Date] > **this transaction date** THEN [Transaction Date]) }
I am not sure what to replace this transaction date with or whether this is the right approach at all.
Any advice would be greatly appreciated.
LODs don't have direct access to previous values, so you need to create a self join in your data connection. Follow the steps below to achieve what you want.
Create a self join on your data with the following criteria
Create an LOD calculation as below
{FIXED [User],[Transaction Date]:
MIN(DATEDIFF('day',[Transaction Date],[Transaction Date (Data1)]))
}
Build the View
PS: If you want to improve the performance, Custom SQL might be the way.
The only type of calculation that can take order sequence into account (e.g., when the value for a calculated field depends on the value of the immediately preceding row) is a table calc. You can't use an LOD calc for this kind of problem.
You'll need to understand how partitioning and addressing works with table calcs, along with specifying your sort order criteria. See the online help. You can then do something like, for example, define days_since_last_transaction as:
if first() < 0 then min([Transaction Date]) -
lookup(min([Transaction Date]), -1) end
If you have very large data, or for other reasons want to do the calculation in the database instead of in a Tableau table calc, you can use SQL window (aka analytic) functions via Tableau's custom SQL.
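Whichever route is used (table calc or a window function in the database), the underlying computation is "sort the dates, take the difference to the previous one, average the differences". A minimal Python sketch of that idea, using the three sample dates for user A from the question; the function name average_gap_days is made up for the example.

```python
from datetime import date

def average_gap_days(dates):
    """Mean number of days between consecutive dates, or None if fewer than two."""
    ordered = sorted(dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps) if gaps else None

user_a = [date(2001, 1, 1), date(2001, 1, 10), date(2001, 1, 12)]
avg_a = average_gap_days(user_a)  # the gaps are 9 and 2 days
```

The first transaction has no predecessor and so contributes no gap, which is the same reason the table calc skips the first row of each partition.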
Please attach an example workbook and anything you tried along with the error you have.
This might not be useful if you cannot set the User ID field as a filter.
If you can, set User ID as a filter. Following the steps mentioned in here will then let you calculate the difference between any two dates. Ideally, if you select a single value in the filter, the calculated field from the link should give you the difference between the dates in the transaction dates column.

Using COUNT in Tableau to count observations by group

Thanks in advance for any advice you can offer! I'm building a Tableau dashboard to explore housing affordability and school quality in different neighborhoods in my area. A user will select their occupation and see a graph of neighborhoods plotted based on school quality and housing affordability. To explore housing affordability, I'm using county level assessor data with the valuation of every property matched to neighborhoods.
The goal is to display the percentage of homes in an area that are affordable given the median occupational wages for the job a user selected. Right now, I'm trying to use a calculated field with COUNT([Parcels]<[Occupation])/COUNT([Parcels]), but I need to find a way to count the number of properties in each specific neighborhood below the cut off value.
Does anyone know of a way to count elements of a particular group in this way in Tableau?
I'm on a Mac, using Tableau Desktop, and doing the back end analysis work in R. Thank you!
You seem to misunderstand what the function COUNT() does. You are certainly not alone. Count() behaves in Tableau almost identically to how it does with SQL.
Count([some field]) returns the number of data rows where the value for [some field] is not null. It does not return the number of rows where [some field] evaluates to true, or a positive number, or anything else.
If [some field] always has a non-null value, then Count([some field]) is the same as SUM([Number of Records]). If [some field] is always null, then Count([some field]) is zero. Count() is not like Excel's CountIf function.
If you want to count data rows that meet a condition, you could try COUNT(if [condition] then 1 end). Since the missing ELSE case defaults to null, that expression counts only the rows where [condition] is true.
So one way to get the percentage of affordable homes is count(if [affordable] then 1 end) / count(1), assuming each data row represents a home. Then format your field to display as a percentage. Another option is to learn to use quick table calcs.
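The difference between counting a boolean expression and counting "1 or null" is easy to demonstrate outside Tableau. The Python sketch below imitates COUNT() with a helper that counts non-null entries; the valuations, the cut-off and the helper name count_non_null are all invented for illustration.

```python
def count_non_null(values):
    """Mimic COUNT(): count the entries that are not null (None)."""
    return sum(1 for v in values if v is not None)

# Hypothetical home valuations and an affordability cut-off.
valuations = [150_000, 220_000, 310_000, 480_000]
cutoff = 250_000

# "if [affordable] then 1 end" yields 1 when true and null otherwise.
flags = [1 if v < cutoff else None for v in valuations]

# Counting the boolean itself counts every row, because False is still non-null.
naive = count_non_null([v < cutoff for v in valuations])

# Counting the 1-or-null flags gives the share that meets the condition.
share_affordable = count_non_null(flags) / count_non_null([1] * len(valuations))
```

Here naive comes out as 4 (all rows), while share_affordable is 0.5, since two of the four homes fall below the cut-off.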
If you want to display the number of rows in a given visualized table you could also use SIZE()
Source, official docs:
https://help.tableau.com/current/pro/desktop/en-us/functions_functions_tablecalculation.htm#size

How to sort by a calculated column in PostgreSQL?

We have a number of fields in our offers table that determine an offer's price, however one crucial component for the price is an exchange rate fetched from an external API. We still need to sort offers by actual current price.
For example, let's say we have two columns in the offers table: exchange and premium_percentage. exchange is the name of the source for the exchange rate, to which an external request will be made. premium_percentage is set by the user. In this situation, it is impossible to sort offers by current price without knowing the exchange rate, and that may be different depending on what's in the exchange column.
How would one go about this? Is there a way to make Postgres calculate current price and then sort offers by it?
SELECT
product_id,
get_current_price(exchange) * (premium_percentage::float/100 + 1) AS price
FROM offers
ORDER BY 2;
Note the ORDER BY 2 to sort by the second ordinal column.
You can instead repeat the expression you want to sort by in the ORDER BY clause, but that can result in the expression being evaluated more than once.
Or you can wrap it all in a subquery so you can name the output columns and refer to them in other clauses.
SELECT product_id, price
FROM
(
SELECT
product_id,
get_current_price(exchange) * (premium_percentage::float/100 + 1)
FROM offers
) product_prices(product_id, price)
ORDER BY price;
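The ordinal ORDER BY can be tried end to end without a PostgreSQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, so it is not PostgreSQL: the ::float cast is replaced by dividing by 100.0, and get_current_price is registered as a Python function backed by a hard-coded, hypothetical RATES table instead of a real external API.

```python
import sqlite3

# Hypothetical rates standing in for the external exchange-rate API.
RATES = {"exchange_a": 100.0, "exchange_b": 90.0}

def get_current_price(exchange):
    """Look up the current rate for an exchange (assumed helper, not a real API)."""
    return RATES[exchange]

conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the name used in the answer.
conn.create_function("get_current_price", 1, get_current_price)
conn.execute(
    "CREATE TABLE offers (product_id INTEGER, exchange TEXT, premium_percentage REAL)"
)
conn.executemany(
    "INSERT INTO offers VALUES (?, ?, ?)",
    [(1, "exchange_a", 5), (2, "exchange_b", 10), (3, "exchange_b", 25)],
)

# ORDER BY 2 sorts by the second output column, i.e. the computed price.
rows = conn.execute(
    """
    SELECT product_id,
           get_current_price(exchange) * (premium_percentage / 100.0 + 1) AS price
    FROM offers
    ORDER BY 2
    """
).fetchall()
ordered_ids = [product_id for product_id, _price in rows]
```

With these sample rates the computed prices are roughly 105, 99 and 112.5 for products 1, 2 and 3, so the sorted order is 2, 1, 3.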