Translating SQL query to Tableau - tableau-api

I am trying to translate the following SQL query into Tableau:
select store1.name, store1.city, store1.order_date
from store1
where order_date = (select max(store2.order_date) from store2
where store2.name = store1.name
and store2.city = store1.city)
I am quite new to Tableau and can't figure out how to translate the where clause as it is selecting from another table.
For example, given the following tables
Store 1:
Name | City | Order Date
Andrew | Boston | 23-Aug-16
Bob | Boston | 31-Jan-17
Cathy | Boston | 31-Jan-17
Cathy | San Diego | 19-Jan-17
Dan | New York | 3-Dec-16
Store 2:
Name | City | Order Date
Andrew | Boston | 2-Sep-16
Brandy | Miami | 4-Feb-17
Cathy | Boston | 31-Jan-17
Cathy | Boston | 2-Mar-16
Dan | New York | 2-Jul-16
My query would return the following from Store 1:
Name | City | Order Date
Bob | Boston | 31-Jan-17
Cathy | Boston | 31-Jan-17

Point for point, that SQL query converted into a Tableau Custom SQL query would be:
SELECT [Store1].[Name], [Store1].[City], [Store1].[Order Date]
FROM [Store1]
WHERE [Order Date] = (SELECT MAX([Store2].[Order Date]) FROM [Store2]
WHERE [Store2].[Name] = [Store1].[Name]
AND [Store2].[City] = [Store1].[City])
In the preview you will notice it returns only Cathy. But once you join the Custom SQL query onto your primary table on Order Date, you will see both Bob and Cathy, as you expect.
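To see why the preview shows only Cathy, here is a minimal sketch that runs the same correlated subquery outside Tableau, using Python's built-in sqlite3 module. The sample rows are re-entered with ISO date strings (an assumption, so that MAX and = compare dates correctly as text). Bob has no matching row in Store 2, so the subquery yields NULL for him and the comparison drops the row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store1 (name TEXT, city TEXT, order_date TEXT);
    CREATE TABLE store2 (name TEXT, city TEXT, order_date TEXT);
    -- Sample rows from the question, dates rewritten as ISO strings (assumption)
    INSERT INTO store1 VALUES
        ('Andrew', 'Boston',    '2016-08-23'),
        ('Bob',    'Boston',    '2017-01-31'),
        ('Cathy',  'Boston',    '2017-01-31'),
        ('Cathy',  'San Diego', '2017-01-19'),
        ('Dan',    'New York',  '2016-12-03');
    INSERT INTO store2 VALUES
        ('Andrew', 'Boston',   '2016-09-02'),
        ('Brandy', 'Miami',    '2017-02-04'),
        ('Cathy',  'Boston',   '2017-01-31'),
        ('Cathy',  'Boston',   '2016-03-02'),
        ('Dan',    'New York', '2016-07-02');
""")

# Same shape as the query in the question: keep a store1 row only when its
# date equals the latest store2 date for the same name and city.
rows = conn.execute("""
    SELECT store1.name, store1.city, store1.order_date
    FROM store1
    WHERE store1.order_date = (
        SELECT MAX(store2.order_date)
        FROM store2
        WHERE store2.name = store1.name
          AND store2.city = store1.city
    )
""").fetchall()

print(rows)
```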

Related

Create a pivot table for Month over Month variation

I have these records returned from a query
+---------+--------------+-----------+----------+
| Country | other fields | sales | date |
+---------+--------------+-----------+----------+
| US | 1 | $100.00 | 01/01/21 |
| CA | 1 | $100.00 | 01/01/21 |
| UK | 1 | $100.00 | 01/01/21 |
| FR | 1 | $100.00 | 01/01/21 |
| US | 1 | $200.00 | 01/02/21 |
| CA | 1 | $200.00 | 01/02/21 |
| UK | 1 | $200.00 | 01/02/21 |
| FR | 1 | $200.00 | 01/02/21 |
And I want to show the sales variation from one month to previous, like this:
+---------+----------+----------+------+
| Country | 01/02/21 | 01/01/21 | Var% |
+---------+----------+----------+------+
| US      | $200.00  | $100.00  | 100% |
| CA      | $200.00  | $100.00  | 100% |
| FR      | $200.00  | $100.00  | 100% |
+---------+----------+----------+------+
How could this be done with a Postgres query?
If you are always comparing only two months:
select country
, sum(sales) filter (where date ='01/01/21') month1
, sum(sales) filter (where date ='01/02/21') month2
, ((sum(sales) filter (where date ='01/02/21') /sum(sales) filter (where date ='01/01/21')) - 1) * 100 var
from tablename
where date in ('01/01/21' , '01/02/21')
group by country
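As a rough check of the arithmetic, here is a sketch using Python's sqlite3 module. It swaps FILTER for the equivalent SUM(CASE WHEN ... END) form so that it also runs on older SQLite builds; that substitution, the REAL column type, and the ISO dates (reading 01/02/21 as 1 February 2021) are assumptions, not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (country TEXT, sales REAL, date TEXT)")

# Sample rows from the question: every country sells 100 in the first month
# and 200 in the second (dates as ISO strings, an assumption).
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [(country, sales, date)
     for date, sales in (("2021-01-01", 100.0), ("2021-02-01", 200.0))
     for country in ("US", "CA", "UK", "FR")],
)

# Conditional aggregation: one SUM per month, then (month2 / month1 - 1) * 100.
rows = conn.execute("""
    SELECT country,
           SUM(CASE WHEN date = '2021-01-01' THEN sales END) AS month1,
           SUM(CASE WHEN date = '2021-02-01' THEN sales END) AS month2,
           (SUM(CASE WHEN date = '2021-02-01' THEN sales END)
            / SUM(CASE WHEN date = '2021-01-01' THEN sales END) - 1) * 100 AS var
    FROM tablename
    WHERE date IN ('2021-01-01', '2021-02-01')
    GROUP BY country
    ORDER BY country
""").fetchall()

for row in rows:
    print(row)
```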
You can also look at crosstab from the tablefunc extension, which basically does the same as the query above. Note that the source query passed to crosstab should be ordered by its first two columns.
CREATE EXTENSION IF NOT EXISTS tablefunc;
select *, (("01/02/21" / "01/01/21") - 1) * 100 var
from (
select * from crosstab ('select country, date, sales from tablename order by 1, 2')
as ct(country varchar(2), "01/01/21" money, "01/02/21" money)
) t
For more info about crosstab, see tablefunc.
But if you want to show the dates in rows instead of columns, you can easily generalize it to all the dates:
select *
, ((sales / LAG(sales,1,1) over (partition by country order by date)) -1)* 100 var
from
tablename
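To picture what the window function does, here is a plain-Python sketch of the same idea: sort by country and date, then compare each row with the previous row of the same country. The function name with_variation and the sample rows are hypothetical. Unlike LAG(sales,1,1), which falls back to 1 for the first row, this sketch returns None when there is no previous month.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (country, date, sales), dates as ISO strings
rows = [
    ("US", "2021-01-01", 100.0), ("CA", "2021-01-01", 100.0),
    ("US", "2021-02-01", 200.0), ("CA", "2021-02-01", 200.0),
]

def with_variation(rows):
    """Yield (country, date, sales, var); var is None when no previous row exists."""
    # Equivalent of: partition by country order by date
    ordered = sorted(rows, key=itemgetter(0, 1))
    for _, group in groupby(ordered, key=itemgetter(0)):
        previous = None
        for country, date, sales in group:
            var = None if previous is None else (sales / previous - 1) * 100
            yield country, date, sales, var
            previous = sales

result = list(with_variation(rows))
for row in result:
    print(row)
```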

Select all the rows that match the earliest date for a particular matching column

I have a postgres (11) database with a table that lists places individuals visited, and the date they visited:
name | place | date
-----+-------+-----------
Al | x | 2020-01-01
Al | y | 2020-01-01
Al | z | 2020-02-02
Bob | q | 2020-06-06
Bob | q | 2020-07-07
Bob | r | 2020-07-07
Sue | z | 2020-07-07
Sue | a | 2020-07-07
Sue | b | 2020-08-08
I want to get all the places that each individual visited on their 'first day' - i.e. all places where the name and date are the same, and it is the earliest date for that name. The result would be:
name | place | date
-----+-------+-----------
Al | x | 2020-01-01
Al | y | 2020-01-01
Bob | q | 2020-06-06
Sue | z | 2020-07-07
Sue | a | 2020-07-07
Can anyone suggest how this could be achieved?
Assuming that you have all the rows in the table visits, you can create something like this by first constructing a query that selects the first date that each person visited. This is easily done using:
SELECT name, MIN(date) AS date FROM visits GROUP BY name
Once you have that, you can just join the result of this with the original table. The join then keeps only the rows that have the same name and date. I chose to use a CTE because that is easier to follow:
WITH
first_day AS (SELECT name, MIN(date) AS date FROM visits GROUP BY name)
SELECT name, place, date
FROM first_day JOIN visits USING (name, date);
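If you want to try this without a Postgres instance, here is a small sketch that runs the same CTE against the sample data with Python's sqlite3 module. Using SQLite instead of Postgres and storing the dates as ISO text are assumptions; the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (name TEXT, place TEXT, date TEXT)")
conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", [
    ("Al", "x", "2020-01-01"), ("Al", "y", "2020-01-01"), ("Al", "z", "2020-02-02"),
    ("Bob", "q", "2020-06-06"), ("Bob", "q", "2020-07-07"), ("Bob", "r", "2020-07-07"),
    ("Sue", "z", "2020-07-07"), ("Sue", "a", "2020-07-07"), ("Sue", "b", "2020-08-08"),
])

# first_day holds one (name, earliest date) pair per person; joining it back
# on both columns keeps every visit that happened on that earliest date.
rows = conn.execute("""
    WITH first_day AS (
        SELECT name, MIN(date) AS date FROM visits GROUP BY name
    )
    SELECT name, place, date
    FROM first_day JOIN visits USING (name, date)
    ORDER BY name, place
""").fetchall()

for row in rows:
    print(row)
```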

Converting DataFrame column into array using group by key

I am working with Spark DataFrames and I need to group by the columns employee, designation and company, and convert the remaining column values of the grouped rows into an array of elements in a new column. Example:
Input:
employee | Company Address | designation | company | Home Adress
--------------------------------------------------
Micheal | NY | Head | xyz | YN
Micheal | NJ | Head | xyz | YM
Output:
employee | designation | company | Address
--------------------------------------------------
Micheal | Head | xyz | [Company Address : NY , Home Adress : YN], [Company Address : NJ , Home Adress : YM]
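Any help is highly appreciated!
Below is a solution in Spark that produces an array instead of JSON:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, collect_list, struct
spark = SparkSession.builder.getOrCreate()
# Sample rows matching the question's input
df1 = spark.createDataFrame([['Micheal','NY','head','XYZ','YN'], ['Micheal','NJ','head','XYZ','YM']], ["Employee", "Company Address", "designation", "company", "Home Adress"])
# Pack the two address columns into a struct and collect one array per group
df2 = df1.groupBy("Employee", "designation", "company").agg(collect_list(struct(col("Company Address"), col("Home Adress"))).alias("Address"))
df2.show(1, False)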
Output:
+--------+-----------+-------+--------------------+
|Employee|designation|company|Address |
+--------+-----------+-------+--------------------+
|Micheal |head |XYZ |[[NY, YN], [NJ, YM]]|
+--------+-----------+-------+--------------------+
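For intuition about what the groupBy plus collect_list(struct(...)) combination produces, here is a plain-Python sketch of the same grouping with no Spark involved. It only illustrates the shape of the result and is not how Spark computes it; the variable names are hypothetical.

```python
from collections import defaultdict

# Same two sample rows as above, as plain dictionaries
rows = [
    {"Employee": "Micheal", "Company Address": "NY", "designation": "head",
     "company": "XYZ", "Home Adress": "YN"},
    {"Employee": "Micheal", "Company Address": "NJ", "designation": "head",
     "company": "XYZ", "Home Adress": "YM"},
]

# Group key = the groupBy columns; each value is a list of
# (Company Address, Home Adress) pairs, mirroring collect_list(struct(...)).
grouped = defaultdict(list)
for row in rows:
    key = (row["Employee"], row["designation"], row["company"])
    grouped[key].append((row["Company Address"], row["Home Adress"]))

for key, addresses in grouped.items():
    print(key, addresses)
```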

Postgresql: Looping through a date_trunc generated group

I've got some records on my database that have a 'createdAt' timestamp.
What I'm trying to get out of postgresql is those records grouped by 'createdAt'
So far I've got this query:
SELECT date_trunc('day', "updatedAt") FROM goal GROUP BY 1
Which gives me:
+---+------------+-------------+
| date_trunc |
+---+------------+-------------+
| Sep 20 00:00:00 |
+---+------------+-------------+
Which are the days where the records got created.
My question is: Is there any way to generate something like:
| Sep 20 00:00:00 |
| id | name | gender | state | age |
|----|-------------|--------|-------|-----|
| 1 | John Kenedy | male | NY | 32 |
| |
| Sep 24 00:00:00 |
| |
| id | name | gender | state | age |
|----|-------------|--------|-------|-----|
| 1 | John Kenedy | male | NY | 32 |
| 2 | John De | male | NY | 32 |
That means group by date_trunc and select all the columns of those rows?
Thanks a lot!
Please try:
SELECT date_trunc('day', "updatedAt"), name, gender, state, age FROM goal GROUP BY 1,2,3,4,5
Every selected column has to appear in the GROUP BY, hence the five positions. It will not produce the nested structure you expect, but it will "group by date_trunc and select all the columns".
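If you really need the nested layout (one heading per day, with its rows underneath), one option is to fetch flat rows and group them in application code. Here is a minimal sketch in Python, assuming the rows arrive as tuples of (updatedAt, id, name, gender, state, age); the timestamps and the by_day helper are hypothetical.

```python
from datetime import datetime
from itertools import groupby

# Hypothetical rows as (updatedAt, id, name, gender, state, age)
rows = [
    (datetime(2020, 9, 24, 8, 15), 1, "John Kenedy", "male", "NY", 32),
    (datetime(2020, 9, 20, 10, 30), 1, "John Kenedy", "male", "NY", 32),
    (datetime(2020, 9, 24, 17, 5), 2, "John De", "male", "NY", 32),
]

def by_day(row):
    """Truncate the timestamp to its calendar day, like date_trunc('day', ...)."""
    return row[0].date()

# groupby only merges adjacent items, so sort by the same key first.
rows_by_day = {
    day: [row[1:] for row in group]
    for day, group in groupby(sorted(rows, key=by_day), key=by_day)
}

for day, day_rows in rows_by_day.items():
    print(day)
    for row in day_rows:
        print("   ", row)
```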

MS Access Group By breaks when using a date

For some reason using a date/time field in a select query with Group By in Access 2010 breaks (records are not properly "grouped by" the text field first, showing the same "aTextField" value multiple times). I am able to replicate the issue in a simple, one table query. Ex:
SELECT aTextField, SUM(aIntField) AS SumOfaIntField
FROM simpleTable
GROUP BY aTextField, aDateField
HAVING aDateField >= Date()
ORDER BY aTextField;
As soon as you remove "aDateField" from the query (the Group By and Having lines), it works properly. I can even remove the HAVING line and it still breaks, leading me to believe that it is something with the Group By.
Any feedback would be great. Thanks!
EDIT More details
**simpleTable**
--------------------------------------------
| ID | aTextField | aIntField | aDateField |
============================================
| 1 | John Doe | 1 | 3/14/2013 |
| 2 | John Doe | | 3/15/2013 |
| 3 | Jane Doe | 1 | 3/15/2013 |
| 4 | John Doe | 2 | 3/18/2013 |
| 5 | Jane Doe | 1 | 3/19/2013 |
| 6 | John Doe | | 3/20/2013 |
| 7 | John Doe | 3 | 3/21/2013 |
| 8 | Jane Doe | 1 | 3/19/2013 |
| 9 | John Doe | | 3/22/2013 |
| 10 | Jane Doe | 2 | 3/20/2013 |
| 11 | Jane Doe | | 3/21/2013 |
| 12 | Jane Doe | | 3/22/2013 |
--------------------------------------------
**Expected Result**
-------------------------------
| aTextField | SumOfaIntField |
===============================
| Jane Doe | 4 |
| John Doe | 3 |
-------------------------------
**Actual Result**
-------------------------------
| aTextField | SumOfaIntField |
===============================
| Jane Doe | 2 |
| Jane Doe | 2 |
| Jane Doe | |
| Jane Doe | |
| John Doe | |
| John Doe | 3 |
| John Doe | |
-------------------------------
So what appears to be happening is that there is a separate row for each date as well. I just need to filter by the date, not necessarily Group By it. However, Access will not accept the query without grouping by it. Options?
You're grouping by aTextField and aDateField. Perhaps simpleTable includes rows where the date is the same, but the time of day is different. In that case your grouping would produce a row for each date/time combination.
Whether or not that was the explanation, you should check what the db engine actually evaluates by including aDateField in the SELECT list.
SELECT aTextField, aDateField, SUM(aIntField)
FROM simpleTable
GROUP BY aTextField, aDateField
HAVING aDateField >= Date()
ORDER BY aTextField;
Also consider using a WHERE instead of HAVING clause:
WHERE aDateField >= Date()
Based on your sample data, I suspect you want ...
SELECT aTextField, SUM(aIntField)
FROM simpleTable
WHERE aDateField >= Date()
GROUP BY aTextField
ORDER BY aTextField;
You should be able to use the following:
SELECT aTextField, SUM(aIntField) AS SumOfaIntField
FROM simpleTable
WHERE aDateField >= Date()
GROUP BY aTextField
ORDER BY aTextField;
You will notice that I removed the GROUP BY on the aDateField column. Since you want the total for each aTextField, then you do not need to group by the date. Grouping by date will result in a separate row for each distinct date.
Note: this query was tested in MS Access 2010 and generated your desired result.
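The difference between the two groupings can also be reproduced outside Access. Below is a rough sketch using Python's sqlite3 module with the sample table from the question. It assumes the blank aIntField cells are NULL and uses a fixed cutoff of 2013-03-19 in place of Date(), a date chosen because it reproduces the row counts shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE simpleTable "
    "(ID INTEGER, aTextField TEXT, aIntField INTEGER, aDateField TEXT)"
)
# Sample data from the question; blank aIntField cells entered as NULL (assumption)
conn.executemany("INSERT INTO simpleTable VALUES (?, ?, ?, ?)", [
    (1, "John Doe", 1, "2013-03-14"), (2, "John Doe", None, "2013-03-15"),
    (3, "Jane Doe", 1, "2013-03-15"), (4, "John Doe", 2, "2013-03-18"),
    (5, "Jane Doe", 1, "2013-03-19"), (6, "John Doe", None, "2013-03-20"),
    (7, "John Doe", 3, "2013-03-21"), (8, "Jane Doe", 1, "2013-03-19"),
    (9, "John Doe", None, "2013-03-22"), (10, "Jane Doe", 2, "2013-03-20"),
    (11, "Jane Doe", None, "2013-03-21"), (12, "Jane Doe", None, "2013-03-22"),
])
today = "2013-03-19"  # assumed stand-in for Date()

# Filter with WHERE, group by the text field only: one row per name.
per_name = conn.execute("""
    SELECT aTextField, SUM(aIntField) FROM simpleTable
    WHERE aDateField >= ?
    GROUP BY aTextField
    ORDER BY aTextField
""", (today,)).fetchall()

# Group by name and date: one row per distinct name/date pair.
per_name_and_date = conn.execute("""
    SELECT aTextField, SUM(aIntField) FROM simpleTable
    WHERE aDateField >= ?
    GROUP BY aTextField, aDateField
    ORDER BY aTextField
""", (today,)).fetchall()

print(per_name)
print(len(per_name_and_date))
```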
I think you are misunderstanding how GROUP BY works. You should see the same aTextField once for each unique text field/date-time combination.
Sample
a 2012-01-01
a 2012-01-01
b 2012-01-01
b 2012-01-02
b 2012-01-02
group by aTextField, aDateField
a 2012-01-01
b 2012-01-01
b 2012-01-02
group by aTextField
a
b