How to groupBy using multiple columns in scala collections - scala

records.groupBy(_.column1)
What if I want to include more columns like if I want to group by column1, column2 and column3?
Any hints?

Try
records.groupBy(record => (record.column1, record.column2, record.column3))
This will group by a tuple composed of those 3 columns.
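For intuition, the same grouping-by-a-tuple idea can be sketched in plain Python (the `Record` fields here are hypothetical, mirroring the columns in the question):

```python
from collections import defaultdict, namedtuple

# Hypothetical record type standing in for the Scala case class.
Record = namedtuple("Record", ["column1", "column2", "column3", "value"])

records = [
    Record("a", "x", 1, 10),
    Record("a", "x", 1, 20),
    Record("b", "y", 2, 30),
]

# Group by a tuple key, just as records.groupBy(r => (r.column1, r.column2, r.column3)) does.
groups = defaultdict(list)
for r in records:
    groups[(r.column1, r.column2, r.column3)].append(r)
```

Records sharing all three column values end up in the same bucket; any record differing in even one of them lands in a different bucket.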

Related

Convert jsonb column to multiple columns in PostgreSQL without knowing the keys inside the jsonb column

How to convert a JSONB column into multiple columns when I don't know the fixed keys inside the JSONB column in PostgreSQL.
For example, imagine I have three rows with different sets of keys inside the columnjson column, as below:
rowID columnjson
row 1 {"a":21,"b":90}
row 2 {"a":46,"b":12, "c": 754}
row 3 {"a":19}
I want to fetch columnjson as 3 columns, like below.
Can anyone help me achieve this?
All columns of a query must be known before the query is started. So you can't have a query that returns a different number of columns each time you run it.
But given your sample data, the following returns your expected result:
select row_id,
column_json ->> 'a' as a,
column_json ->> 'b' as b,
column_json ->> 'c' as c
from the_table
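If the keys genuinely are unknown in advance, one common workaround is to discover them client-side and build the column list from that. A minimal Python sketch of the pivot logic, using plain JSON strings in place of a database connection (the sample rows are from the question):

```python
import json

# Sample rows: (row_id, jsonb document) as they might come back from the table.
rows = [
    ("row 1", '{"a": 21, "b": 90}'),
    ("row 2", '{"a": 46, "b": 12, "c": 754}'),
    ("row 3", '{"a": 19}'),
]

# First pass: collect the union of keys, so every output row gets the same columns.
parsed = [(row_id, json.loads(doc)) for row_id, doc in rows]
keys = sorted({k for _, doc in parsed for k in doc})

# Second pass: pivot each document into a fixed-width row, None for missing keys
# (the equivalent of column_json ->> 'c' returning NULL).
pivoted = [[row_id] + [doc.get(k) for k in keys] for row_id, doc in parsed]
```

The two-pass structure mirrors the SQL constraint above: the full column set must be known before any row can be emitted.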

Pyspark union of two dataframes

I want to take the union of two PySpark dataframes. They have the same columns, but the sequence of columns is different.
I tried this:
joined_df = A_df.unionAll(B_DF)
But the result is based on column position, so values from different columns get intermixed. Is there a way to do the union based on column names rather than column order? Thanks in advance.
Just reorder columns in B so that it has the same column order as in A before union:
A_df.unionAll(B_df.select(*A_df.columns))
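To see why this works, here is the same fix sketched in plain Python, with a "dataframe" modeled as a column list plus rows of tuples. A positional union intermixes values when column orders differ; reordering B's rows into A's column order first gives the correct result:

```python
a_columns = ["id", "name"]
a_rows = [(1, "alice")]

b_columns = ["name", "id"]  # same columns, different order
b_rows = [("bob", 2)]

# Naive positional union: (1, "alice") and ("bob", 2) get stacked as-is,
# mixing ids and names in the same column.
bad_union = a_rows + b_rows

# Reorder each row of B to match A's column order before concatenating,
# mirroring B_df.select(*A_df.columns).
b_index = [b_columns.index(c) for c in a_columns]
b_reordered = [tuple(row[i] for i in b_index) for row in b_rows]

union_rows = a_rows + b_reordered
```

Recent PySpark versions also ship a built-in for this, `unionByName`, which matches columns by name directly.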

SSRS Parameter to select a column name on value which to filter the dataset

I need to allow my user to select a column name and its value from a parameter (or two parameters) to filter the results.
I have a text parameter listing a few column names from my dataset:
Column1, Column2, Column3. Each of those columns has only two values, 1 and 0.
I would love some help getting an idea of how to filter my dataset based on the column name listed in the parameter and a selected value (1 or 0).
I assume it has to involve dynamic SQL, but I'm not sure how to incorporate that into either the WHERE clause or the actual dataset filter.
Thanks for any pointers! :)
In the tablix you can use the filter section to set the criteria, e.g.
col1 = param1. This will only select the rows that match the value of the parameter.
https://www.mssqltips.com/sqlservertip/2597/dataset-and-tablix-filtering-in-sql-server-reporting-services/

PostgreSQL select uniques from three different columns

I have one large table (100m+ rows) and two smaller ones (2m rows each). All three tables have a column of company names that need to be sent out to an API for matching. I want to select the strings from each column and then combine them into a single column of unique strings.
I'm using a version of this answer, but unsurprisingly the performance is very slow: Combined 2 columns into one column SQL
SELECT DISTINCT
unnest(string_to_array(upper(t.buyer) || '#' || upper(a.aw_supplier_name) || '#' || upper(b.supplier_source_string), '#'))
FROM
tenders t,
awards a,
banking b
;
Any ideas on a more performant way to achieve this?
Update: the banking table is the largest table with 100m rows.
Assuming PostgreSQL 9.6 and borrowing the select from rd_nielsen's answer, the following should give you a comma delimited string of the distinct names.
WITH cte AS (
    SELECT upper(t.buyer) AS names
    FROM tenders t
    UNION
    SELECT upper(a.aw_supplier_name) AS names
    FROM awards a
    UNION
    SELECT upper(b.supplier_source_string) AS names
    FROM banking b
)
SELECT array_to_string(array_agg(cte.names), ',')
FROM cte
To get just a list of the combined names from all three tables, you could instead union together the selections from each table, like so:
select
upper(t.buyer)
from
tenders t
union
select
upper(a.aw_supplier_name)
from
awards a
union
select
upper(b.supplier_source_string)
from
banking b
;
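The reason the UNION approach is so much faster than the original: each table is scanned once and duplicates are removed once, whereas the comma-separated FROM list produces a cross join, materializing every combination of rows across the three tables. A small Python sketch of the set-based deduplication (sample names are made up):

```python
# One list of names per table, standing in for the three columns.
tenders = ["Acme Corp", "globex"]
awards = ["ACME CORP", "Initech"]
banking = ["initech", "Hooli"]

# Upper-case each name and take the set union, mirroring
# SELECT upper(...) FROM ... UNION SELECT upper(...) FROM ... in SQL.
unique_names = {name.upper() for names in (tenders, awards, banking) for name in names}
```

The work is proportional to the total number of rows, not to the product of the table sizes.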

Unexpected behavior in a postgres group by query

I am used to writing GROUP BY queries in T-SQL. There, a GROUP BY like the one below would generate a list where rows with the same CategoryText are grouped together, and within each CategoryText group, rows with the same TypeText are grouped together. But that does not seem to be what is happening here:
Select "CategoryText", "TypeText"
from "NewOrleans911Categories"
group by "CategoryText", "TypeText";
Here is some output from Postgres. Why are the NA rows not getting grouped together?
CategoryText; TypeText
"BrokenWindows";"DRUG VIOLATIONS"
"NA";"BOMB SCARE"
"Weapon";"DISCHARGING FIREARMS"
"NA";"NEGLIGENT INJURY"
In a t-sql group by, this would generate a list where items with the same categorytext were grouped together, then items within a category text group that had the same type text would be grouped together.
In SQL, the order in which rows are returned by a query is unspecified unless you add an order by clause. In practice, you get the rows in whatever order the query plan happens to produce them. (As best I'm aware, T-SQL behaves the same way.)
At any rate, you'd want to add the missing order by clause to get the expected result:
Select "CategoryText", "TypeText"
from "NewOrleans911Categories"
group by "CategoryText", "TypeText"
order by "CategoryText", "TypeText";
Or (and I suspect this is what you're actually looking for) replace the group by with an order by clause:
Select "CategoryText", "TypeText"
from "NewOrleans911Categories"
order by "CategoryText", "TypeText";
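The same pitfall exists outside SQL. Python's itertools.groupby, like GROUP BY output without ORDER BY, only merges runs of adjacent equal keys, so equal keys stay separated unless the input is sorted first:

```python
from itertools import groupby

# Rows from the question: two "NA" rows that are not adjacent.
rows = [
    ("NA", "BOMB SCARE"),
    ("Weapon", "DISCHARGING FIREARMS"),
    ("NA", "NEGLIGENT INJURY"),
]

# Without sorting, the two "NA" rows land in separate groups.
unsorted_keys = [key for key, _ in groupby(rows, key=lambda r: r[0])]

# Sorting first (the ORDER BY equivalent) brings equal keys together.
sorted_keys = [key for key, _ in groupby(sorted(rows), key=lambda r: r[0])]
```

Grouping defines which rows belong together; only sorting makes them appear together in the output.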
You are grouping by two columns. Rows are only grouped together when they match on both columns.
Here the two NA rows have different TypeText values, so they will not be merged. This behaves much like a distinct, which in this case would produce the same result.
Maybe you need a query like this:
select distinct on ("CategoryText") "CategoryText", "TypeText"
from "NewOrleans911Categories"
because with group by you cannot select columns that aren't in the group by clause (unless they appear inside an aggregate).