I want to group the data in such a way that, for a particular record, each of its array values is also used for grouping: if another record's name appears in this record's array, the two records should be merged into one group (and so on, transitively).
I am able to group by name only; I am not able to figure out how to do the rest.
I have tried the following query:
import pyspark.sql.functions as f
df.groupBy('name').agg(f.collect_list('data').alias('data_new')).show()
Here is the DataFrame:
|------|---------------------|
| name | data                |
|------|---------------------|
| a    | [a,b,c,d,e,f,g,h,i] |
| b    | [b,c,d,e,j,k]       |
| c    | [c,f,l,m]           |
| d    | [k,b,d]             |
| n    | [n,o,p,q]           |
| p    | [p,r,s,t]           |
| u    | [u,v,w,x]           |
| b    | [b,f,e,g]           |
| c    | [c,b,g,h]           |
| a    | [a,l,f,m]           |
|------|---------------------|
I am expecting the following output:
|------|-----------------------------|
| name | data                        |
|------|-----------------------------|
| a    | [a,b,c,d,e,f,g,h,i,j,k,l,m] |
| n    | [n,o,p,q,r,s,t]             |
| u    | [u,v,w,x]                   |
|------|-----------------------------|
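One possible approach (this is not shown in the question) is to treat the requirement as a connected-components problem. The sketch below is only an illustration under assumptions: it assumes the optional GraphFrames package is installed, that spark and df are the SparkSession and DataFrame from above, and that /tmp/cc-checkpoint is just a placeholder checkpoint directory.
import pyspark.sql.functions as f
from graphframes import GraphFrame  # assumption: the graphframes package is available

# connectedComponents() needs a checkpoint directory (placeholder path)
spark.sparkContext.setCheckpointDir('/tmp/cc-checkpoint')

# One row per (name, array element) pair
pairs = df.select('name', f.explode('data').alias('item'))

# Vertices are all names and all array elements; edges connect a name to its elements
vertices = (pairs.select(f.col('name').alias('id'))
                 .union(pairs.select(f.col('item').alias('id')))
                 .distinct())
edges = pairs.select(f.col('name').alias('src'), f.col('item').alias('dst'))

# Names reachable from each other through shared array values get the same component id
components = GraphFrame(vertices, edges).connectedComponents()

# Merge each record into its component, keep the smallest name as the key,
# and union the arrays with duplicates removed
result = (df.join(components, df['name'] == components['id'])
            .groupBy('component')
            .agg(f.min('name').alias('name'),
                 f.array_sort(f.array_distinct(f.flatten(f.collect_list('data')))).alias('data')))
result.select('name', 'data').show(truncate=False)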
I have a table like the following one:
+---------+-------+-------+-------------+--+
| Section | Group | Level | Fulfillment | |
+---------+-------+-------+-------------+--+
| A | Y | 1 | 82.2 | |
| A | Y | 2 | 23.2 | |
| A | M | 1 | 81.1 | |
| A | M | 2 | 28.2 | |
| B | Y | 1 | 89.1 | |
| B | Y | 2 | 58.2 | |
| B | M | 1 | 32.5 | |
| B | M | 2 | 21.4 | |
+---------+-------+-------+-------------+--+
And this would be my desired output:
+---------+-------+--------------------+--------------------+
| Section | Group | Level1_Fulfillment | Level2_Fulfillment |
+---------+-------+--------------------+--------------------+
| A | Y | 82.2 | 23.2 |
| A | M | 81.1 | 28.2 |
| B | Y | 89.1 | 58.2 |
| B | M | 32.5 | 21.4 |
+---------+-------+--------------------+--------------------+
Thus, for each section and group I'd like to obtain the fulfillment percentages for level 1 and level 2. To achieve this I've tried crosstab(), but the function returns an error ("The provided SQL must return 3 columns: rowid, category, and values.") because I'm using more than three columns (I need to keep both section and group as identifiers for each row). Is it possible to use crosstab() in this case?
Regards.
I find crosstab() unnecessarily complicated to use and prefer conditional aggregation:
select section,
       "group",
       max(fulfillment) filter (where level = 1) as level_1,
       max(fulfillment) filter (where level = 2) as level_2
from the_table
group by section, "group"
order by section;
I have this table in my database:
|------|------|
| id   | desc |
|------|------|
| 1    | A    |
| 2    | B    |
| NULL | C    |
| 3    | D    |
| NULL | D    |
| NULL | E    |
| 4    | F    |
|------|------|
And I want to transform this table into one that replaces the NULLs with consecutive negative ids:
|------|------|
| id   | desc |
|------|------|
| 1    | A    |
| 2    | B    |
| -1   | C    |
| 3    | D    |
| -2   | D    |
| -3   | E    |
| 4    | F    |
|------|------|
Does anyone know how I can do this in Hive?
The approach below works. Note that desc is a reserved word in Hive, so it is quoted with backticks:
select coalesce(id, concat('-', row_number() over (partition by id))) as id,
       `desc`
from database_name.table_name;
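A possible variant (not part of the answer above) that keeps the id column numeric instead of turning it into a string: all rows with a NULL id fall into the same window partition, so row_number() numbers them 1, 2, 3, ... and multiplying by -1 yields consecutive negative ids. The ordering inside the window is an assumption; any deterministic column works.
select coalesce(id, -1 * row_number() over (partition by id order by `desc`)) as id,
       `desc`
from database_name.table_name;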
I am trying to understand how to pivot data within T-SQL but can't seem to get it working. I have the following table structure:
+-------------------+-----------------------+
| Name | Value |
+-------------------+-----------------------+
| TaskId | 12417 |
| TaskUid | XX00044497 |
| TaskDefId | 23 |
| TaskStatusId | 4 |
| Notes | |
| TaskActivityIndex | 0 |
| ModifiedBy | Orange |
| Modified | /Date(1554540200000)/ |
| CreatedBy | Apple |
| Created | /Date(2121212100000)/ |
| TaskPriorityId | 40 |
| OId | 2 |
+-------------------+-----------------------+
I want to pivot the Name column so that its values become columns. Expected output:
+--------+------------------------+-----------+--------------+-------+-------------------+------------+-----------------------+-----------+-----------------------+----------------+-----+
| TASKID | TASKUID | TASKDEFID | TASKSTATUSID | NOTES | TASKACTIVITYINDEX | MODIFIEDBY | MODIFIED | CREATEDBY | CREATED | TASKPRIORITYID | OID |
+--------+------------------------+-----------+--------------+-------+-------------------+------------+-----------------------+-----------+-----------------------+----------------+-----+
| 12417 | XX00044497 | 23 | 4 | | 0 | Orange | /Date(1554540200000)/ | Apple | /Date(2121212100000)/ | 40 | 2 |
+--------+------------------------+-----------+--------------+-------+-------------------+------------+-----------------------+-----------+-----------------------+----------------+-----+
Is there an easy way of doing it? The columns are fixed (not dynamic).
Any help is appreciated.
Try this:
select *
from yourtable
pivot
(
    min(Value)
    for Name in ([TaskId], [TaskUid], [TaskDefId], [TaskStatusId], [Notes],
                 [TaskActivityIndex], [ModifiedBy], [Modified], [CreatedBy],
                 [Created], [TaskPriorityId], [OId])
) as pivotable
Note that you must use an aggregate function inside the PIVOT.
You can also use CASE expressions instead of PIVOT; a sketch is included at the end of this answer.
If you want to learn more, here is the reference:
https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-2017
I verified the output in a DB<>Fiddle, trying only three of the columns.
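Here is the CASE-based alternative mentioned above as a rough sketch (it assumes the key/value table is named yourtable with columns Name and Value, as in the PIVOT example): each max(case ...) picks out the value for one key, and with no GROUP BY the whole table collapses to a single row.
select max(case when Name = 'TaskId'            then Value end) as TaskId,
       max(case when Name = 'TaskUid'           then Value end) as TaskUid,
       max(case when Name = 'TaskDefId'         then Value end) as TaskDefId,
       max(case when Name = 'TaskStatusId'      then Value end) as TaskStatusId,
       max(case when Name = 'Notes'             then Value end) as Notes,
       max(case when Name = 'TaskActivityIndex' then Value end) as TaskActivityIndex,
       max(case when Name = 'ModifiedBy'        then Value end) as ModifiedBy,
       max(case when Name = 'Modified'          then Value end) as Modified,
       max(case when Name = 'CreatedBy'         then Value end) as CreatedBy,
       max(case when Name = 'Created'           then Value end) as Created,
       max(case when Name = 'TaskPriorityId'    then Value end) as TaskPriorityId,
       max(case when Name = 'OId'               then Value end) as OId
from yourtable;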
I'm trying to get the last four digits of the field "SERIAL8" and put that in a new field called "SS4". Here is the query I'm trying to use, but it isn't working. I'm new at this, so any help would be appreciated.
SELECT * FROM CUSTOMER_TABLE
SUBSTRING (SERIAL,4,4) as 'SS4'
CUSTOMER_TABLE
+-----------------------+------------+----------+--+
| "Complaint Full Date" | Source | SERIAL | |
+-----------------------+------------+----------+--+
| 02/04/16 | DAPIS_CAIR | DG540732 | |
| 04/18/16 | DAPIS_CAIR | DG553384 | |
| 03/23/17 | RO | DG559515 | |
| 03/29/16 | CAIR | DG559781 | |
| 12/10/14 | DAPIS_CAIR | DG561621 | |
+-----------------------+------------+----------+--+
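No answer is quoted for this question, so here is a minimal sketch of one way to do it (assuming SQL Server or another dialect with RIGHT(), and the CUSTOMER_TABLE/SERIAL names shown in the table; the prose mentions SERIAL8, so adjust the column name if yours differs):
select *,
       right(SERIAL, 4) as SS4   -- last four characters of SERIAL
from CUSTOMER_TABLE;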
I have one table:
|----|-------|-------------|-------|
| id | head1 | head2       | head3 |
|----|-------|-------------|-------|
| 1  | fv1   | fw1,fw2,fw3 | fv3   |
| 2  | sv2   | sw1,sw2,sw3 | sv4   |
|----|-------|-------------|-------|
And I would like to have the following:
|----|-------|
| id | head2 |
|----|-------|
| 1  | fw1   |
| 1  | fw2   |
| 1  | fw3   |
| 2  | sw1   |
| 2  | sw2   |
| 2  | sw3   |
|----|-------|
So I would like to split the comma-delimited content of some columns and copy it into a different table as rows, for search purposes.
Which Talend component should I use to achieve this? Is that possible?
tNormalize should help you with this problem.
Just select "," as the field separator and head2 as the column to normalize.