I had a table:
ID TYPES
1 A
1 B
2 B
3 A
4 A
4 A
4 A
4 C
4 D
4 E
5 B
5 B
6 A
7 A
7 B
7 C
8 B
8 B
9 D
10 A
10 A
10 D
I now have this table:
ID TYPES
1 A+B
2 B
3 A
4 A+A+A+C+D+E
5 B+B
6 A
7 A+B+C
8 B+B
9 D
10 A+A+D
This query was used:
let
Source = Excel.Workbook(File.Contents("c:\Desktop\stac.xlsx"), null, true),
Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]),
#"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"TYPE", type text}, {"ID", type text}}),
#"Changed Type" = Table.TransformColumnTypes(#"Changed Type1",{{"ID", type text}, {"TYPE", type text}}),
#"Grouped Rows1" = Table.Group(#"Changed Type", {"ID"}, {{"All Rows", each _, type table [ID=text, TYPE=text]}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows1", "Custom", each [All Rows][TYPE]),
#"Extracted Values" = Table.TransformColumns(#"Added Custom", {"Custom", each Text.Combine(List.Transform(_, Text.From), "+"), type text}),
#"Removed Columns" = Table.RemoveColumns(#"Extracted Values",{"All Rows"})
in
#"Removed Columns"
But I need distinct values:
ID TYPES
1 A+B
2 B
3 A
4 A+C+D+E
5 B
6 A
7 A+B+C
8 B
9 D
10 A+D
As a first step, group your table in the query designer by ID and Types, so that duplicate rows collapse into one. Your table would go from this:
ID Types
1 A
1 B
1 B
2 B
3 A
4 A
4 A
4 A
to this:
ID Types
1 A
1 B
2 B
3 A
4 A
Then apply the same step as you did in your code above to combine the different types in one column:
Table.TransformColumns(#"Added Custom", {"Custom", each Text.Combine(List.Transform(_, Text.From), "+"), type text})
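To make the deduplicate-then-join idea concrete, here is a minimal sketch in Python rather than M. The function name combine_distinct and the sample rows are made up for illustration; it only mirrors the logic of the steps above and is not a Power Query implementation.

```python
from collections import OrderedDict

def combine_distinct(rows, sep="+"):
    """Join the distinct TYPES values of each ID, keeping first-seen order."""
    grouped = OrderedDict()
    for row_id, type_value in rows:
        seen = grouped.setdefault(row_id, [])
        if type_value not in seen:  # skip duplicates within the same ID
            seen.append(type_value)
    return [(row_id, sep.join(values)) for row_id, values in grouped.items()]

# Hypothetical sample: a few (ID, TYPES) rows in the shape of the question's table.
rows = [(1, "A"), (1, "B"), (2, "B"), (4, "A"), (4, "A"), (4, "C"), (5, "B"), (5, "B")]
result = combine_distinct(rows)
print(result)
```

In M itself, wrapping the list in List.Distinct before passing it to Text.Combine should have a similar effect without the extra grouping step, though that variant is not what the answer above describes.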
There is a table with a column that I would like to break into multiple records. For example
q)tab:([]a:1 2 3;b:(`a;`$"b c";`d);c:2 3 4)
q)tab
a b c
-------
1 a 2
2 b c 3
3 d 4
There is a space between b and c in the second entry of column b, I would like the table to become
a b c
-----
1 a 2
2 b 3
2 c 3
3 d 4
I tried
" " string vs exec b from tab
but it didn't work.
Any idea?
Since b is the column with multiple entries per row, you can count the entries in each value of b and repeat the other columns of that row accordingly. Then ungroup, as Terry mentioned, should work.
q)t:([]a:1 2 3;b:(`a;`b`c;`d);c:2 3 4)
q)![t;();0b;{x!(enlist({(count each x)#'y};`b)),/:x}cols t]
a b c
------------
,1 ,`a ,2
2 2 `b`c 3 3
,3 ,`d ,4
q)ungroup ![t;();0b;{x!(enlist({(count each x)#'y};`b)),/:x}cols t]
a b c
-----
1 a 2
2 b 3
2 c 3
3 d 4
EDIT: Realised after your comment that the input is different. I think this is what you want.
q)t:([]a:1 2 3;b:(`a;`$"b c";`d);c:2 3 4)
q)ungroup update`$" "vs'string b from t
a b c
-----
1 a 2
2 b 3
2 c 3
3 d 4
You would normally do this using ungroup:
q)ungroup([]a:1 2 3;b:((),`a;`b`c;(),`d);c:2 3 4)
a b c
-----
1 a 2
2 b 3
2 c 3
3 d 4
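As a language-neutral illustration of what the split-then-ungroup step does, here is a small Python sketch. The helper name split_and_ungroup and the list-of-dicts representation of the table are assumptions made for this example; it is not q code.

```python
def split_and_ungroup(rows, column):
    """Split a whitespace-separated field and emit one row per token."""
    expanded = []
    for row in rows:
        for token in str(row[column]).split():
            new_row = dict(row)       # copy the other columns unchanged
            new_row[column] = token   # one token per output row
            expanded.append(new_row)
    return expanded

# The question's table, with "b c" stored as a single value in column b.
tab = [
    {"a": 1, "b": "a", "c": 2},
    {"a": 2, "b": "b c", "c": 3},
    {"a": 3, "b": "d", "c": 4},
]
expanded = split_and_ungroup(tab, "b")
for row in expanded:
    print(row["a"], row["b"], row["c"])
```

Note that a row whose field is empty produces no output rows in this sketch.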
I have a 5x5 table:
a b c d e
a 1 2 3 4 5
b 3 5 7 2 6
c 1 3 4 6 1
d 4 4 1 7 8
e 6 7 2 1 6
where the headers are the strings.
I want to know how to reorder the table rows and columns using the headers
So, for example, if I wanted them to be in the order e b c a d, then this would be the table:
e b c a d
e 6 7 2 6 1
b 6 5 7 3 2
c 1 3 4 1 6
a 5 7 3 1 4
d 8 4 1 4 7
Let the table be defined as
T = table;
T.a = [1 3 1 4 6].';
T.b = [2 5 3 4 7].';
T.c = [3 7 4 1 2].';
T.d = [4 2 6 7 1].';
T.e = [5 6 1 8 6].';
And let the new desired order be
order = {'e' 'b' 'c' 'a' 'd'};
The table can be reordered using just indexing:
[~, ind] = ismember(order, T.Properties.VariableNames);
T_reordered = T(ind,order);
Note that:
To reorder only columns you'd use T_reorderedCols = T(:,order);
To reorder only rows you'd use T_reorderedRows = T(ind,:);
So in this example,
T =
a b c d e
_ _ _ _ _
1 2 3 4 5
3 5 7 2 6
1 3 4 6 1
4 4 1 7 8
6 7 2 1 6
T_reordered =
e b c a d
_ _ _ _ _
6 7 2 6 1
6 5 7 3 2
1 3 4 1 6
5 2 3 1 4
8 4 1 4 7
Here is a way to do it using indexing. You can indeed re-arrange the rows and columns using indices as you would for any array. In this case, I substitute each letter in the headers array with a number (originally [1 2 3 4 5]) and then, using a vector defining the new order [5 2 3 1 4], re-order the table. You could make some kind of lookup table to automate this when you deal with larger tables:
clc
clear
a = [1 2 3 4 5;
3 5 7 2 6;
1 3 4 6 1;
4 4 1 7 8;
6 7 2 1 6];
headers = {'a' 'b' 'c' 'd' 'e'};
%// Original order. Not used but useful to understand the idea... I think :)
OriginalOrder = 1:5;
%// New order
NewOrder = [5 2 3 1 4];
%// Create table
t = table(a(:,1),a(:,2),a(:,3),a(:,4),a(:,5),'RowNames',headers,'VariableNames',headers)
As a less cumbersome alternative to creating the table manually with the function table, you can use (thanks to @excaza) the function array2table, which saves a couple of steps:
t = array2table(a,'RowNames',headers,'VariableNames',headers)
Either way, re-arrange the table using the new indices:
New_t = t(NewOrder,NewOrder)
Output:
t =
a b c d e
_ _ _ _ _
a 1 2 3 4 5
b 3 5 7 2 6
c 1 3 4 6 1
d 4 4 1 7 8
e 6 7 2 1 6
New_t =
e b c a d
_ _ _ _ _
e 6 7 2 6 1
b 6 5 7 3 2
c 1 3 4 1 6
a 5 2 3 1 4
d 8 4 1 4 7
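The lookup-table idea mentioned above (mapping header names to positions so the new order can be given by name) can be sketched as follows. This is a Python illustration using plain nested lists, not MATLAB; the function name reorder is hypothetical.

```python
def reorder(matrix, headers, order):
    """Reorder rows and columns of a square matrix by header name."""
    index_of = {name: i for i, name in enumerate(headers)}  # name -> position
    idx = [index_of[name] for name in order]
    return [[matrix[r][c] for c in idx] for r in idx]

a = [
    [1, 2, 3, 4, 5],
    [3, 5, 7, 2, 6],
    [1, 3, 4, 6, 1],
    [4, 4, 1, 7, 8],
    [6, 7, 2, 1, 6],
]
headers = ["a", "b", "c", "d", "e"]
new_order = ["e", "b", "c", "a", "d"]

reordered = reorder(a, headers, new_order)
for name, row in zip(new_order, reordered):
    print(name, *row)
```

The printed rows agree with the New_t output shown above.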
I have searched and attempted to solve this puzzle myself (I've gotten close, but I've had no luck). I have a large table of values (composed of Sets of Values) that can have multiple combinations, but those combinations must be returned in the ID order.
I have not been able to get this to work in SQL.
Example Set:
(Sorry, I am not able to post an image, which would explain it better, so I'll keep it simple.)
Table (ID, Value): {(1,A),(1,B),(1,C),(2,D),(3,F),(3,G),(4,J),(5,S),(5,T),(5,U)}
RESULTS
ID VALUE
1 A
2 F
3 G
4 J
5 S
1 A
2 F
3 G
4 J
5 T
1 A
2 F
3 G
4 J
5 U
1 A
2 F
3 H
4 J
5 S
1 A
2 F
3 H
4 J
5 T
1 A
2 F
3 H
4 J
5 U
1 B
2 F
3 G
4 J
5 S
1 B
2 F
3 G
4 J
5 T
1 B
2 F
3 G
4 J
5 U
1 B
2 F
3 H
4 J
5 S
1 B
2 F
3 H
4 J
5 T
1 B
2 F
3 H
4 J
5 U
1 C
2 F
3 G
4 J
5 S
1 C
2 F
3 G
4 J
5 T
1 C
2 F
3 G
4 J
5 U
1 C
2 F
3 H
4 J
5 S
1 C
2 F
3 H
4 J
5 T
1 C
2 F
3 H
4 J
5 U
Here's the problem solved in dynamic SQL, without any cursors or loops.
IF OBJECT_ID('yourTable') IS NOT NULL
DROP TABLE yourTable;
CREATE TABLE yourTable (ID INT, Value CHAR(1));
INSERT INTO yourTable
VALUES (1,'A'),(1,'B'),(1,'C'),
(2,'D'),
(3,'F'),(3,'G'),
(4,'J'),
(5,'S'),(5,'T'),(5,'U');
DECLARE @row_number_cols VARCHAR(MAX),
@Aliased_Cols VARCHAR(MAX),
@Cross_Joins VARCHAR(MAX),
@Unpivot VARCHAR(MAX);
SELECT @row_number_cols = COALESCE(@row_number_cols + ',','') + col,
@Aliased_Cols = COALESCE(@Aliased_Cols + ',','') + CONCAT(col,' AS col',ID),
@Cross_Joins = COALESCE(@Cross_Joins,'') + CASE
WHEN ID = 1 THEN CONCAT(' FROM (SELECT * FROM yourTable WHERE ID = 1) AS ID',ID)
ELSE CONCAT(' CROSS JOIN (SELECT * FROM yourTable WHERE ID = ',ID,') AS ID',ID)
END,
@Unpivot = COALESCE(@Unpivot + ',','') + CONCAT('col',ID)
FROM yourTable A
CROSS APPLY (SELECT CONCAT('ID',ID,'.Value')) CA(col) --Just so I can reuse "col" in my code
GROUP BY A.ID,CA.col
SELECT @row_number_cols,@Aliased_Cols,@Cross_Joins,@Unpivot
SELECT
'WITH CTE_crossJoins
AS
(
SELECT ROW_NUMBER() OVER (ORDER BY ' + @row_number_cols + ') group_num,' + @Aliased_Cols +
@Cross_Joins + '
)
SELECT group_num,
val
FROM CTE_crossJoins
UNPIVOT
(
val for col IN (' + @Unpivot + ')
) unpvt
ORDER BY 1,2'
Results:
group_num val
-------------------- ----
1 A
1 D
1 F
1 J
1 S
2 A
2 D
2 G
2 J
2 S
3 A
3 D
3 G
3 J
3 T
4 A
4 D
4 F
4 J
4 T
5 A
5 D
5 F
5 J
5 U
6 A
6 D
6 G
6 J
6 U
7 B
7 D
7 G
7 J
7 S
8 B
8 D
8 F
8 J
8 S
9 B
9 D
9 F
9 J
9 T
10 B
10 D
10 G
10 J
10 T
11 B
11 D
11 G
11 J
11 U
12 B
12 D
12 F
12 J
12 U
13 C
13 D
13 F
13 J
13 S
14 C
14 D
14 G
14 J
14 S
15 C
15 D
15 G
15 J
15 T
16 C
16 D
16 F
16 J
16 T
17 C
17 D
17 F
17 J
17 U
18 C
18 D
18 G
18 J
18 U
I think this has been answered before here:
How to generate all possible data combinations in SQL?
The difference is that they essentially dropped the ID column; it should be easy to pull it through, though.
You can use a SQL window function to achieve this.
;WITH CTE AS
(
SELECT Id,
Value,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) RN
FROM Tbl
)
SELECT * FROM CTE ORDER BY RN, ID, VALUE
How does one remove 'u'(unicode) from all the values in a column in a DataFrame?
table.place.unique()
array([u'Newyork', u'Chicago', u'San Francisco'], dtype=object)
>>> df = pd.DataFrame([u'c%s'%i for i in range(11,21)], columns=["c"])
>>> df
c
0 c11
1 c12
2 c13
3 c14
4 c15
5 c16
6 c17
7 c18
8 c19
9 c20
>>> df['c'].values
array([u'c11', u'c12', u'c13', u'c14', u'c15', u'c16', u'c17', u'c18',
u'c19', u'c20'], dtype=object)
>>> df['c'].astype(str).values
array(['c11', 'c12', 'c13', 'c14', 'c15', 'c16', 'c17', 'c18', 'c19', 'c20'], dtype=object)
>>>
I have a 165 x 165 rank matrix such that each row has values ranging from 1-165. I want to parse each row and delete all values > 5, sort each row in increasing order, then replace the values 1-5 with the name of the column from the original matrix.
For example, for row k, the values 1, 2, 3, 4, 5 would remain after the first two transformations and would then be replaced by p, d, m, n, a.
I am assuming that your array consists of an array of arrays...
Neither Awk, Sed, nor Perl has multi-dimensional arrays. However, they can be emulated in Perl by using arrays of arrays.
$a[0]->[0] = xx;
$a[0]->[1] = yy;
[...]
$a[0]->[164] = zz;
$a[1]->[0] = qq;
$a[1]->[1] = rr;
[...]
$a[164]->[164] = vv;
Does this make sense?
I'm calling the row $x and columns $y, so an element in your array will be $array[$x]->[$y]. Is that good?
Okay, your column names will be in row $array[0], so if we find a value of five or less in $array[$x]->[$y], we know the column name is in $array[0]->[$y]. Is that good?
for my $x (1..164) { #First row is column names
for my $y (0..164) {
if ($array[$x]->[$y] <= 5) {
$array[$x]->[$y] = $array[0]->[$y];
}
}
}
I'm simply going through all the rows, and for each row, all the columns, and checking the value. If the value is less than or equal to five, I replace it with the column name.
I hope I'm not doing your homework for you.
This GNU sed solution might work although it will need scaling up as I only used a 10x10 matrix for testing purposes:
# { echo {a..j};for x in {1..10};do seq 1 10 | shuf |sed 'N;N;N;N;N;N;N;N;N;s/\n/ /g';done; }> test_data
# cat test_data
a b c d e f g h i j
4 5 9 3 6 2 10 8 7 1
3 7 4 2 1 6 10 5 8 9
10 9 3 1 2 7 8 5 6 4
5 10 4 9 7 8 1 3 6 2
8 6 5 9 1 4 3 2 7 10
2 8 9 3 5 6 10 1 4 7
3 9 8 2 1 4 10 6 7 5
3 7 2 1 8 6 10 4 5 9
1 10 8 3 6 5 4 2 7 9
7 2 3 5 6 1 10 4 8 9
# cat test_data |
sed -rn '1{h;d};s/[0-9]{2,}|[6-9]/0/g;G;s/\n|$/ &/g;s/$/&1 2 3 4 5 /;:a;s/^(\S*) (.*\n)(\S* )(.*)/\2\4\1\3/;ta;s/\n//;s/0[^ ]? //g;:b;s/([1-5])(.*)\1(.)/\3\2/;tb;p'
j f d a b
e d a c h
d e c j h
g j h c a
e h g f c
h a d i e
e d a f j
d c a h i
a h d g f
f b c h d
The sed command works as follows.
The first line of the data file, which contains the column headings, is stored in the hold space, and then the pattern space (the current line) is deleted. For all subsequent data lines, all numbers of two or more digits and the values 6 to 9 are converted to 0. The column names are appended to the data values, along with a newline. Spaces are inserted before the newline and at the end of the string. The data is transformed into a lookup, and the sorted values, i.e. 1 2 3 4 5, are prepended to it. The newline is removed, along with any 0 values and their associated lookups. Finally, the values 1 to 5 are replaced by the column names in the lookup.
EDIT:
I may have misunderstood the problem regarding sorting columns or rows. If so, it's a minimal fix: replace 1 2 3 4 5 by the original values and perform a numeric sort prior to replacing the numeric data with column names from the lookup.
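For readers who find the sed one-liner hard to follow, the same transformation (keep ranks 1 to 5, sort them, replace each with its column name) can be sketched in Python. This is an illustrative rewrite under the same reading of the problem as the sed answer, using the first two data rows of the test data above; the function name top_ranked_names is made up.

```python
def top_ranked_names(headers, row, keep=5):
    """Return the column names whose rank is <= keep, ordered by rank."""
    ranked = sorted(
        (rank, name) for rank, name in zip(row, headers) if rank <= keep
    )
    return [name for _, name in ranked]

headers = "a b c d e f g h i j".split()
data = [
    [4, 5, 9, 3, 6, 2, 10, 8, 7, 1],
    [3, 7, 4, 2, 1, 6, 10, 5, 8, 9],
]

names_per_row = [top_ranked_names(headers, row) for row in data]
for names in names_per_row:
    print(" ".join(names))
```

For these two rows the sketch prints "j f d a b" and "e d a c h", the same as the first two lines of the sed output.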