Putting keyword data into a csv file in MATLAB

Given a table of the following format in MATLAB:
userid | itemid | keywords
A = [  3  10  'book'
       3  10  'briefcase'
       3  10  'boat'
      12  20  'windows'
      12  20  'picture'
      12  35  'love'
       4  10  'day'
      12  10  'working day'
     ... ...  ... ];
where A is a table of size 58000×3. I want to write the data to a csv file with the following format:
csv.file
itemid  keywords
10      book, briefcase, boat, day, working day, ...
20      windows, picture, ...
35      love, ...
where the list of itemids is stored in Iids = [10,20,35,...].
I would like to avoid using loops for this since, as you can imagine, the matrix is large. Any idea is appreciated.

I wasn't able to think of a solution without loops, but you can optimize your loop by:
using logical indexing
running the loop only M times (where M is the number of unique itemid values) instead of N times (where N is the number of rows in your table).
The solution I came up with is this.
First of all, create your table
A = table([3;3;3;12;12;12;4;12], ...
    [10;10;10;20;20;35;10;10], ...
    {'book','briefcase','boat','windows','picture','love','day','working day'}', ...
    'VariableNames', {'userid','itemid','keywords'});
Select the unique values for column itemid (your Iids):
Iids=unique(A.itemid);
Create a new, empty, table which will contain the results:
NewTable=table();
And now the minimal loop I've come up with:
for id = Iids'
    % select rows with the given itemid value
    RowsWithGivenId = A(A.itemid==id, :);
    % create a new row in NewTable with the id and the (joined together) keywords from the selected rows
    NewTable = [NewTable; table(id, {strjoin(RowsWithGivenId.keywords, ', ')})];
end
Finally, set the new column names in NewTable:
NewTable.Properties.VariableNames = {'itemid','keywords'};
NewTable now contains one row per itemid, with all of its keywords joined together.
Please note: since the keywords in the new table are themselves separated by commas, a csv file is not the format I recommend. If you write it with writetable(NewTable,'myfile.csv'), the commas inside the keywords column clash with the comma delimiter of the file. By using ';' instead of the separating comma in strjoin(), you get a nicer format; see the sketch below.
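A minimal sketch of that variant, re-using A and Iids from above (the file name 'myfile.csv' is just an example):
% join the keywords with '; ' so they don't collide with the csv comma delimiter
NewTable = table();
for id = Iids'
    RowsWithGivenId = A(A.itemid==id, :);
    NewTable = [NewTable; table(id, {strjoin(RowsWithGivenId.keywords, '; ')})];
end
NewTable.Properties.VariableNames = {'itemid','keywords'};
writetable(NewTable, 'myfile.csv');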

Related

Query Importrange with concat - adding 2 of the returned columns together

I am trying to import specified data using QUERY + IMPORTRANGE, but at the same time I want to reduce the need for additional calculation columns by using CONCAT or something similar to add 2 columns together with a space in between, i.e. first name 'bob' and last name 'smith' return 'bob smith' in 1 column.
=QUERY({IMPORTRANGE("https://docs.google.com/spreadsheets/d/1oaZP3-p1cI4d1QyLQ2qM5sMwnVGz8S0bhe29W4QqH6g/edit#gid=1908577977","Sheet7!A2:c"),"select Col1&" "&Col2,Col3",0})
I've tried the above, but it returns a formula parse error.
https://docs.google.com/spreadsheets/d/1oaZP3-p1cI4d1QyLQ2qM5sMwnVGz8S0bhe29W4QqH6g/edit?usp=sharing
in post-IMPORTRANGE you can join two columns only like this:
=FLATTEN(QUERY(TRANSPOSE(QUERY(
IMPORTRANGE("13Ptmj3sejlOADvwhgfBPxRy_H-RGCxLX4r2jecbceIE", "Sheet7!A2:C"),
"select Col1,Col2", )),,9^9))
so for 3 columns:
={FLATTEN(QUERY(TRANSPOSE(QUERY(
IMPORTRANGE("13Ptmj3sejlOADvwhgfBPxRy_H-RGCxLX4r2jecbceIE", "Sheet7!A2:C"),
"select Col1,Col2", )),,9^9)),
IMPORTRANGE("13Ptmj3sejlOADvwhgfBPxRy_H-RGCxLX4r2jecbceIE", "Sheet7!C2:C")}

Min value with GROUP BY in Power BI Desktop

id  datetime             new_column           datetime_rankx
1   12.01.2015 18:10:10  12.01.2015 18:10:10  1
2   03.12.2014 14:44:57  03.12.2014 14:44:57  1
2   21.11.2015 11:11:11  03.12.2014 14:44:57  2
3   01.01.2011 12:12:12  01.01.2011 12:12:12  1
3   02.02.2012 13:13:13  01.01.2011 12:12:12  2
3   03.03.2013 14:14:14  01.01.2011 12:12:12  3
I want to make a new column which will hold the minimum datetime value for each row's id group.
How can I do this in Power BI Desktop using a DAX query?
Use this expression:
NewColumn =
CALCULATE(
    MIN(Table[datetime]),
    FILTER(Table, Table[id] = EARLIER(Table[id]))
)
In Power BI, using a table with your data, this produces the new_column values shown in the sample above.
UPDATE: Explanation and EARLIER function usage.
Basically, the EARLIER function gives you access to the values of a different row context.
When you use the CALCULATE function, it creates a row context over the whole table; theoretically, it iterates over every table row. The same happens when you use the FILTER function: it iterates over the whole table and evaluates every row against the filter condition.
So far we have two row contexts: the one created by CALCULATE and the one created by FILTER. Note that FILTER uses EARLIER to get access to CALCULATE's row context. Having said that, in our case, for every row in the outer (CALCULATE's) row context, FILTER returns the set of rows that match the current id of the outer context.
If you have a programming background, this might make it clearer: it is similar to a nested loop.
Hope this Python code conveys the main idea behind this:
outer_context = ['row1', 'row2', 'row3', 'row4']
inner_context = ['row1', 'row2', 'row3', 'row4']

for outer_row in outer_context:
    for inner_row in inner_context:
        if inner_row == outer_row:  # this line is what FILTER and EARLIER do
            # calculate the min datetime using the filtered rows
            ...
UPDATE 2: Adding a ranking column.
To get the desired rank you can use this expression:
RankColumn =
RANKX(
    CALCULATETABLE(Table, ALLEXCEPT(Table, Table[id])),
    Table[datetime],
    Table[datetime],
    1
)
Applied to the sample data, this reproduces the datetime_rankx column shown in the question.
Let me know if this helps.

Get substring into a new column

I have a table that contains a column with data in the following format - let's call the column "title" and the table "s":
title
ab.123
ab.321
cde.456
cde.654
fghi.789
fghi.987
I am trying to get a unique list of the characters that come before the "." so that I end up with this:
ab
cde
fghi
I have tried selecting the initial column into a table, then doing an update to create a new column holding the position of the dot, using "ss", something like this:
t: select title from s
update thedot: (title ss `.)[0] from t
I was then going to add a third column that takes the first N characters of "title", where N is the value stored in the "thedot" column.
All I get when I try the update is a "type" error.
Any ideas? I am very new to kdb, so no doubt I'm doing something simple in a very silly way.
The reason you get the type error is that ss only works on the string type, not symbols. Also, ss is not a vector-based function, so you need to combine it with each (').
q)update thedot:string[title] ss' "." from t
title    thedot
---------------
ab.123   2
ab.321   2
cde.456  3
cde.654  3
fghi.789 4
fghi.987 4
There are a few ways to solve your problem:
q)select distinct(`$"." vs' string title)[;0] from t
x
----
ab
cde
fghi
q)select distinct(` vs' title)[;0] from t
x
----
ab
cde
fghi
You can read here for more info: http://code.kx.com/q/ref/casting/#vs
An alternative is to make use of the 0: operator to parse around the "." delimiter. This operator is especially useful if you have a fixed number of 'columns', as in a csv file. In this case, where there is a fixed number of columns and we only want the first, the list of distinct characters before the "." can be returned with:
exec distinct raze("S ";".")0:string title from t
`ab`cde`fghi
OR:
distinct raze("S ";".")0:string t`title
`ab`cde`fghi
Where "S " defines the types of each column and "." is the record delimiter. For records with differing number of columns it would be better to use the vs operator.
A variation of WooiKent's answer using each-right (/:):
q)exec distinct (` vs/:title)[;0] from t
`ab`cde`fghi

Can we edit a row by updating its sequence number 3 times, inserting those 3 rows into an array, and then inserting the array into the table?

Suppose I fetch valid rows from a table where marks_colm = '300' and we get 100 rows.
For each fetched row, I'd like to:
create 3 new rows:
increase the max of sequence_column by 1 and set marks = '350'
again increase the max of sequence_column by 1 and set marks = '351'
again increase the max of sequence_column by 1 and set marks = '352'
copy these three rows to an array
insert the whole array into the table
Example
input row:
Name1 ... RollNo31.... sequence5 ... marks300
output should be
3 output rows for each input row above:
Name1 ... RollNo31.... sequence6 ... marks350
Name1 ... RollNo31.... sequence7 ... marks351
Name1 ... RollNo31.... sequence8 ... marks352
How can I achieve this?
I believe you can achieve your goal using a multi-row insert. Note that, because multiple errors may be encountered when inserting multiple rows, you must use the GET DIAGNOSTICS statement to retrieve details of any errors that occur; DSNTIAR will be insufficient.
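As a rough sketch only - this assumes DB2 for z/OS embedded SQL, and the table name STUDENT, the host-variable arrays and the row count are all made-up placeholders:
-- :HVA-NAME, :HVA-ROLLNO, :HVA-SEQ, :HVA-MARKS are host-variable arrays,
-- declared in the host program and filled with the 3 generated rows per fetched row
INSERT INTO STUDENT (NAME, ROLLNO, SEQUENCE_COLUMN, MARKS_COLM)
  VALUES (:HVA-NAME, :HVA-ROLLNO, :HVA-SEQ, :HVA-MARKS)
  FOR :NUM-ROWS ROWS
  NOT ATOMIC CONTINUE ON SQLEXCEPTION;

-- per-row error details, instead of DSNTIAR
GET DIAGNOSTICS :NUM-CONDITIONS = NUMBER;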

Perl + PostgreSQL-- Selective Column to Row Transpose

I'm trying to find a way to use Perl to further process PostgreSQL output. If there's a better way to do this via PostgreSQL, please let me know. I basically need to pick certain columns (Realtime, Value) and concatenate their values into a single row per group, while keeping ID and CAT.
First time posting, so please let me know if I missed anything.
Input:
ID  CAT  Realtime  Value
A   1    time1     55
A   1    time2     57
B   1    time3     75
C   2    time4     60
C   3    time5     66
C   3    time6     67
Output:
ID  CAT  Time         Values
A   1    time1,time2  55,57
B   1    time3        75
C   2    time4        60
C   3    time5,time6  66,67
You could do this most simply in Postgres like so (using array columns):
CREATE TEMP TABLE output AS
SELECT id, cat, ARRAY_AGG(realtime) AS time, ARRAY_AGG(value) AS values
FROM input
GROUP BY id, cat;
Then select whatever you want out of the output table.
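For example (values is a reserved word in Postgres, so that column has to be double-quoted when you reference it):
SELECT id, cat, time, "values"
FROM output;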
SELECT id
, cat
, string_agg(realtime, ',') AS realtimes
, string_agg(value, ',') AS values
FROM input
GROUP BY 1, 2
ORDER BY 1, 2;
string_agg() requires PostgreSQL 9.0 or later and concatenates all values into a delimiter-separated string - while array_agg() (v8.4+) creates an array out of the input values.
About 1, 2 - I quote the manual on the SELECT command:
GROUP BY clause
expression can be an input column name, or the name or ordinal number
of an output column (SELECT list item), or ...
ORDER BY clause
Each expression can be the name or ordinal number of an output column
(SELECT list item), or
Emphasis mine. So that's just notational convenience. Especially handy with complex expressions in the SELECT list.
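So, for the query above, the positional and the explicit spelling are interchangeable - a minimal sketch:
SELECT id
     , cat
     , string_agg(value, ',') AS values
FROM input
GROUP BY id, cat   -- same as: GROUP BY 1, 2
ORDER BY id, cat;  -- same as: ORDER BY 1, 2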