Combine two blocks of text with an equal number of lines in Emacs

Suppose I have the following lines of text
aa aa
bb bb
cc cc
xx xx
yy yy
zz zz
etc
Using emacs I want to combine the lines in columns:
aa aa xx xx
bb bb yy yy
cc cc zz zz
etc
Is this possible?

Select the second half of the file (the future new columns) as a rectangle and kill it with kill-rectangle (C-x r k).
Go to the end of the first line, add a space, and yank the rectangle with yank-rectangle (C-x r y). The emptied lines left behind by the kill can then be deleted.
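For comparison, the same column merge can be sketched outside the editor. This is only an illustration in Python, not part of the Emacs answer; the function name combine_columns and the assumption that the buffer splits exactly in half are mine.

```python
def combine_columns(lines, sep=" "):
    """Pair line i of the first half with line i of the second half."""
    if len(lines) % 2 != 0:
        raise ValueError("expected an even number of lines")
    half = len(lines) // 2
    # zip walks both halves in parallel, like yanking a rectangle beside the first block
    return [left + sep + right for left, right in zip(lines[:half], lines[half:])]


text = ["aa aa", "bb bb", "cc cc", "xx xx", "yy yy", "zz zz"]
combined = combine_columns(text)
print("\n".join(combined))
```

Unlike the rectangle commands, this sketch does not pad short lines, so the right-hand column only lines up when the left-hand lines have equal width.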

KDB+ Merging multiple update statements

q)d:([] f1:`a`b` ;f2:```c; m1:`x``z;m2:``y`z)
f1 f2 m1 m2
-----------
a     x
b        y
   c  z  z
I want to update the f1 and m1 columns from f2 and m2 respectively wherever f1 and m1 are null. In other words, I want to merge these two queries into a single update statement:
update f1:f2 from d where null f1
update m1:m2 from d where null m1
An alternative you might like to consider is fill (^), which fills the nulls in one list with the corresponding items from another list (in this case, the lists are columns of the table), e.g.
q)d:([] f1:`a`b` ;f2:```c; m1:`x``z;m2:``y`z)
q)update f2^f1,m2^m1 from d
f1 f2 m1 m2
-----------
a     x
b     y  y
c  c  z  z
You can also use the triadic vector conditional ?, which picks items from exprtrue where the boolean vector vb is true and from exprfalse otherwise:
?[vb;exprtrue;exprfalse]
The new query would be:
q)update f1:?[null f1;f2;f1] , m1:?[null m1;m2;m1] from d
f1 f2 m1 m2
-----------
a     x
b     y  y
c  c  z  z
Fill can be used to update nulls.
If you want to update table d in place, pass its name as a symbol:
update f2^f1,m2^m1 from`d
or
![`d;();0b;`f1`m1!((^;`f2;`f1);(^;`m2;`m1))]
If you want to display the output of update without updating the original table, then:
update f2^f1,m2^m1 from d
or
![d;();0b;`f1`m1!((^;`f2;`f1);(^;`m2;`m1))]
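To make the semantics of fill concrete, here is a minimal sketch in Python rather than q. It assumes None stands in for a q null symbol, and the helper name fill is hypothetical; it only mirrors what f2^f1 is described as doing above.

```python
def fill(fallback, values):
    """Mimic q's fallback^values: replace each None in values with the
    item at the same position in fallback."""
    return [f if v is None else v for f, v in zip(fallback, values)]


# The table d from the question, stored as a dict of columns (None = null)
d = {
    "f1": ["a", "b", None],
    "f2": [None, None, "c"],
    "m1": ["x", None, "z"],
    "m2": [None, "y", "z"],
}

# Equivalent in spirit to: update f2^f1,m2^m1 from d (d itself is left untouched)
updated = dict(d, f1=fill(d["f2"], d["f1"]), m1=fill(d["m2"], d["m1"]))
print(updated["f1"], updated["m1"])
```

Building a new dict corresponds to the non-in-place form of the update; assigning the result back to d would correspond to updating in place.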

plot categorization of strings, matlab

Edit: the suggested solution works, but this format is problematic when I have many long strings. Any suggestions?
Let's say I have categorized a vector of M strings into N groups, meaning each of the M strings is assigned a number between 1 and N indicating the category it belongs to. For example, if M=6 and N=3 I might have:
v = [ 'a' ; 'b' ; 'c' ; 'd' ; 'e' ; 'f' ]
c = [ 1 ; 2 ; 1 ; 1 ; 3 ; 2 ]
which indicates that a, c and d were all categorized to group "1". "e" was categorized to group 3.
I want to somehow plot - using Matlab - this categorization.
I am trying something along the lines of:
plot(v,'b--o')
set(gca,'xticklabel',c.')
but I need the plot to look more like a scatter plot. Sadly, it seems scatter does not work with strings. Any suggestions?
Also, the vector of strings might get very long. Does anyone know how to make the plot scrollable?
The following gets you a bar chart with your names as x-axis labels. Uncomment the other line for a scatter plot. In general, such a visualisation is probably not the right format for a very large number of words (very high M).
v = [ 'a' ; 'b' ; 'c' ; 'd' ; 'e' ; 'f' ];
c = [ 1 ; 2 ; 1 ; 1 ; 3 ; 2 ];
bar(c)
% scatter(1:length(c), c) % use this for a scatter plot
set(gca, 'xticklabel', v)
bar is usually slow. You can get a similar result more quickly and without Matlab binning things funnily by using plot.
Edit: I think you wanted the strings on the y axis.
plot(c,'bo')
ax = gca;
ax.XTick = 1:length(c);
ax.YTick = 0:max(c);
set(ax,'xticklabel',v)
view(-90,90)

Match cell arrays with different size based on two conditions in Matlab

RECCELL is a cell array with 8 columns and 30000 rows:
C1    C2    C3        C4  C5        C6       C7    C8
'AA'  1997  19970102  1   'BACHE'   'MORI'   148   127
'AA'  1997  19970108  2   'MORGAN'  []       1595  0
'AA'  1997  19970224  3   'KEMSEC'  'FATHI'  1315  297
CONCELL is a cell array with 4 columns and 70000 rows:
C1    C2    D3        D4
'AA'  1997  19970116  2,75
'AA'  1997  19970220  2,71
'AA'  1997  19970320  2,61
I would like to append the 4 columns of CONCELL to RECCELL, but only where the C1s match and where C3 and D3 (both dates) are as close as possible. In this example I would get:
C1    C2    C3        C4  C5        C6       C7    C8   C1    C2    D3        D4
'AA'  1997  19970102  1   'BACHE'   'MORI'   148   127  'AA'  1997  19970116  2,75
'AA'  1997  19970108  2   'MORGAN'  []       1595  0    'AA'  1997  19970116  2,75
'AA'  1997  19970224  3   'KEMSEC'  'FATHI'  1315  297  'AA'  1997  19970220  2,71
The first row of RECCELL corresponds to the first row of CONCELL.
The second row of RECCELL corresponds to the first row of CONCELL.
The third row of RECCELL corresponds to the second row of CONCELL.
The code I have so far is:
[~, indCon, indREC] = intersect(CONCELL(:,1), RECCELL(:,1));
REC_CON=[RECCELL(indREC,:),CONCELL(indCon,:)];
NO_REC_CON= RECCELL(setdiff(1:size(RECCELL,1), indREC),:);
It's wrong because I cannot use intersect for a string element and because I am not considering the second condition, which is to choose the closest dates.
Can someone help me? Thank you
I would suggest doing this inside a for loop, as the cells are very tall.
(Note: the dates (C3/D3) in the cell appear to be stored as doubles rather than strings, so they need to be converted to strings first before using datenum.)
n=size(RECCELL,1);
ind=zeros(n,1);
rd=datenum(num2str(cell2mat(CONCELL(:,3))),'yyyymmdd'); % convert double to string, then to a date number
for k=1:n
    a=find(ismember(CONCELL(:,1),RECCELL(k,1))==1); % find indices of matching C1s
    if ~isempty(a) % do only if there is a match for the C1s
        dnk=datenum(num2str(RECCELL{k,3}),'yyyymmdd'); % convert double to string, then to a date number
        [~,f]=min((rd(a)-dnk).^2); % find closest date of the subset a
        ind(k,1)=a(f); % assign index of closest match to ind
        % add CONCELL to RECCELL; be aware that other rows will now display
        % empty cells, and a row of RECCELL can keep 'growing'
        RECCELL(k,(end+1):(end+4))=CONCELL(ind(k,1),:);
    end
end
The vector ind contains the indices of the closest match in CONCELL for each entry in RECCELL. When it contains a 0, no match was found between the C1s.
Edit: One possible way to avoid increasing the number of columns of RECCELL when multiple CONCELL entries are added to the same RECCELL entry is the following, which adds only a single column to the RECCELL cell array:
n=size(RECCELL,1);
RECCELL{1,end+1}=[]; % to add a single empty column to RECCELL
ind=zeros(n,1);
rd=datenum(num2str(cell2mat(CONCELL(:,3))),'yyyymmdd'); % convert double to string
for k=1:n
    a=find(ismember(CONCELL(:,1),RECCELL(k,1))==1); % find indices of matching C1s
    if ~isempty(a) % do only if there is a match for the C1s
        dnk=datenum(num2str(RECCELL{k,3}),'yyyymmdd'); % convert double to string
        [~,f]=min((rd(a)-dnk).^2); % find closest date of the subset a
        ind(k,1)=a(f); % assign index of closest match to ind
        if isempty(RECCELL{k,end}) % if nothing is in this cell, add the CONCELL entry to it
            RECCELL{k,end}=CONCELL(ind(k,1),:);
        else % if something is already in, add the new CONCELL entry to the cell
            RECCELL{k,end}(end+1,1:4)=CONCELL(ind(k,1),:);
        end
    end
end
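The matching rule used by the loops above (same C1, then nearest date) can also be sketched independently of MATLAB. The Python below is only an illustration under assumptions: rows are plain tuples, dates are yyyymmdd integers, the decimal commas of D4 are written as decimal points, and the names to_date and closest_matches are hypothetical. It returns zero-based indices and None where the MATLAB code would leave a 0.

```python
from datetime import datetime


def to_date(yyyymmdd):
    """Convert a numeric yyyymmdd value such as 19970102 to a date."""
    return datetime.strptime(str(yyyymmdd), "%Y%m%d").date()


def closest_matches(rec_rows, con_rows):
    """For each rec row, return the index of the con row with the same key
    (column 0) and the nearest date (column 2), or None if no key matches."""
    by_key = {}
    for idx, row in enumerate(con_rows):
        by_key.setdefault(row[0], []).append((to_date(row[2]), idx))

    result = []
    for row in rec_rows:
        candidates = by_key.get(row[0])
        if not candidates:
            result.append(None)
            continue
        target = to_date(row[2])
        # smallest absolute gap in days wins; the first candidate wins ties
        _, best = min(candidates, key=lambda c: abs((c[0] - target).days))
        result.append(best)
    return result


rec = [("AA", 1997, 19970102), ("AA", 1997, 19970108),
       ("AA", 1997, 19970224), ("ZZ", 1997, 19970101)]
con = [("AA", 1997, 19970116, 2.75), ("AA", 1997, 19970220, 2.71),
       ("AA", 1997, 19970320, 2.61)]
matches = closest_matches(rec, con)
print(matches)
```

Grouping the CONCELL rows by key once, instead of searching the whole array for every RECCELL row, is a design choice of this sketch and not something the answer above does.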

Column - Most frequent letter in a group of 4 rows

I have this column in Excel:
V
V
F
V
C
F
F
F
...
I'm reading it with Matlab using
[~,txt] = xlsread('2012_15min.xls','JAN','B25:B2999');
Now I want to get a new column that gives me the most frequent letter in each group of 4 rows, so for the first 4 rows I will get V (in this example), and for the second group F.
So I will get a new column with:
V
F
...
I hope you can help me.
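You can use the function mode to find the most frequent occurrence. The only caveat is that mode does not work with chars, so convert to double first. Reshape txt to 4 rows by however many columns are needed, so that each column holds one group of 4, and then take the mode of each column. This assumes txt is a char array whose length is a multiple of 4; if xlsread returned a cell array, convert it first with char(txt).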
>> res = char( mode( double( reshape( txt, 4, [] ) ) ) ).'
res =
V
F
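The same grouping idea can be sketched in Python for readers who want to check the logic by hand. This is an illustration only; the function name group_modes is hypothetical. Ties are broken by taking the smallest character, which is how Matlab's mode resolves ties between equally frequent values.

```python
from collections import Counter


def group_modes(letters, size=4):
    """Return the most frequent letter of each consecutive group of `size`."""
    if len(letters) % size != 0:
        raise ValueError("number of letters must be a multiple of the group size")
    modes = []
    for start in range(0, len(letters), size):
        counts = Counter(letters[start:start + size])
        top = max(counts.values())
        # among the letters that reach the top count, keep the smallest one
        modes.append(min(ch for ch, n in counts.items() if n == top))
    return modes


column = ["V", "V", "F", "V", "C", "F", "F", "F"]
result = group_modes(column)
print(result)
```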

Extra and misaligned rows when importing data in SAS

I am trying to import data in SAS like:
A B C D E
x y z h i
s1 s2 s3 s4 s5
where A B C D and E are column names.
I have 240 columns in my dataset and the code I am using is:
data INFO;
    infile Attdata notab dlm='09'x dsd missover LRECL = 100000000;
    length A B C D E $200; /* I am importing 240 columns */
    input A B C D E;
run;
Whenever I import the data, some of the values of columns B, C, etc. get stacked below the rows of A:
A B C D E
x h i
s1 s2 s3 s5
y s4
z
Is there a way to fix this? Do I need to change the LRECL option? My data gets all jumbled after running this code. Might the problem be with the length statement?
It may have something to do with missover (rather than LRECL). I have found this paper useful in the past: http://www2.sas.com/proceedings/sugi26/p009-26.pdf
Are you saying in your question that data from line 1 is appearing on lines 3 and 4 and some data from line 2 is on line 3? I've never seen SAS do this before.
You may want to check your delimiter/end of line characters.
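One way to check the delimiters and end-of-line characters is to count the fields on each physical line before handing the file to SAS. The sketch below is in Python, not SAS, and the sample text, the function name field_count_report and the tab delimiter are assumptions for illustration. A record that was broken by a stray newline shows up as lines with fewer fields than the header.

```python
from collections import Counter


def field_count_report(raw, delimiter="\t"):
    """Return (Counter mapping field count -> number of lines, has_cr).

    has_cr is True when the text contains carriage returns, which hints at
    Windows-style or mixed line endings.
    """
    lines = raw.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty piece after a trailing newline
    counts = Counter(len(line.rstrip("\r").split(delimiter)) for line in lines)
    return counts, "\r" in raw


# Hypothetical sample: the third record is split across two physical lines
sample = "A\tB\tC\tD\tE\nx\ty\tz\th\ti\ns1\ts2\ns3\ts4\ts5\n"
counts, has_cr = field_count_report(sample)
print(dict(counts), has_cr)
```

If every line reports the same field count and the import is still misaligned, the cause is more likely in the infile options than in the file itself.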