How to load a specially formatted data file into matlab?

I need to load a data file, test.dat, into MATLAB. The contents of the data file look like this:
*a682 1233~0.2
*a2345 233~0.8 345~0.2 4567~0.3
*a3457 345~0.9 34557~1.2 34578~0.2 9809~0.1 2345~2.9 23452~0.9 334557~1.2 234578~0.2 19809~0.1 23452~2.9 3452~0.9 4557~1.2 3578~0.2 92809~0.1 12345~2.9 232452~0.9 33557~1.6 23478~0.6 198099~2.1 234532~2.9 …
How can I read this type of file into MATLAB and use a term such as *a2345 to identify a row, linked to its corresponding entries (here 233~0.8 345~0.2 4567~0.3)?
Thanks.

Because the rows have different lengths, you have to use a cell array or a structure, or pad a matrix with NaN or zeros. I chose a cell array. The asterisk is escaped in the pattern so that it is matched literally, and the entries are extracted with 'match' rather than 'split' so that each cell holds a complete pair such as 345~0.9. Here is the code:
datfile = 'test.dat';
text = fileread(datfile);
% Row ids such as *a682; the leading asterisk is escaped
row1 = regexp(text,'\*[a-z]?\d+','match');
% Text following each id; the piece before the first id is dropped
row2 = regexp(text,'\*[a-z]?\d+','split');
row2 = row2(2:end)';
data = cell(numel(row1),2);
data(:,1) = row1';
for i = 1:size(row2,1)
    % Each entry has the form number~number, e.g. 233~0.8
    data{i,2} = regexp(row2{i},'\d+~[0-9.]+','match');
end
This creates a cell array called data in which the first column of each row holds the id (for example *a682) and the second column holds a cell array with that row's entries. To access them you could use:
data{1,1}
to show the id,
data{1,2}
to show the cell of entries for that row, and
data{1,2}{1}
to show a specific entry.
This should work and is relatively simple.
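If the entries are needed as numbers rather than text, one possible follow-up is to split each entry at the tilde with sscanf. This is only a sketch: it assumes every entry is a string of the form 233~0.8, and the variable names entries and vals are made up for the illustration.

```matlab
% Hypothetical entries for one row, each of the form number~number
entries = {'233~0.8', '345~0.2', '4567~0.3'};

% One row per entry: first column is the key, second the value
vals = zeros(numel(entries), 2);
for k = 1:numel(entries)
    % sscanf returns a 2x1 column [key; value], so transpose it
    vals(k,:) = sscanf(entries{k}, '%d~%f')';
end
disp(vals)
```

With the cell array from the answer, entries would be something like data{2,2}.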

Related

Remove formula from column with python

I am trying to remove the formula from a column in an existing sheet with Python.
I tried to set the formula to None using the column object (column.formula = None).
It does not work and my column object remains unchanged. Does anyone have suggestions for solving this issue? Thank you!

This took me a bit to figure out, but seems like I've found a solution. Turns out that this is a 2-step process:
Update the column object to remove the formula (by setting column.formula to an empty string).
For each row in the sheet, update the cell within that column to remove the formula (set cell.value to an empty string and cell.formula to None).
Completing step 1 removes the formula from the column object, but the cell in each row will still contain the formula. That is why step 2 is needed: it removes the formula from the individual cell in each row.
Here's some example code in Python that does what I've described. (Be sure to update the id values to correspond to your sheet.)
STEP 1: Remove formula from the Column
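import smartsheet

# Assumption: the client is created from your own API access token
smartsheet_client = smartsheet.Smartsheet('YOUR_ACCESS_TOKEN')

column_spec = smartsheet.models.Column({
    'formula': ''
})
# Update column
sheetId = 3932034054809476
columnId = 4793116511233924
result = smartsheet_client.Sheets.update_column(sheetId, columnId, column_spec)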
STEP 2: Remove the formula from that cell in each row
Note: This sample code updates only one specific row. In your case, you'll need to update every row in the sheet. Build a row object for each row in the sheet (as shown below), then call smartsheet_client.Sheets.update_rows once, passing in the array of row objects you've built for all rows in the sheet. That way you call the API only once, which is the most efficient approach.
# Build new cell value
new_cell = smartsheet.models.Cell()
new_cell.column_id = 4793116511233924
new_cell.value = ''
new_cell.formula = None
# Build the row to update
row_to_update = smartsheet.models.Row()
row_to_update.id = 5225480965908356
row_to_update.cells.append(new_cell)
# Update row
sheetId = 3932034054809476
result = smartsheet_client.Sheets.update_rows(sheetId, [row_to_update])

How to insert a structure within a structure

I have a 1x1 structure called imu_data. It has one field called txyzrxyz1 whose value is a 4877x7 double. I just want to "copy and paste" row 62 into row 63 (double up that row) so that the field becomes a 4878x7 double. I've tried the following, along with other variations, without success:
extra_63 = imu_data.txyzrxyz1(63,:);
imu_data2.txyzrxyz1 = [{imu_data.txyzrxyz1(1:62,:) extra_63 imu_data.txyzrxyz1(63:end,:)}]
Thanks

You can index the row to duplicate twice while matrix indexing:
row_to_duplicate = 63;
yourdata = rand(100,10);
yourstruct.data = yourdata;
yourstruct.data = yourstruct.data([1:row_to_duplicate, row_to_duplicate:end],:)
For row 63, 1:row_to_duplicate creates the index vector 1:63 and row_to_duplicate:end creates the index vector 63:100 in this example. Concatenating them makes 63 occur twice, so that row is duplicated.
You were almost there: you only had to get rid of the {}'s and use ; instead of a space between the matrix entries, so that they are concatenated vertically rather than horizontally:
extra_63 = imu_data.txyzrxyz1(63,:);
imu_data2.txyzrxyz1 = [imu_data.txyzrxyz1(1:62,:); extra_63; imu_data.txyzrxyz1(63:end,:)]
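To convince yourself that the indexing approach does what is described, here is a small check on a toy matrix. The names A, B and r are made up for the example, and magic(5) merely stands in for the real data.

```matlab
A = magic(5);              % stand-in for the 4877x7 data
r = 3;                     % row to duplicate

% Index row r twice: rows 1..r followed by rows r..end
B = A([1:r, r:end], :);

disp(size(B))              % one more row than A
disp(isequal(B(r,:), B(r+1,:)))   % the duplicated rows are identical
```

The rows after the duplicate are simply shifted down by one, so B(r+2:end,:) equals A(r+1:end,:).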

Find and replace in matlab?

I want to do a find and replace all in MATLAB (as we do in MS Office).
https://www.dropbox.com/s/hxfqunjwhnvkl1f/matlab.mat?dl=0
I have a cell array LUT_HS_complete (it contains the identifier in column 1, the protein name in column 2 and a summary in column 3); this is my lookup table. I also have my protein-protein interaction data (named Second_layer, with identifiers in the first two columns and the score in column 3).
I want to replace the first two columns of Second_layer with the corresponding protein names from my lookup table.
I tried strmatch, but that didn't help.
Source_gene = Second_layer(:,1); Source_gene = regexprep(Source_gene,'[-/\s]','');
Target_gene = Second_layer(:,2); Target_gene = regexprep(Target_gene,'[-/\s]','');
Inter_score = Second_layer(:,3);
%%
for i=1:length(Source_gene(1:end,1));
SG = strmatch(Source_gene(i),LUT_HS_complete(1:end,1),'exact');
renamed_Source_gene(SG,1) = LUT_HS_complete(SG,2);
end
for j=1:length(Target_gene(1:end,1));
TG = strmatch(Target_gene(j),LUT_HS_complete(1:end,1),'exact');
renamed_Target_gene(TG,1) = LUT_HS_complete(TG,2);
end
If you could find a solution, it would be a great help.

Might this work for you?
renamed_Second_layer(:,1)=LUT_HS_complete(cellfun(@(x) find(strcmp(x,LUT_HS_complete(:,1))),Second_layer(:,1)),2);
renamed_Second_layer(:,2)=LUT_HS_complete(cellfun(@(x) find(strcmp(x,LUT_HS_complete(:,1))),Second_layer(:,2)),2);
renamed_Second_layer(:,3)=Second_layer(:,3);
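The cellfun/find version above assumes every identifier occurs exactly once in the lookup table. As an alternative sketch (not the answerer's method), ismember returns the lookup positions directly and leaves unmatched identifiers untouched. The toy data below is invented for illustration; LUT and pairs stand in for LUT_HS_complete and Second_layer.

```matlab
% Toy lookup table: identifier, protein name (hypothetical values)
LUT = {'id1','ProtA'; 'id2','ProtB'; 'id3','ProtC'};
% Toy interaction list: source id, target id, score
pairs = {'id2','id1',0.9; 'id3','id2',0.4; 'id1','idX',0.7};

renamed = pairs;
for col = 1:2
    % found(k) is true if the identifier is in the table,
    % idx(k) is its row in the table (0 if not found)
    [found, idx] = ismember(pairs(:,col), LUT(:,1));
    renamed(found,col) = LUT(idx(found),2);
end
disp(renamed)
```

Here 'idX' has no entry in the table, so it is kept as is rather than causing an error.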

I'm having trouble shuffling a deck of cards in Matlab. Need help to see where I went wrong

I currently have the deck of cards coded, but it is unshuffled. This is for programming the card game of War if it helps. I need to shuffle the deck, but whenever I do, it will only shuffle together the card numbers and the suits, not the full card. For example, I have A identified as an ace and the suits come after each number. A normal card would be "AH" (an ace of hearts) or "6D" (a six of diamonds). Instead, it will output "5A" as one of the cards, as in a 5 of aces. I don't know how to fix this, but the code that I currently have is this:
card_nums = ('A23456789TJQK')';
card_suits = ('HDSC')';
unshuffled_deck = [repmat(card_nums,4,1),repmat(card_suits,13,1)];
disp(unshuffled_deck)
shuffled_deck = unshuffled_deck(randperm(numel(unshuffled_deck)));
disp(shuffled_deck)
I would appreciate any help with this, and thank you very much for your time!

You're creating a random permutation of all of the elements from both columns of unshuffled_deck combined. Instead you need to create a random permutation of the rows of unshuffled_deck:
shuffled_deck = unshuffled_deck(randperm(size(unshuffled_deck,1)),:);
The call to size gives you the number of rows in the deck array, then we get a random permutation of the row indices, and copy the row (value, suit) as a single entity.
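A quick way to check that the row-wise shuffle keeps each card intact is to verify that the first column still contains only ranks, the second only suits, and that the shuffled deck has the same cards as the original. This is just a sanity-check sketch; the variable names deck and shuffled are made up here.

```matlab
card_nums = ('A23456789TJQK')';
card_suits = ('HDSC')';
deck = [repmat(card_nums,4,1), repmat(card_suits,13,1)];

% Permute whole rows so rank and suit stay together
shuffled = deck(randperm(size(deck,1)), :);

% Column 1 holds only ranks, column 2 only suits
ranks_ok = all(ismember(shuffled(:,1), card_nums));
suits_ok = all(ismember(shuffled(:,2), card_suits));
% Same set of cards, just in a different order
same_cards = isequal(sortrows(shuffled), sortrows(deck));
disp([ranks_ok, suits_ok, same_cards])
```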
Here's a version using a structure array in response to @Carl Witthoft's comment. I was afraid it would add too much complexity to the solution, but it really isn't bad:
card_nums = ('A23456789TJQK')';
card_suits = ('HDSC')';
deck_nums = repmat(card_nums,4,1);
deck_suits = repmat(card_suits,13,1);
cell_nums = cellstr(deck_nums).'; %// Change strings to cell arrays...
cell_suits = cellstr(deck_suits).'; %// so we can use them in struct
%// Construct a struct array with fields 'value' and 'suit'
unshuffled_deck = struct('value',cell_nums,'suit',cell_suits);
disp('unshuffled deck:');
disp([unshuffled_deck.value;unshuffled_deck.suit]);
%// Shuffle the deck using the number of elements in the structure array
shuffled_deck = unshuffled_deck(randperm(numel(unshuffled_deck)));
disp('shuffled deck:');
disp([shuffled_deck.value; shuffled_deck.suit]);
Here's a test run:
unshuffled deck:
A23456789TJQKA23456789TJQKA23456789TJQKA23456789TJQK
HDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSCHDSC
shuffled deck:
4976TT93KTJQJATK953A75QA82Q6226K5J784J4A3372486K859Q
CHSSSHCDSCSSHDDCDSHHCDHSDDCDHCCHHCHHHDDCSCDSSCHDSCSD
To access an individual card, you can do:
>> shuffled_deck(2)
ans =
scalar structure containing the fields:
value = 9
suit = H
Or you can access the individual fields:
>> shuffled_deck(2).value
ans = 9
>> shuffled_deck(2).suit
ans = H
Unfortunately, I don't know of any way to simply index the struct array and get, for instance, 9H as you would in a regular array using disp(shuffled_deck(2,:)). In this case, the only option I know of is to explicitly concatenate each field:
disp([shuffled_deck(2).value,shuffled_deck(2).suit]);
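If you want every card as a two-character string at once, one option is to apply the same concatenation to each element with arrayfun. This is a sketch on a three-card toy deck; deck and labels are names invented for the example.

```matlab
% Small struct array in the same layout as the deck above
deck = struct('value', {'A','9','K'}, 'suit', {'H','D','S'});

% Concatenate value and suit for every element of the struct array
labels = arrayfun(@(c) [c.value, c.suit], deck, 'UniformOutput', false);

disp(labels{2})   % second card as a two-character string
```

The result is a cell array of strings, so labels{k} gives the k-th card in the "9H" style of the original character array.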

Is there a quick way to assign unique text entries in an array a number?

In MATLAB, I have several data vectors that contain text. For example:
speciesname = [species1 species2 species3];
genomelength = [8 10 5];
gonometype = [RNA DNA RNA];
I realise that to make a plot, arrays must be numerical. Is there a quick and easy way to assign unique entries in an array a number, for example so that RNA = 1 and DNA = 2? Note that some arrays might not be binary (i.e. have more than two options).
Thanks!

There is a quick way to do it, but I'm not sure your plots will be very intelligible if you use numbers instead of words.
You can build an array of the unique entries like this:
u = unique(gonometype);
The corresponding number array is then just 1:length(u).
As you go through your data, the number of the current word is:
find(u == current_name);
For your particular case (text entries) you will need to use cell arrays and strcmp instead of ==:
gonometype = {'RNA', 'DNA', 'RNA'};
u = unique(gonometype)
u =
'DNA' 'RNA'
current = 'RNA';
find(strcmp(u, current))
ans =
2
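To get the numbers for all entries in one step, the third output of unique can be used instead of calling find for each entry. This is a sketch along the same lines as the answer; the variable name codes is made up. Note that the numbering follows the sorted order of the unique entries, so here DNA becomes 1 and RNA becomes 2.

```matlab
gonometype = {'RNA', 'DNA', 'RNA'};

% u holds the sorted unique entries, codes the index of each
% original entry into u, so that u(codes) reproduces the input
[u, ~, codes] = unique(gonometype);

disp(u)
disp(codes(:)')   % one number per original entry
```

The same call works when there are more than two distinct entries.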