Splitting a string into various combinations - matlab

I need some direction on splitting a string into various combinations. My actual requirement is to split an integer, but since an integer can't be split directly, I've converted it to a string.
For example, I have the string "123456" and I want to split it like this:
12 34 56
123 45 6
12 345 6
12 3 456
and likewise. One more problem is that the size of the string can vary. As I said, these are actually integers, so they can have anywhere from 4 to 7-8 digits, and the length of the string to be split varies accordingly.
I currently don't have any code for this. I've only tried simple splitting operations in the command window, and couldn't think of a way to get the required result. Please give me some direction on what I can do.
Thanks.

First, use the num2str() function to convert the integer value to a string. Then use the length() function to determine how many digits the number has. With the length known, you can split the digits up in various ways. The example below only splits into groups of two, but you can adjust len as desired.
val=123456;
str=num2str(val);
i=1;
k=1;
len=2; % split the digits into groups of 2
parts=[]; % numeric value of each group
while(i+len-1<=length(str)) % a full group of len digits remains
    parts(k)=str2double(str(i:i+len-1));
    i=i+len;
    k=k+1;
end
if(i<=length(str))
    parts(k)=str2double(str(i:end)); % catches the remainder
end
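
The loop above produces a single fixed-size grouping. To enumerate every combination the question describes, one option is to choose cut positions between the digits. The following is a minimal sketch in Python rather than MATLAB, intended only as an illustration; the function name all_splits is hypothetical.

```python
from itertools import combinations

def all_splits(digits):
    """Yield every way to cut `digits` into contiguous, non-empty groups."""
    n = len(digits)
    for cut_count in range(n):
        # Choose where to cut: positions 1..n-1 lie between adjacent digits.
        for cuts in combinations(range(1, n), cut_count):
            bounds = (0, *cuts, n)
            yield [digits[a:b] for a, b in zip(bounds, bounds[1:])]

splits = list(all_splits(str(123456)))
three_groups = [s for s in splits if len(s) == 3]  # e.g. ['12', '34', '56']
```

Because each of the gaps between digits is either cut or not, a string of n digits has 2^(n-1) such splits, so 4 to 8 digits stays small enough to enumerate directly.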

Related

How can I print the ascii value of an input in Brainfuck?

What I want is a Brainfuck program that prints the ASCII value of its input. For example, typing an input of "a" should give an output of 97. The Python equivalent of this is print(ord(input())).
What I'm thinking is that once I get the input with the , command, I can split the input value's digits into separate cells, and then print each cell individually. Say you type in an input of a. The , command will store the ASCII value of a, which is 97, in the first cell (cell 0). Then I run some algorithm that splits the 97 into its individual digits. So, in this case, cell 1 will have a value of 0 (because 97 has a hundreds digit of 0), cell 2 will have a value of 9, and cell 3 will have a value of 7. Then we can add 48 to each of those cells (0 has an ASCII value of 48) and print each cell individually, starting from cell 1 (the hundreds place).
The problem I'm facing is writing the digit separation algorithm. I can't seem to make it work. My idea is to subtract 100 from the original number until it is less than 100 while keeping track of how many times 100 has been subtracted, then repeatedly subtract 10, and finally we are left with the ones place. But I have no idea how to detect when the number falls below 100 or 10. Any suggestions or ideas? Thanks for the help in advance.
What you are trying to implement is called "divmod". divmod is a function that divides two numbers (in your case positive integers) and stores both the quotient and the remainder. Implementations of this in Brainfuck exist: Divmod algorithm in brainfuck
Good luck!
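
To make the link between the repeated-subtraction idea and divmod concrete, here is a small Python sketch (not Brainfuck) of the digit separation described in the question. The function name split_digits is hypothetical, and the input is assumed to be a single cell value in the range 0 to 255.

```python
def split_digits(value):
    """Split a value in 0..255 into (hundreds, tens, ones) by repeated subtraction."""
    digits = []
    for place in (100, 10):
        count = 0
        # Subtract `place` while it still fits; `count` ends up as the quotient.
        while value >= place:
            value -= place
            count += 1
        digits.append(count)
    digits.append(value)  # what is left over is the ones digit
    return tuple(digits)

hundreds, tens, ones = split_digits(ord("a"))
# Adding 48 turns each digit into its ASCII character, as the question proposes.
text = "".join(chr(48 + d) for d in (hundreds, tens, ones))
```

Each inner loop computes exactly what divmod(value, place) would return, which is why a Brainfuck divmod routine covers the "has it dropped below 100 yet" check.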

4 bit hash from String in Swift

I have a 16-element array holding colors that match my UI. Based on the user’s entered name (Firstname + Lastname) I want to select a color (an index into the array, 0-15). So not a random color, but a color that's always the same for the same name. I figure I need to calculate a 4-bit (0-15) hash of the String. Researching the web, I find lots of hashing libraries, e.g. for MD5. But what would be a good approach to get a 4-bit hash number?
There are many possible solutions. You just need a way to hash your string into a number and then use % 16 to reduce that final number into an index in your desired range.
Here's one approach that sums up the bytes of the string's UTF-8 encoding to come up with a total and then uses % 16.
extension String {
    var fourBitHash: Int {
        return self.utf8.reduce(0) { $0 + Int($1) } % 16
    }
}
let colorIndex = "John R Smith".fourBitHash
print(colorIndex)
This will always give the same result for a given string.
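
For comparison, the same byte-sum idea can be sketched in Python. This is only an illustration of the technique in the Swift snippet; the function name four_bit_hash is hypothetical.

```python
def four_bit_hash(name):
    """Sum the UTF-8 bytes of `name` and reduce the total to an index in 0..15."""
    return sum(name.encode("utf-8")) % 16

color_index = four_bit_hash("John R Smith")
```

A plain byte sum ignores character order, so anagrams such as "ab" and "ba" always land on the same index.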
The problem is that you want the color to stay the same for a particular name, but you also want names to be distinguishable by those 16 colors.
Since the alphabet alone has 26 letters, you have to pack all the different combinations into 16 colors.
There is no way to get such a hash without collisions (4 bits cannot even encode a single letter), so sooner or later two different names shown side by side, where you would want different colors, will hash to the same color.
Assigning colors randomly is probably not a bad approach after all. Another option is to keep a growing name-to-color dictionary (which can be stored on disk). Choose for yourself.

How can i get 6 digits after comma (matlab)?

I read some comma-separated values from a text file.
-8.618643,41.141412
-8.639847,41.159826
...
I wrote the script below:
get_in = zeros(lendata,2);
nums = str2num(line); % automatic comma separation (two values)
for x=1:2
get_in(i,x)=nums(x);
end
It automatically rounds the numbers. For example, the first row becomes "-8.6186 , 41.1414".
How can I avoid the rounding? I want to keep 6 digits after the decimal point.
I tried str2double after splitting the line on the comma delimiter, and I tried the Import Data tool, but the values were always rounded to 4 digits, too.
As one of the replies has already said, the values aren't actually rounded, just the displayed values (for ease of reading them). As suggested, if you just enter 'format long' into the command window that should help.
The following link might help with displaying individual values to certain decimal places though: https://uk.mathworks.com/matlabcentral/newsreader/view_thread/118222
It suggests using the sprintf function. For example, sprintf('%.6f', data) returns the value of data formatted with 6 decimal places.
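
The difference between the stored value and the displayed value is not specific to MATLAB. As a rough illustration, here is a Python sketch that parses the first row from the question and formats it with different precisions; the variable names are made up for the example.

```python
line = "-8.618643,41.141412"

# Parsing keeps the full value; nothing is rounded at this point.
lon, lat = (float(field) for field in line.split(","))

# Only the chosen display format decides how many digits are shown.
short_form = f"{lon:.4f}, {lat:.4f}"
long_form = f"{lon:.6f}, {lat:.6f}"
```

Here short_form corresponds to what the default short display shows, while long_form shows all six decimals of the same stored numbers.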

Matlab - Splitting a column into two (efficiently)

I had previously written some code to split 3 columns into 4, but it was very inefficient and time-consuming. As I am working with millions of rows, it wasn't suitable. (My previous code is below.)
tline = fgetl(fid);
ID=tline(1:4);
IDN = str2double(ID);
Day=tline(6:8);
DayN = str2double(Day);
HalfHour=tline(9:10);
HalfHourN = str2double(HalfHour);
Usage=tline(12:end);
UsageN = str2double(Usage);
There must be a more efficient and quicker way of doing this?
Going back to basics, I have produced an x-by-3 matrix, but I require an x-by-4 matrix.
To show what I am trying to do, examining one row -
I am trying to change
1001 36501 1005
to
1001 365 01 1005
Any help would be much appreciated!
Edit:
The second column I am trying to divide into two, is always composed of 5 characters. I am trying to get the first 3 characters into their own column, likewise for the remaining characters.
What might take time in your case is actually the use of the str2double function. It is known that this built-in function becomes very slow when the data set is large. You might try to get rid of it if possible.
You can use modulo:
ans = (36501 - mod(36501,100))/100
This gives you 365.
If you want the 01 part, it is mod(36501,100), which returns 1.
This effectively splits your second column into two different numbers, which you can then rename and so on.
On second thought, if all the numbers in your second column have 5 digits, this can be extremely efficient, since MATLAB computes mod as b = a - m.*floor(a./m);
See http://uk.mathworks.com/help/matlab/ref/mod.html. It should work on vectors (i.e. your whole second column).
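
The same arithmetic can be sketched in Python, where divmod returns the quotient and the remainder in one call. The row values come from the question; the variable names are hypothetical, and the combined column is assumed to always be a 5-digit number.

```python
rows = [(1001, 36501, 1005)]  # (ID, day + half-hour, usage)

# divmod(code, 100) splits 36501 into (365, 1): the day and the half-hour.
split_rows = [
    (row_id, *divmod(code, 100), usage)
    for row_id, code, usage in rows
]
```

The half-hour comes back as the number 1 rather than the text "01", so any leading zero has to be restored at display time if it matters.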

how to find all the possible longest common subsequence from the same position

I am trying to find all the possible longest common subsequences that occur at the same positions in multiple fixed-length strings (there are 700 strings in total, and each string has 25 letters). A longest common subsequence must contain at least 3 letters and belong to at least 3 strings. So if I have:
String test1 = "abcdeug";
String test2 = "abxdopq";
String test3 = "abydnpq";
String test4 = "hzsdwpq";
I need the answer to be:
String[] Answer = ["abd", "dpq"];
My one problem is that this needs to be as fast as possible. I tried to find the answer with a suffix tree, but the suffix tree method gives ["ab","pq"]. A suffix tree can only find contiguous substrings shared by multiple strings. The standard longest common subsequence algorithm cannot solve this problem either.
Does anyone have any idea on how to solve this with low time cost?
Thanks
I suggest you cast this into a well known computational problem before you try to use any algorithm that sounds like it might do what you want.
Here is my suggestion: convert this into a graph problem. For each position in the string you create a set of nodes (one for each unique letter at that position among all the strings in your collection, so 700 nodes if all 700 strings differ at the same position). Once you have created all the nodes for each position, you go through your set of strings looking for pairs of positions whose letter combination occurs in at least 3 strings. In your example we would look first at positions 1 and 2 and see that three strings contain "a" in position 1 and "b" in position 2, so we add a directed edge from the node "a" in the first set of nodes to the node "b" in the second set of nodes. You do this for every pair of positions and every combination of letters in those two positions until you have added all necessary links.
Once you have your final graph, you must look for the longest path; I recommend looking at the Wikipedia article here: Longest Path. In our case we will have a directed acyclic graph, so you can solve it in linear time! The preprocessing should be quadratic in the number of string positions, since I imagine your alphabet is of fixed size.
P.S: You sent me an email about the biclustering algorithm I am working on; it is not yet published but will be available sometime this year (fingers crossed). Thanks for your interest though :)
You may try to use hashing.
Each string has at most 25 characters. It means that it has 2^25 subsequences. You take each string, calculate all 2^25 hashes. Then you join all the hashes for all strings and calculate which of them are contained at least 3 times.
In order to get the lengths of those subsequences, you need to store not only hashes, but pairs <hash, subsequence_pointer> where subsequence_pointer determines the subsequence of that hash (the easiest way is to enumerate all hashes of all strings and store the hash number).
Based on the algo, the program in the worst case (700 strings, 25 characters each) will run for a few minutes.
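
As a rough illustration of the enumeration idea, here is a brute-force Python sketch run on the four example strings from the question. It groups strings by the letters they share at each set of positions instead of storing hashes, and the function name and parameters are hypothetical. It is only practical for short strings like the 7-character example; the 25-character case would need the more careful bookkeeping described above.

```python
from collections import Counter
from itertools import combinations

def shared_position_patterns(strings, min_len=3, min_support=3):
    """Find letter patterns at fixed positions shared by enough strings."""
    n = len(strings[0])
    found = set()
    for size in range(min_len, n + 1):
        for positions in combinations(range(n), size):
            # Count how many strings have the same letters at these positions.
            counts = Counter(tuple(s[p] for p in positions) for s in strings)
            for letters, count in counts.items():
                if count >= min_support:
                    found.add((positions, letters))

    def contained(small, big):
        # True if `small` is a strict sub-pattern of `big` (same letters, same places).
        (sp, sl), (bp, bl) = small, big
        return small != big and all(
            p in bp and bl[bp.index(p)] == l for p, l in zip(sp, sl)
        )

    # Keep only patterns that are not part of a longer shared pattern.
    maximal = [k for k in found if not any(contained(k, other) for other in found)]
    return sorted("".join(letters) for _, letters in maximal)

tests = ["abcdeug", "abxdopq", "abydnpq", "hzsdwpq"]
answer = shared_position_patterns(tests)
```

On the example input this yields the two patterns named in the question, "abd" and "dpq".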