Optimize code for string split and extraction - matlab

I have some code whose overall aim is to extract two numbers from a string.
The string is stored in a cell array. To simplify this example, I create the string directly in the test code below. I want to extract 1400 into one variable and 2 into a second. The code I have written works fine, but I think it can be optimized a lot, both for speed and for length. Do any of you have suggestions?
Code:
test{1,1}='1:1400 og 2-fold'
FD1=test{1,1};
C = strsplit(FD1);
C2 = cell2mat(cellfun(@str2num,strrep(C,':',' '),'un',0));
C3 = cell2mat(C(1,3));
C3=strsplit(C3,'-');
Dilut1=C2(1,2);
Fold1=str2double(C3(1,1));

It really depends on the general structure of your strings. For this case, you can split the string at the colon, space and dash characters by using:
A = strsplit(test{1,1},{':',' ','-'});
and then simply extract the two numbers as the second and fourth elements:
Dilut1=str2num(A{2});
Fold1 = str2num(A{4});
But as said, it really comes down to your general structure: the more cases you have to account for, the longer the code.
Thus it might be better if you could write the string out as something like
test{1,1}='1 dilute 1400 fold 2';
Then you could split at spaces and search for the word you are interested in; the string that follows it is the number, i.e.
A = strsplit(test{1,1});
Dilute = str2num(A{circshift(strcmp(A,'dilute'),1)})
Fold = str2num(A{circshift(strcmp(A,'fold'),1)})

What function do I use in a Salesforce apex trigger to trim a text field?

I'm new to writing apex triggers. I've looked through a lot of the apex developer documentation, and I can't seem to find the combination of functions that I should use to automatically trim characters from a text field.
My org has two text fields on the Case object, which automatically store the email addresses that are included in an email-to-case. The text fields have a 255 character limit each. We are seeing errors pop up because the number of email addresses that these fields contain often exceeds 255 characters.
I need to write a trigger that can trim these text fields to the last ".com" before it hits the 255 character limit.
Perhaps I'm going about this all wrong. Any advice?
You can use the replace() function in Apex.
String s1 = 'abcdbca';
String target = 'bc';
String replacement = 'xy';
String s2 = s1.replace(target, replacement);
If you need to use regular expression to find the pattern, then you can use replaceAll()
String s1 = 'a b c 5 xyz';
String regExp = '[a-zA-Z]';
String replacement = '1';
String s2 = s1.replaceAll(regExp, replacement);
For more information, please refer to the Apex Reference Guide.
I think the following code covers what you are looking for:
String initialUrl = 'https://stackoverflow.com/questions/69136581/what-function-do-i-use-in-a-salesforce-apex-trigger-to-trim-a-text-field';
Integer comPosition = initialUrl.indexOf('.com');
System.debug(initialUrl.left(comPosition + 4));
//https://stackoverflow.com
The main problems that I see are that other extensions are not covered (like ".net" URLs) and that a ".com" can appear earlier in a URL than the one you want (something like "https://www.comunications.com"). But I think that this covers most of the use cases.
Why not increase the length of that specific field? Trimming the text might cause data loss or leave the data unusable. Just go to the Object Manager, find that object and that specific field, edit the field and increase the length.
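The snippets above show replace() and indexOf(), but not the "last .com before the limit" rule the question asks for. Here is a minimal sketch of that rule, written in Python rather than Apex purely to illustrate the logic. The function name trim_to_last_com, the marker parameter and the hard-cut fallback are assumptions for illustration, not part of any Salesforce API.

```python
def trim_to_last_com(addresses, limit=255, marker=".com"):
    """Cut a string after the last marker that still fits inside the limit."""
    if len(addresses) <= limit:
        return addresses  # already short enough, nothing to trim
    # rfind with an end bound only reports matches lying entirely within the limit
    start = addresses.rfind(marker, 0, limit)
    if start == -1:
        # Assumption: with no marker in range, fall back to a hard cut
        return addresses[:limit]
    return addresses[:start + len(marker)]


long_field = "a@x.com; " * 40  # 360 characters, longer than the limit
trimmed = trim_to_last_com(long_field)
print(len(trimmed), trimmed[-7:])
```

The same idea would have to be ported to Apex string methods inside the trigger. Note that it inherits the caveats already mentioned: other extensions such as ".net" are not handled.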

Defaultdict() the correct choice?

EDIT: mistake fixed
The idea is to read text from a file, clean it, and pair consecutive words (not permutations):
file = f.read()
words = [word.strip(string.punctuation).lower() for word in file.split()]
pairs = [(words[i]+" " + words[i+1]).split() for i in range(len(words)-1)]
Then, for each pair, create a list of all the possible individual words that can follow that pair throughout the text. The dict will look like
[ConsecWordPair]:[listOfFollowers]
Thus, referencing the dictionary for a given pair will return all of the words that can follow that pair. E.g.
wordsThatFollow[('she', 'was')]
>> ['alone', 'happy', 'not']
My algorithm to achieve this involves a defaultdict(list)...
wordsThatFollow = defaultdict(list)
for i in range(len(words)-1):
    try:
        # pairs overlap, want second word of next pair
        # wordsThatFollow[tuple(pairs[i])] = pairs[i+1][1]
        EDIT: wordsThatFollow[tuple(pairs[i])].update(pairs[i+1][1][0]
    except Exception:
        pass
I'm not so worried about the value error I have to circumvent with the 'try-except' (unless I should be). The problem is that the algorithm only successfully returns one of the followers:
wordsThatFollow[('she', 'was')]
>> ['not']
Sorry if this post is bad for the community, I'm figuring things out as I go ^^
Your problem is that you are always overwriting the value, when you really want to append to the list:
# Instead of this
wordsThatFollow[tuple(pairs[i])] = pairs[i+1][1]
# Do this
wordsThatFollow[tuple(pairs[i])].append(pairs[i+1][1])
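Putting the fix together, here is a minimal self-contained sketch of the whole pipeline. It assumes the text is already in a string; the name build_followers and the sample sentence are made up for illustration. It iterates with zip instead of indexing, so the try/except is no longer needed. Dropping tokens that are pure punctuation is an extra assumption not present in the original code.

```python
from collections import defaultdict
import string


def build_followers(text):
    """Map each pair of consecutive words to the list of words seen right after it."""
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    words = [w for w in words if w]  # assumption: drop tokens that were only punctuation
    followers = defaultdict(list)
    # zip stops at the shortest input, so no index can run past the end
    for first, second, nxt in zip(words, words[1:], words[2:]):
        followers[(first, second)].append(nxt)
    return followers


sample = "She was alone. She was happy, she was not."
words_that_follow = build_followers(sample)
print(words_that_follow[("she", "was")])  # ['alone', 'happy', 'not']
```

A defaultdict(list) fits here because every new pair starts with an empty list that can be appended to straight away.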

matlab: check which lines of a path are used - graphshortestpath

The problem comes from the power grid in Germany. I have a network of substations, which are connected by lines. The shortest way from point A to B was calculated using the graphshortestpath function. The result is a path given as the IDs of the substations used. I am interested in the line IDs though, so I have written sequential code to figure out the Line_IDs used by each path.
This algorithm uses two for loops. The first for-loop to access the path from a cell array, the second for-loop looks at each connection and searches the Line_ID from the array.
Question: Is there a better way of coding this? I am looking for the Line_ID's, graphshortestpath only returns the node ID's.
Here is the main code:
for i = i_entries
    path_i = LKzuLK_path{i}; % path for the current entry
    if length(path_i) > 3 % If length <= 3 no lines are used.
        id_vb = 2:length(path_i) - 2;
        for id = id_vb
            node_start = path_i(id);
            node_end = path_i(id+1);
            idx_line = find_line_idx(newlinks_vertices, node_start, ...
                node_end);
            Zuordnung_LKzuLK_pathLines(ind2sub(size_path,i),idx_line) = true;
        end
    end
end
Note: The first and last entries of path_i are area IDs, so they are skipped when searching for the Line_IDs.
function idx_line = find_line_idx(newlinks_vertices, v_id_1, v_id_2)
    % newlinks_vertices includes the Line_ID, and then the two connecting substations
    % Mirror v_id's in newlinks_vertices:
    check_links = [newlinks_vertices; newlinks_vertices(:,1), newlinks_vertices(:,3), newlinks_vertices(:,2)];
    tmp_dist1 = find(check_links(:,2) == v_id_1);
    tmp_dist2 = find(check_links(tmp_dist1,3) == v_id_2,1);
    tmp_dist3 = tmp_dist1(tmp_dist2);
    idx_line = check_links(tmp_dist3,1);
end
Note: I have already tried to shorten the first find search by indexing the links list. This step returns a short list with only the relevant links, which relieves the algorithm of the first and most time-consuming find call. The result wasn't much better: the calculation time was still approximately 7 hours for 401*401 connections, which is too long to be usable.
I would look into Dijkstra's algorithm to get a faster implementation. This is what Matlab's graphshortestpath uses by default. The linked wiki page probably explains it better than I ever could and even lays it out in pseudocode!
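Separately from the choice of shortest-path algorithm, the repeated find calls can be replaced by a lookup table that is built once. Here is a minimal sketch of that idea, written in Python rather than MATLAB purely for illustration (in MATLAB, a containers.Map could play a similar role). The names build_line_lookup and lines_on_path and the sample data are hypothetical. It assumes each row of the link list is (Line_ID, node, node) and that the first line listed for a pair of nodes is the one wanted.

```python
def build_line_lookup(links):
    """Build a dict from an unordered pair of node IDs to the line ID joining them."""
    lookup = {}
    for line_id, node_a, node_b in links:
        # frozenset makes (a, b) and (b, a) the same key, so no mirrored copy is needed;
        # setdefault keeps the first line listed for a pair (an assumption)
        lookup.setdefault(frozenset((node_a, node_b)), line_id)
    return lookup


def lines_on_path(path, lookup):
    """Return the line IDs along a path whose first and last entries are area IDs."""
    nodes = path[1:-1]  # strip the area IDs, keep only the substations
    return [lookup[frozenset(pair)] for pair in zip(nodes, nodes[1:])]


links = [(10, 1, 2), (11, 2, 3), (12, 3, 4)]  # made-up (Line_ID, node, node) rows
lookup = build_line_lookup(links)
print(lines_on_path([100, 1, 2, 3, 200], lookup))  # [10, 11]
```

Each connection then costs one dictionary lookup instead of a scan over the whole link list. A pair with no connecting line raises a KeyError in this sketch.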

Find a Name in an Email (Low-Level I/O)

Round 2: Picking out leaders in an email
Alrighty, so my next problem is trying to figure out who the leader is in a project. In order to determine this, we are given an email and have to find who says "Do you want..." (capitalization may vary). I feel like my code should work for the most part, but I really have an issue figuring out how to correctly populate my cell array. I can get it to create the cell array, but it just puts the first line of the email in it over and over again. So each cell is basically the same.
function[Leader_Name] = teamPowerHolder(email)
    email = fopen(email, 'r'); %// Opens my file
    lines = fgets(email); %// Reads the first line
    conversations = {lines}; %// Creates my cell array
    while ischar(lines) %// Populates my cell array, just not correct
        Convo = fgets(email);
        if Convo == -1 %// Prevents it from just logging -1 into my cell array like a jerk
            break; %// Returns to function
        end
        conversations = [conversations {lines}]; %// Populates my list
    end
    Sentences = strfind(conversations,'Do you want'); %// Locates the leader position
    Leader_Name = Sentences{1}; %// Indexes that position
    fclose(email);
end
What I ideally need it to do is find the '/n' character (hence why I used fgets) but I'm not sure how to make it do that. I tried to have my while loop be like:
while lines == '/n'
but that's incorrect. I feel like I know how to do the '/n' bit, I just can't think of it. So I'd appreciate some hints or tips to do that. I could always try to strsplit or strtok the function, but I need to then populate my cell array so that might get messy.
Please and thanks for help :)
Test Case:
Anna: Hey guys, so I know that he just assigned this project, but I want to go ahead and get started on it.
Can you guys please respond and let me know a weekly meeting time that will work for you?
Wiley: Ummmmm no because ain't nobody got time for that.
John: Wiley? What kind of a name is that? .-.
Wiley: It's better than john. >.>
Anna: Hey boys, let's grow up and talk about a meeting time.
Do you want to have a weekly meeting, or not?
Wiley: I'll just skip all of them and not end up doing anything for the project anyway.
So I really don't care so much.
John: Yes, Anna, I'd like to have a weekly meeting.
Thank you for actually being a good teammate and doing this. :)
out2 = teamPowerHolder('teamPowerHolder_convo2.txt')
=> 'Anna'
The main reason why it isn't working is that you're supposed to update the lines variable in your loop, but you're creating a new variable called Convo that is updated instead. This is why every time you put lines in your cell array, it just puts in the first line again, and why the loop only stops because of the explicit break on -1.
However, what I would suggest you do is read in each line, then look for the : character, then extract the string up until the first time you encounter this character minus 1 because you don't want to include the actual : character itself. This will most likely correspond to the name of the person that is speaking. If we are missing this occurrence, then that person is still talking. As such, you would have to keep a variable that keeps track of who is still currently talking, until you find the "do you want" string. Whoever says this, we return the person who is currently talking, breaking out of the loop of course! To ensure that the line is case insensitive, you'll want to convert the string to lower.
There may be a case where no leader is found. In that case, you'll probably want to return the empty string. As such, initialize Leader_Name to the empty string. In this case, that would be []. That way, should we go through the e-mail and find no leader, MATLAB will return [].
The logic that you have is pretty much correct, but I wouldn't even bother storing stuff into a cell array. Just examine each line in your text file, and keep track of who is currently speaking until we encounter a sentence that has another : character. We can use strfind to facilitate this. However, one small caveat I'll mention is that if the person speaking includes a : in their conversation, then this method will break.
Judging from the conversation that I'm seeing in your test case, this probably won't be the case so we're OK. As such, borrowing from your current code, simply do this:
function[Leader_Name] = teamPowerHolder(email)
    Leader_Name = []; %// Initialize leader name to empty
    name = [];
    email = fopen(email, 'r'); %// Opens my file
    lines = fgets(email); %// Reads the first line
    while ischar(lines) %// fgets returns -1 (not a char) at end of file
        % // Check if this line has a ':' character.
        % // If we do, then another person is talking.
        % // Extract the characters just before the first ':' character
        % // as we don't want the ':' character in the name
        % // If we don't encounter a ':' character, then the same person is
        % // talking so don't change the current name
        idxs = strfind(lines, ':');
        if ~isempty(idxs)
            name = lines(1:idxs(1)-1);
        end
        % // If we find "do you want" in this sentence, then the leader
        % // is found, so quit.
        if ~isempty(strfind(lower(lines), 'do you want'))
            Leader_Name = name;
            break;
        end
        % // Get the next line in your e-mail; reading at the end of the
        % // loop body means the first line is processed too
        lines = fgets(email);
    end
    fclose(email); %// Close the file
end
By running the above code with your test case, this is what I get:
out2 = teamPowerHolder('teamPowerHolder_convo2.txt')
out2 =
Anna

Matlab: dynamic name for structure

I want to create a structure with a variable name in a MATLAB script. The idea is to extract part of an input string entered by the user and to create a structure with this name. For example:
CompleteCaseName = input('s');
USER WRITES '2013-06-12_test001_blabla';
CompleteCaseName = '2013-06-12_test001_blabla'
casename(12:18) = struct('x','y','z');
In this example, casename(12:18) gives me the result test001.
I would like to do this to allow me to compare easily two cases by importing the results of each case successively. So I could write, for instance :
plot(test001.x,test001.y,test002.x,test002.y);
The problem is that the line casename(12:18) = struct('x','y','z'); is invalid for Matlab because it makes me change a string to a struct. All the examples I find with struct are based on a definition like
S = struct('x','y','z');
And I can't find a way to make a dynamic name for S based on a string.
I hope someone understands what I wrote :) I checked the FAQ and Google but I wasn't able to find the same problem.
Use a structure with a dynamic field name.
For example,
mydata.(casename(12:18)) = struct;
will give you a struct mydata with a field test001.
You can then later add your x, y, z fields to this.
You can use the fields later either by mydata.test001.x, or by mydata.(casename(12:18)).x.
If at all possible, try to stay away from using eval, as another answer suggests. It makes things very difficult to debug, and the example given there, which directly evals user input:
eval('%s = struct(''x'',''y'',''z'');',casename(12:18));
is even a security risk - what happens if the user types in a string where the selected characters are system(''rm -r /''); a? Something bad, that's what.
As I already commented, the best case scenario is when all your x and y vectors have same length. In this case you can store all data from the different files into 2 matrices and call plot(x,y) to plot each column as a series.
Alternatively, you can use a cell array such that:
c = cell(2,numfiles);
for ii = 1:numfiles
    c{1,ii} = ...; % import x data from file ii
    c{2,ii} = ...; % import y data from file ii
end
plot(c{:})
A structure, on the other hand
s.('test001').x = ...
s.('test001').y = ...
Use eval:
eval(sprintf('%s = struct(''x'',''y'',''z'');',casename(12:18)));
Edit: apologies, forgot the sprintf.