How to subtract strings that are not consecutive python - python-3.7

What I mean is if I have a string, "apwswe", and another string "appegwisbnwe", if we "subtract" the two strings together, which means "appegwisbnwe" - "apwswe", I want to get "pegibn". Is there a way to do this? BTW pegibn is the characters that they don't have in "common" with eachother.

Not exactly a thing of beauty, but this will get you there:
subtrahend = "apwswe"
minuend = list("appegwisbnwe")
for char in subtrahend:
if minuend.count(char):
minuend.remove(char)
difference = "".join(minuend)
print(difference)
pgibne

Possible alternatives to rhurwitz's solution:
input = "appegwisbnwe"
for char, occurrences in collections.Counter("apwswe"):
input = input.replace(char, '', occurrences)
this is quite simple and can be implemented as a straightforward functools.reduce expression but will rewrite the input string as many times as there are different characters in the filter.
A possibly more efficient alternative as it works in O(len(input) + len(filter)) rather than O(len(input)*len(uniq(filter))
input = "appegwisbnwe"
filter = collections.Counter("apwswe")
output = ''
for c in input:
if filter[c]:
filter[c] -= 1
else:
output += c

Related

addressing array in the function return

I would like to make the following code simpler.
files=dir('~/some*.txt');
numFiles=length(files);
for i = 1:numFiles
name=files(i).name;
name=strsplit(name,'.');
name=name{1};
name=strsplit(name, '_');
name=name(2);
name = str2num(name{1});
disp(name);
end
I am a begginer in Matlab, in general I would love something like:
name = str2num(strsplit(strsplit(files(i).name,'.')(1),'_')(2));
but matlab does not like this.
Another issue of the approach above is that matlab keeps giving cell type even for something like name(2) but this may be just the problem with my syntax.
Example file names:
3000_0_100ms.txt
3000_0_5s.txt
3000_110_5s.txt
...
Let's say I want to select all files ending in '5s' then I need to split them (after removing the extension) by '_' and return the second part, in the case of the three filenames above, that would be 0, 0, 110.
But I am in general curious how to do this simple operation in matlab without the complicated code that I have above.
Because your filenames follow a specific pattern, they're a prime candidate for a regular expression. While regular expressions can be confusing to learn at the outset they are very powerful tools.
Consider the following example, which pulls out all numbers that have both leading and trailing underscores:
filenames = {'3000_0_100ms.txt', '3000_0_5s.txt', '3000_110_5s.txt'};
strs = regexp(filenames, '(?<=\_)(\d+)(?=\_)', 'match');
strs = [strs{:}]; % Denest one layer of cells
nums = str2double(strs);
Which returns:
nums =
0 0 110
Being used here are what's called lookbehind (?<=...) and lookahead (?=...) operators. As their names suggest, they look in their respective directions related to the expression they're part of, (\d+) in our case, which looks for one or more digits. Though this approach requires more steps than the simple '\_(\d+)\_' expression, the latter requires you either utilize MATLAB's 'tokens' regex operator, which adds another layer of cells and that annoys me, or use the 'match' operator and strip the underscores from the match prior to converting to a numeric value.
Approach 2:
filenames = {'3000_0_100ms.txt', '3000_0_5s.txt', '3000_110_5s.txt'};
strs = regexp(filenames, '\_(\d+)\_', 'tokens');
strs = [strs{:}]; % Denest one layer of cells
strs = [strs{:}]; % Denest another layer of cells
nums = str2double(strs);
Approach 3:
filenames = {'3000_0_100ms.txt', '3000_0_5s.txt', '3000_110_5s.txt'};
strs = regexp(filenames, '\_(\d+)\_', 'match');
strs = [strs{:}]; % Denest one layer of cells
strs = regexprep(strs, '\_', '');
nums = str2double(strs);
You can use regexp to do a regular expression matching and obtain the numbers in the second place directly. This is an explanation of the regular expression I am using.
>>names = regexp({files(:).name},'\d*_(\d*)_\d*m?s\.txt$','tokens')
>>names = [names{:}]; % Get names out of their cells
>>names = [names{:}]; % Break cells one more time
>> nums = str2double(names); % Convert to double to obtain numbers

Function to split string in matlab and return second number

I have a string and I need two characters to be returned.
I tried with strsplit but the delimiter must be a string and I don't have any delimiters in my string. Instead, I always want to get the second number in my string. The number is always 2 digits.
Example: 001a02.jpg I use the fileparts function to delete the extension of the image (jpg), so I get this string: 001a02
The expected return value is 02
Another example: 001A43a . Return values: 43
Another one: 002A12. Return values: 12
All the filenames are in a matrix 1002x1. Maybe I can use textscan but in the second example, it gives "43a" as a result.
(Just so this question doesn't remain unanswered, here's a possible approach: )
One way to go about this uses splitting with regular expressions (MATLAB's strsplit which you mentioned):
str = '001a02.jpg';
C = strsplit(str,'[a-zA-Z.]','DelimiterType','RegularExpression');
Results in:
C =
'001' '02' ''
In older versions of MATLAB, before strsplit was introduced, similar functionality was achieved using regexp(...,'split').
If you want to learn more about regular expressions (abbreviated as "regex" or "regexp"), there are many online resources (JGI..)
In your case, if you only need to take the 5th and 6th characters from the string you could use:
D = str(5:6);
... and if you want to convert those into numbers you could use:
E = str2double(str(5:6));
If your number is always at a certain position in the string, you can simply index this position.
In the examples you gave, the number is always the 5th and 6th characters in the string.
filename = '002A12';
num = str2num(filename(5:6));
Otherwise, if the formating is more complex, you may want to use a regular expression. There is a similar question matlab - extracting numbers from (odd) string. Modifying the code found there you can do the following
all_num = regexp(filename, '\d+', 'match'); %Find all numbers in the filename
num = str2num(all_num{2}) %Convert second number from str

Extract character between two specified characters of a string using Matlab

I have a string and want to extract the numbers between the character 'w' and 's'. The positions of the the characters vary between different strings.
For example:
s = '1w12s01'
desired result: '12'
and
s = '102w22s21'
desired result: '22'
It can also be done using a regular expression with lookahead and lookbehind:
regexp(s,'(?<=w).*(?=s)','match')
The function strfind will do this easily enough. This will work as long as the number is always directly between a 'w' and and 's', both are only in the target string once, and the number you're after is the only thing between those two characters.
s = '102w22s21';
r = s((strfind(s, 'w')+1):(strfind(s, 's')-1));
Use this:
e = extractBetween(s,'w','s');

Capitalizing only the first letters without changing any numbers or punctuation

I would like to modify a string that will have make the first letter capitalized and all other letters lower cased, and anything else will be unchanged.
I tried this:
function new_string=switchCase(str1)
%str1 represents the given string containing word or phrase
str1Lower=lower(str1);
spaces=str1Lower==' ';
caps1=[true spaces];
%we want the first letter and the letters after space to be capital.
strNew1=str1Lower;
strNew1(caps1)=strNew1(caps1)-32;
end
This function works nicely if there is nothing other than a letter after space. If we have anything else for example:
str1='WOW ! my ~Code~ Works !!'
Then it gives
new_string =
'Wow My ^code~ Works !'
However, it has to give (according to the requirement),
new_string =
'Wow! My ~code~ Works !'
I found a code which has similarity with this problem. However, that is ambiguous. Here I can ask question if I don't understand.
Any help will be appreciated! Thanks.
Interesting question +1.
I think the following should fulfil your requirements. I've written it as an example sub-routine and broken down each step so it is obvious what I'm doing. It should be straightforward to condense it into a function from here.
Note, there is probably also a clever way to do this with a single regular expression, but I'm not very good with regular expressions :-) I doubt a regular expression based solution will run much faster than what I've provided (but am happy to be proven wrong).
%# Your example string
Str1 ='WOW ! my ~Code~ Works !!';
%# Convert case to lower
Str1 = lower(Str1);
%# Convert to ascii
Str1 = double(Str1);
%# Find an index of all locations after spaces
I1 = logical([0, (Str1(1:end-1) == 32)]);
%# Eliminate locations that don't contain lower-case characters
I1 = logical(I1 .* ((Str1 >= 97) & (Str1 <= 122)));
%# Check manually if the first location contains a lower-case character
if Str1(1) >= 97 && Str1(1) <= 122; I1(1) = true; end;
%# Adjust all appropriate characters in ascii form
Str1(I1) = Str1(I1) - 32;
%# Convert result back to a string
Str1 = char(Str1);

ignore spaces and cases MATLAB

diary_file = tempname();
diary(diary_file);
myFun();
diary('off');
output = fileread(diary_file);
I would like to search a string from output, but also to ignore spaces and upper/lower cases. Here is an example for what's in output:
the test : passed
number : 4
found = 'thetest:passed'
a = strfind(output,found )
How could I ignore spaces and cases from output?
Assuming you are not too worried about accidentally matching something like: 'thetEst:passed' here is what you can do:
Remove all spaces and only compare lower case
found = 'With spaces'
found = lower(found(found ~= ' '))
This will return
found =
withspaces
Of course you would also need to do this with each line of output.
Another way:
regexpi(output(~isspace(output)), found, 'match')
if output is a single string, or
regexpi(regexprep(output,'\s',''), found, 'match')
for the more general case (either class(output) == 'cell' or 'char').
Advantages:
Fast.
robust (ALL whitespace (not just spaces) is removed)
more flexible (you can return starting/ending indices of the match, tokenize, etc.)
will return original case of the match in output
Disadvantages:
more typing
less obvious (more documentation required)
will return original case of the match in output (yes, there's two sides to that coin)
That last point in both lists is easily forced to lower or uppercase using lower() or upper(), but if you want same-case, it's a bit more involved:
C = regexpi(output(~isspace(output)), found, 'match');
if ~isempty(C)
C = found; end
for single string, or
C = regexpi(regexprep(output, '\s', ''), found, 'match')
C(~cellfun('isempty', C)) = {found}
for the more general case.
You can use lower to convert everything to lowercase to solve your case problem. However ignoring whitespace like you want is a little trickier. It looks like you want to keep some spaces but not all, in which case you should split the string by whitespace and compare substrings piecemeal.
I'd advertise using regex, e.g. like this:
a = regexpi(output, 'the\s*test\s*:\s*passed');
If you don't care about the position where the match occurs but only if there's a match at all, removing all whitespaces would be a brute force, and somewhat nasty, possibility:
a = strfind(strrrep(output, ' ',''), found);