I have a huge string and need to extract a substring from it. The substring starts with either "TECHNICAL" or "JUSTIFY" and ends with a number, any number from 1 to 10. For example, I have
string x = "This is a test, again I am test TECHNICAL: I need to extract this substring starting with testing. 8. This is test again and again and again and again";
and I need this:
TECHNICAL: I need to extract this substring starting with testing.
I was wondering if someone has an elegant solution for that.
Thanks in advance.
You can use a regular expression for that.
Example:
string input = "This is a test, again I am test TECHNICAL: I need to extract this substring starting with testing. 8. This is test again and again and again and again";
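// requires: using System.Text.RegularExpressions;
// lazy match from the keyword up to the first number 1-10
string pattern = @"(TECHNICAL|JUSTIFY).*?(10|[1-9])";
Regex myTextRegex = new Regex(pattern);
Match match = myTextRegex.Match(input);
string matched = null;
// Groups.Count is always at least 1, so test Success instead
if (match.Success)
{
    matched = match.Groups[0].Value;
}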
//Result: matched = TECHNICAL: I need to extract this substring starting with testing. 8
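For comparison, here is a minimal Python sketch of the same idea. It is a port for illustration only, not part of the answer above: the name extract_section is hypothetical, and using a capture group plus word boundaries to leave the trailing number out of the result is an assumption about what the asker wants.

```python
import re

# Hypothetical helper: returns the text from the keyword up to (but not
# including) the first standalone number 1-10, or None if nothing matches.
SECTION = re.compile(r"((?:TECHNICAL|JUSTIFY).*?)\s*\b(?:10|[1-9])\b")


def extract_section(text):
    match = SECTION.search(text)
    return match.group(1) if match else None


x = ("This is a test, again I am test TECHNICAL: I need to extract this "
     "substring starting with testing. 8. This is test again and again")
print(extract_section(x))
```

The lazy `.*?` stops at the first position where a number can follow, so the match does not run on to a later number in the string.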
What I mean is: if I have a string "apwswe" and another string "appegwisbnwe", and we "subtract" one from the other, i.e. "appegwisbnwe" - "apwswe", I want to get "pegibn". Is there a way to do this? BTW, "pegibn" is the characters that they don't have in common with each other.
Not exactly a thing of beauty, but this will get you there:
subtrahend = "apwswe"
minuend = list("appegwisbnwe")
for char in subtrahend:
    if minuend.count(char):
        minuend.remove(char)
difference = "".join(minuend)
print(difference)
pgibne
Possible alternatives to rhurwitz's solution:
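import collections
input = "appegwisbnwe"
# .items() yields (character, count) pairs; iterating the Counter alone yields only the keys
for char, occurrences in collections.Counter("apwswe").items():
    input = input.replace(char, '', occurrences)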
This is quite simple and can be written as a straightforward functools.reduce expression, but it rewrites the input string once for every distinct character in the filter.
A possibly more efficient alternative, as it works in O(len(input) + len(filter)) rather than O(len(input) * len(uniq(filter))):
import collections
input = "appegwisbnwe"
# how many of each character still have to be removed
filter = collections.Counter("apwswe")
output = ''
for c in input:
    if filter[c]:
        filter[c] -= 1
    else:
        output += c
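The single-pass approach above can be wrapped in a reusable function. This is only a sketch: the name subtract and the use of a list with "".join are choices made here, not something the answers prescribe.

```python
from collections import Counter


def subtract(minuend, subtrahend):
    """Remove one occurrence from minuend for each character in subtrahend."""
    remaining = Counter(subtrahend)  # characters still to be removed
    kept = []
    for ch in minuend:
        if remaining[ch] > 0:
            remaining[ch] -= 1       # consume one pending removal
        else:
            kept.append(ch)
    return "".join(kept)


print(subtract("appegwisbnwe", "apwswe"))  # pgibne
```

Like the loop-based answers, this removes the first matching occurrence of each character, which is why the result is "pgibne" rather than the "pegibn" given in the question.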
I'm trying to check if a string contains one of four sub strings in a simpler way than this:
if (imageUrl.contains('.jpg') ||
imageUrl.contains('.png') ||
imageUrl.contains('.tif') ||
imageUrl.contains('.gif')) {
}
Is there a way to do this? For example checking against a list?
You can use a regex pattern instead of a simple string:
// raw string (r'...') so that \. reaches the regex engine as an escaped dot
imageUrl.contains(new RegExp(r'\.(jpg|png|tif|gif)'))
Might be somewhat simpler.
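The "check against a list" idea from the question can also be written without a regex. The sketch below is in Python rather than the language of the question, purely to illustrate the shape of the check; IMAGE_EXTENSIONS and has_image_extension are hypothetical names.

```python
IMAGE_EXTENSIONS = (".jpg", ".png", ".tif", ".gif")


def has_image_extension(image_url):
    # True as soon as any one of the extensions occurs in the URL
    return any(ext in image_url for ext in IMAGE_EXTENSIONS)


print(has_image_extension("https://example.com/photo.png"))
print(has_image_extension("https://example.com/readme.txt"))
```

Keeping the extensions in one collection means that supporting a new format is a one-line change.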
A regular expression can solve your problem. Regexes are used to search for patterns in strings.
RegEx examples:
^The        matches any string that starts with "The"
end$        matches any string that ends with "end"
^The end$   matches exactly the string "The end"
abc*        matches any string that contains "ab" followed by zero or more "c"
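The four patterns above can be tried out quickly. A minimal Python sketch, where the sample strings are made up for illustration:

```python
import re

# pattern -> a sample string that the pattern should match
samples = {
    r"^The": "The quick fox",
    r"end$": "the very end",
    r"^The end$": "The end",
    r"abc*": "xxabccc",
}

for pattern, text in samples.items():
    # re.search looks for the pattern anywhere in the string
    print(pattern, "->", bool(re.search(pattern, text)))
```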
I have a string "October.29.2009 11:00 a.m."
I want to write generic code that replaces the time with a blank.
I have tried the code below:
{
val date="October.29.2009 11:00 a.m." //time may be any value
date.replace("a.m.","").replace("p.m.","")
}
The above code only removes "a.m." and "p.m."; I need to remove the time as well.
Have you tried splitting the string? Use this:
date.split(" ")
The first element of the array returned will give you the date without time.
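You can use a regex like this to match the time (as written, the ^ and $ anchors require the whole string to be the time, so drop them to find the time inside a longer string):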
^([0-9]|0[0-9]|1[0-9]|2[0-3]):[0-5][0-9]$
Is your time always with leading zeros?
07:08
Then use
myString.reverse.substring(countYourCharsAndDigits).reverse
Or use a regex that matches the first part, the one you are interested in, followed by the rest, and write it like this:
def dateExtractor(date: String) = {
  val MatchExpression = "('hereTheRegExForTheInterestingPart')('regExForRest')".r
  val MatchExpression(myNewString, rest) = date
  myNewString
}
The ( and ) are needed for the match; the ' are not.
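As an illustration of the regex route, here is a small Python sketch (not in the language of the question). It assumes the time always looks like "H:MM" or "HH:MM" followed by "a.m." or "p.m.", and the name strip_time is hypothetical.

```python
import re

# optional whitespace, 1-2 digit hour, ":", 2 digit minute, then a.m./p.m.
TIME = re.compile(r"\s*\d{1,2}:\d{2}\s*[ap]\.m\.")


def strip_time(date):
    # delete the time portion, leaving only the date
    return TIME.sub("", date)


print(strip_time("October.29.2009 11:00 a.m."))
```

If the time can appear in other formats, such as 24-hour values without a suffix, the pattern would need to be widened accordingly.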
I have a string and I need two characters to be returned.
I tried with strsplit but the delimiter must be a string and I don't have any delimiters in my string. Instead, I always want to get the second number in my string. The number is always 2 digits.
Example: 001a02.jpg I use the fileparts function to delete the extension of the image (jpg), so I get this string: 001a02
The expected return value is 02
Another example: 001A43a . Return values: 43
Another one: 002A12. Return values: 12
All the filenames are in a matrix 1002x1. Maybe I can use textscan but in the second example, it gives "43a" as a result.
(Just so this question doesn't remain unanswered, here's a possible approach: )
One way to go about this uses splitting with regular expressions (MATLAB's strsplit which you mentioned):
str = '001a02.jpg';
C = strsplit(str,'[a-zA-Z.]','DelimiterType','RegularExpression');
Results in:
C =
'001' '02' ''
In older versions of MATLAB, before strsplit was introduced, similar functionality was achieved using regexp(...,'split').
If you want to learn more about regular expressions (abbreviated as "regex" or "regexp"), there are many online resources (JGI..)
In your case, if you only need to take the 5th and 6th characters from the string you could use:
D = str(5:6);
... and if you want to convert those into numbers you could use:
E = str2double(str(5:6));
If your number is always at a certain position in the string, you can simply index this position.
In the examples you gave, the number is always the 5th and 6th characters in the string.
filename = '002A12';
num = str2num(filename(5:6));
Otherwise, if the formatting is more complex, you may want to use a regular expression. There is a similar question, "matlab - extracting numbers from (odd) string". Modifying the code found there, you can do the following:
all_num = regexp(filename, '\d+', 'match'); %Find all numbers in the filename
num = str2num(all_num{2}) %Convert second number from str
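The "find all digit runs, take the second one" idea is not specific to MATLAB. A minimal Python sketch for comparison; the name second_number is hypothetical, and returning None when there are fewer than two numbers is an assumption:

```python
import re


def second_number(name):
    numbers = re.findall(r"\d+", name)  # every run of digits, in order
    return numbers[1] if len(numbers) > 1 else None


for name in ["001a02", "001A43a", "002A12"]:
    print(name, "->", second_number(name))
```

Unlike fixed-position indexing, this keeps working if the first number is not exactly three digits long.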
Hi, I need help using regexp for condition matching.
For example, my file has the following content:
{hello.program='function';
bye.program='script'; }
I am trying to use regexp to match the string that has .program='function' in them:
pattern = '[.program]+\=(function)'
also tried pattern='[^\n]*(.hello=function)[^\n]*';
pattern_match = regexp(myfilename,pattern , 'match')
but this returns pattern_match={} while I expect the result to be hello.program='function';
If 'function' comes with string-markers, you need to include these in the match. Also, you need to escape the dot (otherwise, it's considered "any character"). [.program]+ looks for one or several letters contained in the square brackets - but you can just look for program instead. Also, you don't need to escape the =-sign (which is probably what messed up the match).
cst = {'hello.program=''function''';'bye.program=''script'''; };
pat = 'hello\.program=''function''';
out = regexp(cst,pat,'match');
out{1}{1} %# first string from list, first match
hello.program='function'
EDIT
In response to the comment
my file contains
m2 = S.Parameter;
m2.Value = matlabstandard;
m2.Volatility = 'Tunable';
m2.Configurability = 'None';
m2.ReferenceInterfaceFile ='';
m2.DataType = 'auto';
my objective is to find all the lines that match, .DataType='auto'
Here's how you find the matching lines with regexp
%# read the file with textscan into a variable txt
fid = fopen('myFile.m');
txt = textscan(fid,'%s','Delimiter','\n'); %# one cell entry per line, not per whitespace-separated token
fclose(fid);
txt = txt{1};
%# find the match. Allow spaces and equal signs between DataType and 'auto'
match = regexp(txt,'\.DataType[ =]+''auto''','match')
%# match is not empty only if a match was found. Identify the non-empty match
matchedLine = find(~cellfun(@isempty,match));
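The same line-matching logic, sketched in Python for comparison. The file content is taken from the comment above; the name matching_lines and the use of \s* to allow optional spaces around the equals sign are assumptions.

```python
import re

# literal ".DataType", optional spaces around "=", then 'auto' in quotes
DATATYPE_AUTO = re.compile(r"\.DataType\s*=\s*'auto'")


def matching_lines(lines):
    # zero-based indices of the lines that contain the pattern
    return [i for i, line in enumerate(lines) if DATATYPE_AUTO.search(line)]


lines = [
    "m2 = S.Parameter;",
    "m2.Value = matlabstandard;",
    "m2.Volatility = 'Tunable';",
    "m2.Configurability = 'None';",
    "m2.ReferenceInterfaceFile ='';",
    "m2.DataType = 'auto';",
]
print(matching_lines(lines))
```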
Try this as it matches .program='function' exactly:
(\.)program='function'
I think this did not work:
'[.program]+\=(function)'
because of how the []'s work. Here is a link explaining why I say that: http://www.regular-expressions.info/charclass.html
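To see what the character class actually does, here is a short Python sketch. The regex syntax involved is the same; the sample text is the line from the question.

```python
import re

text = "hello.program='function'"

# [.program]+ means "one or more characters from the set . p r o g a m",
# not the literal word ".program"
char_class = re.compile(r"[.program]+")
literal = re.compile(r"\.program='function'")

print(char_class.findall(text))
print(literal.search(text).group())
# the original pattern requires "function" right after "=", but the text has a quote there
print(re.search(r"[.program]+\=(function)", text))
```

The character class picks up the "o" at the end of "hello" along with ".program", and again the lone "o" inside "function", which shows that it matches characters from a set rather than the word itself.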