How to extract certain text with NSRegularExpression? - iphone

I'm trying to extract (80.4) from the following code:
<div id="row" style="width:80.4px">
What would the expression look like to extract that text? Thanks.

A regex is rather heavyweight for this particular situation. I would just do this:
NSString *originalString; //which will contain "<div id="row" style="width:80.4px">", however you want to get it there
NSString *afterColon = [[originalString componentsSeparatedByString:@":"] objectAtIndex:1];
float theValue = [afterColon floatValue];

Here are two possibilities but the answer will vary depending on the following factors:
1) what other content is in the text that you do NOT want to match against,
2) and what variations you will permit in a match (e.g. adding spaces or new lines inside the text that you want to match, or swapping the order of the parts of the text to match)
This matches only the "width:80.4px" portion of your given string (allowing for extra white space):
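width\s*:\s*(\d+\.\d+)\s*px
And this matches the entire string that you gave (also allowing for extra white space):
<div\s+id\s*=\s*"row"\s+style\s*=\s*"width:\s*(\d+\.\d+)\s*px">
So in these regexes the 80.4 will be captured in the first capture group ($1). Note that the dot is escaped (\.) so that it matches only a literal decimal point; an unescaped . would match any character.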
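As a quick way to check the first pattern, here is a minimal sketch in Python rather than Objective-C. Python's re module is used only for illustration; the pattern syntax used here is common to most regex engines, but its behaviour under NSRegularExpression is not verified in this sketch. The dot is escaped so it matches a literal decimal point, and the function name extract_width is made up for this example.

```python
import re

# Pattern from the answer above, with the dot escaped so it matches a literal "."
WIDTH_PATTERN = re.compile(r'width\s*:\s*(\d+\.\d+)\s*px')

def extract_width(html):
    """Return the captured width as a float, or None if the pattern is not found."""
    match = WIDTH_PATTERN.search(html)
    return float(match.group(1)) if match else None

print(extract_width('<div id="row" style="width:80.4px">'))  # 80.4
```

Group 1 of the match is what the answer calls the $1 capture group.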

Related

Regular expression: retrieve one or more numbers after certain text

I'm trying to parse HTML code and extract some data from it using regular expressions. The website that provides the data has no API, and I want to show this data in an iOS app built using Swift. The HTML looks like this:
$(document).ready(function() {
var years = ['2020','2021','2022'];
var currentView = 0;
var amounts = [1269.2358,1456.557,1546.8768];
var balances = [3484626,3683646,3683070];
rest of the html code
What I'm trying to extract is the years, amounts and balances.
So I would like to have an array with the years, [2020,2021,2022], and the same for amounts and balances. In this example there are 3 years, but there could be more or fewer. I'm able to extract all the numbers, but then I'm unable to link them to the years, amounts or balances. See this example: https://regex101.com/r/WMwUji/1, using this pattern (\d|\[(\d|,\s*)*])
Any help would be really appreciated.
Firstly, I think there are some errors in your expression. To capture the whole number you have to use \d+ (which matches 1 or more consecutive digits, e.g. 2020). If you need to include . as a decimal separator, the expression would then look like \d+\.\d+.
In addition, using a non-capturing group (?:) and a non-greedy match .*?, the regular expression that gives the desired result for years is
(?:year.*?|',')(\d+)
This can also be modified for the amount field which would look like this:
(?:amounts.*?|,)(\d+\.\d+)
You can try it here: https://regex101.com/r/QLcFQN/1
Edited: in the previous version my proposed regex was non-functional and only captured the last match.
You can continue with this regex:
^var (years \= (?'year'.*)|balances \= (?'balances'.*)|amounts \= (?'amounts'.*));$
It searches for lines with either years, balances or amounts entries and names the matches accordingly. Each named group captures the whole bracketed array on that line.
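To keep each list of numbers tied to its variable name, one option is a two-pass approach: first capture the name together with the bracket contents, then pull the individual numbers out of each bracketed part. This is a different technique from the single-pattern answers above, sketched in Python rather than Swift purely for illustration. The function name extract_arrays and the SCRIPT constant are made up for the example; the sample text is the snippet from the question.

```python
import re

# Sample text copied from the question
SCRIPT = """
$(document).ready(function() {
var years = ['2020','2021','2022'];
var currentView = 0;
var amounts = [1269.2358,1456.557,1546.8768];
var balances = [3484626,3683646,3683070];
"""

def extract_arrays(source):
    """Map each variable name to the list of number strings inside its brackets."""
    result = {}
    # First pass: find "var <name> = [ ... ]" for the three names of interest.
    pattern = r'var\s+(years|amounts|balances)\s*=\s*\[(.*?)\]'
    for name, body in re.findall(pattern, source):
        # Second pass: pull every integer or decimal number out of the bracket contents.
        result[name] = re.findall(r'\d+(?:\.\d+)?', body)
    return result

arrays = extract_arrays(SCRIPT)
print(arrays['years'])    # ['2020', '2021', '2022']
print(arrays['amounts'])  # ['1269.2358', '1456.557', '1546.8768']
```

Because the second pass runs on one bracketed list at a time, it works for any number of entries per array.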

Replace every non letter or number character in a string with another

Context
I am writing code that runs a bunch of calculations and outputs figures. At the end of the code, I want to save everything in a nice way, so my take on this is to go to a user-specified output directory, create a new folder, and then run the save process.
Question(s)
My question is twofold:
1) I want my folder name to be unique. I was thinking about getting the current date and time and creating a unique name from this and the input filename. This works, but it generates folder names that are a bit cryptic. Is there some good practice / convention I have not heard of for doing that?
2) When I get the datetime string (tn = datestr(now);), it looks like this:
tn =
'07-Jul-2022 09:28:54'
To convert it to a nice filename, I replace the '-', ' ' and ':' characters with underscores and append it to a shorter version of the input filename chosen by the user. I do that using strrep:
tn = strrep(tn,'-','_');
tn = strrep(tn,' ','_');
tn = strrep(tn,':','_');
This is fine but it bugs me to have to use 3 lines of code to do so. Is there a nice one liner to do that? More generally, is there a way to look for every non letter or number character in a string and replace it with a given character? I bet that's what regexp is there for but frankly I can't quite get a hold on how regexps work.
Your point (1) is opinion based so you might get a variety of answers, but I think a common convention is to at least start the name with a reverse-order date string so that sorting alphabetically is the same as sorting chronologically (i.e. yymmddHHMMSS).
To answer your main question directly, you can use the built-in makeValidName utility which is designed for making valid variable names, but works for making similarly "plain" file names.
str = '07-Jul-2022 09:28:54';
str = matlab.lang.makeValidName(str)
% str = 'x07_Jul_202209_28_54'
Because a valid variable can't start with a number, it prefixes an x - you could avoid this by manually prefixing something more descriptive first.
This option is a bit simpler than working out the regex, although that would be another option which isn't too nasty here, using regexprep and replacing non-alphanumeric chars with an underscore:
str = regexprep( str, '\W', '_' ); % \W (capital W) matches any char that is not a letter, digit or underscore
% str = '07_Jul_2022_09_28_54'
To answer indirectly with a different approach, a nice trick with datestr which gets around this issue and addresses point (1) in one hit is to use the following syntax:
str = datestr( now(), 30 );
% str = '20220707T094214'
The 30 input (from the docs) gives you an ISO standardised string to the nearest second in reverse-order:
'yyyymmddTHHMMSS' (ISO 8601)
(note the T in the middle isn't a placeholder for some time measurement, it remains a literal letter T to split the date and time parts).
I normally use your folder naming approach with a meaningful prefix, replacing ':' by something else:
folder_name = ['results_' strrep(datestr(now), ':', '.')];
As for your second question, you can use isstrprop:
folder_name(~isstrprop(folder_name, 'alphanum')) = '_';
Or if you want more control on the allowed characters you can use good old ismember:
folder_name(~ismember(folder_name, ['0':'9' 'a':'z' 'A':'Z'])) = '_';
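The two ideas above (replace everything that is not a letter or digit, and use a sortable yyyymmddTHHMMSS timestamp) can be combined in a short sketch. It is written in Python rather than MATLAB, only to illustrate the logic; the function names sanitize and timestamped_folder_name are made up for this example, and a fixed date is passed in so the output is reproducible.

```python
import re
from datetime import datetime

def sanitize(name, replacement='_'):
    """Replace every character that is not an ASCII letter or digit."""
    return re.sub(r'[^0-9A-Za-z]', replacement, name)

def timestamped_folder_name(prefix, when):
    """Build '<prefix>_<yyyymmddTHHMMSS>' so that names sort chronologically."""
    return f"{sanitize(prefix)}_{when.strftime('%Y%m%dT%H%M%S')}"

print(sanitize('07-Jul-2022 09:28:54'))
# 07_Jul_2022_09_28_54
print(timestamped_folder_name('results', datetime(2022, 7, 7, 9, 42, 14)))
# results_20220707T094214
```

The explicit character class [^0-9A-Za-z] is used instead of \W so that the set of allowed characters is visible and easy to change.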

In search of Regular Expression to return substring of Phone number

I've been looking at this with no success so far. I need a regular expression that returns the string after the leading 1 in a phone number. Let me give you an example:
phone number (in the exact format I need): 15063217711
return value I need: 5063217711 --> so the FIRST char, if it's a 1, is removed. I need a regular expression that matches a 1 only at the BEGINNING of the string and then returns the rest of the phone number.
By the way, I'm using Objective-C, but it should not make much difference.
Use a positive lookbehind to skip the leading digit 1: (?<=^1)\d+$
There's really no need for a regular expression.
Just test if the first character in the string is a 1 and if so, take the substring of the rest.
Check out [NSString characterAtIndex:] and [NSString substringFromIndex:]
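Both suggestions can be compared side by side. The following is a minimal sketch in Python rather than Objective-C, so it uses string slicing where the answer mentions characterAtIndex: and substringFromIndex:. The function names are made up for this example, and it assumes the input consists of digits only.

```python
import re

def strip_leading_one_regex(number):
    """Regex version: the lookbehind requires a leading 1 but leaves it out of the match."""
    match = re.search(r'(?<=^1)\d+$', number)
    # If the number does not start with 1 there is no match, so return it unchanged.
    return match.group(0) if match else number

def strip_leading_one_plain(number):
    """Non-regex version: test the first character and take the rest of the string."""
    return number[1:] if number.startswith('1') else number

print(strip_leading_one_regex('15063217711'))  # 5063217711
print(strip_leading_one_plain('15063217711'))  # 5063217711
```

For digit-only input the two functions return the same result; the plain version is shorter and does not need a regex engine at all.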

How to avoid the hyphen entered in the textfield using regular expressions?

I have a textfield where I am performing validations using regular expressions. But I have a problem: when the user enters a hyphen in the textfield and clicks the button, an alert should be shown saying that the number entered is not valid. I am performing validations for strings and special characters, but the special characters are not getting validated.
This is my code:
NSRegularExpression *regex = [[[NSRegularExpression alloc]
initWithPattern:@"[a-zA-Z][#$%&*()']" options:0 error:NULL] autorelease];
You can check for the occurrence of a hyphen using rangeOfString:
NSRange k = [str rangeOfString:@"-"];
if(k.length != 0)
... Show your alert message
I feel this will meet your requirements. No need to step into the complexity of regexp.
I'm not very familiar with Objective-C and its flavor of regular expressions, and I'm not completely clear on what you are asking, but your regular expression will not match a hyphen. It should match a letter followed by one of the following characters: #$%&*()'. Below are a few examples of what should match and some things that should not match:
SHOULD MATCH    SHOULD NOT MATCH
----  ----      ----  ------  ------
a#    a$        A     --      C3PO
A)    A*        #     -       huh?
T%    Q#        g-    AA##    123
j&    a(        $R    ()      bbb5
z'    B&        qq    "D%"    ()-
If any of these are inconsistent with what you want to match, then you need to ask a question containing the desired pattern that you want to match. Some examples would be:
"any number of letters, parentheses, asterisks, ampersands"
"any number of any character except hyphens and square brackets ([])"
"between one and three capital letters followed by an optional number"
...or whatever meets your requirements.
Regular Expressions are a language that allows specification of almost any pattern of text-based data. If you do not need to match a pattern, then regular expressions might not be necessary. Good luck!
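The behaviour described above can be checked with a small sketch. It is in Python rather than Objective-C, and it assumes the whole input has to match the pattern (re.fullmatch), which is an assumption about the intended validation rather than something stated in the question. The helper name is_valid is made up for this example.

```python
import re

# The pattern from the question: one letter followed by one listed special character
PATTERN = re.compile(r"[a-zA-Z][#$%&*()']")

def is_valid(text):
    """True only if the entire string is a letter followed by one listed character."""
    return PATTERN.fullmatch(text) is not None

for sample in ["a#", "A)", "z'", "g-", "-", "AA##", "C3PO"]:
    print(sample, is_valid(sample))
```

The hyphen never passes because it is not in the second character class, which is the point the answer makes.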
Don't show an alert because a user entered a character you don't like. That's terrible UX, especially when using a soft keyboard where typos are more likely.
Instead, process the input to extract the characters you do want. Think of writing a web form with an input field for a phone number. You're not interested in the human-readable formatting in the phone number--the hyphens, the plus prefix to the country code, etc. But nothing should stop the user from entering those characters. You have code that can strip out the invalid characters, leaving what you want.
NSString makes this fairly easy. Note that -stringByTrimmingCharactersInSet: only removes characters from the two ends of the string, so it will not remove the hyphens inside a phone number. Instead, define an NSCharacterSet with the characters you want removed, split the string on that set with -componentsSeparatedByCharactersInSet:, and rejoin the pieces with -componentsJoinedByString:. You get an autoreleased NSString instance with only the characters you want in it, e.g.:
NSString *rawInput; // value assigned elsewhere to @"555-555-1212"
NSCharacterSet *characterSet = [ NSCharacterSet characterSetWithCharactersInString: @"-" ];
NSString *processedInput = nil;
// Split on every unwanted character, then join the pieces with nothing in between.
processedInput = [ [ rawInput componentsSeparatedByCharactersInSet: characterSet ] componentsJoinedByString: @"" ];
NSLog( @"processedInput: %@", processedInput ); // logs "5555551212"
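The "process the input instead of rejecting it" idea can be sketched in a few lines. This is Python rather than Objective-C, for illustration only, and the function name strip_characters is made up for the example. It removes the unwanted characters wherever they appear in the string, not just at the ends.

```python
def strip_characters(raw_input, unwanted="-"):
    """Return raw_input with every character listed in `unwanted` removed."""
    return "".join(ch for ch in raw_input if ch not in unwanted)

print(strip_characters("555-555-1212"))                         # 5555551212
print(strip_characters("+1 (555) 555-1212", unwanted="+-() "))  # 15555551212
```

The user can type the number in whatever readable format they like, and the code keeps only the characters it needs.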

removing a specific string from a string and the next string after a space

I have a string @"ABC 1.23 bla bla bla" from which I have to remove the @"ABC" string and, after that, the string @"1.23". The problem is that the text @"1.23" varies: it could be @"1.55" etc. How can I remove the string @"ABC" and the next word after the space?
You can use regular expressions, or you can do that in several ways using NSString methods.
1) Use componentsSeparatedByString: and pass a space; your string will be split into an array at word boundaries. Then use componentsJoinedByString:, ignoring the first two elements of the array.
2) Use rangeOfString: twice in succession, passing a space. The first call finds the space after ABC. The second call must search only the part of the string after that first space (for example with rangeOfString:options:range:), so it finds the space after 1.23 (or whatever). Then take substringFromIndex: starting just after that position.
Regular expressions would give you many more options, but the learning curve is steeper if you have never used regex in ObjC. Have a look at RegexKitLite if you are interested.
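Both routes (splitting on spaces, or a regular expression) can be sketched briefly. The example below is in Python rather than Objective-C, purely to show the logic; the function names are made up, and the regex version assumes the string always starts with the literal prefix ABC followed by one space-separated word.

```python
import re

def drop_first_two_words_split(text):
    """Split on the first two spaces and keep whatever follows them."""
    parts = text.split(' ', 2)
    return parts[2] if len(parts) > 2 else ''

def drop_prefix_and_next_word_regex(text):
    """Remove a leading 'ABC', the whitespace after it, and the next word."""
    return re.sub(r'^ABC\s+\S+\s*', '', text)

print(drop_first_two_words_split('ABC 1.23 bla bla bla'))       # bla bla bla
print(drop_prefix_and_next_word_regex('ABC 1.55 bla bla bla'))  # bla bla bla
```

The split version does not care what the first word is, while the regex version only touches strings that really begin with ABC.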