I want to parse an HTML web page (specifically, a Google search results page).
I'm looking for the specific counter string
"About *many results"
where *many can range from 0 to 999,999,999,999 results
grep("About [0-9] results",file)
I can't figure out how to incorporate the range of numbers (including commas) into the regular expression. Can anyone clarify? I've looked at similar questions, but their code does not work for this task.
I'm guessing I need to introduce some kind of wildcard ".", but I don't think I'm using it correctly.
The structure I had in mind was
Any#Times { { Any#Times( [0-9] ) },}
Solved my own question.
It didn't have to be fancy at all:
"About .* results"
works fine.
Depending on the content of the page, your .* may work, but because .* accepts any characters it could also match a very long and incorrect string.
If you want to make sure that you get only numbers, try:
"About ([0-9]+|[0-9]{1,3}(,[0-9]{3})*) results"
I've tested it with grep -E and it'll give you ungrouped numbers:
About 10000000 results
as well as grouped numbers using British/English conventions:
About 100,000 results
but not non-numbers:
About a bajillion results
or badly grouped numbers:
About 100,0 results
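For readers who want to check these cases outside grep, here is a minimal Python sketch using the same pattern. The helper name parse_result_count and the named group count are illustrative additions, not part of the original answer.

```python
import re

# Same pattern as in the answer above; the number is captured in a named group.
RESULT_COUNT = re.compile(
    r"About (?P<count>[0-9]+|[0-9]{1,3}(?:,[0-9]{3})*) results"
)


def parse_result_count(text):
    """Return the result count as an int, or None if no counter string is found."""
    match = RESULT_COUNT.search(text)
    if match is None:
        return None
    # Strip the grouping commas before converting to an integer.
    return int(match.group("count").replace(",", ""))


if __name__ == "__main__":
    for sample in (
        "About 10000000 results",
        "About 100,000 results",
        "About a bajillion results",
        "About 100,0 results",
    ):
        print(sample, "->", parse_result_count(sample))
```

The two rejected inputs return None because neither alternative can be followed directly by " results".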
I'm looking for a regsub example that does the following:
123tcl456TCL789 => 123!tcl!456!TCL!789
This is an Tcl example => This is an !Tcl! example
Yes, I could use string first to find a position and mash things together, but in the past I saw a regsub command that does what I want, and I can't recall it. What regsub command would do that? I would guess regsub -all -nocase is a start.
I am bad at regsub and regexps. I wonder if there is a site or tool/script where we can supply a string and the desired result, and get back the regsub form.
You're looking at the right tool, but there are various options, depending on exactly what the conditions are when faced with other text. Here's one that wraps each occurrence of "Tcl" (any capitalisation) with exclamation marks:
set inputString "123tcl456TCL789"
set replaced [regsub -all -nocase {tcl} $inputString {!&!}]
puts $replaced
That's using a very simple regular expression with the -nocase option, and the replacement means "put ! on either side of the substring matched".
Another (perhaps more generally applicable) option is to put ! after any sequence of letters that is followed by a digit, and after any sequence of digits that is followed by a letter.
set replaced [regsub -all {[A-Za-z]+(?=[0-9])|[0-9]+(?=[A-Za-z])} $inputString {&!}]
Note that doing things correctly typically requires understanding the real input data fairly well. For example, whether the numbers include floating point numbers in scientific notation, or whether the substrings to delimit are of fixed length.
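As a cross-check of the two substitutions above, here is a rough Python analogue. It is a sketch in a different language, not Tcl: Python writes the whole-match reference as \g<0> where Tcl uses &, and the function names wrap_tcl and mark_boundaries are made up for illustration.

```python
import re


def wrap_tcl(text):
    """Wrap every occurrence of "tcl" (any capitalisation) in exclamation marks."""
    # \g<0> stands for the whole matched substring, like & in Tcl's regsub.
    return re.sub(r"tcl", r"!\g<0>!", text, flags=re.IGNORECASE)


def mark_boundaries(text):
    """Append ! to a letter run followed by a digit, or a digit run followed by a letter."""
    return re.sub(r"[A-Za-z]+(?=[0-9])|[0-9]+(?=[A-Za-z])", r"\g<0>!", text)


if __name__ == "__main__":
    print(wrap_tcl("123tcl456TCL789"))
    print(wrap_tcl("This is an Tcl example"))
    print(mark_boundaries("123tcl456TCL789"))
```

The second function leaves "This is an Tcl example" unchanged because it contains no digits, which is one reason the choice between the two depends on the real input data.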
I am trying to search for a substring in a string while allowing a certain number of mismatches (the allowed number is taken from the user). I came across String::Approx, but I can't seem to understand how it works, and your input is appreciated.
use String::Approx 'aindex';
$a="Aperlisaparlhardobjectproduced...";
$b="paal";
print aindex($b, ["I0", "D0", "S1"], $a);
When I search with 1 substitution, it works fine (I0 and D0 because I want matches of the same length) and gives 8 as output. When I search with S2 or S3, it still gives 8 instead of position 1 (it should have matched perl). Is this the intended result?
Thank you!
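To make the expectation in this question concrete, here is a brute-force Python sketch of substitution-only matching (Hamming distance over same-length windows). It is not how String::Approx works internally and makes no claim about what aindex is documented to return; the function name substitution_matches is hypothetical.

```python
TEXT = "Aperlisaparlhardobjectproduced..."


def substitution_matches(pattern, text, max_subs):
    """Return (index, mismatches) for each same-length window within max_subs substitutions."""
    width = len(pattern)
    hits = []
    for start in range(len(text) - width + 1):
        window = text[start:start + width]
        # Count the positions where the window differs from the pattern.
        mismatches = sum(1 for p, t in zip(pattern, window) if p != t)
        if mismatches <= max_subs:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    for allowed in (1, 2):
        print(allowed, substitution_matches("paal", TEXT, allowed))
```

With two substitutions allowed, both index 1 ("perl", two mismatches) and index 8 ("parl", one mismatch) qualify, so the answer depends on whether a library reports the leftmost match or the closest one.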
I'm trying to parse HTML code and extract some data from it using regular expressions. The website that provides the data has no API, and I want to show this data in an iOS app built using Swift. The HTML looks like this:
$(document).ready(function() {
var years = ['2020','2021','2022'];
var currentView = 0;
var amounts = [1269.2358,1456.557,1546.8768];
var balances = [3484626,3683646,3683070];
rest of the html code
What I'm trying to extract is the years, amounts and balances.
So I would like to have an array with the years in it, [2020,2021,2022], and the same for amounts and balances. In this example there are 3 years, but there could be more or fewer. I'm able to extract all the numbers, but then I'm unable to link them to the years, amounts or balances. See this example: https://regex101.com/r/WMwUji/1, using this pattern (\d|\[(\d|,\s*)*])
Any help would be really appreciated.
Firstly, I think there are some errors in your expression. To capture a whole number you have to use \d+ (which matches one or more consecutive digits, e.g. 2020). If you need to include . as a decimal separator, the expression would look like \d+\.\d+.
In addition, using a non-capturing group (?:) and a non-greedy match .*?, the regular expression that gives the desired result for years is
(?:year.*?|',')(\d+)
This can also be modified for the amount field which would look like this:
(?:amounts.*?|,)(\d+\.\d+)
You can try it here: https://regex101.com/r/QLcFQN/1
Edited: in the previous version, my proposed regex was non-functional and only captured the last match.
You can continue with this regex:
^var (years \= (?'year'.*)|balances \= (?'balances'.*)|amounts \= (?'amounts'.*));$
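It searches for lines with either years, balances or amounts entries and names the matches accordingly. It needs the multiline flag so that ^ and $ match at each line, and each named group captures everything after the = sign up to the final semicolon, brackets included.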
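One way to keep each list tied to its variable name is to match the whole declaration first and split the bracket contents afterwards. The sketch below does that in Python rather than Swift, so the named-group syntax (?P<name>...) would need adapting for another regex engine. SCRIPT is a copy of the snippet from the question, and extract_arrays is a made-up helper name.

```python
import re

# Copy of the snippet from the question, used as sample input.
SCRIPT = """
$(document).ready(function() {
    var years = ['2020','2021','2022'];
    var currentView = 0;
    var amounts = [1269.2358,1456.557,1546.8768];
    var balances = [3484626,3683646,3683070];
"""

# Match "var <name> = [ ... ]" for the three names of interest.
ARRAY_DECL = re.compile(
    r"var\s+(?P<name>years|amounts|balances)\s*=\s*\[(?P<body>[^\]]*)\]"
)


def extract_arrays(source):
    """Map each variable name to its list of raw string items."""
    arrays = {}
    for match in ARRAY_DECL.finditer(source):
        # Split on commas, then drop whitespace and surrounding quotes.
        items = [item.strip().strip("'\"") for item in match.group("body").split(",")]
        arrays[match.group("name")] = [item for item in items if item]
    return arrays


data = extract_arrays(SCRIPT)
years = [int(y) for y in data["years"]]
amounts = [float(a) for a in data["amounts"]]
balances = [int(b) for b in data["balances"]]

if __name__ == "__main__":
    print(years, amounts, balances)
```

Because the split happens per declaration, the number of years can vary without changing the pattern.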
I have an issue with the search parameters. I want to pass a phrase in my query. For example, I'm looking for emails where the subject is "Test 1".
For this, I'm doing a GET on this resource:
https://graph.microsoft.com/v1.0/me/messages?$search="subject:Test 1"
But this query behaves as follows: it looks for mails that contain "Test" in the subject OR "1" in any other field.
Referring to the KQL syntax:
A phrase (includes two or more words together, separated by spaces; however, the words must be enclosed in double quotation marks)
So, to do what I want, I have to put double quotes (") around my phrase to do a strict value search, like below:
subject:"Test 1"
This is where the problem lies: the Microsoft Graph API already uses double quotes (") after the $search parameter.
?$search="Key words"
So I can't do what is mentioned in the KQL doc.
https://graph.microsoft.com/v1.0/me/messages?$search="subject:"Test 1""
It throws an error:
"Syntax error: character '1' is not valid at position 15 in '\"subject:\"test 1\"\"'.",
This is expected behaviour; I was pretty sure it would not work.
If someone has any suggestions for a solution or a workaround, I'd welcome them.
What I've already tried so far:
Using single quotes
Removing the quotes right after $search=
Removing the subject part ($search="Test 1"): same behaviour as the first request mentioned in this post. It looks for emails that contain "test" or "1".
Best regards.
EDIT:
After sasfrog's answer:
I used $filter: it works well with the simple operators AND and OR. I get some errors when using the NOT operator. By the way, you have to use the orderby parameter to show the results by date, and add that field to the filter parameters.
Example 1 (working, what I asked for first):
https://graph.microsoft.com/v1.0/me/messages/?$orderby=receivedDateTime desc &$filter=receivedDateTime ge 1900-01-01T00:00:00Z AND contains(subject,'test 1')
Example 2 (not working):
https://graph.microsoft.com/v1.0/me/messages/?$orderby=receivedDateTime desc &$filter=(receivedDateTime ge 1900-01-01T00:00:00Z AND contains(subject,'test 1')) NOT(contains(from/EmailAddress/address,[specific address]))
EDIT 2
After some tests with the filter parameters:
The NOT operator is still not working, so as a workaround use "ne" (not equals).
Example 2 becomes:
https://graph.microsoft.com/v1.0/me/messages/?$orderby=receivedDateTime desc&$filter=(receivedDateTime ge 1900-01-01T00:00:00Z AND contains(subject,'test 1')) AND (from/EmailAddress/address ne [specific address])
UPDATE: OTHER SOLUTION WITH $search
Using $filter is great, but it was sometimes pretty slow. So I found a workaround for my issue.
It is to use the AND operator between all terms.
Example 4:
I'm looking for the mails where the subject is test 1.
Let value = "test 1". You have to split it on the space separator, then write some code to manipulate the resulting array to obtain something like below.
$search="(subject:test AND subject:1)"
The brackets can be important if you search across multiple fields. And voilà.
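The split-and-join step described above could look like the following Python sketch. The function names build_and_search and build_messages_url are hypothetical, and percent-encoding the query value is an assumption on my part rather than something stated in the post.

```python
from urllib.parse import quote

BASE_URL = "https://graph.microsoft.com/v1.0/me/messages"


def build_and_search(field, value):
    """Turn a phrase into a quoted $search value with every word ANDed on one field."""
    terms = value.split()  # split on whitespace
    if not terms:
        raise ValueError("value must contain at least one word")
    clause = " AND ".join(f"{field}:{term}" for term in terms)
    # Outer double quotes are the ones $search expects; brackets group the clause.
    return f'"({clause})"'


def build_messages_url(field, value):
    """Build the request URL, percent-encoding the $search value (an assumption)."""
    return f"{BASE_URL}?$search={quote(build_and_search(field, value))}"


if __name__ == "__main__":
    print(build_and_search("subject", "test 1"))
    print(build_messages_url("subject", "test 1"))
```

For a multi-field search, several such bracketed clauses could be combined, which is where the brackets start to matter.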
Not sure if it's sufficient for what you're doing, but how about using the contains function within a filter query instead:
https://graph.microsoft.com/v1.0/me/messages?$filter=contains(subject,'Test 1')
Sounds like you're already looking at the docs, but here they are just in case.
Update: this also worked for me using the search method:
https://graph.microsoft.com/v1.0/me/messages?$search="subject:'Test 1'"
I have the following in my index: this is test, and I want to be able to search for "this test is".
In other words, I'm looking for an exact match regardless of the position of the words.
Is this correct?
$cl->SetMatchMode( SPH_MATCH_EXTENDED);
$result = $cl->Query( '"this test is"~3', $index );
If it is correct, should the number after the ~ be the number of words, in order to get an exact-match search regardless of position?
I have tested the result and it looks good but I'm not sure.
Your use of the proximity operator looks fine to me.