I have a string that looks like this:
17/07/2013 TEXTT TEXR 1 Text 1234567 456.78 987654
I need to separate this so I only end up with 2 values (in this example it's 1234567 and 456.78). The rest is unneeded.
I tried using string split with %A_Space% but as the whole middle area between values is filled with spaces, it doesn't really work.
Anyone got an idea?
src:="17/07/2013 TEXTT TEXR 1 Text "
. " 1234567 456.78 987654", pattern:="([\d\.]+)\s+([\d\.]+)"
RegexMatch(src, pattern, match)
MsgBox, 262144, % "result", % match1 "`n"match2
You should look at RegExMatch() and RegexReplace().
So, you will need to build a regex needle (I'm not an expert regexer, but this will work)
First, remove all of the string up to the end of "1 Text" since "1 Text" as you say, is constant. That will leave you with the three number values.
Something like this should find just the numbers you want:
needle:= "iO)1\s+Text"
partialstring := RegexMatch(completestring, needle, results)
lenOfFrontToRemove := results.pos() + results.len()
lastthreenumbers := substr(completestring, lenOfFrontToRemove, strlen(completestring) )
lastthreenumbers := trim(lastthreenumbers)
msgbox % lastthreenumbers
To explain the regex needle:
- the i means case insensitive
- the O stands for options - it lets us use results.pos and results.len
- the \s means to look for whitespace; the + means to look for more than one if present.
Now you have just the last three numbers.
1234567 456.78 987654
But you get the idea, right? You should able to parse it from here.
Some hints: in a regex needle, use \d to find any digit, and the + to make it look for more than one in a row. If you want to find the period, use \.
Related
Using q’s like function, how can we achieve the following match using a single regex string regstr?
q) ("foo7"; "foo8"; "foo9"; "foo10"; "foo11"; "foo12"; "foo13") like regstr
>>> 0111110b
That is, like regstr matches the foo-strings which end in the numbers 8,9,10,11,12.
Using regstr:"foo[8-12]" confuses the square brackets (how does it interpret this?) since 12 is not a single digit, while regstr:"foo[1[0-2]|[1-9]]" returns a type error, even without the foo-string complication.
As the other comments and answers mentioned, this can't be done using a single regex. Another alternative method is to construct the list of strings that you want to compare against:
q)str:("foo7";"foo8";"foo9";"foo10";"foo11";"foo12";"foo13")
q)match:{x in y,/:string z[0]+til 1+neg(-/)z}
q)match[str;"foo";8 12]
0111110b
If your eventual goal is to filter on the matching entries, you can replace in with inter:
q)match:{x inter y,/:string z[0]+til 1+neg(-/)z}
q)match[str;"foo";8 12]
"foo8"
"foo9"
"foo10"
"foo11"
"foo12"
A variation on Cillian’s method: test the prefix and numbers separately.
q)range:{x+til 1+y-x}.
q)s:"foo",/:string 82,range 7 13 / include "foo82" in tests
q)match:{min(x~/:;in[;string range y]')#'flip count[x]cut'z}
q)match["foo";8 12;] s
00111110b
Note how unary derived functions x~/: and in[;string range y]' are paired by #' to the split strings, then min used to AND the result:
q)flip 3 cut's
"foo" "foo" "foo" "foo" "foo" "foo" "foo" "foo"
"82" ,"7" ,"8" ,"9" "10" "11" "12" "13"
q)("foo"~/:;in[;string range 8 12]')#'flip 3 cut's
11111111b
00111110b
Compositions rock.
As the comments state, regex in kdb+ is extremely limited. If the number of trailing digits is known like in the example above then the following can be used to check multiple patterns
q)str:("foo7"; "foo8"; "foo9"; "foo10"; "foo11"; "foo12"; "foo13"; "foo3x"; "foo123")
q)any str like/:("foo[0-9]";"foo[0-9][0-9]")
111111100b
Checking for a range like 8-12 is not currently possible within kdb+ regex. One possible workaround is to write a function to implement this logic. The function range checks a list of strings start with a passed string and end with a number within the range specified.
range:{
/ checking for strings starting with string y
s:((c:count y)#'x)like y;
/ convert remainder of string to long, check if within range
d:("J"$c _'x)within z;
/ find strings satisfying both conditions
s&d
}
Example use:
q)range[str;"foo";8 12]
011111000b
q)str where range[str;"foo";8 12]
"foo8"
"foo9"
"foo10"
"foo11"
"foo12"
This could be made more efficient by checking the trailing digits only on the subset of strings starting with "foo".
For your example you can pad, fill with a char, and then simple regex works fine:
("."^5$("foo7";"foo8";"foo9";"foo10";"foo11";"foo12";"foo13")) like "foo[1|8-9][.|0-2]"
I am trying to parse all the files and verify if any of the file content has strings TESTDIR or TEST_DIR
Files contents might look something like:-
TESTDIR = foo
include $(TESTDIR)/chop.mk
...
TEST_DIR := goldimage
MAKE_TESTDIR = var_make
NEW_TEST_DIR = tesing_var
Actually I am only interested in TESTDIR ,$(TESTDIR),TEST_DIR but in my case last two lines should be ignored. I am new to perl , Can anyone help me out with re-rex.
/\bTEST_?DIR\b/
\b means a "word boundary", i.e. the place between a word character and a non-word character. "Word" here has the Perl meaning: it contains characters, numbers, and underscores.
_? means "nothing or an underscore"
Look at "characterset".
Only (space) surrounding allowed:
/^(.* )?TEST_?DIR /
^ beginning of the line
(.* )? There may be some content .* but if, its must be followed by a space
at the and says that a whitespace must be there. Otherwise use ( .*)?$ at the end.
One of a given characterset is allowed:
Should the be other characters then a space be possible you can use a character class []:
/^(.*[ \t(])?TEST_?DIR[) :=]/
(.*[ \t(])? in front of TEST_?DIR may be a (space) or a \t (tab) or ( or nothing if the line starts with itself.
afterwards there must be one of (space) or : or = or ). Followd by anything (to "anything" belongs the "=" of ":=" ...).
One of a given group is allowed:
So you need groups within () each possible group in there devided by a |:
/^(.*( |\t))?TEST_?DIR( | := | = )/
In this case, at the beginning is no change to [ \t] because each group holds only one character and \t.
At the end, there must be (single space) or := (':=' surrounded by spaces) or = ('=' surrounded by spaces), following by anything...
You can use any combination...
/^(.*[ \t(])?TEST_?DIR([) =:]| :=| =|)/
Test it on Debuggex.com. (Use 'PCRE')
I'm using AutoHotkey for this as the code is the most understandable to me. So I have a document with numbers and text, for example like this
120344 text text text
234000 text text
and the desired output is
12:03:44 text text text
23:40:00 text text
I'm sure StrReplace can be used to insert the colons in, but I'm not sure how to specify the position of the colons or ask AHK to 'find' specific strings of 6 digit numbers. Before, I would have highlighted the text I want to apply StrReplace to and then press a hotkey, but I was wondering if there is a more efficient way to do this that doesn't need my interaction. Even just pointing to the relevant functions I would need to look into to do this would be helpful! Thanks so much, I'm still very new to programming.
hfontanez's answer was very helpful in figuring out that for this problem, I had to use a loop and substring function. I'm sure there are much less messy ways to write this code, but this is the final version of what worked for my purposes:
Loop, read, C:\[location of input file]
{
{ If A_LoopReadLine = ;
Continue ; this part is to ignore the blank lines in the file
}
{
one := A_LoopReadLine
x := SubStr(one, 1, 2)
y := SubStr(one, 3, 2)
z := SubStr(one, 5)
two := x . ":" . y . ":" . z
FileAppend, %two%`r`n, C:\[location of output file]
}
}
return
Assuming that the "timestamp" component is always 6 characters long and always at the beginning of the string, this solution should work just fine.
String test = "012345 test test test";
test = test.substring(0, 2) + ":" + test.substring(2, 4) + ":" + test.substring(4, test.length());
This outputs 01:23:45 test test test
Why? Because you are temporarily creating a String object that it's two characters long and then you insert the colon before taking the next pair. Lastly, you append the rest of the String and assign it to whichever String variable you want. Remember, the substring method doesn't modify the String object you are calling the method on. This method returns a "new" String object. Therefore, the variable test is unmodified until the assignment operation kicks in at the end.
Alternatively, you can use a StringBuilder and append each component like this:
StringBuilder sbuff = new StringBuilder();
sbuff.append(test.substring(0,2));
sbuff.append(":");
sbuff.append(test.substring(2,4));
sbuff.append(":");
sbuff.append(test.substring(4,test.length()));
test = sbuff.toString();
You could also use a "fancy" loop to do this, but I think for something this simple, looping is just overkill. Oh, I almost forgot, this should work with both of your test strings because after the last colon insert, the code takes the substring from index position 4 all the way to the end of the string indiscriminately.
I am using Autohotkey.
I have a string that looks like this S523.WW.E.SIMA. I want to remove the last few characters of the string after the dot (including the dot itself). So, after the removal, the string will look like S523.WW.E.
This may look like a simple question but I just cannot figure out using the available string functions in Autohotkey. How can this be done using Autohotkey? Thank you very much.
Example 1 (last index of)
string := "S523.WW.E.SIMA"
LastDotPos := InStr(string,".",0,0) ; get position of last occurrence of "."
result := SubStr(string,1,LastDotPos-1) ; get substring from start to last dot
MsgBox %result% ; display result
See InStr
See SubStr
Example 2 (StrSplit)
; Split it into the dot-separated parts,
; then join them again excluding the last part
parts := StrSplit(string, ".")
result := ""
Loop % parts.MaxIndex() - 1
{
if(StrLen(result)) {
result .= "."
}
result .= parts[A_Index]
}
Example 3 (RegExMatch)
; Extract everything up until the last dot
RegExMatch(string, "(.*)\.", result)
msgbox % result1
Example 4 (RegExReplace)
; RegExReplace to remove everything, starting with the last dot
result := RegExReplace(string, "\.[^\.]+$", "")
I have a list of pdf files in this format "123 - Test - English.pdf". I want to be able to set "111", "Test" and "English.pdf" in their own individual variables. I tried running the code below but I don't think it accounts for multiple dashes "-". How can I do this? Please help Thanks in advance.
Loop,C:\My Documents\Notes\*.pdf, 0, 0
{
NewVariable = Trim(Substr(A_LoopFileName,1, Instr(A_LoopFileName, "-")-1))
I would recommend using a parse loop to get your variables. The following loops through values between the dashes and removes the whitespace.
FileName = Test - file - name.pdf
Loop, parse, FileName, `-
MyVar%A_Index% := RegExReplace(A_LoopField, A_Space, "")
msgbox % Myvar1 "`n" Myvar2 "`n" MyVar3
First, I don't know if it was a typo, but if you use a { under your loop statement, you also need to close it. If your next statement is just one line, you don't need any brackets at all.
Second, if you just use = then your code will output as just that very code text. You need to use a :=
Third, your present code, if coded correctly would result in this:
somepdffile.pd
if it found any pdf files without a dash. Instr() will return the position of a dash. If there is no dash, it returns 0 - in which case, your substr() statement will add 0 and your -1 which adds up to -1 and if you use a negative number with substr(), it will search from the end of the string instead of the beginning - which is why your string would get cut off.
Loop, C:\My Documents\Notes\*.pdf, 0, 0
{
;look at the docs (http://www.autohotkey.com/docs/) for `substr`
}
So there is an explanation of why your code doesn't work. To get it to do what you want to do, can you explain a bit more as to how you want NewVariable to look like?
; here is another way (via RegExMatch)
src:="123 - Test - English.pdf", pat:="[^\s|-]+"
While, mPos:=RegExMatch(src, pat, match, mPos ? mPos+StrLen(match):1)
match%A_Index%:=match
MsgBox, 262144, % "result", % match1 ", "match2 ", "match3