Efficient string editing in Applescript

Efficient string editing in Applescript - sed

I'm writing an AppleScript that needs to take a string and output just the numbers in that string. I have a method that works:
do shell script "sed s/[a-zA-Z\\']//g <<< " & s
where s is the input string, but this script is doing this thousands of times and it ends up taking on the order of twenty minutes to get through them all. Is there any way I could make this faster?
Expected input is a string that can contain pretty much anything except / or \. (I've tried to keep whitespace characters out, since they break my method, but one still gets through sometimes.) Expected output is a string of numbers (or an empty string). For example, the sentence "1 man paid 1,23 cents for 51 cans of beer " would have the desired output of simply "112351", and "I h8 dealing with all these numb3rs" would output "83".

If you really have to pick the numbers out of one string at a time, then this approach may be faster, even though it looks like a lot more code. I should also tell you that if you do call a shell command such as tr, specifying its full path will save some time, if only a few milliseconds.
set mlist to "1 man paid 1,23 cents for 51 can\\'s of beer "
script o
property l : missing value
end script
set o's l to words of mlist
repeat with i from 1 to (count o's l)
try
item i of o's l as integer
on error
set item i of o's l to missing value
end try
end repeat
set o's l to o's l's text
set tids to my text item delimiters
set my text item delimiters to space
set o's l to text items of o's l
set my text item delimiters to tids
set o's l to o's l as text
set my text item delimiters to ","
set o's l to text items of o's l
set my text item delimiters to tids
set numb3rs to o's l as text
log numb3rs
--> (*112351*)
This approach may be faster because you save the overhead of a do shell script call for every line. It should also return just the numbers, provided those numbers are well formed (correct decimal separator). I haven't tried it with e-notation, but I think that should work as well.

Have you tried tr?
tr -d '[:alpha:][:punct:]' <<< string

Another sed-solution, to make sure that only numbers are displayed:
do shell script "echo " & quoted form of s & "|sed s/[^0-9]//g"

I offer this alternate code. I don't see why McUsr's code needs to be that complicated.
set s to "1 man paid 1,23 cents for 51 can\\'s of beer "
set n to ""
repeat with i from 1 to (length of s)
set c to (character i of s)
# http://stackoverflow.com/questions/20313799
if c is in "0123456789" then
set n to n & c
end if
end repeat
return n
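For comparison, the same per-character filter can be sketched in Python. This is only an illustration of the idea (keep a character if it is one of the ten ASCII digits), not a replacement for the AppleScript above; the function name digits_only is made up for this example.

```python
def digits_only(s: str) -> str:
    """Return only the ASCII digits of s, in their original order."""
    # digits_only is a hypothetical helper name used for illustration.
    return "".join(ch for ch in s if ch in "0123456789")


print(digits_only("1 man paid 1,23 cents for 51 cans of beer "))  # 112351
print(digits_only("I h8 dealing with all these numb3rs"))         # 83
```

Filtering by membership in a fixed digit string, rather than removing letters and punctuation, means whitespace and unexpected symbols are dropped too.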

Related

Postgresql: How to replace multiple substrings in a string and with regexp

Postgresql 9.4: 1 column in 1 table is a string of text representing a route of flight for an aircraft.
The complete field consists of "Fixes" and "Routes" up to 80
characters in total length.
Routes and Fixes can be either 3 or 5 characters in length.
Routes and Fixes can have the same name.
There may be zero, one, or two Routes
Routes are followed by a single non-zero digit or a hash.
Routes and Fixes can be preceded or followed by a "+" or "*".
The field may contain CR/LF or double-triple spaces which should remain.
Each schema contains 6-20000s fields in this table
There are nearly 1800 Route names, but generally only 40-80 per schema
Examples:
"KIND ROCKY1 STL BUM OATHE CLASH5 KDEN"
"+MEARZ7 OKK+
KIND OKK FWA MIZAR3 KDTW"
"KIND OOM OOM5 WEGEE PXV J131 LIT BYP5 KDFW"
"KIND MEARZ# OKK ECK YEE YXI N171B VALIEE***EGSS"
The task is to clean up the lazy use of the hash instead of a digit and to update the Route versions (the trailing numbers). I.e. replace-in-place the Route with the correct digit rather than the # or what might be a wrong number. So every instance of "MEARZ7" or "MEARZ#" becomes "MEARZ9" and "OOM5" becomes "OOM6" but "OOM " stays "OOM ".
Currently I have been testing this:
UPDATE target SET detail =
CASE WHEN POSITION('CLASH' in detail) > 0
AND SUBSTRING(detail,POSITION('CLASH' in detail)+5,1) != ' '
THEN REGEXP_REPLACE (detail, 'CLASH.', 'CLASH5')
WHEN POSITION('MEARZ' in detail) > 0
AND SUBSTRING(detail,POSITION('MEARZ' in detail)+5,1) != ' '
THEN REGEXP_REPLACE (detail, 'MEARZ.', 'MEARZ9')
WHEN POSITION('OOM' in detail) > 0
AND SUBSTRING(detail,POSITION('OOM' in detail)+3,1) != ' '
THEN REGEXP_REPLACE (detail, 'OOM.', 'OOM6')
WHEN POSITION('ROCKY' in detail) > 0
AND SUBSTRING(detail,POSITION('ROCKY' in detail)+5,1) != ' '
THEN REGEXP_REPLACE (detail, 'ROCKY.', 'ROCKY1')
ELSE detail END;
My logic was to:
Find the Route name.
Check if it's followed by a space.
If not, replace it with the correct Route+digit
I hadn't yet attempted to avoid "+" or "* ". I was thinking I could first replace the "#" with a number, then update the Route+digit as to not worry about the # and this would eliminate the need to look for the "+" or "* ". Then I could just look for a trailing space.
The second Route (in order of the WHEN statements) does not get updated, so I guess I am barking up the wrong tree.
The other big obstacle is that there can be 80 or more Routes in a schema, so if I have to nest a statement, it's going to be huge.
I have tried array_to_string(array_replace(string_to_array( but it leaves behind double quotes, commas, and curly brackets so doesn't seem feasible.
At this point I'm thinking a function is the way to go, but I don't know where to start.
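One way to think about the function is as a single pass driven by a lookup table rather than a chain of WHEN branches, so that every Route in the field is handled and adding a Route only means adding a table entry. The sketch below shows that logic in Python, not PostgreSQL. The ROUTE_VERSIONS table and the update_routes name are assumptions made up for illustration, and the rule that a Route is a known name followed by exactly one non-zero digit or a hash is taken from the description above.

```python
import re

# Hypothetical lookup table: Route name -> current version digit.
ROUTE_VERSIONS = {"CLASH": "5", "MEARZ": "9", "OOM": "6", "ROCKY": "1"}

# A known name, then one non-zero digit or '#', not embedded in a longer token.
# Longer names are tried first so a short name cannot shadow a longer one.
_NAMES = "|".join(sorted(map(re.escape, ROUTE_VERSIONS), key=len, reverse=True))
_ROUTE = re.compile(r"(?<![A-Za-z0-9])(" + _NAMES + r")[1-9#](?![A-Za-z0-9])")


def update_routes(detail: str) -> str:
    """Replace the version digit (or '#') of every known Route in detail."""
    return _ROUTE.sub(lambda m: m.group(1) + ROUTE_VERSIONS[m.group(1)], detail)


print(update_routes("KIND OOM OOM5 WEGEE PXV J131 LIT BYP5 KDFW"))
print(update_routes("+MEARZ7 OKK+\r\nKIND OKK FWA MIZAR3 KDTW"))
print(update_routes("KIND MEARZ# OKK ECK YEE YXI N171B VALIEE***EGSS"))
```

Because the pattern requires a digit or hash right after the name, a Fix such as "OOM " is left alone, and "+", "*", CR/LF and repeated spaces pass through unchanged.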

TXR: How to combine all lines where the following line begins with a tab?

I am trying to parse the text output of a shell command using txr.
The text output uses a tab indented line following it to continue the current line (not literal \t characters as I show below). Note that on other variable assignment lines (that don't represent extended length values), there are leading spaces in the input.
Variable Group: 1
variable = the value of the variable
long_variable = the value of the long variable
\tspans across multiple lines
really_long_variable = this variable extends
\tacross more than two lines, but it
\tis unclear how many lines it will end up extending
\tacross ahead of time
Variable Group: 2
variable = the value of the variable in group 2
long_variable = this variable might not be that long
really_long_variable = neither might this one!
How might I capture these using the txr pattern language? I know about the @(freeform) directive and its optional numeric argument, which treats the next n lines as one big line. Thus, it seems to me the right approach would be something like:
@(collect)
Variable Group: @i
variable = @value
@(freeform 2)
long_variable = @long_value
@(set long_value @(regsub #/[\t ]+/ "" long_value))
@(freeform (count-next-lines-starting-with-tab))
really_long_variable = @really_long_value
@(set really_long_value @(regsub #/[\t ]+/ "" really_long_value))
@(end)
However, it's not clear to me how I might write the count-next-lines-starting-with-tab procedure with TXR lisp. On the other hand, maybe there is another better way I could approach this problem. Could you provide any suggestions?
Thanks in advance!

Let's apply the KISS principle; we don't need to bring in @(freeform). Instead we can separately capture the main line and the continuation lines for the (potentially) multi-line variables. Then, intelligently combine them with @(merge):
@(collect)
Variable Group: @i
variable = @value
long_variable = @l_head
@  (collect :gap 0 :vars (l_cont))
	@l_cont
@  (end)
really_long_variable = @rl_head
@  (collect :gap 0 :vars (rl_cont))
	@rl_cont
@  (end)
@  (merge long_variable l_head l_cont)
@  (merge really_long_variable rl_head rl_cont)
@(end)
Note that the big indentations in the above are supposed to be literal tabs. Instead of literal tabs, we can encode tabs using @\t.
Test run on the real data with \t replaced by tabs:
$ txr -Bl new.txr data
(i "1" "2")
(value "the value of the variable" "the value of the variable in group 2")
(l_head "the value of the long variable" "this variable might not be that long")
(l_cont ("spans across multiple lines") nil)
(rl_head "this variable extends" "neither might this one!")
(rl_cont ("across more than two lines, but it" "is unclear how many lines it will end up extending"
"across ahead of time") nil)
(long_variable ("the value of the long variable" "spans across multiple lines")
("this variable might not be that long"))
(really_long_variable ("this variable extends" "across more than two lines, but it"
"is unclear how many lines it will end up extending" "across ahead of time")
("neither might this one!"))
We use a strict collect with :vars for the continuation lines, so that the variable is bound (to nil) even if nothing is collected. :gap 0 prevents these inner collects from scanning across lines that don't start with tabs: another strictness measure.
@(merge) has "special" semantics for combining lists of strings that have different nesting levels; it's perfect for assembling data from different levels of collection and is basically tailor-made for this kind of thing. This problem is very similar to extracting HTTP, Usenet or e-mail headers, which can have continuation lines.
On the topic of how to write a Lisp function to look ahead in the data, the most important aspect is how to get a handle on the data at the current position. The TXR pattern matching works by backtracking over a lazy list of strings (lines/records). We can use the @(data) directive to capture the list pointer at the given input position. Then we can just treat that as a list:
@(data here)
@(bind tab-start-lines @(length (take-while (f^ #/\t/) here)))
Now tab-start-lines has a count of how many consecutive lines at the current input position start with tabs. Unfortunately, take-while has a termination-condition bug: if the following data consists of nothing but one or more tab lines, it misbehaves. Until TXR 166 is released, this requires a little workaround: (take-while [iff stringp (f^ #/\t/)] here).
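The head-plus-continuation idea is not specific to TXR. As a rough illustration, here it is in Python rather than TXR: each line that starts with a tab is attached to the most recent line that did not. The function name merge_continuations is invented for this sketch, and it only groups lines; it does not parse the "name = value" structure the way the TXR query does.

```python
def merge_continuations(lines):
    """Group each head line with the tab-indented lines that follow it."""
    # merge_continuations is a hypothetical helper name used for illustration.
    merged = []
    for line in lines:
        if line.startswith("\t") and merged:
            # Continuation: attach to the previous head, dropping the tab.
            merged[-1].append(line.lstrip("\t"))
        else:
            merged.append([line])
    return merged


sample = [
    "long_variable = the value of the long variable",
    "\tspans across multiple lines",
    "really_long_variable = neither might this one!",
]
for group in merge_continuations(sample):
    print(group)
```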

Adding zero in front of a number python

I am making a program which will go through all possible choices.
Range is from 00000 to 99999.
For example:
00001,00002...01000,01001,01002...99999.
The problem is that I can make the string '00000', but as soon as I convert it to an int in order to add 1 and keep the cycle going, only one 0 remains. In that case I get 0+1 = 1, and I need 00001.
I'm not completely sure how I should do it with lists, because I might need them in the future for certain operations (to get one element from a current number 00450, 01004, 94571...).
Any advice/help would be greatly appreciated! :)

You can use zfill(num) on strings to add leading zeros:
def convert_int(number, decimals):
    # Pad the decimal representation with leading zeros up to `decimals` characters.
    return str(number).zfill(decimals)

print(convert_int(1, 6))  # prints 000001

I don't know if this is exactly what you want, but you can use string formatting. For example, this will turn int('00000') + 1 into '00001':
new_i = '%05d'%(int('00000')+1)
where %05d adds as many leading zeros as necessary to whatever comes after % so that the total length of the final formatted number string is 5.
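Putting the pieces together, here is a small sketch of the cycle described in the question: keep the value as a zero-padded string, convert to int only to increment, and index into the string when a single digit is needed. The helper name next_code and its width parameter are made up for this example.

```python
def next_code(code: str, width: int = 5) -> str:
    """Return the zero-padded successor of a numeric string such as '00450'."""
    # next_code is a hypothetical helper name used for illustration.
    return f"{int(code) + 1:0{width}d}"


print(next_code("00000"))  # 00001
print(next_code("00999"))  # 01000

# Because each code is a string, individual digits are available by index.
code = "00450"
print(code[2])             # 4

# Every code from 00000 to 99999, generated lazily.
all_codes = (f"{n:05d}" for n in range(100000))
print(next(all_codes))     # 00000
```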

AutoHotKey Source Code Line Break

Is there a way to break a line in AutoHotKey source code? My lines are getting longer than 80 characters and I would like to split them neatly. I know we can do this in some other languages, such as VBA, for example below:
http://www.excelforum.com/excel-programming-vba-macros/564301-how-do-i-break-vba-code-into-two-or-more-lines.html
If Day(Date) > 10 _
And Hour(Time) > 20 Then _
MsgBox "It is after the tenth " & _
"and it is evening"
Is there a source code line break in AutoHotKey? I use an older version of AutoHotKey, ver 1.0.47.06.

There is a Splitting a Long Line into a Series of Shorter Ones section in the documentation:
Long lines can be divided up into a collection of smaller ones to
improve readability and maintainability. This does not reduce the
script's execution speed because such lines are merged in memory the
moment the script launches.
Method #1: A line that starts with "and", "or", ||, &&, a comma, or a
period is automatically merged with the line directly above it (in
v1.0.46+, the same is true for all other expression operators except
++ and --). In the following example, the second line is appended to the first because it begins with a comma:
FileAppend, This is the text to append.`n ; A comment is allowed here.
, %A_ProgramFiles%\SomeApplication\LogFile.txt ; Comment.
Similarly, the following lines would get merged into a single line
because the last two start with "and" or "or":
if (Color = "Red" or Color = "Green" or Color = "Blue" ; Comment.
or Color = "Black" or Color = "Gray" or Color = "White") ; Comment.
and ProductIsAvailableInColor(Product, Color) ; Comment.
The ternary operator is also a good candidate:
ProductIsAvailable := (Color = "Red")
? false ; We don't have any red products, so don't bother calling the function.
: ProductIsAvailableInColor(Product, Color)
Although the indentation used in the examples above is optional, it might improve
clarity by indicating which lines belong to ones above them. Also, it
is not necessary to include extra spaces for lines starting with the
words "AND" and "OR"; the program does this automatically. Finally,
blank lines or comments may be added between or at the end of any of
the lines in the above examples.
Method #2: This method should be used to merge a large number of lines
or when the lines are not suitable for Method #1. Although this method
is especially useful for auto-replace hotstrings, it can also be used
with any command or expression. For example:
; EXAMPLE #1:
Var =
(
Line 1 of the text.
Line 2 of the text. By default, a line feed (`n) is present between lines.
)
; EXAMPLE #2:
FileAppend, ; The comma is required in this case.
(
A line of text.
By default, the hard carriage return (Enter) between the previous line and this one will be written to the file as a linefeed (`n).
By default, the tab to the left of this line will also be written to the file (the same is true for spaces).
By default, variable references such as %Var% are resolved to the variable's contents.
), C:\My File.txt
In the examples above, a series of lines is bounded at
the top and bottom by a pair of parentheses. This is known as a
continuation section. Notice that the bottom line contains
FileAppend's last parameter after the closing parenthesis. This
practice is optional; it is done in cases like this so that the comma
will be seen as a parameter-delimiter rather than a literal comma.
Please read the documentation link for more details.
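So your example can be rewritten along the following lines (A_DD and A_Hour are AutoHotkey's built-in day-of-month and hour variables, standing in for the VBA Day(Date) and Hour(Time) calls):
if (A_DD > 10
and A_Hour > 20)
MsgBox,
(
It is after the tenth
and it is evening
)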

I'm not aware of a general way of doing this, but it seems you can break a line and start the remainder of the broken line (e.g. the next real line) with an operator. As long as the second line (and the third, fourth, etc., as applicable) starts with (optional whitespace plus) an operator, AHK will treat the whole thing as one line.
For instance:
hello := "Hello, "
. "world!"
MsgBox %hello%
The presence of the concatenation operator . at the logical beginning of the second line here makes AHK treat both lines as one.
(I also tried leaving the operator at the end of the first line and starting the second off with a double-quoted string; that didn't work.)

Vim: change formatting of variables in a script

I am using vim to edit a shell script (it did not follow the right coding standard). I need to change all of my variables from camel-hump notation (startTime) to caps-and-underscore notation (START_TIME).
I do not want to change the way method names are represented.
I was thinking one way to do this would be to write a function and map it to a key. The function could do something like generating this on the command line:
s/<word under cursor>/<leave cursor here to type what to replace with>
I think that this function could be applicable to other situations, which would be handy. Two questions:
Question 1: How would I go about creating that function?
I have created functions in vim before; the biggest thing I am clueless about is how to capture movement. I.e., if you press dw in vim it will delete the rest of a word. How do you capture that?
Also can you leave an uncompleted command on the vim command line?
Question 2: Got a better solution for me? How would you approach this task?

1. Use a plugin
Check the COERCION section at the bottom of the page:
http://www.vim.org/scripts/script.php?script_id=1545
2. Get the :s command to the command line
:nnoremap \c :%s/<C-r><C-w>/
<C-r><C-w> gets the word under the cursor to the command line.
3. Change the word under the cursor with :s
:nnoremap \c lb:s/\%#<C-r><C-w>/\=toupper(substitute(submatch(0), '\<\@!\u', '_&', 'g'))/<Cr>
lb move right, then to the beginning of the word. We need to do this to get
the cursor before the word we wish to change, because we want to change only
the word under the cursor and the regex is anchored to the current cursor
position. The moving around needs to be done because b at the
start of a word moves to the start of the previous word.
\%# match the current cursor position
\= When the substitute string starts with "\=" the remainder is interpreted as an expression. :h sub-replace-\=
submatch(0) Whole match for the :s command we are dealing with
\< word boundary
\@! do not match the previous atom (this is to not match at the start of a
word. Without this, FooBar would be changed to _FOO_BAR)
& in replace expressions, this means the whole match
4. Change the word under the cursor, all matches in the file
:nnoremap \a :%s/<C-r><C-w>/\=toupper(substitute(submatch(0), '\<\@!\u', '_&', 'g'))/g<Cr>
See 3. for explanation.
5. Change the word under the cursor with normal mode commands
/\u<Cr> find next uppercase character
i_ insert an underscore.
nn Search the last searched string twice (two times because after exiting insert mode, you move back one character).
. Repeat the last change, in this case inserting the underscore.
Repeat nn. until all camelcases have an underscore added before them, that is, FooBarBaz has become Foo_Bar_Baz
gUiw uppercase current inner word
http://vim.wikia.com/wiki/Converting_variables_to_camelCase
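The rule used by the substitute expression (put an underscore before every uppercase letter that is not at the start of a word, then uppercase everything) can be checked outside Vim. The sketch below is a Python rendering of that rule, not Vim script. The function name to_caps_underscore is invented for this example, and Python's \B is used here as the counterpart of not being at the start of a word.

```python
import re


def to_caps_underscore(name: str) -> str:
    """Convert camelCase to CAPS_AND_UNDERSCORE, e.g. startTime -> START_TIME."""
    # to_caps_underscore is a hypothetical helper name used for illustration.
    # \B(?=[A-Z]) matches the empty position before an uppercase letter
    # that is not at the start of a word.
    return re.sub(r"\B(?=[A-Z])", "_", name).upper()


print(to_caps_underscore("startTime"))  # START_TIME
print(to_caps_underscore("FooBarBaz"))  # FOO_BAR_BAZ
```

Like the Vim expression, this treats every uppercase letter as the start of a new part, so a run of capitals would be split letter by letter.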

I am not sure what you mean by 'capturing movements'. That
said, for a starter, I'd use something like this for the function:
fu! ChangeWord()
  let l:the_word = expand('<cword>')
  " Modify according to your rules
  let l:new_var_name = toupper(l:the_word)
  normal b
  let l:col_b = col(".")
  normal e
  let l:col_e = col(".")
  let l:line = getline(".")
  let l:line = substitute(
    \ l:line,
    \ '^\(' . repeat('.', l:col_b-1) . '\)' . repeat('.', l:col_e - l:col_b+1),
    \ '\1' . l:new_var_name,
    \ '')
  call setline(".", l:line)
endfu
As to leaving an uncompleted command on the vim command line, I think you're after
:map ,x :call ChangeWord(
which then can be invoked in normal mode by pressing ,x.
Update
After thinking about it, this following function is a bit shorter:
fu! ChangeWordUnderCursor()
  let l:the_word = expand('<cword>')
  "" Modify according to your rules
  let l:new_var_name = '!' . toupper(l:the_word) . '!'
  normal b
  let l:col_b = col(".")
  normal e
  let l:col_e = col(".")
  let l:line = getline(".")
  exe 's/\%' . l:col_b . 'c.*\%' . (l:col_e+1) .'c/' . l:new_var_name . '/'
endfu
