Catching punctuation in AppleScript for use in a Mail Rule - sed

I get a ton of spam mail where the from or reply to has punctuation in the address. Instead of creating a giant rule with a bunch of rows with Contains "!" and such, I want to use a script in a mail rule to send everything from an address with punctuation other than a dot and an at to the trash.
What can I replace = "hello world" with in the example below to catch punctuation?
tell application "Mail"
set theSelection to selection
set theMessage to item 1 of theSelection
subject of theMessage
if subject of theMessage = "hello world" then
set mailbox of theMessage to mailbox "Trash"
end if
end tell
I've looked at some get shell script and sed examples but didn't understand how to specify that I only want to find punctuation.

What you’re looking for is a regular expression; AppleScript doesn’t have them by default. There are a couple of ways around this.
The easiest AppleScript way is to have a whitelist of valid characters, and then check your string to see if it contains anything not in that whitelist.
Because we don’t want to be trashing messages willy-nilly while testing, I’ve replaced the trash line with a display dialog line. Because the AppleScript calls a function, I also do this outside of the tell block. While you can call local functions inside a tell block, it’s easier in a short script like this to not worry about it.
property goodCharacters : "abcdefghijklmnopqrstuvwxyz.#"
tell application "Mail"
set theSelection to selection
set theMessage to item 1 of theSelection
set theSubject to subject of theMessage
end tell
if badCharacter(theSubject) is true then
display dialog theSubject & " is bad."
--tell application "Mail" to set the mailbox of theMessage to mailbox "Trash"
end if
--return true if the text contains any non-good character
on badCharacter(theText)
repeat with theCharacter in theText
if goodCharacters does not contain theCharacter then
return true
end if
end repeat
return false
end badCharacter
The badCharacter handler returns true if any of the characters in the given string are not in the list of good characters. Otherwise, it returns false.
You may also find these useful when you switch from testing against the subject to testing against what you said you want, which is the from and the reply-to:
set theSender to extract address from the sender of theMessage
set theReply to extract address from the reply to of theMessage
If you’re familiar with JavaScript, you may find JXA (JavaScript for Automation) more useful, because JavaScript does have regular expressions. In the upper left of Script Editor, change “AppleScript” to “JavaScript”.
mail = Application('Mail');
trash = mail.mailboxes.byName("Trash")
theSelection = mail.selection();
theMessage = theSelection[0];
theSender = theMessage.sender().replace(/^.*<([^>]+)>/, "$1");
theReply = theMessage.replyTo();
theSubject = theMessage.subject();
//text.search returns the index of the match
//it returns -1 on no match
//so if the result is 0 or greater, it found a match
//the caret (^) reverses the sense of the brackets
//where [a-z.#] would match *on* any character from a to z as well as the period and #
//[^a-z.#] matches any character *other than* a-z, ., and #
if (theSubject.search(/[^a-z.#]/i) >= 0) {
mail.move(theMessage, {to: trash})
}
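The check can also be pulled out into plain functions and tried on sample strings before it is wired into Mail. This is only a minimal sketch, not part of the script above: the names extractAddress and hasBadPunctuation are hypothetical, and the allowed set (letters, digits, dot and @) is an assumption based on the question. Widen it if legitimate senders use hyphens or underscores.

```javascript
// Hypothetical helper: pull "user@example.com" out of "Some Name <user@example.com>".
// If there are no angle brackets, the string is returned unchanged (trimmed).
function extractAddress(sender) {
  return sender.replace(/^.*<([^>]+)>.*$/, "$1").trim();
}

// Hypothetical helper: true if the address contains anything other than
// letters, digits, a dot, or an at sign (an assumed whitelist).
function hasBadPunctuation(address) {
  return /[^a-z0-9.@]/i.test(address);
}

const samples = [
  "Jane Doe <jane.doe@example.com>",
  "Win!!! <free$$$money@example.com>",
  "plain@example.com",
];

for (const sample of samples) {
  const address = extractAddress(sample);
  console.log(address, hasBadPunctuation(address) ? "-> trash" : "-> keep");
}
```

In the JXA script, the same test would be applied to theMessage.sender() and theMessage.replyTo() instead of the subject.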
Other options might include using a third-party tool to add regular expressions to AppleScript. You might also use do shell script to call sed or some other command-line tool to parse the text; I’m inclined to think passing unknown text to the command line is dangerous, however.
And AppleScript does have tools for strings besides =. As shown above, it has contains (or does not contain), but it also has starts with and ends with. You can even get the words of any string, which will remove all punctuation.

Related

Automate a Grep Applescript to Word Document

I'm using a Mac and I'm preparing accounts for a company. Every payslip which I've made in Microsoft Word has a voucher number. Because a transaction was missed all voucher numbers are wrong so now there are hundreds of wrong payslips. I want to create a script that can find the following GREP (find beginning of paragraph, text:Vch, any character until \r):
^Vch.+\r
and replace it with nothing (thereby deleting the whole sentence).
I was thinking of using Applescript as it can open the document, perform the GREP find (tricky part), save the document and save it as a pdf (all which is needed).
But apparently my knowledge fails me. Commands from the dictionary like create range, execute find, all bring errors.
Somebody experienced in Applescript that could help me devise a script? Any suggestions? It should be something like:
Tell application "Microsoft Word"
tell active document
set myRange to create range start 0 end 0
tell myRange
execute find find "^Vch.+\r" replace with ""
end tell
end tell
end tell
Many thanks!
There are no special characters to indicate the beginning of a line.
To search at beginning of the paragraph, the script must use return & "some text"
You can use "^p" as paragraph mark, but it doesn't work when you set the match wildcards property to true
To match an entire paragraph, the script must use return & "some text" & return, and the script must use replace with return to delete one paragraph mark instead of two.
Because the first paragraph does not begin with a paragraph mark, the script must use two execute find commands.
The wildcard is *
tell application "Microsoft Word" -- (tested on version 15.25, Microsoft Office 2016)
-- check the first paragraph
select (characters of paragraph 1 of active document)
execute find (find object of selection) find text ("Vch*" & return) replace with "" replace replace one wrap find find stop with match wildcards and match case without match forward and find format
--to search forward toward the end of the document.
execute find (find object of selection) find text (return & "Vch*" & return) replace with return replace replace all wrap find find continue with match wildcards, match case and match forward without find format
save active document
-- export to PDF in the same directory as the active document
set pdfPath to path of active document & ":" & (get name of active window) & ".pdf"
set pdfPath to my createFile(pdfPath) -- create an empty file if it does not exist; the handler returns the path as an alias (to avoid the grant access issue)
save as active document file name pdfPath file format format PDF
end tell
on createFile(f)
do shell script "touch " & quoted form of POSIX path of f
return f as alias
end createFile
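To check which paragraphs the two execute find commands are meant to remove, the same rule can be tried on plain text first. This is a minimal JavaScript sketch that works on a string, not on the Word document. It assumes paragraphs are separated by a carriage return, and removeVoucherParagraphs is a hypothetical name.

```javascript
// Hypothetical helper: drop every paragraph that begins with "Vch".
// Splitting on "\r" treats the first paragraph like any other, so the
// special case needed in Word (no leading paragraph mark) disappears here.
function removeVoucherParagraphs(text) {
  return text
    .split("\r")
    .filter((paragraph) => !paragraph.startsWith("Vch"))
    .join("\r");
}

const payslip = "Vch 0012\rName: A. Smith\rVch 0013\rNet pay: 1000";
// JSON.stringify makes the remaining carriage returns visible.
console.log(JSON.stringify(removeVoucherParagraphs(payslip)));
```

Like the match case option in the Word script, startsWith is case-sensitive, so a paragraph beginning with "vch" is kept.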

How to add phonetic guides to all the texts at once?

I have an essay with roughly 1000 Chinese words. I want to add phonetic guide (Pin Yin) on top of each Chinese word.
Therefore, in MS Word, I use Phonetic Guide. However, Phonetic Guide only allows me to create Pin Yin for 20 to 30 words each time. I tried to look for a function which allows me to add phonetic guides for all the words at once, but I cannot find an answer online.
I also want to make the phonetic guide font bigger and create more space between the Chinese text and the Pin Yin.
Can any expert shed some light on this?
Not familiar with this area, but the starting point is that you can invoke the Phonetic Guide dialog box and get it to create the pinyin for the selection. For example
Sub testInsertPhoneticGuide()
Call insertPhoneticGuide(Selection.Range)
End Sub
Sub insertPhoneticGuide(r As Word.Range)
Dim d As Word.Dialog
Dim lng As Long
Dim lngChars As Long
Dim r1 As Word.Range
Dim r2 As Word.Range
On Error Resume Next
Set d = Word.Dialogs(wdDialogPhoneticGuide)
Set r1 = r.Duplicate
r1.TextRetrievalMode.IncludeFieldCodes = False
For lng = Len(r1.Text) To 1 Step -1
Set r2 = r1.Characters(lng)
' Do not insert pinyin for any range that
' contains a field (this will prevent the code from re-inserting
' pinyin, but you can change the way this works if you like)
If r2.Fields.Count = 0 Then
r2.Select
d.Show 1
' Error 6031 says there's no text to pinyin
If Err.Number = 6031 Then
Err.Clear
Else
On Error GoTo 0
End If
End If
Next
Set r2 = Nothing
Set r1 = Nothing
Set d = Nothing
End Sub
As far as I can tell, there is no way to specify the font and size/position parameters in the dialog box. They are not "sticky". But the Phonetic Guide replaces each suitable character with an { EQ } field that contains the pinyin and the original character. The EQ field looks something like this:
{ EQ \* jc2 \* "Font:SimSun" \* hps11 \o\ad(\s\up 10(fā),发) }
so as long as you want the same font, size and positioning, you should be able to display all the field codes and use Word Find/Replace to modify those values in every EQ field (or you could add code to modify the values for each character that you pinyin).
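As an illustration of the kind of edit such a Find/Replace would make, here is a minimal JavaScript sketch that rewrites the two numbers in the text of one field code. It works on a plain string, not inside Word. The name restyleGuide is hypothetical, and it is an assumption that hps controls the size of the pinyin and \up controls how far it is raised; the answer does not state what the values mean.

```javascript
// Hypothetical helper: replace the hpsN and \up N values in the text of
// one phonetic-guide EQ field code. The meaning of both values is assumed.
function restyleGuide(fieldCode, size, raise) {
  return fieldCode
    .replace(/hps\d+/, "hps" + size)
    .replace(/\\up \d+/, "\\up " + raise);
}

// The example field from above, written as a JavaScript string literal
// (backslashes and double quotes are escaped).
const original = "{ EQ \\* jc2 \\* \"Font:SimSun\" \\* hps11 \\o\\ad(\\s\\up 10(fā),发) }";

console.log(restyleGuide(original, 14, 13));
```

The font name could be swapped the same way by replacing the text after Font: inside the quotes.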
NB, there is also a PhoneticGuide() member of Word's Range object that lets you specify the pinyin text and the positioning parameters. However, to use that you would have to get the pinyin text somehow - the only way I know within Word is actually to use the Phonetic Guide dialog to insert it, but I imagine the necessary info for each character is available on the web.
In case anybody comes to this question again, after searching for a solution for a while I managed to add pinyin to my entire Chinese document by using the following two tools:
1) Open Office
2) The OO Pinyin Guide Extension for Open Office.
Hope this helps : )

Applescript for inserting hyperlink into MSWord comment

I'm trying to add a hyperlink object inside a Word comment.
To create a new comment in the active document I'm using this piece of script:
tell application "Microsoft Word"
set tempString to "lorem ipsum"
make new Word comment at selection with properties {comment text:tempString}
end tell
but now I can't get a reference to the newly created comment so that I can use it with the command "make new hyperlink object".
Thanks for any suggestions.
Riccardo
I don't think you can work with the object returned by make new Word comment (at least not in this case), and you have to insert a unique, findable string then iterate through the comments:
tell application "Microsoft Word"
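-- insert a comment containing a unique, findable string
set tempString to (ASCII character 127)
make new Word comment at selection with properties {comment text:tempString}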
set theComments to the Word comments of the active document
repeat with theComment in theComments
if the content of the comment text of theComment = tempString then
set theRange to the comment text of theComment
-- you do not have to "set theHyperlink". "make new" is enough
set theHyperlink to make new hyperlink object at theRange with properties {text range:theRange, hyperlink address:"http://www.google.com", text to display:"HERE", screen tip:"click to search Google"}
insert text "You can search the web " at theRange
exit repeat
end if
end repeat
end tell
(Edited to insert some text before the hyperlink. If you want to insert text after the hyperlink, you can also use 'insert text "the text" at end of theRange'.)
So for adding text, it was enough to use "the obvious" after all.
[
For anyone else finding this Answer. The basic problem with working with Word ranges in Applescript is that every attempt to redefine a range in the Comments story results in a range that is in the main document story. OK, I may not have tried every possible method, but e.g., collapsing the range, moving the start of range and so on cause that problem. In the past, I have noticed that with other story ranges as well, but have not investigated as far as this.
Also, I suspect that the reason why you cannot set a range to the Word comment that you just created is because the properties of the Comment specify a range object of some kind that I think is a temporary object that may be destroyed immediately after creation. So trying to reference the object that you just created just doesn't work.
This part of the Answer is modified...
Finally, the only other way I found to populate a Comment with "rich content" was to insert the content in a document at a known place, then copy its formatted text to the comment, e.g. if the "known place" is the selection, you can set the content of theComment via
set the formatted text of the comment text of theComment to the formatted text of the text object of the selection
If you are using a version of Word that supports VBA as well as Applescript, I don't really see any technical reason why you shouldn't invoke VBA to do some of these trickier things, even if you need the main code to be Applescript.
]
Finally I got a solution here:
https://discussions.apple.com/message/24628799#24628799
which let me attach the hyperlink to part of the comment text with the following lines, in case somebody searches for the same thing in the future:
tell application "Microsoft Word"
set wc to make new Word comment at end of document 1 with properties {comment text:"some text"}
set ct to comment text of wc
set lastChar to last character of ct
make new hyperlink object at end of document 1 with properties {hyperlink address:"http://www.example.com", text object:lastChar}
end tell

How to do search and replace involving fields in Microsoft Word?

I have a Word document with fields of the reference variety, which occur in the form "[field].[field]"--in other words, there's a period between the two fields. I want to globally replace this with a space.
Word offers the ^d special character to search for fields, but for some reason the query "^d.^d" does not find anything. However, ".^d" does. Now comes the problem, however--what do I specify as the replacement text in order to retain the field code? If using regular expressions, I could use a "Find What Expression" such as \1, but with regexp ("wild card") mode the ^d is not permitted.
I guess I could write a macro...
I would like to add to Bibadia's solution.
An example of an index entry field; we want to change a name we misspelled.
Make sure hidden formatting is displayed (toggle with SHIFT+CTRL+F8).
Make sure the wildcards option is not selected. To search for fields, use the opening and closing field brace codes (optionally use ^w for spaces, as Bibadia suggested): ^19 XE "Deo, John" ^21
Replace won't recognize the field brace characters, but it does allow inserting the clipboard's contents. To do that, type the correct entry somewhere in the text: press CTRL+F9 to insert a field and type: XE "Doe, John"
Select the field above and copy
Use ^c in the replace box
Hit Replace All
Ta-da!
It's usually better to go the macro route when finding fields because, as you say, the find algorithm that Word uses doesn't work the way you might hope with fields.
But if you know exactly what the fields contain, you can specify a search pattern that will probably work (however not in wildcard mode).
For example, if you want to look for figure number field pairs such as
{ STYLEREF 1 \s }.{ SEQ Figure \* ARABIC \s 1 }
(which would typically be the same set of fields everywhere in the document)
If you only really need to look for the following:
{ STYLEREF 1 \s }.<any field>
you could ensure that field codes are displayed and search for
^d STYLEREF 1 \s ^21.^d
or
^19 STYLEREF 1 \s ^21.^19
If you need to be more precise, you can spell out the second field as well.
"^d" only works for finding the field beginning, not the field end.
It's a shame that ^w wants to find at least 1 whitespace character because otherwise it would be more robust to look for
^19^wSTYLEREF^w1^w\s^w^21.^19
Perhaps someone else knows how to work around that without using wildcards?
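For comparison, in a regular-expression engine the "zero or more whitespace" match that ^w cannot do is simply \s*. The sketch below only illustrates that on a plain string and does not run inside Word's Find dialog. It assumes the field-begin and field-end markers are the characters with codes 19 and 21 (the ones ^19 and ^21 search for), and replaceFieldSeparator is a hypothetical name.

```javascript
const FIELD_BEGIN = String.fromCharCode(19);
const FIELD_END = String.fromCharCode(21);

// Matches { STYLEREF 1 \s } followed by "." and the start of the next field,
// with optional whitespace just inside the braces.
const pattern = new RegExp(
  "(" + FIELD_BEGIN + "\\s*STYLEREF\\s+1\\s+\\\\s\\s*" + FIELD_END + ")" +
    "\\.(" + FIELD_BEGIN + ")",
  "g"
);

// Hypothetical helper: keep both fields and swap the period for a space.
function replaceFieldSeparator(text) {
  return text.replace(pattern, "$1 $2");
}

const figureNumber =
  FIELD_BEGIN + " STYLEREF 1 \\s " + FIELD_END + "." +
  FIELD_BEGIN + " SEQ Figure \\* ARABIC \\s 1 " + FIELD_END;

// JSON.stringify makes the two control characters visible as \u0013 and \u0015.
console.log(JSON.stringify(replaceFieldSeparator(figureNumber)));
```

Because the fields themselves are captured and written back, this also covers the original question of retaining the field code in the replacement.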
Torzaburo,
I suggest that you do this using a macro. You can start by recording the macro, and later refining your processing steps within the macro.
First turn on the hidden characters by navigating to Home > Paragraph and toggling the show/hide Paragraph symbol. Also, select all and toggle the field codes on (right-click and select "Toggle Field Codes").
Open a new blank Word doc in addition to the one you have open. You will use this later. Start the macro recording and find the field using the "^d" (field code) as you said.
When the field is found, copy only the field text within the brackets, and not the full field reference. While the macro is still recording, ALT + TAB to the new blank document and paste the field code in as plain text.
At this point, do the necessary find & replace processing to the field codes. Highlight the processed field codes, copy, ALT + TAB back to the original document, and paste back between the { } brackets.
Stop the macro recording. Add any further custom processing to the macro VBA.
Select-All and re-toggle the field codes. Update the field codes.
You don't need a macro. Just toggle all field codes on by using Alt+F9. Then do a find and replace for what you want to change. Once the replacement is complete, use Alt+F9 again to toggle the field codes back off.
Disclaimer: I didn't originate this solution, but it's clean and elegant and I thought it should be included here:
(Adapted from Search & Replace Field Codes in Word):
1) Create or find a single instance of the field you want to convert text to.
2) Toggle field codes visible (Alt+F9).
3) Copy the code for the field you want to use to the Clipboard (highlight it and press Ctrl+C).
4) Open the Replace dialog box (Ctrl+H), insert the text you want to replace in the Find What box and then enter ^c in the Replace With box.
This will replace your text with the contents of the Clipboard, turning it into the field code you copied in step 3. It also copies formatting information (font, color, etc.), to control how the field will appear when hidden. (Caveat: I've tested this with Word 2003 under Windows 7 only.)
Coming in late on this, probably way too late for Beth (sorry Beth). And this may not be quite what Beth was looking for. But for anyone interested ...
It sounds like Beth may have created captions throughout the document using INSERT CAPTION (hence the presence of field codes). This means these captions will have been (automatically) created in CAPTION style.
To globally replace the separator "." with " " (space) in such captions, take two steps:
[1] Go to REFERENCES | INSERT CAPTION, then click on NUMBERING and replace the SEPARATOR "." with "EM-DASH". This will replace all separators in captions for the selected label in the CAPTION Window. If you have other labels in use in the document (e.g. FIGURE), select the other labels one by one and repeat this process.
[2] Do a find/replace searching for special character "em-dash" (^+) in style CAPTION, replacing with " ". Click REPLACE ALL.
Voila!
NOTE: This presumes that em-dash does not appear in the caption text anywhere. If it does, then you'll need to do a pre- and post- "fiddle" to ensure these em-dashes are not touched by the global replace above.
The "pre-fiddle" is to do a global find/replace across captions, replacing the em-dash ("^+") with some other string (e.g. "EM-DASH") that doesn't ever occur in any caption's text. Then you do the separator change as described above. Finally, the "post-fiddle" is to restore the em-dashes that were in the captions, by doing a global replace of the string "EM-DASH" with the actual em-dash character "^+".
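The order of the three replacements matters, which is easier to see on a plain string. A minimal JavaScript sketch of the idea follows. It is not something Word runs: the function names are hypothetical, and the step done in Word's caption numbering dialog is only simulated here with a regular expression.

```javascript
const EM_DASH = "\u2014";
const PLACEHOLDER = "EM-DASH"; // must not occur in any caption text

// Pre-fiddle: hide the em-dashes that belong to the caption text.
function protectDashes(caption) {
  return caption.split(EM_DASH).join(PLACEHOLDER);
}

// Stand-in for step [1]: the numbering separator "." becomes an em-dash.
function separatorToDash(caption) {
  return caption.replace(/(\d)\.(\d)/, "$1" + EM_DASH + "$2");
}

// Step [2]: every remaining em-dash is a separator, so replace it with a space.
function dashToSpace(caption) {
  return caption.split(EM_DASH).join(" ");
}

// Post-fiddle: bring back the em-dashes that were hidden at the start.
function restoreDashes(caption) {
  return caption.split(PLACEHOLDER).join(EM_DASH);
}

const caption = "Figure 2.3 Before" + EM_DASH + "after comparison";
const result = restoreDashes(dashToSpace(separatorToDash(protectDashes(caption))));
console.log(result);
```

Skipping the first step would turn the em-dash inside the caption text into a space as well, which is exactly the problem the NOTE describes.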

How to use “considering case” while addressing Microsoft Word in Applescript

The following Applescript code does not compile; the compiler highlights “case” and says: “Syntax Error: expected application constant or consideration but found property”. I guess case has a special meaning in the context of tell application "Microsoft Word". How can I work around that nicely?
tell application "Microsoft Word"
set c to content of character 1 of selection as string
considering case
if (c is "a") then
set content of text object of selection to "A"
end if
end considering
end tell
What I would do is try moving the considering block to the outside so that it encompasses everything else.
It appears that case is a reserved word for Microsoft Word, and so using that word in a different context while inside the tell block just confuses the compiler, hence the syntax error you are getting.
fireshadow52 is right ...
considering case
tell application "Microsoft Word"
set c to content of character 1 of selection as string
if (c is "a") then
set content of text object of selection to "A"
end if
end tell
end considering