Substitution within a text file, using AppleScript and sed

The question is a sequel to plain text URL to HTML code (Automator/AppleScript).
Suppose I have a plain text file /Users/myname/Desktop/URLlist.txt, in which an empty line follows each title/URL pair:
title 1
http://a.b/c
title 2
http://d.e/f
...
I'd like to (1) convert every URL (http://...) into an HTML link, and (2) add
<br />
to each empty line, so that the content above becomes:
title 1
<a href="http://a.b/c">http://a.b/c</a>
<br />
title 2
<a href="http://d.e/f">http://d.e/f</a>
<br />
...
I came up with the following AppleScript:
set inFile to "/Users/myname/Desktop/URLlist.txt"
set middleFile to "/Users/myname/Desktop/URLlist2.txt"
set outFile to "/Users/myname/Desktop/URLlist3.txt"
do shell script "sed 's/\\(http[^ ]*\\)/<a href=\"\\1\">\\1<\\/a>/g' " & quoted form of inFile & " >" & quoted form of middleFile
do shell script "sed 's/^$/\\ <br \\/>/g' " & quoted form of middleFile & " >" & quoted form of outFile
It works, but it is redundant (and silly?). Could anyone make it more succinct? Can it be done with only one text file instead of three (i.e. with the original content of /Users/myname/Desktop/URLlist.txt overwritten by the end result)?
Thank you very much in advance.

Try:
set inFile to "/Users/myname/Desktop/URLlist.txt"
set myData to (do shell script "sed '
/\\(http[^ ]*\\)/ a\\
<br />
' " & quoted form of inFile & " | sed 's/\\(http[^ ]*\\)/<a href=\"\\1\">\\1<\\/a>/g' ")
do shell script "echo " & quoted form of myData & " > " & quoted form of inFile
This will let you use the myData variable later in your script. If this is not part of a larger script and you are simply modifying your file, use the -i option as jackjr300 suggests. Also, this script looks for the original pattern and appends the new line to it rather than simply looking for empty lines.
EDIT:
set inFile to "/Users/myname/Desktop/URLlist.txt"
set myData to (do shell script "sed 's/\\(http[^ ]*\\)/<a href=\"\\1\">\\1<\\/a>/g; s/^$/\\ <br \\/>/g' " & quoted form of inFile)
do shell script "echo " & quoted form of myData & " > " & quoted form of inFile
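For comparison, the same two substitutions can be sketched in Python as a single pass over the lines. This is only an illustration and not part of the answers here: the name convert is made up, and like the sed pattern it assumes a URL is "http" followed by a run of non-space characters.

```python
import re

# Same pattern as the sed expression: "http" followed by any non-space characters.
URL_PATTERN = re.compile(r"(http[^ ]*)")


def convert(text: str) -> str:
    """Wrap URLs in <a> tags and put ' <br />' on every empty line."""
    out = []
    for line in text.splitlines():
        if line == "":
            # Equivalent of s/^$/ <br \/>/
            out.append(" <br />")
        else:
            # Equivalent of s/\(http[^ ]*\)/<a href="\1">\1<\/a>/g
            out.append(URL_PATTERN.sub(r'<a href="\1">\1</a>', line))
    return "\n".join(out) + "\n"
```

Calling convert on the text of URLlist.txt returns the converted text, which can then be written back to the same file.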

Use the -i '' option to edit files in-place.
set inFile to "/Users/myname/Desktop/URLlist.txt"
do shell script "sed -i '' 's:^$:\\ <br />:; s:\\(http[^ ]*\\):<a href=\"\\1\">\\1</a>:g' " & quoted form of inFile
If you want to keep a copy of the original file, give -i an extension, e.g. sed -i ' copy', which saves the original as "URLlist.txt copy".
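The idea behind in-place editing with an optional backup can be sketched in Python as follows. This is a rough illustration, not a description of how sed is implemented; edit_in_place, transform and backup_suffix are hypothetical names, and the file is assumed to be UTF-8.

```python
import os
import shutil
import tempfile


def edit_in_place(path, transform, backup_suffix=None):
    """Rewrite `path` with transform(old_text); optionally keep a backup copy."""
    with open(path, encoding="utf-8") as f:
        old_text = f.read()
    if backup_suffix:
        # Like sed -i ' copy': the backup name is the original name plus the suffix.
        shutil.copy2(path, path + backup_suffix)
    # Write the new content to a temporary file in the same directory,
    # then swap it in so the original is never left half-written.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(transform(old_text))
    os.replace(tmp_path, path)
```

For example, edit_in_place(path, str.upper, backup_suffix=" copy") upper-cases the file and leaves the untouched original next to it.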
--
Updated:
A DOCTYPE is a required preamble.
DOCTYPEs are required for legacy reasons. When omitted, browsers tend to use a different rendering mode that is incompatible with some specifications. Including the DOCTYPE in a document ensures that the browser makes a best-effort attempt at following the relevant specifications.
The HTML lang attribute declares the language of a Web page or a portion of a Web page. This is meant to assist search engines and browsers. According to the W3C recommendation, you should declare the primary language of each Web page with the lang attribute inside the <html> tag.
The <meta> tag provides metadata about the HTML document. <meta> tags always go inside the <head> element.
The http-equiv attribute provides an HTTP header for the information/value of the content attribute.
content: the value associated with the http-equiv or name attribute.
charset: to display an HTML page correctly, the browser must know which character set to use.
In this script I use "utf-8" as the encoding; change it to the encoding of your original file.
set inFile to "/Users/myname/Desktop/URLlist.html" -- text file with a ".html" extension
set nL to linefeed
set prepandHTML to "<!DOCTYPE html>\\" & nL & "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en-US\" lang=\"en-US\">\\" & nL & tab & "<head><meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\" />\\" & nL & "</head>\\" & nL
do shell script "sed -i '' 's:^$:\\ <br />:; s:\\(http[^ ]*\\):<a href=\"\\1\">\\1</a>:g; 1s~^~" & prepandHTML & "~' " & quoted form of inFile
do shell script "echo '</html>' >> " & quoted form of inFile -- append the closing HTML tag
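The wrapping step, a preamble before the converted lines and a closing tag after them, can be sketched in Python like this. The function name wrap_html and its parameters are assumptions made for the illustration; the markup mirrors the prepandHTML string above in simplified form.

```python
def wrap_html(body: str, lang: str = "en-US", charset: str = "utf-8") -> str:
    """Put a DOCTYPE, <html> and <head> before `body` and </html> after it."""
    preamble = (
        "<!DOCTYPE html>\n"
        f'<html lang="{lang}">\n'
        f'\t<head><meta http-equiv="content-type" '
        f'content="text/html; charset={charset}" />\n'
        "</head>\n"
    )
    # The closing tag goes after the last line, like the final echo above.
    return preamble + body + "</html>\n"
```

Pass the charset that matches the encoding of the original file, as noted above.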

I can't understand sed commands very well (it makes my brain hurt) so here's the applescript way to do this task. Hope it helps.
set f to (path to desktop as text) & "URLlist.txt"
set emptyLine to " <br />"
set htmlLine1 to "<a href=\""
set htmlLine2 to "\">"
set htmlLine3 to "</a>"
-- read the file into a list
set fileList to paragraphs of (read file f)
-- modify the file as required into a new list
set newList to {}
repeat with i from 1 to count of fileList
set thisItem to item i of fileList
if thisItem is "" then
set end of newList to emptyLine
else if thisItem starts with "http" then
set end of newList to htmlLine1 & thisItem & htmlLine2 & thisItem & htmlLine3
else
set end of newList to thisItem
end if
end repeat
-- make the new list into a string
set text item delimiters to return
set newFile to newList as text
set text item delimiters to ""
-- write the new string back to the file overwriting its contents
set openFile to open for access file f with write permission
write newFile to openFile starting at 0 as text
close access openFile
EDIT: if you have trouble with the encoding, these two handlers will read and write the file as UTF-8. Insert them into the code and adjust the read and write lines to use them. Good luck.
NOTE: when opening the file using TextEdit, use the File menu and open specifically as UTF-8.
on writeTo_UTF8(targetFile, theText, appendText)
try
set targetFile to targetFile as text
set openFile to open for access file targetFile with write permission
if appendText is false then
set eof of openFile to 0
write «data rdatEFBBBF» to openFile starting at eof -- UTF-8 BOM
else
tell application "Finder" to set fileExists to exists file targetFile
if fileExists is false then
set eof of openFile to 0
write «data rdatEFBBBF» to openFile starting at eof -- UTF-8 BOM
end if
end if
write theText as «class utf8» to openFile starting at eof
close access openFile
return true
on error theError
try
close access file targetFile
end try
return theError
end try
end writeTo_UTF8
on readFrom_UTF8(targetFile)
try
set targetFile to targetFile as text
targetFile as alias -- if file doesn't exist then you get an error
set openFile to open for access file targetFile
set theText to read openFile as «class utf8»
close access openFile
return theText
on error
try
close access file targetFile
end try
return false
end try
end readFrom_UTF8
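The logic of the two handlers (write a BOM only when the file is new or being overwritten, then write UTF-8 text; strip the BOM again on reading) can be sketched in Python as follows. The function names are made up for this illustration.

```python
import os

BOM = b"\xef\xbb\xbf"  # UTF-8 byte order mark, the same bytes as «data rdatEFBBBF»


def write_utf8(path, text, append=False):
    """Write text as UTF-8; emit the BOM only at the start of a fresh file."""
    fresh = not append or not os.path.exists(path)
    with open(path, "wb" if fresh else "ab") as f:
        if fresh:
            f.write(BOM)
        f.write(text.encode("utf-8"))


def read_utf8(path):
    """Read UTF-8 text, dropping a leading BOM if there is one."""
    with open(path, encoding="utf-8-sig") as f:
        return f.read()
```

The "utf-8-sig" codec used for reading accepts files with or without a BOM.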

How can I get neovim to return the cursor to the original line after formatting the file?

In order to solve the file formatting problem of vim, I simply wrote a function:
function FileFormat()
let cursorLine = col(".")
let filetype = &filetype
if filetype == 'json'
%!jq .
execute cursorLine
elseif filetype == 'cpp'
%!astyle --style=attach --pad-oper --lineend=linux -N -C -L -xw -xW -w
execute cursorLine
else
echo "Formatting of " . filetype . " files is not currently supported."
endif
endfunction
And map a shortcut key for this function:
:nnoremap <C-f> :call FileFormat()<cr>
But I found that after formatting the file, the cursor is still at the beginning of the line. I know this is because the cursor disappears when neovim enters command mode, causing the col() function to not get a valid line number.
Is there any other way to solve this problem?
neovim version: 0.6.1
The cause is that I used the wrong function. col(".") returns the cursor's column, not its line, so execute cursorLine jumped to the wrong line. I should get the cursor's line number with line("."), which is not affected by the mode switch.

How to generate a caption from the img alt attribute

Is there a way to convert an img tag containing an alt attribute (in an HTML file),
<img src="pics/01.png" alt="my very first pic"/>
to an image link plus caption (org file),
#+CAPTION: my very first pic
[[pics/01.png]]
using pandoc?
I'm calling pandoc like this:
$ pandoc -s -r html index.html -o index.org
where index.html contains the img tag from above, but it doesn't add the caption in the output org file:
[[pics/01.png]]
Currently the Org Writer unfortunately throws away the image alt and title strings. Feel free to submit an issue or patch if there's a way to do alt text in Org.
You can also always write a filter to modify the doc AST and add the alt text to an additional paragraph.
OP here. I didn't manage to make pandoc bend to my needs in this case. But a little bash scripting with some awk help does the trick.
The script replaces all img tags with org-mode equivalents plus captions. Pandoc leaves these alone when converting from html to org-mode.
The awk script,
# replace_img.awk
#
# Sample input:
# <img src="/pics/01.png" alt="my very first pic"/>
# Sample output:
# #+CAPTION: my very first pic
# [[/pics/01.png]]
BEGIN {
# Split the input at "
FS = "\""
}
# Replace all img tags with an org-mode equivalent.
/^<img src/{
print "#+CAPTION: " $4
print "[["$2"]]"
}
# Leave the rest of the file intact.
!/^<img src/
and the bash script,
# replace_img.sh
# Rewrite every .php file under the current directory in place.
find . -name "*.php" | while IFS= read -r file; do
awk -f replace_img.awk "$file" > tmp && mv tmp "$file"
done
Place these files at the root of the project, run chmod +x replace_img.sh, and then run the script with ./replace_img.sh. Change the file extension in the script if needed. I had over 300 PHP files to convert.
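The per-line rewrite done by the awk script can also be sketched in Python. This is an illustration under the same assumptions as the awk version: the img tag sits alone on its line and src comes before alt. The name img_to_org is made up.

```python
import re

# An <img> tag alone on a line, with src followed by alt (as in the awk sample input).
IMG_TAG = re.compile(r'^<img src="([^"]*)" alt="([^"]*)"\s*/?>\s*$')


def img_to_org(line: str) -> str:
    """Turn an <img> line into an org-mode caption plus link; pass other lines through."""
    match = IMG_TAG.match(line)
    if match is None:
        return line
    src, alt = match.groups()
    return f"#+CAPTION: {alt}\n[[{src}]]"
```

Applying img_to_org to every line of a file gives the same result as running replace_img.awk on it.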

Mac Automator: from a string, get a file

I'm trying to make a shortcut via an Automator service that will move the selected file(s) up a directory. It goes as follows:
Get Selected Finder Items
Get Value of Variable Path
Run Applescript:
on join(someList, delimiter)
set prevTIDs to AppleScript's text item delimiters
set AppleScript's text item delimiters to delimiter
set output to "" & someList
set AppleScript's text item delimiters to prevTIDs
return output
end join
to split(someText, delimiter)
set AppleScript's text item delimiters to delimiter
set someText to someText's text items
set AppleScript's text item delimiters to {""}
return someText
end split
on run {input, parameters}
set pathToMe to POSIX path of (item 1 of input as text)
set newPath to split(pathToMe, "/")
set revPath to reverse of newPath
set restList to rest of revPath
set restList to rest of restList
set joinPath to join(reverse of restList, "/")
set source to POSIX file joinPath
return source
end run
Set Value of Variable Parent
Move Finder Items To Parent
The Applescript parses the first file path in the Path in order to find the item's grandparent, returning it as a POSIX file string. The problem is that the "Move Finder" action only accepts Files/Folders. How can I select the target parent folder with the resulting string in order to pass it to the "Move Finder" action?
Things I've tried:
Using mv in a Run Bash Script: the Run Applescript action doesn't seem to return anything to the Run Bash Script; set to input as arguments, "$@" is always empty.
Doing a tell finder in Run Applescript. No error or warning, just nothing happens.
Manually setting the value of the parent variable.
Thanks in advance!
Return the path as an alias inside a list instead of as a POSIX file:
on run {input, parameters}
set pathToMe to (item 1 of input) as text
set f to my getParent(pathToMe, ":")
return {f as alias}
end run
to getParent(someText, delimiter)
set prevTIDs to AppleScript's text item delimiters
set AppleScript's text item delimiters to delimiter
set n to -2
if someText ends with ":" then set n to -3
set t to text 1 thru text item n of someText
set AppleScript's text item delimiters to prevTIDs
return t
end getParent
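The path arithmetic in getParent can be sketched in Python to make the two cases explicit. This is only an illustration of the text item logic on colon-delimited HFS paths; get_parent is a made-up name.

```python
def get_parent(hfs_path: str) -> str:
    """Return the enclosing folder of a colon-delimited HFS path."""
    items = hfs_path.split(":")
    # A folder path ends with ":", which leaves an empty last item,
    # so one more item has to be dropped (n = -3 instead of -2 above).
    drop = 2 if hfs_path.endswith(":") else 1
    return ":".join(items[:-drop])
```

Both a file path and a folder path with a trailing colon resolve to the folder that contains them.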
I prefer to do this all in applescript so try this code. I didn't test it but it should work. You can still add this to automator with an applescript action but you don't need all the other actions. It will do everything itself. Good luck.
tell application "Finder"
set theSelection to get selection
set parentFolder to container of (container of (item 1 of theSelection)) -- the folder one level above the items' current folder
move theSelection to parentFolder
end tell

Unicode named Folder shows ? in wscript prompt

I am facing problems with Unicode named folders. When I drag the folder to the script, it doesn't show the path of the folder properly.
Simple VBScript (this is just a portion of it):
Dim Wshso : Set Wshso = WScript.CreateObject("WScript.Shell")
Dim FSO : Set FSO = CreateObject("Scripting.FileSystemObject")
If WScript.Arguments.Count = 1 Then
If FSO.FileExists(Wscript.Arguments.Item(0)) = true and FSO.FolderExists(Wscript.Arguments.Item(0)) = false Then
Alert "You dragged a file, not a folder! My god." & vbcrlf & "Script will terminate immediately", 0, "Alert: User is stupid", 48
WScript.Quit
Else
targetDir = WScript.Arguments.Item(0)
Wshso.Popup targetDir
End If
Else
targetDir = Wshso.SpecialFolders("Desktop")
Alert "Note: No folder to traverse detected, default set to:" & vbcrlf & Wshso.SpecialFolders("Desktop"), 0, "Alert", 48
End If
If it is a normal path without Unicode characters, it's fine. But in this case:
Directory: 4minute (포미닛) - Hit Your Heart
Then it will show something like 4minute (?) - Hit Your Heart
And if I do a FolderExists it can't find the dragged folder.
Is there any workaround to support Unicode named Folders?
Thanks!
I'll edit if this is not clear enough
This does seem to be a problem peculiar to the Windows Script Host's DropHandler shell extension. Whereas:
test.vbs "C:\포미닛.txt"
C:\WINDOWS\System32\WScript.exe "test.vbs" "C:\포미닛.txt"
both work when typed from the console (even if the console can't render the Hangul so it looks like ?), a drag and drop operation that should result in the same command goes through a Unicode->ANSI->Unicode translation that loses all characters that aren't in the current ANSI code page. (So 포미닛 will work on a default Korean Windows install but not Western.)
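The effect of such a round trip through an ANSI code page can be reproduced in Python. This is a sketch of the character loss only, not of the DropHandler itself; cp1252 and cp949 are assumed here as typical Western and Korean ANSI code pages, and through_ansi is a made-up name.

```python
def through_ansi(text: str, code_page: str) -> str:
    """Simulate a Unicode -> ANSI -> Unicode round trip through one code page."""
    # Characters that the code page cannot represent are replaced with "?".
    return text.encode(code_page, errors="replace").decode(code_page)


name = "4minute (포미닛) - Hit Your Heart"
western = through_ansi(name, "cp1252")  # the Hangul does not survive
korean = through_ansi(name, "cp949")    # the Hangul survives
```

Once the characters have been replaced, the resulting path no longer names the real folder, which is consistent with FolderExists failing.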
I'm not aware of a proper way to fix the problem. You could perhaps work around it by changing the DropHandler for .vbs files in the registry:
HKEY_CLASSES_ROOT\VBSFile\ShellEx\DropHandler\(Default)
from the WSH DropHandler ({60254CA5-953B-11CF-8C96-00AA00B8708C}) to {86C86720-42A0-1069-A2E8-08002B30309D}, the one used for .exe, .bat and similar, which doesn't suffer from this issue. You would also probably have to change the file association for .vbs to put quotes around the filename argument too, since the EXE DropHandler doesn't, to avoid problems with spaces in filenames.
Since this affects argument-passing for all VBS files it would be a perilous fix to deploy on any machine but your own. If you needed to do that, maybe you could try creating a new file extension with the appropriate DropTarget rather than changing VBSFile itself? Or maybe forgo drop-onto-script behaviour and provide a file Open dialog or manual drop field instead.
For anyone landing here from Google...
Bobince's tip led me to work around this problem by wrapping my VBScript file (myscript.vbs) in a DOS batch file (mybatch.bat).
The tip was:
"Seem to be a problem peculiar to the Windows Script Host's
DropHandler shell extension whereas.... the one used for .exe, .bat and
similar... doesn't suffer from this issue."
mybatch.bat contains:
:Loop
IF "%1"=="" GOTO Continue
set allfiles=%allfiles% "%1"
SHIFT
GOTO Loop
:Continue
"myscript.vbs" %allfiles%
You may also find this code from my myscript.vbs to be helpful
For Each strFullFileName In Wscript.Arguments
' do stuff
Next
Based on DG's answer, if you just want to accept one file as a drop target, you can write a batch file that contains only the line below. If the batch file is named "x.bat", place the VBScript in the same folder with the file name "x.bat.vbs".
@"%0.vbs" %1
The @ means the line is not echoed to the display (I found that it showed garbage text even with chcp 1250 as the first command).
Don't put double quotes around %1; it won't work if your VBScript uses logic like the following (the code below is from http://jeffkinzer.blogspot.com/2012/06/vbscript-to-convert-excel-to-csv.html). I tested it, and it works fine with spaces in the file and folder names:
Dim strExcelFileName
strExcelFileName = WScript.Arguments.Item(0) 'file name to parse
' get path where script is running
strScript = WScript.ScriptFullName
Dim fso
Set fso = CreateObject ("Scripting.FileSystemObject")
strScriptPath = fso.GetAbsolutePathName(strScript & "\..")
Set fso = Nothing
' If the Input file is NOT qualified with a path, default the current path
LPosition = InStrRev(strExcelFileName, "\")
if LPosition = 0 Then 'no folder path
strExcelFileName = strScriptPath & "\" & strExcelFileName
strScriptPath = strScriptPath & "\"
else 'there is a folder path, use it for the output folder path also
strScriptPath = Mid(strExcelFileName, 1, LPosition)
End If
' msgbox LPosition & " - " & strExcelFileName & " - " & strScriptPath
Change the WSH DropHandler ({60254CA5-953B-11CF-8C96-00AA00B8708C}) to {86C86720-42A0-1069-A2E8-08002B30309D} and add this function to convert the short path to a long one:
Function Short2Long(shortFullPath)
dim fs
Set fs = CreateObject("Scripting.FileSystemObject")
Set f = fs.GetFile(shortFullPath)
Set app = CreateObject("Shell.Application")
Short2Long = app.NameSpace(f.ParentFolder.Path).ParseName(f.Name).Path
end function

Unicode to UTF-8

I'm using VBScript to extract data from DB2 and write it to a file.
I create the file like this:
Set objTextFile = objFSO.CreateTextFile(sFilePath, True, True)
That creates the file in Unicode (UTF-16). But it is an XML file, and it is declared as UTF-8.
So when I open the XML file with MS XML Notepad, it throws an error:
'hexadecimal value 0x00 is an invalid character'
So I open the text file in TextPad and save it as UTF-8. After that the XML opens without any problems.
Can I convert the file from Unicode to UTF-8 in VBScript?
Using the Stream object to save your file with the utf-8 charset might work better for you; here's a simple .vbs function you could test out on your data:
Option Explicit
Sub Save2File (sText, sFile)
Dim oStream
Set oStream = CreateObject("ADODB.Stream")
With oStream
.Open
.CharSet = "utf-8"
.WriteText sText
.SaveToFile sFile, 2
End With
Set oStream = Nothing
End Sub
' Example usage: '
Save2File "The data I want in utf-8", "c:\test.txt"
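For a file that has already been written as Unicode, the conversion itself is just a matter of decoding as UTF-16 and encoding again as UTF-8. A minimal Python sketch of that step follows; utf16_to_utf8 is a made-up name, and the input is assumed to start with a byte order mark, as files made by CreateTextFile with the Unicode flag do.

```python
def utf16_to_utf8(src_path, dst_path):
    """Re-encode a UTF-16 text file (with BOM) as UTF-8 without a BOM."""
    # The "utf-16" codec uses the BOM to pick the byte order and drops it.
    # newline="" keeps the original line endings untouched.
    with open(src_path, encoding="utf-16", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

The output contains no 0x00 bytes for ordinary text, which is what the XML parser was complaining about.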
In some cases we need to do this in WSH on a machine without ADO. Keep in mind that WSH cannot create a file in UTF-8 format (the CreateTextFile method does not support UTF-8), but it is entirely possible to work with an existing UTF-8 file by appending data to it. With that in mind, I found an unorthodox solution. Follow these steps:
1) Open a blank Notepad window, click File > Save As, type a name for the file (UTF8FileFormat.txt, for example), change the "Encoding" field to UTF-8 and click [Save]. Close Notepad.
2) In your WSH script, use UTF8FileFormat.txt as a template for your UTF-8 text file. After your FileSystemObject declaration, use the CopyFile method to copy UTF8FileFormat.txt to a new file (remember to use the Overwrite option), then use the OpenTextFile method to open the new file with the ForAppending and NoCreate options. After this you can write to the file as usual (as with CreateTextFile). Your new file will be in UTF-8 format. An example follows:
'### START
' ### REMEMBER: You need to create the UTF8FileFormat.txt file in a blank
' ### NOTEPAD with UTF-8 Encoding first.
Unicode=-1 : ForAppending=8 : NoCreate=False : Overwrite=True
set fs = CreateObject("Scripting.FileSystemObject")
fs.CopyFile "UTF8FileFormat.txt","MyNewUTF8File.txt",Overwrite
set UTF8 = fs.OpenTextFile("MyNewUTF8File.txt", ForAppending, NoCreate)
UTF8.writeline "My data can be writed in UTF-8 format now"
UTF8.close
set UTF8 = nothing
'### END
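One caveat may be worth checking before relying on this trick. The sketch below assumes that OpenTextFile, left at its default format as in the example, appends text in the ANSI code page (cp1252 is assumed here), and that the template file contains only the UTF-8 byte order mark. Under those assumptions the result is valid UTF-8 only while the appended text is plain ASCII.

```python
BOM = b"\xef\xbb\xbf"  # assumed content of the Notepad-made template file


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Simulate appending ANSI (cp1252) text after the BOM.
ascii_only = BOM + "plain ASCII line\r\n".encode("cp1252")
accented = BOM + "caf\u00e9\r\n".encode("cp1252")
```

Here is_valid_utf8(ascii_only) is True and is_valid_utf8(accented) is False, so data with accented or other non-ASCII characters would need a real UTF-8 writer such as the ADODB.Stream approach.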