Batch processing Pandoc conversions in Windows - powershell

I am trying to convert a large number of HTML files into Markdown using Pandoc in Windows, and have found an answer on how to do this on a Mac, but receive errors when attempting to run the following in Windows PowerShell.
find . -name \*.md -type f -exec pandoc -o {}.txt {} \;
Can someone help me translate this to work in Windows?

To convert files in folders recursively, try this (Windows Command Prompt):
for /r "startfolder" %i in (*.htm *.html) do pandoc -f html -t markdown "%~fi" -o "%~dpni.txt"
For use in a batch file, double the % signs (%%i, %%~fi, %%~dpni).
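If you are working in Git Bash or WSL rather than cmd.exe, the for /r loop above can be approximated with find. This is only a sketch, assuming bash and GNU find. The function name html_tree_to_markdown and the PANDOC override variable are hypothetical. They are there so the loop can be tried with a stand-in for pandoc.

```shell
#!/usr/bin/env bash
# Sketch: Git Bash / WSL counterpart of the cmd.exe "for /r" loop.
# Converts every .htm/.html file under a start folder to Markdown and writes
# name.txt next to name.html, like "%~dpni.txt" does in cmd.exe.
# PANDOC is a hypothetical override hook; it defaults to the real pandoc.
html_tree_to_markdown() {
  local start=${1:-.}
  local pandoc_cmd=${PANDOC:-pandoc}
  # -print0 together with read -d '' keeps paths containing spaces intact.
  find "$start" -type f \( -name '*.htm' -o -name '*.html' \) -print0 |
    while IFS= read -r -d '' f; do
      # ${f%.*} removes only the last extension, the analogue of %~dpni.
      "$pandoc_cmd" -f html -t markdown "$f" -o "${f%.*}.txt"
    done
}
```

Usage: html_tree_to_markdown "startfolder". To print the commands instead of running them, use PANDOC=echo html_tree_to_markdown "startfolder".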

Most of the answers here (for ... solutions) are for cmd.exe, not PowerShell.
mb21's answer is on the right track, but has a bug with respect to targeting each input file; also, it is hard to parse visually.
The functionally equivalent PowerShell command is:
Get-ChildItem -File -Recurse -Filter *.md | ForEach-Object {
  pandoc -o ($_.FullName + '.txt') $_.FullName
}

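Endoro's answer is great; don't get confused by the modifiers added to %i. %~fi expands to the full path of the file, and %~dpni expands to its drive, path and name without the extension.
To help others: I needed to convert from RST (reStructuredText) to DokuWiki syntax, so I created a convert.bat with: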
FOR /r "startfolder" %%i IN (*.rst) DO pandoc -f rst -t dokuwiki "%%~fi" -o "%%~dpni.txt"
Works for all rst files in folders and subfolders.

If you want to go recursively through a directory and its subdirectories to compile all the files of a given type, say *.md, you can use the batch file I wrote in answer to another question, "How can I use pandoc for all files in the folder in Windows?". I call it pancompile.bat, and its usage is below. See the other answer for the code.
Usage: pancompile DIRECTORY FILENAME [filemask] ["options"]
Uses pandoc to compile all documents in specified directory and subdirectories to a single output document
DIRECTORY the directory/folder to parse recursively (passed to pandoc -s);
use quotation marks if there are spaces in the directory name
FILENAME the output file (passed to pandoc -o); use quotation marks if spaces
filemask an optional file mask/filter, e.g. *.md; leave blank for all files
"options" optional list of pandoc commands (must be in quotation marks)
Minimal example: pancompile docs complete_book.docx
Typical example: pancompile "My Documents" "Complete Book.docx" *.md "-f markdown -t docx --standalone --toc"
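For readers on Git Bash or WSL, the core idea behind a compile-everything script can be sketched in a few lines of bash. This is not the pancompile.bat code from the other answer. It is a hypothetical compile_tree function that assumes GNU find and sort (sort -z). It gathers the matching files in path order and passes them to a single pandoc call. Given several input files, pandoc concatenates them before converting, so the result is one document.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical compile_tree, not pancompile.bat): collect all files
# matching a mask under DIRECTORY and hand them to ONE pandoc call, which
# concatenates its inputs into a single output document.
# Assumes GNU sort (for -z). PANDOC is a hypothetical override hook.
compile_tree() {
  local dir=$1 out=$2 mask=${3:-*}
  local pandoc_cmd=${PANDOC:-pandoc}
  local files=() f
  # Sorted, NUL-delimited list: stable path order, spaces in names survive.
  while IFS= read -r -d '' f; do
    files+=("$f")
  done < <(find "$dir" -type f -name "$mask" -print0 | sort -z)
  if (( ${#files[@]} == 0 )); then
    echo "compile_tree: no files matching '$mask' in '$dir'" >&2
    return 1
  fi
  # Arguments after the third are passed through as pandoc options.
  "$pandoc_cmd" "${files[@]}" -o "$out" "${@:4}"
}
```

Usage: compile_tree "My Documents" "Complete Book.docx" '*.md' --toc. Quote the mask so that the shell does not expand it before find sees it.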

Using PowerShell's built-in gci (an alias for Get-ChildItem):
gci -r -i *.md | foreach { $docx = Join-Path $_.DirectoryName ($_.BaseName + ".docx"); pandoc $_.FullName -o $docx }
from https://github.com/jgm/pandoc/issues/5429

I created a Python script that I've been using to convert a tree of Markdown files into a single output file. It's available on GitHub:
https://github.com/andrewrproper/pandoc-folder

Related

Git Bash find exec recursively on folders and files containing spaces

Question: In Git Bash on Windows, how would you run the following so that it also searches folders with spaces in the name and executes on files with spaces in the name?
$ find ./ -type f -name '*.png' -exec sh -c 'cwebp -q 75 $1 -o "${1%.png}.webp"' _ {} \;
Context: I'm running Git Bash on Windows, trying to execute a command on all found .png files to convert them to .webp format. It works for all files without spaces in the path, but it's failing to find files with spaces in the filename or files within folders that have spaces in the folder name. A few considerations:
- I have many, many levels of folders to iterate through, and I can't run this command separately for each. I really need the recursion to work.
- I cannot change the folder names; it will break other dependencies (nor did I create the folder or filenames originally, so cut me some slack!).
- I arrived here by following the suggestions from this article: https://www.smashingmagazine.com/2018/07/converting-images-to-webp/
- The program, to my knowledge, doesn't ship with any built-in recursive command... golly, that'd be handy.
Any help you can provide will be appreciated. Thanks!
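The failure comes from the unquoted $1 inside sh -c. The shell splits the path at spaces before cwebp sees it, so writing "$1" instead of $1 should be enough to fix the one-liner. The sketch below is slightly more efficient. It assumes cwebp is on the PATH in Git Bash, and png_to_webp_tree is a hypothetical name. It lets find hand many paths to each sh call with {} +:

```shell
#!/usr/bin/env bash
# Sketch: convert every .png under a folder tree to .webp next to it.
# Quoting "$f" keeps paths with spaces as one argument; "{} +" makes find
# pass a batch of paths to each sh call instead of starting sh per file.
png_to_webp_tree() {
  local root=${1:-.}
  find "$root" -type f -name '*.png' \
    -exec sh -c 'for f do cwebp -q 75 "$f" -o "${f%.png}.webp"; done' sh {} +
}
```

Usage: png_to_webp_tree ./ from the top of the folder tree.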

Rename and overwrite files using wildcard in Windows

I am working on a script for auto deployment, where I need to replace my files with the same filenames.
For example, I have the following files in my current directory:
deployment.properties
wrapper.conf
config.properties
Later, I will generate another set of files like this:
deployment.properties.tokenized
wrapper.conf.tokenized
config.properties.tokenized
Lastly, I want to replace the existing config files (in the first code block) with the *.tokenized versions and remove the tokenized files.
In Linux, the following does the job, but I don't know how to do it in Windows:
for f in *.tokenized;
do mv "$f" "$(echo "$f" | sed 's/\.tokenized$//')";
done
I tried PowerShell's Move-Item and Rename-Item but still cannot figure out the right way to do it. Could somebody help? Batch and PowerShell scripts are both welcome, and using a loop is also okay. Thank you.
In cmd/batch files it is almost the same code, but the for replaceable parameter gives direct access to the parts of the file name.
From the command line:
for %a in (*.tokenized) do move /y "%a" "%~na"
Or, for a batch file (you need to escape the for replaceable parameter):
for %%a in (*.tokenized) do move /y "%%a" "%%~na"
Because the extension of each file (the text after the last dot) is .tokenized, requesting just the name without the extension (%%~na) gives you the original file name.
This PowerShell script should do the job:
Get-ChildItem *.tokenized | % {
  move $_.Name ([System.IO.Path]::GetFileNameWithoutExtension($_.Name)) -Force
}
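For comparison with the Linux loop in the question, bash parameter expansion can strip the suffix without sed. This is a minimal sketch, and promote_tokenized is a hypothetical name. "${f%.tokenized}" removes the trailing .tokenized, including its dot. mv -f overwrites the existing config file, like move /y and -Force above.

```shell
#!/usr/bin/env bash
# Sketch: replace each config file with its *.tokenized counterpart.
# The tokenized file is moved onto the original name, so it disappears.
promote_tokenized() {
  local f
  for f in *.tokenized; do
    [ -e "$f" ] || continue   # no match: the pattern stays literal, skip it
    mv -f -- "$f" "${f%.tokenized}"
  done
}
```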

How do I create a file of a hash of everything (individually) in a directory tree?

I have several pdf, jpg, png files inside an alphabetical directory tree. How do I produce a file of the hash of each individual file?
There are a lot of ways to do this.
Which OS are you using?
In what exact format should the results be saved?
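Here is an example of a simple bash (version 4) script for Linux that prints one line per file, the hash followed by the file name, including all subdirectories.
#!/bin/bash
# Requires bash 4+: globstar makes ** match files in all subdirectories.
shopt -s globstar
OUTPUT=output.txt
for f in **
do
  # Hash regular files only (skip directories and the output file itself);
  # quoting "$f" keeps names with spaces intact.
  if [[ -f $f && $f != "$OUTPUT" ]]; then
    md5sum -- "$f"
  fi
done > "$OUTPUT"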
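If bash 4's globstar is not available, for example with older bash versions, find can produce the same list. This sketch assumes GNU findutils and coreutils (md5sum, xargs -r), and hash_tree is a hypothetical name. Unlike **, find also includes dot-files. -print0 with xargs -0 keeps names with spaces intact.

```shell
#!/usr/bin/env bash
# Sketch: write "<md5>  <path>" for every regular file under a root,
# one line per file, without needing globstar.
hash_tree() {
  local root=${1:-.} out=${2:-output.txt}
  # Skip the output file by name so it is not hashed while being written.
  find "$root" -type f ! -name "$(basename -- "$out")" -print0 |
    xargs -0 -r md5sum -- > "$out"
}
```

Usage: hash_tree . output.txt. Later, md5sum -c output.txt re-checks every file against the saved hashes. sha256sum can be swapped in for md5sum if a stronger hash is needed.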

Recursively replace colons with underscores in Linux

First of all, this is my first post here and I must specify that I'm a total Linux newb.
We recently bought a QNAP NAS box for the office. On this box we have a large amount of data that was copied off an old Mac XServe machine. A lot of files and folders originally had forward slashes in their names (HFS+ should never have allowed this in the first place), and these were all replaced with colons when copied to the NAS.
I now want to rename all colons to underscores, and have found the following commands in another thread here: pitfalls in renaming files in bash
However, the flavour of Linux on this box does not understand the rename command, so I'm having to use mv instead. I have tried using the code below, but it only works for the files in the current folder. Is there a way I can change this to include all subfolders?
for f in *.*; do mv -- "$f" "${f//:/_}"; done
I have found that I can find all the files and folders in question using the find command as follows:
Files:
find . -type f -name "*:*"
Folders:
find . -type d -name "*:*"
I have been able to export a list of the results above by using:
find . -type f -name "*:*" > files.txt
I tried using the command below, but find gives an error message saying it doesn't understand the -exec switch. Is there a way to pipe this all into one command, or could I somehow use the files I exported previously?
find . -depth -name "*:*" -exec bash -c 'dir=${1%/*} base=${1##*/}; mv "$1" "$dir/${base//:/_}"' _ {} \;
Thank you!
Vincent
So your for loop code works, but only in the current dir. Also, you are able to use find to build a file with all the files with : in the filename.
So, as you've already done all this, I would just loop over each line of your file, and perform the same mv command.
Something like this:
for f in `cat files.txt`; do mv $f "${f//:/_}"; done
EDIT:
As pointed out by tripleee, a while loop is a better solution, because looping over the output of cat splits file names at spaces.
E.g.:
while IFS= read -r f; do mv -- "$f" "${f//:/_}"; done < files.txt
Hope this helps.
Will
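The while loop above replaces colons in the whole path. It therefore fails for files whose parent folder also contains a colon, and it does not rename the folders themselves. The sketch below handles both. It assumes the box's find supports -depth, which is worth checking given the -exec error. It also assumes no name contains a newline, and colons_to_underscores is a hypothetical name. With -depth, find lists a folder's contents before the folder itself. Each folder is therefore renamed only after everything inside it, and only the last path component is changed.

```shell
#!/usr/bin/env bash
# Sketch: rename files AND folders, replacing ':' with '_' in the last path
# component only. -depth lists children before their parent folder, so the
# paths still to be processed stay valid until their own turn comes.
# Assumes find supports -depth and that no name contains a newline.
colons_to_underscores() {
  local root=${1:-.}
  find "$root" -depth -name '*:*' |
    while IFS= read -r path; do
      dir=${path%/*}     # everything before the last slash
      base=${path##*/}   # the last path component
      mv -- "$path" "$dir/${base//:/_}"
    done
}
```

Usage: run colons_to_underscores . from the top-level folder. The starting folder's own name should not contain a colon.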

Translate a Unix1Liner to PowerShell

I would like to translate the following Unix one-liner to PowerShell.
Synopsis of the command:
This command searches recursively from the PWD (present working directory) for any file with the extension .jsp and looks inside the file for a simple string match of 'logoutButtonForm'. If it finds a match, it prints the file name and the text that it matched.
find . -name "*.jsp" -exec grep -aH "logoutButtonForm" {} \;
I am new to PowerShell and have done some googling/binging but have not found a good answer yet.
ls . -r *.jsp | Select-String logoutButtonForm -case
I tend to prefer -Filter over -Include. Guess I never trusted the -Exclude/-Include parameters after observing buggy behavior in PowerShell 1.0. Also, -Filter is significantly faster than using -Include.