How do I create a file of hashes of everything (individually) in a directory tree?

I have several PDF, JPG, and PNG files inside an alphabetical directory tree. How do I produce a file containing the hash of each individual file?

There are a lot of ways to do this.
Which OS are you using, and in what exact format do you want to save the results?
Here is an example of a simple bash (version 4) script on Linux that writes one line per file, each containing the hash followed by the file name, covering all sub-directories.
#!/bin/bash
shopt -s globstar            # make ** match files in all sub-directories
OUTPUT=output.txt
for f in **
do
    [ -f "$f" ] || continue  # skip directories
    md5sum "$f" >> "$OUTPUT"
done
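If you prefer a one-liner, the same result can be produced with find (a sketch, assuming GNU coreutils md5sum; swap in sha256sum if you want a stronger hash). The ! -name output.txt keeps the output file itself out of the listing, and passing files with + also copes with names containing spaces:
find . -type f ! -name output.txt -exec md5sum {} + > output.txt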


PDFtk: Merge PDF Problems

I am using PDFtk (version 2.02, UNIX) to merge PDFs and am facing the following problems in the output PDF:
The initial view of the PDF is changed (it should open with the Bookmarks panel and the page).
Bookmarks don't point to the exact linked section as in the separate PDFs (they show a fit-page view of the section instead).
The original metadata is lost (the output should retain the first PDF's metadata).
Please suggest a workaround for the above points.
Regards,
Umesh
It's a little late to answer, but I came across this question while looking for a solution to the same problem. After taking a look at the man page of pdftk I found a solution and wrote a little script:
#!/usr/bin/env bash
# pdfcat - merge PDFs, keeping the metadata of the first one
array=( "$@" )                        # all arguments
len=${#array[@]}                      # number of arguments
merged=${array[$len-1]}               # last argument: the output file
pdf2merge=( "${array[@]:0:$len-1}" )  # all input PDFs
pdftk "$1" dump_data output metadata
pdftk "${pdf2merge[@]}" cat output "$merged"
pdftk "$merged" update_info metadata output out
mv out "$merged"
rm metadata
exiftool "$merged"
The script saves the metadata of the first PDF file (the first argument) and writes it to a file called metadata. Then it uses the cat command of pdftk to merge all the input files (the output file is the last argument). Finally it loads metadata's content into the metadata of the resulting file before deleting the metadata file. The last line uses exiftool to print the metadata of the resulting file so you can check that everything went well.
You can save this script to your ~/bin directory and make it executable with:
$ chmod u+x scriptname
and then you can use it to merge files with the following syntax:
$ scriptname 1.pdf 2.pdf 3.pdf output.pdf
The resulting output.pdf will have the same metadata as the original 1.pdf file.
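If you have already merged your PDFs and only want to carry the metadata over afterwards, the two relevant pdftk steps can also be run by hand (a sketch; first.pdf and merged.pdf are placeholder names):
$ pdftk first.pdf dump_data output metadata
$ pdftk merged.pdf update_info metadata output merged-with-meta.pdf
dump_data writes the metadata of first.pdf to the file metadata, and update_info writes that metadata into a copy of merged.pdf.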

Batch processing Pandoc conversions in Windows

I am trying to convert a large number of HTML files into Markdown using Pandoc in Windows, and have found an answer on how to do this on a Mac, but receive errors when attempting to run the following in Windows PowerShell.
find . -name \*.md -type f -exec pandoc -o {}.txt {} \;
Can someone help me translate this to work in Windows?
To convert files in folders recursively, try this (Windows command prompt):
for /r "startfolder" %i in (*.htm *.html) do pandoc -f html -t markdown "%~fi" -o "%~dpni.txt"
For use in a batch file, double the %.
Most of the answers here (the for ... solutions) are for cmd.exe, not PowerShell.
mb21's answer is on the right track, but has a bug with respect to targeting each input file; also, it is hard to parse visually.
The functionally equivalent PowerShell command is:
Get-ChildItem -File -Recurse -Filter *.md | ForEach-Object {
    pandoc -o ($_.FullName + '.txt') $_.FullName
}
Endoro's answer is great; don't get confused by the parameters added to %i.
To help others: I needed to convert from RST (reStructuredText) to DokuWiki syntax, so I created a convert.bat with:
FOR /r "startfolder" %%i IN (*.rst) DO pandoc -f rst -t dokuwiki "%%~fi" -o "%%~dpni.txt"
It works for all .rst files in the folder and its subfolders.
If you want to go recursively through a directory and its subdirectories to compile all the files of a given type, say *.md, into one document, you can use the batch file I wrote in answer to another question, How can I use pandoc for all files in the folder in Windows?. I call it pancompile.bat and the usage is below. Go to the other answer for the code.
Usage: pancompile DIRECTORY FILENAME [filemask] ["options"]
Uses pandoc to compile all documents in specified directory and subdirectories to a single output document
DIRECTORY the directory/folder to parse recursively (passed to pandoc -s);
use quotation marks if there are spaces in the directory name
FILENAME the output file (passed to pandoc -o); use quotation marks if spaces
filemask an optional file mask/filter, e.g. *.md; leave blank for all files
"options" optional list of pandoc commands (must be in quotation marks)
Minimal example: pancompile docs complete_book.docx
Typical example: pancompile "My Documents" "Complete Book.docx" *.md "-f markdown -t docx --standalone --toc"
Using the PowerShell built-in gci (an alias for Get-ChildItem):
gci -r -i *.md | foreach { $docx = $_.DirectoryName + "\" + $_.BaseName + ".docx"; pandoc $_.FullName -o $docx }
from https://github.com/jgm/pandoc/issues/5429
I created a Python script that I've been using to convert a tree of Markdown files into a single output file. It's available on GitHub:
https://github.com/andrewrproper/pandoc-folder

Change file extensions of multiple files in a directory with terminal/bash?

I'm developing a simple launchdaemon that copies files from one directory to another. I've gotten the files to transfer over fine.
I just want the files in the directory to be .mp3's instead of .dat's
Some of the files look like this:
6546785.8786.dat
3678685.9834.dat
4658679.4375.dat
I want them to look like this:
6546785.8786.mp3
3678685.9834.mp3
4658679.4375.mp3
This is what I have at the end of the bash script to rename the file extensions.
cd $mp3_dir
mv *.dat *.mp3
exit 0
The problem is that the file comes out named *.mp3 instead of 6546785.8786.mp3,
and when another 6546785.8786.dat file is imported into $mp3_dir, the *.mp3 is overwritten with the new .mp3.
I need to rename just the .dat file extensions to .mp3 and keep the filename.
Ideas? Suggestions?
Try:
for file in *.dat; do mv "$file" "${file%dat}mp3"; done
Or, if your system has it:
rename .dat .mp3 *.dat
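For reference, ${file%dat} in the first loop removes the shortest trailing match of dat from the value of $file, so each new name is built from the old one. A quick way to see the expansion (a sketch you can paste into a shell):
$ file=6546785.8786.dat
$ echo "${file%dat}mp3"
6546785.8786.mp3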
Now, why your command didn't work: first of all, it is almost certain that you only had one file in your directory when it was renamed to *.mp3; otherwise mv would have failed with *.mp3: not a directory.
And mv does NOT do any magic with file globs, it is the shell which expands globs. Which means, if you had this file in the directory:
t.dat
and you typed:
mv *.dat *.mp3
the shell would have expanded *.dat to t.dat. However, as nothing would match *.mp3, the shell would have left it as is, meaning the fully expanded command is:
mv t.dat *.mp3
Which will create a file named, literally, *.mp3.
If, on the other hand, you had several files named *.dat, as in:
t1.dat t2.dat
the command would have expanded to:
mv t1.dat t2.dat *.mp3
But this will fail: if there are more than two arguments to mv, it expects the last argument (i.e., *.mp3) to be a directory.
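A quick way to see what the shell actually hands to mv is to put echo in front of the command (a sketch, run in a directory containing t1.dat and t2.dat):
$ echo mv *.dat *.mp3
mv t1.dat t2.dat *.mp3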
For anyone on a Mac, this is quite easy if you have Homebrew; if you don't have it, my advice is to get it. Once it is installed, simply run:
$ brew install rename
Then, once rename is installed, just type (in the directory where the files are):
$ rename -s dat mp3 *

matlab, textfile

I have a bunch of text files which contain both strings and numbers, but the strings are only in the first few rows.
I'm trying to write a script which goes into my folder, searches all the files in it, deletes the text from each file, and writes the rest as-is to a new text file.
Does anybody know how?
I don't think this is a good use of MATLAB.
I think you'd be better off scripting this in Python or the shell. Here is one way you could do it with tr in the shell if you're on *nix or macOS and your files are all in the same directory and all have the extension .txt:
#!/bin/sh
for i in *.txt
do
    tr -d "[:alpha:]" < "$i" > "$i.tr.txt"
done
To run it, save the code above as a file, make it executable (chmod a+x filename), and run it in the directory with your text files.
If the number of string lines is always the same, you can use textread() with the 'headerlines' option to skip over those string lines, then write the remaining data out to the new file.
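The same "skip the first few lines" idea also works in the shell when the header count is fixed; a sketch assuming the first three rows are the strings to drop (adjust the number after -n, which is the first line to keep):
for i in *.txt; do tail -n +4 "$i" > "${i%.txt}.stripped.txt"; done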

Shell Script to update the contents of a folder

I'm a beginner in Unix shell scripting and Perl scripting.
I would like an example program that teaches me how to update file contents in a directory.
The scenario is: there is a directory which contains some n number of files.
Among those n files, m files have been modified.
I need to update the contents of the modified files in the directory.
Please give me a simple shell script to do this.
Thanks and Regards,
Vijay
I would do it with find like this:
find your_directory -newermt time_of_last_check -exec modify_script.sh {} \;
where:
your_directory is the directory where you have the files,
time_of_last_check is when you last ran this command, and
modify_script.sh is the program that you will run to modify the files; it should take one argument: the filename to modify.
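As a minimal sketch of how this fits together, you can keep the time of the last check in a stamp file instead of typing a date by hand (the stamp file name and modify_script.sh are placeholders):
#!/bin/sh
stamp=.last_check
[ -f "$stamp" ] || touch -t 197001010000 "$stamp"  # first run: treat everything as modified
find your_directory -type f -newer "$stamp" -exec modify_script.sh {} \;
touch "$stamp"
Each run picks up only the files modified since the previous run, then refreshes the stamp.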
In Perl:
To update a file's contents, see perlfaq5; you will find a lot of information and examples about file manipulation.
To get file or directory statistics, see the Perl built-in function stat.
To traverse a directory tree, see File::Find.