Stripping out (ASCII-armored) PGP blocks in a text file - sed

Given a file formatted as Markdown, occasionally interspersed with blocks of PGP, how can I strip out the PGP blocks using standard Linux tools in a shell script?
The file looks like this gist
(I had to create a gist because of formatting issues)

Using sed, you can do it like this:
sed '/^-----BEGIN PGP/,/^-----END PGP/d' file
In short: you define a range of lines between two patterns (/pat1/,/pat2/), and that range is deleted (d).
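If you want to modify the file in place instead of writing to stdout, GNU sed's -i option can be added (a small sketch; notes.md is just a placeholder filename):
sed -i '/^-----BEGIN PGP/,/^-----END PGP/d' notes.md
On BSD/macOS sed the in-place flag takes a backup-suffix argument, so you would write sed -i '' ... there instead.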

Related

Compare filenames with different encoding in Octave

I'm trying to accomplish the following task in Octave:
Read a filename from a text file
Search for that file in a particular location on the hard drive
My script works for most files, but for certain files containing Unicode characters I'm unable to match the filename from the text file with the filename as it appears in the file system.
Filenames in the text file are UTF-8 encoded and I read them in Octave with the function fgetl().
Filenames from the file system are obtained via the function readdir(). I'm on Windows, NTFS file system.
For example, one problematic filename contains the character "Č".
When printed in the Octave console, the characters appear exactly the same. However, a hex viewer reveals that they are not actually the same: in the first case the character is encoded as 0x010C, in the second case as 0x0043 + 0x030C. Comparing the two with strcmp() fails, of course.
What I tried was to omit all non-ASCII characters from the filenames and then compare them. But this didn't work, probably because in the second variant the first part of the character (0x0043) is actually ASCII.
Now I'm looking for some way of converting one format to another to be able to compare them. Any ideas?
EDIT:
As I discovered later, the character Č in the filename on Windows is actually written as C + ˇ, which is just another way to write that character. So the difference probably isn't in the encoding standard, but in two different ways of producing one visible character (glyph).
The question then basically becomes a task of matching characters written "at once" (precomposed) with the corresponding pair of letter + combining character.
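(Not a full answer, but a sketch of the underlying idea: the two spellings are the precomposed (NFC) and decomposed (NFD) Unicode forms, so normalizing both lists of names to the same form before comparing should make strcmp() succeed. Outside Octave this can be done, for example, with uconv from the ICU tools, assuming it is installed and the names are in a file called names.txt:
uconv -f UTF-8 -t UTF-8 -x NFC names.txt > names_nfc.txt
The NFC transform rewrites C + combining caron into the single precomposed character Č.)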

How to replace the same character in multiple text files?

So I have over 100 text files, all of which are too large to open in a normal text editor (e.g. Notepad, Notepad++), which means I cannot use those.
All text files contain the same format, they contain:
abc0001:00000009a
abc0054:000000809a
abc00888:054450000009a
and so on.
I was wondering how I can replace the ":" in each of those text files with "\n" (a newline), so it would then become:
abc0001
00000009a
abc0054
000000809a
abc00888
054450000009a
How would I do this to all 100 of the text files, without doing it manually and individually (if there's any way)?
Any help is appreciated.
You can use sed. The question concerns Unix, but a lot of Unix utilities have been ported to MS Windows (even sed): http://gnuwin32.sourceforge.net/packages/sed.htm
The following does something similar to what you want (see also the related question "UNIX: Replace Newline w/ Colon, Preserving Newline Before EOF"). You provide your text file as input, and the output becomes your new text file:
sed 's/:/\n/g'
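To run this over all of the files at once rather than one by one, a loop with GNU sed's in-place flag is one option (a sketch only; the *.txt glob is an assumption about how your files are named):
for f in *.txt; do sed -i 's/:/\n/g' "$f"; done
Note that \n in the replacement is a GNU sed extension; on BSD/macOS sed you would need a literal newline in the replacement, or a tool like awk or perl instead.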

Pandoc: Prevent conversion of special characters in LaTeX

I am converting an MS Word document (.docx) with Pandoc to LaTeX (.tex). The .docx file contains backslashes and brackets, which Pandoc converts to the corresponding LaTeX commands (e.g. \textbackslash), which I do not want.
How can I prevent Pandoc from converting special characters?
I think pandoc is actually doing what you want. You cannot have plain backslashes in LaTeX since they would be interpreted as commands, so instead you have to use \textbackslash{}, which is the command to print a simple plain backslash in LaTeX. Try generating a PDF with LaTeX and you'll see what I mean.
If you actually want to include LaTeX commands in your Word file, I think that's not possible. (How would pandoc know whether the Word user wanted to write a backslash or a LaTeX command?) However, you can transform your Word doc to markdown, adjust it (in pandoc markdown you actually can include raw TeX), and then export it to LaTeX.
pandoc input.docx -o file.md
# edit file.md now
pandoc file.md -o output.tex
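As a small illustration (the file names are just the placeholders from above): because pandoc's markdown reader enables the raw_tex extension by default, a raw LaTeX command typed into file.md survives the second conversion untouched, e.g.
echo 'A \textbackslash{} and some \emph{raw} LaTeX.' >> file.md
pandoc file.md -o output.tex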
For a more automated solution, you could look into pandoc filters. Then it's up to you how to solve the ambiguity of backslashes...

Ensure format CRLF if not present

I have a script, MM.pl, which is the "workhorse", and a simple ".patch" file that it reads from. It targets an original text file from a 2004 program, usually a text file with a .txt or .ini extension. It searches the target file for the "old" data from the patch file and, if found, substitutes it with the "new" data from the patch file. To find the problem I made the script hex-dump the old and new data and the target file. Voilà! The target file is formatted with CRLF line endings, while the patch file's old and new data only contain LF. I need a solution that ensures the patch file's old/new data uses CRLF. The script is used by Mac and Windows users and the patch file can be generated by any text editor, which is why I need it to check and correct the EOL format to ensure compatibility with the CRLF format.
You can use a regular expression to replace a bare \n (one not already preceded by \r) with \r\n.
I don't have a Perl interpreter at hand, but something like this should work:
$string =~ s/(?<!\r)\n/\r\n/g;
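If you would rather fix the patch file itself before MM.pl reads it, the same substitution can be run from the command line (a sketch; my.patch is a placeholder name), and unix2dos, if installed, does the same job:
perl -pi -e 's/(?<!\r)\n/\r\n/g' my.patch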

Charset conversion from XXX to utf-8, command line

I have a bunch of text files that are encoded in ISO-8859-2 (they have some Polish characters). Is there a command-line tool for Linux/Mac that I could run from a shell script to convert these to a saner UTF-8?
Use iconv, for example like this:
iconv -f ISO-8859-2 -t UTF-8 input.txt > output.txt
Some more information:
You may want to specify UTF-8//TRANSLIT instead of plain UTF-8. To quote the manpage:
If the string //TRANSLIT is appended to to-encoding, characters being converted are transliterated when needed and possible. This means that when a character cannot be represented in the target character set, it can be approximated through one or several similar looking characters. Characters that are outside of the target character set and cannot be transliterated are replaced with a question mark (?) in the output.
For a full list of encoding codes accepted by iconv, execute iconv -l.
The example above makes use of shell redirection. Make sure you are not using a shell that mangles encodings on redirection – that is, do not use PowerShell for this.
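Since the question mentions a bunch of files, here is a minimal loop around iconv (a sketch; the *.txt glob and the utf8/ output directory are assumptions about your setup):
mkdir -p utf8
for f in *.txt; do iconv -f ISO-8859-2 -t UTF-8 "$f" > "utf8/$f"; done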
recode latin2..utf8 myfile.txt
This will overwrite myfile.txt with the new version. You can also use recode without a filename as a pipe.
GNU 'libiconv' should be able to do the job.