I have an HTML file which gets generated by some tool. However, I need to use Perl to insert a line break after each line.
In a browser, if I were to view the source of this file, it would look something like this:
<body>
This is line number one
This is another line
This is also
Another line
</body>
Obviously this is just a snippet of the file. However, I think I would need to open this file and insert a <br> tag at the end of each line in order to format it, so that when viewed in a browser, it looks "nice".
Is this the most sensible / efficient method to employ?
Thank you.
If you want the file to appear on the browser in a way that more closely matches what you see when you open it with a text editor, there are several ways.
Perhaps the easiest is to enclose the text in a <pre> block; that way the formatting is kept as-is.
<pre>
Hello,
this is
a test
</pre>
If you absolutely positively want to add a line break that will "work" with the browser, and you must use Perl, you might try something like this:
perl -n -e "print if s/\n/<br>\n/" < source.html > destination.html
This is pretty awful Perl, but it works.
You can achieve a similar result with sed:
sed -e "s/$/<br>/" < source.html > destination.html
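For comparison, here is a minimal Python sketch of the same idea as the one-liners above: append a tag to every line while leaving the line endings alone. The function name add_line_breaks and the sample text are made up for illustration; nothing here comes from the original tool.

```python
def add_line_breaks(text: str, tag: str = "<br>") -> str:
    """Append `tag` to every line of `text`, keeping the line endings."""
    if not text:
        return text
    ends_with_newline = text.endswith("\n")
    lines = text.split("\n")
    if ends_with_newline:
        lines.pop()  # drop the empty piece that follows the final newline
    tagged = []
    for line in lines:
        if line.endswith("\r"):
            # Keep a CRLF pair together: the tag goes before the "\r".
            tagged.append(line[:-1] + tag + "\r")
        else:
            tagged.append(line + tag)
    result = "\n".join(tagged)
    return result + "\n" if ends_with_newline else result


sample = "This is line number one\nThis is another line\n"
print(add_line_breaks(sample), end="")
```

Unlike the "print if s/.../" version, this also tags a last line that has no trailing newline.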
I want to write a simple Markdown to LaTeX converter and chose sed as the core component of the converter. It was suitable for everything until now, when I hit the following problem: I want to convert a Markdown code block (3 backticks) into a LaTeX listing. The problem is that I want to work on multiple lines here. I tried the following command, but it does not work since sed processes the input line by line:
sed -E 's/```([[:print:]]*)```/\\begin{lstlisting}/1\\end{lstlisting}/g'
Another idea would be to search for and replace only the three backticks, but since every other occurrence needs to be replaced with \end{lstlisting}, I do not know if that is possible. A hacky way would be to use three backticks for the start of the code block and four for the end, but that is quite a dirty solution in my opinion.
This might work for you (GNU sed):
sed -E '/^```/{:a;N;/\n```$/!ba
s/^```(.*)```$/\\begin{lstlisting}\1\\end{lstlisting}/}' file
On encountering ``` at the beginning of a line, gather up all lines until another such line, then replace the opening and closing backticks with \begin{lstlisting} and \end{lstlisting}.
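If the whole file fits in memory, the same multi-line replacement can be sketched in Python with one regular expression. This is only an illustration under a few assumptions: the fences sit at the start of a line, any text after the opening backticks (such as a language name) is dropped, and the names FENCE and fences_to_lstlisting are made up.

```python
import re

# An opening fence at the start of a line, the block body (captured lazily),
# and a closing fence on a line of its own.
FENCE = re.compile(r"^```[^\n]*\n(.*?)^```[ \t]*$", re.MULTILINE | re.DOTALL)


def fences_to_lstlisting(markdown: str) -> str:
    """Rewrite every fenced code block as a LaTeX lstlisting environment."""
    # A function is used as the replacement so that backslashes in the
    # LaTeX commands are inserted literally.
    return FENCE.sub(
        lambda m: "\\begin{lstlisting}\n" + m.group(1) + "\\end{lstlisting}",
        markdown,
    )


doc = "Some text\n```\nx = 1\ny = 2\n```\nMore text\n"
print(fences_to_lstlisting(doc), end="")
```

Because the body is matched lazily, two separate code blocks in one file are converted independently instead of being merged into one listing.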
I have a list of .tex files containing fragments that build ps pictures, which can be slow to process.
There are multiple fragments across multiple files and the end delimiter is \end{pspicture}
% this is the beginning of the fragment
\begin{pspicture}(0,0)(23,5)
\rput{0}(0,3){\crdKs}
\rput(1,3){\crdtres}
\rput(5,3){\crdAh}
\rput(6,3){\crdKh}
\rput(7,3){\crdsixh}
\rput(8,3){\crdtreh}
\rput(12,3){\crdQd}
\rput(13,3){\crdeigd}
\rput(14,3){\crdsixd}
\rput(15,3){\crdfived}
\rput(16,3){\crdtwod}
\rput(20,3){\crdKc}
\rput(21,3){\crdfourc}
\end{pspicture}
I would like to extract the fragments.
I am not sure how to go about this. Can awk or sed do this?
They seem to work line by line, rather than on the whole fragment.
I am not really looking for a solution, just a good candidate tool.
sed -En '/^\\begin\{pspicture\}.*$/,/^\\end\{pspicture\}.*$/p' file
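This uses sed with -E for extended regular expressions and -n to suppress the default output.
The /start/,/end/ address range selects every line from one matching the start expression through the next one matching the end expression, and p prints those lines.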
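The range logic can also be written out by hand, which makes it easy to collect each fragment separately instead of printing one stream. The following Python sketch assumes that the delimiters start at the beginning of a line, as in the sample; the function name extract_fragments is made up for illustration.

```python
def extract_fragments(lines, start=r"\begin{pspicture}", end=r"\end{pspicture}"):
    """Return each block from a `start` line through the next `end` line."""
    fragments = []
    current = None  # lines of the fragment being collected, or None if outside
    for line in lines:
        if current is None and line.startswith(start):
            current = []
        if current is not None:
            current.append(line)
            if line.startswith(end):
                fragments.append("".join(current))
                current = None
    return fragments


sample = [
    "% this is the beginning of the fragment\n",
    "\\begin{pspicture}(0,0)(23,5)\n",
    "\\rput{0}(0,3){\\crdKs}\n",
    "\\end{pspicture}\n",
    "Ordinary text\n",
]
for fragment in extract_fragments(sample):
    print(fragment, end="")
```

A file object can be passed directly as lines, since iterating over it yields one line at a time.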
I have a large PDF (~20 MB, 160 MB uncompressed).
I need to do a find and replace on the text in it, about 1000 times.
Here is what I tried.
1. Via SVG
   Transform to SVG (Inkscape)
   Read the SVG line by line and do the replacements in the file
   Transform back to PDF
   => bad output: probably due to some geometric transform matrix in the SVG, the text is not rendered well
2. Creating ~1000 sed commands
   Uncompress the PDF
   Perform each replacement with a sed command
   Recompress the PDF
   => way too slow: each sed command takes about 20 seconds, so the whole process takes several hours
3. Reading line by line and replacing
   Uncompress the PDF
   Read the PDF line by line
   Find the text to be replaced
   Replace it using Perl
   Write the line to a new file
   Compress the new file
   => because of the binary data streams left in the uncompressed PDF, the new file is apparently damaged (binary data written as lines of text)
I wonder if it would be possible to read the uncompressed PDF line by line, but do the editing directly in it. How could I do this?
I have searched for Perl in-place editing, but it applies the changes to the whole file at once, while I'd like to edit a single line.
Other ideas are more than welcome ;)
Following the advice, I used CAM::PDF; this was the simplest and most efficient solution.
There is no difference between 2. and 3. Sed reads the input file line by line and writes the changed lines to the output file. If you pass it the -i switch, sed still does the same thing; it simply writes the result to a new file that replaces the original under the same name. That's it. No magic involved. So if Perl damaged the content but sed did not, your Perl script must be doing something different from what sed does. The main difference is that a Perl script can be made much faster when replacing many strings. See Using sed on text files with a csv
The main trick is that you can compile a single regexp for the search and replace, which works in linear time.
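use strict;
use warnings;

# Map each search string to its replacement.
my %replace = ( foo => 'bar' );
# Build one alternation from all the (escaped) search strings.
my $re = join '|', map quotemeta, keys %replace;
$re = qr/($re)/;
while (<>) {
    s/$re/$replace{$1}/g;
    print;    # write the (possibly modified) line to standard output
}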
You can use it with your original approach, but I would recommend writing it as a Perl script, which lets you keep the regexp and the replacement hash across PDF files. You can also try combining it with CAM::PDF, which includes the example script changepagestring.pl. You can also look at PDF::API2, which would require more work but may give a better result. But remember, the PDF format is not intended for modification.
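The same single-pass idea, sketched in Python for comparison. It works on bytes rather than text, on the assumption that the uncompressed PDF still contains binary streams that must pass through unchanged. The names build_replacer and replace_all and the sample table are hypothetical.

```python
import re


def build_replacer(table):
    """Return a function that applies every replacement in `table` in one pass.

    `table` maps byte strings to search for onto their replacements.
    """
    if not table:
        return lambda data: data
    # Longest keys first, so a longer key wins over a shorter key that is
    # a prefix of it.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(b"|".join(re.escape(key) for key in keys))
    # A function is used as the replacement, so the replacement bytes are
    # inserted literally, with no backslash processing.
    return lambda data: pattern.sub(lambda m: table[m.group(0)], data)


replace_all = build_replacer({b"foo": b"bar", b"foobar": b"baz"})
print(replace_all(b"foo and foobar\x00\xff"))
```

The pattern is compiled once and can be reused for every chunk or file, which is what makes this faster than running about 1000 separate sed commands.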
You can follow the pdftk steps as described in
How to find and replace text in a existing PDF file with PDFTK (or other command line application)
You can first split the PDF into smaller documents with a few pages each, replace the text and again merge them together - all using pdftk.
There is also the PDFEdit software (http://pdfedit.cz/en/index.html). It is a GUI app with a scripting interface. You can process individual pages and then do a find replace using scripting commands. See if it loads your PDF.
I could not find this answer in the man or info pages, nor with a search here or on Google. I have a file which is, in essence, a text file, but it somehow got screwed up upon saving. (I think there are a few strange bytes at the front of the file accidentally.)
I am able to open the file, and it makes sense, using head or cat, but not using any sort of editor.
In the end, all I wish to do is open the file in emacs, delete the "messy" characters, and save it once cleaned up. The file, however, is huge, so I need something powerful like emacs to be able to open it.
Otherwise, I suppose I can try to create a script to read this in line by line, forcing the script to read it in text format, then write it. But I wanted something quick, since I won't be doing this over & over.
Thanks!
Mike
perl -i.bk -pe 's/[^[:ascii:]]//g;' file
Found this perl one liner here: http://www.perlmonks.org/?node_id=619792
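A Python sketch of the same clean-up, for illustration only: it reads the file in binary mode in fixed-size chunks, so a huge file never has to fit in memory, and drops every byte outside the ASCII range. The names strip_non_ascii and clean_stream are made up, and the in-memory streams stand in for real files opened with "rb" and "wb".

```python
import io

# Every byte value from 0x80 to 0xFF, i.e. everything that is not ASCII.
NON_ASCII = bytes(range(0x80, 0x100))


def strip_non_ascii(data: bytes) -> bytes:
    """Return `data` with all non-ASCII bytes removed."""
    return data.translate(None, delete=NON_ASCII)


def clean_stream(src, dst, chunk_size=1 << 20):
    """Copy the binary stream `src` to `dst`, dropping non-ASCII bytes."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(strip_non_ascii(chunk))


source = io.BytesIO(b"\xef\xbb\xbfplain text\n")
cleaned = io.BytesIO()
clean_stream(source, cleaned)
print(cleaned.getvalue())
```

Note that this removes all non-ASCII bytes, not only stray ones at the front of the file, so any legitimate accented characters are lost as well.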
Try M-x find-file-literally in Emacs.
You could edit the file using hexl-mode, which lets you edit the file in hexadecimal. That would let you see precisely what those offending characters are, and remove them.
It sounds like you either got a different line ending in the file (e.g. carriage returns on a *nix system) or it got saved in an unexpected encoding.
You could use strings to grab the "printable characters in file". You might have to play with the --encoding option, though I have only ever used it to grab ASCII strings from executable files.
If I have output from two sources that I want to put together on the same line, how would I do that?
In my case I have a file and a program. The file is something like this:
listOfThings=
My program outputs a list of strings on a single line. I want to have a small script that runs nightly to put these two things together on a single line. I can't figure out how to do this right, though.
example batch file
type header.txt > outputfile.txt
myProgram >> outputfile.txt
which results in this:
listOfThings=
foo bar baz etc
I really need the output file to have the list immediately follow the =, but I can't figure out how to do it with the >> operator. (and before anyone suggests it, I can't do something like put a \ on the end of the listOfThings= line, that won't work for what I'm trying to do)
You need to make sure that the contents of header.txt do not include a carriage return/linefeed pair. Look at it with a hex editor and make sure there is no 0x0d0a in it.
Have you made sure that header.txt doesn't have any line separators in it at all (i.e., the = is the very last byte of the file)?
Also, try copying header.txt to outputfile.txt in case type is appending a line feed on its own.
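If trimming header.txt by hand is not an option, the joining step can be sketched in a few lines of Python. This is only an illustration: the function name join_header_and_output is made up, and the strings stand in for the contents of header.txt and the captured output of myProgram.

```python
def join_header_and_output(header: str, output: str) -> str:
    """Put `output` directly after `header` on one line.

    Any line ending after the header is dropped, so the output starts
    right after the last visible character (the "=" in the question).
    """
    return header.rstrip("\r\n") + output.strip() + "\n"


# Stand-ins for the contents of header.txt and the output of myProgram.
header_text = "listOfThings=\r\n"
program_output = "foo bar baz etc\r\n"
print(join_header_and_output(header_text, program_output), end="")
```

In a real script the two strings would be read from the file and from the program, for example with open() and subprocess.run(..., capture_output=True, text=True).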