I have a list of 800 words in an .rtf file in this format:
word1
word2
.
.
word799
word800
Is there any way I can create a plist of Strings out of this? I really do not want to add each word one at a time.
Thanks A Lot!
What I would do is copy and paste the list of words into a plain text file. Then you can write a command-line script in the language of your choice (I'd choose Python, for example) that reads this text file and invokes the defaults utility to write your values. You can use the defaults utility like this:
$ defaults write $PLIST_FILE $PROPERTY_NAME $VALUE # sets
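If Python is available, its standard plistlib module can also write the property list directly, without shelling out to defaults. This swaps in a different technique from the defaults call above. It is a minimal sketch, assuming the words have already been saved as plain text with one word per line; the function name words_to_plist and the file names in the usage note are placeholders.

```python
import plistlib


def words_to_plist(text_path, plist_path):
    """Write the non-empty lines of text_path as a plist array of strings."""
    with open(text_path, encoding="utf-8") as src:
        # One word per line is assumed; blank lines are skipped.
        words = [line.strip() for line in src if line.strip()]
    with open(plist_path, "wb") as dst:
        # The root object of the resulting plist is an array of strings.
        plistlib.dump(words, dst)
    return words
```

Calling words_to_plist("words.txt", "words.plist") would then produce an XML plist whose root is an array containing one string per word.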
I have a large PDF (~20 MB, 160 MB uncompressed).
I need to do a find and replace in the text in it, about 1000 times.
Here is what I tried.
Via SVG
Transform to SVG (inkscape)
Read SVG line by line and do the replace in the file
Transform back to PDF
=> bad output: the text is not rendered well, probably because of some geometric transform matrix in the SVG
Creating ~1000 sed commands
Uncompress PDF
Perform each replace with a sed command
Recompress PDF
=> way too long: each sed command takes about 20 seconds, so the whole process takes several hours
Read line-by-line and replace
Uncompress PDF
Read line by line the PDF
find text to be replaced
replace using perl
write line to a new file
Compress the new file
=> because of the binary data streams left in the uncompressed PDF, the new file is apparently damaged (binary data written as lines of text)
I wonder if it would be possible to read the uncompressed PDF line by line, but do the editing directly in it. How could I do this?
I have searched for Perl in-place editing, but it applies the changes to the whole file at once, while I'd like to edit a single line.
Other ideas are more than welcome ;)
Following the advice below, I used CAM::PDF; this was the simplest and most efficient solution.
There is no real difference between your second and third approaches. sed reads the input file line by line and writes the changed lines to an output file. If you pass it the -i switch, sed simply opens the input file, unlinks it (which is what rm does), then opens an output file with the same name and writes into it. That's it; no magic involved. So if your Perl script damaged the content but sed did not, the Perl script must be doing something different from what sed does. The main difference is that a Perl script can be made much faster when replacing many strings. See Using sed on text files with a csv
The main trick is that you can compile a single regexp for the search and replace, which works in linear time.
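my %replace = ( foo => 'bar' );
# Join all search strings into one alternation, escaped so they match literally.
my $re = join '|', map quotemeta, keys %replace;
$re = qr/($re)/;
while (<>) {
    s/$re/$replace{$1}/g;
    print;    # write the (possibly modified) line to standard output
}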
You can use it with your original approach, but I would recommend putting it all in one Perl script, which lets you keep the regexp and the replacement hash across PDF files. You can also try combining it with CAM::PDF, which ships with an example script, changepagestring.pl. You can also look at PDF::API2, which would require more work but may give a better result. But remember, the PDF format is not intended for modification.
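The same single-pattern idea can be sketched in Python, working on raw bytes so that binary streams in the uncompressed PDF pass through untouched. This is only an illustration under some assumptions: the names build_replacer and replace_in_file are made up for the example, the whole file is assumed to fit in memory, and nothing here repairs PDF structure if a replacement changes the length of the data.

```python
import re


def build_replacer(table):
    """Return a function that applies every replacement in a single pass.

    table maps bytes to bytes; keys are matched literally.
    """
    if not table:
        return lambda data: data
    # Longest keys first, so a key is never shadowed by a shorter prefix.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(b"|".join(re.escape(k) for k in keys))
    return lambda data: pattern.sub(lambda m: table[m.group(0)], data)


def replace_in_file(src_path, dst_path, table):
    """Read src_path as raw bytes, replace, and write dst_path as raw bytes."""
    replace = build_replacer(table)
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(replace(data))
```

Opening both files in binary mode avoids the damage described in the question, where binary data was written back as lines of text.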
You can follow the pdftk steps as described in
How to find and replace text in a existing PDF file with PDFTK (or other command line application)
You can first split the PDF into smaller documents with a few pages each, replace the text and again merge them together - all using pdftk.
There is also the PDFEdit software (http://pdfedit.cz/en/index.html). It is a GUI app with a scripting interface. You can process individual pages and then do a find replace using scripting commands. See if it loads your PDF.
I'm using MATLAB under Windows, and trying to display (dump) the contents of a text file in the command shell. It seems like overkill to open a small file in the editor, or to load the file to use disp.
Use type and specify the explicit file name (including the extension), for instance:
type('myfile.txt')
As well as type, there's also dbtype which lets you pick a start and end range to print, and shows line numbers - handy for listing source files.
We received as input in our application (running on Windows) a list of files. These files were automatically extracted from a database with a script.
Apparently some of the names contain special characters (such as accented letters), and these characters are rendered as '©' on our side.
How can we programmatically rename these text files (around 900'000) to get rid of this character?
We can neither change the source nor re-extract the files.
The problem is that because of this character another program involved with our system does not accept the files.
Have a look at the Unix command rename. It lets you apply a Perl regex to the names of a bunch of files. In this case you might want something like:
$ rename 's/[^a-zA-Z0-9._-]//g' *
On Debian the rename command is part of the perl package. It should also be available on CPAN.
I ended up creating a new script that reads the input files and searches for special characters in their names.
It was quite easy indeed:
filename = filename.Replace("©", "e");
Since the '©' is in the filename, the script (in C#) is able to recognize it and replace the match accordingly. This way I can loop through all the folders and subfolders, simply reading each filename and changing the special characters.
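For readers who are not using C#, the loop over folders and subfolders can be sketched in Python as follows. This is not the script described above, only an illustration of the same idea; the function names clean_name and rename_tree are hypothetical, and the choice to skip a file when the target name already exists is an assumption.

```python
import os


def clean_name(name, bad="©", good="e"):
    """Return name with every occurrence of bad replaced by good."""
    return name.replace(bad, good)


def rename_tree(root, bad="©", good="e"):
    """Rename every file under root whose name contains bad.

    Returns the list of (old_path, new_path) pairs that were renamed.
    """
    renamed = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            new_name = clean_name(name, bad, good)
            if new_name == name:
                continue
            old_path = os.path.join(folder, name)
            new_path = os.path.join(folder, new_name)
            if os.path.exists(new_path):
                continue  # skip rather than overwrite an existing file
            os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed
```

With around 900'000 files, it may be worth logging the returned pairs so that any skipped or unexpected rename can be reviewed afterwards.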
Thank you all for the contributions!
I have a text file called playlist.pls which is dynamically created, and in the text file I have thousands of lines that look like this:
File000001=/home/ubu32sc/Documents/octave/pre/wavefn_0001.wav
File000002=/home/ubu32sc/Documents/octave/pre/wavefn_0002.wav
File000003=/home/ubu32sc/Documents/octave/pre/wavefn_0003.wav
File000004=/home/ubu32sc/Documents/octave/pre/wavefn_0004.wav
File000005=/home/ubu32sc/Documents/octave/pre/wavefn_0005.wav
File000006=/home/ubu32sc/Documents/octave/pre/wavefn_0006.wav
File000007=/home/ubu32sc/Documents/octave/pre/wavefn_0007.wav
File000008=/home/ubu32sc/Documents/octave/pre/wavefn_0008.wav
File000009=/home/ubu32sc/Documents/octave/pre/wavefn_0009.wav
File000010=/home/ubu32sc/Documents/octave/pre/wavefn_0010.wav etc...
I need to have the data in the text file split into several different files.
example:
The play1.pls file would contain:
File000001=/home/ubu32sc/Documents/octave/pre/wavefn_0001.wav
File000002=/home/ubu32sc/Documents/octave/pre/wavefn_0002.wav
File000003=/home/ubu32sc/Documents/octave/pre/wavefn_0003.wav
The play2.pls file would contain:
File000004=/home/ubu32sc/Documents/octave/pre/wavefn_0004.wav
File000005=/home/ubu32sc/Documents/octave/pre/wavefn_0005.wav
File000006=/home/ubu32sc/Documents/octave/pre/wavefn_0006.wav
The play3.pls file would contain:
File000007=/home/ubu32sc/Documents/octave/pre/wavefn_0007.wav
File000008=/home/ubu32sc/Documents/octave/pre/wavefn_0008.wav
File000009=/home/ubu32sc/Documents/octave/pre/wavefn_0009.wav
The play4.pls file would contain:
File000010=/home/ubu32sc/Documents/octave/pre/wavefn_0010.wav etc...
What's the best way to go about doing this? I was thinking about using Octave/MATLAB, but I think it would be overkill and resource-intensive to run a for loop over a text file with tens of thousands of lines. Is grep or Perl the proper tool, or should I use another type of program? If so, how could I do this with it?
I'm using 32-bit Ubuntu 10.04 with 6 GB of RAM.
Thanks
As you mentioned, MATLAB/Octave seems to be overkill if you just want to split a text file into multiple files.
There are a thousand ways to do this (especially on a Unix system), so just pick yours.
One possibility is to use split, which goes like this:
split --lines=3 file prefix
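By default split names its output files prefixaa, prefixab and so on, rather than play1.pls, play2.pls. If the exact names from the question matter, a short script can do the same job. The following is a rough Python sketch, not a replacement for split; the function name split_playlist and its parameters are made up for the example.

```python
import os


def split_playlist(src_path, out_dir, lines_per_file=3,
                   prefix="play", suffix=".pls"):
    """Write consecutive groups of lines to play1.pls, play2.pls, ..."""
    written = []
    out = None
    with open(src_path, encoding="utf-8") as src:
        for index, line in enumerate(src):
            if index % lines_per_file == 0:
                # Start a new output file every lines_per_file lines.
                if out is not None:
                    out.close()
                number = index // lines_per_file + 1
                path = os.path.join(out_dir, f"{prefix}{number}{suffix}")
                out = open(path, "w", encoding="utf-8")
                written.append(path)
            out.write(line)
    if out is not None:
        out.close()
    return written
```

The input is read one line at a time, so tens of thousands of lines should not be a memory concern.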
I am using a Perl script to read in a file, but I'm not sure what encoding the file is in. Basically, my file is a list of book titles, but each book has other info associated with it (author, publication date, etc.), so each book title sits within a discrete chunk of data for that book. I iterate through the file line by line until I find the regular expression '/Book Title: (.*)/' and take what's in the parentheses. Then I create a separate .txt file named after the book title. However, on my Unix server, when I look at the name of the file, it's actually not, for example, 'LordOfTheFlies.txt' but rather 'LordOfTheFlies^M.txt'
What is this '^M'? Is that a weird end of line encoding I'm not taking into account? I tried chomp but it doesn't seem to be working. What is the best file encoding for working with perl?
It's the additional carriage return character that Windows systems insert before line feed characters (M == 13th letter, hence ASCII 13 is visualised as ^M).
It has nothing to do with file encoding; it's just the line-ending convention biting you. Perl is usually good at handling line-ending characters correctly, but if they occur anywhere other than at the end of a line you have to deal with them yourself. You can use s/\r// instead of chomp() to get them out.
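The same effect is easy to reproduce outside Perl. The sketch below is Python rather than the asker's Perl, and the function name extract_titles is hypothetical; it shows that "." in the pattern matches the carriage return, so the captured title carries a trailing "\r" unless it is stripped explicitly.

```python
import re

# Same pattern as in the question; "." matches "\r" but not "\n".
TITLE = re.compile(r"Book Title: (.*)")


def extract_titles(text):
    """Return the captured titles with any trailing carriage return removed."""
    titles = []
    for line in text.split("\n"):
        match = TITLE.search(line)
        if match:
            titles.append(match.group(1).rstrip("\r"))
    return titles
```

Without the rstrip("\r") call, a title taken from a file with Windows line endings would still end in the carriage return, which is what shows up as ^M in the file name.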
Before processing the file, you need to know the encoding of the file, which is determined by the producer of the file.
That "^M" is Control-M, which is a carriage return and is not needed in Unix line endings. It looks like the file was created on Windows and transferred to Unix. It can also be left in when text files are transferred over FTP in binary mode.
Try chop instead of chomp on the captured title. chomp removes only the trailing newline, whereas chop removes the last character whatever it is. s/\r// is also good.
As for your general question, you might want to use an appropriate module for the file type you have, to make your life with Perl easier.