After playing with images and plots for a while in the IPython notebook, I noticed that %history -g prints a huge number of output lines, most of which appear to be base64.
The file history.sqlite is 1.2 MB in size.
What is being logged in history.sqlite?
History is supposed to log the commands that I typed. All in all, I have not typed more than 1000 lines. What is the base64 stuff doing there? Please don't tell me it is logging the images/pictures/plots that I generate. That would be insane.
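One way to check what is taking up the space, rather than guess, is to open history.sqlite directly with Python's sqlite3 module. The sketch below makes no assumption about IPython's table layout: it lists whatever tables the file contains, with the row count and the total number of bytes stored in each. The function name table_sizes is illustrative only, and the path you pass in depends on where your profile keeps its history file.

```python
import sqlite3


def table_sizes(db_path):
    """Return {table_name: (row_count, total_bytes)} for every table in the file."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        sizes = {}
        for table in tables:
            # Column names come from the database itself, not from any assumed schema.
            cols = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
            # LENGTH of a BLOB is a byte count; NULL values count as 0 bytes.
            expr = " + ".join(
                f'COALESCE(LENGTH(CAST("{col}" AS BLOB)), 0)' for col in cols)
            count, total = conn.execute(
                f'SELECT COUNT(*), COALESCE(SUM({expr}), 0) FROM "{table}"'
            ).fetchone()
            sizes[table] = (count, total)
        return sizes
    finally:
        conn.close()
```

Calling table_sizes on your history.sqlite and comparing the byte totals per table should show whether the typed input or something else accounts for most of the 1.2 MB.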
In a Perl script, I am trying to convert SVG files to PDF. This works well by simply calling Inkscape:
system "inkscape -D -z --file=$in --export-pdf=$out";
But it is enormously slow even for small 100 KB files. It can take minutes per file, which causes the script to fail when it runs under a timeout constraint, e.g. on a web server.
To speed things up, I have read about svg2pdf as a standalone tool, but I never found a binary for Win7 or managed to compile it, even with the libcairo DLLs present.
My last idea now is to use the CPAN module Cairo. I had hoped it could convert an SVG file to PDF, but in the documentation I only find drawings and surfaces, and no method to write or convert.
Does anyone have experience with that?
Making my comment an answer: You could try rsvg-convert which is part of the librsvg library. It's probably faster than Inkscape but it's still an external command.
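A minimal sketch of that suggestion, written in Python for illustration (the same argument list could be passed to Perl's system). It assumes rsvg-convert is installed and on the PATH; the function names and the 30-second default timeout are hypothetical choices, not part of the original question.

```python
import subprocess


def rsvg_command(svg_path, pdf_path):
    """Build the argument list: -f selects the output format, -o the output file."""
    return ["rsvg-convert", "-f", "pdf", "-o", pdf_path, svg_path]


def svg_to_pdf(svg_path, pdf_path, timeout_s=30):
    """Return True if the conversion finished successfully within timeout_s seconds."""
    try:
        result = subprocess.run(
            rsvg_command(svg_path, pdf_path),
            timeout=timeout_s,
            capture_output=True,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # Either the converter ran too long or the binary is not installed.
        return False
    return result.returncode == 0
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces, and the explicit timeout lets the caller fail cleanly instead of being killed by the web server.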
I hope you are all well.
So my question is about the procedure to open multiple raw data files that are compressed.
My file names are ordered, so I have, for example: o_equities_20080528.tas.zip o_equities_20080529.tas.zip o_equities_20080530.tas.zip ...
Thank you all in advance.
How much work this will be depends on whether:
You have enough space to extract all the files simultaneously into one folder
You need to be able to keep track of which file each record has come from (i.e. you can't tell just from looking at a particular record).
If you have enough space to extract everything and you don't need to track which records came from which file, then the simplest option is to use a wildcard infile statement, allowing you to import the records from all of your files in one data step:
infile "c:\yourdir\o_equities_*.tas" <other infile options as per individual files>;
This syntax works regardless of OS - it's a SAS feature, not shell expansion.
If you have enough space to extract everything in advance but you need to keep track of which records came from each file, then please refer to this page for an example of how to do this using the filevar option on the infile statement:
http://www.ats.ucla.edu/stat/sas/faq/multi_file_read.htm
If you don't have enough space to extract everything in advance, but you have access to 7-zip or another archive utility, and you don't need to keep track of which records came from each file, you can use a pipe filename and extract to standard output. If you're on a Linux platform then this is very simple, as you can take advantage of shell expansion:
filename cmd pipe "nice -n 19 gunzip -c /yourdir/o_equities_*.tas.zip";
infile cmd <other infile options as per individual files>;
On Windows it's the same sort of idea, but as you can't use shell expansion, you have to construct a separate filename for each zip file, or use some of 7-Zip's more arcane command-line options, e.g.:
filename cmd pipe "7z.exe e -an -ai!C:\yourdir\o_equities_*.tas.zip -so -y";
This will extract all files from all of the matching archives to standard output. You can narrow this down further via the 7-zip command if necessary. You will have multiple header lines mixed in with the data - you can use findstr to filter these out in the pipe before SAS sees them, or you can just choose to tolerate the odd error message here and there.
Here, the -an tells 7-zip not to read the zip file name from the command line, and the -ai tells it to expand the wildcard.
If you need to keep track of what came from where and you can't extract everything at once, your best bet (as far as I know) is to write a macro that processes one file at a time using the above techniques and adds this information while importing each dataset.
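The one-archive-at-a-time idea can be sketched outside SAS as well. The following is a Python illustration, not a SAS macro: it opens each matching zip in turn, streams its members without extracting them to disk, and tags every record with the archive it came from. The function name iter_records and the UTF-8 decoding are assumptions made for the example.

```python
import glob
import zipfile


def iter_records(pattern):
    """Yield (archive_path, member_name, line) for each line in each matching zip."""
    for archive in sorted(glob.glob(pattern)):
        with zipfile.ZipFile(archive) as zf:
            for member in zf.namelist():
                if member.endswith("/"):
                    continue  # skip directory entries
                # Stream the member straight from the archive, one line at a time.
                with zf.open(member) as fh:
                    for raw in fh:
                        line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
                        yield archive, member, line
```

Because it is a generator, only one line is held in memory at a time, and the archive name travels with every record, which is the information the macro approach would have to add during import.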
=== BACKGROUND ===
Some time ago I ripped a lot of music from an internet radio station. Unfortunately something seems to have gone wrong: the length of most files is displayed as several hours, although playback started at the correct position.
Example: If a file is really 3 minutes long and it would be displayed as 3 hours, playback would start at 2 hours and 57 minutes.
Before I upgraded my system, gstreamer was in an older version and its behaviour would be as described above, so I didn't pay too much attention. Now I have a new version of gstreamer which cannot handle these files correctly: It "plays" the whole initial offset.
=== /BACKGROUND ===
So here is my question: How can I modify an OGG/Vorbis file to get rid of a useless initial offset? Although I tried several tag-editing programs, none of them would let me edit these values. (Interestingly enough, easytag shows me both times, but writes the wrong one...)
I finally found a solution! Although it wasn't quite what I expected...
After trying several other options I ended up with the following code:
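#!/bin/sh
# Usage: pass the music directory as the first argument
cd "${1}" || exit 1
OUTDIR="../$(basename "${1}").new"
# Read one path per line; IFS= and -r keep spaces and backslashes intact
find . -name '*.ogg' | while IFS= read -r filepath
do
    # Create destination directory
    mkdir -p "${OUTDIR}/$(dirname "${filepath}")"
    # Convert OGG to OGG; stdin is redirected so avconv cannot consume the file list
    avconv -i "${filepath}" -f ogg -acodec libvorbis -vn "${OUTDIR}/${filepath}" < /dev/null
    # Copy tags
    vorbiscomment -el "${filepath}" | vorbiscomment -ew "${OUTDIR}/${filepath}"
done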
This code recursively reencodes all OGG files and then copies all vorbis comments. It's not a very efficient solution, but it works nevertheless...
What the problem was: I guess it has something to do with the output of ogginfo:
...
New logical stream (#1, serial: 74a4ca90): type vorbis
WARNING: Vorbis stream 1 does not have headers correctly framed. Terminal header page contains additional packets or has non-zero granulepos
Vorbis headers parsed for stream 1, information follows...
Version: 0
Vendor: Xiph.Org libVorbis I 20101101 (Schaufenugget)
...
This warning disappears after re-encoding the file.
At the rate at which I'm currently encoding it will probably take several hours until my whole media library will be completely reencoded... but at least I verified with several samples that it works :)
I would like to view my CSV files in a column-aligned format from the command line, with something like less, but my CSV files are sometimes gigabytes big, and I'm using a little computer (Netbook, 1GB RAM, 8GB HD, 1GHz processor), so I don't want to waste a lot of memory or processing power viewing the file.
I mention that I'd like to use something like less because I would like to be able to navigate around within the file.
cat FILE | column -s, -t | less is one thought, but cat is still going to try to print the whole file and I'm not sure how much buffering the pipes will use (if any) or what sort of caching less employs.
This question is similar to this other question, but I'm specifically interested in viewing large files using minimal resources preferably already on the machine. I don't presently use VI or EMACS, and think they'd both be overkill here. VI, for instance, would be a 27MB install for a utility acting merely as a viewer.
First of all, less can open oversized files. Second, both vim (which I use with the LargeFile plugin on files over 8 GB) and emacs can do it.
But most of the time, viewing a big file in an 80x40 (or slightly larger) terminal is useless, so you should filter it with something like (f)grep or process it with awk. If you only want the start or the end, there are head and tail.
HTH
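To illustrate the filter-first idea, here is a small Python sketch that reads only a window of rows from a CSV file and aligns just those rows, so memory use is bounded by the window rather than by the file. The function name aligned_window and its defaults are hypothetical. Rows before the window are still read and parsed, but they are discarded immediately.

```python
import csv
import itertools


def aligned_window(path, start=0, count=40, delimiter=","):
    """Return `count` rows starting at row `start`, padded into aligned columns."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter=delimiter)
        # islice skips earlier rows lazily; only the window is kept in memory.
        rows = list(itertools.islice(reader, start, start + count))
    if not rows:
        return []
    ncols = max(len(row) for row in rows)
    # Column widths are computed from the window only, not from the whole file.
    widths = [max(len(row[i]) for row in rows if i < len(row)) for i in range(ncols)]
    return [
        "  ".join(cell.ljust(widths[i]) for i, cell in enumerate(row)).rstrip()
        for row in rows
    ]
```

Because widths come from the visible window only, columns may shift between windows; that is the price of not scanning the entire file.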
Check the tail and head commands.
Or even better, download the Vim source and compile it. That should be easy enough. The version 5.8 source is 1 MB before decompressing (4 MB after). Enjoy.
I know you can use the file test operator -B to test if a file is binary, but how does Perl implement this internally?
From perldoc -f -B:
The -T and -B switches work as follows. The first block or so of the file is examined for odd characters such as strange control codes or characters with the high bit set. If too many strange characters (>30%) are found, it’s a -B file; otherwise it’s a -T file. Also, any file containing null in the first block is considered a binary file.
If -T or -B is used on a filehandle, the current IO buffer is examined rather than the first block. Both -T and -B return true on a null file, or a file at EOF when testing a filehandle.
Because you have to read a file to do the -T test, on most occasions you want to use a -f against the file first, as in "next unless -f $file && -T $file".
According to Chapter 11 of the book Learning Perl:
The answer is **Perl cheats**: it opens the file, looks at the first few thousand bytes, and makes an educated guess. If it sees a lot of null bytes, unusual control characters, and bytes with the high bit set, then that looks like a binary file. If there’s not much weird stuff, then it looks like text. It sometimes guesses wrong. If a text file has a lot of Swedish or French words (which may have characters represented with the high bit set, as some ISO-8859-something variant, or perhaps even a Unicode version), it may fool Perl into declaring it binary. So it’s not perfect, but if you need to separate your source code from compiled files, or HTML files from PNGs, these tests should do the trick.
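The heuristic described in the two quotes can be approximated in a few lines. This Python sketch follows only the quoted description, not Perl's actual source: the 512-byte block size and the exact set of bytes treated as ordinary text are assumptions, and the function names are hypothetical.

```python
# Bytes treated as ordinary text: printable ASCII plus common whitespace and
# a few control codes. The exact set is an assumption made for this sketch.
TEXT_BYTES = set(range(0x20, 0x7F)) | {0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}


def looks_binary(block: bytes, threshold: float = 0.30) -> bool:
    """Guess whether a block of bytes is binary, in the spirit of -B."""
    if not block:
        return True  # the quoted perldoc says -B is true on an empty file
    if b"\x00" in block:
        return True  # any null byte in the block means binary
    odd = sum(1 for byte in block if byte not in TEXT_BYTES)
    return odd / len(block) > threshold


def file_looks_binary(path, block_size=512):
    """Apply the heuristic to the first block of a file."""
    with open(path, "rb") as fh:
        return looks_binary(fh.read(block_size))
```

As the Learning Perl passage warns, text in an ISO-8859 encoding with many accented characters can cross the threshold and be misclassified, since every such character has the high bit set.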