iPython notebook not loading: too much output

I have an iPython notebook file which is not loading, presumably because there is too much output in the file (thousands of lines of results printed, old computer).
I can edit the file with notepad without problems, but copying and then cleaning the code from there cell by cell is very time-consuming.
Is there a way to recover the code differently, or to ask iPython notebook to only load the code and not print all the past outputs when opening the file?

Here is an output removal script I found on GitHub. Credits to the author.
import sys
import io
from IPython.nbformat import current
def remove_outputs(nb):
    """Clear the outputs of every code cell in the notebook."""
    for ws in nb.worksheets:
        for cell in ws.cells:
            if cell.cell_type == 'code':
                cell.outputs = []
if __name__ == '__main__':
    fname = sys.argv[1]
    with io.open(fname, 'r') as f:
        nb = current.read(f, 'json')
    remove_outputs(nb)
    # Write the cleaned notebook to stdout; redirect it to a new .ipynb file
    print current.writes(nb, 'json')
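The IPython.nbformat.current module used above may not be available on newer installations. Because an .ipynb file is plain JSON, the same idea can be sketched with only the standard library. This is a minimal sketch for Python 3, not the GitHub author's script. It assumes the usual notebook layout (cells at the top level in nbformat 4, or nested inside worksheets in nbformat 3), and the names strip_outputs and strip_file are made up for illustration.

```python
import json


def strip_outputs(nb):
    """Remove the outputs of every code cell in a notebook loaded as a dict."""
    # Assumption: nbformat 4 keeps cells at the top level,
    # while nbformat 3 nests them inside "worksheets".
    cells = list(nb.get("cells", []))
    for worksheet in nb.get("worksheets", []):
        cells.extend(worksheet.get("cells", []))
    for cell in cells:
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            # Reset the execution counter if this format stores one.
            if "execution_count" in cell:
                cell["execution_count"] = None
    return nb


def strip_file(src, dst):
    """Read the notebook at src and write an output-free copy to dst."""
    with open(src, "r", encoding="utf-8") as f:
        nb = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(strip_outputs(nb), f, indent=1)
```

Calling strip_file("big.ipynb", "clean.ipynb") (both file names are placeholders) writes a copy without outputs and leaves the original file untouched, so nothing is lost if the layout assumption turns out to be wrong.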

Related

Problems importing data in Stata using "file" and a local macro

I'm attempting to import data from a number of log files. Here's how I'm approaching it:
I have the log files in a separate directory and I generate a txt file with the list of filenames in that directory. I then read through that txt file and use the filenames in a loop with an import command to bring the data in. This process worked when I last ran the do file about 6 months ago, and now it's not working.
The basics of the code look like this:
cd "filepath\logs\"
! dir cpuusage*.log /a-d /b /o:-d >"filepath\filelist.txt"
file open myfile using "filepath\filelist.txt",read
file read myfile line
import delimited using `line', delim(" ") varnames(nonames)
The result of that import command is (0 vars, 0 obs) despite the fact that filelist.txt has a list of 14 filenames.
I'm a novice so I'm really hoping there's something simple and obvious that I'm overlooking. I still don't understand why this exact method worked six months ago... Any thoughts?
I think you can use fs for this (it is a user-written command; install it with ssc install fs):
cd "filepath\logs\"
fs cpuusage*.log
foreach f in `r(files)' {
    import delimited using `f', delim(" ") varnames(nonames)
}
It's hard to help more without knowing what your log files look like. I would suggest trying to import one using the menu to get the syntax right.

ipython notebook and patsy categorical variable (formula)

I had the same error as in this question.
What is weird is that it works (with the answer provided) in an ipython shell, but not in an ipython notebook. It is related to the C() operator, because without it the formula works (just not with Region wrapped in the operator).
The same happens with this example:
import statsmodels.formula.api as smf
import numpy as np
import pandas
url = "http://vincentarelbundock.github.com/Rdatasets/csv/HistData/Guerry.csv"
df = pandas.read_csv(url)
df = df[['Lottery', 'Literacy', 'Wealth', 'Region']].dropna()
df.head()
mod = smf.ols(formula='Lottery ~ Literacy + Wealth + Region', data=df)
res = mod.fit()
print res.summary()
This works well, both in the ipython notebook and in the shell, and patsy treats Region as a categorical variable because it is composed of strings.
But if I try this (as in the tutorial):
res = smf.ols(formula='Lottery ~ Literacy + Wealth + C(Region)', data=df).fit()
I got an error in the ipython notebook:
TypeError: 'Series' object is not callable
Note that statsmodels and patsy are the same versions in both the notebook and the shell (0.5.0 and 0.3.0 respectively).
Do you have the same error?
I eventually found the problem.
It was caused by a variable called C that I had defined much earlier in the notebook. What is surprising, though, is that it was not a column of the df I used.
Anyway, the basic solution is to run:
del C
before running the regression.
Hope this will help people facing the same problem.
But I'm still not sure whether this is expected behavior in patsy.
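The shadowing described in this answer can be illustrated with a toy model. This is only a sketch, written for Python 3, under the assumption (consistent with the del C fix) that names found in the user's namespace take precedence over the formula builtins. It is not patsy's actual implementation, and evaluate_term and FORMULA_BUILTINS are hypothetical names.

```python
def C(values):
    """Stand-in for a formula builtin that marks data as categorical."""
    return ("categorical", list(values))


# Hypothetical table of names that the formula language itself provides.
FORMULA_BUILTINS = {"C": C}


def evaluate_term(term, data, user_namespace):
    """Evaluate one formula term with a layered name lookup."""
    # Lowest priority first: later updates override earlier ones,
    # so user variables and data columns can hide the builtins.
    lookup = dict(FORMULA_BUILTINS)
    lookup.update(user_namespace)
    lookup.update(data)
    return eval(term, {"__builtins__": {}}, lookup)


data = {"Region": ["E", "W", "E"]}

# With a clean namespace, C resolves to the builtin.
clean_result = evaluate_term("C(Region)", data, {})

# With a leftover variable named C, the lookup finds it first and the call fails.
try:
    evaluate_term("C(Region)", data, {"C": [1, 2, 3]})
    shadowing_error = None
except TypeError as exc:
    shadowing_error = exc
```

In this toy model the second call raises a TypeError because the leftover list is found before the builtin, which mirrors why deleting the stray variable makes the formula work again.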

Separate and process odd-numbered text files from even-numbered text files in Python

I am new to programming and Python and am trying to write a program to process astronomical data. I have a huge list of files with names like ww_12m_no0021.spc, ww_12m_no0022.spc and so on. I want to move all the odd-numbered files and even-numbered files into two separate folders.
import shutil
import os
for file in os.listdir("/Users/asifrasha/Desktop/python_test/input"):
    if os.path.splitext(file)[1] == ".spc":
        print file
        shutil.copy(file, os.path.join("/Users/asifrasha/Desktop/python_test/output",file))
which is actually copying all the spc files to a different folder. I am struggling a bit with how to copy only the odd-numbered files (no0021, no0023…) to a separate folder. Any help or suggestions will be much appreciated!
import os
import shutil
# Modify these to suit your needs
input_dir = "/Users/asifrasha/Desktop/python_test/input"
odd_dir = "/Users/asifrasha/Desktop/python_test/output/odd"
even_dir = "/Users/asifrasha/Desktop/python_test/output/even"
for filename in os.listdir(input_dir):
    basename, extension = os.path.splitext(filename)
    if extension == ".spc":
        num = basename[-4:]  # Get the numbers (i.e. the last 4 characters)
        num = int(num, 10)  # Convert to int (base 10)
        if num % 2:  # Odd
            dest_dir = odd_dir
        else:  # Even
            dest_dir = even_dir
        dest = os.path.join(dest_dir, filename)
        # Join with input_dir so the copy works from any working directory
        shutil.copy(os.path.join(input_dir, filename), dest)
Obviously you can simplify it a bit; I'm just trying to be as clear as possible.
Assuming your files are named ww_12m_no followed by the number:
if int(os.path.splitext(file)[0][9:]) % 2 == 1:
    # file is oddly numbered, go ahead and copy...
If the length of the first half of the name changes, I would use regex... I didn't test the code, but that's the gist of it. I'm not sure this question belongs here though...
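The regex idea mentioned above could look roughly like this. It is a sketch for Python 3 that assumes the file number is the run of digits directly before the .spc extension; the helper names file_number and is_odd_numbered are invented for illustration.

```python
import re

# Assumed naming scheme: any prefix, then digits, then ".spc".
NUMBER_PATTERN = re.compile(r"(\d+)\.spc$")


def file_number(filename):
    """Return the trailing number of an .spc filename, or None if absent."""
    match = NUMBER_PATTERN.search(filename)
    return int(match.group(1)) if match else None


def is_odd_numbered(filename):
    """True only for .spc files whose trailing number is odd."""
    number = file_number(filename)
    return number is not None and number % 2 == 1
```

Because the pattern is anchored at the end of the name, it keeps working if the prefix before the digits changes length, which is the case where fixed slicing breaks.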

opening a batch file that opens a text file in python

I am writing a script that can execute a batch file, which needs to open a file in the same folder first. My current code is:
from subprocess import Popen
p = Popen("Mad8dl.bat <RUNTHISTO.txt>", cwd=r"C:\...\test")
stdout, stderr = p.communicate()
where the ... is just the path to the folder. However, every time I run it I get the syntax error:
The syntax of the command is incorrect
Any help regarding the syntax would be greatly appreciated.
First, you should probably remove the < and > angle brackets from your code; just pass the filename, without any brackets, to your batch file. (Unless your filename really does contain < and > characters, in which case I really want to know how you managed it, since those characters are forbidden in filenames on Windows.)
Second, your code should look like:
from subprocess import Popen, PIPE
p = Popen(["Mad8dl.bat", "RUNTHISTO.txt"], cwd=r"C:\...\test", stdout=PIPE, stderr=PIPE)
stdout, stderr = p.communicate()
Note the list containing the components of the call, rather than a single string. Also note that you need to specify stdout=PIPE and stderr=PIPE in your Popen() call if you want to use communicate() later on.
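To see the list form and the PIPE arguments in action without the batch file, here is a small sketch for Python 3. The child process is a stand-in (the Python interpreter echoing its first argument), not Mad8dl.bat, and run_and_capture is a hypothetical helper name.

```python
import subprocess
import sys


def run_and_capture(args, cwd=None):
    """Run a command given as a list of arguments and capture its output."""
    proc = subprocess.Popen(
        args,
        cwd=cwd,
        stdout=subprocess.PIPE,  # without PIPE, communicate() returns None here
        stderr=subprocess.PIPE,
    )
    stdout, stderr = proc.communicate()
    return proc.returncode, stdout, stderr


# Stand-in child process: each list element is passed as one argument,
# so no shell quoting or angle brackets are involved.
child = [sys.executable, "-c", "import sys; print(sys.argv[1])", "RUNTHISTO.txt"]
code, out, err = run_and_capture(child)
```

The captured output comes back as bytes; decode it if text is needed.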

Spotify Tech Puzzle - stdin in Python

I'm trying to solve the bilateral problem on Spotify's Tech Puzzles (http://www.spotify.com/us/jobs/tech/bilateral-projects/). I have something working on my computer that reads input from a file, input.txt, and writes to output.txt. My problem is that I cannot figure out how to make my code work when I submit it, where it must read from stdin. I have looked at several other posts and I don't see anything that makes sense to me. I see some people just use raw_input, but doesn't that produce a user prompt? I'm not sure what to do.
Here is the portion of my code that is supposed to read the input and write the output. Any suggestions on how this might need to be changed? Also, how would I test the code once it is changed to read from stdin? How can I put test data in stdin? The error I get back from Spotify says Run Time Error - NameError.
import sys
# Read input
Input = []
for line in sys.stdin.readlines():
    if len(line) <9:
        teamCount = int(line)
    if len(line) > 8:
        subList = []
        a = line[0:4]
        b = line[5:9]
        subList.append(a)
        subList.append(b)
        Input.append(subList)
##### algorithm here
#write output
print listLength
for empWin in win:
    print empWin
You are actually doing ok.
for line in sys.stdin.readlines():
will read lines from stdin. It can however be shortened to:
for line in sys.stdin:
I don't use Windows, but to test your solution from a command line, you should run it like this:
python bilateral.py < input.txt > output.txt
If I run your code above like that, I see the error message
Traceback (most recent call last):
  File "bilateral.py", line 20, in <module>
    print listLength
NameError: name 'listLength' is not defined
which, by coincidence (I guess this is not exactly what you submitted), is the same error the Spotify puzzle checker reported. You have probably just misspelled a variable somewhere.
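Besides shell redirection, test data can also be fed in from within Python by passing any file-like object to the parsing code. The following is a sketch for Python 3 that assumes each input line holds either a single count or two whitespace-separated IDs; read_pairs is a hypothetical name, and the sample IDs are made up.

```python
import io
import sys


def read_pairs(stream):
    """Parse a count line and the two-ID lines from any iterable of lines."""
    team_count = None
    pairs = []
    for line in stream:
        fields = line.split()
        if len(fields) == 1:
            # A lone number is taken to be the team count.
            team_count = int(fields[0])
        elif len(fields) == 2:
            pairs.append(fields)
    return team_count, pairs


# For testing, an in-memory stream stands in for stdin.
sample = io.StringIO("2\n1009 2011\n1017 2011\n")
count, pairs = read_pairs(sample)
```

In the submitted program the same function would be called as read_pairs(sys.stdin), so the parsing logic is identical in both settings.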