ipython notebook and patsy categorical variable (formula) - ipython

I had the same error as in this question.
Strangely, the answer provided there works in an IPython shell but not in an IPython notebook. The problem seems tied to the C() operator, because the formula works without it.
The same happens with this example:
import statsmodels.formula.api as smf
import numpy as np
import pandas
url = "http://vincentarelbundock.github.com/Rdatasets/csv/HistData/Guerry.csv"
df = pandas.read_csv(url)
df = df[['Lottery', 'Literacy', 'Wealth', 'Region']].dropna()
df.head()
mod = smf.ols(formula='Lottery ~ Literacy + Wealth + Region', data=df)
res = mod.fit()
print(res.summary())
This works well, both in the IPython notebook and in the shell, and patsy treats Region as a categorical variable because it consists of strings.
But if I try this (as in the tutorial):
res = smf.ols(formula='Lottery ~ Literacy + Wealth + C(Region)', data=df).fit()
I get an error in the IPython notebook:
TypeError: 'Series' object is not callable
Note that statsmodels and patsy are the same versions in the notebook and in the shell (0.5.0 and 0.3.0 respectively).
Do you get the same error?

I eventually found the problem.
It was caused by a variable called C that I had defined much earlier in the notebook. Surprisingly, it was not even a column of the df I used.
Anyway, the basic solution is to run
del C
before running the regression.
I hope this helps people facing the same problem.
I'm still not sure whether this is expected behavior for patsy.
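A possible explanation, sketched as a toy model rather than as patsy's actual code: if names in a formula are looked up in the data first, then in the caller's variables, and only last among the built-in helpers such as C(), then any leftover variable called C hides the helper. The names `categorical`, `FORMULA_BUILTINS` and `evaluate_term` below are hypothetical and exist only for this illustration.

```python
from collections import ChainMap


def categorical(values):
    """Hypothetical stand-in for patsy's C(): just returns the distinct levels."""
    return sorted(set(values))


# Assumption: helpers that a formula can use without importing anything.
FORMULA_BUILTINS = {"C": categorical}


def evaluate_term(term, data, user_namespace):
    """Evaluate one formula term with the assumed lookup order:
    data columns first, then the caller's variables, then the built-in helpers."""
    scope = ChainMap(data, user_namespace, FORMULA_BUILTINS)
    return eval(term, {"__builtins__": {}}, scope)


data = {"Region": ["E", "W", "E", "N"]}

# Clean namespace: C resolves to the built-in helper.
print(evaluate_term("C(Region)", data, {}))

# A leftover notebook variable named C shadows the helper.
leftover = {"C": [1.0, 2.0, 3.0]}
try:
    evaluate_term("C(Region)", data, leftover)
except TypeError as exc:
    print("TypeError:", exc)
```

Under that assumption, `del C` works because it removes the shadowing variable from the notebook's namespace, so the lookup falls through to the helper again.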

Unicode to Latin in Teradata Conversion

I have been trying to convert Unicode strings to Latin in Teradata version 16.20.32.23. I have seen many online forums but I was not able to formulate a solution. Following are some of the strings that I was unable to convert:
hyödyt
löydät
I have tried the following solution, but the function Translate_Chk does not seem to work.
SELECT CASE WHEN Translate_Chk ( 'hyödyt' using UNICODE_TO_LATIN) <> 0
THEN
''
WHEN Translate_Chk ( 'hyödyt' using UNICODE_TO_LATIN ) = 0
THEN
Translate ( 'hyödyt' using UNICODE_TO_LATIN WITH ERROR)
END AS transalated
The error I receive is SELECT FAILED. 6706: The string contains untranslatable character.
I think I have reached a dead end, could anyone help me here?
I'm not familiar with Teradata, but the strings you have are double-mis-decoded as Windows-1252, which is a variation of ISO-8859-1 a.k.a latin1. Example to fix in Python:
>>> s='hyÃƒÂ¶dyt'
>>> s.encode('cp1252').decode('utf8').encode('cp1252').decode('utf8')
'hyödyt'
>>> s='lÃƒÂ¶ydÃƒÂ¤t'
>>> s.encode('cp1252').decode('utf8').encode('cp1252').decode('utf8')
'löydät'
So not a Teradata solution, but should help you figure it out.
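The same round trip can be wrapped in a small helper that stops by itself once the text is clean. This is only a sketch, assuming the damage is always "UTF-8 bytes read as Windows-1252", possibly applied more than once; `fix_mojibake` and `max_rounds` are names made up for this example.

```python
def fix_mojibake(text, max_rounds=3):
    """Undo repeated 'UTF-8 bytes decoded as Windows-1252' damage.

    Each round re-encodes the text as cp1252 and decodes the bytes as UTF-8.
    The loop stops as soon as a round fails or changes nothing, which is
    taken as the sign that the text is already clean.
    """
    for _ in range(max_rounds):
        try:
            repaired = text.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # not valid mojibake any more, keep what we have
        if repaired == text:
            break  # pure ASCII or otherwise unchanged
        text = repaired
    return text


# fix_mojibake("hyÃƒÂ¶dyt") returns "hyödyt"
# fix_mojibake("lÃƒÂ¶ydÃƒÂ¤t") returns "löydät"
```

A caveat of this approach: a clean string that happens to look like mojibake would be "repaired" too, so it is best applied only to columns known to be damaged.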
Following is the Python code I used; it might help someone. To use it, follow these instructions:
Download the Chilkat package that matches your Python release:
https://www.chilkatsoft.com/python.asp#winDownloads
Follow the installation guidelines at the URL below:
https://www.chilkatsoft.com/installWinPython.asp
Open an IDLE shell and run the following code:
import sys
import chilkat
charset = chilkat.CkCharset()
charset.put_FromCharset("utf-8")
charset.put_ToCharset("ANSI")
charset.put_ToCharset("Windows-1252")
success = charset.ConvertFile("source_file_name.csv","target_file_name.csv")
if not success:
    print(charset.lastErrorText())
    sys.exit()
print("Success.")
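If installing Chilkat is not an option, a similar conversion can probably be done with the Python standard library alone. The sketch below assumes the source file really is UTF-8 and that characters with no Windows-1252 equivalent may be replaced by `?`; `convert_file` is a hypothetical helper name, and the file names in the usage note are the placeholders from the code above.

```python
def convert_file(source_path, target_path,
                 source_encoding="utf-8", target_encoding="cp1252"):
    """Re-encode a text file line by line.

    newline="" keeps the original line endings untouched, and
    errors="replace" writes '?' for characters the target encoding lacks.
    """
    with open(source_path, "r", encoding=source_encoding, newline="") as src, \
         open(target_path, "w", encoding=target_encoding,
              errors="replace", newline="") as dst:
        for line in src:
            dst.write(line)
```

Usage would be `convert_file("source_file_name.csv", "target_file_name.csv")`. Switching `errors` to `"strict"` makes the call raise instead of silently replacing characters, which may be preferable when data loss must be noticed.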

Why is Jupyter using a column's values to populate column names?

I'm using an SPSS .sav file that has typical column names like name, type, width, and so forth. The 'names' column labels the rows m1, I1, I2, etc.
Here's the Jupyter notebook:
https://imgur.com/9hXuL7u
import pandas as pd
df = pd.read_spss('./Data.sav')
df.head()
As you can see, the column names are the entries for 'Name':
https://imgur.com/ZVMS0F0
I.e., rather than 'name', 'type', 'width' as column names, there are the values for 'name': m1, I1, I2, etc.
I'm quite new to Jupyter and SPSS and have no idea where to start.
EDIT:
Following Rahul Singh's suggestions, I've added header=None, though read_spss() doesn't seem to recognize the argument.
import pandas as pd
df = pd.read_spss('./Data.sav',header=None)
df.head()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-39-77d006c914c9> in <module>
1 import pandas as pd
----> 2 df = pd.read_spss('./Data_ANQAR_Wave39.sav',header=None)
3 df.head()
TypeError: read_spss() got an unexpected keyword argument 'header'
Actually, this issue is not with Jupyter but with pandas (and we should not really call it an issue :) ).
Usually, when you read data from a file (.csv, .txt, etc.) that has no header row (column names), pandas will automatically take the first row as the header.
To get around this, you can pass `header=None`.
Code:
import savReaderWriter
import numpy as np
import pandas as pd
# Convert .sav file into .csv
reader_np = savReaderWriter.SavReaderNp("Data.sav")
array = reader_np.to_structured_array("outfile.dat")
np.savetxt("Data.csv", array, delimiter=",")
reader_np.close()
# Read .csv file without header
df = pd.read_csv("Data.csv", header=None)
df.head()
What you are looking at in SPSS is not the data (data editor) but the metadata (variable view), which shows characteristics of your columns rather than the data itself. Pandas is reading the data correctly; in SPSS, switch to the Data Editor to see what I mean.

How to execute multi-line Maxima code from Octave / Matlab

I can execute Maxima code from Octave like this and it works:
mm=maxima("diff(a*x^3-b*x^2+x+d,x,1)")
But how can I execute multi-line commands?
Example code below that works in Maxima:
kill(all)$
numer:true$
ratprint:false$
angle_in_bits:3779$
total_fs:18136$
s:solve(angle_deg=(angle_in_bits/total_fs*360),angle_deg)$
round(s);
[round(angle_deg)=75]
When I try the code below in Octave, I get syntax errors:
mm=maxima("kill(all)$
numer:true$
ratprint:false$
angle_in_bits:3779$
total_fs:18136$
s:solve(angle_deg=(angle_in_bits/total_fs*360),angle_deg)$
round(s);")
Errors that I get:
>>> mm=maxima("kill(all)$
numer:true$
ratprint:false$
angle_in_bits:3779$
total_fs:18136$
s:solve(angle_deg=(angle_in_bits/total_fs*360),angle_deg)$
round(s);")
error: unterminated character string constant
parse error:
syntax error
>>> mm=maxima("kill(all)$
^
>>> _ide_reload_variables_list( whos() );
error: 'numer' undefined near line 1 column 1
error: invalid base value in colon expression
error: 'ratprint' undefined near line 1 column 1
error: invalid base value in colon expression
parse error:
syntax error
>>> angle_in_bits:3779$
^
parse error:
syntax error
>>> total_fs:18136$
^
parse error:
syntax error
>>> s:solve(angle_deg=(angle_in_bits/total_fs*360),angle_deg)$
^
error: unterminated character string constant
parse error:
syntax error
>>> round(s);")
^
Thanks to Fred Senese and rayryeng for the assist.
I know someone may need this, so here's some example code. It lets you access Maxima's symbolic solver directly from Octave by executing several Maxima commands in a single call. Since Octave doesn't have a good symbolic solver yet, this should come in handy for someone down the line.
mm=maxima("(kill(all), numer:true, ratprint:false, angle_in_bits:3779, total_fs:18136, s:solve(angle_deg=(angle_in_bits/total_fs*360),angle_deg),(s))")
%mm = '[angle_deg = 75.01323334803705]';
[si ei xt mt] = regexp(mm, '(\d)*(\.)?(\d)*');
number = str2num(mt{1})
>>>number = 75.013
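The trick in the call above is that Maxima evaluates a parenthesized, comma-separated sequence of expressions in order and returns the last value, so a multi-line script can be folded into one line. The string transformation is sketched below in Python purely for illustration (it is not Octave code); `to_single_expression` is a hypothetical helper name.

```python
def to_single_expression(statements):
    """Fold Maxima statements into one '(a, b, c)' sequence expression.

    Trailing '$' or ';' terminators are dropped, because inside the
    parentheses the statements are separated by commas instead.
    """
    cleaned = []
    for statement in statements:
        text = statement.strip().rstrip("$;").strip()
        if text:
            cleaned.append(text)
    return "(" + ", ".join(cleaned) + ")"


script = [
    "kill(all)$",
    "numer:true$",
    "ratprint:false$",
    "angle_in_bits:3779$",
    "total_fs:18136$",
    "s:solve(angle_deg=(angle_in_bits/total_fs*360),angle_deg)$",
    "round(s);",
]
print(to_single_expression(script))
```

The resulting single-line string is what would be passed to maxima("...") in Octave.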
I will suppose here that you are using QtOctave, which I am guessing from googling your error message "_ide_reload_variables_list( whos() );"
If this is not so, none of the following may apply to your question.
Typing help maxima at the prompt points me to the file /usr/share/qtoctave/scripts_octave/maxima.m
with these contents:
function result=maxima(command)
  in="";
  in=sprintf("echo \"string(%s);\"|maxima --very-quiet", command);
  [status,result]=system(in);
  %if(status!=0) result=""; endif;
  result = deblank ( strjust ( strrep (result, "%", "") ,"left") );
endfunction
This tells me that maxima is called via Octave's system function in a very particular way that does not allow multiple Maxima commands.
Modifying the assignment of in as shown below would let you call the function maxima with a cell array of commands, maxima({command_1,command_2}), where each command_i is a string.
in=['echo ', sprintf('\"%s;\" ',command{:}), '| maxima --very-quiet'];
Please note that the function system still returns only one output, the one that is sent to standard out by maxima.
This may also be of interest for you as it describes methods of octave's interaction with subprocesses.
I am not sure this helps much, as I think my modification is only of very superficial use, but maybe it helps you understand better what Octave is doing when you tell it maxima(something). It helped me.
Last but not least, as far as I know there is no real interface between Octave (or Matlab) and Maxima. I hope someone will correct me if I am wrong about that.
I have Octave and Maxima on my Linux laptop (Ubuntu). Octave has a system function, which can be used to run terminal commands.
In a terminal it is possible to call Maxima functions through a pipe
(add quit(); to the end of the Maxima command):
$ echo "factor(12345); quit();" | maxima
Maxima 5.41.0 http://maxima.sourceforge.net
using Lisp GNU Common Lisp (GCL) GCL 2.6.12
Distributed under the GNU Public License. See the file COPYING.
Dedicated to the memory of William Schelter.
The function bug_report() provides bug reporting information.
(%i1) (%o1) 3 5 823
$
In Octave's system command, use a doubled "" inside the quotation marks to get a literal ":
[status,output]=system("echo ""factor(565);quit();""|maxima")
status = 0
output =
Maxima 5.41.0 http://maxima.sourceforge.net
using Lisp GNU Common Lisp (GCL) GCL 2.6.12
Distributed under the GNU Public License. See the file COPYING.
Dedicated to the memory of William Schelter.
The function bug_report() provides bug reporting information.
(%i1) (%o1) 5 113
The extra text can be cut out of the output string in Octave. Alternatively, use Maxima's ability to run its commands from a script file; the script can be created in Octave.
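Cutting the banner away amounts to keeping only what follows the last output label. A rough sketch of that step, written in Python for illustration only and assuming the result fits on the same line as its (%oN) label as in the transcripts above; `extract_result` is a hypothetical helper name.

```python
import re


def extract_result(output):
    """Return the text after the last '(%oN)' label, or None if there is none."""
    labels = list(re.finditer(r"\(%o\d+\)", output))
    if not labels:
        return None
    return output[labels[-1].end():].strip()


sample = (
    "Maxima 5.41.0 http://maxima.sourceforge.net\n"
    "using Lisp GNU Common Lisp (GCL) GCL 2.6.12\n"
    "(%i1) (%o1) 5 113\n"
)
print(extract_result(sample))
```

Results that Maxima draws over several lines would need more care than this sketch takes.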
Br. Juha (juhap.karjalainen#mail.suomi.net)

iPython notebook not loading: too much output

I have an iPython notebook file which is not loading, presumably because there is too much output in the file (thousands of lines of results printed, old computer).
I can edit the file with notepad without problems, but copying and then cleaning the code from there cell by cell is very time-consuming.
Is there a way to recover the code differently, or to ask iPython notebook to only load the code and not print all the past outputs when opening the file?
Here is an output removal script I found on Github. Credits to the author.
import sys
import io
from IPython.nbformat import current

def remove_outputs(nb):
    # Clear the stored outputs of every code cell
    for ws in nb.worksheets:
        for cell in ws.cells:
            if cell.cell_type == 'code':
                cell.outputs = []

if __name__ == '__main__':
    fname = sys.argv[1]
    with io.open(fname, 'r') as f:
        nb = current.read(f, 'json')
    remove_outputs(nb)
    print(current.writes(nb, 'json'))
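Because a notebook file is plain JSON, the same clean-up can also be sketched without importing IPython at all, which may help when the installed version no longer ships the module used above. The version below is an assumption-laden sketch: it expects either the older layout with a "worksheets" list or the newer one with a top-level "cells" list, and the names `strip_outputs` and `strip_file` are made up for this example.

```python
import json


def strip_outputs(notebook):
    """Empty the outputs of every code cell in a notebook loaded as a dict."""
    worksheets = notebook.get("worksheets")
    if worksheets:  # older layout: cells are grouped in worksheets
        cell_lists = [ws.get("cells", []) for ws in worksheets]
    else:           # newer layout: cells sit at the top level
        cell_lists = [notebook.get("cells", [])]
    for cells in cell_lists:
        for cell in cells:
            if cell.get("cell_type") == "code":
                cell["outputs"] = []
                if "execution_count" in cell:
                    cell["execution_count"] = None
                cell.pop("prompt_number", None)
    return notebook


def strip_file(source_path, target_path):
    """Read a notebook file, strip its outputs, and write a cleaned copy."""
    with open(source_path, "r", encoding="utf-8") as f:
        notebook = json.load(f)
    with open(target_path, "w", encoding="utf-8") as f:
        json.dump(strip_outputs(notebook), f, indent=1)
```

Writing to a separate target file keeps the original notebook intact in case something goes wrong.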

inline iPython call to system

I really like to use system shell commands in iPython. But I was wondering if it is possible to loop over the returned values from a call to e.g. !ls. This works:
files = !ls ./*_subcell_cooc.txt
for f in files:
    print(f)
But this does not:
for f in ( !ls ./*_subcell_cooc.txt):
    print(f)
Error is:
File "<ipython-input-1-df2bc72907d7>", line 5
for f in ( !ls $ROOT/*_subcell_cooc.txt):
^
SyntaxError: invalid syntax
No, it is not possible. The syntax var = !something is special-cased in IPython. It is not valid Python syntax, and we will not extend for loops and so on to work with it.
You can do the assignment as you show in your first example, but using glob, os and other real Python modules to do that will be more robust, not much harder, and will also work outside of IPython.
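For instance, a minimal sketch of the glob-based version of the loop from the question (`matching_files` is just a hypothetical helper name; the pattern is the one used above):

```python
import glob


def matching_files(pattern):
    """Return the paths matching a shell-style pattern, sorted for stable order."""
    return sorted(glob.glob(pattern))


for path in matching_files("./*_subcell_cooc.txt"):
    print(path)
```

Unlike the !ls form, this runs unchanged in a plain Python script, and an empty match simply yields an empty list.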
As an anecdote, Guido was really not happy with IPython's half-shell syntax when he last saw it at SciPy2013.
(Also, it's an uppercase I in IPython, please.)