Pharo, Voyage and MongoDB - mongodb

I want to build a relatively simple web app using Pharo, Voyage and MongoDB + TeaPot. Before I start the project I did a lot of research and one question remains: How do I initially upload a bunch of data into the MongoDB? I basically have the data in CSV format. Do I have to program an importer in Smalltalk that does that? If I were to do it without smalltalk it would be missing all the object IDs etc. How do you go about things like that?
Thanks,
Henrik

If you have data in CSV format, then I would recommend creating a simple importer. You could use NeoCSV and then save it via Pharo. I presume you know how to setup Mongo repository (#workspace) do:
| repository |
repository := VOMongoRepository
host: VOMongoRepository defaultHost
database: 'MyMongoDb'.
VORepository setRepository: repository.
First create your two class methods for Voyage:
Kid class >> isVoyageRoot
^ true "instances of this object will be root"
Kid class >> voyageCollectionName
^ 'Kids' "The collection name in MongoDB"
The Kid class should have firstName(:), surname(:), age(:) accesors and instance variables of the same name.
Then simply have a reading from CSV and then saving it into mongoDB:
| personalInformation readData columnName columnData aKid |
"init variable"
personalInformation := OrderedDictionary new.
"emulate CSV reading"
readData := (NeoCSVReader on: 'firstName, surname, age\John, Smith, 5' withCRs readStream) upToEnd.
columnName := readData first.
columnData := readData second.
"Repeat for as many number of columns you may have"
1 to: columnName size do: [ :index |
personalInformation at: (columnName at: index) put: (columnData at: index)
].
aKid := Kid new.
"Storing Kid object information"
personalInformation keysAndValuesDo: [ :key :value |
aKid perform: (key asString,$:) asSymbol with: value "For every column store the information into a Kid object (you have to have accessors for that)"
].
aKid save "Saving into mongoDB"
This is only to give you rough idea
To query in your MongoDB do:
db.Kids.find()
You should see the stored information.
Disclaimer: Even thou the code should be fine, I did not have time to actually test it on mongoDB.

Related

Splayed table upsert leading to error: `cast

I built a data loader prototype that saves CSV into splayed tables. The workflow is as follows:
Create schema the first time e.g. volatilitysurface table:
volatilitysurface::([date:`datetime$(); ccypair:`symbol$()] atm_convention:`symbol$(); premium_included:`boolean$(); smile_type:`symbol$(); vs_type:`symbol$(); delta_ratio:`float$(); delta_setting:`float$(); wing_extrapolation:`float$(); spread_type:`symbol$());
For every file in the rawdata folder import it:
myfiles:#[system;"dir /b /o:gn ",string `$getenv[`KDBRAWDATA],"*.volatilitysurface.csv 2> nul";()];
if[myfiles~();.lg.o[`load;"no volatilitysurface files found!"];:0N];
.lg.o[`load;"loading data files ..."];
/ load each file
{
mypath:"" sv (string `$getenv[`KDBRAWDATA];x);
.lg.o[`load;"loading file name '",mypath,"' ..."];
myfile:hsym`$mypath;
tmp1:select date,ccypair,atm_convention,premium_included,smile_type,vs_type,delta_ratio,delta_setting,wing_extrapolation,spread_type from update date:x, premium_included:?[premium_included = `$"true";1b;0b] from ("ZSSSSSFFFS";enlist ",")0:myfile;
`volatilitysurface upsert tmp1;
} #/: myfiles;
delete tmp1 from `.;
.Q.gc[];
.lg.o[`done;"loading volatilitysurface data done"];
.lg.o[`save;"saving volatilitysurface schema to ",string afolder];
volatilitysurface::0!volatilitysurface;
.Q.dpft[afolder;`;`ccypair;`volatilitysurface];
.lg.o[`cleanup;"removing volatilitysurface from memory"];
delete volatilitysurface from `.;
.Q.gc[];
.lg.o[`done;"saving volatilitysurface schema done"];
This works perfectly. I use .Q.gc[]; frequently to avoid hitting the wsfull. When new CSV files are available I open the existing schema, upsert into it and save it again effectively overwriting the existing HDB file system.
Open schema:
.lg.o[`open;"tables already exists, opening the schema ..."];
#[system;"l ",(string afolder) _ 0;{.lg.e[`open;"failed to load hdb directory: ", x]; 'x}];
/ Re-create table index
volatilitysurface::`date`ccypair xkey select from volatilitysurface;
Re-run step #2 to append new CSV files into the existing volatilitysurfacetable, it upserts the first CSV perfectly but the second CSV fails with:
error: `cast
I debug to the point of the error and to double-check I see that the metadata of tmp1 and volatilitysurface are perfectly the same. Any ideas why this is happening? I get the same issue with any other table. I have tried cleaning the keys from the table after every upsert but doesn't help i.e.
volatilitysurface::0!volatilitysurface;
volatilitysurface::`date`ccypair xkey volatilitysurface;
And the metadata comparison at the point of the cast error:
meta tmp1
c | t f a
------------------| -----
date | z
ccypair | s
atm_convention | s
premium_included | b
smile_type | s
vs_type | s
delta_ratio | f
delta_setting | f
wing_extrapolation| f
spread_type | s
meta volatilitysurface
c | t f a
------------------| -----
date | z
ccypair | s p
atm_convention | s
premium_included | b
smile_type | s
vs_type | s
delta_ratio | f
delta_setting | f
wing_extrapolation| f
spread_type | s
UPDATE Using the input of the answer below I tried using Torq's .loader.loadallfiles function like this (it doesn't fail but nothing happens either, the table is not created in memory and the data is not written to the database):
.loader.loadallfiles[`headers`types`separator`tablename`dbdir`dataprocessfunc!(`x`ccypair`atm_convention`premium_included`smile_type`vs_type`delta_ratio`delta_setting`wing_extrapolation`spread_type;"ZSSSSSFFFS";enlist ",";`volatilitysurface;`:hdb; {[p;t] select date,ccypair,atm_convention,premium_included,smile_type,vs_type,delta_ratio,delta_setting,wing_extrapolation,spread_type from update date:x, premium_included:?[premium_included = `$"true";1b;0b] from t}); `:rawdata]
UDPATE2 This is the output I get from TorQ:
2017.11.20D08:46:12.550618000|wsp18497wn|dataloader|dataloader1|INF|dataloader|**** LOADING :rawdata/20171102_113420.disccurve.csv ****
2017.11.20D08:46:12.550618000|wsp18497wn|dataloader|dataloader1|INF|dataloader|reading in data chunk
2017.11.20D08:46:12.566218000|wsp18497wn|dataloader|dataloader1|INF|dataloader|Read 10000 rows
2017.11.20D08:46:12.566218000|wsp18497wn|dataloader|dataloader1|INF|dataloader|processing data
2017.11.20D08:46:12.566218000|wsp18497wn|dataloader|dataloader1|INF|dataloader|Enumerating
2017.11.20D08:46:12.566218000|wsp18497wn|dataloader|dataloader1|INF|dataloader|writing 4525 rows to :hdb/2017.09.12/volatilitysurface/
2017.11.20D08:46:12.581819000|wsp18497wn|dataloader|dataloader1|INF|dataloader|writing 4744 rows to :hdb/2017.09.13/volatilitysurface/
2017.11.20D08:46:12.659823000|wsp18497wn|dataloader|dataloader1|INF|dataloader|writing 731 rows to :hdb/2017.09.14/volatilitysurface/
2017.11.20D08:46:12.737827000|wsp18497wn|dataloader|dataloader1|INF|init|retrieving sort settings from :C:/Dev/torq//config/sort.csv
2017.11.20D08:46:12.737827000|wsp18497wn|dataloader|dataloader1|INF|sort|sorting the volatilitysurface table
2017.11.20D08:46:12.737827000|wsp18497wn|dataloader|dataloader1|INF|sorttab|No sort parameters have been specified for : volatilitysurface. Using default parameters
2017.11.20D08:46:12.737827000|wsp18497wn|dataloader|dataloader1|INF|sortfunction|sorting :hdb/2017.09.05/volatilitysurface/ by these columns : sym, time
2017.11.20D08:46:12.753428000|wsp18497wn|dataloader|dataloader1|ERR|sortfunction|failed to sort :hdb/2017.09.05/volatilitysurface/ by these columns : sym, time. The error was: hdb/2017.09.
I get the following error sorttab|No sort parameters have been specified for : volatilitysurface. Using default parameters where is this sorttab documented? does it use the table PK by default?
UPDATE3 Ok fixed UPDATE2 out by providing a non-default sort.csv under my config folder:
tabname,att,column,sort
default,p,sym,1
default,,time,1
volatilitysurface,,date,1
volatilitysurface,,ccypair,1
But now I see that if I call the function multiple times on the same files, it simply appends duplicated data instead of upserting it.
UPDATE4 Still not there yet ... assuming I can check to make sure that no duplicate file is used. When I load and then start the database I get some structure back that ressembles some sort of dictionary and not a table.
2017.10.31| (,`volatilitysurface)!,+`date`ccypair`atm_convention`premium_incl..
2017.11.01| (,`volatilitysurface)!,+`date`ccypair`atm_convention`premium_incl..
2017.11.02| (,`volatilitysurface)!,+`date`ccypair`atm_convention`premium_incl..
2017.11.03| (,`volatilitysurface)!,+`date`ccypair`atm_convention`premium_incl..
sym | `AUDNOK`AUDCNH`AUDJPY`AUDHKD`AUDCHF`AUDSGD`AUDCAD`AUDDKK`CADSGD`C..
Note that date is actually datetime Z and not just date. My full and latest version of the function invocation is:
target:hsym `$("" sv ("./";getenv[`KDBHDB];"/volatilitysurface"));
rawdatadir:hsym `$getenv[`KDBRAWDATA];
.loader.loadallfiles[`headers`types`separator`tablename`dbdir`partitioncol`dataprocessfunc!(`x`ccypair`atm_convention`premium_included`smile_type`vs_type`delta_ratio`delta_setting`wing_extrapolation`spread_type;"ZSSSSSFFFS";enlist ",";`volatilitysurface;target;`date;{[p;t] select date,ccypair,atm_convention,premium_included,smile_type,vs_type,delta_ratio,delta_setting,wing_extrapolation,spread_type from update date:x, premium_included:?[premium_included = `$"true";1b;0b] from t}); rawdatadir];
I'm going to add a second answer here to try and tackle the question about using TorQ's data loader.
I'd like to clarify what output you are getting after running this function? There should be some logging messages output, can you post these? For example when I run the function:
jmcmurray#homer ~/deploy/TorQ (master) $ q torq.q -procname loader -proctype loader -debug
<torq startup messages removed>
q).loader.loadallfiles[`headers`types`separator`tablename`dbdir`partitioncol`dataprocessfunc!(c;"TSSFJFFJJBS";enlist",";`quotes;`:testdb;`date;{[p;t] select date:.z.d,time:TIME,sym:INSTRUMENT,BID,ASK from t});`:csvtest]
2017.11.17D15:03:20.312336000|homer.aquaq.co.uk|loader|loader|INF|dataloader|**** LOADING :csvtest/tradesandquotes20140421.csv ****
2017.11.17D15:03:20.319110000|homer.aquaq.co.uk|loader|loader|INF|dataloader|reading in data chunk
2017.11.17D15:03:20.339414000|homer.aquaq.co.uk|loader|loader|INF|dataloader|Read 11000 rows
2017.11.17D15:03:20.339463000|homer.aquaq.co.uk|loader|loader|INF|dataloader|processing data
2017.11.17D15:03:20.339519000|homer.aquaq.co.uk|loader|loader|INF|dataloader|Enumerating
2017.11.17D15:03:20.340061000|homer.aquaq.co.uk|loader|loader|INF|dataloader|writing 11000 rows to :testdb/2017.11.17/quotes/
2017.11.17D15:03:20.341669000|homer.aquaq.co.uk|loader|loader|INF|dataloader|**** LOADING :csvtest/tradesandquotes20140422.csv ****
2017.11.17D15:03:20.349606000|homer.aquaq.co.uk|loader|loader|INF|dataloader|reading in data chunk
2017.11.17D15:03:20.370793000|homer.aquaq.co.uk|loader|loader|INF|dataloader|Read 11000 rows
2017.11.17D15:03:20.370858000|homer.aquaq.co.uk|loader|loader|INF|dataloader|processing data
2017.11.17D15:03:20.370911000|homer.aquaq.co.uk|loader|loader|INF|dataloader|Enumerating
2017.11.17D15:03:20.371441000|homer.aquaq.co.uk|loader|loader|INF|dataloader|writing 11000 rows to :testdb/2017.11.17/quotes/
2017.11.17D15:03:20.460118000|homer.aquaq.co.uk|loader|loader|INF|init|retrieving sort settings from :/home/jmcmurray/deploy/TorQ/config/sort.csv
2017.11.17D15:03:20.466690000|homer.aquaq.co.uk|loader|loader|INF|sort|sorting the quotes table
2017.11.17D15:03:20.466763000|homer.aquaq.co.uk|loader|loader|INF|sorttab|No sort parameters have been specified for : quotes. Using default parameters
2017.11.17D15:03:20.466820000|homer.aquaq.co.uk|loader|loader|INF|sortfunction|sorting :testdb/2017.11.17/quotes/ by these columns : sym, time
2017.11.17D15:03:20.527216000|homer.aquaq.co.uk|loader|loader|INF|applyattr|applying p attr to the sym column in :testdb/2017.11.17/quotes/
2017.11.17D15:03:20.535095000|homer.aquaq.co.uk|loader|loader|INF|sort|finished sorting the quotes table
After all this, I can run \l testdb and there is a table called "quotes" containing my loaded data
If you can post logging messages like these, it could be helpful to see what's going on.
UPDATE
"But now I see that if I call the function multiple times on the same files, it simply appends duplicated data instead of upserting it."
If I'm understanding the problem correctly, it sounds like you likely shouldn't call the function multiple times on the same files. Another process within TorQ could be useful here, the "file alerter". This process will monitor a directory for new & updated files, and can call a function on any that appear (so you can have it call the loader function with every new file automatically). It has a number of options such as moving files after processing (so you can "archive" loaded CSVs)
Note that the file alerter requires that a function take exactly two parameters - the directory & the file name. This effectively means you will need a "wrapper" function around the loader function, which takes a dictionary & a directory. I don't think TorQ includes a function similar to .loader.loadallfiles for a single file, so it might be necessary to copy the target file to a temporary directory, run loadallfiles on that directory and then delete the file from there before loading the next.
`cast error refers to a value not being enumerated
I can't see any enumeration going on here, splayed tables on disk need to have symbol columns enumerated. For example, this can be done with the following line, before calling .Q.dpft
volatilitysurface:.Q.en[afolder;volatilitysurface];
You may like to consider using an example CSV loader for loading your data. One such example is included in TorQ, the KDB framework developed by AquaQ Analytics (as a disclaimer, I work for AquaQ)
The framework is available (free of charge) here: https://github.com/AquaQAnalytics/TorQ
The specific component you will likely be interested in is dataloader.q and is documented here: http://aquaqanalytics.github.io/TorQ/utilities/#dataloaderq
This script will handle everything necessary, loading all files, enumerating, sorting on disk, applying attributes etc. as well as using .Q.fsn to prevent running out of memory

Insert field name with dot in mongo document

A Meteor server code tries to insert an object into a Mongo collection. The value of one of the property is a string which contains a dot i.e. ".".
Meteor terminal is complaining :-
Error: key Food 1.1 and drinks must not contain '.'
What does this mean and how to fix it?
let obj = {
food: group,
rest: rule,
item: item[0],
key: i
};
FoodCol.insert(obj);
edit
The suggested answer by Kishor for replacing the "." with "\uff0E" will produce a space after the dot which is not what a user expects.
From this link, How to use dot in field name?
You can replace dot symbols of your field name to Unicode equivalent
"\uff0E":
Update: As Fred suggested, please use "\u002E" for "."
We solved this issue by encoding (Base64) the key before insertion and decode after taking out from the db. Since we consume the document as it is and query fields are different and their keys are not encoded.
But if u want to make query using this key or the key should be readable to the user, this solution will be not be suitable.

Hive Table Creation for Dynamic Schemas

We are investigating whether Hive will allow us to run some SQL-like queries on
mongo style dynamic schema as a precursor to our map-reduce jobs.
The data comes in the form of several TiB of BSON files; each of the files contains
JSON "samples". An example sample is given as such:
{
"_id" : "SomeGUID",
"SomeScanner" :
{
"B64LR" : 22,
"Version" : 192565886128245
},
"Parser" :
{
"Size" : 73728,
"Headers" :
[
{
"VAddr" : 4096,
"VSize" : 7924.
. . . etc. . . .
As a dynamic schema, only a few of the fields are guaranteed to exist.
We would like to be able to run a query against an input set that may be something
like
SomeScanner.Parser.Headers.VSize > 9000
Having looked up the table-mapping, I'm not sure whether this is do-able with Hive . . . how would one map a column that may or may not be there . . . not to mention that there are about 2k-3k query-able values in a typical sample.
Hence, my questions to the Experts:
Can Hive build a dynamic schema from the data it encounters?
How can one go about building a Hive table with ~3k columns?
Is there a better way?
Appreciated, as always.
OK--with much ado, I can now answer my own questions.
Can Hive build a dynamic schema from the data it encounters?
A: No. However, an excellent tool for this exists. q.v., inf.
How can one go about building a Hive table w/~3K columns
A: Ibidem
Is there a better way?
A: Not that I found; but, with some help, it isn't too difficult.
First, a shout out to Michael Peterson at http://thornydev.blogspot.com/2013/07/querying-json-records-via-hive.html, whose blog post served as the toe-hold to figure all of this out.
Definitely check it out if you're starting out w/Hive.
Now, Hive cannot natively import a JSON document and deduce a schema from it . . . however, Michael Peterson has developed tool that does: https://github.com/midpeter444/hive-json-schema
Some caveats with it:
* Empty arrays and structs are not handled, so remove (or populate) them. Otherwise, things like { "something" : {} } or {"somethingelse": []} will throw errors.
If any field has the name "function", it will have to be re-named prior to executing the CREATE TABLE statement. E.g., the following code would throw an error: 1{ "someThing": { "thisIsOK": "true", "function": "thatThrowsAnError" } }`
Presumably, this is because "function" is an Hive keyword.
And, with dynamic schema in general, I have not found a way to handle a nested leading underscore name even if the schema is valid: { "somethings": { "_someVal": "123", "otherVal": "456" } } will fail.
For the common MongoDB "ID" field, this is map-able with the following addition: with serdeproperties("mapping.id" = "_id"), which appears to be similar to a macro-substitution.
Serialization/De-Serialization for JSON can be achieved with https://github.com/rcongiu/Hive-JSON-Serde by adding the following: ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
N.B., the JsonSerDe JAR must have been added to .hiverc or "add jar"'d into Hive to be used.
Thus, the schema:
CREATE TABLE samplesJSON
( id string,
. . . rest of huge schema . . . . )
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH serdeproperties("mapping.id" = "_id");
The JSON data can be loaded into the table with a command along the lines of:
LOAD DATA LOCAL INPATH '/tmp/samples.json' OVERWRITE INTO TABLE samplesJSON;
Finally, queries are actually intuitive straight-forward. Using the above example from the original question:
hive> select id, somescanner.parser.headers.vaddr from samplesjson;
OK
id vaddr
119 [4096,53248,57344]

exporting MongoDB to CSV using pymongo

I would like write a script to generate a CSV file from my mongoDB database and I would like to know the most convenient version !
first let me begin with the structure of collections.
MyDataBase -> setting
users
fruits
in setting I have something like
setting -> _id
data
_tenant
and the thing I am after, is making a CSV file out of profiles in data
which they have some fields/properties like "name", "address", "postalcode", "email", age and etc. and not neccessary all of these profile have all files/properties and even some of them look like collection (have sub-branches) which I am not interested in at all !
so, my code is python so far is look like these
myquery = db.settings.find() # I am getting everything !
output = csv.writer(open('some.csv', 'wt')) # writng in this file
for items in myquery[0:10]: # first 11 entries
a = list(items['data']['Profile'].values()) # collections are importent as dictionary and I am making them as list
tt = list()
for chiz in a:
if chiz is not None:
tt.append(chiz.encode('ascii', 'ignore')) #encoding
else:
tt.append("none")
output.writerow(tt)
these fields/properties dont have neccessary all fields, and also even some of them are collection(with sub-branch) and will be imported as dictionary ! so, I have to convert them to list and all and all, there are quite few things to take care in such a process and in all doesn't look that straightforward !
My question might be sounds very general but is it a typical way to make such report ?! if not, can you someone make it clear ?!
Yes, I am using the same way.
It is clear and fast, also it works without of any additional libraries.

How to write this snippet in Python?

I am learning Python (I have a C/C++ background).
I need to write something practical in Python though, whilst learning. I have the following pseudocode (my first attempt at writing a Python script, since reading about Python yesterday). Hopefully, the snippet details the logic of what I want to do. BTW I am using python 2.6 on Ubuntu Karmic.
Assume the script is invoked as: script_name.py directory_path
import csv, sys, os, glob
# Can I declare that the function accepts a dictionary as first arg?
def getItemValue(item, key, defval)
return !item.haskey(key) ? defval : item[key]
dirname = sys.argv[1]
# declare some default values here
weight, is_male, default_city_id = 100, true, 1
# fetch some data from a database table into a nested dictionary, indexed by a string
curr_dict = load_dict_from_db('foo')
#iterate through all the files matching *.csv in the specified folder
for infile in glob.glob( os.path.join(dirname, '*.csv') ):
#get the file name (without the '.csv' extension)
code = infile[0:-4]
# open file, and iterate through the rows of the current file (a CSV file)
f = open(infile, 'rt')
try:
reader = csv.reader(f)
for row in reader:
#lookup the id for the code in the dictionary
id = curr_dict[code]['id']
name = row['name']
address1 = row['address1']
address2 = row['address2']
city_id = getItemValue(row, 'city_id', default_city_id)
# insert row to database table
finally:
f.close()
I have the following questions:
Is the code written in a Pythonic enough way (is there a better way of implementing it)?
Given a table with a schema like shown below, how may I write a Python function that fetches data from the table and returns is in a dictionary indexed by string (name).
How can I insert the row data into the table (actually I would like to use a transaction if possible, and commit just before the file is closed)
Table schema:
create table demo (id int, name varchar(32), weight float, city_id int);
BTW, my backend database is postgreSQL
[Edit]
Wayne et al:
To clarify, what I want is a set of rows. Each row can be indexed by a key (so that means the rows container is a dictionary (right)?. Ok, now once we have retrieved a row by using the key, I also want to be able to access the 'columns' in the row - meaning that the row data itself is a dictionary. I dont know if Python supports multidimensional array syntax when dealing with dictionaries - but the following statement will help explain how I intend to conceptually use the data returned from the db. A statement like dataset['joe']['weight'] will first fetch the row data indexed by the key 'joe' (which is a dictionary) and then index that dictionary for the key 'weight'. I want to know how to build such a dictionary of dictionaries from the retrieved data in a Pythonic way like you did before.
A simplistic way would be to write something like:
import pyodbc
mydict = {}
cnxn = pyodbc.connect(params)
cursor = cnxn.cursor()
cursor.execute("select user_id, user_name from users"):
for row in cursor:
mydict[row.id] = row
Is this correct/can it be written in a more pythonic way?
to get the value from the dictionary you need to use .get method of the dict:
>>> d = {1: 2}
>>> d.get(1, 3)
2
>>> d.get(5, 3)
3
This will remove the need for getItemValue function. I wont' comment on the existing syntax since it's clearly alien to Python. Correct syntax for the ternary in Python is:
true_val if true_false_check else false_val
>>> 'a' if False else 'b'
'b'
But as I'm saying below, you don't need it at all.
If you're using Python > 2.6, you should use with statement over the try-finally:
with open(infile) as f:
reader = csv.reader(f)
... etc
Seeing that you want to have row as dictionary, you should be using csv.DictReader and not a simple csv. reader. However, it is unnecessary in your case. Your sql query could just be constructed to access the fields of the row dict. In this case you wouldn't need to create separate items city_id, name, etc. To add default city_id to row if it doesn't exist, you could use .setdefault method:
>>> d
{1: 2}
>>> d.setdefault(1, 3)
2
>>> d
{1: 2}
>>> d.setdefault(3, 3)
3
>>> d
{1: 2, 3: 3}
and for id, simply row[id] = curr_dict[code]['id']
When slicing, you could skip 0:
>>> 'abc.txt'[:-4]
'abc'
Generally, Python's library provide a fetchone, fetchmany, fetchall methods on cursor, which return Row object, that might support dict-like access or return a simple tuple. It will depend on the particular module you're using.
It looks mostly Pythonic enough for me.
The ternary operation should look like this though (I think this will return the result you expect):
return defval if not key in item else item[key]
Yeah, you can pass a dictionary (or any other value) in basically any order. The only difference is if you use the *args, **kwargs (named by convention. Technically you can use any name you want) which expect to be in that order and the last one or two arguments.
For inserting into a DB you can use the odbc module:
import odbc
conn = odbc.odbc('servernamehere')
cursor = conn.cursor()
cursor.execute("INSERT INTO mytable VALUES (42, 'Spam on Eggs', 'Spam on Wheat')")
conn.commit()
You can read up or find plenty of examples on the odbc module - I'm sure there are other modules as well, but that one should work fine for you.
For retrieval you would use
cursor.execute("SELECT * FROM demo")
#Reads one record - returns a tuple
print cursor.fetchone()
#Reads the rest of the records - a list of tuples
print cursor.fetchall()
to make one of those records into a dictionary:
record = cursor.fetchone()
# Removes the 2nd element (at index 1) from the record
mydict[record[1]] = record[:1] + record[2:]
Though that practically screams for a generator expression if you want the whole shebang at once
mydict = dict((record[1], record[:1] + record[2:] for record in cursor.fetchall())
which should give you all of the records packed up neatly in a dictionary, using the name as a key.
HTH
a colon required after defs:
def getItemValue(item, key, defval):
...
boolean operators: In python !->not; &&->and and ||->or (see http://docs.python.org/release/2.5.2/lib/boolean.html for boolean operators). There's no ? : operator in python, there is a return (x) if (x) else (x) expression although I personally rarely use it in favour of plain if's.
booleans/None: True, False and None have capitals before them.
checking types of arguments: In python, you generally don't declare types of function parameters. You could go e.g. assert isinstance(item, dict), "dicts must be passed as the first parameter!" in the function although this kind of "strict checking" is often discouraged as it's not always necessary in python.
python keywords: default isn't a reserved python keyword and is acceptable as arguments and variables (just for the reference.)
style guidelines: PEP 8 (the python style guideline) states that module imports should generally only be one per line, though there are some exceptions (I have to admit I often don't follow the import sys and os on separate lines, though I usually follow it otherwise.)
file open modes: rt isn't valid in python 2.x - it will work, though the t will be ignored. See also http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files. It is valid in python 3 though, so I don't think it it'd hurt if you want to force text mode, raising exceptions on binary characters (use rb if you want to read non-ASCII characters.)
working with dictionaries: Python used to use dict.has_key(key) but you should use key in dict now (which has largely replaced it, see http://docs.python.org/library/stdtypes.html#mapping-types-dict.)
split file extensions: code = infile[0:-4] could be replaced with code = os.path.splitext(infile)[0] (which returns e.g. ('root', '.ext') with the dot in the extension (see http://docs.python.org/library/os.path.html#os.path.splitext).
EDIT: removed multiple variable declarations on a single line stuff and added some formatting. Also corrected the rt isn't a valid mode in python when in python 3 it is.