Comment a key/value pair with Ruamel.Yaml - ruamel.yaml

I need to comment a key/value pair entirely using ruamel.yaml. Something like:
import sys
from ruamel.yaml import YAML
inp = """\
# example
foo: bar
"""
yaml = YAML()
code = yaml.load(inp)
code['foo'].comment() # or whatever, can't seem to find a way to do this with existing api
yaml.dump(code, sys.stdout)
Output:
# foo: bar
Of course for multiline yaml key/value pairs it would need to comment the entire value:
foo:
- item1
- item2
to
# foo:
# - item1
# - item2

You won't be able to do your example using the existing routines for adding comments. In ruamel.yaml these comments are attached to either a dict (i.e. CommentedMap) or a list (CommentedSeq), and your end result has neither.
You would need to dump the loaded code and, using the transform parameter of .dump(), add the start-of-line # sequence.
(Although you don't need it here, it is also possible to do this for some subpart of a data structure loaded from a YAML document: dump the key/value pair as a new dict (again with the transform parameter) and insert/update the result as a comment on the preceding key, prepending a newline.)
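A minimal sketch of that whole-document case (note that a comment already present in the input, like # example, ends up with a second # prefix):
import sys
from ruamel.yaml import YAML

inp = """\
# example
foo: bar
"""

def comment_out(s):
    # prefix every line of the dumped YAML with '# '
    return ''.join('# ' + line for line in s.splitlines(True))

yaml = YAML()
code = yaml.load(inp)
yaml.dump(code, sys.stdout, transform=comment_out)
This prints:
# # example
# foo: bar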

python 3.7 and ldap3 reading group membership

I am using Python 3.7 and ldap3. I can make a connection and retrieve a list of the groups in which I am interested. I am having trouble getting group members though.
from ldap3 import Server, Connection, ALL

server = Server('ldaps.ad.company.com', use_ssl=True, get_info=ALL)
with Connection(server, 'mydomain\\ldapUser', '******', auto_bind=True) as conn:
    base = "OU=AccountGroups,OU=UsersAndGroups,OU=WidgetDepartment," \
           + "OU=LocalLocation,DC=ad,DC=company,DC=com"
    criteria = """(
        &(objectClass=group)
        (
            |(sAMAccountName=grp-*widgets*)
            (sAMAccountName=grp-oldWidgets)
        )
    )"""
    attributes = ['sAMAccountName', 'distinguishedName']
    conn.search(base, criteria, attributes=attributes)
    groups = conn.entries
At this point groups contains all the groups I want. I want to iterate over the groups to collect the members.
for group in groups:
    # print(cn)
    criteria = f"""
    (&
        (objectClass=person)
        (memberof:1.2.840.113556.1.4.1941:={group.distinguishedName})
    )
    """
    # criteria = f"""
    # (&
    #     (objectClass=person)
    #     (memberof={group.distinguishedName})
    # )
    # """
    attributes = ['displayName', 'sAMAccountName', 'mail']
    conn.search(base, criteria, attributes=attributes)
    people = conn.entries
I know there are people in the groups, but people is always an empty list. It doesn't matter if I do a recursive search or not.
What am I missing?
Edit
There is a longer backstory to this question that is too long to go into. I have a theory about this particular issue though. I was running out of time and switched to a different Python LDAP library -- which is working. I think the issue with this question might be that I "formatted" the query over multiple lines. The new LDAP lib (python-ldap) complained, so I stripped out the newlines and it just worked. I have not had time to go back and test that theory with ldap3.
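For what it's worth, a quick way to test that theory with ldap3 would be to collapse the pretty-printed filter to a single line before searching, e.g.:
# strip the indentation/newlines out of the multiline filter string
criteria = "".join(line.strip() for line in criteria.splitlines())
conn.search(base, criteria, attributes=attributes)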
people is overwritten in each iteration of your loop over groups.
Maybe the search result for the last group entry in groups is just empty.
You should initialise an empty list outside of your loop and extend it with your results:
people = []
for group in groups:
    ...
    conn.search(...)
    people.extend(conn.entries)
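If you also want to keep track of which members came from which group, a dict keyed by the group name works just as well (a sketch, with the filter kept on one line in light of the edit above):
members_by_group = {}
for group in groups:
    criteria = ("(&(objectClass=person)"
                f"(memberof:1.2.840.113556.1.4.1941:={group.distinguishedName}))")
    conn.search(base, criteria, attributes=['displayName', 'sAMAccountName', 'mail'])
    # str() turns the ldap3 attribute object into its plain value
    members_by_group[str(group.sAMAccountName)] = list(conn.entries)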
Another note about your code snippet above: when combining objectClass definitions with attribute definitions in your search filter, you may consider using the Reader class, which combines those internally.
Furthermore, I would like to point out that I've created an object relational mapper where you can simply define your queries using declarative Python syntax, e.g.:
from ldap3_orm import ObjectDef, Reader
from ldap3_orm.config import config
from ldap3_orm.connection import conn
PersonDef = ObjectDef("person", conn)
r = Reader(conn, PersonDef, config.base_dn, PersonDef.memberof == group.distinguishedName)
r.search()
ldap3-orm documentation can be found at http://code.bsm-felder.de/doc/ldap3-orm

Scala - how to use a computed variable name

I am using Gatling (https://gatling.io) and struggling a bit with the Scala (I'm just learning).
I have a feeder which pulls in user data from a csv file:
val feeder = csv("seedfile.csv").circular
And I can happily access values in this file, e.g. this allows me to log in using a value from the 'user_email' column:
exec(http("SubmitLogin")
  .post("/auth/login")
  .formParam("email", "${user_email}"))
The issue I'm having is that a range of the columns in my csv file are named item1, item2, item3, etc. I would like to iterate over these items in a loop. I was hoping Scala might have a feature like PHP $$ vars (http://php.net/manual/en/language.variables.variable.php) so I could do something like:
// actual values pulled from csv file
val item1 = "i'm item 1s val"
val item2 = "i'm item 2s val"
// for i in range
var varname = "item"+i
println(s"${$varname}") //so that for i=1 would be equivalent to println($item1)
Note: I have also tried:
s"$${varname}"
s"${${varname}}"
Based on my googling and playing with the REPL, it appears this is not an option in Scala (which I guess makes sense for a statically typed language that encourages immutable data), so any advice on how to approach this the Scala way would be greatly appreciated.
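For reference, the idiomatic Scala substitute for variable variables is a plain Map lookup: build the name as a string and use it as a key. A minimal sketch, with the values hard-coded in place of the CSV feed:
// a Map stands in for PHP-style variable variables
val items = Map(
  "item1" -> "i'm item 1s val",
  "item2" -> "i'm item 2s val"
)

for (i <- 1 to 2) {
  val varname = s"item$i"   // compute the name
  println(items(varname))   // look the value up by that name
}
In Gatling specifically, feeder columns end up as session attributes, so the same idea should apply there: compute the key as a string and look it up in the session (something along the lines of session("item" + i).as[String]) rather than trying to reference a variable by a computed identifier.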

yaml safe_load of many different objects

I have a huge YAML file with tag definitions like in this snippet
- !!python/object:manufacturer.Manufacturer
  name: aaaa
  address: !!python/object:address.BusinessAddress {street: bbbb, number: 123, city: cccc}
And I needed to load this, first to make sure that the file is correct YAML, second to extract information at a certain tree depth given a certain context. If I had this all as nested dicts, lists and primitives, that would be straightforward to do. But I cannot load the file, as I don't have the original Python sources and class definitions, so yaml.load() is out.
I have tried yaml.safe_load(), but that throws an exception.
The BaseLoader loads the file, so it is correct YAML. But that jumbles all primitive information (numbers, datetimes) together as strings.
Then I found How to deserialize an object with PyYAML using safe_load?, but since the file has over 100 different tags defined, the solution presented there is impractical.
Do I have to use some other tool to strip the !!tag definitions (there is at least one occasion where !! occurs inside a normal string) so that I can use safe_load? Is there a simpler way to solve this that I am not aware of?
If not I will have to do some string parsing to get the types back, but I thought I'd ask here first.
There is no need to go the cumbersome route of adding any of the classes if you want to use safe_load() on such a file.
You should have gotten a ConstructorError thrown in SafeConstructor.construct_undefined() in constructor.py. That method gets registered for the fall-through case None in the same file.
If you combine that info with the fact that all such tagged "classes" are mappings (and not lists or scalars), you can just copy the code for mappings into a new function and register that as the fall-through case:
import yaml
from yaml.constructor import SafeConstructor

def my_construct_undefined(self, node):
    data = {}
    yield data
    value = self.construct_mapping(node)
    data.update(value)

SafeConstructor.add_constructor(
    None, my_construct_undefined)

yaml_str = """\
- !!python/object:manufacturer.Manufacturer
  name: aaaa
  address: !!python/object:address.BusinessAddress {street: bbbb, number: 123, city: cccc}
"""
data = yaml.safe_load(yaml_str)
print(data)
should get you:
[{'name': 'aaaa', 'address': {'city': 'cccc', 'street': 'bbbb', 'number': 123}}]
without an exception being thrown, and with "number" as an integer, not a string.
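If some of the unknown tags should turn out to sit on sequences or scalars rather than mappings, the same fall-through trick extends naturally. A sketch (untested against the actual file; tagged scalars come back as plain strings here):
import yaml
from yaml.constructor import SafeConstructor

def my_construct_undefined(self, node):
    # handle whichever node kind carries the unknown tag
    if isinstance(node, yaml.MappingNode):
        data = {}
        yield data
        data.update(self.construct_mapping(node))
    elif isinstance(node, yaml.SequenceNode):
        data = []
        yield data
        data.extend(self.construct_sequence(node))
    else:
        yield self.construct_scalar(node)

SafeConstructor.add_constructor(None, my_construct_undefined)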

Defaultdict() the correct choice?

EDIT: mistake fixed
The idea is to read text from a file, clean it, and pair consecutive words (not permutations):
import string

file = f.read()  # f is an already-open text file
words = [word.strip(string.punctuation).lower() for word in file.split()]
pairs = [(words[i] + " " + words[i+1]).split() for i in range(len(words)-1)]
Then, for each pair, create a list of all the possible individual words that can follow that pair throughout the text. The dict will look like
[ConsecWordPair]:[listOfFollowers]
Thus, referencing the dictionary for a given pair will return all of the words that can follow that pair. E.g.
wordsThatFollow[('she', 'was')]
>> ['alone', 'happy', 'not']
My algorithm to achieve this involves a defaultdict(list)...
from collections import defaultdict

wordsThatFollow = defaultdict(list)
for i in range(len(words)-1):
    try:
        # pairs overlap, want second word of next pair
        # wordsThatFollow[tuple(pairs[i])] = pairs[i+1][1]
        # EDIT:
        wordsThatFollow[tuple(pairs[i])].update(pairs[i+1][1][0])
    except Exception:
        pass
I'm not so worried about the value error I have to circumvent with the 'try-except' (unless I should be). The problem is that the algorithm only successfully returns one of the followers:
wordsThatFollow[('she', 'was')]
>> ['not']
Sorry if this post is bad for the community; I'm figuring things out as I go ^^
Your problem is that you are always overwriting the value, when you really want to extend it:
# Instead of this
wordsThatFollow[tuple(pairs[i])] = pairs[i+1][1]
# Do this
wordsThatFollow[tuple(pairs[i])].append(pairs[i+1][1])
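A small self-contained run of the corrected version, with a toy sentence standing in for the file contents:
from collections import defaultdict

# toy input: consecutive word pairs -> list of words that follow each pair
words = "she was alone she was happy she was not".split()
wordsThatFollow = defaultdict(list)
for i in range(len(words) - 2):
    wordsThatFollow[(words[i], words[i + 1])].append(words[i + 2])

print(wordsThatFollow[('she', 'was')])  # ['alone', 'happy', 'not']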

How to write this snippet in Python?

I am learning Python (I have a C/C++ background).
I need to write something practical in Python though, whilst learning. I have the following pseudocode (my first attempt at writing a Python script, since I only started reading about Python yesterday). Hopefully the snippet details the logic of what I want to do. BTW, I am using Python 2.6 on Ubuntu Karmic.
Assume the script is invoked as: script_name.py directory_path
import csv, sys, os, glob

# Can I declare that the function accepts a dictionary as first arg?
def getItemValue(item, key, defval)
    return !item.haskey(key) ? defval : item[key]

dirname = sys.argv[1]

# declare some default values here
weight, is_male, default_city_id = 100, true, 1

# fetch some data from a database table into a nested dictionary, indexed by a string
curr_dict = load_dict_from_db('foo')

# iterate through all the files matching *.csv in the specified folder
for infile in glob.glob(os.path.join(dirname, '*.csv')):
    # get the file name (without the '.csv' extension)
    code = infile[0:-4]
    # open file, and iterate through the rows of the current file (a CSV file)
    f = open(infile, 'rt')
    try:
        reader = csv.reader(f)
        for row in reader:
            # lookup the id for the code in the dictionary
            id = curr_dict[code]['id']
            name = row['name']
            address1 = row['address1']
            address2 = row['address2']
            city_id = getItemValue(row, 'city_id', default_city_id)
            # insert row to database table
    finally:
        f.close()
I have the following questions:
Is the code written in a Pythonic enough way (is there a better way of implementing it)?
Given a table with a schema like shown below, how may I write a Python function that fetches data from the table and returns it in a dictionary indexed by string (name)?
How can I insert the row data into the table? (Actually I would like to use a transaction if possible, and commit just before the file is closed.)
Table schema:
create table demo (id int, name varchar(32), weight float, city_id int);
BTW, my backend database is postgreSQL
[Edit]
Wayne et al:
To clarify, what I want is a set of rows. Each row can be indexed by a key (so the rows container is a dictionary, right?). Now, once we have retrieved a row using the key, I also want to be able to access the 'columns' in the row, meaning that the row data itself is a dictionary. I don't know if Python supports multidimensional array syntax when dealing with dictionaries, but the following statement will help explain how I intend to conceptually use the data returned from the db. A statement like dataset['joe']['weight'] will first fetch the row data indexed by the key 'joe' (which is a dictionary) and then index that dictionary with the key 'weight'. I want to know how to build such a dictionary of dictionaries from the retrieved data in a Pythonic way, like you did before.
A simplistic way would be to write something like:
import pyodbc

mydict = {}
cnxn = pyodbc.connect(params)
cursor = cnxn.cursor()
cursor.execute("select user_id, user_name from users")
for row in cursor:
    mydict[row.user_id] = row
Is this correct/can it be written in a more pythonic way?
To get the value from the dictionary you need to use the .get method of the dict:
>>> d = {1: 2}
>>> d.get(1, 3)
2
>>> d.get(5, 3)
3
This will remove the need for the getItemValue function. I won't comment on the existing syntax since it's clearly alien to Python. The correct syntax for the ternary in Python is:
true_val if true_false_check else false_val
>>> 'a' if False else 'b'
'b'
But as I'm saying below, you don't need it at all.
If you're using Python 2.6 or newer, you should use the with statement instead of try-finally:
with open(infile) as f:
    reader = csv.reader(f)
    ... etc
Seeing that you want to have row as a dictionary, you should be using csv.DictReader and not a simple csv.reader. However, it is unnecessary in your case: your SQL query could just be constructed to access the fields of the row dict. In that case you wouldn't need to create the separate items city_id, name, etc. To add a default city_id to row if it doesn't exist, you could use the .setdefault method:
>>> d
{1: 2}
>>> d.setdefault(1, 3)
2
>>> d
{1: 2}
>>> d.setdefault(3, 3)
3
>>> d
{1: 2, 3: 3}
and for id, simply row['id'] = curr_dict[code]['id']
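Putting those pieces together, a sketch of the row loop using csv.DictReader (the column names are assumptions based on your snippet):
import csv

with open(infile) as f:
    for row in csv.DictReader(f):
        # fill in the default and the looked-up id directly on the row dict
        row.setdefault('city_id', default_city_id)
        row['id'] = curr_dict[code]['id']
        # row now carries everything the INSERT needs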
When slicing, you could skip 0:
>>> 'abc.txt'[:-4]
'abc'
Generally, Python's database libraries provide fetchone, fetchmany and fetchall methods on the cursor, which return row objects that might support dict-like access or might be simple tuples; it will depend on the particular module you're using.
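For the dict-of-dicts access pattern from your edit (dataset['joe']['weight']), something along these lines should work; a sketch against the demo schema:
cursor.execute("SELECT id, name, weight, city_id FROM demo")
dataset = {}
for id_, name, weight, city_id in cursor.fetchall():
    # key the outer dict by name, the inner dict by column name
    dataset[name] = {'id': id_, 'weight': weight, 'city_id': city_id}

print(dataset['joe']['weight'])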
It looks mostly Pythonic enough for me.
The ternary operation should look like this though (I think this will return the result you expect):
return defval if key not in item else item[key]
Yeah, you can pass a dictionary (or any other value) as basically any argument. The only special cases are *args and **kwargs (named by convention; technically you can use any name you want), which must come last in the parameter list, in that order.
For inserting into a DB you can use the odbc module:
import odbc
conn = odbc.odbc('servernamehere')
cursor = conn.cursor()
cursor.execute("INSERT INTO mytable VALUES (42, 'Spam on Eggs', 'Spam on Wheat')")
conn.commit()
You can read up or find plenty of examples on the odbc module - I'm sure there are other modules as well, but that one should work fine for you.
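On the transaction part of your question: since your backend is PostgreSQL, a DB-API module such as psycopg2 (an assumption here; the odbc module above works along the same lines) lets you commit once per file, e.g.:
import csv
import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # hypothetical connection params
cursor = conn.cursor()
with open(infile) as f:
    for row in csv.reader(f):
        # csv values arrive as strings; PostgreSQL casts them on insert
        cursor.execute("INSERT INTO demo VALUES (%s, %s, %s, %s)", row)
    conn.commit()  # commit the whole file as one transaction, just before it closes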
For retrieval you would use
cursor.execute("SELECT * FROM demo")
#Reads one record - returns a tuple
print cursor.fetchone()
#Reads the rest of the records - a list of tuples
print cursor.fetchall()
to make one of those records into a dictionary:
record = cursor.fetchone()
# Removes the 2nd element (at index 1) from the record
mydict[record[1]] = record[:1] + record[2:]
Though that practically screams for a generator expression if you want the whole shebang at once
mydict = dict((record[1], record[:1] + record[2:]) for record in cursor.fetchall())
which should give you all of the records packed up neatly in a dictionary, using the name as a key.
HTH
a colon is required after def:
def getItemValue(item, key, defval):
    ...
boolean operators: In Python, ! -> not, && -> and, and || -> or (see http://docs.python.org/release/2.5.2/lib/boolean.html for boolean operators). There's no ?: operator in Python; there is a (x) if (cond) else (y) expression, although I personally rarely use it in favour of plain ifs.
booleans/None: True, False and None are capitalized.
checking types of arguments: In Python, you generally don't declare the types of function parameters. You could write e.g. assert isinstance(item, dict), "dicts must be passed as the first parameter!" in the function, although this kind of strict checking is often discouraged, as it's not always necessary in Python.
Python keywords: default isn't a reserved Python keyword and is acceptable for arguments and variables (just for reference).
style guidelines: PEP 8 (the Python style guide) states that module imports should generally be one per line, though there are some exceptions (I have to admit I often don't put import sys and import os on separate lines, though I usually follow it otherwise).
file open modes: rt isn't valid in Python 2.x; it will work, but the t will be ignored. See also http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files. It is valid in Python 3 though, so I don't think it'd hurt if you want to force text mode, raising exceptions on binary characters (use rb if you want to read non-ASCII characters).
working with dictionaries: Python used to use dict.has_key(key), but you should now use key in dict, which has largely replaced it (see http://docs.python.org/library/stdtypes.html#mapping-types-dict).
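For example:
>>> d = {'name': 'joe'}
>>> 'name' in d
True
>>> 'weight' in d
False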
split file extensions: code = infile[0:-4] could be replaced with code = os.path.splitext(infile)[0]. splitext returns e.g. ('root', '.ext'), with the dot included in the extension (see http://docs.python.org/library/os.path.html#os.path.splitext).
EDIT: removed the bit about multiple variable declarations on a single line and added some formatting. Also corrected the claim that rt isn't a valid mode in Python: in Python 3 it is.