yaml safe_load of many different objects - deserialization

I have a huge YAML file with tag definitions like in this snippet:
- !!python/object:manufacturer.Manufacturer
  name: aaaa
  address: !!python/object:address.BusinessAddress {street: bbbb, number: 123, city: cccc}
I needed to load this, first to make sure that the file is correct YAML, and second to extract information at a certain tree depth given a certain context. If I had it all as nested dicts, lists and primitives, that would be straightforward to do. But I cannot load the file with yaml.load(), as I don't have the original Python sources and class definitions.
I have tried yaml.safe_load(), but that throws an exception.
The BaseLoader loads the file, so it is correct YAML, but the BaseLoader jumbles all primitive information (numbers, datetimes) together as strings.
Then I found How to deserialize an object with PyYAML using safe_load?, but since the file has over 100 different tags defined, the solution presented there is impractical.
Do I have to use some other tool to strip the !!tag definitions (there is at least one place where !! occurs inside a normal string) so that I can use safe_load? Is there a simpler way to solve this that I am not aware of?
If not, I will have to do some string parsing to get the types back, but I thought I'd ask here first.

There is no need to go the cumbersome route of adding any of the classes if you want to use safe_load() on such a file.
You should have gotten a ConstructorError thrown in SafeConstructor.construct_undefined() in constructor.py. That method is registered for the fall-through case None in the same file.
If you combine that info with the fact that all such tagged "classes" are mappings (and not lists or scalars), you can just copy the code for mappings into a new function and register that as the fall-through case:
import yaml
from yaml.constructor import SafeConstructor

def my_construct_undefined(self, node):
    # construct any node with an unknown tag as if it were an untagged mapping
    data = {}
    yield data
    value = self.construct_mapping(node)
    data.update(value)

# None is the fall-through case: no other constructor matched the tag
SafeConstructor.add_constructor(
    None, my_construct_undefined)

yaml_str = """\
- !!python/object:manufacturer.Manufacturer
  name: aaaa
  address: !!python/object:address.BusinessAddress {street: bbbb, number: 123, city: cccc}
"""

data = yaml.safe_load(yaml_str)
print(data)
should get you:
[{'name': 'aaaa', 'address': {'city': 'cccc', 'street': 'bbbb', 'number': 123}}]
without an exception thrown, and with "number" as an integer, not a string.
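The file in the question only has tagged mappings, so the above is all you need. If unknown tags ever showed up on sequence or scalar nodes as well, the same fall-through idea could dispatch on the node type; a minimal sketch, not something the posted file requires:

import yaml
from yaml.constructor import SafeConstructor
from yaml.nodes import MappingNode, SequenceNode

def my_construct_undefined(self, node):
    # dispatch on the node kind instead of assuming every unknown
    # tag sits on a mapping
    if isinstance(node, MappingNode):
        data = {}
        yield data
        data.update(self.construct_mapping(node))
    elif isinstance(node, SequenceNode):
        data = []
        yield data
        data.extend(self.construct_sequence(node))
    else:
        # scalar node: yield the plain scalar value (a string)
        yield self.construct_scalar(node)

SafeConstructor.add_constructor(None, my_construct_undefined)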

comment structure of yaml in ruamel.yaml

I'm trying to understand the comment structure in the ruamel.yaml library so that I can manipulate comments correctly. What I don't get is: why are the comments in a .ca attribute inside a 4-item list? What are those other items, and why are they always None?
For example, a comment attached to a sequence looks like this, with the first item a CommentToken and the rest None:
Comment(
  start=None,
  items={
    2: [CommentToken('\n\n########### Foo ###########\n', line: 294, col: 0), None, None, None]
  })
For a comment attached to a map, the token seems to always be placed in the third slot (index 2):
Comment(
  start=None,
  items={
    bar: [None, None, CommentToken('\n\n########## Bar ###########\n', line: 87, col: 0), None]
  })
What's the difference between these and what's the significance of the order of their placements?
It only seems to you that for a YAML mapping the comments are placed in the same position, because you tend to put the comments in the same place in your YAML file (i.e. at the end of the line, after a value). I did the same when starting ruamel.yaml back in 2014, but over the years issues reported against the package pointed out that some users (of YAML) tend to put their comments in different places (e.g. between a key and a value on the next line, something not originally handled at all).
The significance is where the representer, on dumping the data structure, tries to insert the comments back into the YAML output stream. The code to do that has evolved over the years to deal with comment placements that were originally not handled (i.e. comments were lost or replaced on round-trip).
There is no documented API for this; the code that creates the Comment instances is changed in tandem with the code that processes them, so in principle the meaning of any position might change, and those meanings actually have changed in the past. The list structure might also be replaced by a dict with keys that are more indicative of where the comment (stored in the corresponding value) came from/has to be inserted than bare indices. A dict could also do away with the None values indicating empty slots.
I can't guarantee that the source code documentation is up to date, but this is what it says:
# map key (mapping/omap/dict) or index (sequence/list) to a list of
# dict: post_key, pre_key, post_value, pre_value
# list: pre item, post item
IIRC some of these positions are no longer in use.
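A minimal round-trip sketch, assuming a 0.17.x ruamel.yaml (these are undocumented internals), shows where such a token lands for an end-of-line comment after a mapping value:

import ruamel.yaml

yaml = ruamel.yaml.YAML()  # round-trip mode, which preserves comments
data = yaml.load("""\
a: 1  # comment after the value of a
b: 2
""")
# .ca.items maps the key to the 4-item list discussed above; the
# end-of-line comment ends up in the third slot (index 2)
print(data.ca.items)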
You should pin the version number of ruamel.yaml you work with. ruamel.yaml follows semantic versioning, but there is no API for handling comments and it is pre version 1.0, so anything can change at any time. However, the minor number tends to be bumped on major (internal) changes or on dropping support for no longer maintained Python versions. So stick with ruamel.yaml<0.18 if you get your code to work with 0.17.21, and test extensively whether 0.18 and later still do what you want.
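In a requirements.txt, such a pin could look like:

ruamel.yaml>=0.17.21,<0.18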
An API for handling comments will be forthcoming at some point. Apart from dealing with more (exotic) comment placements, it should also have a way to specify how multi-line comments need to be handled, so they are not necessarily attached to the preceding node as they are now, but can be assigned to the following node, or split between the two nodes according to some rule (e.g. at the first empty line).

In a quiz, how can I let the user answer with one of the three response words instead of "Say A, B or C"?

Via Actions Console, not Dialogflow!
After several days, I finally finished creating a quiz that works like this.
Google Mini says: "What is the capital of France? A) Rome, B) Berlin or C) Paris ?"
In my scene I have two conditions:
scene.slots.status == "FINAL" && intent.params.choosenABC.original == session.params.antwort
AND
!(scene.slots.status == "FINAL" && intent.params.choosenABC.original == session.params.antwort)
So these conditions check whether the user says the correct letter, which comes from the session parameter "antwort".
Everything works smoothly as long as the user says "A", "B" or "C".
But how can I compare a condition against the words the user actually says?
In the above example I want the user to be able to say "Rome", "Berlin" or "Paris", and the condition to check those answers.
Thanks in advance!
You have a number of questions packed in there, so let's look at each.
Does intent.params.original exist?
In short, yes. Look at the documentation of the request's Intent object and you'll see that there is intent.params.*name*.original. Your question seems to suggest this works as well.
There is also intent.params.*name*.resolved, which contains the value after you take type aliases into account.
I found some variables on a Dialogflow forum...
Those only work if you're using Dialogflow and don't make any sense when you're looking at Actions Builder.
How to match
You don't show the possible values of session.params.antwort or how you're setting antwort, but it sounds like you're setting it in a handler. So one thing you could do is set antwort to the city name (or whatever the full-word answer is) and set letter to the letter of the valid reply. Then test both against original to see if there is a match.
But, to be honest, that starts getting somewhat messy.
You also don't indicate how the Intent is set up, or whether you're using an Entity Type to capture the answer. One great way to handle this, however, is to create a Type that represents the answers, and use a runtime type override to set the possible values and aliases for each value. Then you can control exactly which valid value you will compare against.
For example, if you create a type named "Answer", then in your fulfillment, when you ask the question, you can set its possible values with something like:
conv.session.typeOverrides = [{
  name: 'Answer',
  mode: 'TYPE_REPLACE',
  synonym: {
    entries: [
      {
        name: 'A',
        synonyms: ['A', 'Rome']
      },
      {
        name: 'B',
        synonyms: ['B', 'Berlin']
      },
      {
        name: 'C',
        synonyms: ['C', 'Paris']
      }
    ]
  }
}];
If you then have an Intent with a parameter named answer of type Answer, you can test whether intent.params.answer.resolved contains the expected letter.
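For example, the success condition from the question could then become (a sketch, assuming session.params.antwort stores the letter naming the correct entry):
scene.slots.status == "FINAL" && intent.params.answer.resolved == session.params.antwort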
Adding a visual interface
Runtime type overrides are particularly useful if you also decide to add support for a visual selection response, such as a list. The visual response builds on the runtime type override to add visual aliases that users can select on appropriate devices. When you get the reply, however, it is treated as if they had said the entry name.

Is there a way in play-json to read a key/value map, remembering the original ordering?

Given:
{
  ...
  "fruits": {
    "apple": { ... },
    "banana": { ... },
    "cherry": { ... },
    ...
    "watermelon": { ... }
  }
}
Is there a way in Scala to read this JSON map fruits of String -> object while remembering that the ordering of the keys was originally apple, banana, cherry ... watermelon?
[below added because I was asked why I wanted this and to provide a test example]
Normally I wouldn't care about the ordering (of a Map). I do not control the format of the input; it is a Map, not an Array. The real input is not fruits, it is alerts with alphanumeric keys; I just picked fruit names for simplicity. I am building test files based on the data. Suppose there were ten items in the first file, and I deleted "watermelon" (the 10th object) from the second file. The code that read the first file put the objects into a database. When it processes the objects (alerts), each produces an action. A test result is an EventAction(id: Long, action: String). The id is an auto-increment Long from the database; I do not control that. After processing the first file, it turns out that the alert associated with "watermelon" was created with an id of 2, not 10. When I'm building my test for the processing of the second file (the one without "watermelon"), if I think the id will be 10, the test will fail not because I predicted the action incorrectly, but because I didn't know the id would be 2 instead of 10.
One way of dealing with a Map ("you shouldn't care about ordering, so you won't get any clues as to the original ordering in the JSON file") is to build ad-hoc SQL to find out what database id was created for each key, just for the tests. Before I write ad-hoc SQL (the company normally asks that all DB interactions go through stored procedures written by a DBA), I thought, "Wouldn't it be neat, at the time of reading the JSON, to remember the ordering in the moment, before it is lost."
Play-json (at least 2.6.9) parses a JS object from top to bottom, collecting all fields into content: ListBuffer[(String, JsValue)]. When all fields in the object have been parsed, the JsObject is instantiated via the apply method:
def apply(fields: Seq[(String, JsValue)]): JsObject = new JsObject(mutable.LinkedHashMap(fields: _*))
So the answer to your question: if you use the default Reads for Map from the library, the ordering is already kept, because the underlying mutable.LinkedHashMap preserves insertion order.

How can I Count the Number of Rows of the JSON File?

My JSON file below contains six rows:
[
{"events":[[{"v":"INPUT","n":"type"},{"v":"2016-08-24 14:23:12 EST","n":"est"}]],
"apps":[],
"agent":{"calls":[],"info":[{"v":"7990994","n":"agentid"},{"v":"7999994","n":"stationid"}]},
"header":[{"v":"TUSTX002LKVT1JN","n":"host"},{"v":"192.168.1.18","n":"ip"},{"v":"V740723","n":"vzid"},{"v":"16.3.16.0","n":"version"},{"v":"12","n":"cpu"},{"v":"154665","n":"seq"},{"v":"2016-08-24 14:23:17 EST","n":"est"}]
},
{"events":[[{"v":"INPUT","n":"type"},{"v":"2016-08-24 14:23:14 EST","n":"est"}]],"apps":[],"agent":{"calls":[],"info":[{"v":"7990994","n":"agentid"},{"v":"7999994","n":"stationid"}]},"header":[{"v":"TUSTX002LKVT1JN","n":"host"},{"v":"192.168.1.18","n":"ip"},{"v":"V740723","n":"vzid"},{"v":"16.3.16.0","n":"version"},{"v":"5","n":"cpu"},{"v":"154666","n":"seq"},{"v":"2016-08-24 14:23:23 EST","n":"est"}]},
{"events":[[{"v":"LOGOFF","n":"type"},{"v":"2016-08-24 14:24:04 EST","n":"est"}]],"apps":[],"agent":{"calls":[],"info":[{"v":"7990994","n":"agentid"},{"v":"7999994","n":"stationid"}]},"header":[{"v":"TUSTX002LKVT1JN","n":"host"},{"v":"192.168.1.18","n":"ip"},{"v":"V740723","n":"vzid"},{"v":"16.3.16.0","n":"version"},{"v":"0","n":"cpu"},{"v":"154667","n":"seq"},{"v":"2016-08-24 14:24:05 EST","n":"est"}]},
{"events":[],"apps":[[{"v":"ccSvcHst","n":"pname"},{"v":"7704","n":"pid"},{"v":"Old Virus Definition File","n":"title"},{"v":"O","n":"state"},{"v":"5376","n":"mem"},{"v":"0","n":"cpu"}]],"agent":{"calls":[],"info":[{"v":"7990994","n":"agentid"},{"v":"7999994","n":"stationid"}]},"header":[{"v":"TUSTX002LKVT1JN","n":"host"},{"v":"192.168.0.5","n":"ip"},{"v":"V740723","n":"vzid"},{"v":"16.3.16.0","n":"version"},{"v":"29","n":"cpu"},{"v":"154668","n":"seq"},{"v":"2016-09-25 16:57:24 EST","n":"est"}]},
{"events":[],"apps":[[{"v":"ccSvcHst","n":"pname"},{"v":"7704","n":"pid"},{"v":"Old Virus Definition File","n":"title"},{"v":"F","n":"state"},{"v":"5588","n":"mem"},{"v":"0","n":"cpu"}]],"agent":{"calls":[],"info":[{"v":"7990994","n":"agentid"},{"v":"7999994","n":"stationid"}]},"header":[{"v":"TUSTX002LKVT1JN","n":"host"},{"v":"192.168.0.5","n":"ip"},{"v":"V740723","n":"vzid"},{"v":"16.3.16.0","n":"version"},{"v":"16","n":"cpu"},{"v":"154669","n":"seq"},{"v":"2016-09-25 16:57:30 EST","n":"est"}]},
{"events":[],"apps":[[{"v":"ccSvcHst","n":"pname"},{"v":"7704","n":"pid"},{"v":"Old Virus Definition File","n":"title"},{"v":"F","n":"state"},{"v":"5588","n":"mem"},{"v":"0","n":"cpu"}]],"agent":{"calls":[],"info":[{"v":"7990994","n":"agentid"},{"v":"7999994","n":"stationid"}]},"header":[{"v":"TUSTX002LKVT1JN","n":"host"},{"v":"192.168.0.5","n":"ip"},{"v":"V740723","n":"vzid"},{"v":"16.3.16.0","n":"version"},{"v":"17","n":"cpu"},{"v":"154670","n":"seq"},{"v":"2016-09-25 16:57:36 EST","n":"est"}]}
]
Loaded, the JSON looks like six records, indexed 0 through 5.
Required output:
  Count
  6
OK, you are in Spark, and you need to turn your JSON into a Dataset and use the appropriate operation on it. So here I wrote the general workflow for going from JSON to a Dataset, with the required steps and examples. I think this way of answering is more beneficial, because you can see the steps and then decide what to do with the information.
Input Data: you have the JSON; that is the data you should start working on. Then you need to decide which fields are important. Counting on its own is the small part of most cases, and you don't want to load all the fields that may not be necessary.
Create a Case Class: you can use case classes because then you can deserialize your input data. To keep it simple, I have a doctor who belongs to a department, and I get the data as JSON. I could have the following case classes:
case class Department(name: String, address: String)
case class Doctor(name: String, department: Department)
As you can see from the above code, I go bottom-up to create the data I want to work on. In your JSON there are loads of fields (e.g. v) whose meaning I can't tell, so be careful not to mix them up.
Have a Dataset: OK, the code below deserializes the JSON into the case class we defined:
spark.read.json("doctorsData.json").as[Doctor]
A couple of points: spark is a SparkSession, which you need to create; here its instance is named spark, but it could be anything. You also need to import spark.implicits._.
In Business!: OK, now you are in business, in the Spark world. It is just a matter of using count() on your Dataset. The following method shows how to count it:
def recordsCount(myDataset: Dataset[Doctor]): Long = myDataset.count()
For a file of three records that I have - correctly formatted, Spark 2.x - reading into a DataFrame / Dataset:
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._

val df = spark.read
  .option("multiLine", true)
  .option("mode", "PERMISSIVE")
  .option("inferSchema", true)
  .json("/FileStore/tables/json_01.txt")

df.select("*").show(false)
df.printSchema()
df.count()
If you just want a total tally count, then the last line will suffice:
res15: Long = 3

Apply Command to String-type custom fields with YouTrack REST API

Thanks for looking!
I have an instance of YouTrack with several custom fields, some of which are String-type. I'm implementing a module to create a new issue via the YouTrack REST API's PUT request, and then update its fields with user-submitted values by applying commands. This works great, most of the time.
I know that I can apply multiple commands to an issue at the same time by concatenating them into the query string, like so:
Type Bug Priority Critical add Fix versions 5.1 tag regression
will result in
Type: Bug
Priority: Critical
Fix versions: 5.1
in their respective fields (as well as adding the regression tag). But, if I try to do the same thing with multiple String-type custom fields, then:
Foo something Example Something else Bar P0001
results in
Foo: something Example Something else Bar P0001
Example:
Bar:
The command only applies to the first field, and the rest of the query string is treated as its String value. I can apply the command individually for each field, but is there an easier way to combine these requests?
Thanks again!
This is an expected result, because the whole string after Foo is considered the value of this field; spaces are also valid symbols for string custom fields.
If you try to apply this command via the command window in the UI, you will actually see the same result.
Such a good question.
I encountered the same issue and have spent an unhealthy amount of time in frustration.
Using the command window from the YouTrack UI, I noticed it leaves trailing quotation marks, and I was unable to find anything in the documentation that discussed finalizing or identifying the end of a string value. I was also unable to find any mention of setting string field values in the command reference, grammar documentation or examples.
For my solution I am using Python with the requests and urllib modules, though I expect you could port the solution to any language.
The REST API will accept explicitly quoted strings in the POST:
import requests
import urllib
from collections import OrderedDict

URL = 'http://youtrack.your.address:8000/rest/issue/{issue}/execute?'.format(issue='TEST-1234')

# Build the OrderedDict from a list of pairs: an OrderedDict built from
# a plain dict literal would not preserve the order the fields are
# written in.
params = OrderedDict([
    ('State', 'New'),
    ('Priority', 'Critical'),
    # wrap string-type field values in explicit double quotes so the
    # command parser knows where each value ends
    ('String Field', '"Message to submit"'),
    ('Other Details', '"Fold the toilet paper to a point when you are finished."'),
])

str_cmd = ' '.join(' '.join([k, v]) for k, v in params.items())
# Python 2 urllib; on Python 3 this is urllib.parse.urlencode
command_url = URL + urllib.urlencode({'command': str_cmd})
result = requests.post(command_url)

# The resulting command URL:
# http://youtrack.your.address:8000/rest/issue/TEST-1234/execute?command=State+New+Priority+Critical+String+Field+%22Message+to+submit%22+Other+Details+%22Fold+the+toilet+paper+to+a+point+when+you+are+finished.%22
I'm sad to see this one go unanswered for so long. - Hope this helps!
edit:
After continuing my work, I have concluded that sending all the field updates as a single POST is marginally better for the YouTrack server, but requires more effort than it's worth, because you have to:
1) know which fields in the issues are string values,
2) pre-process all the string values into quoted string literals, and
3) accept that if just one field update in the combined request is missing, fails to set, or has an unexpected value, then the entire request fails and you potentially lose all the other information.
I wish the YouTrack documentation had some mention or discussion of these considerations.
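If you do want that per-field isolation, a minimal sketch reusing URL and params from the script above (same Python 2 urllib assumption) sends one execute request per field, so a single bad field cannot discard the rest:

for field, value in params.items():
    # one command per request: a failure here only affects this field
    cmd = urllib.urlencode({'command': ' '.join([field, value])})
    resp = requests.post(URL + cmd)
    resp.raise_for_status()  # surfaces which field failed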