comment structure of yaml in ruamel.yaml - ruamel.yaml

I'm trying to understand the comment structure in ruamel.yaml library so that I can manipulate the comments correctly. What I don't get is why the comments in a .ca are inside a 4 items list? What are those other items and why they are always None?
For example, a comment attached to a sequence is like this with the first being CommentToken and the rest are Nones:
Comment(
start=None,
items={
2: [CommentToken('\n\n########### Foo ###########\n', line: 294, col: 0), None, None, None]
})
For a comment attached to a map, it seems to always be placed at the 3rd index?
Comment(
start=None,
items={
bar: [None, None, CommentToken('\n\n########## Bar ###########\n', line: 87, col: 0), None]
})
What's the difference between these and what's the significance of the order of their placements?

It only seems to you that for a YAML mapping the comments are placed in the same position, because you tend put the comments in the same place in your YAML file (i.e. at the end of the line after a value). I did the same when starting ruamel.yaml back in 2014, but over the years issues reported towards package pointed out that some users (of YAML) tend to put their comments in different places (e.g. between a key and a value on the next line, something not originally handled at all).
The significance is where the representer, on dumping the data structure, tries to insert the comments back into the YAML output stream. The code to do that has evolved over the last years to deal with more comment placements that were originally not handled (i.e. comments were lost or replaced on round-trip).
There is no documented API for this, the code that creates the Comment instances is changed in tandem with the code that processes them, so in principle the meaning of any position might change, and those meanings actually has changed in the past. That list structure might also be replaced by a dict with keys that are more indicative of where the comment (stored int he corresponding value) came from/has to be inserted in, than indices. A dict could also do away with the None values indicating empty slots.
I can't guarantee that the source code documentation is up to date but this is what it says:
# map key (mapping/omap/dict) or index (sequence/list) to a list of
# dict: post_key, pre_key, post_value, pre_value
# list: pre item, post item
IIRC some of these positions are no longer in use.
You should pin the version number of ruamel.yaml you work with. ruamel.yaml follows semantic versioning, but it there is no API for handling comment and it is pre version 1.0, so anything can change at any time. However the minor number tends to be bumped on major (internal) changes or on dropping support for no longer maintained Python versions. So stick with ruamel.yaml<0.18 if you get your code to work with 0.17.21, and test extensively if 0.18 and later will still do what you want.
An API for handling comments will at some point be forthcoming, and apart from dealing with more (exotic) comment placements, also have a way to specify how multi-line comments needs to be handled so they don't necessarily are attached to the preceding node like they are now, but can be assigned to the following node or split between those nodes according to some rule (e.g. first empty line).

Related

Mozilla Deep Speech SST suddenly can't spell

I am using deep speech for speech to text. Up to 0.8.1, when I ran transcriptions like:
byte_encoding = subprocess.check_output(
"deepspeech --model deepspeech-0.8.1-models.pbmm --scorer deepspeech-0.8.1-models.scorer --audio audio/2830-3980-0043.wav", shell=True)
transcription = byte_encoding.decode("utf-8").rstrip("\n")
I would get back results that were pretty good. But since 0.8.2, where the scorer argument was removed, my results are just rife with misspellings that make me think I am now getting a character level model where I used to get a word-level model. The errors are in a direction that looks like the model isn't correctly specified somehow.
Now I when I call:
byte_encoding = subprocess.check_output(
['deepspeech', '--model', 'deepspeech-0.8.2-models.pbmm', '--audio', myfile])
transcription = byte_encoding.decode("utf-8").rstrip("\n")
I now see errors like
endless -> "endules"
service -> "servic"
legacy -> "legaci"
earning -> "erting"
before -> "befir"
I'm not 100% that it is related to removing the scorer from the API, but it is one thing I see changing between releases, and the documentation suggested accuracy improvements in particular.
Short: The scorer matches letter output from the audio to actual words. You shouldn't leave it out.
Long: If you leave out the scorer argument, you won't be able to detect real world sentences as it matches the output from the acoustic model to words and word combinations present in the textual language model that is part of the scorer. And bear in mind that each scorer has specific lm_alpha and lm_beta values that make the search even more accurate.
The 0.8.2 version should be able to take the scorer argument. Otherwise update to 0.9.0, which has it as well. Maybe your environment is changed in a way. I would start in a new dir and venv.
Assuming you are using Python, you could add this to your code:
ds.enableExternalScorer(args.scorer)
ds.setScorerAlphaBeta(args.lm_alpha, args.lm_beta)
And check the example script.

how do duckduckgo spice IA secondary API calls get their parameters?

I have been looking through the spice instant answer source code. Yes, I know it is in maintenance mode, but I am still curious.
The documentation makes it fairly clear that the primary spice to API gets its numerical parameters $1, $2, etc. from the handle function.
My question: should there be secondary API calls included with spice alt_to as, say, in the movie spice IA, where do the numerical parameters to that API call come from?
Note, for instance, the $1 in both the movie_image and cast_image secondary API calls in spice alt_to at the preceding link. I am asking which regex capture returns those instances of $1.
I believe I see how this works now. The flow of information is still a bit murky to me, but at least I see how all of the requisite information is there.
I'll take the cryptocurrency instant answer as an example. The alt_to element in the perl package file at that link has a key named cryptonator. The corresponding .js file constructs a matching endpoint:
var endpoint = "/js/spice/cryptonator/" + from + "/" + to;
Note the general shape of the "remainder" past /js/spice/cryptonator: from/to, where from and to will be two strings.
Back in the perl package the hash alt_to->{cryptonator} has a key from which receives, I think, this remainder from/to. The value corresponding to that key is a regex meant to split up that string into its two constituents:
from => '([^/]+)/([^/]*)'
Applied to from/to, that regex will return $1=from and $2=to. These, then, are the $1 and $2 that go into
to => 'https://api.cryptonator.com/api/full/$1-$2'
in alt_to.
In short:
The to field of alt_to->{blah} receives its numerical parameters by having the from regex operate on the remainder past /js/spice/blah/ of the name of the corresponding endpoint constructed in the relevant .js file.

Apply Command to String-type custom fields with YouTrack Rest API

and thanks for looking!
I have an instance of YouTrack with several custom fields, some of which are String-type. I'm implementing a module to create a new issue via the YouTrack REST API's PUT request, and then updating its fields with user-submitted values by applying commands. This works great---most of the time.
I know that I can apply multiple commands to an issue at the same time by concatenating them into the query string, like so:
Type Bug Priority Critical add Fix versions 5.1 tag regression
will result in
Type: Bug
Priority: Critical
Fix versions: 5.1
in their respective fields (as well as adding the regression tag). But, if I try to do the same thing with multiple String-type custom fields, then:
Foo something Example Something else Bar P0001
results in
Foo: something Example Something else Bar P0001
Example:
Bar:
The command only applies to the first field, and the rest of the query string is treated like its String value. I can apply the command individually for each field, but is there an easier way to combine these requests?
Thanks again!
This is an expected result because all string after foo is considered a value of this field, and spaces are also valid symbols for string custom fields.
If you try to apply this command via command window in the UI, you will actually see the same result.
Such a good question.
I encountered the same issue and have spent an unhealthy amount of time in frustration.
Using the command window from the YouTrack UI I noticed it leaves trailing quotations and I was unable to find anything in the documentation which discussed finalizing or identifying the end of a string value. I was also unable to find any mention of setting string field values in the command reference, grammer documentation or examples.
For my solution I am using Python with the requests and urllib modules. - Though I expect you could turn the solution to any language.
The rest API will accept explicit strings in the POST
import requests
import urllib
from collections import OrderedDict
URL = 'http://youtrack.your.address:8000/rest/issue/{issue}/execute?'.format(issue='TEST-1234')
params = OrderedDict({
'State': 'New',
'Priority': 'Critical',
'String Field': '"Message to submit"',
'Other Details': '"Fold the toilet paper to a point when you are finished."'
})
str_cmd = ' '.join(' '.join([k, v]) for k, v in params.items())
command_url = URL + urllib.urlencode({'command':str_cmd})
result = requests.post(command_url)
# The command result:
# http://youtrack.your.address:8000/rest/issue/TEST-1234/execute?command=Priority+Critical+State+New+String+Field+%22Message+to+submit%22+Other+Details+%22Fold+the+toilet+paper+to+a+point+when+you+are+finished.%22
I'm sad to see this one go unanswered for so long. - Hope this helps!
edit:
After continuing my work, I have concluded that sending all the field
updates as a single POST is marginally better for the YouTrack
server, but requires more effort than it's worth to:
1) know all fields in the Issues which are string values
2) pre-process all the string values into string literals
3) If you were to send all your field updates as a single request and just one of them was missing, failed to set, or was an unexpected value, then the entire request will fail and you potentially lose all the other information.
I wish the YouTrack documentation had some mention or discussion of
these considerations.

Why does Google Docs operational transformation err on the side of deletion?

Tried out this experiment today: opened two offline editors for a Google document. In one, I bolded the first word. In the second, I deleted it. Regardless of which client I turn on first, the word always ends up deleted.
First off, why is this the case - my understanding of operational transformation is that ordering matters? In the simple example of two people typing "a" and "b" respectively, if the server receives "a" first, it will enforce the output of "ab" by transforming the second person's "b" event into a "pass one space, then add b" event, and vice versa.
Secondly, if ordering doesn't matter, are there technical reasons as to why Google Docs has chosen to err on the side of deletion? Or are the reasons largely simplicity for users?
Here is (5 years later I know) a graphical explanation of what why this happens. This is, in fact, what #osma describes but graphically explained:
When you bold a string in GDocs you are wrapping the string into a container, presumably <strong></strong> but they may use any other syntax. For simplicity lets just say that bold'ing a string just requires a "+" at the beginning of the word. So that, for simplicity, the text "lorem ipsum" would become lorem +ipsum and not lorem <strong>ipsum<strong>
1
Both Alice and Bob start with the text "Lorem ipsum"
2
Bob then deletes "ipsum". Notice that he sends the changeset retain(6), delete(5) to the server. A changeset is essentially a patch, Google probably used this library.
3
Now Alice bolds "ipsum" (adding "+"). She sends is the changeset retain(6), insert(+), retain(5)
4
Both changesets are traveling to the server. The server knows nothing about these sets yet.
5
Assuming the worst scenario: Bob's package arrives first and then the word will be deleted. The other scenario is obvious.
6
When Alice's package arrives, it will only add a "+" to the text because what she sent is only a single changeset.
7
Both texts are then broadcasted to the clients. This is the first one.
8
And this is the second one.
9
After patching these changesets into the original text you end up with "Lorem +". The server and all clients now have the same text. The + symbol would later be erased by an common HTML clean process which eliminates empty tags like <tag></tag>,
To test this demo go to: http://operational-transformation.github.io/visualization.html. There you can play with the texts and packages as they are sent/received.
It's not a question of erring on the side of deletion.
In cases where both clients have equality valid but differing versions of truth, Google Docs must elect to uphold one version, or else force users to resolve conflicts, something that is inherently complicated and hard to explain.
Thus, "truth" for Google Docs is consistency of the document, not discernment of intent. And consistency is best more easily achieved through destruction of information - a sort of tendency to entropy.
All this is basically my semi-philosophical BS though...
OT does not try to discern intent, it applies transformations in an order which produces a consistent result. When you apply both of those changes to a document, it does not matter which order you apply them in.
"first second" -> "first second" -> "first"
"first second" -> "first" -> "first"
In the second stream, the bold operation is performed on a zero-length string.
This is the exact same result you would get if in one of those documents you had italicized the second word: the end result would be "first second" regardless of transformation order. Delete transformation is no different.

Asp.Net MVC 2: How exactly does a view model bind back to the model upon post back?

Sorry for the length, but a picture is worth 1000 words:
In ASP.NET MVC 2, the input form field "name" attribute must contain exactly the syntax below that you would use to reference the object in C# in order to bind it back to the object upon post back. That said, if you have an object like the following where it contains multiple Orders having multiple OrderLines, the names would look and work well like this (case sensitive):
This works:
Order[0].id
Order[0].orderDate
Order[0].Customer.name
Order[0].Customer.Address
Order[0].OrderLine[0].itemID // first order line
Order[0].OrderLine[0].description
Order[0].OrderLine[0].qty
Order[0].OrderLine[0].price
Order[0].OrderLine[1].itemID // second order line, same names
Order[0].OrderLine[1].description
Order[0].OrderLine[1].qty
Order[0].OrderLine[1].price
However we want to add order lines and remove order lines at the client browser. Apparently, the indexes must start at zero and contain every consecutive index number to N.
The black belt ninja Phil Haack's blog entry here explains how to remove the [0] index, have duplicate names, and let MVC auto-enumerate duplicate names with the [0] notation. However, I have failed to get this to bind back using a nested object:
This fails:
Order.id // Duplicate names should enumerate at 0 .. N
Order.orderDate
Order.Customer.name
Order.Customer.Address
Order.OrderLine.itemID // And likewise for nested properties?
Order.OrderLine.description
Order.OrderLine.qty
Order.OrderLine.price
Order.OrderLine.itemID
Order.OrderLine.description
Order.OrderLine.qty
Order.OrderLine.price
I haven't found any advice out there yet that describes how this works for binding back nested ViewModels on post. Any links to existing code examples or strict examples on the exact names necessary to do nested binding with ILists?
Steve Sanderson has code that does this sort of thing here, but we cannot seem to get this to bind back to nested objects. Anything not having the [0]..[n] AND being consecutive in numbering simply drops off of the return object.
Any ideas?
We found a work around, by using the following:
Html.EditorFor(m => m, "ViewNameToUse", "FieldPrefix")
Where FieldPrefix is the "object[0]". This is hardly ideal, but it certainly works pretty well. It's simple and elegant.