Identifying .patch files - diff

Here is the problem. I am writing a piece of software that query bug attachments from a Bug Tracking System. I am able to filter the query by only getting text attachment (plain/text etc) and I want to keep only valid patch files (files that have a similar diff -u output that can be applied as a patch to a source file). So I need a way to specify which attachment is a valid patch. For example:
let say I have this attachment with the following content:
Index: compiler/cpp/src/generate/t_csharp_generator.cc
--- compiler/cpp/src/generate/t_csharp_generator.cc (revision 1033689)
+++ compiler/cpp/src/generate/t_csharp_generator.cc (working copy)
## -456,7 +456,7 ##
t = ((t_typedef*)t)->get_type();
}
if ((*m_iter)->get_value() != NULL) {
- print_const_value(out, "this." + (*m_iter)->get_name(), t, (*m_iter)->get_value(), true, true);
+ print_const_value(out, "this._" + (*m_iter)->get_name(), t, (*m_iter)->get_value(), true, true);
}
}
How can I know this is a valid patch? Is there a regex to match some possible diff -u outputs so I can write something like this in java:
String attachmentContent = .....
if(attachmentContent.matches(regex))...
Thank you in advance,
Elvis

You won't be able to test with a simple Regex, nor a complex by the way, it would require a regular expression engine which is able to interpret the portions between ## as numbers, and define repetition counts from that, I don't know a RE engine which does that.
On the other hand you should not have too much problems finding a library parsing and loading Unix patches (and this is definitely not a compute-intensive task), so simply trying to load the patch would allow you to validate it. e.g. Java diff utils should do that straight forward.

Related

SNMPTraps: pysnmp and "disableAuthorization"

I have written an SNMP trap receiver but currently I have to hardcode all the SNMPv2 community strings to be able to receive the traps.
How do I emulate the 'disableAuthorization' functionality from snmptrapd in pysnmp?
I have tried to not set the community string: config.addV1System(snmpEngine, 'my-area') but this errors about a missing param. I have also tried an empty string: config.addV1System(snmpEngine, 'my-area', '') but this stops all traps being processed.
What is the best way to allow receiving all traps through pysnmp regardless of the community string they were sent with? I haven't found anything in the pysnmp docs that could help me
I had made progress on setting up an observer for V1/2 (V3 to be added later) that picked up on notifications with an unknown community string and then called addV1System on the fly to dynamically add it in, like so:
When setting up the transportDispatcher:
snmpEngine.observer.registerObserver(_handle_unauthenticated_snmptrap,
"rfc2576.prepareDataElements:sm-failure", "rfc3412.prepareDataElements:sm-failure")
And then:
def _handle_unauthenticated_snmptrap(snmpEngine, execpoint, variables, cbCtx):
if variables["securityLevel"] in [ 1, 2 ] and variables["statusInformation"]["errorIndication"] == errind.unknownCommunityName:
new_comm_string = "%s" % variables["statusInformation"].get("communityName", "")
config.addV1System(my_snmpEngine, 'my-area', new_comm_string)
return
else:
msg = "%s" % variables["statusInformation"]
print(f"Trap: { msg }")
However, this will always throw away the first trap received while adding any new community string (and then there is the problem whereby when the daemon is restarted the updated list of community strings is lost).
In looking to improve this I then found hidden away in the docs this little gem:
https://pysnmp.readthedocs.io/en/latest/examples/v3arch/asyncore/manager/ntfrcv/advanced-topics.html#serve-snmp-community-names-defined-by-regexp
This example receives a notification and rewrites the community string into 'public' so all traps will be correctly received.
In essence, when setting up the transportDispatcher:
my_snmpEngine.observer.registerObserver(_trap_observer,
'rfc2576.processIncomingMsg:writable', cbCtx='public')
And then:
def _trap_observer(snmpEngine, execpoint, variables, community_string):
variables['communityName'] = variables['communityName'].clone(community_string)

Simple `to_tsvector` configuration - postgres

How can I change the to_tsvector configuration to use a simple tokenization rule like:
lowercase
split by spaces only
Executing the following query:
SELECT to_tsvector('english', 'birthday=19770531 Name=John-Oliver Age=44 Code=AAA-345')
I get these lexemes:
'-345':9 '19770531':2 '44':6 'aaa':8 'age':5 'birthday':1 'code':7 'john':4 'name':3
The kind of searching I'm looking for is like:
(!birthday | birthday=19770531) & (code=AAA-345)
It means, get me all records that has a text "birthday=19770531" or doesn't have "birthday" at all, and a text equals to "code=AAA-345"). The way lexemes are being created it is not possible. I was expecting to have something like this:
'birthday=19770531':1 'age=44':2 'code=aaa-345':4 'name=john-oliver':3
You would have to code a custom parser. This can only be done in C.
But you might be able to use the existing testing parser test_parser, it seems to do what you want. If not, it would at least be a good starting point.
The problem may be that this is in src/test/modules/, and I don't think it ships with most installation packaging. So it might take some effort to get it to install. It would depend on your OS, version, and package manager.

Mozilla Deep Speech SST suddenly can't spell

I am using deep speech for speech to text. Up to 0.8.1, when I ran transcriptions like:
byte_encoding = subprocess.check_output(
"deepspeech --model deepspeech-0.8.1-models.pbmm --scorer deepspeech-0.8.1-models.scorer --audio audio/2830-3980-0043.wav", shell=True)
transcription = byte_encoding.decode("utf-8").rstrip("\n")
I would get back results that were pretty good. But since 0.8.2, where the scorer argument was removed, my results are just rife with misspellings that make me think I am now getting a character level model where I used to get a word-level model. The errors are in a direction that looks like the model isn't correctly specified somehow.
Now I when I call:
byte_encoding = subprocess.check_output(
['deepspeech', '--model', 'deepspeech-0.8.2-models.pbmm', '--audio', myfile])
transcription = byte_encoding.decode("utf-8").rstrip("\n")
I now see errors like
endless -> "endules"
service -> "servic"
legacy -> "legaci"
earning -> "erting"
before -> "befir"
I'm not 100% that it is related to removing the scorer from the API, but it is one thing I see changing between releases, and the documentation suggested accuracy improvements in particular.
Short: The scorer matches letter output from the audio to actual words. You shouldn't leave it out.
Long: If you leave out the scorer argument, you won't be able to detect real world sentences as it matches the output from the acoustic model to words and word combinations present in the textual language model that is part of the scorer. And bear in mind that each scorer has specific lm_alpha and lm_beta values that make the search even more accurate.
The 0.8.2 version should be able to take the scorer argument. Otherwise update to 0.9.0, which has it as well. Maybe your environment is changed in a way. I would start in a new dir and venv.
Assuming you are using Python, you could add this to your code:
ds.enableExternalScorer(args.scorer)
ds.setScorerAlphaBeta(args.lm_alpha, args.lm_beta)
And check the example script.

Apply Command to String-type custom fields with YouTrack Rest API

and thanks for looking!
I have an instance of YouTrack with several custom fields, some of which are String-type. I'm implementing a module to create a new issue via the YouTrack REST API's PUT request, and then updating its fields with user-submitted values by applying commands. This works great---most of the time.
I know that I can apply multiple commands to an issue at the same time by concatenating them into the query string, like so:
Type Bug Priority Critical add Fix versions 5.1 tag regression
will result in
Type: Bug
Priority: Critical
Fix versions: 5.1
in their respective fields (as well as adding the regression tag). But, if I try to do the same thing with multiple String-type custom fields, then:
Foo something Example Something else Bar P0001
results in
Foo: something Example Something else Bar P0001
Example:
Bar:
The command only applies to the first field, and the rest of the query string is treated like its String value. I can apply the command individually for each field, but is there an easier way to combine these requests?
Thanks again!
This is an expected result because all string after foo is considered a value of this field, and spaces are also valid symbols for string custom fields.
If you try to apply this command via command window in the UI, you will actually see the same result.
Such a good question.
I encountered the same issue and have spent an unhealthy amount of time in frustration.
Using the command window from the YouTrack UI I noticed it leaves trailing quotations and I was unable to find anything in the documentation which discussed finalizing or identifying the end of a string value. I was also unable to find any mention of setting string field values in the command reference, grammer documentation or examples.
For my solution I am using Python with the requests and urllib modules. - Though I expect you could turn the solution to any language.
The rest API will accept explicit strings in the POST
import requests
import urllib
from collections import OrderedDict
URL = 'http://youtrack.your.address:8000/rest/issue/{issue}/execute?'.format(issue='TEST-1234')
params = OrderedDict({
'State': 'New',
'Priority': 'Critical',
'String Field': '"Message to submit"',
'Other Details': '"Fold the toilet paper to a point when you are finished."'
})
str_cmd = ' '.join(' '.join([k, v]) for k, v in params.items())
command_url = URL + urllib.urlencode({'command':str_cmd})
result = requests.post(command_url)
# The command result:
# http://youtrack.your.address:8000/rest/issue/TEST-1234/execute?command=Priority+Critical+State+New+String+Field+%22Message+to+submit%22+Other+Details+%22Fold+the+toilet+paper+to+a+point+when+you+are+finished.%22
I'm sad to see this one go unanswered for so long. - Hope this helps!
edit:
After continuing my work, I have concluded that sending all the field
updates as a single POST is marginally better for the YouTrack
server, but requires more effort than it's worth to:
1) know all fields in the Issues which are string values
2) pre-process all the string values into string literals
3) If you were to send all your field updates as a single request and just one of them was missing, failed to set, or was an unexpected value, then the entire request will fail and you potentially lose all the other information.
I wish the YouTrack documentation had some mention or discussion of
these considerations.

How to save variable after closing mIRC?

I'm new to do this language and i'm trying to code my own bot. I alredy got the basics and manage to use variables and aliases, however i was looking forward to do a mini-game in my chat in which you could have your own pet, name it and level it up.
I could do all this, however my problem resides in that at the end of the day, i would close the program and all the pets would go away, and that kind of destroys the purpose of it.
I was wondering if there was any way in i could save these variables and reload them each time i open the program, maybe save them on a .txt?
Any suggestion are greatly appreciated.
I agree with one of the comments that it's best to go with .ini files for this problem.
An example of the syntax, taken from the url linked above:
writeini reminder.ini birthday jenna 2/28/1983
writeini reminder.ini birthday Mike 10/10/1990
This produces the following file:
[birthday]
jenna = 2/28/1983
Mike = 10/10/1990
And is to be read like this:
echo -a Mike: $readini(reminder.ini, n, birthday, mike)
echo -a Jenna: $readini(reminder.ini, n, birthday, jenna)
If you want more flexibility to define your own data format, you can also revert to plain text files. The basic /write and $read functions have some pretty neat functionality: see the docs
Something like this should work for writing:
; search if the pet is already saved
$read(pets.txt,ns,%petname)
if ($readn == 0) {
; append to end of file
write pets.txt %petname %age %mood
}
else {
; replace line
write -l $readn pets.txt %petname %age %mood
}
To retrieve specific pets:
var %pet_info = $read(pets.txt, ns, %petname)
; Split the found line on space (ASCII-code 32)
tokenize 32 %pet_info
var %age = $2
var %mood = $3
This returns the line that starts with the petname you're looking for.