I don't know how to retrieve the values from an AST that I generated using the Lark parser.
My grammar is as follows, saved in a .lark file:
start: (un_handle ": ")? AMOUNT "|" p_handle ("," p_handle)* (" \"" MESSAGE* "\"")?
AMOUNT: /[0-9]+(\.[0-9][0-9]?)?/
un_handle: HANDLE
p_handle: HANDLE
HANDLE : /[A-Z][A-Z]/
MESSAGE : /[^"]+/
I then run:
testText = '10|GP "Bananas"'
testTree = parser.parse(testText)
and get:
Tree(start, [Token(AMOUNT, '10'), Tree(p_handle, [Token(HANDLE, 'GP')]), Token(MESSAGE, 'Bananas')])
But what now?
I realize that I probably have to build a transformer, but what methods should I define and what should I call them? I just want to extract the values for AMOUNT, un_handle, p_handle (there may be more than one p_handle), and MESSAGE into Python variables.
Thank you so much in advance! I have been debugging for hours.
First off, try adding a "line" rule to provide a reference point. Yes, your application probably does not use multiple lines, but it is usually good to include one just in case.
Now, write a subroutine to find each "line" token in the AST, and append it to a list.
Finally, I suggest that you process the resulting list using a subroutine based upon the eval() subroutine in LisPy.
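The tree-walking step in this answer can be sketched in Python without the extra "line" rule, working directly on the tree printed in the question. To keep the sketch self-contained, Tree and Token below are hypothetical stand-ins that reuse the attribute names of Lark's own classes (Tree.data, Tree.children, Token.type, Token.value), and collect is a hypothetical helper. Lark's Transformer class, whose methods are named after grammar rules, is the library's dedicated alternative to a hand-written walk like this.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


# Hypothetical stand-ins for lark.Tree and lark.Token, using the same
# attribute names so the walk below reads the same on real Lark objects.
@dataclass
class Token:
    type: str
    value: str


@dataclass
class Tree:
    data: str
    children: List[Union["Tree", Token]] = field(default_factory=list)


def collect(node, found: Optional[Dict[str, List[str]]] = None,
            parent: Optional[str] = None) -> Dict[str, List[str]]:
    """Walk the tree recursively, appending every token value to a list.

    Tokens wrapped in a named rule (un_handle, p_handle) are filed under
    the rule name; tokens directly under 'start' (AMOUNT, MESSAGE) are
    filed under their token type.
    """
    if found is None:
        found = {}
    if isinstance(node, Token):
        key = node.type if parent in (None, "start") else parent
        found.setdefault(key, []).append(node.value)
    else:
        for child in node.children:
            collect(child, found, node.data)
    return found


# The tree printed in the question, rebuilt with the stand-ins.
tree = Tree("start", [
    Token("AMOUNT", "10"),
    Tree("p_handle", [Token("HANDLE", "GP")]),
    Token("MESSAGE", "Bananas"),
])

values = collect(tree)
amount = values["AMOUNT"][0]                    # '10'
un_handle = values.get("un_handle", [None])[0]  # None: optional, absent here
p_handles = values.get("p_handle", [])          # ['GP']
message = "".join(values.get("MESSAGE", []))    # 'Bananas'
```

Joining the MESSAGE values covers the grammar's MESSAGE*, which can yield more than one token.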
I have been looking through the spice instant answer source code. Yes, I know it is in maintenance mode, but I am still curious.
The documentation makes it fairly clear that the primary spice to API call gets its numerical parameters $1, $2, etc. from the handle function.
My question: when secondary API calls are included with spice alt_to, as in the movie spice IA, where do the numerical parameters for those API calls come from?
Note, for instance, the $1 in both the movie_image and cast_image secondary API calls in spice alt_to at the preceding link. I am asking which regex capture returns those instances of $1.
I believe I see how this works now. The flow of information is still a bit murky to me, but at least I see how all of the requisite information is there.
I'll take the cryptocurrency instant answer as an example. The alt_to element in the Perl package file at that link has a key named cryptonator. The corresponding .js file constructs a matching endpoint:
var endpoint = "/js/spice/cryptonator/" + from + "/" + to;
Note the general shape of the "remainder" past /js/spice/cryptonator: from/to, where from and to will be two strings.
Back in the Perl package, the hash alt_to->{cryptonator} has a key, from, which receives (I think) this remainder from/to. The value corresponding to that key is a regex meant to split the string into its two constituents:
from => '([^/]+)/([^/]*)'
Applied to from/to, that regex will return $1=from and $2=to. These, then, are the $1 and $2 that go into
to => 'https://api.cryptonator.com/api/full/$1-$2'
in alt_to.
In short:
The to field of alt_to->{blah} receives its numerical parameters from the from regex, which is applied to the remainder (past /js/spice/blah/) of the endpoint path constructed in the corresponding .js file.
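The flow summarized above can be mimicked in a few lines of Python. This is a minimal sketch of the reasoning in this answer, not DuckDuckGo's actual code: it assumes the from regex is applied to the remainder as an unanchored search (as a plain Perl match would be) and that each $N in the to template is replaced by capture group N. The helper build_api_url and the btc/usd values are hypothetical.

```python
import re


def build_api_url(endpoint: str, spice_name: str,
                  from_regex: str, to_template: str) -> str:
    # Strip the "/js/spice/<name>/" prefix to get the remainder, e.g. "btc/usd".
    prefix = "/js/spice/{}/".format(spice_name)
    if not endpoint.startswith(prefix):
        raise ValueError("endpoint does not belong to " + spice_name)
    remainder = endpoint[len(prefix):]
    # Apply the "from" regex; its capture groups play the role of $1, $2, ...
    match = re.search(from_regex, remainder)
    if match is None:
        raise ValueError("from regex did not match " + remainder)
    groups = match.groups()
    # Replace each $N in the "to" template with capture group N.
    return re.sub(r"\$(\d+)", lambda m: groups[int(m.group(1)) - 1], to_template)


url = build_api_url("/js/spice/cryptonator/btc/usd", "cryptonator",
                    r"([^/]+)/([^/]*)",
                    "https://api.cryptonator.com/api/full/$1-$2")
print(url)  # https://api.cryptonator.com/api/full/btc-usd
```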
I'm playing around with Q's new .Q.trp and the debug object it gives you in case of an error.
From what I see, the debug object contains a string representation of the source code where the error occurred, as well as the offset in that string where the error was triggered.
For example,
{
something: 123;
x: 123; ThisThrowsAnError[456;789]; y: 123;
}[]
When executing the above code, the debug object would contain this code in its entirety, as well as the offset pointing to (the beginning of) ThisThrowsAnError[].
My question is: based on this information, how can I extract the entire statement that caused the error?
For example, in the above example, I'd like to extract "ThisThrowsAnError[456;789]".
Things I've thought of so far...
Extract the string from the offset to the end of the line. This doesn't work, though, as there might be other statements on the same line (e.g. the "y: 123" above).
Parse the source code (literally, with "parse"). But then what? The output could be anything (e.g. a lambda or a statement list), and whatever it is would still need to be mapped back to source locations somehow.
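The first idea can be rescued by scanning forward from the offset while tracking bracket depth, so that the ';' inside ThisThrowsAnError[456;789] does not end the statement. The rough Python sketch below uses a hypothetical helper, statement_at. It assumes statements are separated by ';' (as in the lambda above), skips double-quoted strings, and stops at a closing bracket that belongs to the enclosing block. It ignores q comments and other syntax, so it is a heuristic rather than a q parser.

```python
def statement_at(source: str, offset: int) -> str:
    """Hypothetical helper: return the statement that starts at `offset`."""
    openers, closers = "([{", ")]}"
    depth = 0          # bracket nesting relative to the start offset
    in_string = False  # currently inside a "..." string literal
    i = offset
    while i < len(source):
        ch = source[i]
        if in_string:
            if ch == "\\":
                i += 1  # skip the escaped character, e.g. \"
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in openers:
            depth += 1
        elif ch in closers:
            if depth == 0:
                break  # e.g. the '}' closing the enclosing lambda
            depth -= 1
        elif ch == ";" and depth == 0:
            break  # statement separator outside any brackets
        i += 1
    return source[offset:i].strip()


src = '{\nsomething: 123;\nx: 123; ThisThrowsAnError[456;789]; y: 123;\n}[]'
print(statement_at(src, src.index("ThisThrowsAnError")))  # ThisThrowsAnError[456;789]
```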
Appreciate any ideas, and thanks for looking!
I have an instance of YouTrack with several custom fields, some of which are String-type. I'm implementing a module that creates a new issue via the YouTrack REST API's PUT request and then updates its fields with user-submitted values by applying commands. This works great most of the time.
I know that I can apply multiple commands to an issue at the same time by concatenating them into the query string, like so:
Type Bug Priority Critical add Fix versions 5.1 tag regression
will result in
Type: Bug
Priority: Critical
Fix versions: 5.1
in their respective fields (as well as adding the regression tag). But if I try to do the same thing with multiple String-type custom fields, then:
Foo something Example Something else Bar P0001
results in
Foo: something Example Something else Bar P0001
Example:
Bar:
The command only applies to the first field, and the rest of the query string is treated like its String value. I can apply the command individually for each field, but is there an easier way to combine these requests?
Thanks again!
This is the expected result, because everything after Foo is treated as the value of that field, and spaces are valid characters in String-type custom fields.
If you try to apply this command via the command window in the UI, you will see the same result.
Such a good question.
I encountered the same issue and have spent an unhealthy amount of time in frustration.
Using the command window in the YouTrack UI, I noticed that it leaves trailing quotation marks, and I was unable to find anything in the documentation that discussed how to terminate or identify the end of a string value. I was also unable to find any mention of setting string field values in the command reference, grammar documentation or examples.
For my solution I am using Python with the requests and urllib modules, though I expect you could adapt the solution to any language.
The REST API will accept explicit (double-quoted) strings in the POST:
import requests
from collections import OrderedDict
from urllib.parse import urlencode  # Python 3 (Python 2 used urllib.urlencode)
URL = 'http://youtrack.your.address:8000/rest/issue/{issue}/execute?'.format(issue='TEST-1234')
# Build from a list of pairs: a dict literal passed to OrderedDict does not
# keep its order on Python versions before 3.7.
params = OrderedDict([
    ('State', 'New'),
    ('Priority', 'Critical'),
    # String-type values are wrapped in double quotes (string literals).
    ('String Field', '"Message to submit"'),
    ('Other Details', '"Fold the toilet paper to a point when you are finished."'),
])
# Join into one command: State New Priority Critical String Field "..." ...
str_cmd = ' '.join(' '.join([k, v]) for k, v in params.items())
command_url = URL + urlencode({'command': str_cmd})
result = requests.post(command_url)
# The resulting command URL:
# http://youtrack.your.address:8000/rest/issue/TEST-1234/execute?command=State+New+Priority+Critical+String+Field+%22Message+to+submit%22+Other+Details+%22Fold+the+toilet+paper+to+a+point+when+you+are+finished.%22
I'm sad to see this one go unanswered for so long. Hope this helps!
Edit:
After continuing my work, I have concluded that sending all the field updates as a single POST is marginally better for the YouTrack server, but it requires more effort than it's worth, because you have to:
1) know which fields in the issues hold string values
2) pre-process all the string values into string literals
3) accept the risk that if you send all your field updates as a single request and just one of them is missing, fails to set, or has an unexpected value, the entire request fails and you potentially lose all the other information.
I wish the YouTrack documentation had some mention or discussion of these considerations.
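Steps 1 and 2 from the edit, plus the one-command-per-field alternative it favors, can be sketched as a small helper. build_commands is hypothetical. The caller is assumed to already know which fields are String-type, and values that themselves contain double quotes are not escaped, since YouTrack's escaping rules are not covered here.

```python
from typing import Dict, Iterable, List


def build_commands(fields: Dict[str, str], string_fields: Iterable[str],
                   combined: bool = True) -> List[str]:
    """Turn {field: value} into YouTrack command strings.

    Values of String-type fields are wrapped in double quotes so each one
    has a clear end. Values containing double quotes are NOT handled.
    """
    string_fields = set(string_fields)
    parts = []
    for name, value in fields.items():
        if name in string_fields:
            value = '"{}"'.format(value)
        parts.append("{} {}".format(name, value))
    # Either one combined command, or one command per field (so a single
    # bad value cannot sink the whole update).
    return [" ".join(parts)] if combined else parts


fields = {"Foo": "something", "Example": "Something else", "Bar": "P0001"}
print(build_commands(fields, {"Foo", "Example", "Bar"}))
# ['Foo "something" Example "Something else" Bar "P0001"']
```

Each returned string could then be sent as the command parameter of its own execute request.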
I'm new to Perl's XML::SAX and have run into a problem with the characters event. I'm trying to parse a very large XML file using Perl.
My goal is to get the content of each tag (I do not know the tag names in advance: given any XML file, I should be able to crack the record pattern and return every record with its data and tag, like Tag:Data).
While working with small files, everything is OK. But when running on a large file, the characters event delivers only part of the content. There is no specific pattern in how it gets cut: sometimes it's the first few characters of the data, sometimes the last few, and sometimes just one letter of the actual data.
The SAX parser setup is:
$myhandler = MyFilter->new();
$parser = XML::SAX::ParserFactory->parser(Handler => $myhandler);
$parser->parse_file($filename);
I have written my own handler, MyFilter, which overrides the characters method:
sub characters {
    my ($self, $element) = @_;
    $globalvar = $element->{Data};
    print "content is: $globalvar \n";
}
Even this print statement shows partial values at times.
I also tried setting the parser package before calling $parser->parse(), like so:
$XML::SAX::ParserPackage = "XML::SAX::ExpatXS";
It still doesn't work. Could anyone help me out here? Thanks in advance!
Sounds like you need XML::Filter::BufferText.
http://search.cpan.org/dist/XML-Filter-BufferText/BufferText.pm
From the description "One common cause of grief (and programmer error) is that XML parsers aren't required to provide character events in one chunk. They can, but are not forced to, and most don't. This filter does the trivial but oft-repeated task of putting all characters into a single event."
It's very easy to use once you have it installed and will solve your partial character data problem.
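The buffering idea is not Perl-specific. As a rough illustration in Python's standard xml.sax module (not XML::Filter::BufferText itself), a handler can collect every characters() chunk for an element and use the text only once endElement fires. BufferingHandler is a hypothetical name.

```python
import xml.sax


class BufferingHandler(xml.sax.ContentHandler):
    """Collect (tag, text) pairs, joining all characters() chunks per element."""

    def __init__(self):
        super().__init__()
        self.stack = []    # one list of text chunks per currently open element
        self.records = []  # (tag, text) pairs, in the order elements close

    def startElement(self, name, attrs):
        self.stack.append([])

    def characters(self, content):
        # A parser may deliver one text node in several chunks, so a single
        # call must never be treated as the complete value.
        if self.stack:
            self.stack[-1].append(content)

    def endElement(self, name):
        text = "".join(self.stack.pop()).strip()
        if text:
            self.records.append((name, text))


handler = BufferingHandler()
xml.sax.parseString(b"<rec><name>Bananas</name><qty>10</qty></rec>", handler)
for tag, text in handler.records:
    print("{}:{}".format(tag, text))
# name:Bananas
# qty:10
```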
I've started a little pet project to parse log files for Team Fortress 2. The log files have an event on each line, such as the following:
L 10/23/2009 - 21:03:43: "Mmm... Cycles!<67><STEAM_0:1:4779289><Red>" killed "monkey<77><STEAM_0:0:20001959><Blue>" with "sniperrifle" (customkill "headshot") (attacker_position "1848 813 94") (victim_position "1483 358 221")
Notice there are some common parts of the syntax for log files. Names, for example, consist of four parts: the name, an ID, a Steam ID, and the team of the player at the time. Rather than rewriting this type of regular expression each time, I was hoping to abstract it out slightly.
For example:
my $name = qr/(.*)<(\d+)><(.*)><(Red|Blue)>/;
my $kill = qr/"$name" killed "$name"/;
This works nicely, but the regular expression now returns results that depend on the format of $name (breaking the abstraction I'm trying to achieve). The example above would match as:
my ($name_1, $id_1, $steam_1, $team_1, $name_2, $id_2, $steam_2, $team_2)
But I'm really looking for something like:
my ($player1, $player2)
Where $player1 and $player2 would be tuples of the previous data. I figure the "killed" event doesn't need to know exactly about the player, as long as it has information to create the player, which is what these tuples provide.
Sorry if this is a bit of a ramble, but hopefully you can provide some advice!
I think I understand what you are asking. What you need to do is reverse your logic. First use a regex to split the string into the two player parts, then extract your tuples from each part. That way the kill regex doesn't need to know about the name format, and you just have two generic player-parsing regexes. Here is a short example:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $log = 'L 10/23/2009 - 21:03:43: "Mmm... Cycles!<67><STEAM_0:1:4779289><Red>" killed "monkey<77><STEAM_0:0:20001959><Blue>" with "sniperrifle" (customkill "headshot") (attacker_position "1848 813 94") (victim_position "1483 358 221")';
# Step 1: split the event into the two player strings (without the quotes).
my ($player1_string, $player2_string) = $log =~ m/"(.*?)" killed "(.*?)"/;
# Step 2: parse each player string with the same generic player regex.
my @player1 = $player1_string =~ m/(.*)<(\d+)><(.*)><(Red|Blue)>/;
my @player2 = $player2_string =~ m/(.*)<(\d+)><(.*)><(Red|Blue)>/;
print STDERR Dumper(\@player1, \@player2);
Hope this is what you were looking for.
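The same two-step strategy can be sketched in Python with a namedtuple, so that the kill parser hands back exactly the (player1, player2) pair the question asks for. Player, parse_player and parse_kill are hypothetical names, and the quoted-string pattern assumes player names contain no double quotes.

```python
import re
from collections import namedtuple

Player = namedtuple("Player", ["name", "id", "steam_id", "team"])

# Generic player pattern: name<id><steam id><team>
PLAYER_RE = re.compile(r"(.*)<(\d+)><(.*)><(Red|Blue)>")
# The kill pattern only knows that each player is a quoted string.
KILL_RE = re.compile(r'"([^"]+)" killed "([^"]+)"')


def parse_player(text):
    match = PLAYER_RE.fullmatch(text)
    if match is None:
        raise ValueError("not a player: " + text)
    name, pid, steam_id, team = match.groups()
    return Player(name, int(pid), steam_id, team)


def parse_kill(line):
    """Return (killer, victim) as Player tuples, or None if not a kill event."""
    match = KILL_RE.search(line)
    if match is None:
        return None
    return tuple(parse_player(s) for s in match.groups())


LINE = ('L 10/23/2009 - 21:03:43: "Mmm... Cycles!<67><STEAM_0:1:4779289><Red>" '
        'killed "monkey<77><STEAM_0:0:20001959><Blue>" with "sniperrifle" '
        '(customkill "headshot") (attacker_position "1848 813 94") '
        '(victim_position "1483 358 221")')
player1, player2 = parse_kill(LINE)
print(player1.name, player1.team)  # Mmm... Cycles! Red
print(player2.name, player2.team)  # monkey Blue
```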
Another way to do it, but the same strategy as dwp's answer:
my @players =
map { [ /(.*)<(\d+)><(.*)><(Red|Blue)>/ ] }
$log_text =~ /"([^\"]+)" killed "([^\"]+)"/
;
Your log data contains several items of balanced text (quoted and parenthesized), so you might consider Text::Balanced for parts of this job, or perhaps a parsing approach rather than a direct attack with regex. The latter might be fragile if the player names can contain arbitrary input, for example.
Consider writing a Regexp::Log subclass.