Procedural macro parsing weirdness in Rust - macros

I'm trying to parse a macro similar to this one:
annoying!({
hello({
// some stuff
});
})
Trying to do this with a procedural macro definition similar to the following, but I'm getting a behaviour I didn't expect and I'm not sure I'm doing something I'm not supposed to or I found a bug. In the following example, I'm trying to find the line where each block is,
for the first block (the one just inside annoying!) it reports the correct line, but for the inner block, when I try to print them it's always 1, no matter where the code is etc.
#![crate_type="dylib"]
#![feature(macro_rules, plugin_registrar)]
extern crate syntax;
extern crate rustc;
use macro_result::MacroResult;
use rustc::plugin::Registry;
use syntax::ext::base::{ExtCtxt, MacResult};
use syntax::ext::quote::rt::ToTokens;
use syntax::codemap::Span;
use syntax::ast;
use syntax::parse::tts_to_parser;
mod macro_result;
#[plugin_registrar]
pub fn plugin_registrar(registry: &mut Registry) {
registry.register_macro("annoying", macro_annoying);
}
pub fn macro_annoying(cx: &mut ExtCtxt, _: Span, tts: &[ast::TokenTree]) -> Box<MacResult> {
let mut parser = cx.new_parser_from_tts(tts);
let lo = cx.codemap().lookup_char_pos(parser.span.lo);
let hi = cx.codemap().lookup_char_pos(parser.span.hi);
println!("FIRST LO {}", lo.line); // real line for annoying! all cool
println!("FIRST HI {}", hi.line); // real line for annoying! all cool
let block_tokens = parser.parse_block().to_tokens(cx);
let mut block_parser = tts_to_parser(cx.parse_sess(), block_tokens, cx.cfg());
block_parser.bump(); // skip {
block_parser.parse_ident(); // hello
block_parser.bump(); // skip (
// block lines
let lo = cx.codemap().lookup_char_pos(block_parser.span.lo);
let hi = cx.codemap().lookup_char_pos(block_parser.span.hi);
println!("INNER LO {}", lo.line); // line 1? wtf?
println!("INNER HI {}", hi.line); // line 1? wtf?
MacroResult::new(vec![])
}
I think the problem might be the fact that I'm creating a second parser to parse the inner block, and that might be making the Span types inside it go crazy, but I'm not sure that's the problem or how to keep going from here. The reason I'm creating this second parser is so I can recursively parse what's inside each of the blocks, I might be doing something I'm not supposed to, in which case a better suggestion would be very welcome.

I believe this is #15962 (and #16472), to_tokens has a generally horrible implementation. Specifically, anything non-trivial uses ToSource, which just turns the code to a string, and then retokenises that (yes, it's not great at all!).
Until those issues are fixed, you should just handle the original tts directly as much as possible. You could approximate the right span using the .span of the parsed block (i.e. return value of parse_block), which will at least focus the user's attention on the right area.

Related

Call a PostgreSQL function and get result back with no loop

I have a simple rust program that interacts with a PostgreSQL database.
The actual code is:
for row in &db_client.query("select magic_value from priv.magic_value();", &[]).unwrap()
{
magic_value = row.get("magic_value");
println!("Magic value is = {}", magic_value);
}
And.. it works. But I don't like it: I know this function will return one and only one value.
From the example I found, for example here: https://docs.rs/postgres/latest/postgres/index.html
and here: https://tms-dev-blog.com/postgresql-database-with-rust-how-to/
You always have a recordset to loop on.
Which is the clean way to call a function without looping?
query returns a Result<Vec<Row>, _>. You are already unwrapping the Vec, so you can just use it directly instead of looping. By turning the Vec into an owning iterator yourself, you can even easily obtain a Row instead of a &Row.
magic_value = db_client.query("select magic_value from priv.magic_value();", &[])
.unwrap() // -> Vec<Row>
.into_iter() // -> impl Iterator<Item=Row>
.next() // -> Option<Row>
.unwrap() // -> Row
.get("magic_value");

Where does a variable in a match arm in a loop come from?

I am trying to implement an HTTP client in Rust using this as a starting point. I was sent to this link by the rust-lang.org site via one of their rust-by-example suggestions in their TcpStream page. I'm figuring out how to read from a TcpStream. I'm trying to follow this code:
fn handle_client(mut stream: TcpStream) {
// read 20 bytes at a time from stream echoing back to stream
loop {
let mut read = [0; 1028];
match stream.read(&mut read) {
Ok(n) => {
if n == 0 {
// connection was closed
break;
}
stream.write(&read[0..n]).unwrap();
}
Err(err) => {
panic!(err);
}
}
}
}
Where does the n variable come from? What exactly is it? The author says it reads 20 bytes at a time; where is this coming from?
I haven't really tried anything yet because I want to understand before I do.
I strongly encourage you to read the documentation for the tools you use. In this case, The match Control Flow Operator from The Rust Programming Language explains what you need to know.
From the Patterns that Bind to Values section:
In the match expression for this code, we add a variable called state to the pattern that matches values of the variant Coin::Quarter. When a Coin::Quarter matches, the state variable will bind to the value of that quarter’s state. Then we can use state in the code for that arm, like so:
fn value_in_cents(coin: Coin) -> u8 {
match coin {
Coin::Penny => 1,
Coin::Nickel => 5,
Coin::Dime => 10,
Coin::Quarter(state) => {
println!("State quarter from {:?}!", state);
25
},
}
}
If we were to call value_in_cents(Coin::Quarter(UsState::Alaska)), coin would be Coin::Quarter(UsState::Alaska). When we compare that value with each of the match arms, none of them match until we reach Coin::Quarter(state). At that point, the binding for state will be the value UsState::Alaska. We can then use that binding in the println! expression, thus getting the inner state value out of the Coin enum variant for Quarter.
There is an entire chapter about the pattern matching syntax available and where it can be used.
Figured it out, this is what's happening:
match stream.read(&mut read) {
This line is telling the software to pass stream.read(&mut read) to Ok(n) because stream.read returns the number of bytes read. I'm still not sure why they specify 20 bytes at a time as being read.

How to select symbols onWorkspaceSymbol

I am developing an extension for visual studio code using language server protocol, and I am including the support for "Go to symbol in workspace". My problem is that I don't know how to select the matches...
Actually I use this function I wrote:
function IsInside(word1, word2)
{
var ret = "";
var i1 = 0;
var lenMatch =0, maxLenMatch = 0, minLenMatch = word1.length;
for(var i2=0;i2<word2.length;i2++)
{
if(word1[i1]==word2[i2])
{
lenMatch++;
if(lenMatch>maxLenMatch) maxLenMatch = lenMatch;
ret+=word1[i1];
i1++;
if(i1==word1.length)
{
if(lenMatch<minLenMatch) minLenMatch = lenMatch;
// Trying to filter like VSCode does.
return maxLenMatch>=word1.length/2 && minLenMatch>=2? ret : undefined;
}
} else
{
ret+="Z";
if(lenMatch>0 && lenMatch<minLenMatch)
minLenMatch = lenMatch;
lenMatch=0;
}
}
return undefined;
}
That return the sortText if the word1 is inside the word2, undefined otherwise. My problem are cases like this:
My algorithm see that 'aller' is inside CallServer, but the interface does not mark it like expected.
There is a library or something that I must use for this? the code of VSCode is big and complex and I don't know where start looking for this information...
VSCode's API docs for provideWorkspaceSymbols() provide the following guidance (which I don't think your example violates):
The query-parameter should be interpreted in a relaxed way as the editor will apply its own highlighting and scoring on the results. A good rule of thumb is to match case-insensitive and to simply check that the characters of query appear in their order in a candidate symbol. Don't use prefix, substring, or similar strict matching.
These docs were added in response to this discussion, where somebody had very much the same issue as you.
Having a brief look at VSCode sources, internally it seems to use filters.matchFuzzy2() for the highlighting (see here and here). I don't think it's exposed in the API, so you would probably have to copy it if you wanted the behavior to match exactly.

Go: C like macros for writing test code

When writing test code, I do a lot of this
if (!cond) {
t.Fatal("error message")
}
It's a bit tedious. So I'd like to achieve the following
CHECK(cond, "error message")
So I attempted this
func CHECK(t *testing.T, cond bool, fmt string, a ...interface{}) {
if !cond {
t.Fatal(fmt, a)
}
}
If it were a C macro it would've worked perfectly. But in Go, the line number where the failure is is wrong.
Is there a fix for this?
Sadly you can't do that.
A workaround would be to get the line / function yourself, something like the trace function from https://stackoverflow.com/a/25954534/145587.
You could possibly make use of runtime.Callers()&plus;runtime.Caller(): the first one gives you the call stack while the second allows to extract the debug info about any arbitrary stack frame (obtained from that list).
Your CHECK() function is always one function call down the place the check should have happened at if it was a macro, so you can inspect the stack frame just above.
Update: the only functon which is really needed is runtime.Caller(). Here's your case, simplified:
package main
import (
"runtime"
"testing"
)
func CHECK(t *testing.T, cond bool) {
if !cond {
_, fname, lineno, ok := runtime.Caller(1)
if !ok {
fname, lineno = "<UNKNOWN>", -1
}
t.Fatalf("FAIL: %s:%d", fname, lineno)
}
}
func TestFoo(t *testing.T) {
CHECK(t, 12 == 13)
}
When saved as check_test.go and run via go test, it produces:
$ go test
--- FAIL: TestFoo (0.00 seconds)
check_test.go:14: FAIL: /home/kostix/devel/go/src/check/check_test.go:19
FAIL
exit status 1
FAIL check 0.001s
where line 19 is the line a call to CHECK() is located inside TestFoo().
While the above answer to use CHECK() function will work, I think that the actual answer is code readibility. Much of Go has been designed as a compromise to increase readibility among the community as a whole. See gofmt for example. Most people will agree that it's format is not best for every case. But having a convention agreed to by all is a huge plus for Go. The same answer is to your question. Go is for writing code for your peers, not for yourself. So don't think "I prefer this." Think "what will people reading my code understand."
Your original code should be like this, without parenthesis.
if !cond {
t.Fatal("error message")
}
This is idiomatic and every Go coder will recognize it instantly. That is the point.

NodeJS: What is the proper way to handling TCP socket streams ? Which delimiter should I use?

From what I understood here, "V8 has a generational garbage collector. Moves objects aound randomly. Node can’t get a pointer to raw string data to write to socket." so I shouldn't store data that comes from a TCP stream in a string, specially if that string becomes bigger than Math.pow(2,16) bytes. (hope I'm right till now..)
What is then the best way to handle all the data that's comming from a TCP socket ? So far I've been trying to use _:_:_ as a delimiter because I think it's somehow unique and won't mess around other things.
A sample of the data that would come would be something_:_:_maybe a large text_:_:_ maybe tons of lines_:_:_more and more data
This is what I tried to do:
net = require('net');
var server = net.createServer(function (socket) {
socket.on('connect',function() {
console.log('someone connected');
buf = new Buffer(Math.pow(2,16)); //new buffer with size 2^16
socket.on('data',function(data) {
if (data.toString().search('_:_:_') === -1) { // If there's no separator in the data that just arrived...
buf.write(data.toString()); // ... write it on the buffer. it's part of another message that will come.
} else { // if there is a separator in the data that arrived
parts = data.toString().split('_:_:_'); // the first part is the end of a previous message, the last part is the start of a message to be completed in the future. Parts between separators are independent messages
if (parts.length == 2) {
msg = buf.toString('utf-8',0,4) + parts[0];
console.log('MSG: '+ msg);
buf = (new Buffer(Math.pow(2,16))).write(parts[1]);
} else {
msg = buf.toString() + parts[0];
for (var i = 1; i <= parts.length -1; i++) {
if (i !== parts.length-1) {
msg = parts[i];
console.log('MSG: '+msg);
} else {
buf.write(parts[i]);
}
}
}
}
});
});
});
server.listen(9999);
Whenever I try to console.log('MSG' + msg), it will print out the whole buffer, so it's useless to see if something worked.
How can I handle this data the proper way ? Would the lazy module work, even if this data is not line oriented ? Is there some other module to handle streams that are not line oriented ?
It has indeed been said that there's extra work going on because Node has to take that buffer and then push it into v8/cast it to a string. However, doing a toString() on the buffer isn't any better. There's no good solution to this right now, as far as I know, especially if your end goal is to get a string and fool around with it. Its one of the things Ryan mentioned # nodeconf as an area where work needs to be done.
As for delimiter, you can choose whatever you want. A lot of binary protocols choose to include a fixed header, such that you can put things in a normal structure, which a lot of times includes a length. In this way, you slice apart a known header and get information about the rest of the data without having to iterate over the entire buffer. With a scheme like that, one can use a tool like:
node-buffer - https://github.com/substack/node-binary
node-ctype - https://github.com/rmustacc/node-ctype
As an aside, buffers can be accessed via array syntax, and they can also be sliced apart with .slice().
Lastly, check here: https://github.com/joyent/node/wiki/modules -- find a module that parses a simple tcp protocol and seems to do it well, and read some code.
You should use the new stream2 api. http://nodejs.org/api/stream.html
Here are some very useful examples: https://github.com/substack/stream-handbook
https://github.com/lvgithub/stick