According to http://wiki.openstreetmap.org/wiki/Overpass_API/Overpass_QL
queries can use regular expressions on both the values and the keys. While I have no trouble using regex on the values, I'm having a problem with the keys.
The example on the wiki referenced above says (among other examples):
/* finds addr:* tags with value exactly "Foo" */
node[~"^addr:.*$"~"^Foo$"];
So, that's an example of using regex on the keys and the values.
What I am interested in is the name key, and specifically the name:en key. There are a couple of problems with searching by name. Not all names are in English, and for those nodes/ways/relations whose names are not in English, there is no guarantee there will be a name:en tag with an English version of the name.
In general, there is no way to know in advance if the name will be in English or that there is a name:en tag. If you only ask for name or name:en, you run the risk of finding no hit. (Of course, searching for both is no guarantee of success, either.)
I have a case where I know name fails, but name:en succeeds. That is my test case. I can query the overpass-api.de/api/interpreter using this:
[out:json][timeout:25][bbox:33.465530,36.156006,33.608615,36.574516];
(
node[name~"duma",i][place];
way[name~"duma",i][place];
>;
relation[name~"duma",i][place];
node["name:en"~"duma",i][place];
way["name:en"~"duma",i][place];
>;
relation["name:en"~"duma",i][place];
);
out center;
see it on overpass
and it works fine ("duma" is not found through name, but it is found with name:en), but I find it lengthy and somewhat repetitive.
I would like to use a regular expression involving the name and name:en tags, but either the server does not understand the query or I simply am using an incorrect regex.
Using the example shown in the wiki: node[~"^addr:.*$"~"^Foo$"]
I have tried:
[~"name|name:en"~"duma",i]
[~"name.*"~"duma",i]
[~"^name.*$"~"duma",i]
and several others. I even mimicked the example with [~"^name:.*"~"duma",i] just to see if anything would be returned.
Does overpass-api.de recognize regular expressions on the keys, or do I just have the regex wrong? I don't get an error from overpass-api.de, just the coordinates of the bbox and an empty result. It's usually very strict about reacting to a poorly formatted query. Thanks in advance.
That's really a bug in the Overpass API implementation concerning case-insensitive key regex matching; see this GitHub ticket for details.
For the time being, you can already test the patch on the development box:
http://overpass-turbo.eu/s/b1l
BTW: If you don't need case-insensitive regexp matching, this should already work on overpass-api.de as of today.
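As a rough sketch of the shorter query the question is after, the Python helper below assembles Overpass QL text that uses a single key regex, ^name(:en)?$, in place of separate name and name:en filters. The helper name build_name_query is made up for illustration, it only builds the query string and contacts no server, the recursion step (>;) from the original query is left out, and the ,i flag will only return results on a server where the bug described above is fixed.

```python
def build_name_query(term, bbox, timeout=25):
    """Assemble an Overpass QL query matching `term` in name or name:en.

    `bbox` is the "south,west,north,east" string used in the header.
    Hypothetical helper: it only builds text and sends nothing.
    """
    # Key regex: exactly "name" or "name:en". Value regex: `term`,
    # matched case-insensitively because of the trailing ",i".
    tag_filter = f'[~"^name(:en)?$"~"{term}",i][place]'
    header = f"[out:json][timeout:{timeout}][bbox:{bbox}];"
    statements = [f"  {kind}{tag_filter};" for kind in ("node", "way", "relation")]
    return "\n".join([header, "(", *statements, ");", "out center;"])


query = build_name_query("duma", "33.465530,36.156006,33.608615,36.574516")
print(query)
```

Anchoring the key regex with ^ and $ keeps it from also matching keys such as alt_name or name:fr, which an unanchored "name" would pick up.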
I am working on a large project in Xcode. I want to search, using the Find Navigator, for all array accesses regardless of the array's name. I only care about arrays subscripted in the form someArray[index], where the index is a literal number.
Some Examples That Should Match
people[12]
section[0].rows[0]
Should Not Match
people[index]
section[section].row[row]
The regex should only return arrays; it should not return dictionaries or any other type that is not a subscripted array.
Why am I doing this? Well, it appears there have been some issues within our app where devs have not properly handled index out of bounds errors or nil values. There are far too many arrays for me to manually go through line by line to find them, so this is the best option I've come up with and it may not even be possible. If anyone has other recommendations, please feel free to share.
You can create a regex that matches any word followed by brackets enclosing another word, optionally with a period and a further word inside. Something like:
\w+\[\w+(\.\w+)?\]
For more info about the regex above you can check this link
For numbers only use \d+ instead:
\w+\[\d+\]
For more info about the regex above you can check this link
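To check the digits-only pattern against the examples from the question, here is a small sketch using Python's re module. Xcode's Find Navigator uses its own regex engine, so this only confirms the pattern's logic, not Xcode's behavior; the sample strings are the ones listed in the question.

```python
import re

# Word characters, then a bracketed run of digits: name[12], rows[0], ...
LITERAL_INDEX = re.compile(r"\w+\[\d+\]")

samples = [
    "people[12]",
    "section[0].rows[0]",
    "people[index]",
    "section[section].row[row]",
]

for line in samples:
    # findall returns every literal-index subscript found on the line
    print(line, LITERAL_INDEX.findall(line))
```

A pattern like this only sees text, so it cannot tell an array from a dictionary keyed by integer literals; any such hits would still need a manual look.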
I don't need typedef's exactly. I need aliases (for a shell language). But the hack of looking up an identifier and returning a different token type is what I need to make the grammar work. I don't necessarily need it to be done in the lexer, although that would seem cleanest to me (or in a phase between the lexer and parser).
Here is (a fragment of) the closest I can seem to come to a solution given what I know of ANTLR4, but it requires a whole level of non-terminals for each keyword token. Note that, per ANTLR4 convention, capitalized words are tokens and lower-case words are non-terminals.
aliasstmt: alias ident ident; // rule that makes aliases
ifstmt: if expression then statement; // sample rule with two keywords
// non-terminals converting aliases into keywords
alias: Alias // normal token for keyword
// hack, LookupAlias is map, I need.
| { LookupAlias(_input.LT(1).getText()).equals("alias") }? Ident
;
if : If
| { LookupAlias(_input.LT(1).getText()).equals("if") }? Ident
;
then : Then
| { LookupAlias(_input.LT(1).getText()).equals("then") }? Ident
;
// Non-terminal going the other way, converting keywords to identifiers when needed
ident : Ident
| Alias
| If
| Then
;
Now, I suppose, I could get rid of the Tokens for the keywords and do it all in the parser for this example. It wouldn't completely work in the language I'm parsing because a significant number of the keywords have "normal" spellings like "Set-Alias" or "-Name" which are not legal identifiers (and "Set - Alias" or "Set -Alias" is not the same as "Set-Alias", uggh).
However, I want the LookupAlias() function to be its own Java class, not something just embedded in the parser. I have other times I need to use it that aren't part of parsing, and those uses need to be coordinated with the parser's. How to do that is a separate question I will ask.
(Caveat... maybe aliases can be used in a shell in places I don’t know about, so this is based on my understanding)
In a shell, an alias is essentially an identifier that is expanded when it’s encountered. It’s only expected where a command could occur, and since you can’t know all the commands in the path, your grammar would likely have an IDENTIFIER token (or the like) at that location in the parser rule.
You’d then check it against a list of built-in commands, commands in your PATH, and aliases (I’m not sure of the precedence, TBH).
So, you’d need to keep a symbol table to look up the alias resolution. I think post-resolution is where things will get “tricky”. IIRC, aliases don’t have to be syntactically complete, so you couldn’t really expect to pre-parse them (they possibly won’t parse correctly). Also, they are pretty much “injected” into the input stream. In this way they’re much more like pre-processor macros. I don’t see much way around detecting them, building an expanded input stream, and lexing/parsing it.
I suppose that you could write a custom TokenStream that detected aliases and responded to getNextToken() (and the methods to get the token at a particular index, etc.). That would allow aliases anywhere in the token stream, which could get weird, and it would probably be the devil to provide useful error messages (I guess you’d just have to point them at the alias itself). This approach would supply the alias definition’s tokens in place of the alias as the parser asked for the next token. I don’t see a way to use actions/predicates to change ANTLR’s mind about what token it just saw :).
I suspect playing with existing shells a bit, creating invalid alias substitutions on the command line, and observing the error messages might give insight into how other shells handle it. My impression is that the shell preprocesses the input, substitutes things like aliases and ENV variables, and then re-parses the result for execution.
I’m pretty sure trying to modify the token stream while the parser is already processing it is either not doable or the path to madness.
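The “expand first, then lex/parse the result” idea can be sketched outside ANTLR. The Python below is a simplified model under stated assumptions: only the word in command position is treated as a possible alias, the alias table is a plain dict, and expand_aliases is a made-up name. It is not how any particular shell or ANTLR itself does it.

```python
import shlex


def expand_aliases(line, aliases):
    """Expand an alias in command position and return the token list.

    Simplified model: only the first word is checked, and an alias body
    is re-tokenized as plain text before any parsing happens.
    """
    tokens = shlex.split(line)
    seen = set()
    # Keep expanding while the command word is an alias not yet expanded.
    while tokens and tokens[0] in aliases and tokens[0] not in seen:
        name = tokens[0]
        seen.add(name)  # guard against self-referencing aliases
        tokens = shlex.split(aliases[name]) + tokens[1:]
    return tokens


aliases = {"ll": "ls -l", "ls": "ls --color=auto"}
print(expand_aliases("ll /tmp", aliases))
```

The expanded token list (or the re-joined text) is what would then be handed to the real lexer and parser, so the grammar never has to know that aliases exist.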
I am trying to improve my Sphinx configuration and I am having trouble with words ending in apostrophes.
For example, for the result Surfin' USA, searching for "Surfin USA" returns a match but "Surfing USA" doesn't return anything. How can I set up Sphinx to return the result in such a situation?
Hmm, that's an interesting one. I'm not sure Sphinx can deal with this automatically, because it has no way of knowing what the apostrophe is meant to represent. I suppose there are cases where it could be multiple things.
The only way I can think of would be to list them in exceptions; you can build a list of all the words you want to support:
Surfin' => Surfing
You have to use exceptions to be able to handle the apostrophe.
You might want to add
Surfin => Surfing
too, so that searching without the apostrophe also works.
I'd like to check hardcoded values in (a lot of) Smartforms and SAPScript forms.
I have found a way to read the source code of both of these, but it seems that I will have to go through a lot of parsing before I get anything reliable.
I've come across function module GET_LITERAL, but that doesn't seem to help me much, since I have to specify the offset of the value (if I have understood correctly what the function does in the first place).
I also found RS_LITERAL_LIST, but that also doesn't do what I expect.
I also tried searching for reports and methods, but haven't found anything that seemed to help.
A backup plan would be to get some good parsing tool. Do you know of anything like that?
Anyway, any hints would be helpful and appreciated.
[EDIT]
Forgot to mention, the version of my system is 4.6C
If you have a fairly recent version of ABAP, you can use a regex.
Follow the pattern of this example, but use your source as the text and create your own regex. Have it look for any single quotes on the end of a word separated by spaces or any integers with spaces on either side. That's just a start, you might need to work on a better pattern.
String functions count, find, and match
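To show the shape of the pattern described above (quoted literals, or integers with whitespace on either side), here is a sketch in Python rather than ABAP. The sample text is made up and not taken from a real form, find_literals is a hypothetical helper, and the pattern ignores details such as doubled quotes inside a literal.

```python
import re

# A single-quoted literal, or a run of digits with whitespace (or the
# start/end of the line) on both sides.
LITERAL = re.compile(r"'[^']*'|(?<!\S)\d+(?!\S)")


def find_literals(source):
    """Return (line_number, literal) pairs for each hardcoded value found."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in LITERAL.finditer(line):
            hits.append((number, match.group()))
    return hits


# Made-up sample text, not taken from a real form.
sample = "lv_text = 'Hello'.\nlv_count = 42 ."
print(find_literals(sample))
```

The same alternation would need translating into whatever regex dialect the ABAP release in question supports.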
I've translated Double-Metaphone into ActionScript3 and I want to test it (obviously) before I release the source to ... um ... the open.
I'm looking for a long list of names with the primary and secondary codes. Google does not find anything except one list with pairs of names (presumably they should match).
Thanks
You could find someone else's double metaphone implementation, run it on the same long list of words, and compare the results to your own.
For long lists of words, I like infochimps. They have lots of word lists, like this one of 350,000 English words or this one of place names, and many more.
Here are implementations you can compare your results against. Here is an online example, except that it tests only one word at a time - I guess you'll have to download and run one of the scripts to test a large list of words.
For each word, two codes will be returned; you'll probably want to test that both codes match the ones returned by another implementation. You probably know that the reference implementation is here with full source code here, but I'm including the links anyway for others' benefit.
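The comparison itself can be a few lines. The sketch below is in Python and assumes each implementation can be wrapped as a function that takes a word and returns a (primary, secondary) pair; in practice one side might just look up codes exported to a file by the ActionScript version. The two stub functions are placeholders, not Double Metaphone.

```python
def compare_implementations(words, mine, reference):
    """Return (word, my_codes, reference_codes) for every disagreement.

    `mine` and `reference` are callables that take a word and return a
    (primary, secondary) pair of codes.
    """
    mismatches = []
    for word in words:
        got, expected = tuple(mine(word)), tuple(reference(word))
        if got != expected:
            mismatches.append((word, got, expected))
    return mismatches


# Stand-in functions so the sketch runs on its own; they are NOT
# Double Metaphone. Swap in the two real implementations to compare.
def stub_reference(word):
    return (word[:1].upper(), word[-1:].upper())


def stub_mine(word):
    # Deliberately wrong for one word to show what a mismatch looks like.
    if word == "smith":
        return ("S", "X")
    return stub_reference(word)


print(compare_implementations(["smith", "jones"], stub_mine, stub_reference))
```

Comparing the pair as a whole checks the primary and secondary codes together, so a difference in either one is reported.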