How would you go about writing a Parser similar to Facebook Graph Search


I've read quite a few articles giving some background on how Facebook implemented Graph Search, but all of them seem to gloss over the actual implementation details of the parser they are using.
One example is https://www.facebook.com/notes/facebook-engineering/under-the-hood-building-graph-search-beta/10151240856103920
From that page:
We combined various parsing techniques to build a substring parser:
suppose a user inputs, say, "friends New York" and that we have
defined a comprehensive set of all the potential page titles our
system can handle. Our parser could then generate exactly the Graph
Search titles that contain the user's input, including things like
"friends who live in New York" and "friends who have visited New
York." If we could find a way to appropriately rank those suggested
titles for the Graph Search typeahead, we would have a good start.
I'm really interested in learning about the methods one would use to tackle this problem. What algorithms or techniques would be used to write such a system?
Any links would be much appreciated too.

I was thinking about implementing something similar and was about to ask here on SO when I found that this had already been asked.
Here is what I have in mind as a starting point:
1. Assume the Facebook search engine "knows" about the underlying data store (a complex graph). The search engine then understands keywords like "Friends", "Relative" and other such relationships, and does not treat them as ordinary English words.
2. In that case, a good idea could be to parse the user input (using client-side JavaScript) into JSON and send that to the search engine. This has a couple of benefits: the parsing is done on the client side, network bandwidth is saved by not sending unwanted data, and parsed JSON input is much easier to handle on the server side.
3. Let's call this JSON fbJSON, because apart from being JSON it adheres to a certain format. You can write a spec for your format so that the JSON sent to the search engine always contains certain information. This can make life a bit easier, just as we have GeoJSON and similar formats.
4. Use an NLP program to parse the user input into fbJSON. [I still have to think about this.]
This is the broad approach I am embarking upon. The only bottleneck is point #4, because I do not have much experience with NLP.
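As a starting point for the parsing step, here is a minimal Python sketch of the substring idea from the quoted Facebook note: given a fixed set of titles, return those that contain the user's words in order. This is only an assumption about how such matching could work, not Facebook's implementation. The names `contains_in_order`, `suggest` and `TITLES` are made up for illustration, and sorting by length is a crude stand-in for real ranking.

```python
def tokens(text):
    """Lower-case a phrase and split it into words."""
    return text.lower().split()


def contains_in_order(title, query):
    """True if every query word occurs in the title, in the same order."""
    remaining = iter(tokens(title))
    # `tok in remaining` consumes the iterator up to the match, which
    # enforces left-to-right order across successive query words.
    return all(tok in remaining for tok in tokens(query))


def suggest(query, titles):
    """Return matching titles, shortest first (placeholder for ranking)."""
    matches = [t for t in titles if contains_in_order(t, query)]
    return sorted(matches, key=len)


# Hypothetical title set; a real system would generate titles from a grammar.
TITLES = [
    "friends who live in New York",
    "friends who have visited New York",
    "friends who live in Boston",
    "photos of New York",
]

print(suggest("friends New York", TITLES))
```

With the sample titles, the input "friends New York" matches the two New York friend titles and nothing else. Enumerating every title does not scale, so a real parser would more likely walk a grammar instead.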

Need feedback on the quality of a REST URL

For getting the latest valid address (of the logged in user), how RESTful is the following URL?
GET /addresses/valid/latest
Probably
GET /addresses?valid=true&limit=1
is the best, but it would then return a list, and I'd like to return an object rather than a list.
Any other suggestions?

Your url structure doesn't have much to do with how RESTful something is.
So let's ask instead which one is the 'best'. That is also a bit hard to say, as it's pretty subjective.
I would generally avoid a pattern like /addresses/valid/latest. This kinda suggests that there is a 'latest' resource in the 'valid' collection, in the 'addresses' collection.
So I like your other suggestion a bit better, because it suggests that you're using an 'addresses' collection, filtering by valid items and only showing 1.
If you don't want all kinds of parameters, I would be more inclined to look for a URL pattern that isn't literally 'addresses, but only the valid ones, but only the latest', and instead think about what the purpose of the endpoint is. Maybe something that's easier to remember, like /fresh-address =)

how RESTful is the following URL?
Any identifier that satisfies the production rules described by RFC 3986 is RESTful.
General purpose components are not supposed to derive semantics from identifiers, they are opaque. Which means that the server is free to encode information into those identifiers at its own discretion.
Consider Google search: does your browser care what URI is used as the target of the search form? Does your browser care about the href provided by Google with each search result? In both cases, the browser just does what it is told, which is to say it creates an HTTP request based on the representation of application state that was provided by the server.
URIs are in the same broad category as variable names in a programming language: the machines don't care so long as the spellings are consistent with some simple constraints. People care, so there are some benefits to having a locally consistent and logical scheme.
But there are contexts in which easily guessed URIs are not what you want. See Mark Seemann 2013.
Since the semantic content of the URI is reserved for use by the server only, it follows that the server can choose to encode that information into path segments or the query part. Or both.
Spellings that can be described by a URI Template can be very powerful. The most familiar URI template is probably an HTML form using the GET method, which encodes key value pairs onto the query part of the URI; so you should think about whether that's a use case you want to support.
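A minimal Python sketch of that form-style encoding, using only the standard library. The helper name `form_get_uri` is made up for illustration; the path and parameters are the ones from the question.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def form_get_uri(action, fields):
    """Build the URI a GET form submission would produce for these fields."""
    return f"{action}?{urlencode(fields)}"


uri = form_get_uri("/addresses", {"valid": "true", "limit": 1})
print(uri)  # /addresses?valid=true&limit=1

# The server can read the same information back out of the query part.
print(parse_qs(urlsplit(uri).query))  # {'valid': ['true'], 'limit': ['1']}
```

Whether the server puts this information in the query part or in path segments is its own choice. The query form is simply the one that general-purpose clients such as HTML forms already know how to build.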

Can I use Swift on the back end of an API?

I want to create a chess engine. I am most familiar with Swift, and super high performance isn't all that important to me (otherwise I'd likely learn C++ and write it in that). I need my engine to take in a chess position as a FEN-formatted string, which would look something like this: rnbqkbnr/pp1ppppp/8/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R b KQkq - 1 2. It would then process the position and spit out a move in algebraic notation, like Nxd4.
These specifics aren't all that important, however, as I can program all of this in Swift. What I am wondering is how one would create an API with Swift to do this. That is, the URL-encoded FEN position is passed as a parameter to the API like so: https://www.mywebsite.com/chessEngine?position=rnbqkbnr%2Fpp1ppppp%2F8%2F2p5%2F4P3%2F5N2%2FPPPP1PPP%2FRNBQKB1R%20b%20KQkq%20-%201%202
The Swift code would then process this position on the backend, and the response would be something like:
{"status":"success","recommendedMove":"Nxd4","moveTime":"12.34"}
Is it even possible to have Swift code process on the back end? My API development experience is limited to taking parameters as URL parameters, making an SQL query, and then echoing the query response as JSON.
See also: https://chess.stackexchange.com/questions/26489/creating-chess-engine-machine-learning-vs-traditional-engine

Yes, it is possible. Although I haven't built a full site/API with Swift, I know that Vapor uses itself to host its website, and my (albeit limited) experience with it suggests that it would be a good pick. That said, you could also use Kitura or Perfect — try looking up a comparison between them.
Good luck!
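Whichever framework you pick, the request/response contract stays the same. Here is a rough sketch of that contract, written in Python purely for illustration (it is not Swift and not tied to Vapor, Kitura or Perfect). The function names are hypothetical, the engine itself is left out, and the move and time values are just the ones from the question.

```python
import json
from urllib.parse import parse_qs, urlsplit


def read_position(url):
    """Extract and percent-decode the `position` query parameter."""
    query = parse_qs(urlsplit(url).query)
    return query["position"][0]


def build_response(move, seconds):
    """Serialize a reply in the shape shown in the question."""
    return json.dumps({
        "status": "success",
        "recommendedMove": move,
        "moveTime": f"{seconds:.2f}",
    })


url = (
    "https://www.mywebsite.com/chessEngine?position="
    "rnbqkbnr%2Fpp1ppppp%2F8%2F2p5%2F4P3%2F5N2%2FPPPP1PPP%2FRNBQKB1R"
    "%20b%20KQkq%20-%201%202"
)
fen = read_position(url)
print(fen)

# Placeholder values: a real handler would call the engine here.
print(build_response("Nxd4", 12.34))
```

A Swift handler would do the same three things: decode the parameter, call the engine, and encode the result as JSON.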

REST - What is standard file format for RESTful API design?

I would like to have my design stored as a file for version control.
Are there any standards or commonly used formats?
For example, I can write one file for structure definition:
User {
uid,
name
}
And another file for API definition:
GET /users/:uid => User
GET /users?name=:name => [User]
However, this is just my own notation. Are there any commonly used formats for representing these?
I expect something like UML: language-independent and focused on the API itself.

The notation you mention is quite close to what developers would expect to get as a design or specification, so that might be enough.
However, if your project reaches a certain scale, you can use a notation that tools can consume to automate code generation, testing or documentation.
In particular, Swagger is a common choice for this. If you write your specification in its format (now published as the OpenAPI Specification), you get documentation and even some code generation from the tooling.
https://swagger.io/specification/

Chords in MIDI?

I'm looking for a way to represent chords in a MIDI file.
Note that I'm not looking to represent chord voicings. That can be trivially done with multiple note-on messages. But if I do that, then I have to do some sort of note-on to chord analysis every time I read the MIDI file back in, and that's a major nuisance especially since I already know the chord structures when I write the file.
Rather, I'm looking for something more akin to guitar tablature or fake books. That is, I want to record "C" or "Cm" or "I" or "I" or "iii7" at a particular point in time.
So my questions...
Is there a standard way to do this? (I'm not finding one, but I don't know the current spec thoroughly.)
Is there a non-standard way of doing this?
I'm considering using the "tag" facility of the lyric/display meta event. It appears as though I can invent {#chord=Cm} and that should be transparent to any reader, past, present, or future, who doesn't understand this usage. Am I reading the standard right? Would this be a reasonable, essentially private, non-standard extension?

The MIDI specification provides for values such as "note on" and "pitch value" (as seen here) which are only represented as integers.
Depending on the MIDI Type (there are 3), you should be able to save the chord values similarly to the way that you suggested. Karaoke files are created this way.
If you are using Windows, you could try something like Noteworthy Composer. The link also contains a suggestion for playback.

You are absolutely right, you can implement a custom meta event and place such events before the groups of NoteOn/NoteOff events that represent a chord. I don't know what programming language you use, but for C# you can take a look at DryWetMIDI. It allows you to create custom meta events, and to read and write them. This article of the library docs shows how to do this.
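For the lyric-tag variant suggested in the question, the bytes are simple enough to build by hand. Here is a minimal Python sketch that writes one track event: a delta time, then a lyric meta event (FF 05) whose text is the chord tag. The `{#chord=...}` tag is the asker's own private convention, not part of any standard, and the function names are made up for illustration.

```python
def encode_vlq(value):
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    if value < 0:
        raise ValueError("VLQ values must be non-negative")
    out = [value & 0x7F]                   # last byte: high bit clear
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # earlier bytes: high bit set
        value >>= 7
    return bytes(reversed(out))


def chord_event(delta_ticks, chord):
    """Delta time followed by a lyric meta event (FF 05) holding a chord tag."""
    text = f"{{#chord={chord}}}".encode("ascii")
    return encode_vlq(delta_ticks) + b"\xFF\x05" + encode_vlq(len(text)) + text


print(chord_event(0, "Cm").hex(" "))
# 00 ff 05 0b 7b 23 63 68 6f 72 64 3d 43 6d 7d
```

A reader that does not know the tag still sees a well-formed lyric event, so it can skip or display the text without breaking.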

POST/GET bindings in Racket

Is there a built-in way to get at POST/GET parameters in Racket? extract-binding and friends do what I want, but there's a dire note attached about potential security risks related to file uploads which concludes
Therefore, we recommend against their
use, but they are provided for
compatibility with old code.
The best I can figure is (and forgive me in advance)
(bytes->string/utf-8 (binding:form-value (bindings-assq (string->bytes/utf-8 "[field_name_here]") (request-bindings/raw req))))
but that seems unnecessarily complicated (and it seems like it would suffer from some of the same bugs documented in the Bindings section).
Is there a more-or-less standard, non-buggy way to get the value of a POST/GET-variable, given a field name and request? Or better yet, a way of getting back a collection of the POST/GET values as a list/hash/a-list? Barring either of those, is there a function that would do the same, but only for POST variables, ignoring GETs?

extract-binding is bad because it is case-insensitive, is very messy for fields that occur multiple times, doesn't have a way of dealing with file uploads, and automatically assumes everything is UTF-8, which isn't necessarily true. If you can accept those problems, feel free to use it.
The snippet you wrote works when the data is UTF-8 and the field occurs only once. You can define it as a function and avoid writing it many times.
In general, I recommend using formlets to deal with forms and their values.
Now your questions...
"Is there a more-or-less standard, non-buggy way to get the value of a POST/GET-variable, given a field name and request?"
What you have is the standard thing, although you wrongly assume that there is only one value. When there are multiple, you'll want to filter the bindings on the field name. Similarly, you don't need to turn the value into a string, you can leave it as bytes just fine.
"Or better yet, a way of getting back a collection of the POST/GET values as a list/hash/a-list?"
That's what request-bindings/raw does. It returns a list of binding? objects. It doesn't make sense to turn it into a hash, because a field can have multiple values.
"Barring either of those, is there a function that would do the same, but only for POST variables, ignoring GETs?"
The Web server hides the difference between POSTs and GETs from you. You can inspect uri and raw post data to recover them, but you'd have to parse them yourself. I don't recommend it.
Jay
