Anyone have a link to a technical discussion of anything akin to the Facebook news feed system? - facebook

I'm looking for a presentation, PDF, blog post, or whitepaper discussing the technical details of how to filter down and display massive amounts of information for individual users in an intelligent (possibly machine learning) kind of way. I've had coworkers hear presentations on the Facebook news feed but I can't find anything published anywhere that goes into the dirty details. Searches seem to just turn up the controversy of the system. Maybe I'm not searching for the right keywords...
@AlexCuse I'm trying to build something similar to Facebook's system. I have large amounts of data and I need to filter it down to something manageable to present to the user. I cannot use another website because of the scale I have to work at. Also, I just want a technical discussion of how to implement it, not examples of people who have an implementation.

Are you looking for something along the lines of distributed pub/sub with content-based filtering? If so, you may want to look into Siena and some of the associated papers, such as "Design and Evaluation of a Wide-Area Event Notification Service".
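To make "content-based filtering" concrete: subscribers register constraints on event attributes, and an event is delivered only to subscribers whose constraints it satisfies. The following is a minimal single-process sketch of that idea, not Siena's actual API; the `Broker` class, the operator table, and the attribute names are all made up for illustration.

```python
from dataclasses import dataclass, field

# Comparison operators a constraint may use (an illustrative subset).
OPS = {
    "=": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


@dataclass
class Broker:
    # Each subscription is (subscriber name, list of (attribute, op, value)).
    subscriptions: list = field(default_factory=list)

    def subscribe(self, who, constraints):
        self.subscriptions.append((who, constraints))

    def publish(self, event):
        """Return the subscribers whose every constraint the event satisfies."""
        matched = []
        for who, constraints in self.subscriptions:
            if all(
                attr in event and OPS[op](event[attr], value)
                for attr, op, value in constraints
            ):
                matched.append(who)
        return matched


broker = Broker()
broker.subscribe("alice", [("type", "=", "photo"), ("likes", ">", 10)])
broker.subscribe("bob", [("author", "=", "carol")])

delivered = broker.publish({"type": "photo", "author": "carol", "likes": 25})
```

A distributed system would spread this matching across many brokers; the sketch only shows the matching rule itself.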


REST HATEOAS - How does the client know link semantics?

Imagine I have a fully implemented REST API that offers HATEOAS as well.
Let's assume I browse the root and besides the self link two other links (e.g. one for /users and one for /orders) are returned. As far as I have heard, HATEOAS eliminates the need for out-of-band information. How should a client know what users means? Where are the semantics stored?
I know that is kind of a stupid question, but I really would like to know that.
Suppose you've just discovered Twitter and are using it for the very first time. In your Web browser you see a column of paragraphs with a bunch of links spread around the page. You know there's a way to do something with this, but you don't know specifically what actions are available. How do you figure out what they are?
Well, you look at the links and consider what their names mean. Some you recognize right away based on convention: As an experienced Web user, you have a pretty good idea what clicking on the "home", "search" and "sign out" links is meant to accomplish.
But other links have names you don't recognize. What does "retweet" do? What does that little star icon do?
There are basically two ways you, or anyone, will figure this out:
Through experimentation, which is to say, clicking on the links and seeing what happens, then deriving a meaning for each link from the results.
Through some source of out-of-band information, such as the online help, a tutorial found through a Google search or a friend sitting next to you explaining how the site works.
It's the same with REST APIs. (Recall that REST is intended to model the way the Web enables interaction with humans.)
Although in principle computers (or API-client developers) could deduce the semantics of link relations through experimentation, this obviously isn't practical. That leaves:
Convention, based on, for instance, IANA's list of standardized link relations and their meanings.
Out-of-band information, such as API documentation.
There is nothing inconsistent in the notion of REST requiring client developers to rely on something beyond the API itself to understand the meaning of link relations. This is standard practice for humans using websites, and humans using websites is what REST models.
What REST accomplishes is removing the need for out-of-band information regarding the mechanics of interacting with the API. Going back to the Twitter example, you probably had to have somebody explain to you at some point what, exactly, the "retweet" link does. But you didn't have to know the specific URL to type in to make the retweet happen, or the ID number of the tweet you wanted to act on, or even the fact that tweets have unique IDs. The Web's design meant all this complexity was taken care of for you once you figured out which link you wanted to click.
And so it is with REST APIs. It's true that in most cases, the computer or programmer will just need to be told what each link relation means. But once they have that information, they can navigate through the entire API without needing to know anything else about the details of how it's all put together.
REST doesn't eliminate the need for out-of-band information. You still have to document your media types. What REST eliminates is the need for out-of-band information about how the client interacts with the API's underlying protocol.
The semantics are documented by the media-type. Your API root is a resource of a media-type, let's say something like application/vnd.mycompany.dashboard.v1+json, and the documentation for that media type would explain that the link relation users leads to a collection of application/vnd.mycompany.user.v1+json related to the currently authenticated user, and orders leads to a collection of application/vnd.mycompany.order.v1+json.
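As a rough illustration of that division of labour, here is a sketch of a client that knows only the link relation names from the media-type documentation and takes every URL from the response. The `_links` layout (borrowed from HAL-style JSON), the sample document, and the `follow` helper are assumptions for the example, not part of any particular API.

```python
import json

# Hypothetical body of the API root, as the server might return it.
ROOT_RESPONSE = json.dumps({
    "_links": {
        "self": {"href": "/"},
        "users": {"href": "/users"},
        "orders": {"href": "/orders"},
    }
})


def follow(document, rel):
    """Return the URL the server advertises for a link relation.

    The client hardcodes the relation name (documented by the media
    type) but never the URL itself.
    """
    links = json.loads(document)["_links"]
    if rel not in links:
        raise KeyError(f"server did not offer a '{rel}' link")
    return links[rel]["href"]


orders_url = follow(ROOT_RESPONSE, "orders")
```

If the server later moves orders to a different path, this client keeps working as long as the relation name stays the same.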
The library analogy works here. When you enter a library after a book, you know how to read a book, you know how to walk to a bookshelf and pick up the book, and you know how to ask the librarian for directions. Each library may have a different layout and bookshelves may be organized differently, but as long as you know what you're looking for and you and the librarian speak the same language, you can find it. However, it's too much to expect the librarian to teach you what a book is.

Existing app that extracts meaningful data from old e-mails?

I was wondering if there is an application, and if not if it's worth writing one, that can gather meaningful data from old e-mails. I'm thinking things like:
Instructions (that could become "5 steps to..." posts)
Definitions
etc
Any idea? Suggestions? etc?
Well, I can offer the same solution as I did to this post, that is, software like TexLexan or Alchemy API that can find keywords and other summary information. There is also a good list of open source and commercial solutions on this page. It's definitely easier to see if one of those works than to write your own.

Can WordPress handle these functionalities?

I'm a front-end designer/developer whose weapon of choice for the back-end is WordPress. Up to this point all of my projects involving WordPress were fairly basic and it has handled everything beautifully. I just landed a new client that wants some extra functionality built into his next project and I'm hoping some of you WordPress wizards can give me some good advice while I'm putting together the quote.
I'm trying to limit the need for any subcontracting for the back-end functionality, so my question is whether or not WordPress can handle the following (via plugins or light custom manipulation):
The idea behind the site is to be a community calendar based on location that Health Care providers can log in and post their events to, as well as participate in discussions, blogs and all the other WordPress goodness. The specific functionalities that I'm unsure of the best way to accomplish are:
Full featured calendar that members with access can add their own events to - must be searchable by date/type of event/location etc
Event generator module for members that integrates with calendar - includes upload field for images and forms for details event info
Interactive map to filter both of the above by location (I'm assuming this will need to be flash, but I'd rather find another solution if possible)
I know there are other solutions out there that may be more suited to this than WordPress (Drupal, custom build, etc) but if it's at all possible to tackle this as a one man show then I'm going to charge it head-on!
Stack Overflowers and fellow WordPress fans...your insight would be much appreciated. Thanks in advance for your time.
This graph grants your experience with your weapon of choice, but the results are still clear. You can still tackle this as a one-man show; it will just take a bit of a learning curve to conquer the fundamentals of a CMS more suited to the task at hand. I'm sure plenty of WordPress aficionados will come along and strangle my reputation, but I've worked with both and have found that in terms of flexibility, WordPress is not king. For the custom coding you are going to have to do (I hope you know some PHP?), I feel that you will find it easier to integrate with another platform. This task will be difficult, if not impossible, to accomplish without writing code, even if there is a set of plugins that appears on its face to match your needs perfectly.
But anyway, since you probably don't really care that much about my opinion, for WordPress, your plugin options look like:
Calendar - Events Calendar
http://wordpress.org/extend/plugins/events-calendar/
The screenshots don't look terribly promising though.
Most plugins I have found are geared toward being administered from the admin panel, so it may be difficult to provide a user-facing interface to them, and the Events Calendar does not look like an exception. An experienced developer should be able to hook into the event publishing code with relative ease, but it could be a frustrating experience for the inexperienced.
For interactive maps, the Google Maps API is very feature rich, and you should be able to adapt it to suit your mapping needs, regardless of platform.
If you want all of your providers to have their own blog, etc., what you need is what was once the WordPress MU plugin and is now the core-bundled WordPress MS (multisite).
This, too, may prove rigid: you may have difficulty bending the iron of WordPress so that all your multisite users can post to a common community site. I've only built two platforms with MU, so I'm not positive about this.
To unapologetically reiterate my first point, what would be light custom code may turn impossibly frustrating using WordPress.
I like WordPress, and choose it often for my clients. I have never extended it to suit a larger project.
If you do decide to use it, I look forward to hopefully helping you with any questions you may have along the way, feel free to ask.

How to integrate vBulletin features into an external site

I have a web site I'm building and the client wants to have features from vBulletin (blog, forums) integrated into the site. It's not enough to simply add the site's skin to vBulletin. Is there a way to do this?
I would expect there to be documentation on how, if it is possible, to do such a thing but haven't been able to find anything.
I'd rather not connect and query the vBulletin database directly.
There is no proper API for this yet, so you'd either have to rely on things like RSS, or query the database directly. RSS won't get you old data, nor any forum structures, etc. just basics of new data.
After much research (see: cursing) I've found that external.php and blog_external.php do what I want though not quite as elegantly as I would like.
So if you want to incorporate forum threads into your web page then external.php is what you need. It appears to be a bit more customizable in that you can have it output in JavaScript, XML, RSS, and RSS Enclosure (podcasting).
If you want to incorporate blog posts you appear to be limited to RSS only. Like I said, less than ideal, but at least it's something.
There is more information here: http://www.vbulletin.com/docs/html/vboptions_group_external
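Since both scripts can emit RSS, consuming them from the external site comes down to parsing a feed. Here is a minimal sketch, assuming the output follows the usual RSS 2.0 layout of `channel` and `item` elements. The sample feed and its URLs are made up, and fetching the feed over HTTP is left out.

```python
import xml.etree.ElementTree as ET

# Made-up sample of what an RSS 2.0 feed of forum threads could look like.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Forum</title>
    <item>
      <title>First thread</title>
      <link>http://forum.example.com/thread-1</link>
    </item>
    <item>
      <title>Second thread</title>
      <link>http://forum.example.com/thread-2</link>
    </item>
  </channel>
</rss>"""


def latest_threads(feed_xml, limit=5):
    """Return (title, link) pairs for the first `limit` items in the feed."""
    root = ET.fromstring(feed_xml)
    items = root.findall("./channel/item")[:limit]
    return [(item.findtext("title"), item.findtext("link")) for item in items]


threads = latest_threads(SAMPLE_FEED)
```

The resulting pairs can then be rendered in the site's own templates rather than in vBulletin's skin.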

Is tagging organizationally superior to discrete subforums?

I am interested in choosing a good structure for an online message board-type application. I will use SO as an example, as I think it's an example that we are all familiar with, but my question is more general; it is about how to achieve the right balance between organization and flexibility in online message boards.
The questions page is a load of random stuff. It moves quickly (some might say, too quickly) and contains a huge number of questions that I'm not interested in.
The idea, I imagine, is that we can use tags to find questions that we're interested in. However, I'm not sure that this works: you can't use tags negatively. I'm not interested in PHP or perl or web development. I want to exclude such posts. But with the tags, I can't.
Discrete subforums are in a sense less flexible, because they generally force you to pick one category even when a question fits two. If SO had, say, areas for "Web Development", "Games Development", "Computer Science", "Systems Programming", "Databases", etc., then some people would want to post about developing web-based games, for example. Is it worth sacrificing some of that flexibility to make it easier to find the content you are interested in and hide the content you are not?
Is there any way with a pure tagging system to achieve the greater ease of use that subforums provide?
The real problem with subforums comes when you guess wrong about which topics have enough interest to get their own subforums. While some topics end up with their own vibrant subcommunities others end up as empty ghettos, with little activity or feeling of community. Topics that might flourish as occasional subjects in a larger forum end up fragmented among many subforums, none of which has the critical mass of people necessary to have an active, vibrant community.
Though I think that tagging is superior to grouping, people tend to think hierarchically.
In general it depends on the target group for the forum.
Maybe you can go with a mixture: use tagging, and later use tag groups to order the posts. Delicious uses this, for example, and I find it rather helpful.
If you're worried about the divide between specific forums and open tag-based systems like Stack Overflow, consider building a query system that supports more complex queries than the plain AND of tags that Stack Overflow offers.
I cannot make a query here that will give me all questions in .NET, SQL or C#, combined, and that is the biggest irritation I have with the tags. With such a query system, you can create virtual forums at least.
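A "virtual forum" of this kind is essentially an OR over tags. Below is a minimal sketch using SQLite, assuming a many-to-many schema with a `questions` table and a `question_tags` table. The schema, sample rows, and the `virtual_forum` helper are invented for illustration and say nothing about how Stack Overflow stores tags.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE question_tags (question_id INTEGER, tag TEXT);
    INSERT INTO questions VALUES
        (1, 'LINQ join'), (2, 'PHP sessions'), (3, 'Index tuning');
    INSERT INTO question_tags VALUES
        (1, 'c#'), (1, '.net'), (2, 'php'), (3, 'sql');
""")


def virtual_forum(conn, tags):
    """Return questions carrying at least one of the given tags (an OR query)."""
    placeholders = ", ".join("?" for _ in tags)
    return conn.execute(
        f"""SELECT DISTINCT q.id, q.title
            FROM questions q
            JOIN question_tags t ON t.question_id = q.id
            WHERE t.tag IN ({placeholders})
            ORDER BY q.id""",
        list(tags),
    ).fetchall()


dotnet_forum = virtual_forum(conn, [".net", "sql", "c#"])
```

The `DISTINCT` matters here: question 1 matches on two tags and would otherwise be listed twice.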
Other than that, I don't really have a good opinion. I like both, and I haven't yet decided which one is best.
The idea, I imagine, is that we can use tags to find questions that we're interested in. However, I'm not sure that this works: you can't use tags negatively. I'm not interested in PHP or perl or web development. I want to exclude such posts. But with the tags, I can't.
While it's currently the case that you can't use tags to hide content, it shouldn't be impossible. Using SO as an example again, there's no reason that a system similar to the ignore function on a forum couldn't be made for the tag system. By adding a right-click context menu or a small "X" link somewhere in the tag display, tags could be marked as ignored. The current tag feature would still work too: you could see everything (minus your ignore list), or click a tag to see only questions with that tag.
Ignored tags could be managed in your profile if you should later develop an interest in PHP or INTERCAL that you lacked before.
The real question is that of performance. In my head it's as simple as replacing a SELECT [stuff] WHERE Tag = 'buffer-overflow' with SELECT [stuff] WHERE Tag NOT IN ('php','offtopic','funny-hat-friday'), but I've not put together any DB-backed sites that get absolutely pounded on by thousands of people.
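One wrinkle worth noting: if tags live in a separate table with one row per question/tag pair, a plain NOT IN on the tag column would still let a question tagged both php and sql through via its sql row. The sketch below swaps in NOT EXISTS to hide a question when any of its tags is ignored. It uses SQLite with an assumed `questions` / `question_tags` schema and made-up data, and it says nothing about performance at scale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE question_tags (question_id INTEGER, tag TEXT);
    INSERT INTO questions VALUES
        (1, 'Buffer overflow in C'), (2, 'PHP and MySQL'), (3, 'Join performance');
    INSERT INTO question_tags VALUES
        (1, 'c'), (1, 'buffer-overflow'), (2, 'php'), (2, 'sql'), (3, 'sql');
""")


def visible_questions(conn, ignored):
    """Return questions that carry none of the user's ignored tags."""
    if not ignored:
        return conn.execute(
            "SELECT id, title FROM questions ORDER BY id"
        ).fetchall()
    placeholders = ", ".join("?" for _ in ignored)
    return conn.execute(
        f"""SELECT q.id, q.title
            FROM questions q
            WHERE NOT EXISTS (
                SELECT 1 FROM question_tags t
                WHERE t.question_id = q.id AND t.tag IN ({placeholders})
            )
            ORDER BY q.id""",
        list(ignored),
    ).fetchall()


feed = visible_questions(conn, ["php", "offtopic", "funny-hat-friday"])
```

Question 2 is hidden even though it also carries the sql tag, which is the behaviour an ignore list would be expected to have.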