SaaS Recommendation Engine

I'm looking for a recommendation engine for a media company.
This engine should deliver recommendations on websites and in native mobile apps.
It should handle news articles, classifieds products, web-shop products and entries from listing portals.
The following methods should be possible:
contextual recommendations (keyword matching)
trending articles/products in specific periods
collaborative filtering
behavioral recommendations (based on the user flow across all websites where the engine is running; e.g. if a user reads many Chevrolet articles, they should get recommendations from a classifieds portal with Chevrolets)
It would also be good if these recommendations were filterable by category, date, geo-data or other data (provided in meta tags on the websites), e.g. "give me only recommendations with articles published since yesterday --> articles with meta tag <meta name="rec-date" content="2013-10-10" />"
Does anybody know a recommendation engine like this? I found only something like cxense.com... any alternatives?
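The meta-tag filter described in the question can be sketched in plain Python. This is a minimal illustration, not any vendor's API: it assumes the HTML of each candidate item is available, reads the rec-date tag named in the question with the standard-library html.parser, and keeps items dated on or after a cutoff. RecMetaParser, extract_rec_meta and published_since are hypothetical names.

```python
from datetime import date, timedelta
from html.parser import HTMLParser


class RecMetaParser(HTMLParser):
    """Collects <meta name="rec-..."> tags from one page into a dict."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also arrive here via handle_startendtag.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        if name.startswith("rec-") and "content" in attrs:
            self.meta[name] = attrs["content"]


def extract_rec_meta(html):
    parser = RecMetaParser()
    parser.feed(html)
    return parser.meta


def published_since(items, since):
    """Keep only item ids whose rec-date meta tag is on or after `since`."""
    kept = []
    for item_id, html in items.items():
        raw = extract_rec_meta(html).get("rec-date")
        if raw and date.fromisoformat(raw) >= since:
            kept.append(item_id)
    return kept


pages = {
    "article-1": '<head><meta name="rec-date" content="2013-10-10" /></head>',
    "article-2": '<head><meta name="rec-date" content="2013-10-08" /></head>',
}
today = date(2013, 10, 11)
# "Published since yesterday" relative to the hypothetical date above.
print(published_since(pages, today - timedelta(days=1)))  # ['article-1']
```

A hosted engine would normally index such properties when it crawls the page rather than parsing HTML at query time; the sketch only shows the filtering logic.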

Considering your requirements, Recombee (https://www.recombee.com) could be a good fit. It supports both collaborative filtering and text-mining/content-based recommendations. It also has very good support for filtering through its filtering/boosting language, ReQL (https://docs.recombee.com/reql.html). You can freely define a custom set of item properties to be used for both learning and filtering, and there are even native functions such as earth_distance for your geo-restrictions.
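ReQL evaluates such geo-restrictions server-side, and the exact earth_distance signature is in the linked docs. As a local illustration only (not Recombee code), the great-circle check behind a radius filter can be sketched with the haversine formula, assuming a mean Earth radius of 6,371 km; haversine_m, within_radius and the sample items are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in degrees, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(items, user_lat, user_lon, radius_m):
    """Keep candidate item ids whose location lies within radius_m of the user."""
    return [
        item["id"]
        for item in items
        if haversine_m(user_lat, user_lon, item["lat"], item["lon"]) <= radius_m
    ]


candidates = [
    {"id": "chevrolet-vienna", "lat": 48.2082, "lon": 16.3738},
    {"id": "chevrolet-berlin", "lat": 52.5200, "lon": 13.4050},
]
# A user in Vienna asking for classifieds within 50 km.
print(within_radius(candidates, 48.21, 16.37, 50_000))  # ['chevrolet-vienna']
```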

YooChoose SaaS recommendation engine: http://www.yoochoose.com/en/our-product/yoochoose-newsmedia/

There is the Google Prediction API: https://developers.google.com/prediction/
It's a pretty flexible machine-learning tool with which you could build what you need yourself.
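As an illustration of the build-it-yourself route, here is a minimal, hypothetical baseline for the "trending in specific periods" requirement from the question: count views per item inside a time window and return the most viewed. It is plain Python, uses no Google API, and assumes view events arrive as (item_id, timestamp) pairs; trending is an illustrative name.

```python
from collections import Counter
from datetime import datetime, timedelta


def trending(events, now, window, top_n=3):
    """Return the top_n most-viewed item ids with timestamps in [now - window, now]."""
    start = now - window
    counts = Counter(item for item, ts in events if start <= ts <= now)
    # most_common breaks ties by first-seen order.
    return [item for item, _ in counts.most_common(top_n)]


now = datetime(2013, 10, 11, 12, 0)
events = [
    ("article-1", now - timedelta(hours=1)),
    ("article-2", now - timedelta(hours=2)),
    ("article-2", now - timedelta(hours=3)),
    ("article-3", now - timedelta(days=3)),  # outside a 24-hour window
]
print(trending(events, now, timedelta(hours=24)))  # ['article-2', 'article-1']
```

Raw counts favour items that are always popular; a real system would likely normalise against a longer baseline period, which is where a learned model could help.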

Related

How to apply collaborative filtering on no-rating system like Twitter, Facebook

I'm studying collaborative filtering and want to apply it to a social network like Twitter or Facebook. I tried a demo provided by MovieLens and understood that users have to rate items, that the ratings reflect their interests, and that the ratings are used as input for the recommendation algorithms. However, social networks like Twitter or Facebook have no rating feature, so how can I apply these algorithms?
If someone has worked in this area, please give me some suggestions.
The keywords you should search for are "implicit feedback". Luckily, there are some good systems/approaches out there that let you work with this type of data.
Here is the one I consider the best: https://github.com/benfred/implicit
What's even better, its GitHub page links to articles explaining the theory behind each of the approaches it uses. There are also a couple of tutorials that will help you write your first recommender system in no time. And it's incredibly fast: it took me 2 hours on a quad-core PC to calculate recommendations for 600K users based on 40M entries.
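For orientation, one of the approaches that library implements is alternating least squares for implicit feedback, after Hu, Koren and Volinsky. Its core idea, summarized here as a sketch rather than the library's exact code, is to turn raw interaction counts r_ui into a binary preference p_ui and a confidence weight c_ui, then fit user factors x_u and item factors y_i; α and λ are tuning hyperparameters.

```latex
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases}
\qquad
c_{ui} = 1 + \alpha\, r_{ui}

\min_{x_*,\, y_*} \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2
  + \lambda \Bigl(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\Bigr)
```

Because every unobserved pair still enters the sum with c_ui = 1 and p_ui = 0, the model also learns from the absence of interactions, which is what lets it work without explicit ratings.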
Instead of using explicit ratings, you can infer implicit ratings by defining your own weights for actions like:
Twitter: Retweet=1, Save=2, Both=3
Facebook: Like=1, Share=2, Both=3
Using this method, you maintain a 1-3 rating system that can be fed into a collaborative-filtering algorithm.
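A minimal sketch of that idea, under stated assumptions: it uses the Facebook weights above (Like=1, Share=2, both=3), builds a user-item matrix from hypothetical (user, item, action) events, and scores unseen items with item-based collaborative filtering and cosine similarity. The function names and sample data are illustrative; a real system would use a sparse-matrix library instead of dicts.

```python
import math
from collections import defaultdict

# Weights from the Facebook scheme above; unknown actions raise KeyError.
ACTION_WEIGHTS = {
    frozenset({"like"}): 1,
    frozenset({"share"}): 2,
    frozenset({"like", "share"}): 3,
}


def implicit_ratings(actions):
    """Turn (user, item, action) events into a {user: {item: rating}} matrix."""
    seen = defaultdict(lambda: defaultdict(set))
    for user, item, action in actions:
        seen[user][item].add(action)
    return {
        user: {item: ACTION_WEIGHTS[frozenset(acts)] for item, acts in items.items()}
        for user, items in seen.items()
    }


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {key: value} dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(ratings, user, top_n=3):
    """Item-based CF: score unseen items by similarity to items the user rated."""
    by_item = defaultdict(dict)  # transpose to {item: {user: rating}}
    for u, items in ratings.items():
        for i, r in items.items():
            by_item[i][u] = r
    mine = ratings.get(user, {})
    scores = defaultdict(float)
    for candidate, vec in by_item.items():
        if candidate in mine:
            continue
        for rated_item, r in mine.items():
            scores[candidate] += r * cosine(vec, by_item[rated_item])
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:top_n] if score > 0]


events = [
    ("alice", "post-a", "like"), ("alice", "post-a", "share"),
    ("alice", "post-b", "like"),
    ("bob", "post-a", "like"), ("bob", "post-c", "share"),
    ("carol", "post-b", "share"), ("carol", "post-c", "like"),
]
ratings = implicit_ratings(events)
print(ratings["alice"])             # {'post-a': 3, 'post-b': 1}
print(recommend(ratings, "alice"))  # ['post-c']
```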

How widely adopted is the Activity Streams standard (activitystrea.ms)?

I am new to the activity stream standard. When I searched on Google, I quickly found http://activitystrea.ms/ and its front page says: "The Activity Streams format has already been adopted by BBC, Gnip, Google Buzz, Gowalla, IBM, MySpace, Opera, Socialcast, Superfeedr, TypePad, Windows Live, YIID, and many others."
I am not quite sure whether it is still alive. Is there any other activity standard that is much more popular in the industry?
Over at Fashiolista we've open-sourced our approach to building feed systems:
https://github.com/tschellenbach/Feedly
We also use the activity stream standard and we're quite happy with it. As far as I know there are no other standards which have become mainstream. I do think that most companies slightly deviate from the standards.
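For readers new to the format: an activity in Activity Streams 1.0 JSON is built around actor, verb, object and (optionally) target, plus a published timestamp. The sketch below builds one such activity with Python's json module. The tag: ids, the object types chosen and the make_activity helper are illustrative assumptions, and only a small subset of the spec's fields is shown.

```python
import json
from datetime import datetime, timezone


def make_activity(actor_id, verb, object_id, object_type, target_id=None, published=None):
    """Build a minimal actor/verb/object(/target) activity dict.

    `published` is assumed to be a UTC datetime; it is rendered with a literal Z.
    """
    when = published or datetime.now(timezone.utc)
    activity = {
        "published": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "actor": {"objectType": "person", "id": actor_id},
        "verb": verb,
        "object": {"objectType": object_type, "id": object_id},
    }
    if target_id is not None:
        activity["target"] = {"objectType": "collection", "id": target_id}
    return activity


act = make_activity(
    "tag:example.org,2013:user/42",
    "post",
    "tag:example.org,2013:photo/7",
    "photo",
    target_id="tag:example.org,2013:board/summer",
    published=datetime(2013, 10, 28, 9, 30, tzinfo=timezone.utc),
)
print(json.dumps(act, indent=2))
```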
In addition, have a look at this High Scalability post where we explain some of the design decisions involved:
http://highscalability.com/blog/2013/10/28/design-decisions-for-scaling-your-high-traffic-feeds.html
This tutorial will help you set up a system like Pinterest's feed using Redis. It's quite easy to get started with.
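The core of such a Redis-backed feed is usually fan-out on write: when someone publishes, the activity id is pushed onto each follower's feed list, and that list is trimmed to a fixed length (in Redis, typically LPUSH followed by LTRIM). The sketch below is a hypothetical in-memory stand-in that models each list with a bounded collections.deque; it is not Feedly's API, and the class name and cap are made up.

```python
from collections import defaultdict, deque

FEED_MAX_LEN = 3  # hypothetical cap, kept small for the demo


class FanoutFeeds:
    """Fan-out on write: push each new activity id into every follower's feed.

    In-memory stand-in for per-user Redis lists trimmed to a fixed length.
    """

    def __init__(self, max_len=FEED_MAX_LEN):
        self.followers = defaultdict(set)  # author -> follower ids
        # user -> newest-first activity ids; appendleft drops the oldest when full
        self.feeds = defaultdict(lambda: deque(maxlen=max_len))

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def publish(self, author, activity_id):
        # Pay the cost once per follower at write time so reads are a cheap slice.
        for follower in self.followers[author]:
            self.feeds[follower].appendleft(activity_id)

    def read(self, user, limit=10):
        return list(self.feeds[user])[:limit]


feeds = FanoutFeeds()
feeds.follow("ann", "bob")
for i in range(1, 5):
    feeds.publish("bob", f"activity-{i}")
print(feeds.read("ann"))  # ['activity-4', 'activity-3', 'activity-2']
```

The trade-off is write amplification for authors with many followers, which is one of the design decisions the post above discusses.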
To learn more about feed design, I highly recommend reading some of the articles on which we based Feedly:
Yahoo Research Paper
Twitter 2013 Redis based, with fallback
Cassandra at Instagram
Etsy feed scaling
Facebook history
Django project, with good naming conventions. (But database only)
http://activitystrea.ms/specs/atom/1.0/ (actor, verb, object, target)
Quora post on best practises
Quora scaling a social network feed
Redis ruby example
FriendFeed approach
Thoonk setup
Twitter's Approach

OpenGraph or Schema.org? [closed]

Just wondering whether you guys out there favour the OpenGraph protocol, with markup like:
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
Or the Schema.org protocol, with:
<div itemscope itemtype="http://schema.org/Product">
<span itemprop="name">Kenmore White 17" Microwave</span>
<img src="kenmore-microwave-17in.jpg" alt='Kenmore 17" Microwave' />
<div itemprop="aggregateRating"
itemscope itemtype="http://schema.org/AggregateRating">
</div>
</div>
Which one should I integrate, as I think only one is necessary? [Actually, can you integrate only one, or both?]
Frankly, IMHO OpenGraph is "less intrusive" to the overall codebase, as it's easier to implement in Partial Views [using ASP.NET MVC], whereas the Schema.org protocol requires [at least in my opinion] disruptive HTML add-ins across your code.
Edit: It seems I ended up integrating both. I'm not sure whether this is allowed, as the documentation on Schema.org is unclear. Notably, this link doesn't provide much info:
Q: How does schema.org relate to Facebook Open Graph?
Facebook Open Graph serves its purpose well, but it doesn't provide the detailed information search engines need to improve the user experience. A single web page may have many components, and it may talk about more than one thing. If search engines understand the various components of a page, we can improve our presentation of the data. Even if you mark up your content using the Facebook Open Graph protocol, schema.org provides a mechanism for providing more detail about particular entities on the page. For example, a page about a band could include any or all of the following:
A list of albums
A price for each album
A list of songs for each album, along with a link to hear samples of each song
A list of upcoming shows
Bios of the band members
So I assume that they are compatible with each other.
So, to start with a couple of clichés and mangled metaphors: comparing OG and Schema.org is a bit like comparing apples and oranges, and when it comes to this metadata it's horses for courses.
The right answer depends on your intent, in adding metadata to your page. What is it that you're hoping to gain? What is the win for you here? The different forms of metadata are for slightly different purposes.
Google has made it clear that it's moving away from a focus on microformats and toward a focus on Schema.org in order to build rich-data results for search. If you want to optimize for Google, Bing and other search engines, add the Schema.org markup. It's the direction HTML5 has taken.
Add Facebook OG markup if you want to benefit from turning your content into a social object and enabling its multi-point connectivity to the social graph that is the Facebook universe.
In my experience most people are looking to gain from both approaches: do as well as they can in search rankings and increase reach and distribution through social channels. So, IMHO, it's probably best to be as thorough as possible, adding the Schema.org markup where it fits your content and using Open Graph metadata. They do slightly different but complementary things.
We are talking about two separate concepts here: syntax and vocabulary.
The Open Graph Protocol and Schema.org are vocabularies. Other vocabularies are, for example, Dublin Core, FOAF, and SIOC.
These vocabularies are typically not coupled to a specific syntax. If you want to describe your content in HTML documents with such a vocabulary, you could use the syntaxes RDFa and/or Microdata.
Which one should I integrate as I think only 1 is necessary ? [actually can you only integrate one or ?]
Your first example uses Open Graph Protocol (= vocabulary) with RDFa (= syntax). Your second example uses Schema.org (= vocabulary) with Microdata (= syntax).
You can mix them up as you like. (You could use both vocabularies with both syntaxes on the same page. You could use both vocabularies with only one syntax. You could use only one vocabulary with both syntaxes, or with only one syntax. …). It totally depends on your specific use case.
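To illustrate that one page can carry both pairs, here is a deliberately simplified Python extractor built on the standard-library html.parser: it reads Open Graph values from RDFa-style meta property attributes and, separately, Microdata itemtype/itemprop attributes. It is not a conforming RDFa or Microdata processor, the sample page is made up, and whether a given consumer accepts mixed markup is a separate question to check in that consumer's documentation.

```python
from html.parser import HTMLParser


class MixedMarkupParser(HTMLParser):
    """Collects OG <meta property="og:..."> values and Microdata itemtype/itemprop names."""

    def __init__(self):
        super().__init__()
        self.og = {}
        self.itemtypes = []
        self.itemprops = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # valueless attributes such as itemscope map to None
        prop = attrs.get("property") or ""
        if tag == "meta" and prop.startswith("og:"):
            self.og[prop] = attrs.get("content")
        if "itemtype" in attrs:
            self.itemtypes.append(attrs["itemtype"])
        if "itemprop" in attrs:
            self.itemprops.append(attrs["itemprop"])


page = """
<html prefix="og: http://ogp.me/ns#">
<head>
  <meta property="og:title" content="The Rock" />
  <meta property="og:type" content="movie" />
</head>
<body>
  <div itemscope itemtype="http://schema.org/Movie">
    <span itemprop="name">The Rock</span>
  </div>
</body>
</html>
"""
parser = MixedMarkupParser()
parser.feed(page)
print(parser.og)         # {'og:title': 'The Rock', 'og:type': 'movie'}
print(parser.itemtypes)  # ['http://schema.org/Movie']
print(parser.itemprops)  # ['name']
```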
What do you want to achieve? If you are interested in a specific 3rd party parsing your content, you should check their documentation. They typically support only certain vocabularies with certain syntaxes.
But if you want to mark up your content with semantic metadata without having a specific use case in mind, you could stick to one syntax and use whichever vocabularies are appropriate for your content. Personally, I’d choose RDFa (Lite). It is based on RDF, which works with other formats than HTML, too. It is a W3C Recommendation (Microdata is not). And most vocabularies you’ll find are defined in RDF(S). See my answer about the future of RDFa and Microdata.
It all depends on whether you're trying to mark up your website for a social world (Facebook) or for search engines. Both are recommended, but if you only have time for one, prioritize according to the company's marketing focus. OGP is huge for Facebook but is of no use at all in SEO. SEO relies entirely on microdata, which is the way to create proper HTML5.
HTML5Doctor on Microdata
http://html5doctor.com/microdata/
Google talking about markup:
http://www.google.com/support/webmasters/bin/answer.py?answer=1211158
Bing talking about markup:
http://onlinehelp.microsoft.com/en-us/bing/hh207238.aspx
Update
For anyone finding this answer, a lot has changed since I first posted it.
Schema.org is widely used by all major search engines and then some, but JSON-LD is now the preferred syntax for the markup. There's a great article from SEO Skeptic outlining the change made by Google.
Google's structured data documentation uses JSON-LD, which is strongly encouraged, although RDFa and Microdata are still partially supported.
JSON-LD should be used in conjunction with the markup for any social channels you are trying to target: OGP for Facebook, Twitter Cards for Twitter, etc.
They can both be used safely together. Currently the two efforts use different syntaxes to encode data in HTML (W3C RDFa or Microdata), but there are active discussions at W3C towards eventual convergence of those designs. Or greater compatibility, at least. Whether there will also be convergence at the vocabulary level between Schema.org and OGP, or services that consume both, remains to be seen. But in the meantime they both add value and can be safely combined.
Google does favor Schema.org, and Open Graph is better for web content related to social media. Your sample code looks good, but don't forget to include the prefix
<html prefix="og: http://ogp.me/ns#">
as the opening html tag of each page that uses OGP.
You can check to make sure the ogp or schema works by using the rich snippet testing tool
http://www.google.com/webmasters/tools/richsnippets
In the case of Schema, you can check by using the SDTT: Structured Data Testing Tool
https://search.google.com/structured-data/testing-tool
Why not use JSON-LD for markup? I'm thinking of implementing JSON-LD-based schema.org markup. That way it won't be intrusive. My Ghost blog uses it. I don't know whether it's well supported by search engines yet, but all the examples on schema.org now include a JSON-LD implementation.
see here
http://schema.org/WebPage
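A minimal sketch of the JSON-LD approach: the schema.org data lives in one script element instead of being spread across the HTML as attributes. It uses only Python's json module; jsonld_script and the page data are illustrative, and escaping "</" is a precaution so that a string value cannot end the script element early.

```python
import json


def jsonld_script(data):
    """Serialize a schema.org object as a JSON-LD <script> block for the page."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # "<\/" is a valid JSON escape for "</", so parsers still read the same string.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


page_data = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "name": "Example page",
    "description": "A page described with schema.org WebPage metadata.",
}
print(jsonld_script(page_data))
```

Because the markup is one self-contained block, it can be emitted from a layout or partial view without touching the content templates.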
And all my apps use Twitter Cards, FB Open Graph tags, microformats tags like rel, and structured schema.org metadata. I find implementing the schema.org metadata the most intrusive, so replacing this last bit with JSON-LD and keeping the code clean is nice. Too many tags, and it's recommended to keep your HTML small ;)
Open Graph in RDFa serves as a uniform way to recognize content, via REST, when it is embedded within containers not anticipated at the time of creation. If the container is known in advance to be search results, then Schema.org Microdata is well understood by search bots. With OG, presentation is the responsibility of the container's publisher, and that freedom might improve search ranking, while Schema.org will improve the comprehensibility of search results in the context of the content creator's intent. A vocabulary is usually ignored when used with the competing semantic markup syntax, so it's best to use Schema.org only with Microdata and OG only with RDFa. Microdata and RDFa can coexist in the same document.
RDFa (Open Graph) and Microdata (Schema.org) cannot be used on the same HTML page:
"3) We’ll continue to support our existing rich snippets markup formats.
If you’ve already done markup on your pages using microformats or RDFa, we’ll continue to support it. One caveat to watch out for: while it’s OK to use the new schema.org markup or continue to use existing microformats or RDFa markup, you should avoid mixing the formats together on the same web page, as this can confuse our parsers."
SRC: http://googlewebmastercentral.blogspot.in/2011/06/introducing-schemaorg-search-engines.html

Social features: chat, forums, online directories

We are building a content-based portal. Along with the content, we want to provide some collaborative tools, e.g. chat, forums, online directories, etc.
We are hoping to leverage open-source software for this, as it isn't really a differentiator and will hopefully be faster/cheaper. I am looking at light integration between the content and these tools (common login, the ability to easily reference content in chat/forums, etc.) and am flexible on the features offered, as long as the broad functionality is achieved.
We are hosted on MS Azure. What should our considerations be in identifying the right product?
Joomla! is one option. You want to ensure that most or all of the tools you are looking for are openly available on your chosen platform. It is hard to make a solid recommendation without more detail on the content, but you can check it out here:
http://www.joomla.org/about-joomla.html
It is free and open source; the site says:
Joomla is used all over the world to power Web sites of all shapes and sizes. For example:
Corporate Web sites or portals
Corporate intranets and extranets
Disclaimer: Have never used Joomla

Choosing a CMS: EPiServer vs Orchard vs SiteCore vs Umbraco

Increasingly, I have noticed a growing number of content management systems in use. I have some familiarity with SiteCore. I have read some literature on Umbraco. I only just got wind of Orchard the other day. I have only heard positive feedback about EPiServer, and I am soon to move into a role that uses it.
Do these differ vastly in features and price? What has led you to choose one (or several) over the others?
EDIT
I did a brief review of so-called free CMSs here: On Free Microsoft Compatible Content Management Systems
Reasons I ditched Orchard when developing a 50k page website:
The Orchard CMS import tool is simply too slow. It would only accept small batches at a time. Initially, it took eight minutes to import 1000 records. So, working on that principle I expected that it could take seven hours to import all the records. Unfortunately, I started to receive performance issues as more records were inserted into the database. I even started to reduce the batch size, which helped only temporarily in the early stages. (See Saying no to Orchard)
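The seven-hour estimate follows from the quoted rate if, as an assumption here, the site's roughly 50,000 pages correspond to about 50,000 records:

```latex
\frac{50\,000}{1\,000} \times 8\ \text{min} = 50 \times 8\ \text{min} = 400\ \text{min} \approx 6.7\ \text{h}
```

That linear estimate only holds while throughput stays constant, which is exactly what broke down as the database grew.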
I can only comment mainly on Sitecore and a bit on Umbraco from my knowledge of others using it:
Sitecore is an enterprise-level web CMS with an "enterprise price tag." It's very extensible, has a lot of developer/community support, and is very developer friendly. Content is structured as a tree of nodes with parent-child relationships. Sitecore is well known in the WCM community as a leader in content management and is rated very well by companies such as Forrester Research.
Based on my previous research and conversations with friends, Umbraco is very similar to Sitecore. It has a lower price than Sitecore, but it's not a complete rip-off. Umbraco is also built on ASP.NET, like Sitecore.
Here's a three-part series on Sitecore vs. Umbraco from a developer.
Of the ones you mention above, I have only used Umbraco and Sitecore to build with and am certified in both. I like the way they allow me to build systems that really work well for my customers. They both have a feel that they simply give you building blocks to create your masterpiece instead of "modules" of functionality plugged in that give you a blog, forum, etc. They make it really easy to share content throughout the site and create really nice admin experiences.
Umbraco's community is really great. They both struggle a little on the documentation side IMO, but Umbraco's videos really help and the community is quick to help. Also, if you're talking cost, then it's free (Umbraco) vs. quite expensive (Sitecore).
But the reality is that each developer has their own taste and preferred style of CMS. Ultimately, it's the team that has to build the site that matters most when it comes to how each CMS performs for the end user.
In addition to the links above, here are a couple blog posts that may help you get a feel for the different systems:
Orchard & Umbraco - Introduction (part 1 of 4) - Aaron Powell
Sitecore vs. Umbraco Terminology
Good luck!
I mostly work with EPiServer and Sitecore, and I can tell you the difference in short:
Sitecore has a broader architecture and a more powerful UI. The CMS is deeply configurable and highly extensible, and it has a clever publishing and caching system, powerful search and a page editor. But it doesn't provide much out of the box, and the UI is fairly old, slow and hard to learn. So it will be a long journey until you understand it well and can properly support all its features for editors.
EPiServer is easy and friendly to users and developers. It provides an essential set of features out of the box, with an easy UI and page editor, a good drag-and-drop experience and easy personalization. It is code-first, distributed via NuGet, provides dependency injection for its services, and has out-of-the-box MVC support. But it's not as extensible and configurable, has poor search (without the expensive EPiFind module) and is generally less feature-rich than Sitecore. So it's good for small/medium websites, but can be an obstacle in complex solutions.
Both have a similar tree-item concept, rich documentation, a poor public module system and hard UI customization. Both are expensive and not open source.
As far as I know, Umbraco is pretty similar to EPiServer and Sitecore, but free and open source. Of course you get fewer features, more bugs, not much documentation and no free support.
Orchard is really different compared to the other three CMSs. It is module-based like WordPress: you use standard or public modules and themes instead of writing the whole website from scratch, and you create your own themes and modules to customize the website and CMS. So the entire CMS is highly extensible and a lot of free community modules are available. But at the same time you lose control, and the learning curve is much longer. Orchard is free and open source and entirely MVC-based; its UI and API are well done, but it can be hard for both developers and editors to understand.
Wordpress vs Episerver:
http://tedgustaf.com/blog/2011/2/comparison-of-episerver-and-wordpress/
OK so the guy who wrote that is an Episerver consultant but it's interesting and balanced.
All the different web content management systems have different strengths. So which one is best for you depends a lot on what kind of sites you create, what kind of budget you have and what you think matters the most in a CMS.
For example, Orchard and SiteCore are VERY different systems.
I'm a bit biased as I work there, but I believe that Webnodes CMS has several important advantages over the systems you mention.
Keywords: Relations between content, actual classes for the different content types, custom LINQ provider for all data access, expose all content as an OData endpoint etc.
Microsoft used our CMS to demonstrate OData at Mix11. Video from Mix 11