I need to make a music recommendation system for a music platform using machine learning.
The platform has:
40+ million songs
4.5+ million albums
2.5+ million artists
1648 genres
600k users
The relations between the above objects are:
1-song - many-genres
1-song - many-artists
1-song - 1-album
1-album - many-songs
1-album - many-artists
1-artist - many-albums
1-artist - many-songs
I also have the users' activity (listened songs) and favorites (songs, artists, albums).
Amazon ML doesn't seem to support collaborative filtering, and now I'm looking through Google Cloud ML.
One problem is the size of the data. Basically every song has 1+ genres and 1+ artists, which are categorical attributes. Amazon ML supports ~100 categories (at a glance I have 2.5M if the artists are regarded as categories). Looking through Google's machine learning pages I found only simple examples, so I don't really know where to start.
Being a beginner in the machine learning landscape, I wonder if the problem is the way I see (try to solve) these recommendations, or even whether ML is the way to go.
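As an aside on the category-count problem: a common workaround for high-cardinality categorical features like 2.5M artists is the "hashing trick", which maps each ID into a fixed number of buckets instead of one column per artist. A minimal Python sketch (the bucket count and artist IDs here are invented for illustration):

```python
import hashlib

NUM_BUCKETS = 100_000  # fixed feature space, regardless of how many artists exist

def artist_bucket(artist_id: str) -> int:
    """Deterministically hash an arbitrary artist ID into one of NUM_BUCKETS slots."""
    digest = hashlib.md5(artist_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

# A song with several artists becomes a small set of bucket indices,
# not a 2.5M-wide one-hot vector.
song_artists = ["artist:12345", "artist:98765"]
features = sorted({artist_bucket(a) for a in song_artists})
print(features)
```

Collisions are possible (two artists can share a bucket), but in practice models tolerate this well, and it sidesteps hard category limits entirely.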
Google Cloud ML Engine is a hosted solution for running TensorFlow programs. TensorFlow is a machine learning framework designed with scale in mind. So as long as you can write a distributed TensorFlow program, you can run it on Cloud ML Engine, which should allow you to scale quite well. (I will note there is a learning curve to both TensorFlow and machine learning in general, but you'll definitely want an ML-based solution for recommendations.)
A quick Google search reveals multiple helpful materials for building a recommendation system using TensorFlow (caveat: I haven't vetted any of these):
This Coursera class on machine learning.
A github repository of various recommendation algorithms in TensorFlow
A meetup in Singapore on the topic (comments have links to videos, etc.)
A YouTube video
Another YouTube video
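For orientation before diving into those resources: the core of most collaborative-filtering recommenders is matrix factorization, which learns a small vector per user and per song so that their dot product approximates observed interactions. A toy NumPy sketch of the idea (the ratings, sizes, and hyperparameters are all invented; a TensorFlow version would follow the same structure at scale):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy (user, song, rating) interactions; real data would come from listening activity.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
n_users, n_songs, k = 3, 3, 2  # k = latent factor dimension

U = rng.normal(scale=0.1, size=(n_users, k))  # one factor vector per user
S = rng.normal(scale=0.1, size=(n_songs, k))  # one factor vector per song

lr, reg = 0.05, 0.01
for _ in range(500):  # SGD over the observed entries
    for u, s, r in ratings:
        err = r - U[u] @ S[s]
        u_old = U[u].copy()
        U[u] += lr * (err * S[s] - reg * U[u])
        S[s] += lr * (err * u_old - reg * S[s])

# Recommend by scoring (user, song) pairs the user hasn't heard yet.
print("user 0 predicted score for song 2:", U[0] @ S[2])
```

Note that this never builds a 2.5M-wide categorical feature: each entity is just an index into an embedding table, which is exactly how the TensorFlow versions handle large vocabularies.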
I'm learning about Amazon Web Services for mobile and I do have a project in mind. I don't think I plan on creating machine learning models from scratch; however, I did notice there is an AI/machine learning service within AWS. I plan on using CoreML for iOS. I'm currently learning on the free tier and I wonder how much is truly offered in this option. I'd like to know if a T2 instance is suitable for working with TensorFlow Mobile or TensorFlow Lite?
Whether you are using TensorFlow Mobile or TensorFlow Lite, it still depends on what kind of workload you are going to run.
If you are going to train a model, then instead of T2 instances I recommend going with P2 instances.
If you are only going to run inference with a model, then you can try a t2.2xlarge.
It won't be as smooth or as accurate in throughput as GPU performance, but if your use case is not that deep or critical, it will definitely help with performance and cost savings.
Good morning,
Currently I'm exploring my options for building an internal platform for the company I work for. Our team is responsible for the company's data warehouse and reporting.
As we evolve, we'll be developing an intranet to answer some of the company's needs, and for some time now I've been considering Scala (and the Play Framework) as the way to go.
This will also involve a lot of machine learning to cluster clients, predict sales evolution, and so on. This is when I started to think about Spark ML and came across PredictionIO.
As we shift our skills towards data science, which will benefit and teach us/the company most:
build everything on top of Play and Spark, and have both the platform and the machine learning in the same project
use Play and PredictionIO, where most of the work is already prepared
I'm not trying to open an opinion-based question; rather, I want to learn from your experience / architectures / solutions.
Thank you
Both are good options:
1. Use PredictionIO if you are new to ML. It's easy to start with, but it will limit you in the long run.
2. Use Spark if you have confidence in your data science and data engineering team. Spark has an excellent and easy-to-use API along with an extensive ML library. That said, to put things into production you will need some distributed Spark knowledge and experience, and it is tricky at times to make it efficient and reliable.
Here are the options:
Spark on Databricks cloud: expensive, but easy-to-use Spark with no data engineering
PredictionIO, if you are certain their ML can solve all your business cases
Spark on Google Dataproc: an easy managed cluster for ~60% less than AWS, though some engineering is still required
In summary: PredictionIO for a quick fix, and Spark for long-term data science / engineering development. You can start with Databricks to minimise expertise overheads and move to Dataproc as you go along to minimise costs.
PredictionIO uses Spark's MLlib for the majority of its engine templates.
I'm not sure why you're separating the two?
PredictionIO is as flexible as Spark is, and can alternatively use other libraries such as deeplearning4j and H2O, to name a few.
I have been tasked with developing a recommender system for a video app and am relatively new to data science.
I was wondering whether, given a short time scale of about a month, it would be wiser to turn to a software-as-a-service recommender engine like Recombee, or to build the recommender algorithms from scratch using open-source software like Apache Spark.
My main hesitation with the first option is that there might not be as much freedom using a SaaS product, so the recommender system might not be as accurate as one built from scratch.
However, I am concerned about the feasibility of creating a recommender system from scratch, especially given my lack of experience. Could I create something within a month that is as accurate and as scalable as a SaaS?
I'm trying to understand the two frameworks better, so I'm trying to figure out the similarities and differences between the FIteagle framework and OpenIoT, because both frameworks share the same aims. The first provides testbed environments with different resources to manage and communicate with, and the second provides the ability to connect to different sensors within a cloud database, to communicate with those sensors, and to apply IoT services to them. Does anyone have an idea about the two frameworks?
Not being familiar with either of the above frameworks, I would say that eventually all IoT frameworks will focus on vertical markets in order to deliver industry-specific services. Consider transportation and smart grids: those are completely separate industries. For example, in transportation, geo-analytics is much more important than in smart grids, where meters tend to have fixed locations.
For those who are still interested and doing research in this area, and who are looking for a detailed comparison and understanding of the two frameworks, I published a paper on this matter which contains a specific and complete treatment of both. Since the paper is not yet uploaded to the internet, please get in touch with me if you want to read it.
I will provide a link here as soon as I upload it.
I'd like to play around with building a recommendation system, and by that I mean an algorithm that looks at preferences and/or reviews posted by a user and then makes recommendations for them, similar to what Netflix or Amazon use.
What are some good resources for learning how to write something like this? Where should I start?
Check out the Wikipedia page on the Netflix Prize and its discussion forum. Also, the somewhat related 2009 GitHub Contest is a good source for full source code on a number of different recommendation engines. And obviously there's also the Wikipedia page on the topic itself, which has some decent links.
If you start writing your own, you'll want to use a corpus. I'd actually recommend using the Netflix Prize's data set. Just carve the data set into two pieces. Train on the first piece and score your algorithm on the second piece.
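The carve-and-score idea looks roughly like this in Python (the ratings here are randomly generated stand-ins, and the baseline just predicts the training mean; a real run would load the Netflix Prize files and a real model):

```python
import random

random.seed(42)

# Fake (user, movie, rating) triples standing in for the real corpus.
ratings = [(u, m, float(random.randint(1, 5)))
           for u in range(100) for m in range(20)]

# Carve the data set into two pieces: train on one, score on the other.
random.shuffle(ratings)
split = int(0.8 * len(ratings))
train, test = ratings[:split], ratings[split:]

# Trivial baseline model: always predict the training-set mean rating.
mean = sum(r for _, _, r in train) / len(train)

# Score on the held-out piece with RMSE (the Netflix Prize metric).
rmse = (sum((r - mean) ** 2 for _, _, r in test) / len(test)) ** 0.5
print(f"baseline RMSE: {rmse:.3f}")
```

Any real algorithm you write should beat this baseline's RMSE on the held-out piece; if it doesn't, something is wrong.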
Addendum: a somewhat related and scary application of this sort of thing is predicting demographic information: a user's gender, age, household income, IQ, sexual orientation, etc. You could probably infer most of these attributes from the Netflix Prize dataset with a fairly high degree of accuracy. Fortunately, everyone in that dataset is just a number.
Take a look at pysuggest, a Python library that implements a variety of recommendation algorithms for collaborative filtering (the approach used by Amazon.com).
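Whatever library you pick, the core of Amazon-style item-to-item collaborative filtering fits in a few lines: compare items by the overlap of users who liked both. A toy sketch with invented data (this is the general technique, not pysuggest's actual API):

```python
from math import sqrt

# Which users liked which item (toy data).
likes = {
    "item_a": {"u1", "u2", "u3"},
    "item_b": {"u2", "u3"},
    "item_c": {"u4"},
}

def cosine(a: set, b: set) -> float:
    """Cosine similarity between two items, treating each as a binary user vector."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

# "People who liked item_a also liked...": rank other items by similarity.
sims = sorted(
    ((cosine(likes["item_a"], users), item)
     for item, users in likes.items() if item != "item_a"),
    reverse=True,
)
print(sims)
```

Here `item_b` ranks first because two of its users overlap with `item_a`, while `item_c` shares none.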