Building a recommender system for videos from scratch vs. using a SaaS

I have been tasked with developing a recommender system for a video app and am relatively new to data science.
Given a short timescale of about a month, would it be wiser to turn to a software-as-a-service (SaaS) recommender engine like Recombee, or to build the recommender algorithms from scratch using open-source software like Apache Spark?
My main hesitation with the first option is that a SaaS might not offer as much freedom. As a result, the recommender system might not be as accurate as one built from scratch.
However, I am concerned about the feasibility of creating a recommender system from scratch, especially given my lack of experience. Could I create something within a month that is as accurate and as scalable as what a SaaS provides?

Related

Is it better to have plugins loaded at runtime or direct code integration?

I'm in the process of making an app which I was hoping could have optional modules/plugins depending on what the user needs. Concretely, the host application would be lightweight (mostly a text/markdown editor) and I'd add the ability to use plugins. However, some plugins could be fairly heavy (for example a 3D viewer).
Would it be better to have plugins loaded at runtime at the cost of performance or to directly integrate those with the main code with an ability to turn them off at the cost of space? Ideally I'd want both high performance and low volume, but if I had to pick one I'd choose performance.
Feel free to suggest alternatives! I'm not too familiar with modular programming :)

Play Framework with Spark MLlib vs PredictionIO

Good morning,
currently I'm exploring my options for building an internal platform for the company I work for. Our team is responsible for the company's data warehouse and reporting.
As we evolve, we'll be developing an intranet to meet some of the company's needs and, for some time now, I have been considering Scala (and Play Framework) as the way to go.
This will also involve a lot of machine learning to cluster clients, predict sales evolution, and so on. This is when I started to think about Spark ML and came across PredictionIO.
As we are shifting our skills towards data science, which will benefit and teach us and the company most:
building everything on top of Play and Spark and having both the platform and machine learning in the same project
using Play and PredictionIO, where most of the stuff is already prepared
I'm not trying to ask an opinion-based question, but rather to learn from your experience, architectures, and solutions.
Thank you
Both are good options: 1. Use PredictionIO if you are new to ML; it is easy to start with, but it will limit you in the long run. 2. Use Spark if you have confidence in your data science and data engineering team. Spark has an excellent, easy-to-use API along with an extensive ML library. That said, to put things into production you will need some distributed Spark knowledge and experience, and it is tricky at times to make it efficient and reliable.
Here are the options:
Spark on Databricks cloud: expensive, but an easy way to use Spark with no data engineering
PredictionIO, if you are certain that its ML can solve all your business cases
Spark on Google Dataproc: an easily managed cluster for 60% less than AWS, with some engineering still required
In summary: PredictionIO for a quick fix, and Spark for long-term data science and engineering development. You can start with Databricks to minimise expertise overheads and move to Dataproc as you go along to minimise costs.
PredictionIO uses Spark's MLlib for the majority of its engine templates.
I'm not sure why you're separating the two?
PredictionIO is as flexible as Spark is, and can alternatively use other libraries such as deeplearning4j & H2O to name a few.

Agent Based Modeling in Modelica

Is it possible to simulate multi-agent systems in Modelica? I'm talking about a system such as MASON, written in Java. How easy or difficult would it be?
As I understand it, Modelica is not a typical programming language, so would it be particularly helpful, or will the basic design of the Modelica language be a hindrance? And more importantly, how are we going to model the "messaging" systems that are common in agent-based modeling?
Modelica can simulate discrete event systems. Some libraries exist: ModelicaDEVS, ARENALib etc.
Maybe the syntax is not perfect yet for this "messaging", but the language may be improved further in this direction.
An advantage might be that real-time capable code can be created, so the agents could run in embedded systems even under hard real-time constraints. Only some of the other tools, such as Ptolemy II, support this.
P.S. (added; see first comment):
From the start, Modelica was designed to generate code that is capable of running in real time. So you could take the unchanged Modelica model of your agent, connect its I/O to sensors and actuators, and download it onto real-time hardware (e.g. PowerPC). Your swarm of agents will then exactly fulfill the timing behaviour you modeled, and exist in the real world. You could also have only one real agent in hardware (maybe this hardware is expensive) and simulate the interaction with all the other agents in real time on real-time simulator hardware, using your unchanged models for that too.
This is one of the major reasons why Modelica's semantics are not as dynamic as those of, e.g., Java. If you want to run your MASON agent on real hardware you are in trouble: you have to move to e.g. Safety Critical Java, which means that a lot of constructs in your code, and also in the standard Java libraries, must be rewritten or are not allowed at all. Without this you will have to live with the possibility that your agent will fail its mission and burn down the house ...

web applications, dynamic charts and simulation tools? (for MATLAB users)

We are a group of students of chemical engineering mostly proficient in MATLAB and Simulink but with almost no clue of web programming.
Our idea is to develop some online examples by using interactive graphics with dynamic effects and 2D/3D simulations. We know that MATLAB has some solutions but the compilers are not available for the student version. Furthermore, we want to promote the use of free open source alternatives (SciLab, Octave, NumPy)
Ideally, we would like to use a 4GL which includes a free library for numerical analysis and combine it with a graphical user interface framework for web applications.
A good example would be Easy Java Simulations, which generates Java code and can easily be deployed online. However, we are looking for something that can be executed without Java or another plugin (see Google Chart Tools).
Although we are willing to learn (Python, Java), we would like to start with the easiest solution towards a painless transition for a chemical engineer ;)
We would really appreciate your recommendations and suggestions!
Your best shot is to buy the product MATLAB Builder NE. You can use WebFigures to seamlessly create web applications from your MATLAB application.

Real time system concept proof project

I'm taking an introductory course (3 months) about real-time systems design, but without any implementation.
I would like to build something that lets me better understand what I'll learn in theory, but since I have never built a real-time system I can't estimate how long any project will take. It would be a proof-of-concept project, or something like that, given my available time and knowledge.
Please, could you give me some ideas? Thank you in advance.
I program in T-SQL, Delphi and C#, but I won't have any problem learning another language.
Suggest you consider exploring the Real-Time Specification for Java (RTSJ). While it is not a traditional environment for constructing real-time software, it is an up-and-coming technology with a lot of interest. Even better, you can witness some of the ongoing debate about what matters and what doesn't in real-time systems.
Sun's JavaRTS is freely available for download, and has some interesting demonstrations available to show deterministic behavior, and show off their RT garbage collector.
In terms of a specific project, I suggest you start simple: 1) Build a work-generator that you can tune to consume a given amount of CPU time; 2) Put this into a framework that can produce a distribution of work-generator tasks (as threads, or as chunks of work executed in a thread) and a mechanism for logging the work produced; 3) Produce charts of the execution time, sojourn time, deadline, and slack/overrun of these tasks versus their priority; 4) Demonstrate that tasks running in the context of real-time threads (rather than timesharing threads) behave differently.
Bonus points if you can measure the overhead in the scheduler by determining at what supplied load (total CPU time produced by your work generator tasks divided by wall-clock time) your tasks begin missing deadlines.
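To make steps 1 to 3 and the supplied-load measure a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the names burn_cpu, run_task and supplied_load are made up for this example, the task sizes and deadline are arbitrary, and ordinary Python threads are timeshared, so this does not cover step 4 (real-time threads), which needs something like RTSJ or an RTOS.

```python
import time

def burn_cpu(cpu_seconds):
    """Busy-loop until this process has used about cpu_seconds of CPU time."""
    start = time.process_time()
    while time.process_time() - start < cpu_seconds:
        pass
    return time.process_time() - start  # CPU time actually consumed

def run_task(release, cpu_seconds, relative_deadline):
    """Run one work-generator task and return its timing record.

    release is the perf_counter() timestamp at which the task became ready;
    relative_deadline is in seconds after release (hypothetical parameters).
    """
    begin = time.perf_counter()
    used = burn_cpu(cpu_seconds)
    finish = time.perf_counter()
    sojourn = finish - release  # waiting time plus execution time
    return {
        "execution": finish - begin,
        "cpu": used,
        "sojourn": sojourn,
        "slack": relative_deadline - sojourn,  # negative means an overrun
    }

def supplied_load(records, wall_clock_seconds):
    """Total CPU time produced by the tasks divided by wall-clock time."""
    return sum(r["cpu"] for r in records) / wall_clock_seconds

if __name__ == "__main__":
    t0 = time.perf_counter()
    # Arbitrary example workload: five 10 ms tasks, each with a 50 ms deadline.
    log = [run_task(time.perf_counter(), 0.01, 0.05) for _ in range(5)]
    elapsed = time.perf_counter() - t0
    missed = sum(1 for r in log if r["slack"] < 0)
    print("supplied load:", supplied_load(log, elapsed))
    print("missed deadlines:", missed)
```

From here you could release tasks on a timer instead of back to back, raise the requested CPU time step by step, and note the load at which the slack first goes negative. Plotting the records is left out to keep the sketch dependency-free.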
Try to think of real-time tasks that are time-critical, for instance video-playing, which fails if tasks are not finished (e.g. calculating the next frame) in time.
You can also think of some industrial solutions, but they are probably more difficult to study in your local environment.
You should definitely consider building your system using a hardware development board equipped with a small processor (ARM, PIC, AVR, any one will do). This really helped remove my fear of the low-level when I started developing. You'll have to use C or C++ though.
You will then have two alternatives: either go bare-metal, or use a real-time OS.
Going bare-metal, you can learn:
How to initialize your processor from scratch and, most importantly, how to use interrupts, which are the fastest way you have to respond to an external event
How to implement lightweight threads with fast context switching, something every real-time OS implements
In order to ease this a bit, look for a dev kit which comes with lots of documentation and source code. I used Embedded Artists ARM boards and they give you a lot of material.
Going with the RT OS:
You'll fast-track your project, and will be able to learn how to fine-tune a RT OS
You may try your hand at an open-source OS, such as Linux or the BSDs, and learn a lot from the source code
Either choice is good: you will get a really cool hands-on project to show off and hopefully better understand your course material. Good luck!
As most real-time systems are still implemented in C or C++, it may be good to brush up on your knowledge of these programming languages. Many real-time systems are also embedded systems, so you might want to play around with a cheap open-source board like BeagleBoard (http://beagleboard.org/). This will also give you a chance to learn about cross-compiling etc.