Operational Transformation - Does Google Doc preserve intention - real-time

I have been learning about Operational Transformation. One classic example people always use is Google Doc who has chosen to use operational transformation to propagate changes to the clients.
However, one thing I am very confused about is whether Google Doc preserves user intention or not ? (OR in general, does OT preserves user intention) ?
One past StackOverflow post and one FAQ on OT explain that Google Doc/OT doesn't. (Particularly, in the case of the FAQ on OT, it explains that no OT model can preserve user intention while it can preserve operation intention)
So this gets me kind of confused about what is the difference between user intention and operation intention ? Also, if OT/Google Doc does not preserve intention, won't we have an inconsistency ? (Obviously, this is not the case in Google Doc.)
Some explanation on these two terms would be very helpful as I struggle to find any good examples.

Related

How to implement Associative Rules Analysis or Market Basket Analysis from scratch?

I tried to went through numerous articles trying to understand what should be my first step to incorporate associative analysis (may be Market Basket analysis) into my system. They all go deep into implementation of algorithm but no one talked about how to store data in the first place.
I will really appreciate if someone can give me some start pointers or article links that I can begin with.
The first thing I want to implement is to track user clicks and provide suggestions based on tracked data.
E.g. User clicked on link A and subsequently on link B and link C. I can track this activity with some metadata associated (user, user organization, user role etc.)
I do not want it to be limited only to links. In future, I want to add number of similar usecases into the system and want to make it smart. E.g. If user set specific values for fields A and B, most likely he/she will set value <bla> for field C.
My system may generate several thousand such data points in a day (E.g. user clicks, field selection etc.).
Below are my questions:
How should I store my data? Go SQL or No SQL (I briefly looked into Mongo DB and it looked promising)
What tool should I use to perform the associative analysis? Are there any open source tools I can use?
It depend. Does your data suitable for NoSql databases? To answer this question it's better to read CAP Theorem and it's case studies: https://en.wikipedia.org/wiki/CAP_theorem or http://robertgreiner.com/2014/06/cap-theorem-explained/
. Some time you want Consistency(depending to your data) and Availability => so that it's better to use Relational Databases like Mysql(Try to read case studies and analyse your data to pick the best tools)
There is large number of open source libraries, but in my opinion it's better to first read some concepts and algorithms. Try searching for Apriori,ECLAT, FP-GROWTH Algorithms and get concepts of them. then you can pick a tool or write the code your self. Some usefull tools(depending to your programming language):
Python: https://github.com/asaini/Apriori, https://github.com/enaeseth/python-fp-growth, https://github.com/enaeseth/python-fp-growth/blob/master/fp_growth.py
PHP: https://github.com/sigidhanafi/fp-growth-php
JAVA: https://github.com/goodinges/FP-Growth-Java, http://www.philippe-fournier-viger.com/spmf/
Also you can use Spark: https://spark.apache.org/docs/1.1.1/mllib-guide.html

Can I use Apache Mahout Taste for User Preferences matching?

I am trying to match objects based on predefined user preferences. A simple example would be finding best matching vechicle.
Lets say a user 'Tom' is offered a rented vehicle for travel based on his predefined preferences. In this case, the predefined user preferences will be -
** Pre-defined user preferences for Tom:
PreferredVehicle (Make='ANY', Type='3-wheeler/4-wheeler',
Category='Sedan/Hatchback', AC/Non-AC='AC')
** while the 10 available vehicles are -
Vechile1(Make='Toyota', Type='4-wheeler', Category='Hatchback', AC/Non-AC='AC')
Vechile2(Make='Tata', Type='3-wheeler', Category='Transport', AC/Non-AC='Non-AC')
Vechile3(Make='Honda', Type='4-wheeler', Category='Sedan', AC/Non-AC='AC')
;
;
and so on upto 'Vehicle10'
All I want to do is - choose a vehicle for Tom that best matches his preferences and also probably give him choices in order, i.e. best match first.
Questions I have :
Can this be done with Mahout Taste?
If yes, can someone please point me to some example code where I can start quickly?
A recommender may not be the best tool for the job here, for a few reasons. First, I don't expect that the best answers are all that personal in this domain. If I wanted a Ford Focus, the best alternative you have is likely about the same for most every user. Second, there is not much of a discovery problem here. I'm searching for a vehicle that meets certain needs; I don't particularly want or need to find new and unknown vehicles, like I would for music. Finally you don't have much data per user; I assume most users have never rented before, and very few have even 3+ rentals.
Can you throw this data at a recommender anyway? Sure, try Mahout Taste (I'm the author). If you have the book Mahout in Action it will walk you through it. Since it's non-rating data, I can also recommend the successor project, Myrrix (http://myrrix.com) as it will be easier to set up and run. You can at least evaluate the results to see if it's anywhere near useful.
Either way, your work will just be to make a CSV file of "userID,vehicleID" pairs from your data and feed it in. Then it will give you vehicle IDs as recommendations for any user ID.
But, I imagine you will do much better to analyze what people picked when the car wasn't available, and look at the difference, and learn which attributes they are most and least likely to be sacrificed, and learn to score the alternatives that way. This is entirely feasible since this data set is small, and because you have rich item attribute data.

SimpleDB: Guaranteed to see all item attributes if we see the item? (non-consistent read)

I've just discovered an assumption in my use of SimpleDB. I suspect it's safe but would like other opinions since the docs don't seem to cover it.
So say Process 1 stores an item with x attributes. When Process 2 tries to access said item (without consistent read) & finds it, is it guaranteed to have all the attributes stored by Process 1?
I'm excluding the possibility that another process could have changed the data.
I also know that Process 2 has no guarantee of seeing the item unless consistent read is used, I'm just talking about the point when it does eventually see it.
I guess the question is, once I can get an item & am not changing it anywhere else can I assume it has an ad-hoc fixed schema and access all my expected attributes without checking they actually exist?
I don't want to be in a situation where I need to keep requesting items until they have all the attributes I need to use them.
Thanks.
Although Amazon makes no such guarantees in the documentation the current implementation of their eventual consistency guarantees that you'll see all the properties stored by Process 1 or none of them.
See this thread over at the AWS forums and more specifically this answer by an Amazon employee confirming the behavior (emphasis mine).
I don't think we make that guarantee in the documentation, but the
current implementation treats each Put request as a bundle. It won't
split the request up and apply the operations piecemeal. You'll get
either step-1 responses or step-2 responses until eventual consistency
shakes out and leaves you with step-2 responses.
While this is undocumented behavior I suspect quite a few SimpleDB users are relying on it now and as such Amazon won't be likely to change it anytime soon, but that's ju my guess.

How do I adapt my recommendation engine to cold starts?

I am curious what are the methods / approaches to overcome the "cold start" problem where when a new user or an item enters the system, due to lack of info about this new entity, making recommendation is a problem.
I can think of doing some prediction based recommendation (like gender, nationality and so on).
You can cold start a recommendation system.
There are two type of recommendation systems; collaborative filtering and content-based. Content based systems use meta data about the things you are recommending. The question is then what meta data is important? The second approach is collaborative filtering which doesn't care about the meta data, it just uses what people did or said about an item to make a recommendation. With collaborative filtering you don't have to worry about what terms in the meta data are important. In fact you don't need any meta data to make the recommendation. The problem with collaborative filtering is that you need data. Before you have enough data you can use content-based recommendations. You can provide recommendations that are based on both methods, and at the beginning have 100% content-based, then as you get more data start to mix in collaborative filtering based.
That is the method I have used in the past.
Another common technique is to treat the content-based portion as a simple search problem. You just put in meta data as the text or body of your document then index your documents. You can do this with Lucene & Solr without writing any code.
If you want to know how basic collaborative filtering works, check out Chapter 2 of "Programming Collective Intelligence" by Toby Segaran
Maybe there are times you just shouldn't make a recommendation? "Insufficient data" should qualify as one of those times.
I just don't see how prediction recommendations based on "gender, nationality and so on" will amount to more than stereotyping.
IIRC, places such as Amazon built up their databases for a while before rolling out recommendations. It's not the kind of thing you want to get wrong; there are lots of stories out there about inappropriate recommendations based on insufficient data.
Working on this problem myself, but this paper from microsoft on Boltzmann machines looks worthwhile: http://research.microsoft.com/pubs/81783/gunawardana09__unified_approac_build_hybrid_recom_system.pdf
This has been asked several times before (naturally, I cannot find those questions now :/, but the general conclusion was it's better to avoid such recommendations. In various parts of the worls same names belong to different sexes, and so on ...
Recommendations based on "similar users liked..." clearly must wait. You can give out coupons or other incentives to survey respondents if you are absolutely committed to doing predictions based on user similarity.
There are two other ways to cold-start a recommendation engine.
Build a model yourself.
Get your suppliers to fill in key information to a skeleton model. (Also may require $ incentives.)
Lots of potential pitfalls in all of these, which are too common sense to mention.
As you might expect, there is no free lunch here. But think about it this way: recommendation engines are not a business plan. They merely enhance the business plan.
There are three things needed to address the Cold-Start Problem:
The data must have been profiled such that you have many different features (with product data the term used for 'feature' is often 'classification facets'). If you don't properly profile data as it comes in the door, your recommendation engine will stay 'cold' as it has nothing with which to classify recommendations.
MOST IMPORTANT: You need a user-feedback loop with which users can review the recommendations the personalization engine's suggestions. For example, Yes/No button for 'Was This Suggestion Helpful?' should queue a review of participants in one training dataset (i.e. the 'Recommend' training dataset) to another training dataset (i.e. DO NOT Recommend training dataset).
The model used for (Recommend/DO NOT Recommend) suggestions should never be considered to be a one-size-fits-all recommendation. In addition to classifying the product or service to suggest to a customer, how the firm classifies each specific customer matters too. If functioning properly, one should expect that customers with different features will get different suggestions for (Recommend/DO NOT Recommend) in a given situation. That would the 'personalization' part of personalization engines.

Essential techniques for pinpointing missing requirements?

An initial draft of requirements specification has been completed and now it is time to take stock of requirements, review the specification. Part of this process is to make sure that there are no sizeable gaps in the specification. Needless to say that the gaps lead to highly inaccurate estimates, inevitable scope creep later in the project and ultimately to a death march.
What are the good, efficient techniques for pinpointing missing and implicit requirements?
This question is about practical techiniques, not general advice, principles or guidelines.
Missing requirements is anything crucial for completeness of the product or service but not thought of or forgotten about,
Implicit requirements are something that users or customers naturally assume is going to be a standard part of the software without having to be explicitly asked for.
I am happy to re-visit accepted answer, as long as someone submits better, more comprehensive solution.
Continued, frequent, frank, and two-way communication with the customer strikes me as the main 'technique' as far as I'm concerned.
It depends.
It depends on whether you're being paid to deliver what you said you'd deliver or to deliver high quality software to the client.
If the former, simply eliminate ambiguity from the specifications and then build what you agreed to. Try to stay away from anything not measurable (like "fast", "cool", "snappy", etc...).
If the latter, what Galwegian said + time or simply cut everything not absolutely drop-dead critical and build that as quickly as you can. Production has a remarkable way of illuminating what you missed in Analysis.
evaluate the lifecycle of the elements of the model with respect to a generic/overall model such as
acquisition --> stewardship --> disposal
do you know where every entity comes from and how you're going to get it into your system?
do you know where every entity, once acquired, will reside, and for how long?
do you know what to do with each entity when it is no longer needed?
for a more fine-grained analysis of the lifecycle of the entities in the spec, make a CRUDE matrix for the major entities in the requirements; this is a matrix with the operations/applications as the rows and the entities as the columns. In each cell, put a C if the application Creates the entity, R for Reads, U for Updates, D for Deletes, or E for "Edits"; 'E' encompasses C,R,U, and D (most 'master table maintenance' apps will be Es). Then check each column for C,R,U, and D (or E); if one is missing (except E), figure out if it is needed. The rows and columns of the matrix can be rearranged (manually or using affinity analysis) to form cohesive groups of entities and applications which generally correspond to subsystems; this may assist with physical system distribution later.
It is also useful to add a "User" entity column to the CRUDE matrix and specify for each application (or feature or functional area or whatever you want to call the processing/behavioral aspects of the requirements) whether it takes Input from the user, produces Output for the user, or Interacts with the user (I use I, O, and N for this, and always make the User the first column). This helps identify where user-interfaces for data-entry and reports will be required.
the goal is to check the completeness of the specification; the techniques above are useful to check to see if the life-cycle of the entities are 'closed' with respect to the entities and applications identified
Here's how you find the missing requirements.
Break the requirements down into tiny little increments. Really small. Something that can be built in two weeks or less. You'll find a lot of gaps.
Prioritize those into what would be best to have first, what's next down to what doesn't really matter very much. You'll find that some of the gap-fillers didn't matter. You'll also find that some of the original "requirements" are merely desirable.
Debate the differences of opinion as to what's most important to the end users and why. Two users will have three opinions. You'll find that some users have no clue, and none of their "requirements" are required. You'll find that some people have no spine, and things they aren't brave enough to say out loud are "required".
Get a consensus on the top two or three only. Don't argue out every nuance. It isn't possible to envision software. It isn't possible for anyone to envision what software will be like and how they will use it. Most people's "requirements" are descriptions of how the struggle to work around the inadequate business processes they're stuck with today.
Build the highest-priority, most important part first. Give it to users.
GOTO 1 and repeat the process.
"Wait," you say, "What about the overall budget?" What about it? You can never know the overall budget. Do the following.
Look at each increment defined in step 1. Provide a price-per-increment. In priority order. That way someone can pick as much or as little as they want. There's no large, scary "Big Budgetary Estimate With A Lot Of Zeroes". It's all negotiable.
I have been using a modeling methodology called Behavior Engineering (bE) that uses the original specification text to create the resulting model when you have the model it is easier to identify missing or incomplete sections of the requirements.
I have used the methodolgy on about six projects so far ranging from less than a houndred requirements to over 1300 requirements. If you want to know more I would suggest going to www.behaviorengineering.org there some really good papers regarding the methodology.
The company I work for has created a tool to perform the modeling. The work rate to actually create the model is about 5 requirements for a novice and an expert about 13 requirements an hour. The cool thing about the methodolgy is you don't need to know really anything about the domain the specification is written for. Using just the user text such as nouns and verbs the modeller will find gaps in the model in a very short period of time.
I hope this helps
Michael Larsen
How about building a prototype?
While reading tons of literature about software requirements, I found these two interesting books:
Problem Frames: Analysing & Structuring Software Development Problems by Michael Jackson (not a singer! :-).
Practical Software Requirements: A Manual of Content and Style by Bendjamen Kovitz.
These two authors really stand out from the crowd because, in my humble opinion, they are making a really good attempt to turn development of requirements into a very systematic process - more like engineering than art or black magic. In particular, Michael Jackson's definition of what requirements really are - I think it is the cleanest and most precise that I've ever seen.
I wouldn't do a good service to these authors trying to describe their aproach in a short posting here. So I am not going to do that. But I will try to explain, why their approach seems to be extremely relevant to your question: it allows you to boil down most (not all, but most!) of you requirements development work to processing a bunch of check-lists* telling you what requirements you have to define to cover all important aspects of the entire customer's problem. In other words, this approach is supposed to minimize the risk of missing important requirements (including those that often remain implicit).
I know it may sound like magic, but it isn't. It still takes a substantial mental effort to come to those "magic" check-lists: you have to articulate the customer's problem first, then analyze it thoroughly, and finally dissect it into so-called "problem frames" (which come with those magic check-lists only when they closely match a few typical problem frames defined by authors). Like I said, this approach does not promise to make everything simple. But it definitely promises to make requirements development process as systematic as possible.
If requirements development in your current project is already quite far from the very beginning, it may not be feasible to try to apply the Problem Frames Approach at this point (although it greatly depends on how your current requirements are organized). Still, I highly recommend to read those two books - they contain a lot of wisdom that you may still be able to apply to the current project.
My last important notes about these books:
As far as I understand, Mr. Jackson is the original author of the idea of "problem frames". His book is quite academic and theoretical, but it is very, very readable and even entertaining.
Mr. Kovitz' book tries to demonstrate how Mr. Jackson ideas can be applied in real practice. It also contains tons of useful information on writing and organizing the actual requirements and requirements documents.
You can probably start from the Kovitz' book (and refer to Mr. Jackson's book only if you really need to dig deeper on the theoretical side). But I am sure that, at the end of the day, you should read both books, and you won't regret that. :-)
HTH...
I agree with Galwegian. The technique described is far more efficient than the "wait for customer to yell at us" approach.