"Frenemies", or How to Make Teenagers Happy at a Birthday Party - cluster-analysis

I have a combinatorial optimization problem that I am struggling with. The technical details of the problem are cumbersome, so I have translated it into the setting of a fictitious sweet-16 birthday party. Obviously, teenagers are NP-hard, but that's separate from the actual problem I'm trying to solve.
Let's say I have a son who is about to turn 16. He invites all of his friends to his birthday party, but not all of his friends like each other. In fact, every friend of my son's has at least one person they don't like, and some have more. These guests refuse to sit at a table where one or more of their sworn "frenemies" is seated. My son has provided me with a list of all his invited friends, and also of who doesn't like whom. This information is symmetric (if friend A doesn't like friend B, then friend B doesn't like friend A), but it is NOT transitive (if friend A doesn't like friend B but likes friend C, friend C is still free to like or dislike friend B). My question is: how do I determine the minimum number of tables such that no two "frenemies" are seated at the same table?

This is a combinatorial optimization problem, not a machine learning problem.
Actually, it is a graph coloring problem: create a graph G where each vertex corresponds to a person, and an edge (u, v) exists iff the two people u and v do not like each other. You are now asking for the smallest k such that G is k-colorable; this smallest k is the chromatic number of G, and computing it exactly is NP-hard in general, though party-sized instances are easy. A coloring c(v) tells you which table person v is seated at.
Now you just have to pick an algorithm.
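For a party-sized guest list, a minimal sketch of this in Python (assuming the networkx library; the friend names and dislike pairs are invented) could look like the following. A greedy colouring gives a quick upper bound on the number of tables; the exact minimum is the chromatic number, which is easy to brute-force for small parties.

    import networkx as nx
    from itertools import product

    # Vertex = friend, edge = "these two won't share a table".
    dislikes = [("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dave"), ("Dave", "Alice")]
    G = nx.Graph()
    G.add_edges_from(dislikes)

    # Upper bound: greedy colouring, where each colour is a table number.
    greedy = nx.coloring.greedy_color(G, strategy="largest_first")
    print("greedy upper bound:", max(greedy.values()) + 1, "tables", greedy)

    def k_colorable(graph, k):
        """Brute-force check whether k tables suffice (fine for small guest lists)."""
        nodes = list(graph.nodes)
        for assignment in product(range(k), repeat=len(nodes)):
            seats = dict(zip(nodes, assignment))
            if all(seats[u] != seats[v] for u, v in graph.edges):
                return seats
        return None

    k = 1
    while k_colorable(G, k) is None:
        k += 1
    print("minimum number of tables (chromatic number):", k)

For larger instances you would swap the brute-force check for an exact colouring algorithm or an ILP/SAT formulation.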

This sounds more like a constrained optimisation problem than a machine learning problem to me. I would model it as follows.
one variable per friend, whose value is the table that friend sits at;
additional constraints (as per the list) of the form friendX != friendY to say that those two can't sit at the same table.
That's the basic model, which you can solve using a constraint solver of your choice (I recommend Minion). You can either minimise the highest table number (which would require some additional constraints), or simply try to find a solution with a given number of tables (i.e. values in the domains of the variables), reducing it until you get down to one where no solution exists.
Depending on the size of the problem (i.e. the number of friends and tables) this may or may not work. Something you may have to consider is the symmetry in the problem (the people at table A can move to table B and vice versa and it's still a solution), which could be broken with additional constraints.
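A minimal sketch of this model (using the Python python-constraint library rather than Minion, and invented friends and dislikes) might look like this:

    from constraint import Problem

    friends = ["Alice", "Bob", "Carol", "Dave"]
    dislikes = [("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dave"), ("Dave", "Alice")]

    def seating(num_tables):
        # One variable per friend; its value is the table that friend sits at.
        problem = Problem()
        problem.addVariables(friends, range(num_tables))
        # One inequality constraint per frenemy pair.
        for a, b in dislikes:
            problem.addConstraint(lambda x, y: x != y, (a, b))
        # Crude symmetry breaking: pin the first friend to table 0.
        problem.addConstraint(lambda t: t == 0, (friends[0],))
        return problem.getSolution()  # None if no assignment exists

    tables = len(friends)
    solution = seating(tables)
    while tables > 1:
        candidate = seating(tables - 1)
        if candidate is None:
            break
        tables -= 1
        solution = candidate
    print(tables, "tables:", solution)

The loop simply retries with one table fewer until the solver reports the problem infeasible, which matches the "try a given number of tables until no solution exists" strategy above.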

Product Classification

This is my first post on Stack Overflow, but I've consulted this website very often. I was hoping someone could help me out.
I'm trying to map a product catalogue (not sure if that's the right term) in order to simplify it in the end. What I would like to achieve is to get an overview of the current product structure.
Variables:
Product_Name: definition of the product, total of 27 unique products
Product_Type: either data, voice, add-on or non-services
Customer_segment: 6 distinct segments
Carrier_type: how is the product delivered? (fiber/coax etc.)
Optional variables:
Account_ID: customer number
Revenue
What am I trying to achieve?
I would like to see how the columns relate to each other. So if I have for example product A, then I would like to see which options are related to this particular product.
Product A
  Segment A: Carrier A, Carrier B
  Segment B: Carrier A, Carrier C
  Segment C: Carrier B, Carrier D
I'm looking for ways to make such decision trees in R, since I do not want to make these by hand. Even more awesome would be to include the account ID and revenue somehow. I've already looked into Market Basket Analysis, but this only gives me the most common combinations of products per account (basket). Now I want to see which products belong together and whether there is any overlap between them, in order to simplify the catalogue in the end.
I'm not asking for a complete set of code but more like a push into the right direction. Does anyone know which method of analysis would be suited to do this?
Thanks!
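One low-tech way to get the overview described above is a plain cross-tabulation of the columns per product, which already shows which segment/carrier combinations actually occur. A minimal sketch in Python with pandas (the question mentions R, where table() or dplyr would do the same); the example data is invented:

    import pandas as pd

    catalogue = pd.DataFrame({
        "Product_Name":     ["A", "A", "A", "B", "B"],
        "Customer_segment": ["Consumer", "Consumer", "SME", "SME", "Corporate"],
        "Carrier_type":     ["fiber", "coax", "fiber", "coax", "fiber"],
        "Revenue":          [10.0, 12.5, 8.0, 20.0, 15.0],
    })

    # For each product: which segment/carrier combinations occur, how often,
    # and how much revenue they carry.
    summary = (catalogue
               .groupby(["Product_Name", "Customer_segment", "Carrier_type"])
               .agg(accounts=("Revenue", "size"), revenue=("Revenue", "sum"))
               .reset_index())
    print(summary)

    # Which carriers appear under which segment for product A?
    mask = catalogue["Product_Name"] == "A"
    print(pd.crosstab(catalogue.loc[mask, "Customer_segment"],
                      catalogue.loc[mask, "Carrier_type"]))

Each row of the summary corresponds to one leaf of the tree sketched in the question, so it can be fed straight into a tree or treemap visualisation.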

Storing parameters for rules

I am using Red Hat Decision Manager 7.1 (Drools) to create a rule for assigning a case to a department. The rule itself is quite simple; however, it requires quite a lot of parameters (~12) like the agent type, working area, case type, customer seniority and more. The resulting "action" is the department to which the case is assigned.
I tried to place the parameters in a decision table, but the table quickly bloated to over 15,000 rows and will probably get even larger than that. I did, however, notice that in many cases the difference between two rows is only one or two parameters (e.g. the same row where the only difference is agent type "Local" vs. "Regional"), resulting in a different assignment.
I am thinking of replacing the table with something else, like a tree structure, so I can group similar rows under the same node and then navigate over the tree to make the decision. To do this I plan to prioritize the parameters and give parameters with higher priority a higher place in the tree.
Does anyone have experience with such a problem? I looked at decision trees, but they focus more on ML and probabilities, so I'm not sure this is what I need.
Is there any other method to deal with bloated tables that become unmanageable? I cannot go to our customer and ask them to maintain a 15,000-row Excel sheet. They'll shoot me there and then.
Thanks
Alon.
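A minimal sketch of the tree idea described in the question, in Python rather than Drools (the parameter names, priorities and departments are invented). Each level of the nested dictionary corresponds to one parameter, ordered by priority, and a "*" entry acts as a wildcard so that rows that differ only in a low-priority parameter collapse into a single branch:

    # Parameters ordered by priority (highest first); "*" matches any value.
    PRIORITY = ["case_type", "working_area", "agent_type"]

    RULE_TREE = {
        "complaint": {
            "north": {"Local": "Dept A", "Regional": "Dept B", "*": "Dept C"},
            "*":     {"*": "Dept C"},
        },
        "claim": {
            "*": {"Local": "Dept D", "*": "Dept E"},
        },
        "*": {"*": {"*": "Default Dept"}},
    }

    def assign(case, node=RULE_TREE, depth=0):
        """Walk the tree one prioritised parameter at a time, falling back to '*'."""
        if not isinstance(node, dict):
            return node  # reached a department
        branch = node.get(case.get(PRIORITY[depth]), node.get("*"))
        return assign(case, branch, depth + 1)

    print(assign({"case_type": "complaint", "working_area": "north", "agent_type": "Local"}))   # Dept A
    print(assign({"case_type": "claim", "working_area": "south", "agent_type": "Regional"}))    # Dept E

Whether this stays maintainable with ~12 parameters depends on how many of them really discriminate; the wildcard levels are what keep the structure far smaller than a flat 15,000-row table.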

Determining canonical classes with text data

I have a unique problem and I'm not aware of any algorithm that can help me. Maybe someone on here does.
I have a dataset compiled from many different sources (teams). One field in particular is called "type". Here are some example values for type:
aple, apples, appls, ornge, fruits, orange, orange z, pear,
cauliflower, colifower, brocli, brocoli, leeks, veg, vegetables.
What I would like to be able to do is to group them together into e.g. fruits, vegetables, etc.
Put another way I have multiple spellings of various permutations of a parent level variable (fruits or vegetables in this example) and I need to be able to group them as best I can.
The only other potentially relevant feature of the data is the team that entered it, assuming some consistency in the way each team enters their data.
So, I have several million records of multiple spellings and short spellings (e.g. apple, appls) and I want to group them together in some way. In this example by fruits and vegetables.
Clustering would be challenging since each entry is most often one or two words, making it tricky to calculate a distance between terms.
Short of a massive human-curated lookup table (not likely with millions of rows), is there any approach I can take with this problem?
You will need to solve the spelling problem first, unless you have Google-scale data that would allow you to learn spelling correction from Google-scale statistics.
Then you will still have the problem that "Apple" could be a fruit or a computer, and that "Apple" and "Granny Smith" will look completely different. Your best guess at this second stage is something like word2vec trained on massive data. Then you get high-dimensional word vectors, and can finally try to solve the clustering challenge, if you ever get that far with decent results. Good luck.
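As an illustration of the first (spelling) step, a minimal sketch using Python's standard-library difflib and a small hand-made vocabulary of canonical terms mapped to their parent categories (the vocabulary and the similarity cutoff are assumptions, not something the data gives you for free):

    import difflib

    # Hypothetical canonical vocabulary -> parent category.
    CANONICAL = {
        "apple": "fruits", "orange": "fruits", "pear": "fruits", "fruits": "fruits",
        "cauliflower": "vegetables", "broccoli": "vegetables", "leek": "vegetables",
        "veg": "vegetables", "vegetables": "vegetables",
    }

    def categorise(raw, cutoff=0.75):
        """Map a noisy 'type' value to a parent category via its closest spelling match."""
        token = raw.strip().lower()
        match = difflib.get_close_matches(token, list(CANONICAL), n=1, cutoff=cutoff)
        return CANONICAL[match[0]] if match else None  # None = send to manual review

    for value in ["aple", "appls", "ornge", "colifower", "brocli", "orange z"]:
        print(value, "->", categorise(value))

The team identifier could be used to keep one vocabulary (or cutoff) per team if their spelling habits differ, and anything below the cutoff goes to a small manual-review pile rather than a million-row lookup table.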

SQL Database Design - Flag or New Table?

Some of the Users in my database will also be Practitioners.
This could be represented by either:
an is_practitioner flag in the User table
a separate Practitioner table with a user_id column
It isn't clear to me which approach is better.
Advantages of flag:
fewer tables
only one id per user (hence no possibility of confusion about which id to use in other tables)
flexibility (I don't have to decide whether fields are Practitioner-only or not)
possible speed advantage for finding User-level information for a practitioner (e.g. e-mail address)
Advantages of new table:
no nulls in the User table
clearer as to what information pertains to practitioners only
speed advantage for finding practitioners
In my case specifically, at the moment, practitioner-related information is generally one-to-many (such as the locations they can work in, or the shifts they can work, etc.). I would not be at all surprised if it turns out I need to store simple attributes for practitioners (i.e., one-to-one) as well.
Questions
Are there any other considerations?
Is either approach superior?
You might want to consider the fact that someone who is a practitioner today may be something else tomorrow. (And by that I don't mean no longer being a practitioner.) Say a consultant, an author, or whatever the variants in your subject domain are, and you might want to keep track of their latest status in the Users table. So it might make sense to have a ProfType field (Type of Professional practice) or equivalent. This way you have all the advantages of having a flag: you could keep it as a string field and leave it as a blank string, or fill it with other Prof.Type codes as your requirements grow.
You mention that having a new table has the advantage of making it faster to find practitioners. No, you are better off with a WHERE clause on the users table for that.
Your last paragraph(one-to-many), however, may tilt the whole choice in favour of a separate table. You might also want to consider, likely number of records, likely growth, criticality of complicated queries etc.
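A minimal sketch of the ProfType field and WHERE-clause idea above (assuming Python with SQLAlchemy; the column names are illustrative): a type code instead of a boolean flag, and "finding practitioners" is just a filter on the users table.

    from sqlalchemy import Column, Integer, String, create_engine, select
    from sqlalchemy.orm import declarative_base, Session

    Base = declarative_base()

    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        email = Column(String, nullable=False)
        # Type of professional practice: "" for plain users, or a code such as
        # "practitioner", "consultant", "author" as requirements grow.
        prof_type = Column(String, default="", nullable=False)

    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        session.add_all([
            User(email="a@example.com", prof_type="practitioner"),
            User(email="b@example.com"),
        ])
        session.commit()
        # "Finding practitioners" is a WHERE clause on the users table.
        practitioners = session.scalars(
            select(User).where(User.prof_type == "practitioner")
        ).all()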
I tried to draw two scenarios, with some notes inside the image. It's really only a draft, just to help you "see" the various entities. Maybe you have already done something like it; in that case, please disregard this part of my answer. As Whirl stated in his last paragraph, you should consider other things too.
Personally I would go for a separate table - as long as you can already identify some extra data that make sense only for a Practitioner (e.g.: full professional title, University, Hospital or any other Entity the Practitioner is associated with).
So if, in the future, you discover more data that makes sense only for the Practitioner, and/or identify another distinct "subtype" of User (e.g. Intern), you can just add fields to the Practitioner subtable, or add a new table for the Intern.
It might be advantageous to use a User Type field as suggested by @Whirl Mind above.
I think this is just one example of having to identify different types of objects in your DB, and for that I refer to one of my previous answers here: Designing SQL database to represent OO class hierarchy
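A minimal sketch of the separate-table variant this answer recommends (again assuming SQLAlchemy, with illustrative names): practitioner-only attributes and their one-to-many children, such as work locations, hang off practitioners.user_id, so the users table stays free of practitioner-specific NULLs.

    from sqlalchemy import Column, ForeignKey, Integer, String
    from sqlalchemy.orm import declarative_base, relationship

    Base = declarative_base()

    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        email = Column(String, nullable=False)
        practitioner = relationship("Practitioner", back_populates="user", uselist=False)

    class Practitioner(Base):
        __tablename__ = "practitioners"
        id = Column(Integer, primary_key=True)
        user_id = Column(Integer, ForeignKey("users.id"), unique=True, nullable=False)
        professional_title = Column(String)  # example of a practitioner-only attribute
        user = relationship("User", back_populates="practitioner")
        locations = relationship("PractitionerLocation", back_populates="practitioner")

    class PractitionerLocation(Base):
        __tablename__ = "practitioner_locations"
        id = Column(Integer, primary_key=True)
        practitioner_id = Column(Integer, ForeignKey("practitioners.id"), nullable=False)
        location_name = Column(String, nullable=False)
        practitioner = relationship("Practitioner", back_populates="locations")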

Core data max amount of entity

I am working on an app where I use Core Data.
I have already tried to do it with one entity, but that didn't work.
I now have around twenty entities, and my question is: is there a limit to the number of entities, or a recommended number?
Is there a better way to store that amount of data?
UPDATE:
What I am storing are grades from school, not as A, B, C, D, E, F but as a number from 1 to 10. Each grade has its own weighting (the number of times it counts): some grades count twice because they are more important.
So I first thought of having an array with a string for the name of the subject and then two arrays: one storing the grades, the other the corresponding weightings.
Like this:
var subjects: [String,[Int],[Int]]
but this isn't possible, and I don't even know how I should put this in Core Data and get it back properly.
Because I couldn't figure it out, I thought of just making an entity for each subject, but there are a lot of them, hence this question.
There's no limit to the number of entities, but it's possible to go overboard and create more than you actually need. The recommended number is "as many as you need and no more", which obviously will vary a great deal depending on the nature of the data and how the app uses it. Whether there's a better way than your current approach is totally dependent on the fine details of exactly what you're doing, and so is impossible to answer without a far more detailed question.
You could set up a Subject entity that has one-to-many relationships to ordered sets of Grade and Weight entities.
However, each grade apparently has a corresponding weight, so it would be more accurate to store each grade's weight in the Grade entity.
This still may not represent your real-world model.
If your subject is something general, like math or English, you could have more than one subject per grade (e.g., algebra, geometry, trigonometry), or more than one level per subject (e.g., algebra 1, algebra 2), which may or may not have a different grade.
If your subject is very specific, your data may end up spread across unique one-to-one relationships, instead of one-to-many relationships.
You would also need to consider whether you can use ordered or unordered relationships, or whether an attribute exists that you can use to sort an entity.
You should consider these different facets of what you're trying to model (as well as the specific fetches you'd want to perform), before you try to design or implement the model, to allow you to efficiently represent this particular object graph.
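Core Data specifics aside, the shape being recommended is simply "a Subject has many Grades, and each Grade carries its own weight". A minimal, language-agnostic sketch of that shape (written in Python purely to illustrate the relationships and the weighted average from the question; the names are invented):

    from dataclasses import dataclass, field

    @dataclass
    class Grade:
        value: int   # the mark, 1..10
        weight: int  # how many times this mark counts

    @dataclass
    class Subject:
        name: str
        grades: list[Grade] = field(default_factory=list)

        def weighted_average(self) -> float:
            total_weight = sum(g.weight for g in self.grades)
            return sum(g.value * g.weight for g in self.grades) / total_weight

    maths = Subject("Maths", [Grade(8, 1), Grade(6, 2)])
    print(maths.weighted_average())  # (8*1 + 6*2) / 3 ≈ 6.67

In Core Data terms, Subject and Grade would each be an entity, with a to-many relationship from Subject to Grade and value/weight as attributes on Grade.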