How can we combine multiple data types in Myrrix for recommendation? - recommendation-engine

In our case, we have users' click stream, items' attributes (like category, tags and so on), favorites about item, and collections for items. How can we combine these data as Myrrix's input data?

Basically you are trying to model interactions between users and items. The way to model the different interactions is to assign a strength indicator to each. For instance, you could argue that a click has a strength of 2, a favorite a strength of 5 and perhaps a purchase a strength of 15 (these numbers are just off the top of my head).
Example of input data:
user1,item1,2 => the user viewed the item
user1,item1,5 => the user made the item a favorite
user1,item1,15 => the user purchased the item
Internally, Myrrix will add all of these values together, which indicates quite a strong preference for the item. You should therefore keep all the interactions, not just the strongest one.
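As a rough sketch of the weighting idea above, the following Python turns raw interaction records into `user,item,strength` lines and shows what the summed strength per pair would be. The action names, the `STRENGTHS` table and both helper functions are hypothetical; the numbers are the illustrative ones from this answer, not values recommended by Myrrix.

```python
from collections import defaultdict

# Hypothetical strengths, copied from the illustrative numbers in the answer.
STRENGTHS = {"view": 2, "favorite": 5, "purchase": 15}

def to_input_lines(interactions):
    """Turn (user, item, action) tuples into 'user,item,strength' lines."""
    return [f"{user},{item},{STRENGTHS[action]}" for user, item, action in interactions]

def combined_strength(interactions):
    """Sum strengths per (user, item) pair, mirroring the additive behaviour described above."""
    totals = defaultdict(int)
    for user, item, action in interactions:
        totals[(user, item)] += STRENGTHS[action]
    return dict(totals)

interactions = [
    ("user1", "item1", "view"),
    ("user1", "item1", "favorite"),
    ("user1", "item1", "purchase"),
]
lines = to_input_lines(interactions)
totals = combined_strength(interactions)
```

Here `lines` reproduces the three example rows, and `totals` holds the single combined value (2 + 5 + 15) for the pair.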
The metadata you might have on the users or the items can also be introduced to Myrrix as "tags" to better inform the model. So you could say that a user is "female" or an item is "jeans". You can have multiple tags per user or item, and each tag can be assigned a weight as well.

How can I bind switches (input checkbox) to a mutually exclusive (input radio) in leaflet.js?

There are three groups of markers (grocery stores, clothing stores, pharmacies). Each of them is from a different company. I need to display them on the map by store type and by company. When a store type is selected, only that type should be displayed, and within it only the selected companies; the other store types should not be shown. Input radio -> (grocery stores, clothing stores, pharmacies), input checkbox -> (company1, company2,...). How can I make it so that when selecting, for example, "clothing stores", only the clothing store companies are shown, with an optional selection by company?
Here's an example: https://codesandbox.io/s/billowing-cache-805g6?file=/index.html
This example uses buttons but you could easily switch them to radio buttons if you would rather use those.
The code is quite simple. On page load, it creates the map. I defined a static array of objects that are companies, and each company contains objects by store type (yours might be different).
The doLoadCompany function takes the id of the company and, if it matches, loops through the grocery store object for that company and loads the markers onto the map. Before it loads the new grocery stores, though, it clears the previously loaded company, if one existed.
This is one simple example. It can get much more complicated, or it could be simpler, depending on your object. The basic idea is to determine a way to identify the company you want to load data for and then load that data. If you're doing this from an API, I would return only the data that you need for the specific company, then loop through the data set and plot the markers based on their coordinates.

When and how to load data for an infinite list when page/index of data is hard to know?

I'm writing a Flutter web/mobile calendar application / todo list, the main feature of which is a long list of items (appointments, tasks, and the like).
Much of the time, the list won't be longer than a few hundred items. But it should be able to scale to thousands of items, as many as the user wants. Not too hard to do should the user make many repeating items. I'm using ReorderableListView.builder to virtualize the list. The part I'm struggling with is when to load the list data and how to store it.
Important: When the user picks a day, the list can jump to somewhere in the middle... and can scroll back to the top of the list.
The easiest thing to do would be to just load all data up to the user's position. But this means storing a potentially very large list in memory at best, and in the web application it means requesting way more data than is really needed.
A good summary of the problem might be: Jumping to a particular day is more challenging than jumping to a known index on the list. It's not easy to know what an item's index would be in a fully constructed version of the list without fully constructing the list up to that item. Yes, you can get the items at a particular date, but what if you wanted to get fifty items before a particular date and fifty items after it (useful for keeping scrolling smooth)? There could be a huge gap, or there could be a whole ton of items all clustered on one day.
A complication is that not all items in the list are items in the database, for example day headers. The day headers need to behave like regular items and not be attached to other items when it comes to the reordering drag animation, yet storing them as records in the database feels wrong. Here's an image of the web application to give you some idea:
THIS ANSWER IS MY OWN WORK IN PROGRESS. OPEN TO BETTER ANSWERS, CORRECTIONS.
I like the answer here (Flutter: Display content from paginated API with dynamic ListView) and would like to do something like it.
I would do this for the web app, where there are HTTP bottlenecks. I would also do this for the mobile app, even when all data is in the database. Why keep the entire list in memory when you don't have to? (I have very little mobile development experience, so please correct me if I'm wrong.)
Problem:
Jumping to a particular day is more challenging than jumping to a known index on the list. It's not easy to know what an item's index would be in a fully constructed version of the list without fully constructing the list up to that item.
Solution I'm leaning toward:
ReferencesList + keyValueStorage solution
Store the item ids in order, as well as key-value pairs of all items by id, as a JSON list in NoSQL. This list will include references to items, with day headings included and represented by something like 'dh-2021-05-21' or its epoch time. This list will be very lightweight, just a string per item, so you don't need to worry about holding it all in memory. Item data can be pulled out of storage whenever an item is built. (Should this be done in Sembast or Hive? Hive, and here's why: https://bendelonlee.medium.com/hive-or-sembast-for-nosql-storage-in-flutter-web-fe3711146a0 )
When a date is jumped to, run a binary search on this list to find its exact position and scroll to that position. You can easily preload, say, 30 items before and 50 items after that position.
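The jump-to-date step could look roughly like the sketch below. It is written in Python rather than Dart purely for illustration, and it assumes each entry in the references list is stored as an `(iso_date, ref)` pair kept in ascending date order; that pair layout, the sample ids and the `jump_window` helper are all hypothetical.

```python
from bisect import bisect_left

def jump_window(entries, target_date, before=30, after=50):
    """Find the first entry on or after target_date and the window to preload.

    entries: list of (iso_date, ref) pairs in ascending date order.
    The probe (target_date, "") sorts before every real entry of that day,
    so the order of refs within a day does not affect the search.
    """
    index = bisect_left(entries, (target_date, ""))
    start = max(0, index - before)
    end = min(len(entries), index + after)
    return index, entries[start:end]

# Hypothetical references list: day headers ("dh-...") mixed with item ids.
entries = [
    ("2021-05-20", "dh-2021-05-20"),
    ("2021-05-20", "item-17"),
    ("2021-05-21", "dh-2021-05-21"),
    ("2021-05-21", "item-03"),
    ("2021-05-21", "item-42"),
    ("2021-05-23", "dh-2021-05-23"),
    ("2021-05-23", "item-08"),
]

index, window = jump_window(entries, "2021-05-21", before=1, after=2)
refs_to_load = [ref for _, ref in window]
```

Only the refs in `refs_to_load` need their full item data fetched from storage. A date with no entries simply lands on the first entry after it.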
Other solutions:
SplayTreeMap + QuerySQLByDate Solution:
When jumping, since you don't know the index, insert the item into a new SplayTreeMap at an arbitrarily high index, say 100 * number_of_list_items_in_database just to be safe. Run two queries, one ascending from the scrolled-to date and one descending from it. Cache these queries and return them in paged form. Should the user reach the beginning of the list, simply prevent them from scrolling further up the list manually with a ScrollController or something like it.

CQRS event store aggregate vs projection

In a CQRS event store, does an "aggregate" contain a summarized view of the events or simply a reference to the boundary of those events? (group id)
A projection is a view or representation of events so in the case of an aggregate representing a boundary that would make sense to me whereas if the aggregate contained the current summarized state I'd be confused about duplication between the two.
In a CQRS event store, does an "aggregate" contain a summarized view of the events or simply a reference to the boundary of those events? (group id)
Aggregates don't exist in the event store
Events live in the event store
Aggregates live in the write model (the C of CQRS)
Aggregate, in this case, still has the same basic meaning that it had in the "Blue Book"; it's the term for a boundary around one or more entities that are immediately consistent with each other. The responsibility of the aggregate is to ensure that writes (commands) to the book of record respect the business invariant.
It's typically convenient, in an event store, to organize the events into "streams"; if you imagine a RDBMS schema, the stream id will just be some identifier that says "these events are all part of the same history."
It will usually be the case that one aggregate -> one stream, but usually isn't always; there are some exceptional cases you may need to handle when you change your model. Greg Young is covering some of these in his new eBook on event versioning.
So it's possible that the same data structure might exist in the aggregate and query side store (duplicated view used for different purposes).
Yes, and no. It's absolutely the case that the data structures used when validating a write can match those used to support a query. But the storage doesn't usually match. Put another way, aggregates don't get stored (the state of the aggregate does); whereas it is fairly common that the query view gets cached (again, not the data structure itself, but a representation that can be used to repopulate the data structure without necessarily needing to replay all of the events).
Any chance you have an example of an aggregate state data structure (RDBMS)? Every example I've found is trimmed down to a few columns, something like id, source_id, version, making it difficult to visualize what the scope of an aggregate is.
A common example would be that of a trading book (an aggregate responsible for matching "buy" and "sell" orders).
In a traditional RDBMS store, that would probably look like a row in a books table, with a unique id for the book, information about what item that book is tracking, date information about when that book is active, and so on. In addition, there's likely to be some sort of orders table, with unique ids, a trading book id, order type, transaction numbers, prices and volumes (in other words, all of the information the aggregate needs to know to satisfy its invariant).
In a document store, you'd see all of that information in a single document -- perhaps a json document with the information about the root object, and two lists of order objects (one for buys, one for sells).
In an event store, you'd see the individual events: OrderPlaced, TradeOccurred, OrderCancelled.
It seems that the aggregate is computed using the entire set of events unless it gets large enough to warrant a snapshot.
Yes, that's exactly right. If you are familiar with a "fold function", then event sourcing is just a fold from some common initial state. When a snapshot is available, we'll fold from that state (with a corresponding reduction in the number of events that get folded in)
In an event sourced environment with "snapshots", you might see a combination of the event store and the document store (where the document would include additional meta information indicating where in the event stream it had been assembled).
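To make the fold idea concrete, here is a minimal Python sketch. The state shape (a set of open order ids plus a version counter), the tuple layout of the events, and the rule that a trade closes an order are simplifying assumptions for illustration only; they are not taken from any particular event store.

```python
from functools import reduce

# Hypothetical initial state: no open orders, zero events applied.
INITIAL = {"open_orders": frozenset(), "version": 0}

def apply_event(state, event):
    """Fold step: return the state that results from applying one event."""
    kind, order_id = event
    open_orders = set(state["open_orders"])
    if kind == "OrderPlaced":
        open_orders.add(order_id)
    elif kind in ("OrderCancelled", "TradeOccurred"):
        # Simplification: a cancel or a trade removes the order from the book.
        open_orders.discard(order_id)
    return {"open_orders": frozenset(open_orders), "version": state["version"] + 1}

def load(events, snapshot=None):
    """Rebuild state by folding events, starting from a snapshot if one exists."""
    state = snapshot if snapshot is not None else INITIAL
    # The snapshot's version records how many events it already includes.
    return reduce(apply_event, events[state["version"]:], state)

events = [
    ("OrderPlaced", "o1"),
    ("OrderPlaced", "o2"),
    ("OrderCancelled", "o1"),
    ("OrderPlaced", "o3"),
]

full = load(events)                       # fold from the initial state
snapshot = load(events[:2])               # state captured after two events
from_snapshot = load(events, snapshot)    # fold only the remaining events
```

Both routes end in the same state; the snapshot only reduces how many events are folded in. The `version` field plays the role of the meta information that says where in the stream the snapshot was assembled.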

How would I organize a Postgres db for a list of restaurants and their menu items?

I'm playing around with Postgres and trying to get the hang of more complex issues. Imagine I have a large set of restaurants in each of the 50 U.S. states. Each restaurant contains a menu, and each menu contains a set of items which contain things like price, description, etc. What would be a good way to organize the data?
My initial thought, which I'm sure is way wrong, would be to have a db per state. Within that would be a list of restaurants (and any basic details like address, phone number, rating, etc). Then, I'd have one db per restaurant which represents the menu. This db would contain columns that define each menu entry.
Is this totally off the mark? Is there a more ideal way to accomplish this? My current experience playing around with Postgres has just been limited to a single database.
I'm just looking for a good description, not a bunch of code. This is more a general architectural question. Thanks in advance!
I always recommend you first write down all the individual things you want to store in the database, at the most atomic form that makes sense for your application.
In this example, I am assuming that it's a franchise restaurant, and you want to track its various stores and their offerings.
A sample schema:
A table for ingredients, possible columns could be:
  Name
  Supplier
A table for menu items, possible columns could be:
  Name
  Description
  Is Vegan
  Has Nuts
  Is Kosher
A table that links a menu item with its ingredients:
  MenuItemPK
  IngredientItemPK
A table for each restaurant:
  Name
  Owner
  Contact Information
A table for each restaurant location:
  RestaurantPK
  Branch Name
  City
  State
  ZIP
  Opening Hours
A table that links a restaurant with its menus:
  RestaurantLocationPK
  Menu Name (for example, 'weekend dinner')
  Menu Description
A table that links a menu with its items:
  MenuItemPK
  MenuPK
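For readers who do want to see the tables above written out, here is one possible sketch. It uses Python's built-in sqlite3 with an in-memory database as a stand-in for Postgres, so the column types are SQLite's; the table and column names, the surrogate id columns and the sample rows are all hypothetical choices, not part of the answer.

```python
import sqlite3

SCHEMA = """
CREATE TABLE ingredient (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL, supplier TEXT
);
CREATE TABLE menu_item (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL, description TEXT,
    is_vegan INTEGER NOT NULL DEFAULT 0,
    has_nuts INTEGER NOT NULL DEFAULT 0,
    is_kosher INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE menu_item_ingredient (
    menu_item_id INTEGER NOT NULL REFERENCES menu_item(id),
    ingredient_id INTEGER NOT NULL REFERENCES ingredient(id),
    PRIMARY KEY (menu_item_id, ingredient_id)
);
CREATE TABLE restaurant (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL, owner TEXT, contact_information TEXT
);
CREATE TABLE restaurant_location (
    id INTEGER PRIMARY KEY,
    restaurant_id INTEGER NOT NULL REFERENCES restaurant(id),
    branch_name TEXT, city TEXT, state TEXT, zip TEXT, opening_hours TEXT
);
CREATE TABLE menu (
    id INTEGER PRIMARY KEY,
    restaurant_location_id INTEGER NOT NULL REFERENCES restaurant_location(id),
    name TEXT NOT NULL, description TEXT
);
CREATE TABLE menu_menu_item (
    menu_id INTEGER NOT NULL REFERENCES menu(id),
    menu_item_id INTEGER NOT NULL REFERENCES menu_item(id),
    PRIMARY KEY (menu_id, menu_item_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical sample rows, just enough to exercise the joins.
conn.execute("INSERT INTO restaurant (id, name) VALUES (1, 'Example Diner')")
conn.execute(
    "INSERT INTO restaurant_location (id, restaurant_id, city, state) "
    "VALUES (1, 1, 'Austin', 'TX')"
)
conn.execute("INSERT INTO menu (id, restaurant_location_id, name) VALUES (1, 1, 'weekend dinner')")
conn.execute("INSERT INTO menu_item (id, name, is_vegan) VALUES (1, 'Lentil soup', 1)")
conn.execute("INSERT INTO menu_menu_item (menu_id, menu_item_id) VALUES (1, 1)")

# One query can search across every state because everything lives in one database.
rows = conn.execute(
    """
    SELECT mi.name
    FROM menu_item mi
    JOIN menu_menu_item mmi ON mmi.menu_item_id = mi.id
    JOIN menu m ON m.id = mmi.menu_id
    JOIN restaurant_location rl ON rl.id = m.restaurant_location_id
    WHERE rl.state = 'TX' AND mi.is_vegan = 1
    """
).fetchall()
```

The final query is the kind of cross-restaurant search that would be awkward with one database per state or per restaurant.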
Your question is really about relational database design, not PostgreSQL per se, so that's something for you to track down and learn about.
Your description of your first idea is heading down the right path, except instead of separate databases, these different entities should be stored in separate tables. (You'd break things out into separate databases only when there's no real likelihood you'd ever need to compare the items with each other, or search across all of them, etc. And in your description, these things would all best be in one database.)

iPhone - Ordering Core data relationship

I currently have an app which lists a number of events (horse riding, party, etc). I want to be able to add these events to a list of 'favourites' which will be stored in Core Data, but I also want these events to be sorted in the order they were added to favourites. I realise I could just add an index property to the event and sort using a descriptor when I retrieve the events, but I would like to be able to add events to multiple lists of favourites, so I don't think that would work.
I've also looked into ordered relationships, which are exactly what I am looking for but require iOS 5. As a last resort I could probably cope with that, although I would prefer to find another way to do this if possible. Any ideas?
Thanks.
EDIT: The user can also add and remove lists of favourites so adding a date property and sorting by that would not be possible.
The correct solution is to have a 3rd entity that represents a membership of an event to a favourite list. Let's call it EventInFavourites.
EventInFavourites has two many-to-one relations:
Favourites <-------->> EventInFavourites
This one says that a Favourites list can have several Events in it.
Event <---------->> EventInFavourites
This one says that an Event can be part of several Favourites lists.
Finally, the position of that event in that favourites list is represented with an attribute of EventInFavourites, let's say position.
So when you want to add an event to a favourites list, you create an EventInFavourites instance and you link it to that event and to that favourites list. A bit like this:
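// Create the membership object that links the event to the favourites list.
EventInFavourites *newFavouriteMembership = [EventInFavourites insertInManagedObjectContext:self.moc];
newFavouriteMembership.event = anEvent;
newFavouriteMembership.favourites = aFavouritesList;
// Assumes position is a standard (NSNumber) attribute rather than a scalar property.
newFavouriteMembership.position = [NSNumber numberWithInt:3]; // or whatever
NSError *error = nil;
if (![self.moc save:&error]) {
    NSLog(@"Could not save favourite membership: %@", error);
}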
I left out a few details, but that should give you the big picture.
And of course, you can wait for iOS 5.
Go with ordered relationships with iOS 5. iOS devices are updated fairly quickly and I imagine that you would not be forsaking a large percentage of potential customers. Not to mention the time you will save from having to roll your own implementation.
Store the time & date when the item was added to the favorites. Later, when querying the db, order by this timestamp.
With different lists of favorites you might want to store multiple timestamps, one for each list.