Recommending items with availability date in Amazon Personalize - recommendation-engine

Amazon Personalize builds a recommendation model taking into account users, items and events. However, items are assumed as available, and this might not be the case for certain scenarios.
If items need to reflect a time window like availability in time (from date, to date), then you should be able to offer only valid items according to that restriction.
For instance, this would be the case for live shows: you should only recommend live shows that will happen in the future, either based on similarity or community behaviour. Live shows that already happened are part of the training, but are not valid products to recommend.
How can you model this availability restriction in Amazon Personalize?

There are a variety of business requirements that you may need to handle, that are not built in to the core of Amazon Personalize, and this is one of those. For these business requirements you need to build the logic into your wrapper around Amazon Personalize.
Edit: Personalize now allows you to filter recommendations on item metadata which looks like it would be sufficient for this use case. See the writeup here: https://aws.amazon.com/blogs/machine-learning/enhancing-recommendation-filters-by-filtering-on-item-metadata-with-amazon-personalize/

Related

aws-presonalize: can I get recommendations on items not seen in training based on item features?

I consider using aws personalize, or any similar managed recommendation service.
My question is whether it is possible to get recommendations/rankings on items that were not seen in the training data, based on item features. I see that aws personalize does have item feature dataset, but when I read the documentation about ranking recipe it specifically says that items not in the training are added at the end of any ranking. of course - new items have no interaction data, so any recipe/algorithm that solely relies on interaction data is not relevant for my case.
My question is, whether and how can I utilize aws personalize to my use case, if at all possible, or whether you know of any recommender service that can handle it.
Yes. There are specific Amazon Personalize recipes designed to support cold starting items where a cold item is one without behavioral data in the interactions dataset but with item metadata in the items dataset.
The User-Personalization recipe supports cold starting items through a feature called exploration. You control how much exploration (i.e., recommending cold items) is done with the explorationWeight inference hyperparameter when creating a Personalize campaign or batch inference job. See this blog post for details.
Exploration also applies to domain recommenders for the Top picks for you VOD recommender and Recommended for you e-commerce recommender. You specify the explorationWeight when creating a recommender.
The Similar-Items recipe supports the related items use case and looks to balance recommending similar items based on behavioral data and thematic similarity between items. You currently cannot control the weighting with this recipe, though. See this blog post for details. The More like X VOD recommender provides similar functionality.

Creating a database of many products

I am creating an inventory app currently for iPhone using Parse for companies to keep track of all of their tools, supplies, inventory. Now I'd like to allow for the user/company when adding a new item to their database for them to have the option to search from a pre-made database of items such as for a construction company when adding a simple Dewalt Drill Battery to their inventory would search the pre-made database for "Dewalt #DC9096 18V XRP 2.4A Battery" or an office would search for pencils by brand/serial number/name. I am looking for a simple way to make a database or even a table containing multiple brands products including their prices, product specifications, website for ordering more, company website, warranty phone number, etc... I have considered parsing all of the retail websites for information but don't know the legalities behind it and if the websites change then I'd need to update code. If there is ANY (easier/better) way to do this then assistance or direction would be great!
Thanks always
I would not go down the route of trying to parse websites, that will be a huge pain in the neck and impossible to maintain unless you have extensive resources (and as you mention it probably violates most site's terms of service anyway). Your best bet would be to hook into existing product databases via an API, such as Google's Search API for shopping, or maybe Amazon's API. Here's where you can start if you wanted to use Google:
https://developers.google.com/shopping-search/
Hopefully that gets you going in the right direction.
Edit: Here's a list of a lot more shopping APIs that could be good options:
http://www.programmableweb.com/apis/directory/1?apicat=Shopping
If you did find yourself needing to parse many different vendor websites (we'd call this "screen scraping") and you have the legal right to do so, you should use a tool like SelectorGadget to get your XPaths, it's much faster, easier and less error-prone than doing it by hand.
If you're doing more than a couple websites, though, you'll probably find that you'll have to update the scraping rules pretty often, it definitely won't be a set-and-forget operation.

NoSQL Database for ECommerce

I will be constructing an ecommerce site, and would like to use a no-sql database, which will fit well with the plans for the app. But when it comes to which database would fit the job, im not sure. After comparing various DB's, the ones that seem best might be either mongo, couch, or even orientdb. I have seen arguments for all of them to be used or not used compared to something like MySQL. But between themselves (nosql databases), which one would fit well with an ecommerce solution?
Note, for the use case, i wont be having thousands of transactions a second. Or similarly high write rates. they will be moderate sure, but at a level that any established database could handle.
CouchDB: Has master to master replication, which I could really use. If not, I will still have to implement the same functionality in code anyways. I need to be able to have a users database, sync with the mothership. (users will have their own, potentially localhost database, that could sync with the main domains server). Couch is also fast, once your queries have been stored in the db.As i will probably have a higher need for read performance. Though not by a lot.
MongoDB: queries are very easy and user friendly. Also, with the fact that end users may need to query for certain things at a given time that I may not be able to account for ahead of time, this seems like it may be a better fit. I dont have to pre-store my queries in the db. Does support atomic transactions, though only when writing to a single document at a time.
OrientDB: A graph database. much different that most people are used to, but with the needs, it could fit very well too. Orient has the benefits of being schemaless, as well as having support for ACID transactions. There is a lot of customer, and product relationships that a graph database could be great with. Orient also support master to master replication, similar to couchdb.
Dont get me wrong, I can see how to build this traditionally with something like MySQL, but the ease and simplicity of a nosql solution, is very attractive. Although, in my case, needing a schemaless solution, would be much easier in nosql rather than mysql. a given product may have more or less items, than another. and avoiding recreating a table whenever a new field is added, is preferrable.
So between these 3 (or even others you think may be better), what features in each could potentially work for, or against me in regards to an ecommerce based site, when dealing with customer transactions?
Edit: The reason I am not using an existing solution, is because with the integrated features I need, there are no solutions available out there. We are also aiming to use this as a full product for our company. There will be a handful of other integrations than just sales. It is also going to be working with a store's POS system.
Since e-commerce can encompass everything from shopping carts through to membership and recurring subscriptions, it is hard to guess exactly what requirements and complexity you are envisioning.
When constructing an e-commerce site, one of the early considerations should be investigating whether there is already an established e-commerce product or toolkit that could meet your requirements. There are many subtleties to processes like ordering, invoicing, payments, products, and customer relationships even when your use case appears to be straightforward. It may also be possible to separate your application into the catalog management aspects (possibly more custom) versus the billing (potentially third party, perhaps even via a hosted billing/payment API).
Another consideration should be who you are developing the e-commerce site for: is this to scratch your own itch, or for a client? Time, budget, and features for a custom build can be difficult to estimate and schedule .. and a niche choice of technology may make it difficult to find/hire additional development expertise.
A third consideration is what your language(s) of choice are for developing your application. Some languages will have more complete/mature/documented drivers and/or framework abstractions for the different databases.
That said, writing an e-commerce system appears to be a rite of passage for many developers ;-).
Edit: a lot has changed since this answer was originally posted in 2012 and you should definitely refer to current product information. For example, MongoDB has had support for Decimal128 values since MongoDB 3.4 (2016) and multi-document transactions since MongoDB 3.6 (2017).
Check the comparison of different available NoSql databases here. Suit your requirement as per that.
MongoDB 4 now multi-document ACID transactions! That makes it suitable for e-Commerce!
Check out: https://www.mongodb.com/transactions

Event-based analytics package that won't break the bank with high volume

I'm using an application that is very interactive and is now at the point of requiring a real analytics solution. We generate roughly 2.5-3 million events per month (and growing), and would like to build reports to analyze cohorts of users, funneling, etc. The reports are standard enough that it would seem feasible to use an existing service.
However, given the volume of data I am worried that the costs of using a hosted analytics solution like MixPanel will become very expensive very quickly. I've also looked into building a traditional star-schema data warehouse with offline background processes (I know very little about data warehousing).
This is a Ruby application with a PostgreSQL backend.
What are my options, both build and buy, to answer such questions?
Why not building your own?
Check this open source project as an exemple:
http://www.warefeed.com
It is very basic and you will have to built datamart feature you will need in your case

How to manage multiple clients with slightly different business rules? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
We have written a software package for a particular niche industry. This package has been pretty successful, to the extent that we have signed up several different clients in the industry, who use us as a hosted solution provider, and many others are knocking on our doors. If we achieve the kind of success that we're aiming for, we will have literally hundreds of clients, each with their own web site hosted on our servers.
Trouble is, each client comes in with their own little customizations and tweaks that they need for their own local circumstances and conditions, often (but not always) based on local state or even county legislation or bureaucracy. So while probably 90-95% of the system is the same across all clients, we're going to have to build and support these little customizations.
Moreover, the system is still very much a work in progress. There are enhancements and bug fixes happening continually on the core system that need to be applied across all clients.
We are writing code in .NET (ASP, C#), MS-SQL 2005 is our DB server, and we're using SourceGear Vault as our source control system. I have worked with branching in Vault before, and it's great if you only need to keep 2 or 3 branches synchronized - but we're looking at maintaining hundreds of branches, which is just unthinkable.
My question is: How do you recommend we manage all this?
I expect answers will be addressing things like object architecture, web server architecture, source control management, developer teams etc. I have a few ideas of my own, but I have no real experience in managing something like this, and I'd really appreciate hearing from people who have done this sort of thing before.
Thanks!
I would recommend against maintaining separate code branches per customer. This is a nightmare to maintain working code against your Core.
I do recommend you do implement the Strategy Pattern and cover your "customer customizations" with automated tests (e.g. Unit & Functional) whenever you are changing your Core.
UPDATE:
I recommend that before you get too many customers, you need to establish a system of creating and updating each of their websites. How involved you get is going to be balanced by your current revenue stream of course, but you should have an end in mind.
For example, when you just signed up Customer X (hopefully all via the web), their website will be created in XX minutes and send the customer an email stating it's ready.
You definitely want to setup a Continuous Integration (CI) environment. TeamCity is a great tool, and free.
With this in place, you'll be able to check your updates in a staging environment and can then apply those patches across your production instances.
Bottom Line: Once you get over a handful of customers, you need to start thinking about automating your operations and your deployment as yet another application to itself.
UPDATE: This post highlights the negative effects of branching per customer.
Our software has very similar requirements and I've picked up a few things over the years.
First of all, such customizations will cost you both in the short and long-term. If you have control over it, place some checks and balances such that sales & marketing do not over-zealously sell customizations.
I agree with the other posters that say NOT to use source control to manage this. It should be built into the project architecture wherever possible. When I first began working for my current employer, source control was being used for this and it quickly became a nightmare.
We use a separate database for each client, mainly because for many of our clients, the law or the client themselves require it due to privacy concerns, etc...
I would say that the business logic differences have probably been the least difficult part of the experience for us (your mileage may vary depending on the nature of the customizations required). For us, most variations in business logic can be broken down into a set of configuration values which we store in an xml file that is modified upon deployment (if machine specific) or stored in a client-specific folder and kept in source control (explained below). The business logic obtains these values at runtime and adjusts its execution appropriately. You can use this in concert with various strategy and factory patterns as well -- config fields can contain names of strategies etc... . Also, unit testing can be used to verify that you haven't broken things for other clients when you make changes. Currently, adding most new clients to the system involves simply mixing/matching the appropriate config values (as far as business logic is concerned).
More of a problem for us is managing the content of the site itself including the pages/style sheets/text strings/images, all of which our clients often want customized. The current approach that I've taken for this is to create a folder tree for each client that mirrors the main site - this tree is rooted at a folder named "custom" that is located in the main site folder and deployed with the site. Content placed in the client-specific set of folders either overrides or merges with the default content (depending on file type). At runtime the correct file is chosen based on the current context (user, language, etc...). The site can be made to serve multiple clients this way. Efficiency may also be a concern - you can use caching, etc... to make it faster (I use a custom VirtualPathProvider). The largest problem we run into is the burden of visually testing all of these pages when we need to make changes. Basically, to be 100% sure you haven't broken something in a client's custom setup when you have changed a shared stylesheet, image, etc... you would have to visually inspect every single page after any significant design change. I've developed some "feel" over time as to what changes can be comfortably made without breaking things, but it's still not a foolproof system by any means.
In my case I also have no control other than offering my opinion over which visual/code customizations are sold so MANY more of them than I would like have been sold and implemented.
This is not something that you want to solve with source control management, but within the architecture of your application.
I would come up with some sort of plugin like architecture. Which plugins to use for which website would then become a configuration issue and not a source control issue.
This allows you to use branches, etc. for the stuff that they are intended for: parallel development of code between (or maybe even over) releases. Each plugin becomes a seperate project (or subproject) within your source code system. This also allows you to combine all plugins and your main application into one visual studio solution to help with dependency analisys etc.
Loosely coupling the various components in your application is the best way to go.
As mention before, source control does not sound like a good solution for your problem. To me it sounds that is better yo have a single code base using a multi-tenant architecture. This way you get a lot of benefits in terms of managing your application, load on the service, scalability, etc.
Our product using this approach and what we have is some (a lot) of core functionality that is the same for all clients, custom modules that are used by one or more clients and at the core a the "customization" is a simple workflow engine that uses different workflows for different clients, so each clients gets the core functionality, its own workflow(s) and some extended set of modules that are either client specific or generalized for more that one client.
Here's something to get you started on multi-tenancy architecture:
Multi-Tenant Data Architecture
SaaS database tenancy patterns
Without more info, such as types of client specific customization, one can only guess how deep or superficial the changes are. Some simple/standard approaches to consider:
If you can keep a central config specifying the uniqueness from client to client
If you can centralize the business rules to one class or group of classes
If you can store the business rules in the database and pull out based on client
If the business rules can all be DB/SQL based (each client having their own DB
Overall hard coding differences based on client name/id is very problematic, keeping different code bases per client is costly (think of the complete testing/retesting time required for the 90% that doesn't change)...I think more info is required to properly answer (give some specifics)
Layer the application. One of those layers contains customizations and should be able to be pulled out at any time without affect on the rest of the system. Application- and DB-level "triggers" (quoted because they may or many not employ actual DB triggers) that call customer-specific code or are parametrized with customer keys) are very helpful.
Core should never be customized, but you must layer it in somewhere, even if it is simplistic web filtering.
What we have is a a core datbase that has the functionality that all clients get. Then each client has a separate database that contains the customizations for that client. This is expensive in terms of maintenance. The other problem is that when two clients ask for a simliar functionality, it is often done differnetly by the two separate teams. There is currently little done to share custiomizations between clients and make common ones become part of the core application. Each client has their own application portal, so we don't have the worry about a change to one client affecting some other client.
Right now we are looking at changing to a process using a rules engine, but there is some concern that the perfomance won't be there for the number of records we need to be able to process. However, in your circumstances, this might be a viable alternative.
I've used some applications that offered the following customizations:
Web pages were configurable - we could drag fields out of view, position them where we wanted with our own name for the field label.
Add our own views or stored procedures and use them in: data grids (along with an update proc) and reports. Each client would need their own database.
Custom mapping of Excel files to import data into system.
Add our own calculated fields.
Ability to run custom scripts on forms during various events.
Identify our own custom fields.
If you clients are larger companies, you're almost going to need your own SDK, API's, etc.