Algolia best practices: new index or tags?

I've successfully set up an Algolia search engine on my web page. My backend syncs public data to Algolia, and the search bar works just fine.
Now I want to set up the same for my admin application. Unlike the public application, this app should be able to retrieve secret data from Algolia.
So far, I can think about two ways of doing this:
For each document, store both a "public" version (with a "public" tag) and an admin version (tagged "admin", and with additional fields). Custom API keys can then ensure that each app only has access to the proper data.
OR
Create a new index, perhaps my_admin_collection_index, duplicate the settings, and use it just like the my_collection_index from the admin app.
So in the first version I search the same index, but with different tags; in the second version I search two different indices.
Are there any insights on how to choose between the two approaches?
I'd say it would be easier for me to duplicate documents and put some tags on them, but I can't really tell what the performance impact of such an approach would be.
Thanks!

The first approach, pushing all objects to a single index and tagging them with their permissions, is the way to go. Combining it with Secured API keys lets you scale easily while keeping the front-end implementation secure (for instance, when the key is embedded in the JavaScript code).
Although the Algolia engine supports an unlimited number of indices per application (I have seen users with more than 700,000 indices), having too many indices may cause indexing overhead and slowdowns (especially on the shared plans, where you share the indexing CPUs with other customers).
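As a rough illustration of the single-index approach, here is a minimal Python sketch that builds the two tagged records for one document. `objectID` and `_tags` are Algolia's attribute names; the secret field names, the ID suffixes and the `build_records` helper are assumptions made up for this example, and no Algolia client call is shown.

```python
# Sketch only: the secret field names and ID suffixes are hypothetical.
SECRET_FIELDS = {"internal_notes", "cost_price"}  # assumed admin-only fields


def build_records(doc_id, document):
    """Return the public and admin records for one source document."""
    # Public copy: drop every field that only admins may see.
    public = {k: v for k, v in document.items() if k not in SECRET_FIELDS}
    public.update({"objectID": f"{doc_id}-public", "_tags": ["public"]})

    # Admin copy: keep all fields, tagged separately.
    admin = dict(document)
    admin.update({"objectID": f"{doc_id}-admin", "_tags": ["admin"]})
    return [public, admin]


records = build_records("42", {"title": "Blue chair", "cost_price": 12.5})
# Each app would then search with a key restricted to its own tag,
# e.g. a Secured API key whose embedded filter is "_tags:public".
```

Both records go to the same index; the restriction lives in the key each app uses, not in the index layout.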

Related

How to store one-time data in a MongoDB database?

I am building a personal work/career portfolio web app project, and plan on using MongoDB for my database. (I plan to build the project using MERN stack.) Most of my data is not one-time data (such as education, and work experiences), however I have a few pieces of data (such as my personal summary (the content for my "About Me" section), and skills summary) that are one-time only data (I think "single instance" might be a better fitting term). I would like to store all of the data in a database, and set up an admin-end to manage and edit the data. However, I am not sure how to go about storing the one-time data in my MongoDB database.
One idea I had was to create a collection solely for the one-time data, and only allow the user (me) to update and read the documents in the collection. Another idea I had was placing all of my portfolio data into a single collection called "entries", and giving each "entry" a type (such as "Education", or "Personal Summary"). Then when I retrieve the data from the collection I would gather all the documents with the same value in their type field together. I was thinking of storing each of the types as a constant on my server. However, my biggest concern with both ideas is whether they would be considered bad practice or not.
I would be very appreciative if anyone has any advice on how to solve this problem.
I implemented this a while back on one of my small projects. After discussing it with some professionals I'm in contact with, they said the best approach would be to create a collection with a single document that contains all the information, such as the links, the about text, etc.
Another suggestion I received was to use Redis solely for storing this type of information.
Something that I implemented a long time back similar to the one collection, single doc approach: https://github.com/codelancedevs/Sundar-Clinic/tree/local-backend/src/api/app
Working on a similar approach here: https://github.com/kunalkeshan/Cam-O-Genics-Backend
Hope this is of some help, I'm still learning as to what might be the best approach. Open to any suggestions out there!
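A minimal sketch of the one-collection, single-document idea, assuming a fixed `_id` so that there can only ever be one such document. The `_id` value, the field names and the `singleton_upsert` helper are made up for illustration; only the filter/update pair is built here, no database is contacted.

```python
# Sketch only: the _id value and the field names are hypothetical.
SINGLETON_ID = "site_content"


def singleton_upsert(fields):
    """Build the (filter, update) pair that creates or updates the one document."""
    return {"_id": SINGLETON_ID}, {"$set": dict(fields)}


query, update = singleton_upsert({"about": "Short bio text", "skills": ["Go", "SQL"]})
# With pymongo this pair would be passed as:
#   db.site.update_one(query, update, upsert=True)
# where db.site is an assumed collection name.
```

Because the filter always targets the same `_id`, an upsert either creates the document on first use or updates it afterwards, so a second copy is never created.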

What are some patterns for designing a REST API for a user-based platform in AWS?

I am trying to shift towards a serverless architecture for building REST APIs. I come from a Ruby on Rails background.
I have understood and adopted services such as API Gateway, Cognito, RDS and Lambda functions, however I am struggling to put it all together in an optimal way.
My case is the following. I have a simple user-based platform where multiple resources are related to application members, say a blog application.
I have used Cognito for authentication and Aurora as the database service for keeping things like articles and likes.
Since the database and Cognito user pool are decoupled, it is hard for me to do things like:
Fetching users that liked particular article
Fetching users' comments
This seems problematic to me because I need to pass some unique Cognito user identifier (retrieved during the authorization phase in API Gateway) to the Lambda function, which will then save the database record with an external reference to this user. On the other hand, if I want to fetch particular users, I must first fetch their identifiers from my relational database and then request the user details from the Cognito user pool. I lack a standard way of accessing the current user in my Lambda functions, as well as a mechanism for easily associating a database record with that user.
I have not found any convincing recommended patterns for designing such applications, even though it seems like a very common problem, and I am having a hard time deciding whether my approach is correct.
I would appreciate some comments on which patterns to consider when designing a simple user-based platform and what the pitfalls of my solution are. Any articles and examples would also be very helpful.
Thanks in advance.
These sound like standard problems associated with distributed, independent databases. You can no longer delegate all relationships to the database and get a result aggregating them in some way. You have to do the work yourself by calling one database, then the other.
For a case like this:
Fetching users that liked particular article
You would look up the "likes" database to determine user IDs of those who liked it, then look up the "users" database to determine user details such as name and avatar.
Most patterns follow standard database advice, e.g. in the above example, you could follow the performance-oriented pattern of de-normalising - store user data such as name and avatar against each "like", as long as you feel the extra storage and burden of keeping it consistent is justified by the reduction in queries (probably too many Likes to justify this).
Another important practice is using bulk queries to avoid N+1 queries. This is what Rails does with the includes syntax, but you may have to do it yourself here. In my example, it should only take two queries because the second query should get all required user data in one go, by querying for users matching the list of user IDs.
Finally, I'd suggest you try to abstract things. This kind of code gets messy fast, so be sure to build a well-encapsulated data layer that isolates application code from dealing with the mess of multiple databases.
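To make the two-query idea concrete, here is a minimal Python sketch. It uses an in-memory SQLite table in place of the "likes" database and a plain dict in place of the user store; both are stand-ins chosen for this example, and the names `USER_DIRECTORY`, `fetch_users_by_ids` and `LikesRepository` are hypothetical. How the batch lookup is done against the real user store is left open.

```python
import sqlite3

# Stand-in for the user store: in the real setup this lookup would go
# to the separate user service instead of a local dict.
USER_DIRECTORY = {
    "u1": {"name": "Ada"},
    "u2": {"name": "Grace"},
    "u3": {"name": "Linus"},
}


def fetch_users_by_ids(user_ids):
    """One bulk lookup for all ids instead of one call per id."""
    return {uid: USER_DIRECTORY[uid] for uid in user_ids if uid in USER_DIRECTORY}


class LikesRepository:
    """Small data layer that hides the two-store lookup from application code."""

    def __init__(self, connection):
        self.connection = connection

    def users_who_liked(self, article_id):
        # Query 1: user ids from the relational store.
        rows = self.connection.execute(
            "SELECT user_id FROM likes WHERE article_id = ? ORDER BY user_id",
            (article_id,),
        ).fetchall()
        user_ids = [row[0] for row in rows]
        # Query 2: all user details in a single batch.
        users = fetch_users_by_ids(user_ids)
        return [users[uid]["name"] for uid in user_ids if uid in users]


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE likes (article_id INTEGER, user_id TEXT)")
connection.executemany(
    "INSERT INTO likes VALUES (?, ?)", [(1, "u1"), (1, "u3"), (2, "u2")]
)
liked_by = LikesRepository(connection).users_who_liked(1)
```

Application code only calls `users_who_liked`; the fact that two stores are involved stays inside the repository class.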

firebase queries and swift

I have a string, e.g. "My name is John", stored in Firebase.
How would I query Firebase so I can find all the posts that contain "John"?
I can search for the first term in a string now using:
DataService.dataService.BASE_REF.child("Posts").child(selectedComment.commentKey).queryOrderedByChild("userComment").queryStartingAtValue(comment).queryEndingAtValue(comment+"\u{F8FF}").observeSingleEventOfType(.Value, withBlock: { (snapshot) in
where comment = "My"
I read about using Elasticsearch with Firebase, but wanted to check whether there was an easier way in Firebase before I looked at Elasticsearch/Flashlight for Firebase.
Unfortunately, Firebase doesn't support searching through content like that (in any language SDK). From a Google Groups post in July '16:
As a company that understands search, we're also a company that
understands using the best tool for the job. For fuzzy matching and
contains, a NoSQL, realtime data store isn't the correct tool--these
queries would be slow and scale poorly. BigQuery or ElasticSearch are
the right tool for providing useful results in a scalable and robust
manner.
Right now, this involves deploying a small node script to sync your
search results with the realtime data, as explained in the article
with the sample Flashlight lib. In the future, it will become more
"effortless" as we add integrations between Firebase and Cloud
products, particularly Cloud Functions and BigQuery interoperability.
BigQuery is, as I understand it, not specifically designed for user-facing search.
Elasticsearch (specifically, the Firebase plugin Flashlight) is a potential solution, but as you alluded to, it's an incredible amount of overhead (deploying/managing or renting an ES cluster, configuring the plugin, etc.). If content search is an important enough part of your app to justify that time/$, you may want to consider solutions beyond Firebase for your database needs, as it's by far one of the service's weakest areas.
In my opinion, you have a few options beyond Flashlight:
Algolia, a Search-as-a-service provider, does offer integration with Firebase, but I've never used it & so can't offer much more than to say that it exists.
Another alternative might be maintaining a collection of the documents you want to search on another service, like Amazon CloudSearch.
Depending on the stage of your project & your needs, consider other Backends-as-a-Service that support more in terms of querying. E.g., GraphQL-as-a-service backends, like Scaphold.io, Graph.cool, and Reindex are all built on SQL databases, and (I believe) all support multiple types of querying.
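To see why the query in the question can only find strings that start with the search term, it may help to model it as a range scan over sorted values. The Python sketch below is only an approximation of that behaviour under the assumption that values are ordered as plain strings; `prefix_range` and the sample comments are made up for illustration.

```python
from bisect import bisect_left, bisect_right

HIGH_SENTINEL = "\uf8ff"  # same character as "\u{F8FF}" in the Swift query


def prefix_range(sorted_values, prefix):
    """Return values v with prefix <= v <= prefix + sentinel (start/end bounds)."""
    low = bisect_left(sorted_values, prefix)
    high = bisect_right(sorted_values, prefix + HIGH_SENTINEL)
    return sorted_values[low:high]


comments = sorted(["Hello John", "My name is John", "My turn", "Nothing here"])
starts_with_my = prefix_range(comments, "My")
contains_john = prefix_range(comments, "John")
```

The range for "My" picks up both comments that begin with "My", while the range for "John" is empty: no value sorts between "John" and "John" plus the sentinel, even though two comments contain the word. That is the gap a dedicated search service fills.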

How to store large user-specific data

So I'm in the middle of planning a little web app that will require quite large amounts of data stored at the user level. In one case, the system would take a large object from the system level and make a "user-specific" version; a user can have several of these. The simplest comparison is a form stored in a Google spreadsheet, where the user is expected to take the template spreadsheet, then change not only the answers but also the questions.
Security-wise I am quite OK.
In the second case there is a requirement to store multiple objects, about 250 KB to maybe 3 MB in size, once again at the user level, with the potential to move them to the system level so additional users can access them. As an example, say the user can upload pictures, but may not want to share all of them. However, a user may choose to "publish" a small number of them because they are happy with those specific pictures.
What design patterns should I consider, specifically for web apps where users have decent amounts of data? For example, would it make most sense to use a single large database with a table that keeps track of resources, or to create separate tables per user?
I have considered putting it all in a mongo database.
Your approach may be wrong.
If you want to store user-based binary data and make it accessible to the user or to the community, you would need a hierarchical structure like so:
userid1
  pic1, pic2, pic3
userid2
  pic4, pic5, pic6
community
  pic7, pic8
You could then grant read permissions to "community" for all users, and permission for each user to its own directory.
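The permission rule described here can be sketched in a few lines of Python. This is only a toy model of the layout above: the path format, the `can_read` and `publish` helpers and the in-memory `store` dict are assumptions for illustration, not the API of any particular repository product.

```python
COMMUNITY = "community"  # shared directory name taken from the layout above


def can_read(user_id, path):
    """Allow reads in the user's own directory and in the community directory."""
    owner = path.split("/", 1)[0]
    return owner == user_id or owner == COMMUNITY


def publish(store, user_id, name):
    """Move one of the user's objects into the community directory."""
    store[COMMUNITY][name] = store[user_id].pop(name)


store = {"userid1": {"pic1": b"a", "pic2": b"b"}, COMMUNITY: {}}
publish(store, "userid1", "pic2")
```

After `publish`, `pic2` is readable by every user while `pic1` stays private to `userid1`.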
Usually there is nothing wrong with using a database to store binary files, provided you consider partitioning, role permissions and a suitable interface for accessing the data.
My suggestion is to use a binary repository like Artifactory.
It provides hierarchical structures and simple search queries over HTTP requests, and it can cache frequently queried objects.
I also think that HTTP requests are a lot easier to use, and they add an abstraction layer over the data, which is more secure.
Artifactory is free.

Display MongoId publicly or not?

I'm building a small web application using MongoDB and wondered whether it's good practice to show Mongo IDs publicly, in URLs for example.
Now I'm using the following url structure for user profiles: http://example.com/user/MONGOID
Does this have any security flaws or is it discouraged in some other way?
The answer depends on many things...
Using an ID in a URL is generally a bad idea. According to OWASP, it ranks #4 in the top 10 web security vulnerability list. But using it will not ruin your project.
To prevent the security vulnerability, you must either:
Use it only on data that is public (like StackOverflow profiles)
Have some code intercept the request and validate that the user has the rights to see the resource (a profile, a page, a document, etc.)
Using _id also ties your public URL to the back-end. You will need some conversion if you change database technology. Or maybe you will need to run some changes that result in the object being destroyed and created again with a different _id, like merging databases. You don't want your URL to change because of that.
Another thing is that _id does not have a good spatial distribution. It does not make a good sharding key. Being derived from a timestamp, all _id values are close together, linear if you will. They will tend to go to the same shard (Mongo will spread them later, but you want a key that has high cardinality).
So I prefer to pay now, and use an id field that is private to the application from the start. You can store it in the _id field if you want, but consider adding another key to your document, indexing it, and using that in your URLs.
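A minimal sketch of both points, a separate URL key and a rights check before returning the resource, might look like the following in Python. The field names (`owner_id`, `is_public`), the helper names and the dict used as a lookup table are assumptions for illustration; in a real application the lookup would be an indexed query on the extra key.

```python
import secrets


def new_public_id():
    """Random URL-safe identifier that is unrelated to the database _id."""
    return secrets.token_urlsafe(12)


def load_profile(profiles_by_public_id, public_id, requester_id):
    """Return the profile only if it is public or owned by the requester."""
    profile = profiles_by_public_id.get(public_id)
    if profile is None:
        return None
    if profile["is_public"] or profile["owner_id"] == requester_id:
        return profile
    raise PermissionError("requester may not view this profile")


public_id = new_public_id()
profiles = {public_id: {"owner_id": "alice", "is_public": False, "name": "Alice"}}
```

The URL then carries `public_id` instead of the _id, and guessing another user's identifier is not enough to read a private profile because the check runs on every request.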
No, it does not have security implications.
All a person would be able to do is guess the ID of some user, or try to go through all IDs to get all users of the system.
Take Stack Overflow as an example. They have the same pattern as you: http://stackoverflow.com/users/352959. Here 352959 is you, and there is nothing bad about it. The only thing is that whenever you enter this in your browser you will be redirected to http://stackoverflow.com/users/352959/king-julien.
I can try to iterate through these numbers, and the next guy is http://stackoverflow.com/users/352960, but all I can find out is that this is some John. And surely http://stackoverflow.com/users/1 is the creator of the resource.