Conceptual Help needed with Database Queries - mongodb

So, I don't have a specific issue here other than a general lack of knowledge, and I am hoping that the big brains on here can give me a nudge in the right direction or maybe refer me to an online resource that could help. Here is the general problem I am trying to solve.
I have a Mongo database that holds a handful of collections where I store data retrieved from some software that we use for our day-to-day operations. We are grabbing this data from the API and storing it in Mongo to build up a historical source of data (the API is limited in the timeframe of data that can be retrieved).
The window for the historical data from the API is 7 days.
The data in question has a unique id for each item we pull from the API, so this allows us to grab a record, store it, and modify it as required if it changes over time. This has been working just fine for our needs, until we started to notice a few discrepancies between the data we stored in Mongo and what we would get out of our software when we ran "reports." After digging into it, it turns out there are a few edge cases where a record would be deleted from the software, but if we had already grabbed that record through the API, it would remain in our Mongo database.
I'm looking for any advice on how to handle this situation. Ideally, I suppose we would like to remove these deleted records from our MongoDB in order to match what is in the software, but I'm having trouble dreaming up the process to make this happen. Apparently this is one of many gaping holes in my entirely self-taught knowledge of this stuff.
Thanks for any assistance.
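One possible way to frame the cleanup, as a rough Python sketch under stated assumptions: because the API only exposes the last 7 days, a stored record whose date falls inside that window but whose id no longer comes back from the API can be treated as deleted upstream. Records older than the window cannot be checked and are left alone. The field names `_id` and `record_date` and the function `find_stale_ids` are hypothetical, the sketch assumes the API date filter matches `record_date`, and it uses plain dicts instead of a live database.

```python
from datetime import datetime, timedelta


def find_stale_ids(stored, api_ids, now, window_days=7):
    """Return ids that are stored locally, fall inside the API window,
    and are no longer returned by the API (assumed deleted upstream)."""
    window_start = now - timedelta(days=window_days)
    # Only records inside the window can be verified against the API.
    in_window = {doc["_id"] for doc in stored if doc["record_date"] >= window_start}
    return in_window - set(api_ids)


now = datetime(2024, 1, 10)
stored = [
    {"_id": "a", "record_date": datetime(2024, 1, 9)},   # still in the API
    {"_id": "b", "record_date": datetime(2024, 1, 8)},   # missing from the API
    {"_id": "c", "record_date": datetime(2023, 12, 1)},  # too old to verify
]

stale = find_stale_ids(stored, api_ids=["a"], now=now)
print(stale)  # {'b'}
```

With pymongo, the resulting ids could then be passed to something like `collection.delete_many({"_id": {"$in": list(stale)}})`, or flagged with `update_many` instead if the deleted records should stay in the history. This only works if each sync fetches the complete set of ids for the window.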

Related

Dynatrace API - pull pure path information

We have been using Dynatrace in our organization for a long time. It is a great tool for pointing out performance pain areas and for knowing what's happening in the system. However, we found that the reporting is not great. In our setup, data gets wiped in 20 days for non-PRD environments. It also loses all the details. To keep track of the underlying calls, we currently need to take a screenshot or put the data manually into an Excel file. This helps us compare old results with the latest development/improvement.
We were wondering if there is any Dynatrace API available which can push the pure path information in JSON format. We checked the Dynatrace API page, but there is none. Creating Excel files manually is a waste of time and adds no value. Has anyone else found a workaround for this?
E.g. for the image, we want JSON containing the list of underlying DB calls shown under the controller, their start and end times, time consumed, etc.
Please help.

Advice on structuring data with MongoDB

Hope it's okay to ask a question like this in here - I'm a little new to backend and am still finding my way.
I am building a kind of language-learning-themed social media program, but I'm a little lost as to how best to structure all the various bits of data. At the moment I am storing two types of objects in their own MongoDB collections: posts (each with a nested array of all its replies) and users. That's working okay for now, but there are lots of things I'm intending to add down the line, and I would like to get my data structured in a way where that isn't problematic later.
For instance, one feature I am intending to add is a phrasebook, where users can save messages or words to a personal list that they can access later or export. Currently I'm planning to create a separate collection for these phrasebooks, with a string that links each one to the user's ID, and then an array of phrases. But is that bad practice? Should I be trying to keep my number of collections to a minimum, or is it fine to have lots of different ones?
If anyone has suggestions of where I can read up more on best practice around this too, I'd appreciate that a lot! I've tried to google but haven't had much luck.
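To make the phrasebook idea above concrete, here is a minimal Python sketch of what such a document might look like, using plain dicts. All field names (`user_id`, `phrases`, `source_post_id`) and the helper `export_phrases` are hypothetical; this is one possible shape, not a recommended schema.

```python
# Hypothetical document shapes; field names are illustrative only.
user = {"_id": "u1", "name": "Ana"}

phrasebook = {
    "_id": "pb1",
    "user_id": "u1",  # string reference back to the owning user
    "phrases": [
        {"text": "bonjour", "note": "hello", "source_post_id": "p42"},
        {"text": "merci", "note": "thank you", "source_post_id": None},
    ],
}


def export_phrases(book):
    """Flatten a phrasebook into simple 'text - note' lines for export."""
    return [f"{p['text']} - {p['note']}" for p in book["phrases"]]


lines = export_phrases(phrasebook)
```

Keeping the phrasebook in its own collection means it can grow without touching the user document, at the cost of a second query to load it.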

A Collection Per User or A Collection of Users

I am trying to figure out which is the best option for storing individual user logging information and general meta profiling data for each user on our system.
The original idea was to have a "profiler" collection and each document would represent a user. The problem with this design is that a power user could rack up so much meta data and history over the course of a year or less that it exceeds the document size limit. It also would force the documents to have deeper and more complex structures, which could result in slower queries.
The alternative design idea is to create a collection for each user, where each document would hold a specific type of profiling or history data. There are several benefits to this, namely speed. Yet it also presents querying challenges when needing to run comparisons against other users (solvable through other tracking DBs). I can't find a definitive answer to the question of how many collections a single Mongo database can contain.
If it can handle millions upon millions of collections per database then fantastic, otherwise I need to find better options for modeling this data. Am I going about this the right way?
The goal is to maintain a history of a user's interactions, reputation tracking, their interests over time, features they use regularly, etc., which can allow for a richer experience.
Create 2 collections: Users & User interactions.
There are certain things that make complete sense to store inside a User's document:
Reputation tracking
Interests -- common tags (similar to Stack Overflow) that a user frequents
Features -- this should be a finite list of items. You could key them and increment them with $inc as they are used
User interactions on the other hand is more of a log type structure that you may want to store with a back reference and process later.
Also check out Apache Kafka -- It's a distributed queuing technology that LinkedIn uses to do something similar to what you are describing.
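The two-collection layout suggested above could be sketched roughly like this in Python. The helper names `feature_used_update` and `interaction_doc` and all field names are hypothetical; the only MongoDB-specific piece is the `$inc` update operator, which increments a numeric field.

```python
from datetime import datetime, timezone


def feature_used_update(feature):
    """Build an update document that bumps one feature counter on a user document."""
    return {"$inc": {f"features.{feature}": 1}}


def interaction_doc(user_id, kind, detail, at=None):
    """Build one log entry for the interactions collection, with a back reference."""
    return {
        "user_id": user_id,  # back reference to the user document
        "kind": kind,
        "detail": detail,
        "at": at or datetime.now(timezone.utc),
    }


update = feature_used_update("search")
entry = interaction_doc(
    "u1", "view", {"page": "/profile"},
    at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

With pymongo, these would be applied with something like `users.update_one({"_id": "u1"}, update)` and `interactions.insert_one(entry)`. The user document stays small and bounded, while the interaction log grows as separate documents.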

Designing a MongoDB schema for a chat server

I wish to design a schema for a chat server. The schema needs to support delivery and reading of messages. Each message needs to have an option of being a private or group message.
I was trying to think about where the data about whether a message has been read and delivered should be stored.
In a relational database this could be kept in another table. In MongoDB I could store it either in the user document or in the actual message JSON document.
If the message isn't for a specific user but is a broadcast message, then I presume it would be better to store the IDs of the users that have seen it as part of the message's JSON document.
Does anyone know of some good example schemas that are available? I don't fully understand the best way of attacking this issue.
(Too long for a comment. And it kinda answers the question)
Yeah, it's a challenging design. Also it's something we can't do for you, I'm afraid, because we don't know all your requirements, you do. However you design it, you should respect the usual mongodb guidelines. Unfortunately, they conflict with each other:
Don't put too much stuff into one document.
In the classic blog schema exercise, one might be tempted to embed comments into the post document, with each comment embedding its user too. This can easily lead to exceeding MongoDB's max document size. It also leads to write contention. That doesn't matter much for the MMAPv1 engine, but it matters for the WiredTiger engine (which has document-level locking).
Do not build overly normalized schemas.
Normalized schemas are encouraged in relational databases. In MongoDB they're useless (because of the lack of joins). What you need to do is careful duplication of some data. For example, in the blog/comments example, one might embed the author's id/email in a comment, but not the rest of the author's data (sign-up date, membership status, etc.).
When I decide a place or shape of the data, I generally ask myself these two questions:
How am I going to query this?
Isn't this too much duplication?
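As one illustration of these trade-offs for the chat case, here is a small Python sketch that keeps read and delivery state on the message document and duplicates only a small snapshot of the sender. Everything here (the field names `delivered_to` and `read_by`, the helpers `new_message`, `mark`, and `unread`) is hypothetical and works on plain dicts. In MongoDB, the `mark` step would roughly correspond to an `$addToSet` update.

```python
def new_message(msg_id, sender, recipients, text):
    """Build a message document; a private message simply has one recipient."""
    return {
        "_id": msg_id,
        # Careful duplication: only the sender's id and name, not the full profile.
        "sender": {"id": sender["id"], "name": sender["name"]},
        "recipients": list(recipients),
        "text": text,
        "delivered_to": [],
        "read_by": [],
    }


def mark(msg, field, user_id):
    """Record delivery or read for one recipient, without adding duplicates."""
    if user_id in msg["recipients"] and user_id not in msg[field]:
        msg[field].append(user_id)


def unread(msg):
    """Recipients who have not read the message yet."""
    return [u for u in msg["recipients"] if u not in msg["read_by"]]


alice = {"id": "u1", "name": "Alice", "signed_up": "2020-01-01"}
msg = new_message("m1", alice, ["u2", "u3"], "hello")
mark(msg, "delivered_to", "u2")
mark(msg, "read_by", "u2")
mark(msg, "read_by", "u2")  # marking twice changes nothing
print(unread(msg))  # ['u3']
```

This answers "how am I going to query this?" for the unread case with a single document read. For very large groups the arrays could grow big, which runs into the first guideline, so a separate receipts collection may be the better fit there.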

Is there any GIS extension for Apache Cassandra?

I want to use Cassandra for my web application, because it will manage a lot of information. The problem is that it will also handle a lot of geographical data, so I need a GIS (http://en.wikipedia.org/wiki/Geographic_information_system) Cassandra extension to capture, store, manipulate, analyze, manage, and present all types of geographical data.
Something like PostGIS for PostgreSQL. Does it already exist? Something similar? Any suggestions?
Thanks for your help in advance :)
Well, one of our clients at PlayOrm (a client on top of Cassandra with its own command line client) is heavy into GIS, so we are going to be adding features to store GIS data, though I think they already exist. I meet with someone next week regarding this, so in the meantime you may want to check out PlayOrm.
Data will be read from Cassandra and displayed on one of the largest monitors I have seen, with some big huge machine(s) backing it with tons of graphics cards... pretty cool setup.
Currently, PlayOrm does joins and Scalable-SQL, but it is very likely we will be adding spatial queries to the mix if we continue to do GIS data.