Is MongoDB a no-go for this application? [closed]

Good sirs,
I've just started planning a new project, and it seems I should stick with a relational database (even though I want to play with MongoDB). Tell me if I'm mistaken!
There will be box models, each of which can contain hundreds to thousands of items.
At any time, the user can move an item to another box.
For example, using some Railsy pseudocode:

item = Item.find(5676)
item.box_id              # => 24
item.update(box_id: 25)
item.box_id              # => 25
This sounds like a simple SQL join table to me, but an expensive array-manipulation operation for MongoDB.
Or is removing an object from one (huge) array and inserting it into another (huge) array not a big problem for Mongo?
Thanks for any wisdom. I've only just started with mongo.
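For concreteness, here is a minimal sketch of what that move would look like if each box document embedded an items array, using pymongo; the collection and field names (boxes, items) are assumptions for illustration:

from pymongo import MongoClient

client = MongoClient()              # assumes a local MongoDB instance
boxes = client.mydb.boxes           # hypothetical "boxes" collection with embedded item arrays

# Moving item 5676 from box 24 to box 25 touches two (potentially huge) arrays:
boxes.update_one({"_id": 24}, {"$pull": {"items": 5676}})   # remove from the old box's array
boxes.update_one({"_id": 25}, {"$push": {"items": 5676}})   # append to the new box's array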

If you want to use big arrays, stay away from MongoDB; I say this from personal experience. There are two big problems with arrays. First, if they keep growing, the document grows and has to be moved on disk, which is a very, very slow operation. Second, if you need to scan an array to get to the 10,000th element, that will be very slow as well, since the 9,999 elements before it have to be checked first.
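One common way to avoid embedded arrays entirely (my sketch, not necessarily what the answerer had in mind) is to store each item as its own document with a reference to its box; moving an item is then a single small field update, no matter how many items a box holds. Collection and field names are illustrative:

from pymongo import MongoClient

client = MongoClient()
items = client.mydb.items           # hypothetical "items" collection, one document per item

# item documents look like {"_id": 5676, "box_id": 24, ...}
items.update_one({"_id": 5676}, {"$set": {"box_id": 25}})   # move to box 25: one field changes

# listing a box's contents becomes an indexed query instead of reading a huge array
items.create_index("box_id")
list(items.find({"box_id": 25}))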

Related

When is it better to pull all data and filter, or pull the data filtered [closed]

I am working with Spark (pyspark) and MongoDB as the database.
We are running into some performance issues and the answers I found here were not directly related to Big Data.
We pull our entire MongoDB collection and then filter in Spark, and when we apply some filters, some of the columns we don't filter on are still present in the Spark DataFrame (I'll explain this last case better below).
My questions, besides a general understanding of the question's title:
Pull then filter, or filter then pull? If there is no clear answer, what parameters should I start taking into account?
Let's say I have a Spark DataFrame with columns A, B and C and I filter only on C. Would it be better (assuming I pulled everything) to then drop A and B?
Any links or readings regarding this are welcome.
1 - Pull filtered data; it is more efficient to pull only the data you want. Most databases are optimized for filter operations. The ideal case is when you can partition your data on your filtering columns (in your case column C, I guess); see the sketch after this list.
2 - I am not sure, but I think it's better to drop the columns you don't use, mainly to reduce the shuffle size if you have a shuffle. It also makes your DataFrame clearer.
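A minimal pyspark sketch of both points, assuming the MongoDB Spark connector is configured on the session; the format name and option keys vary by connector version, and the database, collection and column names here are placeholders:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("filter-pushdown").getOrCreate()

df = (spark.read.format("mongodb")          # older connectors use "com.mongodb.spark.sql.DefaultSource"
      .option("database", "mydb")
      .option("collection", "events")
      .load())

# Filter as early as possible: Spark can push this predicate down to MongoDB,
# so only matching documents are pulled across the wire.
filtered = df.filter(col("C") == "some_value")

# Keep only the columns you actually use; this shrinks any later shuffle.
slim = filtered.select("C")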

Change a 1500 column data-set for easier front-end manipulation [closed]

I have a data-set that consists of 1500 columns and 6500 rows, and I am trying to figure out the best way to shape the data for web-based interactive visualizations.
What I am trying to do is make the data more interactive and create an admin console that allows anyone to filter the data visually.
The front end could potentially be based on Crossfilter, D3 and DC.js and give the user basically endless filtering possibilities (date, value, country). In addition there will be some predefined views, like top and bottom 10 values.
I have seen and tested some great examples like this one, but after testing it did not really fit the large number of columns I had, and it was based on a full JSON dump from MongoDB. This resulted in very long loading times and loss of full interactivity with the data.
So in the end my question is: what is the best approach (starting with normalization) to get the data shaped so it can be manipulated from a front end? Reducing the number of columns is a priority.
A quick look at the piece of data that you shared suggests that the dataset is highly denormalized. To allow querying and visualization from a database backend, I would suggest normalizing. This is no small bit of software work, but in the end you will have relational data, which is much easier to deal with.
It's hard to guess where you would start, but from the bit of data you showed there would be a country table, an event table of some sort, and probably some tables of enumerated values.
In any case you will have a hard time finding a db engine that allows that many columns. The row count is not a problem. I think in the end you will want a db with dozens of tables.
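As a rough illustration of the reshaping step (not a full normalization), here is a pandas sketch that melts a wide table into long key/value rows, which is usually the first move before splitting out country and event lookup tables; all file and column names are made up, since the real schema isn't shown:

import pandas as pd

# hypothetical wide frame: one row per country/date, ~1500 measure columns
wide = pd.read_csv("dataset.csv")

# melt the measure columns into (country, date, measure, value) rows
long = wide.melt(
    id_vars=["country", "date"],      # assumed identifying columns
    var_name="measure",
    value_name="value",
)

# the long form maps naturally onto a few normalized tables, e.g.:
# country(id, name), measure(id, name), observation(country_id, measure_id, date, value)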

How to properly design classes [closed]

I'm trying to design an application in which a user can perform multiple operations, like adding or deleting Workstations or Applications in the database. How should I design it?
To deal with this, I've found two solutions, and I haven't been able to choose the best one:
1st solution, 2nd solution.
Is this right?
Any brilliant suggestion, please?
Thanks a lot!
To think this out a bit: a user can have one or more operations, i.e. a one-to-many relationship with Operation.
An operation can delete one or more Workstations or Applications, again a one-to-many relationship.
Therefore, I think your 1st solution captures it nicely.
I think solution 1 is better. Solution 2 requires that you insert 2 records into the 2 many-to-many associative tables for each single operation. This is more complex, and probably unnecessarily so.
In solution 1 the Operation table becomes the only associative table: 1 operation, 1 insert. You may have to make some of the referencing keys nullable, depending on your requirements, but this is manageable. Simpler and sufficient for the needs you expressed.
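A minimal sketch of that single associative table with nullable references, written here in SQLAlchemy since the question gives no schema language; the table and column names are assumptions:

from sqlalchemy import Column, Integer, String, ForeignKey
from sqlalchemy.orm import declarative_base   # SQLAlchemy 1.4+

Base = declarative_base()

class User(Base):
    __tablename__ = "user"
    id = Column(Integer, primary_key=True)

class Workstation(Base):
    __tablename__ = "workstation"
    id = Column(Integer, primary_key=True)

class Application(Base):
    __tablename__ = "application"
    id = Column(Integer, primary_key=True)

class Operation(Base):
    __tablename__ = "operation"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("user.id"), nullable=False)
    action = Column(String, nullable=False)                       # e.g. "add", "delete"
    # exactly one of these is set, depending on what the operation targets
    workstation_id = Column(Integer, ForeignKey("workstation.id"), nullable=True)
    application_id = Column(Integer, ForeignKey("application.id"), nullable=True)

One operation then maps to one row, as described above; a check constraint can enforce that exactly one of the two references is populated.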

Entity Framework or SqlDataReader [closed]

I appreciate there are one or two similar questions on SO, but we are a few years on and I know EF's speed and general performance have been enhanced, so those may be out of date.
I am writing a new webservice to replace an old one. Replicating the existing functionality it needs to do just a handful of database operations. These are:
Call existing stored procedures to get data (2)
Send SQL to the database to be executed (should be stored procedures I know) (5)
Update records (2)
Insert records (1)
So 10 operations in total. The database is HUGE but I am only dealing with 3 tables directly (stored procedures do some complex JOINs).
When getting the data I build an array of objects (e.g. Employees) which then get returned by the web service.
From my experience with Entity Framework, and because I'm not doing anything clever with the data, I believe EF is not the right tool for my purpose and SqlDataReader is better (I imagine it is going to be lighter and faster).
Entity Framework focuses mostly on developer productivity - easy to use, easy to get things done.
EF does add some abstraction layers on top of "raw" ADO.NET. It's not designed for large-scale, bulk operations, and it will be slower than "raw" ADO.NET.
Using a SqlDataReader will be faster, but it's also a lot more (developer) work.
Pick whichever is more important to you - getting things done quickly and easily (as a developer), or getting top speed by doing it "the hard way".
There's really no good "one single answer" to this "question" ... pick the right tool / the right approach for the job at hand and use it.

NoSQL or SQL Server [closed]

I'm starting out to design a site that has some requirements that I've never really dealt with. Specifically, the data objects will have similar, but not exact, attributes. Yes, I could probably figure out most of the possible attributes and then just not populate the ones that don't make sense, and therefore keep a traditional "Relational" table and column design, but I'm thinking this might be a really good time to learn NoSQL.
In addition, the user will have 1, and only 1, textbox to search, and I will need to search all data objects and their attributes to find that string.
Ideally, I'd like to have the search return in order of "importance", meaning that if a match for the user's entered string is found in a "name" attribute, it would be returned as a higher confidence match than if the string was matched on a sub-attribute.
Anyone have any experience in this sort of situation? What have you tried that worked or didn't work? Am I wrong in thinking that this project is very well suited to a NoSQL type of database?
Stick with a traditional relational database such as MySQL or Postgresql. I would suggest sorting by relevance in your application code after obtaining the matching results. The size of your result set should impact your design choices, but if you will have less than 1-2k results then just keep it simple and don't worry too much about optimization.
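A small Python sketch of the kind of application-side relevance sort suggested above, ranking matches on a name attribute above matches on sub-attributes; the field names and weights are arbitrary assumptions:

def relevance(obj, query):
    """Higher score means a better match. Weights are illustrative."""
    q = query.lower()
    score = 0
    if q in obj.get("name", "").lower():
        score += 10                      # name matches rank highest
    for value in obj.get("attributes", {}).values():
        if q in str(value).lower():
            score += 1                   # sub-attribute matches rank lower
    return score

def search(objects, query):
    matches = [o for o in objects if relevance(o, query) > 0]
    return sorted(matches, key=lambda o: relevance(o, query), reverse=True)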
NoSQL is just a dumb key-value store, a persistent dictionary that can be shared across multiple application instances. It can solve scalability issues, but it introduces new ones, since you now just have a dumb data store. Relational databases have had years of performance tuning and do a great job.
I find NoSQL to be much better suited to storing state data, like a user's preferences or a cache. If you are analyzing the relationships between data, then you need a relational database.