how to store array of data without knowing the fields? - facebook

I am currently developing a Facebook application that calls upon the API and retrieve the user's info and display them on the front end.
However, since there are so many fields in the data. Could i actually store them into an array without knowing the fields using MongoDB?

Data in MongoDB has a flexible schema. Unlike SQL databases, where you
must determine and declare a table’s schema before inserting data,
MongoDB’s collections do not enforce document structure.
Read this articel: Data Modeling Introduction
Instead of inserting data to array you can insert it as a new key-value pair to document.

Related

Can I store data that won't affect query performance in MongoDB?

We have an application which requires saving of data that should be in documents, for querying and sorting purposes. The data should be schema less, as some of the fields would be known only via usage. For this, MongoDB is a great solution and it works great for us.
Part of the data in each document, is for displaying purposes. Meaning the data can be objects (let's say json) that the client side uses in order to plot diagrams.
I tried to save this data using gridfs, but the use cases makes it not responsive enough. Also, the documents won't exceed the 16 MB limits even with the diagram data inside them. And in fact, while trying to save this data directly within the documents, we got better results.
This data is used only for client side responses, meaning we should never query it. So my question is, can I insert this data to MongoDB, and set it as a 'not for query' data? Meaning, can I insert this data without affecting Mongo's performance? The data is strict and once a document is inserted, there might be only updating of existing fields, not adding new ones.
I've noticed there is a Binary Data type in Mongo, and I am wondering if I should use this type for objects that are not binary. Can this give me what I'm looking for?
Also, I would love to know what is the advantage in using this type inside my documents. Can it save me disk space?
As at MongoDB 3.4, read and write operations are atomic on the level of a single document from the storage/memory point of view. If the MongoDB server needs to fetch a document from memory or disk (even when projecting a subset of fields to return) the full document generally has to be loaded into memory on a mongod. The only exception is if you can take advantage of covered queries where all of the fields returned are also included in the index used.
This data is used only for client side responses, meaning we should never query it.
Data fields which aren't queried directly do not need to be in any indexes. However, there is currently no concept like "not for query" fields in MongoDB. You can query or project any field (with or without an index).
Meaning, can I insert this data without affecting Mongo's performance?
Data with very different access or growth patterns (such as your infrequently requested client data) is a recommended candidate for storing separately from a parent document with frequently accessed data. This will improve the efficiency of memory usage for mongod by avoiding unnecessary retrieval of data when working with documents in the parent collection.
I've noticed there is a Binary Data type in Mongo, and I am wondering if I should use this type for objects that are not binary. Can this give me what I'm looking for? Also, I would love to know what is the advantage in using this type inside my documents. Can it save me disk space?
You should use a type that is most appropriate for the data that you are storing. Storing text data as binary will not gain you any obvious efficiencies in server storage. However, storing a complex object as a single value (for example, a JSON document serialized as a string) could save some serialization overhead if that object will only be interpreted via your client-side code. Binary data stored in MongoDB will be an opaque blob as far as indexing or querying, which sounds fine for your purposes.

is there any efficient way to insert a copy field in titan?

Is there any efficient way to store a copy of a field with different data type in Titan?
I am using titan 1.0.0 with Solr as a backend data store ,according to my queries and my backend strategies ,I want to store a field with two different data type in my Solr ( text and string). I have already knew that I can store them separately, but I want to know if it is possible to insert data once while storing it in both fields.

Differences between NoSQL databases

NoSQL term has 4 categories.
Key\value stores
Document oriented
Graph
Column oriented.
From my point of view all these data modeling has same definition, What are differences?
Key\value database maintains data in structure like object in OOP. having access to data is base on unique key.
Column oriented is an approach like key\value! But in key\value, you cant access to value by query. I mean, queries are key-based.
Compare 1st & 2nd picture from 2 different categories.
Document oriented stores data in collections, something like rows. Having access to data is base on unique key. The collections store data like key\value. However, you can access data by value.
As you can see, In these 3 categories, we define a unique key for specify a unique object & some pairs of key\value for more information
Graph db is a little different.
So, what are differences in definition & in real-world?
Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents.
Graph stores are used to store information about networks of data, such as social connections. Graph stores include Neo4J and Giraph.
Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or 'key'), together with its value. Examples of key-value stores are Riak and Berkeley DB. Some key-value stores, such as Redis, allow each value to have a type, such as 'integer', which adds functionality.
Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.
for more, follow this link on MongoDB

Dynamic Data inside MongoDB

I'm making dynamic data storage, which it was in relational data base as:
form -> field
form -> form_data
field -> field_data
form_data -> filed_data
and field data contains field_id, form_data_id, and value
but for scalability and performance I'm planing to move form_data and field_data to MongoDB and my problem now is how to design MongoDB collections, using one collection for all form_data and move field_data to map inside it and key is field_id and value is the value of this field_data, or make collection for each form record and store data direct in form_data without map as the all data in this case will be consistent.
In document-oriented databases like MongoDB you should always favor aggregation over references, because they don't support JOIN-operations on the database to link multiple documents.
That means that a "A has many B" relationship between two entities should not be modeled as two tables A and B, but rather as one collection of A where each A has an embedded array of B objects.
In the context of MongoDB, there is however an additional limitation: MongoDB doesn't like objects which grow. When an object accumulates more and more data over its lifetime, the object will grow. That means that the hard drive space MongoDB allocated for it will run out again and again, which will require space reallocation. This takes performance and fragments the database. Also, MongoDB has an artificial size limit for documents, mainly to discourage developers from designing growing objects.
Therefor:
When the data exists at creation, embed it directly.
When more and more data is added after creation, put it into a different collection.
When a form has X fields, the number of fields will likely not change much over its lifetime. So you should embed the fields and their descriptions directly into the form object.
But the number of answers entered into these forms will grow over time, which means that these should be treated as separate objects in a separate collection.
So I would recommend you to have two collections, forms and form_data.
Each document in forms embeds a sub-object of fields with the static field properties.
Each document in form_data has a field with the _id of the corresponding form and embeds a sub-object of field_data which uses the same keys as the fields sub-object of forms and stores the entries the user made in that form.
When your use-case requires frequent access to the aggregated data (like when you want to publish the up-to-date statistics on a public website), you could also store this information in the fields of forms to avoid an expensive aggregation query over many form_data documents. MongoDB in general recommends to orient your database schema rather on your performance requirements than on the semantics of your data.
Regarding your remark "all data in this case will be consistent": keep in mind that MongoDB does not enforce referential integrity. When an application deletes or changes a document, it's the responsibility of the application to fix any outdated references to it in other documents.

Zend: index generation and the pros and cons of Zend_Search_Lucene

I've never came across an app/class like Zend Search Lucene before, as I've always queried my database.
Zend_Search_Lucene operates with
documents as atomic objects for
indexing. A document is divided into
named fields, and fields have content
that can be searched.
A document is represented by the
Zend_Search_Lucene_Document class, and
this objects of this class contain
instances of Zend_Search_Lucene_Field
that represent the fields on the
document.
It is important to note that any
information can be added to the index.
Application-specific information or
metadata can be stored in the document
fields, and later retrieved with the
document during search.
So this is basically saying that I can apply this to anything including databases, the key thing here is making indexes for searching.
What I'm trying to grasp is where exactly should I store the indexes in my application, let's take for example we have phones stored in a database, a manufacturers, models - how should I categorize the indexes?
If I'm making indexes of users with say, addresses I obviously wouldn't want them to be publically viewable, I'm just confused on how it all works out together, if there are known disadvantages, any gotchas I should know while using it.
A Lucene index is stored outside the database. I'd store it in a "data" directory as a sister to your controllers, models, and views. But you can store it anywhere; you just need to specify the path when you open the index for querying.
It's basically a redundant copy of the documents stored in your database, and you have to keep them in sync yourself. That's one of the disadvantages: you have to write code to populate the Lucene index based on results of a query against your database. As you add data to the database, you have to update your Lucene index as well.
An advantage of using an external full-text index solution is that you can reduce the workload on your RDBMS. To find a document, you execute a search using the Lucene API. The result should include a field containing the primary key value (as part of the document but no need to make it analyzed for FT search). You get this field back when you do a Lucene search, so you can look up the respective row in the database.
Does that help answer your question?
I gave a presentation recently for MySQL University comparing full-text search solutions:
http://forge.mysql.com/wiki/Practical_Full-Text_Search_in_MySQL
I also publish my slides at http://www.SlideShare.net/billkarwin.