Access to creating Glossaries, Terms, Categories, and Classifications by different types of users in Apache Atlas (tags)

Please tell me: can all types of users (e.g., admin, data scientist, ...) create Glossaries, Terms, Categories, and Classifications in Apache Atlas?
I would also like to know whether it is possible to restrict some users from doing this.
How can Bookmarks and the Popularity score be created in Atlas?

There are two ways you can control access in Apache Atlas.
Apache Ranger
Simple Authorization
Both are explained very well in the following documents.
Using Apache Ranger
Using Simple Authorization
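For example, creating a glossary comes down to a REST call against Atlas; whether it succeeds for a given user depends entirely on the policies the chosen authorizer (Ranger, or the simple-authorization policy file) grants that user. A minimal Python sketch, assuming a local Atlas instance on the default port 21000 with basic-auth credentials (the glossary name and credentials are placeholders):

import requests

ATLAS_URL = "http://localhost:21000"   # assumed local Atlas instance
AUTH = ("admin", "admin")              # assumed basic-auth credentials

# Create a glossary via the v2 glossary endpoint. Whether this call is
# allowed depends on the policies configured for the authenticated user.
resp = requests.post(
    f"{ATLAS_URL}/api/atlas/v2/glossary",
    json={"name": "SalesGlossary", "shortDescription": "Terms for sales data"},
    auth=AUTH,
)
resp.raise_for_status()
print("created glossary with guid:", resp.json()["guid"])

If the user lacks the relevant permission, Atlas rejects the request and the raise_for_status() call above raises an HTTPError.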

Related

Is there any tool to generate sample/mock graph data for Apache AGE?

I have used "Mockaroo" to generate mock table data (in SQL format) for my PostgreSQL projects. But now I am experimenting with Apache AGE (a graph database). I wanted to know if there is any website like Mockaroo that lets you generate mock graph data (in OpenCypher query format).
I don't know whether this actually answers your question, but you can try Neo4j or the Neo4j Sandbox. Neo4j Sandbox is a web-based platform that allows you to create and experiment with Neo4j graph databases. It comes with several pre-built datasets that you can use to explore different graph use cases. Additionally, you can create your own custom datasets using the built-in data generator, which supports generating data in OpenCypher query format.
Here is the GitHub repository. You can also download a desktop application using this link.
There are some tools for generating mock data for the Cypher query language, but they are not like Mockaroo in the sense of just visiting a website and freely creating different kinds of data. With these options, you might need to use Python (a minimal sketch follows the list below) or create a new account on a website.
Neo4j Sandbox: Neo4j is a popular graph database management system that supports the Cypher query language. Neo4j provides a free sandbox environment where you can experiment with graph databases and use mock data to test your Cypher queries. https://neo4j.com/sandbox/
Graphgen: Graphgen is an open-source tool that can be used to generate random graph data. It allows you to define a schema and generate data that conforms to that schema using the Cypher query language. https://github.com/idea-iitd/graphgen
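If you only need a quick-and-dirty generator, a short script that emits OpenCypher CREATE statements is often enough. A minimal sketch; the Person/FRIENDS_WITH schema, names, and value ranges are made up for illustration:

import random

FIRST_NAMES = ["Alice", "Bob", "Carol", "Dave", "Erin", "Frank"]

def generate_people(n):
    # Emit one CREATE statement per Person node with random properties.
    stmts = []
    for i in range(n):
        name = random.choice(FIRST_NAMES)
        age = random.randint(18, 80)
        stmts.append(f"CREATE (:Person {{id: {i}, name: '{name}', age: {age}}});")
    return stmts

def generate_friendships(n, m):
    # Emit m relationships between randomly chosen pairs of distinct people.
    stmts = []
    for _ in range(m):
        a, b = random.sample(range(n), 2)
        stmts.append(
            f"MATCH (a:Person {{id: {a}}}), (b:Person {{id: {b}}}) "
            f"CREATE (a)-[:FRIENDS_WITH]->(b);"
        )
    return stmts

if __name__ == "__main__":
    for stmt in generate_people(10) + generate_friendships(10, 15):
        print(stmt)

The printed statements can be pasted into any OpenCypher-compatible tool; for Apache AGE you would wrap each statement in AGE's cypher() SQL function.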

Building a Social Network on a Database: Graph vs. Relational... Or Both?

I am currently building the REST API backend for a social networking app that I am creating. The backend will be written in Node.js. I am trying to decide whether I should use a graph DB (Neo4j) or MongoDB. With Neo4j I will be able to query relationships a lot faster and will be able to provide recommendations much more easily. However, MongoDB's document structure gives me a lot more flexibility in storing data such as permissions, user posts, etc. Would it be wise to build a MongoDB database with the data, and then store references to the documents in a Neo4j database, letting me pull recommendations while still keeping the document flexibility?
Since Neo4j is a schemaless DB (like MongoDB), it seems Neo4j alone satisfies all of your requirements.
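That said, the hybrid approach from the question is workable too: keep the full documents in MongoDB and mirror only their IDs into Neo4j for relationship queries. A minimal Python sketch, assuming local instances of both databases (all names, credentials, and the POSTED relationship are placeholders):

from pymongo import MongoClient
from neo4j import GraphDatabase

# Full post documents live in MongoDB; Neo4j holds only the reference.
mongo = MongoClient("mongodb://localhost:27017")
posts = mongo["social"]["posts"]

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def create_post(user_id, body):
    # 1. Store the flexible document in MongoDB.
    doc_id = posts.insert_one({"user_id": user_id, "body": body}).inserted_id
    # 2. Mirror just the document ID into the graph for fast traversals.
    with driver.session() as session:
        session.run(
            "MATCH (u:User {id: $uid}) "
            "CREATE (u)-[:POSTED]->(:Post {mongoId: $mid})",
            uid=user_id, mid=str(doc_id),
        )
    return doc_id

The trade-off of this design is that every write now spans two systems, so you need to decide how to handle partial failures (e.g., the Mongo insert succeeding but the Neo4j write failing).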

Hortonworks: How to Manage Users

I am new to Hadoop management and Hortonworks Hadoop. My question is: what is the common practice for managing users in Hortonworks? Ambari allows me to create users, but how do companies map users in Ambari to their own users? I see that in Hortonworks I can enable Kerberos; is this the way to allow company users, for example in LDAP, to use the same username/password to log in to Hortonworks? I'm not looking for details here, just some guidance on what the common practice is.
An identity source is needed. Active Directory (AD) is commonly used for that purpose. You'd use something like sssd to integrate AD with your cluster nodes. Once that is done, you can integrate your cluster with AD's Kerberos. Finally, you'd use AD's LDAP as a source of authentication for Ambari.
Of course, none of this is required. You could just as well maintain several identity sources and sync between them periodically (e.g., OS users in /etc/shadow, Kerberos users in an MIT KDC database, Ambari users in a relational database, etc.). Just take into account the extra time/effort that will be needed to manage cluster users.
@facha gives a nice explanation.
Since I work with LDAP and Hortonworks, I can only comment on this combination. To start figuring things out, you can for example use the LDAP server (called demo LDAP) that comes with the standard Hortonworks installation. You can use the pre-supplied LDAP mappings in Ambari to add more users.
Afterwards you can import these users into Ranger, for example, to set new policies for the different Hadoop services. This is done with Ranger user sync, which is different from giving LDAP users access to Ambari (ambari-server sync-ldap). I was not aware of this difference in the beginning, so it is good to know about it.
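To sanity-check what is actually in the directory before running either sync, you can query it directly. A minimal sketch using the ldap3 Python library; the host, port, base DN, and credentials below are placeholders you would replace with your LDAP server's actual values:

from ldap3 import ALL, Connection, Server

# Placeholder connection details; adjust to your demo LDAP setup.
server = Server("ldap://ldap-host.example.com:33389", get_info=ALL)
conn = Connection(
    server,
    user="uid=admin,ou=people,dc=hadoop,dc=apache,dc=org",
    password="admin-password",
    auto_bind=True,
)

# List the users that a Ranger or Ambari sync would pick up.
conn.search(
    "ou=people,dc=hadoop,dc=apache,dc=org",
    "(objectClass=person)",
    attributes=["uid", "cn"],
)
for entry in conn.entries:
    print(entry.uid, entry.cn)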
If you have done all this, you can also add Kerberos security, but that is a lot more difficult to understand (keytabs, principals, etc.).
Here is some good information and a nice tutorial on working with LDAP.
If you want an easy way to manage LDAP users and groups, I would recommend Apache Directory Studio.

Modularize user management server, social feed server

I plan to design a system with Dreamfactory as the user-management server and a separate REST server for the social feed. Dreamfactory will have its own MySQL database for storing user info, while the social feed will use MongoDB.
Is this a good system design? I'm new to this, as I'm using two open-source platforms for two different purposes: social feed and user management.
It's difficult to answer your question without knowing the requirements of the system. I was going to ask why you are storing users in MySQL, but I could just as well ask why you are using MongoDB or product XXX ;)
There is no silver bullet in programming. The tool is chosen to fit the requirements, not vice versa.
If you do not need to relate data, do not need transactions, and do not care about strict data consistency at all, why go with a relational database? Solutions like Aerospike or plain Redis (yes, it can be persistent too) can give you a much higher read/write rate.
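To illustrate the Redis option: feed-style data maps naturally onto its list type, and persistence is a server-side setting (e.g., appendonly yes in redis.conf), not something the client code has to change. A minimal sketch, assuming a local Redis instance (the key naming scheme and cap are made up):

import json
import redis

r = redis.Redis(host="localhost", port=6379)

def push_feed_item(user_id, item):
    # Keep each user's feed as a capped list of JSON blobs.
    key = f"feed:{user_id}"
    r.lpush(key, json.dumps(item))
    r.ltrim(key, 0, 99)  # retain only the 100 newest items

def read_feed(user_id, count=20):
    # Newest-first page of a user's feed.
    return [json.loads(raw) for raw in r.lrange(f"feed:{user_id}", 0, count - 1)]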
Anyway, I suggest you write a document containing your system description and think about the load this system is going to have. Maybe you will then decide that storing data in CSV files is OK for you (joking ;)).

Why does MongoLab not recommend using their REST API?

From the MongoLab's documentation, they recommend:
MongoLab databases can be accessed by your application code in two ways.
The first method - the one we strongly recommend - is to connect using one of the MongoDB drivers (as described above). You do not need to use our API if you use the driver.
The second method, which you should use only if you cannot connect via one of the MongoDB drivers, is via MongoLab’s RESTful data API.
Why do they recommend using the driver rather than their REST API? One reason I can think of is portability across different MongoDB providers. Are there any other reasons? Wouldn't it be more beneficial for MongoLab to "vendor lock-in" customers with their API?
The points that @WiredPrairie and @Stennie brought up around security are correct. When you use our REST API, you expose your API key to the client. Currently, anyone with the API key can modify your database. As a result, we only recommend using the REST API with public data, e.g. all the locations of taco trucks in the country.
By writing your own app tier, you can keep credentials to your database from being exposed to the client.
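In other words, the recommended shape is: client → your API → MongoDB via a driver, with the connection string living only on the server. A minimal Python sketch of such an app tier's data access (the MONGODB_URI environment variable, collection name, and query are illustrative):

import os
from pymongo import MongoClient

# The connection string (with credentials) stays server-side, e.g. in an
# environment variable; it is never shipped to the browser.
client = MongoClient(os.environ["MONGODB_URI"])
db = client.get_default_database()

def taco_trucks_in(city):
    # Runs inside your own app tier; clients call your API, not the DB.
    return list(db.trucks.find({"city": city}, {"_id": 0}))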
If you have any more questions, email us at support@mongolab.com. Happy to help!
-Chris@MongoLab
P.S. Thanks @WiredPrairie and @Stennie