Unable to find other entity types in Apache Atlas; only hdfs_path is shown

Hi, I am new to Apache Atlas and I am facing a problem.
I want to create a hive_table entity manually, but the entity type drop-down shows only "hdfs_path".
Can anyone tell me how I can use a custom entity type in Apache Atlas?
Can anyone also point me to good documentation or a tutorial apart from the Apache Atlas site?
Here is the photo where I want to add a new entity type

TL;DR: you need to apply the following setting to atlas-application.properties, and restart:
atlas.ui.editable.entity.types=<your entity types>
Note that <your entity types> can be a comma-separated list such as hdfs_path,kafka_topic. To allow all types to be created and edited via the UI, use a star: *.
I guess the reason for this restrictive default is that metadata in Atlas is normally synchronised from other systems using hooks and bridges. To keep the metadata "consistent" (i.e. to prevent people from creating metadata entries in Atlas that do not correspond to actual data assets in the referenced systems), creating and editing entities via the UI is locked down by default.
Reference: https://issues.apache.org/jira/browse/ATLAS-3237

hive_table entities should be synced using the import scripts.
"hdfs_path" entities are not synced automatically unless they belong to lineage, hence the option to create them manually.
However, if you want to create hive_table entities manually, the following link has the steps:
https://community.cloudera.com/t5/Support-Questions/How-to-create-hive-table-entity-in-Apache-atlas-using-REST/td-p/173644
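As a rough illustration of the REST route, here is a minimal Python sketch that builds a hive_table entity and posts it to the Atlas v2 entity endpoint (/api/atlas/v2/entity). The database name, table name, cluster name, base URL and credentials are all placeholders, and the sketch assumes the referenced hive_db entity already exists; your Atlas version may require more attributes than the three shown here.

```python
import base64
import json
import urllib.request


def build_hive_table_entity(db_name, table_name, cluster="primary"):
    """Build the JSON body for POST /api/atlas/v2/entity.

    The qualifiedName layout (db.table@cluster) follows the usual Hive
    hook convention; "primary" is only a placeholder cluster name.
    """
    return {
        "entity": {
            "typeName": "hive_table",
            "attributes": {
                "name": table_name,
                "qualifiedName": f"{db_name}.{table_name}@{cluster}",
                # Reference to the parent database, assumed to exist already.
                "db": {
                    "typeName": "hive_db",
                    "uniqueAttributes": {
                        "qualifiedName": f"{db_name}@{cluster}"
                    },
                },
            },
        }
    }


def create_entity(base_url, user, password, payload):
    """Send the payload to Atlas using HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{base_url}/api/atlas/v2/entity",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the payload needs no server; "sales" and "orders" are made up.
payload = build_hive_table_entity("sales", "orders")
```

Calling create_entity("http://localhost:21000", "admin", "admin", payload) would then submit it; that host, port and login are only the usual defaults of a local test install and will differ on a real cluster.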

EF6 Model First: Unable to generate database from model after adding new entities

I created a new project and added an empty data model. I added a few entities and properties and then generated the database according to this tutorial. So far, so good. I then went back and added additional entities.
Now I am no longer able to use Generate Database From Model... because I receive Error 11007: Entity type 'xxxx' is not mapped, for each of the new entities I added. According to MSDN, I can follow the instructions here to resolve the mapping issue between the conceptual and storage models. However, these instructions appear to assume the entities are already present in my storage model (which they are not). When I try to map them manually, the only two tables I can choose from are the original two tables I created.
I appreciate any help you can offer.
This was a phantom error of sorts. I had another error with a default value I was trying to set for a datetime that appeared at the bottom of my error list. I resolved this error and everything is now working as it should.

Telosys: How can I get database table records in a template?

I am using Telosys Tools for code generation. It's a very nice tool and helps me a lot.
But there is one problem: it provides database schema information that I can access in templates (the templates are Velocity templates), which is good, but how can I get the selected entity's data from the database? I can't find any way to get the selected table's data.
Please provide a solution if there is one, or suggest an alternative way to do it.
Thank you!
Telosys Tools is designed to retrieve the model from the database, not the data stored in the tables.
But it lets you create your own specific tooling classes that can be used in the templates, so it's possible to create a specific Java class to retrieve the data from the database.
There's an example of this kind of specific class in the "database-doc" bundle: https://github.com/telosys-tools/database-doc-bundle-TT210 (in the classes folder).
To simplify loading, the simplest way is to create the class in the "default package" (no Java package).
NB: The problem is that the jar containing the JDBC driver is not accessible to the generator's class loader, so you will have to use a specific class loader and connect directly with the JDBC driver.
Here is an example: https://gist.github.com/l-gu/ed0c8726807e5e8dd83a
Don't use it as is (the connection is never closed), but it can be easily adapted.

Saving a doctrine2 entity to cache to speed up the page load

Let's say I have an entity called Product, and this entity is loaded every time a user hits the product information page. Usually I'd save the object in Zend_Cache (memcache) for an hour to avoid hitting the db on each request, but as far as I understand that's not possible with Doctrine2 entities because of the Proxy objects.
So my question is, how can I avoid loading the same entity from the database for each request?
[EDIT]
I tried using Doctrine Cache like this
$categoryService = App_Service_Container::getService('\App\Service\Category');
$cache = $categoryService->getEm()->getConfiguration()->getResultCacheImpl();
$apple = $cache->fetch('apple');
But I get the following error
Warning: require(App/Entity/Proxy/_CG_/App/Entity/Category.php)
[function.require]: failed to open stream: No such file or directory
in /opt/vhosts/app/price/library/Doctrine/Common/ClassLoader.php on
line 163
The same happens with Zend Cache, since you can't serialize the entity because of the Proxy class.
You've got several options:
1. Use Doctrine's built-in result caching.
2. Try just sticking the entity in memcache via Zend_Cache. When you pull it out, you may need to merge() the Product back into the EM so proxies can be dereferenced. If you fetch-join any associations you need to display the product info, and you're only doing reads, this should work fine.
3. Don't cache the entity at all. Cache whatever output you generate instead.
EDIT: If you don't care about the hydration overhead, you're using MySQL, and your Products and associated tables don't change very often, you might prefer to just rely on the MySQL query cache. It's a fairly blunt instrument, but useful enough to mention.
You might want to try implementing __sleep or __wakeup methods for your entity class, as Doctrine 2 has special requirements and limitations concerning serialization/deserialization of entities (which is what happens when storing them in Zend_Cache).
There is this guidance.
General information about limitations including serialization.
I find this extremely strange, since I just experimented with this myself and didn't have any issues with the proxy object being stored in the database. So I'm guessing your configuration is not set up 100% correctly?
If you find the issue with your configuration, be very aware of what timdev said: you MUST merge the object back into the EntityManager, or else you will have weird bugs down the line.
A fourth solution is to retrieve the data as an array instead of an object, but then of course you lose all the functionality connected to your module, which might not be exactly what you wanted.
It seems to me more like a configuration error. Either Proxies have not been generated or there is something wrong with the proxy directory and namespace.
Depending on your configuration, proxies can be generated either automatically or manually. Have your proxies indeed been generated under App/Entity/Proxy? Is this indeed the right directory?
FYI, proxies can be generated manually by executing doctrine orm:generate-proxies <dest-dir>
Seconding what timdev says: Doctrine has built-in caching, you want to use it.
I also wonder from your question if you are experiencing any performance issues or if you are a victim of overly eager optimisation.

Managing changes in class structure to be consistent with mongodb collection

We are using MongoDB with C#. We are trying to figure out a way to keep our collections consistent seamlessly. Right now, if a developer makes any change to the class structure (adding a field, changing a data type, or changing a property within a nested class), he/she has to change the Mongo collection manually.
It's a pain, as our project is growing and the number of developers working on it keeps increasing. I was wondering whether someone has already figured out a way to manage this issue.
Research
I found a similar question; however, I couldn't find a solution there.
I found a way to find all properties; however, data types and nested documents become an issue.
If you want to migrate gradually as records are accessed, you need to follow a few simple rules:
1) If you add a field, it had better be nullable or have a default value specified.
2) Never rename fields and never change field types. Instead, always add new fields, add migration code, and remove the old fields only when all documents have been migrated over.
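The two rules above can be sketched in a language-neutral way. The following minimal Python example works on plain dicts standing in for MongoDB documents (the question itself uses C#, so treat this only as an illustration of the idea). The field names "tags", "price" and "price_cents" are made up for the example.

```python
def migrate_on_read(doc):
    """Upgrade one document (a plain dict) to the current shape.

    Intended to run whenever a record is loaded, so documents are
    migrated gradually as they are accessed.
    """
    migrated = dict(doc)  # work on a copy, leave the input untouched

    # Rule 1: a newly added field gets a default when it is missing.
    migrated.setdefault("tags", [])

    # Rule 2: instead of changing the type of "price" (a string) in place,
    # add a new field "price_cents" (an int) and fill it from the old one.
    if "price_cents" not in migrated and "price" in migrated:
        migrated["price_cents"] = round(float(migrated["price"]) * 100)

    # The old "price" field stays for now; remove it only once every
    # document in the collection has been migrated.
    return migrated


old = {"_id": 1, "name": "apple", "price": "1.25"}
new = migrate_on_read(old)
```

Running the function a second time on an already migrated document leaves it unchanged, which is what makes it safe to apply on every read.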
For prototyping with MongoDB and C# I built a dynamic wrapper ... that lets you specify your objects using only interfaces (no classes needed), and it lets you dynamically add new interfaces to an existing object. It is not ready for production use, but for prototyping it saves a lot of effort and makes migration really easy.

TG2.1: Proper location to store a database session instance?

I am using a custom database (MongoDB) with TG 2.1, and I am wondering where the proper place to store the PyMongo connection/database instances would be.
E.g., at the moment they are created inside my subclass of AppConfig. Is there a standard location to store them? Would putting the variables into project.model.__init__ be the best location, given that under SQLAlchemy the database session is commonly retrieved via:
from project.model import DBSession, metadata
Anyway, just curious what the best practice is.
As of TurboGears 2.1.3, MongoDB support is integrated via the Ming ORM. I would look at a quickstarted project using the --ming option to get best practices if you want to do some customization: http://turbogears.org/2.1/docs/main/Ming.html
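If you do keep a hand-rolled PyMongo setup rather than Ming, the approach described in the question (a handle kept in the model package) could look roughly like the sketch below. This is only an assumption about one workable layout, not a TurboGears convention for MongoDB; the names init_model and get_db are chosen for illustration, and a plain dict stands in for the PyMongo client so the example runs without a server.

```python
# Hypothetical contents of project/model/__init__.py

_db = None  # module-level handle, set once at application start-up


def init_model(client, db_name):
    """Store the database handle; call once from the AppConfig setup code."""
    global _db
    _db = client[db_name]
    return _db


def get_db():
    """Return the shared handle, failing loudly if init_model was skipped."""
    if _db is None:
        raise RuntimeError("init_model() has not been called")
    return _db


# Stand-in for pymongo.MongoClient so the sketch runs without a server;
# a real client is indexed by database name in the same way.
fake_client = {"shop": {"products": []}}
init_model(fake_client, "shop")
```

Controllers would then use from project.model import get_db, mirroring the DBSession import shown in the question.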