Migrate from OrientDB to AWS Neptune

I need to migrate a database from OrientDB to Neptune. I have an exported JSON file from OrientDB that contains the schema (classes) and the records, and I now need to import this into Neptune. However, it seems that to import data into Neptune there must be one CSV file containing all the vertices and another file containing all the edges.
Are there any existing tools to help with this migration and converting to the required files/format?

If you are able to export the data as GraphML then you can use the GraphML2CSV tool. It will create a CSV file for the nodes and another for the edges with the appropriate header rows.
Keep in mind that GraphML is a lossy format (it cannot describe complex Java types the way GraphSON can), but you would not be able to import those complex types into Neptune anyway.
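If it helps to see what the target format looks like, here is a rough Python sketch (not the GraphML2CSV tool itself) that walks a GraphML export and writes the two files in Neptune's bulk-load layout. The `export.graphml` file name, the `labelV`/`labelE` label keys and the String-only property typing are assumptions you would adapt to your data.

```python
# Rough sketch: convert a GraphML export into the two CSV files Neptune's bulk
# loader expects (one for vertices, one for edges). File names, label keys and
# the String-only property typing are assumptions -- adjust for your data.
import csv
import xml.etree.ElementTree as ET

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

tree = ET.parse("export.graphml")          # assumed input file
root = tree.getroot()

# Map GraphML <key id="..."> entries to human-readable attribute names
keys = {k.get("id"): k.get("attr.name") for k in root.findall("g:key", NS)}

def props(elem):
    return {keys.get(d.get("key"), d.get("key")): d.text
            for d in elem.findall("g:data", NS)}

nodes, edges = [], []
for n in root.findall(".//g:node", NS):
    p = props(n)
    label = p.pop("labelV", "vertex")      # label key depends on the exporter
    nodes.append({"~id": n.get("id"), "~label": label, **p})
for e in root.findall(".//g:edge", NS):
    p = props(e)
    label = p.pop("labelE", "edge")
    edges.append({"~id": e.get("id"), "~from": e.get("source"),
                  "~to": e.get("target"), "~label": label})

# Neptune expects property columns declared in the header as "name:Type";
# everything is written as String here for simplicity.
def write(path, rows, fixed):
    prop_cols = sorted({k for r in rows for k in r} - set(fixed))
    header = fixed + [f"{c}:String" for c in prop_cols]
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(header)
        for r in rows:
            w.writerow([r.get(c, "") for c in fixed + prop_cols])

write("vertices.csv", nodes, ["~id", "~label"])
write("edges.csv", edges, ["~id", "~from", "~to", "~label"])
```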

Related

Foundry: convert uploaded CSV to dataframe

How can I convert an uploaded CSV to a dataframe in Foundry using a Code Workbook? Should I use the @transform decorator with spark.read.... (not sure of the exact syntax)?
Thanks!
CSV is a "special format" where Foundry can infer the schema of the CSV and automatically convert it to a dataset. In this example, I uploaded a CSV with hail data from NOAA.
If you hit Edit Schema on the main page, you can use a front-end tool to set certain configurations for the CSV (delimiter, column name row, column types, etc.).
Once it's a dataset, you can import the dataset in Code Workbooks and it should work out of the box.
If you wanted to parse the raw file, I would recommend you use Code Authoring, not Code Workbooks, as it's better suited for production-level code, has less overhead, and is more efficient for this type of workflow (parsing raw files). If you really want to use Code Workbooks, change the type of your input using the input helper bar, or in the Inputs tab.
Once you finish iterating, please move this to a Code Authoring repository and repartition your data. File reads in a Code Workbook can substantially slow down your whole Code Workbook pipeline. Code Authoring now offers a preview of raw files, so it's just as fast to develop with as Code Workbooks.
Note: Only imported datasets and persisted datasets can be read in as Python transform inputs. Transforms that are not saved as a dataset cannot be read in as Python transform inputs. Datasets with no schema should be read in as a transform input automatically.
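Once the CSV has been converted to a dataset, the transform side is straightforward. A minimal sketch of a Code Repositories (Code Authoring) transform is below; the dataset paths are placeholders, and note that no spark.read call is needed because the input already arrives as a Spark DataFrame.

```python
# Minimal Code Repositories (Code Authoring) transform sketch. The dataset
# paths are placeholders; once the uploaded CSV has a schema applied, it
# arrives here as a Spark DataFrame -- no spark.read call is needed.
from transforms.api import transform_df, Input, Output


@transform_df(
    Output("/Project/datasets/hail_clean"),      # hypothetical output path
    source=Input("/Project/datasets/hail_raw"),  # the uploaded CSV-as-dataset
)
def clean_hail(source):
    # 'source' is already a DataFrame; just apply your logic.
    return source.dropna()
```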

How to import datasets as CSV files to Power BI using the REST API?

I want to automate the import process in Power BI, but I can't find how to publish a CSV file as a dataset.
I'm using a C# solution for this.
Is there a way to do that?
You can't directly import CSV files into a published dataset in Power BI Service. The AddRowsAPIEnabled property of datasets published from Power BI Desktop is false, i.e. this API is disabled. Currently the only way to enable this API is to create a push dataset programmatically using the API (or create a streaming dataset from the site). In that case you will be able to push rows to it (read the CSV file and push batches of rows, using C# or some other language, even PowerShell), and you will be able to create reports using this dataset. However, there are lots of limitations, and you should take care of cleaning up the dataset (to avoid reaching the limit of 5 million rows; you can't delete "some" of the rows, only truncate the whole dataset) or make it basicFIFO and lower the limit to 200k rows.
However, a better solution would be to automate the import of these CSV files into a database and have the report read the data from there. For example, import the files into Azure SQL Database or Databricks and use that as the data source for your report. You can then schedule the refresh of this dataset (if you use Import mode) or use DirectQuery.
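As a rough illustration of the push-dataset route (the answer mentions C#, PowerShell, or some other language; this sketch uses Python with the requests library), the flow is: create the push dataset once, then read the CSV and POST rows in batches. The table layout, file name and the already-acquired Azure AD access token are assumptions.

```python
# Hedged sketch of the "push dataset" route: create a push dataset once,
# then read the CSV and POST rows in batches. Table layout, file name and
# the access token are placeholders.
import csv
import requests

API = "https://api.powerbi.com/v1.0/myorg"
headers = {"Authorization": "Bearer <access-token>"}   # obtained via Azure AD

# 1. Create the push dataset (basicFIFO retention, as described above)
dataset_def = {
    "name": "SalesFromCsv",
    "tables": [{
        "name": "Sales",
        "columns": [
            {"name": "Product", "dataType": "string"},
            {"name": "Amount", "dataType": "Int64"},
        ],
    }],
}
resp = requests.post(f"{API}/datasets?defaultRetentionPolicy=basicFIFO",
                     json=dataset_def, headers=headers)
dataset_id = resp.json()["id"]

# 2. Push the CSV contents in batches (the API limits rows per single request)
rows_url = f"{API}/datasets/{dataset_id}/tables/Sales/rows"
with open("sales.csv", newline="") as f:
    rows = [{"Product": r["Product"], "Amount": int(r["Amount"])}
            for r in csv.DictReader(f)]
for i in range(0, len(rows), 10000):
    requests.post(rows_url, json={"rows": rows[i:i + 10000]}, headers=headers)
```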
After a Power BI update, it is now possible to import the dataset without importing the whole report.
So what I do is import the new dataset and update the parameters I set up for the CSV file source (stored in a data lake).

Is there a quicker way to export a NatTable than the ExportCommand?

Exporting the NatTable using the provided ExportCommand works fine; however, when the table is populated with large amounts of data, the export takes incredibly long, sometimes not completing at all if the amount of data is too large. Is there another way to export the data to just a simple CSV file, or something like that, that won't get bogged down?
Which exporter do you use? The default one in core creates the XML format. The one in the POI extension uses Apache POI. And there is a CSV exporter that was contributed to core; it is not released yet, but is available in the SNAPSHOT builds.
If they are all not sufficient you can even implement your own exporter and register it via the ConfigRegistry.

How to do an ultra-fast batch import in OrientDB from CSV files?

We are evaluating graph databases to store our networked communication data and have zeroed in on Neo4j and OrientDB.
Is there a batch importer tool or script for OrientDB similar to what Neo4j has? I was able to import CSV files with 150M relationships and 18M nodes in under 25 minutes with Neo4j. Reading the documentation on the OrientDB site, it looks like I need to use the ETL feature by modifying a JSON file to do the import. Is there no simpler and faster way to import from CSV files?
Using OrientDB ETL is pretty easy. Look at: http://orientdb.com/docs/last/Import-from-CSV-to-a-Graph.html. Just create your JSON file with the ETL steps and it's done.
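For reference, a minimal ETL configuration (the JSON file you pass to the oetl.sh script in OrientDB's bin directory) might look something like the sketch below; the CSV path, class name and database URL are placeholders to adapt to your own data.

```json
{
  "source": { "file": { "path": "/tmp/people.csv" } },
  "extractor": { "csv": {} },
  "transformers": [
    { "vertex": { "class": "Person" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/tmp/databases/people",
      "dbType": "graph",
      "classes": [ { "name": "Person", "extends": "V" } ]
    }
  }
}
```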

Connect Neo4j to an existing PostgreSQL database

I'm a new Neo4j user and I have played around with the webadmin interface of Neo4j to create small databases and simple queries in Cypher. Now I want to use Neo4j to create a graph from my existing database. It's a PostgreSQL database with millions of entries with the same structure (Neo4j is well suited to representing this data). My question is: how do I import this data? What is the easiest way to do that? I already saw that Cypher recognizes CSV files, but do I have to create a CSV file with my data, or is there another way to import it? Thank you for your help. Sam
One option is to export your Postgres data to CSV and use LOAD CSV to import it into the graph.
Another way is to write a script in a language of your choice (I'd vote for Groovy here) that connects to Postgres using JDBC, connects to Neo4j, and then applies the business logic to transform between the two.
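The same idea translates directly to other languages. As a rough sketch of that custom-script option in Python (using psycopg2 and the official Neo4j driver instead of Groovy/JDBC; table, column, host and credential names below are made up):

```python
# Rough sketch of the "custom script" option: read rows from Postgres and
# MERGE them into Neo4j. Table/column names, hosts and credentials are
# placeholders.
import psycopg2
from neo4j import GraphDatabase

pg = psycopg2.connect(host="localhost", dbname="mydb",
                      user="postgres", password="secret")
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

with pg.cursor() as cur, driver.session() as session:
    cur.execute("SELECT id, name, friend_id FROM people")
    while True:
        batch = cur.fetchmany(1000)
        if not batch:
            break
        for person_id, name, friend_id in batch:
            session.run(
                """
                MERGE (p:Person {id: $id}) SET p.name = $name
                MERGE (f:Person {id: $friend_id})
                MERGE (p)-[:KNOWS]->(f)
                """,
                id=person_id, name=name, friend_id=friend_id,
            )

pg.close()
driver.close()
```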
A third option is using an ETL tool like Talend. It basically does the same as your custom script, but provides a point & click interface to define the transformation; see http://neo4j.com/blog/fun-with-music-neo4j-and-talend/ for more details.