Is it possible to load Phoenix tables from HDFS? - apache-phoenix

I am new to Phoenix. Is there any way to load tables into Phoenix from the Hadoop filesystem (HDFS)?

Yes. Phoenix is a wrapper on HBase, so you can create a Phoenix table pointing to your data and use it (Phoenix tables map onto HBase tables, which store their files in HDFS).
Please let me know if you are facing any specific issue with that.

You can't do it directly: you need to import the data into HBase first. There are prebuilt importers for CSV files:
https://phoenix.apache.org/bulk_dataload.html
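The linked page covers the command-line loaders (`psql.py` and the MapReduce-based CSV bulk load tool), which are the right choice for large files. For a small file, a hedged alternative is to upsert rows through Phoenix's JDBC driver. This minimal sketch is not from the original answer and assumes: the Phoenix client JAR is on the classpath, a table (hypothetically `EXAMPLE`) already exists with all columns VARCHAR in the same order as the CSV, and the lines are plain comma-separated values without quoting.

```scala
import java.sql.DriverManager
import scala.io.Source
import scala.util.Using

object PhoenixCsvUpsert {
  // "UPSERT INTO <table> VALUES (?, ?, ...)" with one placeholder per column.
  def upsertSql(table: String, columnCount: Int): String =
    s"UPSERT INTO $table VALUES (${Seq.fill(columnCount)("?").mkString(", ")})"

  // Naive split that assumes no quoted fields or embedded commas; -1 keeps trailing empty fields.
  def parseLine(line: String): Array[String] = line.split(",", -1).map(_.trim)

  // Hypothetical loader. url is e.g. "jdbc:phoenix:localhost" (the ZooKeeper quorum);
  // the table must already exist and, in this sketch, all its columns are assumed VARCHAR.
  def load(url: String, table: String, csvPath: String, columnCount: Int, batchSize: Int = 1000): Int =
    Using.resource(DriverManager.getConnection(url)) { conn =>
      conn.setAutoCommit(false) // buffer mutations client-side and commit them in batches
      Using.resource(conn.prepareStatement(upsertSql(table, columnCount))) { stmt =>
        Using.resource(Source.fromFile(csvPath)) { src =>
          var count = 0
          for (line <- src.getLines() if line.nonEmpty) {
            val fields = parseLine(line)
            // Missing trailing fields are passed as null.
            for (i <- 0 until columnCount)
              stmt.setString(i + 1, if (i < fields.length) fields(i) else null)
            stmt.executeUpdate()
            count += 1
            if (count % batchSize == 0) conn.commit()
          }
          conn.commit() // flush the last partial batch
          count
        }
      }
    }
}
```

Note that `Source.fromFile` reads a local file, so a CSV sitting in HDFS would first have to be copied out (for example with `hdfs dfs -get`), whereas the MapReduce loader reads HDFS input directly.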

Related

Migrate data from NoSQL to an RDBMS

We have existing data in HBase and want to move to AWS Aurora (MySQL). We need to keep using the existing data, so we have to load the NoSQL data into Aurora somehow.
It's not a very big database, just a few tables.
Are there any best practices or tools to migrate data from NoSQL to a relational DB? I saw many questions on the internet that ask about the reverse (DB -> NoSQL), but my requirement is a bit different and I can't find any helpful information.
Can someone please help? Where do I even start?
One simple way to do this without writing much custom code would be to use the Spark-HBase Connector from Hortonworks (SHC) to read data from an HBase table into a Spark DataFrame and then write that DataFrame into a MySQL table. The key challenge is getting SHC to work, because in my experience it's extremely version-sensitive. The trick is to correctly coordinate your versions of Spark, HBase and SHC (and finding the right combination is trickier than you may think).
However, if you manage to get all the dependencies right, the above is a matter of a few lines of code in a Jupyter notebook or PySpark. You could run this on YARN to parallelize the workload if it's large. It should work; give it a try.
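SHC performs the HBase-to-DataFrame mapping through a JSON table catalog that assigns each HBase column family and qualifier to a typed DataFrame column. The plain-Scala sketch below only illustrates that reshaping, so the target MySQL schema can be planned independently of the connector; the case classes, column names and INSERT statement are hypothetical and are not SHC's API. Because HBase rows are sparse, a cell missing from a row has to become SQL NULL.

```scala
object HBaseToRelational {
  // One HBase row: the row key plus its cells, keyed as "family:qualifier".
  final case class HBaseRow(rowKey: String, cells: Map[String, String])

  // Target MySQL column and the HBase "family:qualifier" it is read from.
  final case class ColumnMapping(sqlColumn: String, hbaseColumn: String)

  // Fixed-width relational row: the row key first, then one value per mapping;
  // a cell absent from this HBase row becomes None (SQL NULL).
  def flatten(row: HBaseRow, mappings: Seq[ColumnMapping]): Seq[Option[String]] =
    Some(row.rowKey) +: mappings.map(m => row.cells.get(m.hbaseColumn))

  // Parameterised INSERT whose column order matches the output of flatten.
  def insertSql(table: String, keyColumn: String, mappings: Seq[ColumnMapping]): String = {
    val cols = keyColumn +: mappings.map(_.sqlColumn)
    s"INSERT INTO $table (${cols.mkString(", ")}) VALUES (${cols.map(_ => "?").mkString(", ")})"
  }
}
```

Once SHC does work, the equivalent DataFrame could be written to Aurora with Spark's standard `df.write.jdbc(url, table, properties)`, given the MySQL JDBC driver on the classpath.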

How to load CSV data into a SnappyData table in rowstore mode using a JDBC connection

Hi, I am starting to learn the SnappyData rowstore (link). I tried all the examples there and they work, but I need to store CSV and JSON data in a SnappyData table. In the examples, they connect manually through snappy-shell, create the tables and insert the records. Another option is the JDBC client (link); I tried that way, but I don't know how to load a CSV file and store it in a SnappyData table. Then I tried another method, direct query-based access to the SnappyData store (link). If anyone knows how to store CSV data in a SnappyData table using JDBC, please share. Thank you.
You should go through the SnappyData docs: http://snappydatainc.github.io/snappydata/. The "How-to" section has examples of loading from CSV: http://snappydatainc.github.io/snappydata/howto/#how-to-load-data-in-snappydata-tables
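// snSession is assumed to be a SnappySession; add .option("header", "true") if the CSV has a header row
val someCSV_DF = snSession.read.csv("Myfile.csv")
// insertInto resolves columns by position, so MyTable must already exist with columns in the CSV's order
someCSV_DF.write.insertInto("MyTable")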
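Since the question asks specifically about JDBC, here is a hedged sketch of the generic alternative: parse the CSV on the client and send it as a batched, parameterised INSERT through any `java.sql.Connection`. It is plain JDBC, not a SnappyData-specific API, and assumes the target table already exists with columns in CSV order, that the caller opens the connection with the SnappyData JDBC driver and URL described in its docs (not shown here), and that any header row is dropped beforehand. The parser handles double-quoted fields and `""` escapes but not quoted line breaks.

```scala
import java.sql.Connection
import scala.util.Using

object CsvJdbcInsert {
  // Splits one CSV line, honouring double-quoted fields and "" escapes (no embedded newlines).
  def parseCsvLine(line: String): Vector[String] = {
    val fields = Vector.newBuilder[String]
    val cur = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (inQuotes) {
        if (c == '"' && i + 1 < line.length && line.charAt(i + 1) == '"') { cur += '"'; i += 1 }
        else if (c == '"') inQuotes = false
        else cur += c
      } else if (c == '"') inQuotes = true
      else if (c == ',') { fields += cur.toString; cur.clear() }
      else cur += c
      i += 1
    }
    fields += cur.toString
    fields.result()
  }

  // Inserts every non-empty line as one row, sending rows to the server in JDBC batches.
  def insertCsv(conn: Connection, table: String, lines: Iterator[String], batchSize: Int = 500): Int = {
    val rows = lines.filter(_.nonEmpty).map(parseCsvLine).buffered
    if (!rows.hasNext) 0
    else {
      val n = rows.head.length // column count taken from the first row
      val sql = s"INSERT INTO $table VALUES (${Seq.fill(n)("?").mkString(", ")})"
      Using.resource(conn.prepareStatement(sql)) { stmt =>
        var count = 0
        rows.foreach { fields =>
          for (i <- 0 until n) stmt.setString(i + 1, if (i < fields.length) fields(i) else null)
          stmt.addBatch()
          count += 1
          if (count % batchSize == 0) stmt.executeBatch()
        }
        if (count % batchSize != 0) stmt.executeBatch() // send the last partial batch
        count
      }
    }
  }
}
```

`addBatch`/`executeBatch` cuts round trips compared with one `executeUpdate` per row; for large files the Spark-based snippet above remains the better fit.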

Import data from Postgres to Cassandra

I need to import data from Postgres to Cassandra using open source technologies only.
Can anyone please outline the steps I need to take.
As per instructions, I have to refrain from using DataStax software, as it comes with a license.
Steps I have already tried:
Exported one table from Postgres in CSV format and imported it into HDFS (using Sqoop). If I take this approach, do I need to use MapReduce after this?
Tried to import the CSV file into Cassandra using CQL, but got this error:
Cassandra: Unable to import null value from csv
I am trying several methods but can't find a solid approach.
Could anyone please provide the steps required for the whole process? I believe many people have already done this.
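One possible cause, offered as an assumption rather than a diagnosis: if the export used Postgres `COPY ... CSV`, NULL is written as an unquoted empty field by default, and Cassandra cannot store a null in a primary-key column. A quick pre-check in plain Scala can separate rows whose key columns are all filled from rows that would be rejected. The key positions are hypothetical, and the naive comma split assumes no quoted commas.

```scala
object CassandraKeyCheck {
  // Splits CSV lines into (importable, rejected): a line is importable only if every
  // primary-key position is present and non-empty. keyIdx holds zero-based column
  // positions of the Cassandra primary key (hypothetical; adapt to your table).
  def partitionByKeys(lines: Seq[String], keyIdx: Seq[Int]): (Seq[String], Seq[String]) =
    lines.partition { line =>
      val fields = line.split(",", -1) // naive split: assumes no quoted commas
      keyIdx.forall(i => i < fields.length && fields(i).nonEmpty)
    }
}
```

Rows in the rejected group could be fixed at the source (for example with `COALESCE` in the export query) or logged and skipped, depending on what the data means.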

Connect Neo4j to an existing PostgreSQL database

I'm a new Neo4j user. I played around with the Neo4j webadmin interface to create small databases and simple Cypher queries. Now I want to use Neo4j to create a graph from my existing database. It's a PostgreSQL database with millions of entries that share the same structure (Neo4j is well suited to representing this data). My question is how to import this data. What is the easiest way to do it? I saw that Cypher recognizes CSV files, but do I have to create a CSV file with my data, or is there another way to import it? Thank you for your help. Sam
One option is to export your Postgres data to CSV and use LOAD CSV to import it into the graph.
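The export step can be done with Postgres's own tools (`COPY ... TO ... WITH (FORMAT csv, HEADER)` or psql's `\copy`) or with a few lines of JDBC. The hypothetical helper below, assuming the PostgreSQL JDBC driver supplies the `Connection`, streams any query result to a CSV file with a header row and RFC 4180-style quoting, so that `LOAD CSV WITH HEADERS` can address columns by name.

```scala
import java.io.{File, PrintWriter}
import java.sql.Connection
import scala.util.Using

object PgToCsv {
  // RFC 4180-style quoting: wrap fields containing a comma, quote or line break,
  // doubling embedded quotes; SQL NULL becomes an empty field.
  def csvField(value: String): String =
    if (value == null) ""
    else if (value.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + value.replace("\"", "\"\"") + "\""
    else value

  def csvLine(values: Seq[String]): String = values.map(csvField).mkString(",")

  // Writes a header row (the column labels) followed by every row of `query` to `out`.
  def exportQuery(conn: Connection, query: String, out: File): Int =
    Using.resource(conn.createStatement()) { stmt =>
      Using.resource(stmt.executeQuery(query)) { rs =>
        Using.resource(new PrintWriter(out, "UTF-8")) { w =>
          val meta = rs.getMetaData
          val n = meta.getColumnCount
          w.write(csvLine((1 to n).map(i => meta.getColumnLabel(i))) + "\n")
          var rows = 0
          while (rs.next()) {
            w.write(csvLine((1 to n).map(i => rs.getString(i))) + "\n")
            rows += 1
          }
          rows
        }
      }
    }
}
```

The resulting file could then be loaded with something like `LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row MERGE (:Person {id: row.id})`; the file and property names are illustrative, and recent Neo4j versions by default only read `file:///` URLs from the server's import directory.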
Another way is to write a script in a language of your choice (I'd vote for Groovy here) that connects to Postgres via JDBC and to Neo4j, and then applies the business logic to transform between the two.
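The script approach can be sketched in Scala rather than Groovy (swapped only to keep one language on this page). The hypothetical example assumes a Postgres link table read over JDBC as `(from_id, to_id)` pairs and turns each pair into parameters for a Cypher `MERGE` statement, grouped into batches that could each be sent in one transaction with the official Neo4j Java driver (for instance via `Session.run(query, parameters)`, after converting the Scala maps to `java.util.Map`). Table, label and relationship names are illustrative.

```scala
object RowsToCypher {
  // Hypothetical relational input: one row per link, e.g. from
  // "SELECT from_id, to_id FROM friendships" read over JDBC.
  final case class Link(fromId: Long, toId: Long)

  // MERGE matches existing nodes/relationships or creates them, so re-running the import
  // does not create duplicates. $from and $to are Cypher parameters, not Scala interpolation.
  val mergeLink: String =
    """MERGE (a:Person {id: $from})
      |MERGE (b:Person {id: $to})
      |MERGE (a)-[:KNOWS]->(b)""".stripMargin

  // The parameter map expected by mergeLink for one row.
  def params(link: Link): Map[String, Any] = Map("from" -> link.fromId, "to" -> link.toId)

  // Groups rows so that each group can be written in a single Neo4j transaction.
  def batches(links: Iterator[Link], size: Int): Iterator[Seq[Map[String, Any]]] =
    links.map(params).grouped(size)
}
```

`MERGE` rather than `CREATE` keeps re-runs idempotent at the cost of a lookup per node; an index or uniqueness constraint on `:Person(id)` keeps those lookups fast.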
A third option is to use an ETL tool like Talend. It basically does the same as your custom script but provides a point-and-click interface to define the transformation; see http://neo4j.com/blog/fun-with-music-neo4j-and-talend/ for more details.

Incrementally updating/adding data on HDFS

In my application there are 4 tables, and each table has more than 1 million rows.
Currently my Java-based reporting engine joins all the tables and gets the data to show in reports.
Now I want to introduce Hadoop using Sqoop. I have installed Hadoop 2.2 and Sqoop 1.9.
I have done a small POC to import the data into HDFS. The problem is that it creates a new data file every time.
What I need:
A scheduler that runs once a day and will:
Pick the data from all four tables and load it into HDFS using Sqoop.
Use Pig to do some transformations and joins on the data and prepare the final denormalized data.
Use Sqoop again to export this data into a separate reporting table.
I have a few questions about this:
Do I need to import the whole data set from the DB into HDFS on every Sqoop import call?
In the master table, some data is updated and some data is new. How can I handle that if I merge the data while loading it into HDFS?
At export time, do I need to export the whole data set to the reporting table again? If yes, how would I do that?
Please help me out with this.
Please suggest a better solution if you have one.
Sqoop supports incremental and delta imports. Check the Sqoop documentation here for more details.
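The answer's point can be illustrated with the two pieces of logic an incremental pipeline needs. In Sqoop 1.4.x they correspond to `import --incremental lastmodified --check-column <col> --last-value <value>` and the `sqoop merge --merge-key <col>` tool; if the "1.9" in the question refers to the Sqoop 2 line (1.99.x), that is configured differently, so check its own docs. The plain-Scala sketch below uses a hypothetical record type to show the semantics only: pull rows changed since the saved watermark, then let each new row replace the old row with the same key.

```scala
object IncrementalMerge {
  // Hypothetical record shape: primary key, last-modified timestamp (epoch millis), payload.
  final case class Rec(id: Long, lastModified: Long, payload: String)

  // Step 1, like a "lastmodified" incremental import: only rows changed after the watermark.
  def delta(source: Seq[Rec], lastValue: Long): Seq[Rec] =
    source.filter(_.lastModified > lastValue)

  // Step 2, like a merge on the primary key: a row from `updates` replaces the `base` row
  // with the same id; rows present on only one side are kept.
  def merge(base: Seq[Rec], updates: Seq[Rec]): Seq[Rec] = {
    val byId = base.map(r => r.id -> r).toMap ++ updates.map(r => r.id -> r)
    byId.values.toSeq.sortBy(_.id)
  }

  // The next run's watermark is the newest timestamp seen so far.
  def nextLastValue(rows: Seq[Rec], previous: Long): Long =
    (previous +: rows.map(_.lastModified)).max
}
```

For the export side, Sqoop 1.4's `export` tool accepts `--update-key <col>` together with `--update-mode allowinsert`, which updates existing rows and inserts new ones instead of re-exporting the whole reporting table; whether `allowinsert` is supported depends on the database connector.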