I have Hive tables populated with data on a Hadoop server. Now I want to connect to those existing Hive tables from my Scala Spark code, which runs locally in IntelliJ.
I tried copying hive-site.xml onto my local system, adding the file to my classpath, and then accessing the Hive tables. But it always fails with the error:
org.apache.spark.sql.AnalysisException: Table not found.
Is there any code snippet or configuration setup that I can use to access an existing Hive table from my local Scala Spark code?
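For what it's worth, here is a minimal sketch of the kind of setup that usually works, assuming hive-site.xml is on the classpath (e.g. under src/main/resources), the spark-hive dependency is included in the build, and hive.metastore.uris in that file points at a metastore that is reachable from the local machine. The database and table names are placeholders:

import org.apache.spark.sql.SparkSession

// enableHiveSupport() makes Spark read hive-site.xml from the classpath
// and talk to the external Hive metastore configured there.
val spark = SparkSession.builder()
  .appName("local-hive-access")
  .master("local[*]")
  .enableHiveSupport()
  .getOrCreate()

// "mydb" and "mytable" are placeholders for your actual database and table.
spark.sql("SHOW DATABASES").show()
spark.sql("SELECT * FROM mydb.mytable LIMIT 10").show()

If SHOW DATABASES only returns "default", the session is most likely falling back to a local embedded metastore, which usually means hive-site.xml was not picked up from the classpath.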
I'm getting a provisioning error when using PostgreSQL over JDBC in GraphDB. I created a connection between PostgreSQL and GraphDB through a virtual repository, and I made an OBDA file which includes the RDF mapping information.
Expected: Normally I should be able to browse the PostgreSQL data's hierarchy and so on in GraphDB.
Error: The connection was fine and the repository was created successfully, but when I tried to browse the data, I got this error:
Actions that I took: I went into /opt/graphdb-free/app/lib/plugins/dependencies-plugin/dependencies-plugin.jar to modify the dependency parameter, but it didn't change anything. I also checked the syntax of the OBDA file and I don't see anything wrong there.
Has anyone been through this before? Was I in the right place to modify the dependency, or is it something else?
I have just installed and created a new TypeORM project using the README on the TypeORM Git page. I'm trying to play with access to my existing Postgres DB that is used by a Java application, and I'm having issues when retrieving data using Active Record: any attempt to fetch data from the table appears to try to drop indexes and create the table, but it already exists. Why does it want to create it? Is that a side effect of using Active Record?
OK, found it. It's controlled by the synchronize property in the ormconfig file: with synchronize enabled, TypeORM tries to bring the database schema in line with the entity definitions on every application launch, so it should be turned off when working against an existing database.
I installed Spark with sbt as a project dependency. Now I want to change the Spark environment variables without doing it within my code with a .setMaster(). The problem is that I cannot find any config file on my computer.
This is because I have an error, org.apache.spark.SparkException: Invalid Spark URL: spark://HeartbeatReceiver#my-mbp.domain_not_set.invalid:50487, even after trying to change my hostname. Thus, I would like to dig into the Spark library and try some things.
I tried pretty much everything that is in this SO post: Invalid Spark URL in local spark session.
Many thanks
What worked for the issue:
export SPARK_LOCAL_HOSTNAME=localhost in the shell profile (e.g. ~/.bash_profile)
sbt was not able to pick up the host even when the export was run just before launching sbt; I had to put it in the profile to get the right context.
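If changing the shell profile isn't an option, a possible alternative (not from the original answer, just a sketch based on Spark's documented driver properties) is to force the driver host and bind address on the SparkSession builder:

import org.apache.spark.sql.SparkSession

// spark.driver.host / spark.driver.bindAddress are standard Spark properties;
// pinning them to localhost avoids the unresolved hostname that produces the
// "Invalid Spark URL" error in local mode.
val spark = SparkSession.builder()
  .appName("local-test")
  .master("local[*]")
  .config("spark.driver.host", "localhost")
  .config("spark.driver.bindAddress", "127.0.0.1")
  .getOrCreate()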
I would like to use this new functionality: overwriting a specific partition without deleting all the data in S3.
I used the new flag (spark.sql.sources.partitionOverwriteMode="dynamic") and tested it locally from my IDE, and it worked (I was able to overwrite a specific partition in S3). But when I deployed it to HDP 2.6.5 with Spark 2.3.0, the same code didn't create the S3 folders as expected: no folder was created at all, only a temp folder.
My code:
df.write
  .mode(SaveMode.Overwrite)
  .partitionBy("day", "hour")
  .option("compression", "gzip")
  .parquet(s3Path)
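For context, a minimal sketch of how that flag is typically enabled on the session before the write (the spark variable name is assumed; the same property can be passed with --conf at submit time):

// Enable dynamic partition overwrite so Overwrite only replaces the partitions
// present in the DataFrame instead of wiping the whole output path.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")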
Have you tried Spark version 2.4? I have worked with this version on both EMR and Glue, and it has worked well. To use "dynamic" mode in version 2.4, just use this code:
dataset.write.mode("overwrite")
  .option("partitionOverwriteMode", "dynamic")
  .partitionBy("dt")
  .parquet("s3://bucket/output")
The AWS documentation specifies Spark version 2.3.2 as the version needed to use spark.sql.sources.partitionOverwriteMode="dynamic".
I'm working with neo4j-community-3.4.7 and Eclipse Oxygen. I've installed the Neo4j driver and added the Neo4j Community Edition lib files to my Eclipse library. My problem is that I'm trying to import a CSV file located in the import folder of the neo4j-community directory by passing the Cypher query through a transaction. Passing other Cypher queries this way has been successful in creating nodes and relationships in the database. However, when I try to use "USING PERIODIC COMMIT" to load a large CSV, I get an error message saying "cannot use periodic commit on a non-updating query."
I've attached my code with the other lines of the transaction commented out; they should work once the CSV file is successfully loaded.
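For what it's worth, here is a minimal sketch of a periodic-commit load that avoids that error, assuming the official Neo4j Java driver (the URI, credentials, file name, and label are placeholders). Two things matter: the error means the query contains no updating clause, so at least one CREATE/MERGE line must stay un-commented, and a periodic-commit query has to run as an auto-commit query rather than inside an explicitly opened transaction.

import org.neo4j.driver.v1.{AuthTokens, GraphDatabase}

// Placeholders: adjust the Bolt URI, credentials, CSV file name, and label.
val driver = GraphDatabase.driver("bolt://localhost:7687", AuthTokens.basic("neo4j", "password"))
val session = driver.session()

// "USING PERIODIC COMMIT" only compiles when the query actually updates the
// graph, hence the MERGE clause below.
val loadCsv =
  """USING PERIODIC COMMIT 500
    |LOAD CSV WITH HEADERS FROM 'file:///my_data.csv' AS row
    |MERGE (p:Person {name: row.name})
    |""".stripMargin

// Run as an auto-commit query; periodic commit cannot run inside an open transaction.
session.run(loadCsv)

session.close()
driver.close()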