I'm trying to read data from HBase and write to MongoDB. My code, in Scala, is as follows:
mongoConfig.set("mongo.output.uri", "mongodb://node1:57017/sampledb.sample")
mongoConfig.set("mongo.output.format","com.mongodb.hadoop.MongoOutputFormat")
val documents:RDD[Map[Object, BasicBSONObject]] = newRDD1.map(f => convert2BSON(f))
documents.saveAsNewAPIHadoopFile("file:///test",classOf[Object],classOf[BSONObject],
classOf[BSONFileOutputFormat[Object, BSONObject]],sparkConf)
But I got the following compile error:
error: value saveAsNewAPIHadoopFile is not a member of
org.apache.spark.rdd.RDD[Map[Object,org.bson.BasicBSONObject]]
I'm using Spark 1.5.2, mongo-hadoop-core 1.5.2, and mongo-java-driver 3.2.0.
My MongoDB version is 3.2.5.
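The "not a member" error is most likely because saveAsNewAPIHadoopFile is only added to RDDs of key/value pairs through the implicit PairRDDFunctions conversion, so it does not exist on an RDD[Map[Object, BasicBSONObject]]. Below is a minimal, untested sketch of what should compile, assuming a hypothetical convert2BSONTuple that returns an (Object, BSONObject) pair instead of a Map; note that the last argument needs to be a Hadoop Configuration rather than the SparkConf, and that MongoOutputFormat (not BSONFileOutputFormat) is the format that honours mongo.output.uri:
import org.apache.hadoop.conf.Configuration
import org.apache.spark.rdd.RDD
import org.bson.BSONObject
import com.mongodb.hadoop.MongoOutputFormat

val mongoConfig = new Configuration()
mongoConfig.set("mongo.output.uri", "mongodb://node1:57017/sampledb.sample")

// Emit (key, value) tuples so the implicit PairRDDFunctions conversion applies.
// convert2BSONTuple is hypothetical; it mirrors convert2BSON but returns a pair.
val documents: RDD[(Object, BSONObject)] = newRDD1.map(f => convert2BSONTuple(f))

documents.saveAsNewAPIHadoopFile(
  "file:///unused-output-path",                    // MongoOutputFormat writes to mongo.output.uri, not to this path
  classOf[Object],
  classOf[BSONObject],
  classOf[MongoOutputFormat[Object, BSONObject]],
  mongoConfig)                                     // a Hadoop Configuration, not the SparkConf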
Related
Using the latest MongoDB connector for Spark (v10) and trying to join two dataframes yields the following unhelpful error.
Py4JJavaError: An error occurred while calling o64.showString.
: java.lang.UnsupportedOperationException: Unspecialised MongoConfig. Use `mongoConfig.toReadConfig()` or `mongoConfig.toWriteConfig()` to specialize
at com.mongodb.spark.sql.connector.config.MongoConfig.getDatabaseName(MongoConfig.java:201)
at com.mongodb.spark.sql.connector.config.MongoConfig.getNamespace(MongoConfig.java:196)
at com.mongodb.spark.sql.connector.MongoTable.name(MongoTable.java:99)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation.name(DataSourceV2Relation.scala:66)
at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownFilters$1.$anonfun$applyOrElse$2(V2ScanRelationPushDown.scala:65)
The PySpark code simply pulls in two collections and runs a join:
dfa = spark.read.format("mongodb").option("uri", "mongodb://127.0.0.1/people.contacts").load()
dfb = spark.read.format("mongodb").option("uri", "mongodb://127.0.0.1/people.accounts").load()
dfa.join(dfb, 'PKey').count()
SQL gives the same error:
dfa.createOrReplaceTempView("usr")
dfb.createOrReplaceTempView("ast")
spark.sql("SELECT count(*) FROM ast JOIN usr on usr._id = ast._id").show()
Document structures are flat.
Have you tried using the latest version (10.0.2) of mongo-spark-connector? You can find it here.
I had a similar problem and solved it by replacing 10.0.1 with 10.0.2.
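For reference, here is the same read-and-join as a minimal Scala sketch against 10.0.2 (untested; the connection.uri/database/collection option keys are the v10 names, and the package coordinates are only an example of how the connector might be pulled in):
// Untested sketch, assuming mongo-spark-connector 10.0.2 is on the classpath,
// e.g. --packages org.mongodb.spark:mongo-spark-connector_2.12:10.0.2
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("mongo-join").getOrCreate()

def readCollection(coll: String) =
  spark.read.format("mongodb")
    .option("connection.uri", "mongodb://127.0.0.1")
    .option("database", "people")
    .option("collection", coll)
    .load()

val dfa = readCollection("contacts")
val dfb = readCollection("accounts")
println(dfa.join(dfb, "PKey").count())   // per the answers above, 10.0.2 should not hit the Unspecialised MongoConfig error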
I'm trying to import MongoDB data into Hive.
The JAR versions that I have used are:
ADD JAR /root/HDL/mongo-java-driver-3.4.2.jar;
ADD JAR /root/HDL/mongo-hadoop-hive-2.0.2.jar;
ADD JAR /root/HDL/mongo-hadoop-core-2.0.2.jar;
And my cluster versions are:
Ambari - Version 2.6.0.0, HDFS 2.7.3, Hive 1.2.1000, HBase 1.1.2, Tez 0.7.0
MongoDB server version: 3.6.5
Hive script:
CREATE TABLE sampletable
( ID STRING,
EmpID STRING,
BeginDate DATE,
EndDate DATE,
Time TIMESTAMP,
Type STRING,
Location STRING,
Terminal STRING)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"ID":"_id","EmpID":"emp_id","BeginDate":"begin_date","EndDate":"end_date","Time":"time","Type":"time_event_type","Location":"location","Terminal":"terminal"}')
TBLPROPERTIES('mongo.uri'='mongodb://username:password@10.10.170.43:27017/testdb.testtable');
Output:
hive> select * from sampletable;
OK
Failed with exception java.io.IOException:java.io.IOException: Failed to aggregate sample documents. Note that this Splitter implementation is incompatible with MongoDB versions prior to 3.2.
Please suggest how I can solve this.
Thanks,
Mohan V
Try setting the split size in the Hive session before running the query:
set mongo.input.split_size=50;
I am having trouble importing a SQL table into H2O.ai using the PostgreSQL JDBC driver on Ubuntu. I'm getting the following error:
ERROR MESSAGE:
SQLException: ERROR: relation "XXX" does not exist
Position: 22
Failed to connect and read from SQL database with connection_url: jdbc:postgresql://localhost:5432/...
I am executing H2O with the following command:
java -cp h2o.jar:/usr/share/java/postgresql-9.4.1212.jar water.H2OApp
The JDBC driver is installed, and I have already tried constructing the connection URL in several ways.
I'm using this one right now:
jdbc:postgresql://localhost:5432/XXX?&useSSL=false
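Not sure this is the cause, but a "relation does not exist" error from PostgreSQL is often a case-sensitivity or schema issue rather than a driver problem. A hypothetical Scala snippet (placeholder credentials and database name) that lists the schema-qualified tables the same JDBC driver can see:
import java.sql.DriverManager

// Hypothetical check: list the tables PostgreSQL actually exposes over JDBC.
// "user", "password" and the XXX database name are placeholders.
val url  = "jdbc:postgresql://localhost:5432/XXX?useSSL=false"
val conn = DriverManager.getConnection(url, "user", "password")
val rs = conn.createStatement().executeQuery(
  "SELECT table_schema, table_name FROM information_schema.tables WHERE table_type = 'BASE TABLE'")
while (rs.next()) println(rs.getString(1) + "." + rs.getString(2))
conn.close()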
I recently upgraded my Amazon PostgreSQL RDS to version 10.3, but while fetching the projections I am getting this error:
ERROR: transform_geom: couldn't parse proj4 output string: '3857': projection not named
CONTEXT: SQL function "st_transform" statement 1
I was able to fetch the same records on the previous version, 9.5.xx.
My PostGIS version is 2.4.2, which is compatible with the RDS instance.
I faced what is perhaps the same problem after upgrading from PostGIS 2.2 to 2.3: some of my queries did not work anymore.
Old-query:
SELECT ST_X(ST_TRANSFORM(ST_SETSRID(ST_MAKEPOINT($1,$2),$3),$4));
query-params $1...$4:
602628,6663367,3857,3857
error message:
"transform_geom: couldn't parse proj4 output string: '3857': projection not named"
Reason:
ST_TRANSFORM comes in multiple flavours; two of them are:
public.st_transform(geometry, integer)
public.st_transform(geometry, text)
The latter one, which I assume is new in PostGIS 2.3, caused my problem, because $4 (3857) was treated as a (proj4) string and not as an (SRID) integer.
Workaround in my case: a type hint for param $4:
SELECT ST_X(ST_TRANSFORM(ST_SETSRID(ST_MAKEPOINT($1,$2),$3),$4::int));
I am trying to access my existing Hadoop setup from my Spark + Scala project.
Spark Version 1.4.1
Hadoop 2.6
Hive 1.2.1
From the Hive console I am able to create a table and access it without any issue, and I can also see the same table from the Hadoop URL.
The problem is that when I try to create a table from the project, the system shows this error:
ERROR Driver: FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask.
MetaException(message:file:/user/hive/warehouse/src is not a directory
or unable to create one)
Following is the code I wrote:
Imports:
import org.apache.spark._
import org.apache.spark.sql.hive._
Code:
val sparkContext = new SparkContext("local[2]", "HiveTable")
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
hiveContext.setConf("hive.metastore.warehouse.dir", "hdfs://localhost:54310/user/hive/warehouse")
hiveContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
Edit:
Instead of the CREATE TABLE, what if I had to execute an INSERT statement like:
hiveContext.sql("INSERT INTO TABLE default.src SELECT 'username','password' FROM foo;")
Any help to resolve this issue would be highly appreciated.
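One common cause of the file:/user/hive/warehouse/... error is that Spark never picks up the cluster's Hadoop/Hive configuration, so the warehouse path is resolved against the local file system; copying hive-site.xml (and core-site.xml) into Spark's conf directory is usually the first thing to check. The following untested sketch forces the default filesystem and warehouse location (reusing the namenode address from the question) before anything touches the metastore:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Untested sketch: point Spark's Hadoop configuration at HDFS before the
// HiveContext is created, so /user/hive/warehouse does not resolve to file:/.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("HiveTable"))
sc.hadoopConfiguration.set("fs.defaultFS", "hdfs://localhost:54310")   // assumption: same namenode as in the question

val hiveContext = new HiveContext(sc)
hiveContext.setConf("hive.metastore.warehouse.dir", "hdfs://localhost:54310/user/hive/warehouse")
hiveContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")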