I'm trying to read data from HBase and write to MongoDB. My code, in Scala, is as follows:
mongoConfig.set("mongo.output.uri", "mongodb://node1:57017/sampledb.sample")
mongoConfig.set("mongo.output.format","com.mongodb.hadoop.MongoOutputFormat")
val documents:RDD[Map[Object, BasicBSONObject]] = newRDD1.map(f => convert2BSON(f))
documents.saveAsNewAPIHadoopFile("file:///test",classOf[Object],classOf[BSONObject],
classOf[BSONFileOutputFormat[Object, BSONObject]],sparkConf)
But I got the following compile error:
error: value saveAsNewAPIHadoopFile is not a member of
org.apache.spark.rdd.RDD[Map[Object,org.bson.BasicBSONObject]]
I'm using Spark 1.5.2, mongo-hadoop-core 1.5.2, and mongo-java-driver 3.2.0.
My MongoDB version is 3.2.5.
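The "not a member" error is most likely because saveAsNewAPIHadoopFile is only added to RDDs of key/value pairs through the implicit PairRDDFunctions conversion, so it does not exist on an RDD[Map[Object, BasicBSONObject]]. Below is a minimal, untested sketch of what should compile, assuming a hypothetical convert2BSONTuple that returns an (Object, BSONObject) pair instead of a Map; note that the last argument needs to be a Hadoop Configuration rather than the SparkConf, and that MongoOutputFormat (not BSONFileOutputFormat) is the format that honours mongo.output.uri:
import org.apache.hadoop.conf.Configuration
import org.apache.spark.rdd.RDD
import org.bson.BSONObject
import com.mongodb.hadoop.MongoOutputFormat

val mongoConfig = new Configuration()
mongoConfig.set("mongo.output.uri", "mongodb://node1:57017/sampledb.sample")

// Emit (key, value) tuples so the implicit PairRDDFunctions conversion applies.
// convert2BSONTuple is hypothetical; it mirrors convert2BSON but returns a pair.
val documents: RDD[(Object, BSONObject)] = newRDD1.map(f => convert2BSONTuple(f))

documents.saveAsNewAPIHadoopFile(
  "file:///unused-output-path",                    // MongoOutputFormat writes to mongo.output.uri, not to this path
  classOf[Object],
  classOf[BSONObject],
  classOf[MongoOutputFormat[Object, BSONObject]],
  mongoConfig)                                     // a Hadoop Configuration, not the SparkConf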
Related
Using the latest MongoDB connector for Spark (v10) and trying to join two dataframes yields the following unhelpful error.
Py4JJavaError: An error occurred while calling o64.showString.
: java.lang.UnsupportedOperationException: Unspecialised MongoConfig. Use `mongoConfig.toReadConfig()` or `mongoConfig.toWriteConfig()` to specialize
at com.mongodb.spark.sql.connector.config.MongoConfig.getDatabaseName(MongoConfig.java:201)
at com.mongodb.spark.sql.connector.config.MongoConfig.getNamespace(MongoConfig.java:196)
at com.mongodb.spark.sql.connector.MongoTable.name(MongoTable.java:99)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation.name(DataSourceV2Relation.scala:66)
at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownFilters$1.$anonfun$applyOrElse$2(V2ScanRelationPushDown.scala:65)
The PySpark code simply pulls in two collections and runs a join:
dfa = spark.read.format("mongodb").option("uri", "mongodb://127.0.0.1/people.contacts").load()
dfb = spark.read.format("mongodb").option("uri", "mongodb://127.0.0.1/people.accounts").load()
dfa.join(dfb, 'PKey').count()
SQL gives the same error:
dfa.createOrReplaceTempView("usr")
dfb.createOrReplaceTempView("ast")
spark.sql("SELECT count(*) FROM ast JOIN usr on usr._id = ast._id").show()
Document structures are flat.
Have you tried using the latest version (10.0.2) of mongo-spark-connector? You can find it here.
I had a similar problem and solved it by replacing 10.0.1 with 10.0.2.
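For reference, here is the same read-and-join as a minimal Scala sketch against 10.0.2 (untested; the connection.uri/database/collection option keys are the v10 names, and the package coordinates are only an example of how the connector might be pulled in):
// Untested sketch, assuming mongo-spark-connector 10.0.2 is on the classpath,
// e.g. --packages org.mongodb.spark:mongo-spark-connector_2.12:10.0.2
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("mongo-join").getOrCreate()

def readCollection(coll: String) =
  spark.read.format("mongodb")
    .option("connection.uri", "mongodb://127.0.0.1")
    .option("database", "people")
    .option("collection", coll)
    .load()

val dfa = readCollection("contacts")
val dfb = readCollection("accounts")
println(dfa.join(dfb, "PKey").count())   // per the answers above, 10.0.2 should not hit the Unspecialised MongoConfig error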
I'm trying to import MongoDB data into Hive.
The JAR versions that I have used are:
ADD JAR /root/HDL/mongo-java-driver-3.4.2.jar;
ADD JAR /root/HDL/mongo-hadoop-hive-2.0.2.jar;
ADD JAR /root/HDL/mongo-hadoop-core-2.0.2.jar;
And my cluster versions are:
Ambari - Version 2.6.0.0, HDFS 2.7.3, Hive 1.2.1000, HBase 1.1.2, Tez 0.7.0
MongoDB server version: 3.6.5
Hive script:
CREATE TABLE sampletable
( ID STRING,
EmpID STRING,
BeginDate DATE,
EndDate DATE,
Time TIMESTAMP,
Type STRING,
Location STRING,
Terminal STRING)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"ID":"_id","EmpID":"emp_id","BeginDate":"begin_date","EndDate":"end_date","Time":"time","Type":"time_event_type","Location":"location","Terminal":"terminal"}')
TBLPROPERTIES('mongo.uri'='mongodb://username:password@10.10.170.43:27017/testdb.testtable');
Output:
hive> select * from sampletable;
OK
Failed with exception java.io.IOException:java.io.IOException: Failed to aggregate sample documents. Note that this Splitter implementation is incompatible with MongoDB versions prior to 3.2.
Please suggest how I can solve this.
Thanks,
Mohan V
Try setting the split size in the Hive session before running the query:
set mongo.input.split_size=50;
I am having trouble importing a SQL table into H2O.ai using the PostgreSQL JDBC driver on Ubuntu. I'm getting the following error:
ERROR MESSAGE:
SQLException: ERROR: relation "XXX" does not exist
Position: 22
Failed to connect and read from SQL database with connection_url: jdbc:postgresql://localhost:5432/...
I am executing H2O with the following command:
java -cp h2o.jar:/usr/share/java/postgresql-9.4.1212.jar water.H2OApp
The JDBC driver is installed, and I have already tried constructing the connection URL in several ways.
I'm using this one right now:
jdbc:postgresql://localhost:5432/XXX?&useSSL=false
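Not sure this is the cause, but a "relation does not exist" error from PostgreSQL is often a case-sensitivity or schema issue rather than a driver problem. A hypothetical Scala snippet (placeholder credentials and database name) that lists the schema-qualified tables the same JDBC driver can see:
import java.sql.DriverManager

// Hypothetical check: list the tables PostgreSQL actually exposes over JDBC.
// "user", "password" and the XXX database name are placeholders.
val url  = "jdbc:postgresql://localhost:5432/XXX?useSSL=false"
val conn = DriverManager.getConnection(url, "user", "password")
val rs = conn.createStatement().executeQuery(
  "SELECT table_schema, table_name FROM information_schema.tables WHERE table_type = 'BASE TABLE'")
while (rs.next()) println(rs.getString(1) + "." + rs.getString(2))
conn.close()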
I recently upgraded my Amazon PostgreSQL RDS to version 10.3, but while fetching the projections I am getting this error:
ERROR: transform_geom: couldn't parse proj4 output string: '3857': projection not named
CONTEXT: SQL function "st_transform" statement 1
I was able to fetch the same records on the previous version, 9.5.xx.
My PostGIS version is 2.4.2, which is compatible with the RDS instance.
I faced what is perhaps the same problem after upgrading from PostGIS 2.2 to 2.3: some of my queries did not work anymore.
Old-query:
SELECT ST_X(ST_TRANSFORM(ST_SETSRID(ST_MAKEPOINT($1,$2),$3),$4));
query-params $1...$4:
602628,6663367,3857,3857
error message:
"transform_geom: couldn't parse proj4 output string: '3857': projection not named"
Reason:
ST_TRANSFORM comes in multiple flavours; two of them are:
public.st_transform(geometry, integer)
public.st_transform(geometry, text)
The latter one, which I assume is new in PostGIS 2.3, caused my problem, because $4 (3857) was treated as a (proj4) string and not as an (SRID) integer.
Workaround in my case: a type hint for param $4:
SELECT ST_X(ST_TRANSFORM(ST_SETSRID(ST_MAKEPOINT($1,$2),$3),$4::int));
I am trying to access my existing Hadoop setup from my Spark + Scala project.
Spark Version 1.4.1
Hadoop 2.6
Hive 1.2.1
From the Hive console I am able to create a table and access it without any issue, and I can also see the same table from the Hadoop URL.
The problem is that when I try to create a table from the project, the system shows this error:
ERROR Driver: FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask.
MetaException(message:file:/user/hive/warehouse/src is not a directory
or unable to create one)
Following is the code I wrote:
Imports:
import org.apache.spark._
import org.apache.spark.sql.hive._
Code:
val sparkContext = new SparkContext("local[2]", "HiveTable")
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
hiveContext.setConf("hive.metastore.warehouse.dir", "hdfs://localhost:54310/user/hive/warehouse")
hiveContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
Edit:
Instead of the CREATE TABLE, what if I had to execute an INSERT statement like:
hiveContext.sql("INSERT INTO TABLE default.src SELECT 'username','password' FROM foo;")
Any help to resolve this issue would be highly appreciated.
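One common cause of the file:/user/hive/warehouse/... error is that Spark never picks up the cluster's Hadoop/Hive configuration, so the warehouse path is resolved against the local file system; copying hive-site.xml (and core-site.xml) into Spark's conf directory is usually the first thing to check. The following untested sketch forces the default filesystem and warehouse location (reusing the namenode address from the question) before anything touches the metastore:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Untested sketch: point Spark's Hadoop configuration at HDFS before the
// HiveContext is created, so /user/hive/warehouse does not resolve to file:/.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("HiveTable"))
sc.hadoopConfiguration.set("fs.defaultFS", "hdfs://localhost:54310")   // assumption: same namenode as in the question

val hiveContext = new HiveContext(sc)
hiveContext.setConf("hive.metastore.warehouse.dir", "hdfs://localhost:54310/user/hive/warehouse")
hiveContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")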