How to create repository instance in JackRabbit Oak using MongoMK - mongodb

I am trying to create an Oak JCR repository to store content with "Apache Oak over MongoDB" (which I have absolutely no idea about).
Here's what I've been doing:
MongoClient connection = new MongoClient("127.0.0.1", 27017);
DB db = connection.getDB("test");
MongoMK.Builder m = new MongoMK.Builder();
MongoMK kernel = m.setMongoDB(db).open();
Repository repo = new Jcr().createRepository();
session = repo.login(); // Error javax.jcr.NoSuchWorkspaceException
I was trying to link the "Repository" to "MongoMK", which seems like a nightmare.
I have tried doing
Repository repo = new Jcr(kernel).createRepository(); // Error
I found something similar at [How to create repository instance in JackRabbit Oak using MicroKernel], but that didn't help either.
My question is: is there any way to link up MongoMK and a Repository?
P.S. I also tried using "NodeStore".

Yes, this was not well documented. The following should work:
import javax.jcr.Node;
import javax.jcr.Repository;
import javax.jcr.Session;

import org.apache.jackrabbit.oak.Oak;
import org.apache.jackrabbit.oak.jcr.Jcr;
import org.apache.jackrabbit.oak.plugins.document.DocumentMK;
import org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore;
import org.apache.jackrabbit.oak.spi.security.OpenSecurityProvider;

import com.mongodb.DB;
import com.mongodb.MongoClient;

public class Test {

    public static void main(String... args) throws Exception {
        DB db = new MongoClient("127.0.0.1", 27017).getDB("test2");
        DocumentNodeStore ns = new DocumentMK.Builder()
                .setMongoDB(db).getNodeStore();
        Repository repo = new Jcr(new Oak(ns))
                .with(new OpenSecurityProvider())
                .createRepository();

        Session session = repo.login();
        Node root = session.getRootNode();
        if (root.hasNode("hello")) {
            Node hello = root.getNode("hello");
            long count = hello.getProperty("count").getLong();
            hello.setProperty("count", count + 1);
            System.out.println("found the hello node, count = " + count);
        } else {
            System.out.println("creating the hello node");
            root.addNode("hello").setProperty("count", 1);
        }
        session.save();
        session.logout();
        ns.dispose();
    }
}
This is now also documented.
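Note that the anonymous repo.login() above only works because of the OpenSecurityProvider. A minimal sketch of logging in under Oak's regular security setup instead, assuming its default admin/admin account (an assumption; adjust to your configuration):
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;

// With the default security provider, pass explicit credentials instead of
// the anonymous login used above (admin/admin is assumed here):
Session session = repo.login(new SimpleCredentials("admin", "admin".toCharArray()));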

Related

Vert.x ConfigRetriever.listen does not get triggered when the config file changes

I have a simple verticle that I am using to test ConfigRetriever.listen for changes.
Using Vert.x version 4.3.4
import io.vertx.config.ConfigRetriever;
import io.vertx.config.ConfigRetrieverOptions;
import io.vertx.config.ConfigStoreOptions;
import io.vertx.core.AbstractVerticle;
import io.vertx.core.json.JsonObject;
public class MyVerticle extends AbstractVerticle {

    @Override
    public void start() {
        ConfigStoreOptions fileStore = new ConfigStoreOptions().setType("file")
                .setFormat("yaml")
                .setConfig(new JsonObject().put("path", "config.yaml"));

        ConfigRetrieverOptions options = new ConfigRetrieverOptions().setScanPeriod(1000)
                .addStore(fileStore);

        ConfigRetriever retriever = ConfigRetriever.create(vertx, options);
        retriever.listen(change -> {
            JsonObject previous = change.getPreviousConfiguration();
            System.out.println(previous);
            JsonObject changedConf = change.getNewConfiguration();
            System.out.println(changedConf);
        });
    }
}
[Edit] The config file is under src/main/resources.
When I run this, the previous configuration prints as empty and the new configuration prints the contents of my YAML file:
{}
{"bridgeservice":{"eb_address":"xyz","path":"/api/v1/aaa/","port":80}}
The problem is that when I change a value in the YAML config file, nothing happens. I expect the changes to be printed. When I run this in the debugger I see
Thread [vert.x-internal-blocking-0] (Running)
..
..
..
Thread [vert.x-internal-blocking-19] (Running)
When I put the following just before retriever.listen(), I get the "succeeded ..." line printed and nothing from the listen method, even after changing the config file values.
retriever.getConfig(ar -> {
    if (ar.succeeded()) {
        System.out.println("succeeded :" + ar.result());
    } else {
        ar.cause().printStackTrace();
    }
});
May be related to SO having-trouble-listen-vert-x-config-change
When I moved my config file from resources to a folder cfg at the same level as src, the verticle behaved as it should and picked up config changes. I don't know why; maybe it's an Eclipse environment thing.
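A likely explanation is that files under src/main/resources get copied into the build output, so the retriever ends up watching a copy rather than the file you edit. A minimal sketch of pointing the file store at the external cfg directory described above (the relative path is an assumption; it is resolved against the process working directory):
// Hypothetical sketch: watch the file in the external "cfg" folder
// instead of a classpath resource.
ConfigStoreOptions fileStore = new ConfigStoreOptions()
        .setType("file")
        .setFormat("yaml")
        .setConfig(new JsonObject().put("path", "cfg/config.yaml"));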

Unable to override lagom kafka parameters

I created a normal Java project and put all the dependencies of the Lagom Kafka client on the classpath, then put application.conf in the source folder.
Content of application.conf:
lagom.broker.kafka {
    service-name = ""
    brokers = "127.0.0.1:9092"
}
While running the application, service-name = "" should be used (so that my broker list is used directly rather than discovered via the service locator), but it was not working.
While debugging I found that in the KafkaConfig class, service-name comes out as "kafka_native".
I also found that the conf object passed in when KafkaConfig is created doesn't have my application.conf in its origin.
After this I tried overriding the values using VM parameters like this:
-Dlagom.broker.kafka.service-name=""
-Dlagom.broker.kafka.brokers="127.0.0.1:9092"
-Dakka.kafka.consumer.kafka-clients.auto.offset.reset="earliest"
and it worked.
Can somebody explain why overriding them in application.conf is not working?
This is how I am subscribing to the topic:
import java.net.URI;
import java.util.concurrent.CompletableFuture;
import com.ameyo.ticketing.ticket.api.TicketingService;
import com.ameyo.ticketing.ticket.api.events.TicketEvent;
import com.lightbend.lagom.javadsl.api.broker.Topic;
import com.lightbend.lagom.javadsl.client.integration.LagomClientFactory;
import com.typesafe.config.ConfigFactory;
import akka.Done;
import akka.stream.javadsl.Flow;
public class Main {

    public static void main(String[] args) {
        String brokers = ConfigFactory.load().getString("lagom.broker.kafka.brokers");
        System.out.println("Initial Value for Brokers " + brokers);

        LagomClientFactory clientFactory = LagomClientFactory.create("legacy-system", Main.class.getClassLoader());
        TicketingService ticketingService = clientFactory.createClient(TicketingService.class,
                URI.create("http://localhost:11000"));

        Topic<TicketEvent> ticketEvents = ticketingService.ticketEvents();
        ticketEvents.subscribe().withGroupId("nya13").atLeastOnce(Flow.<TicketEvent> create().mapAsync(1, e -> {
            System.out.println("got an event"); // was "kuch to aaya"
            return CompletableFuture.completedFuture(Done.getInstance());
        }));

        try {
            Thread.sleep(1000000000);
        } catch (InterruptedException e1) {
        }
    }
}
Change configuration to
akka {
    lagom.broker.kafka {
        service-name = ""
        brokers = "127.0.0.1:9092"
    }
}
and it worked
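To confirm which values are actually loaded at runtime, a minimal sketch using the Typesafe Config API (the akka.lagom.broker.kafka path simply mirrors the nested block above; adjust it if your configuration differs):
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

public class ConfigCheck {
    public static void main(String[] args) {
        Config config = ConfigFactory.load();
        // Print the broker list and service name the Lagom Kafka client should pick up
        System.out.println(config.getString("akka.lagom.broker.kafka.brokers"));
        System.out.println(config.getString("akka.lagom.broker.kafka.service-name"));
    }
}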

How to run Hadoop jobs in Amazon EMR using Eclipse?

I have followed the tutorial given by Amazon here, but it seems that my code fails to run.
The error that I got:
Exception in thread "main" java.lang.Error: Unresolved compilation problem:
The method withJobFlowRole(String) is undefined for the type AddJobFlowStepsRequest
at main.main(main.java:38)
My full code:
import java.io.IOException;
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.PropertiesCredentials;
import com.amazonaws.services.elasticmapreduce.*;
import com.amazonaws.services.elasticmapreduce.model.AddJobFlowStepsRequest;
import com.amazonaws.services.elasticmapreduce.model.AddJobFlowStepsResult;
import com.amazonaws.services.elasticmapreduce.model.HadoopJarStepConfig;
import com.amazonaws.services.elasticmapreduce.model.StepConfig;
import com.amazonaws.services.elasticmapreduce.util.StepFactory;
public class main {

    public static void main(String[] args) {
        AWSCredentials credentials = null;
        try {
            credentials = new PropertiesCredentials(
                    main.class.getResourceAsStream("AwsCredentials.properties"));
        } catch (IOException e1) {
            System.out.println("Credentials were not properly entered into AwsCredentials.properties.");
            System.out.println(e1.getMessage());
            System.exit(-1);
        }

        AmazonElasticMapReduce client = new AmazonElasticMapReduceClient(credentials);

        // predefined steps. See StepFactory for list of predefined steps
        StepConfig hive = new StepConfig("Hive", new StepFactory().newInstallHiveStep());

        // A custom step
        HadoopJarStepConfig hadoopConfig1 = new HadoopJarStepConfig()
                .withJar("s3://mywordcountbuckett/binary/WordCount.jar")
                .withMainClass("com.my.Main1") // optional main class, this can be omitted if jar above has a manifest
                .withArgs("--verbose"); // optional list of arguments
        StepConfig customStep = new StepConfig("Step1", hadoopConfig1);

        AddJobFlowStepsResult result = client.addJobFlowSteps(new AddJobFlowStepsRequest()
                .withJobFlowRole("jobflow_role")
                .withServiceRole("service_role")
                .withSteps(hive, customStep));

        System.out.println(result.getStepIds());
    }
}
What could be the reason that the code is not running?
Are there any tutorials based on the latest version ?

What is the most efficient way of moving data out of Hive and into MongoDB?

Is there an elegant, easy and fast way to move data out of Hive into MongoDB?
You can do the export with the Hadoop-MongoDB connector. Just run the Hive query in your job's main method. The output will then be used by the Mapper to insert the data into MongoDB.
Example:
Here I'm inserting a semicolon-separated text file (id;firstname;lastname) into a MongoDB collection using a simple Hive query:
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import com.mongodb.hadoop.MongoOutputFormat;
import com.mongodb.hadoop.io.BSONWritable;
import com.mongodb.hadoop.util.MongoConfigUtil;
public class HiveToMongo extends Configured implements Tool {

    private static class HiveToMongoMapper extends
            Mapper<LongWritable, Text, IntWritable, BSONWritable> {

        // See: https://issues.apache.org/jira/browse/HIVE-634
        private static final String HIVE_EXPORT_DELIMETER = '\001' + "";
        private IntWritable k = new IntWritable();
        private BSONWritable v = null;

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] split = value.toString().split(HIVE_EXPORT_DELIMETER);
            k.set(Integer.parseInt(split[0]));
            v = new BSONWritable();
            v.put("firstname", split[1]);
            v.put("lastname", split[2]);
            context.write(k, v);
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        }
        catch (ClassNotFoundException e) {
            System.out.println("Unable to load Hive Driver");
            System.exit(1);
        }

        try {
            Connection con = DriverManager.getConnection(
                    "jdbc:hive://localhost:10000/default");
            Statement stmt = con.createStatement();
            String sql = "INSERT OVERWRITE DIRECTORY " +
                    "'hdfs://localhost:8020/user/hive/tmp' select * from users";
            stmt.executeQuery(sql);
        }
        catch (SQLException e) {
            System.exit(1);
        }

        int res = ToolRunner.run(new Configuration(), new HiveToMongo(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        Path inputPath = new Path("/user/hive/tmp");
        String mongoDbPath = "mongodb://127.0.0.1:6900/mongo_users.mycoll";
        MongoConfigUtil.setOutputURI(conf, mongoDbPath);
        /*
        Add dependencies to distributed cache via
        DistributedCache.addFileToClassPath(...) :
        - mongo-hadoop-core-x.x.x.jar
        - mongo-java-driver-x.x.x.jar
        - hive-jdbc-x.x.x.jar
        HadoopUtils is an own utility class
        */
        HadoopUtils.addDependenciesToDistributedCache("/libs/mongodb", conf);
        HadoopUtils.addDependenciesToDistributedCache("/libs/hive", conf);

        Job job = new Job(conf, "HiveToMongo");

        FileInputFormat.setInputPaths(job, inputPath);
        job.setJarByClass(HiveToMongo.class);
        job.setMapperClass(HiveToMongoMapper.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(MongoOutputFormat.class);
        // Match the mapper's output types; with zero reducers the map output
        // goes straight to the output format
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(BSONWritable.class);
        job.setNumReduceTasks(0);

        job.submit();
        System.out.println("Job submitted.");
        return 0;
    }
}
One drawback is that a 'staging area' (/user/hive/tmp) is needed to store the intermediate Hive output. Furthermore, as far as I know, the Mongo-Hadoop connector doesn't support upserts.
I'm not quite sure, but you can also try to fetch the data from Hive without running hiveserver (which exposes a Thrift service), so you can probably save some overhead. Look at the source code of Hive's org.apache.hadoop.hive.cli.CliDriver#processLine(String line, boolean allowInterrupting) method, which actually executes the query. Then you can hack together something like this:
...
LogUtils.initHiveLog4j();
CliSessionState ss = new CliSessionState(new HiveConf(SessionState.class));
ss.in = System.in;
ss.out = new PrintStream(System.out, true, "UTF-8");
ss.err = new PrintStream(System.err, true, "UTF-8");
SessionState.start(ss);
Driver qp = new Driver();
processLocalCmd("SELECT * from users", qp, ss); //taken from CliDriver
...
Side notes:
There's also a hive-mongo connector implementation you might want to check.
It's also worth having a look at the implementation of the Hive-HBase connector to get some idea if you want to implement a similar one for MongoDB.
Have you looked into Sqoop? It's supposed to make it very simple to move data between Hadoop and SQL/NoSQL databases. This article also gives an example of using it with Hive.
Take a look at the Hadoop-MongoDB connector project:
http://api.mongodb.org/hadoop/MongoDB%2BHadoop+Connector.html
"This connectivity takes the form of allowing both reading MongoDB data into Hadoop (for use in MapReduce jobs as well as other components of the Hadoop ecosystem), as well as writing the results of Hadoop jobs out to MongoDB."
Not sure if it will work for your use case, but it's worth looking at.

Check if mongodb database exists?

Is there a possibility to check if a mongo database already exists?
Yes, you can get the list of existing databases. From the Java driver you could do something like this to get the database names on a mongod server running on localhost
Mongo mongo = new Mongo( "127.0.0.1", 27017 );
List<String> databaseNames = mongo.getDatabaseNames();
This is equivalent to the mongo shell "show dbs" command. I am sure similar methods exist in all of the drivers.
From the shell, if you want to explicitly check that a DB exists:
db.getMongo().getDBNames().indexOf("mydb");
This will return -1 if "mydb" does not exist.
To use this from the shell:
if [ $(mongo localhost:27017 --eval 'db.getMongo().getDBNames().indexOf("mydb")' --quiet) -lt 0 ]; then
echo "mydb does not exist"
else
echo "mydb exists"
fi
For anyone who comes here because the method getDatabaseNames() is deprecated / not available, here is the new way to get the list of existing databases:
MongoClient mongoClient = new MongoClient();
MongoCursor<String> dbsCursor = mongoClient.listDatabaseNames().iterator();
while (dbsCursor.hasNext()) {
    System.out.println(dbsCursor.next());
}
Here is a method that validates if the database is found:
public Boolean databaseFound(String databaseName) {
    MongoClient mongoClient = new MongoClient(); // Maybe replace it with an already existing client
    MongoCursor<String> dbsCursor = mongoClient.listDatabaseNames().iterator();
    while (dbsCursor.hasNext()) {
        if (dbsCursor.next().equals(databaseName))
            return true;
    }
    return false;
}
In Python, using PyMongo:
from pymongo import MongoClient

db_name = "foo"
conn = MongoClient('mongodb://localhost,localhost:27017')
db = conn[db_name]
if db_name in conn.database_names():
    # the database exists; work with db here (e.g. drop a collection)
    pass
Using the MongoDB C# driver 2.4:
private bool DatabaseExists(string database)
{
    // _client is IMongoClient
    var dbList = _client.ListDatabases().ToList().Select(db => db.GetValue("name").AsString);
    return dbList.Contains(database);
}
usage:
if (!DatabaseExists("FooDb"))
{
    // create and seed db
}
I'd like to add a C# version. I'm using the MongoDB.Driver 2.2.2.
static bool DatabaseExists(string connectionString)
{
    var mongoUri = new MongoUrl(connectionString);
    var client = new MongoClient(mongoUri);
    var dbList = Enumerate(client.ListDatabases()).Select(db => db.GetValue("name").AsString);
    return dbList.Contains(mongoUri.DatabaseName);
}

static IEnumerable<BsonDocument> Enumerate(IAsyncCursor<BsonDocument> docs)
{
    while (docs.MoveNext())
    {
        foreach (var item in docs.Current)
        {
            yield return item;
        }
    }
}
Try this, it worked for me (on macOS):
MongoClient mongoClient = new MongoClient("localhost");
boolean dbExist = mongoClient.listDatabaseNames()
        .into(new ArrayList<String>()).contains("TEST");
System.out.print(dbExist);
The PyMongo example above didn't work for me, so I rewrote it using the more standard list_databases() method of the MongoClient library:
from pymongo import MongoClient

db_name = "foo"
conn = MongoClient('mongodb://localhost,localhost:27017')
# list_databases() yields documents, so compare against their "name" field
if db_name in (d["name"] for d in conn.list_databases()):
    print(True)   # or return True here
else:
    print(False)  # or return False here
In my case, I could not use listDatabaseNames, because my user did not have the rights to call this function. Instead, I just assume that it exists and call a method on this database, which will fail if it does not exist or if rights are missing.
Demo C# code:
/// <summary>
/// Tests the connection to the MongoDB server, and if the database already exists.
/// If not or an error is detected, an exception is thrown.
/// </summary>
public static void TestConnection(string mongoUrl, string mongoDatabase) {
    var client = new MongoClient(mongoUrl);
    var database = client.GetDatabase(mongoDatabase);
    try {
        // Try to perform an action on this database; will fail if it does not exist
        database.ListCollections();
    }
    catch {
        throw new Exception("Connection established, " +
            "but database does not exist (or missing rights): " + mongoDatabase);
    }
}
I searched for how to list database names in Go and accidentally found this page. It looks like no one has shown how to check whether a specific database name exists.
Assuming that you are using the MongoDB Go Driver, here is an approach:
// client type is *mongo.Client
dbNames, err := client.ListDatabaseNames(context.Background(), bson.D{{Key: "name", Value: "YOUR-DB-NAME"}})
"dbNames" contains list of all databases in mongo server.
Second parameter is a filter, in this case, it only allows database with name == "YOUR-DB-NAME". Thus, dbNames will be empty if "YOUR-DB-NAME" is not exist.