How can I store Spark Streaming data in MongoDB?
In Java this is done like this:
data.foreachRDD(
new Function<JavaRDD<String>, Void>() {
Mongo mongo = new Mongo("localhost", 27017);
DB db = mongo.getDB("mongodb");
DBCollection collection = db.getCollection("fb");
public Void call(JavaRDD<String> data) throws Exception {
if(data!=null){
List<String>result=data.collect();
for (String temp :result) {
System.out.println(temp);
DBObject dbObject = (DBObject)JSON.parse(temp.toString());
collection.insert(dbObject);
}
System.out.println("Inserted Data Done");
} else {
System.out.println("Got no data in this window");
}
return null;
}
}
);
I want to store the data in MongoDB in the same way, but from Scala. The code above is Java.
//remove if not needed
import scala.collection.JavaConversions._
data.foreachRDD(new Function[JavaRDD[String], Void]() {
var mongo: Mongo = new Mongo("localhost", 27017)
var db: DB = mongo.getDB("mongodb")
var collection: DBCollection = db.getCollection("fb")
def call(data: JavaRDD[String]): Void = {
if (data != null) {
val result = data.collect()
for (temp <- result) {
println(temp)
val dbObject = JSON.parse(temp.toString).asInstanceOf[DBObject]
collection.insert(dbObject)
}
println("Inserted Data Done")
} else {
println("Got no data in this window")
}
null
}
})
I might be asking for trouble with this setup but I am trying to get MongoDB with MongoJack to work with Kotlin classes.
I've managed to overcome a few problems already, and I outline my solutions below in case others hit the same issues, but I am still struggling with one.
Kotlin class:
data class TestDocument (
@ObjectId @JsonProperty("_id") var id:String? = null,
@JsonProperty("name") var name:String,
var date: Date = Date(),
var decimal: BigDecimal = BigDecimal.valueOf(123.456)
)
If I use the default setup (using the ObjectMapper mongoJack configured) like this:
object MongoClientTestDefault {
val log: Log = Logging.getLog(MongoClientTest::class.java)
val mongoClient : MongoClient
val database: MongoDatabase
val testCol: MongoCollection<TestDocument>
init {
val mongoUrl = "mongodb://localhost:27017/"
log.info("Initialising MongoDB Client with $mongoUrl")
try {
mongoClient = MongoClients.create(mongoUrl)
database = mongoClient.getDatabase("unitTest")
testCol = JacksonMongoCollection.builder()
.build(
mongoClient,
"unitTest",
"testCol",
TestDocument::class.java,
UuidRepresentation.STANDARD
)
log.info("Successfully initialised MongoDB client")
}catch(e: Exception){
log.error(Strings.EXCEPTION,e)
log.error("Mongo URL: ", mongoUrl)
throw e //we need to throw this as otherwise it complains that the val fields haven't been initialised
}
}
}
Then serialization (persisting to Mongo) works, but deserialization from Mongo into the Kotlin class fails: without the Jackson Kotlin module, Jackson and MongoJack apparently don't know how to handle the Kotlin constructor.
To fix that, you have to create the mapper with jacksonObjectMapper() and pass it to the builder:
testCol = JacksonMongoCollection.builder()
.withObjectMapper(jacksonObjectMapper)
.build(
mongoClient,
"unitTest",
"testCol",
TestDocument::class.java,
UuidRepresentation.STANDARD
)
That solves the deserialization, but now the Dates cause problems.
To fix that, I do the following:
object MongoClientTest {
val log: Log = Logging.getLog(MongoClientTest::class.java)
val mongoClient : MongoClient
val database: MongoDatabase
val testCol: MongoCollection<TestDocument>
var jacksonObjectMapper = jacksonObjectMapper()
init {
val mongoUrl = "mongodb://localhost:27017/"
MongoJackModule.configure(jacksonObjectMapper)
log.info("Initialising MongoDB Client with $mongoUrl")
try {
mongoClient = MongoClients.create(mongoUrl)
database = mongoClient.getDatabase("unitTest")
testCol = JacksonMongoCollection.builder()
.withObjectMapper(jacksonObjectMapper)
.build(
mongoClient,
"unitTest",
"testCol",
TestDocument::class.java,
UuidRepresentation.STANDARD
)
log.info("Successfully initialised MongoDB client")
}catch(e: Exception){
log.error(Strings.EXCEPTION,e)
log.error("Mongo URL: ", mongoUrl)
throw e //we need to throw this as otherwise it complains that the val fields haven't been initialised
}
}
}
I take the jacksonObjectMapper and call MongoJackModule.configure(jacksonObjectMapper) and then pass it to the builder.
Now serialization (persisting to Mongo) and deserialization (loading from Mongo and converting to a POJO) both work, but one exception is thrown (which seems to be handled).
var testDoc = TestDocument(name="TestName")
MongoClientTest.testCol.drop()
var insertOneResult = MongoClientTest.testCol.insertOne(testDoc)
This call works and it creates the document in Mongo but this exception is printed:
java.lang.UnsupportedOperationException: Cannot call setValue() on constructor parameter of com.acme.MongoJackTest$TestDocument
at com.fasterxml.jackson.databind.introspect.AnnotatedParameter.setValue(AnnotatedParameter.java:118)
at org.mongojack.internal.stream.JacksonCodec.lambda$getIdWriter$4(JacksonCodec.java:120)
at org.mongojack.internal.stream.JacksonCodec.generateIdIfAbsentFromDocument(JacksonCodec.java:77)
at com.mongodb.internal.operation.Operations.bulkWrite(Operations.java:450)
at com.mongodb.internal.operation.Operations.insertOne(Operations.java:375)
at com.mongodb.internal.operation.SyncOperations.insertOne(SyncOperations.java:177)
at com.mongodb.client.internal.MongoCollectionImpl.executeInsertOne(MongoCollectionImpl.java:476)
at com.mongodb.client.internal.MongoCollectionImpl.insertOne(MongoCollectionImpl.java:459)
at com.mongodb.client.internal.MongoCollectionImpl.insertOne(MongoCollectionImpl.java:453)
at org.mongojack.MongoCollectionDecorator.insertOne(MongoCollectionDecorator.java:528)
It is trying to create and set the _id on the TestDocument but struggles with the primary constructor of the Kotlin class.
I've tried to add a 'default aka empty' constructor:
data class TestDocument (
@ObjectId @JsonProperty("_id") var id:String? = null,
@JsonProperty("name") var name:String,
var date: Date = Date(),
var decimal: BigDecimal = BigDecimal.valueOf(123.456)
){
@JsonCreator constructor(): this(name = "")
}
But that doesn't fix it.
Any ideas how I could fix this?
This is the code in MongoJack that is catching the exception:
private Consumer<BsonObjectId> getIdWriter(final T t) {
final Optional<BeanPropertyDefinition> maybeBpd = getIdElementSerializationDescription(t.getClass());
return maybeBpd.<Consumer<BsonObjectId>>map(beanPropertyDefinition -> (bsonObjectId) -> {
try {
if (bsonObjectId != null) {
beanPropertyDefinition.getMutator().setValue(
t,
extractIdValue(bsonObjectId, beanPropertyDefinition.getRawPrimaryType())
);
}
} catch (Exception e) {
e.printStackTrace();
}
}).orElseGet(() -> (bsonObjectId) -> {
});
}
I guess the exception could just be ignored, but it would flood the logs.
I have an async web API with services and a connection to the DB through an EF context.
In one of the services, where I inject the context via DI, I want to write a simple transactional method, like:
public async Task<int> ServiceMethod()
{
using (var transaction = context.Database.BeginTransaction())
{
await context.Table1.AddAsync(new Table1());
await context.SaveChangesAsync();
CallOtherService(context);
await context.SaveChangesAsync();
transaction.Commit();
}
}
CallOtherService(Context context)
{
await context.Table2.AddAsync();
}
I have now mocked CallOtherService() to throw an exception, and I expect nothing to be committed, but the changes to Table1 persist despite the transaction that should prevent that. I have tried calling Rollback() in a try/catch block and using a TransactionScope instead, but neither helped. I have also noticed that after calling BeginTransaction(), context.Database.CurrentTransaction is null, which seems odd.
EDIT:
the method that I test:
public async Task<int> AddMatchAsync(Models.Database.Match match)
{
//If match is null, nothing can be done
if (match == null)
return 0;
else
{
using (var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
{
try
{
//If the match already exists, return its id
var dbId = await ExistsMatchAsync(match.HomeTeam.Name, match.AwayTeam.Name, match.Time);
if (dbId > 0)
{
log.Info($"Found match with id {dbId}");
return dbId;
}
//add computable data
match = await AttachMatchName(match);
match.Season = CommonServices.toSeason(match);
// Prepare the relational entities accordingly
match = PrepareTeamsForAddMatch(match).Result;
match = PrepareLeagueForAddMatch(match).Result;
//add the match and save it to the DB
await context.Matches.AddAsync(match);
await context.SaveChangesAsync();
log.Info($"Successfully added a new match! id: {match.Id}");
// Create new TeamLeagues entities if needed
leaguesService.CreateTeamLeaguesForNewMatchAsync(context, match);
await context.SaveChangesAsync();
scope.Complete();
return match.Id;
}
catch (Exception ex)
{
log.Error($"Error adding match - Rolling back: {ex}");
throw;
}
}
}
}
and the test:
[Test]
public void AddMatchSaveChangesTransaction()
{
//Arrange
var exceptiontext = "This exception should prevent the match from being saved";
var leagueServiceMock = new Mock<ILeaguesService>();
leagueServiceMock.Setup(p => p.CreateTeamLeaguesForNewMatchAsync(It.IsAny<InFormCoreContext>(),It.IsAny<Match>()))
.Throws(new Exception(exceptiontext));
leagueServiceMock.Setup(p => p.GetLeagueAsync(It.IsAny<string>()))
.ReturnsAsync((string name) => new League{Id = 1, Name = name });
var match = new Match
{
HomeTeam = new Team { Name = "Liverpool" },
AwayTeam = new Team { Name = "Everton" },
League = new League { Name = "PL", IsSummer = true },
Time = DateTime.UtcNow,
HomeGoals = 3,
AwayGoals = 0
};
var mcount = context.Matches.Count();
//Act
var matchService = new MatchesService(context, leagueServiceMock.Object, new LogNLog(), TeamsService);
//Assert
var ex = Assert.ThrowsAsync<Exception>(async () => await matchService.AddMatchAsync(match));
Assert.AreEqual(ex.Message, exceptiontext);
Assert.AreEqual(mcount, context.Matches.Count(), "The match has been added - the transaction does not work");
}
I have the following method to update a document in MongoDB:
public async Task UpdateAsync(T entity)
{
await _collection.ReplaceOneAsync(filter => filter.Id == entity.Id, entity);
}
Which works fine - I was just wondering if anybody has an example of how the UpdateManyAsync function works:
public async Task UpdateManyAsync(IEnumerable<T> entities)
{
await _collection.UpdateManyAsync(); // What are the parameters here
}
Any advice is appreciated!
UpdateManyAsync works the same way as update with multi: true in the Mongo shell: you specify a filter condition and an update operation, and the update is applied to every matching document. For instance, to increment the field a in all documents where a is greater than 10, you can use this method:
var builder = Builders<SampleClass>.Update;
await myCollection.UpdateManyAsync(x => x.a > 10, builder.Inc(x => x.a, 1));
I guess you'd like to replace multiple documents. That can be achieved with the bulkWrite method. If you need a generic method in C#, you can introduce a marker interface to build the filter part of the replace operation:
public interface IMongoIdentity
{
ObjectId Id { get; set; }
}
Then you can add a generic constraint to your class and use BulkWrite in .NET as below:
class YourRepository<T> where T : IMongoIdentity
{
IMongoCollection<T> collection;
public async Task UpdateManyAsync(IEnumerable<T> entities)
{
var updates = new List<WriteModel<T>>();
var filterBuilder = Builders<T>.Filter;
foreach (var doc in entities)
{
var filter = filterBuilder.Where(x => x.Id == doc.Id);
updates.Add(new ReplaceOneModel<T>(filter, doc));
}
await collection.BulkWriteAsync(updates);
}
}
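To make the difference between the two approaches concrete, here is a minimal in-memory sketch in Java. It does not use the MongoDB driver: the map standing in for a collection, the Doc class, and the helper names updateMany and replaceEach are all hypothetical, chosen only to mirror the shape of UpdateManyAsync (one filter and one update for many documents) and of a bulk write of ReplaceOneModel entries (one id filter per document).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

final class UpdateVsReplaceSketch {
    // Hypothetical document: an id plus a single numeric field "a".
    static final class Doc {
        final int id;
        final int a;
        Doc(int id, int a) { this.id = id; this.a = a; }
    }

    // Shape of an update-many: ONE filter and ONE update, applied to every match.
    static int updateMany(Map<Integer, Doc> collection, IntPredicate filterOnA, IntUnaryOperator updateA) {
        int modified = 0;
        for (Map.Entry<Integer, Doc> entry : collection.entrySet()) {
            Doc doc = entry.getValue();
            if (filterOnA.test(doc.a)) {
                entry.setValue(new Doc(doc.id, updateA.applyAsInt(doc.a)));
                modified++;
            }
        }
        return modified;
    }

    // Shape of a bulk replace: each entity is matched by its own id and swapped in whole.
    static int replaceEach(Map<Integer, Doc> collection, List<Doc> entities) {
        int replaced = 0;
        for (Doc entity : entities) {
            // Map.replace writes only when the key already exists (no upsert).
            if (collection.replace(entity.id, entity) != null) {
                replaced++;
            }
        }
        return replaced;
    }

    // Three sample documents with a = 5, 11 and 20.
    static Map<Integer, Doc> sampleCollection() {
        Map<Integer, Doc> collection = new HashMap<>();
        collection.put(1, new Doc(1, 5));
        collection.put(2, new Doc(2, 11));
        collection.put(3, new Doc(3, 20));
        return collection;
    }

    public static void main(String[] args) {
        Map<Integer, Doc> collection = sampleCollection();
        // Increment "a" wherever a > 10: documents 2 and 3 match.
        System.out.println(updateMany(collection, a -> a > 10, a -> a + 1)); // prints 2
        // Replace by id: id 1 exists, id 99 does not.
        System.out.println(replaceEach(collection, Arrays.asList(new Doc(1, 100), new Doc(99, 0)))); // prints 1
    }
}
```

The point of the sketch is only that an update-many cannot express "replace each of these entities with its new version", because every entity needs its own filter; that is why the answer above switches to a bulk write.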
Following up on @mickl's answer: you cannot use x => x.Id when T is an unconstrained generic type.
Use reflection instead, as below:
public async Task<string> UpdateManyAsync(IEnumerable<T> entities)
{
var updates = new List<WriteModel<T>>();
var filterBuilder = Builders<T>.Filter;
foreach (var doc in entities)
{
foreach (PropertyInfo prop in typeof(T).GetProperties())
{
if (prop.Name == "Id")
{
var filter = filterBuilder.Eq(prop.Name, prop.GetValue(doc));
updates.Add(new ReplaceOneModel<T>(filter, doc));
break;
}
}
}
BulkWriteResult result = await _collection.BulkWriteAsync(updates);
return result.ModifiedCount.ToString();
}
Or find the id property by its BsonId attribute:
public async Task UpdateManyAsync(IEnumerable<TEntity> objs, CancellationToken cancellationToken = default)
{
var updates = new List<WriteModel<TEntity>>();
var filterBuilder = Builders<TEntity>.Filter;
foreach (var obj in objs)
{
foreach (var prop in typeof(TEntity).GetProperties())
{
object[] attrs = prop.GetCustomAttributes(true);
foreach (object attr in attrs)
{
var bsonId = attr as BsonIdAttribute;
if (bsonId != null)
{
var filter = filterBuilder.Eq(prop.Name, prop.GetValue(obj));
updates.Add(new ReplaceOneModel<TEntity>(filter, obj));
break;
}
}
}
}
await _dbCollection.BulkWriteAsync(updates, null, cancellationToken);
}
I want to save documents to a collection. My method is below, but nothing gets saved.
internal static void InitializeDb()
{
var db = GetConnection();
var collection = db.GetCollection<BsonDocument>("locations");
var locations = new List<BsonDocument>();
var json = JObject.Parse(File.ReadAllText(@"..\..\test_files\TestData.json"));
foreach (var d in json["locations"])
{
using (var jsonReader = new JsonReader(d.ToString()))
{
var context = BsonDeserializationContext.CreateRoot(jsonReader);
var document = collection.DocumentSerializer.Deserialize(context);
locations.Add(document);
}
}
collection.InsertManyAsync(locations);
}
If I make the method async and await the insert, it completes too late; I need this to finish first and only then test the data.
For future reference: calling Wait() on the task returned by the async method makes the call behave synchronously.
internal static void InitializeDb()
{
var db = GetConnection();
var collection = db.GetCollection<BsonDocument>("locations");
var locations = new List<BsonDocument>();
var json = JObject.Parse(File.ReadAllText(@"..\..\test_files\TestData.json"));
foreach (var d in json["locations"])
{
using (var jsonReader = new JsonReader(d.ToString()))
{
var context = BsonDeserializationContext.CreateRoot(jsonReader);
var document = collection.DocumentSerializer.Deserialize(context);
locations.Add(document);
}
}
collection.InsertManyAsync(locations).Wait();
}
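The same idea can be sketched outside .NET. The Java example below is only an analogy, not the MongoDB driver: insertManyAsync and the in-memory STORE are hypothetical stand-ins for the async insert and the collection, and CompletableFuture.join() plays the role that Wait() plays on a .NET Task. It shows why the seeding method must block before the test reads the data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

final class BlockingSeedSketch {
    // Hypothetical in-memory stand-in for the "locations" collection.
    static final List<String> STORE = new CopyOnWriteArrayList<>();

    // Hypothetical async insert: the work runs on another thread and the call returns at once.
    static CompletableFuture<Void> insertManyAsync(List<String> docs) {
        return CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(50); // simulate I/O latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            STORE.addAll(docs);
        });
    }

    // Seeds the store and returns only after the insert has finished.
    static int initializeDb(List<String> docs) {
        STORE.clear();
        // join() blocks the caller until the future completes, like Wait() on a Task.
        insertManyAsync(docs).join();
        return STORE.size();
    }

    public static void main(String[] args) {
        System.out.println(initializeDb(Arrays.asList("a", "b", "c"))); // prints 3
    }
}
```

Without the join() call, initializeDb could return before the background thread has added anything, which is the "not saving at all" symptom described in the question.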
I am wondering how it is possible to add an OR condition to the Envers criteria api:
public IEnumerable<Guid> GetHistory(object id, params string[] props)
{
var auditQuery = AuditReaderFactory.Get(Session).CreateQuery()
.ForRevisionsOfEntity(typeof(T), false, true);
foreach (var prop in props)
{
auditQuery.Add(AuditEntity.RelatedId(prop).Eq(id)); // <-- adds AND, while OR is required!
}
return auditQuery
.GetResultList<object[]>()
.Select(i => ((T)i[0]).ID)
.Distinct();
}
Use AuditEntity.Disjunction().
In your example, something like...
[..]
var disjunction = AuditEntity.Disjunction();
foreach (var prop in props)
{
disjunction.Add(AuditEntity.RelatedId(prop).Eq(id));
}
auditQuery.Add(disjunction);
[..]
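Why the disjunction changes the result can be seen in a small Java model. This is not the Envers API: criteria are modelled as plain java.util.function.Predicate values over a map, and relatedIdEq, conjunction, disjunction and the "owner"/"reviewer" property names are hypothetical. Adding criteria one by one corresponds to conjunction here; collecting them first corresponds to disjunction.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;

final class DisjunctionSketch {
    // Hypothetical criterion: the row's related id stored under `prop` equals `id`.
    static Predicate<Map<String, Integer>> relatedIdEq(String prop, Integer id) {
        return row -> Objects.equals(row.get(prop), id);
    }

    // Criteria added one by one: every criterion must hold (AND).
    static <T> Predicate<T> conjunction(List<Predicate<T>> criteria) {
        return criteria.stream().reduce(row -> true, Predicate::and);
    }

    // Criteria collected into a disjunction: at least one must hold (OR).
    static <T> Predicate<T> disjunction(List<Predicate<T>> criteria) {
        return criteria.stream().reduce(row -> false, Predicate::or);
    }

    static long count(List<Map<String, Integer>> rows, Predicate<Map<String, Integer>> criterion) {
        return rows.stream().filter(criterion).count();
    }

    static Map<String, Integer> row(int owner, int reviewer) {
        Map<String, Integer> row = new HashMap<>();
        row.put("owner", owner);
        row.put("reviewer", reviewer);
        return row;
    }

    // Four sample revisions with different owner/reviewer ids.
    static List<Map<String, Integer>> sampleRows() {
        return Arrays.asList(row(1, 2), row(2, 1), row(1, 1), row(3, 3));
    }

    // One criterion per property, all comparing against the same id.
    static List<Predicate<Map<String, Integer>>> criteriaFor(int id) {
        return Arrays.asList(relatedIdEq("owner", id), relatedIdEq("reviewer", id));
    }

    public static void main(String[] args) {
        // AND: only the row where owner and reviewer are both 1.
        System.out.println(count(sampleRows(), conjunction(criteriaFor(1)))); // prints 1
        // OR: every row where owner or reviewer is 1.
        System.out.println(count(sampleRows(), disjunction(criteriaFor(1)))); // prints 3
    }
}
```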
I did it like this in Java, as @Roger mentioned above (just in case anybody needs it):
public List<Employee> getAuditHistory(Session session, int id, String property) {
AuditReader auditReader = AuditReaderFactory.get(session);
List<Employee> employeeHistory = new ArrayList<>();
if (auditReader != null) {
AuditQuery auditQuery = auditReader.createQuery().forRevisionsOfEntity(Employee.class, true, false)
.add(AuditEntity.property(ResultsConstants.Employee_ID).eq(id));
AuditDisjunction auditDisjunction = null;
if (property.equalsIgnoreCase("FULL_NAME")) {
auditDisjunction = AuditEntity.disjunction().add(AuditEntity.property("FIRST_NAME".toUpperCase()).hasChanged())
.add(AuditEntity.property("LAST_NAME".toUpperCase()).hasChanged());
} else {
auditQuery = auditQuery.add(AuditEntity.property(property.toUpperCase()).hasChanged());
}
auditQuery = auditQuery.addOrder(AuditEntity.property("MODIFIED_DATE").desc());
if(null != auditDisjunction){
auditQuery = auditQuery.add(auditDisjunction);
}
if (auditQuery != null) {
if (auditQuery.getResultList().isEmpty()) {
// Log here or throw it back to caller
}
employeeHistory.addAll(auditQuery.getResultList());
}
}
return employeeHistory;
}