In my Apache Beam pipeline I have a PCollection of Row objects (org.apache.beam.sdk.values.Row). I want to write them to Avro files. Here is a simplified version of my code:
Pipeline p = Pipeline.create();

Schema inputSchema = Schema.of(
    Schema.Field.of("MyField1", Schema.FieldType.INT32)
);

Row row = Row.withSchema(inputSchema).addValues(1).build();

PCollection<Row> myRow = p.apply(Create.of(row)).setCoder(RowCoder.of(inputSchema));

myRow.apply(
    "WriteToAvro",
    AvroIO.write(Row.class)
        .to("src/tmp/my_files")
        .withWindowedWrites()
        .withNumShards(10));

p.run();
The file gets created, but it looks like this (in JSON form):
"schema" : {
"fieldIndices" : {
"MyField1" : 0
},
"encodingPositions" : {
"MyField1" : 0
},
"fields" : [
{
}
],
"hashCode" : 545134948,
"uuid" : {
},
"options" : {
"options" : {
}
}
}
So only the schema is there, with a bunch of useless metadata. What's the right way of writing to Avro from Row objects so that I have the data and not just the schema? And can I get rid of the metadata?
AvroUtils has some utilities to convert Rows and Beam schemas to Avro types. You can do something like this:
Pipeline p = Pipeline.create();

Schema inputSchema = Schema.of(
    Schema.Field.of("MyField1", Schema.FieldType.INT32)
);

org.apache.avro.Schema avroSchema = AvroUtils.toAvroSchema(inputSchema);

class ConvertToGenericRecords extends DoFn<Row, GenericRecord> {
  @ProcessElement
  public void process(ProcessContext c) {
    c.output(AvroUtils.toGenericRecord(c.element(), avroSchema));
  }
}

Row row = Row.withSchema(inputSchema).addValues(1).build();

PCollection<Row> myRow = p.apply(Create.of(row)).setCoder(RowCoder.of(inputSchema));

myRow.apply(ParDo.of(new ConvertToGenericRecords()))
    .setCoder(AvroCoder.of(avroSchema))
    .apply(
        "WriteToAvro",
        AvroIO.writeGenericRecords(avroSchema)
            .to("src/tmp/my_files")
            .withWindowedWrites()
            .withNumShards(10));

p.run();
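One caveat: org.apache.avro.Schema is not Serializable, so capturing it in a local DoFn as above can fail when the pipeline runs on a distributed runner. A minimal sketch of a common workaround, shipping the schema as its JSON string and re-parsing it on the worker (class and field names here are illustrative):

static class ConvertToGenericRecords extends DoFn<Row, GenericRecord> {
  private final String schemaJson;                 // Serializable form of the schema
  private transient org.apache.avro.Schema schema; // rebuilt on each worker

  ConvertToGenericRecords(org.apache.avro.Schema schema) {
    this.schemaJson = schema.toString();
  }

  @Setup
  public void setup() {
    schema = new org.apache.avro.Schema.Parser().parse(schemaJson);
  }

  @ProcessElement
  public void process(ProcessContext c) {
    c.output(AvroUtils.toGenericRecord(c.element(), schema));
  }
}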
I am new to PureScript. I want to add a new field to a record and pass the result as a function parameter, but I cannot find a good solution for this. For example:
oldFiled = {
title : "title",
description : "d"
}
newField = {
time : "time"
}
// result after adding the new field
oldFiled = {
title : "title",
description : "d",
time : "time"
}
How can I do it?
If it's just about adding a single field, you can use https://pursuit.purescript.org/packages/purescript-record/2.0.1/docs/Record#v:insert like so:
import Record as Record
import Data.Symbol (SProxy(..))
oldFiled = {
title : "title",
description : "d"
}
newFiled = Record.insert (SProxy :: _ "time") "time" oldFiled
If you're merging whole records, look at the merge, union, and disjointUnion functions in the same Record module.
I have two collections, for example CollectionA and CollectionB; both have a common field, which is hostname.
Collection A :
{
"hostname": "vm01",
"id": "1",
"status": "online",
}
Collection B
{
"hostname": "vm01",
"id": "string",
"installedversion": "string",
}
{
"hostname": "vm02",
"id": "string",
"installedversion": "string",
}
What I want to achieve when I receive a POST message for Collection B:
I want to check whether a record already exists in Collection B based on hostname, and if so, update all its values; if not, insert the new record (I have read this can be achieved by using upsert -- still looking at how to make it work).
I also want to check whether the hostname is present in Collection A; if not, move the record from Collection B to another collection, Collection C (as archive records). I.e., in the above example, the hostname=vm02 record from Collection B should be moved to Collection C.
How can I achieve this using Spring Boot and MongoDB? Any help is appreciated. The code I have to save Collection B is as follows, which I want to update to achieve the desired result:
public RscInstalltionStatusDTO save(RscInstalltionStatusDTO rscInstalltionStatusDTO) {
    log.debug("Request to save RscInstalltionStatus : {}", rscInstalltionStatusDTO);
    RscInstalltionStatus rscInstalltionStatus = rscInstalltionStatusMapper.toEntity(rscInstalltionStatusDTO);
    rscInstalltionStatus = rscInstalltionStatusRepository.save(rscInstalltionStatus);
    return rscInstalltionStatusMapper.toDto(rscInstalltionStatus);
}
Update 1: The below works as I expected, but I think there should be a better way to do this.
public RscInstalltionStatusDTO save(RscInstalltionStatusDTO rscInstalltionStatusDTO) {
    log.debug("Request to save RscInstalltionStatus : {}", rscInstalltionStatusDTO);
    RscInstalltionStatus rscInstalltionStatus = rscInstalltionStatusMapper.toEntity(rscInstalltionStatusDTO);
    System.out.print(rscInstalltionStatus.getHostname());

    Query query = new Query(Criteria.where("hostname").is(rscInstalltionStatus.getHostname()));
    Update update = new Update();
    update.set("configdownload", rscInstalltionStatus.getConfigdownload());
    update.set("rscpkgdownload", rscInstalltionStatus.getRscpkgdownload());
    update.set("configextraction", rscInstalltionStatus.getConfigextraction());
    update.set("rscpkgextraction", rscInstalltionStatus.getRscpkgextraction());
    update.set("rscstartup", rscInstalltionStatus.getRscstartup());
    update.set("installedversion", rscInstalltionStatus.getInstalledversion());
    mongoTemplate.upsert(query, update, RscInstalltionStatus.class);

    rscInstalltionStatus = rscInstalltionStatusRepository.findByHostname(rscInstalltionStatus.getHostname());
    return rscInstalltionStatusMapper.toDto(rscInstalltionStatus);
}
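One possible simplification (an untested sketch, assuming Spring Data MongoDB 2.1+ where findAndReplace is available): replace the whole document matched by hostname, inserting it if absent, and get the new state back in a single call.

public RscInstalltionStatusDTO save(RscInstalltionStatusDTO rscInstalltionStatusDTO) {
    RscInstalltionStatus entity = rscInstalltionStatusMapper.toEntity(rscInstalltionStatusDTO);
    // Upsert the full document keyed by hostname and return the resulting state
    RscInstalltionStatus saved = mongoTemplate.findAndReplace(
            new Query(Criteria.where("hostname").is(entity.getHostname())),
            entity,
            FindAndReplaceOptions.options().upsert().returnNew());
    return rscInstalltionStatusMapper.toDto(saved);
}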
Update 2: With the below code I am able to move the records to another collection.
String query = "{$lookup:{ from: \"vmdetails\",let: {rschostname: \"$hostname\"},pipeline:[{$match:{$expr:{$ne :[\"$hostname\",\"$$rschostname\"]}}}],as: \"rscInstall\"}},{$unwind:\"$rscInstall\"},{$project:{\"_id\":0,\"rscInstall\":0}}";
AggregationOperation rscInstalltionStatusTypedAggregation = new CustomProjectAggregationOperation(query);
LookupOperation lookupOperation = LookupOperation.newLookup().from("vmdetails").localField("hostname").foreignField("hostname").as("rscInstall");
UnwindOperation unwindOperation = Aggregation.unwind("$rscInstall");
ProjectionOperation projectionOperation = Aggregation.project("_id","rscInstall").andExclude("_id","rscInstall");
OutOperation outOperation = Aggregation.out("RscInstallArchive");
Aggregation aggregation = Aggregation.newAggregation(rscInstalltionStatusTypedAggregation,unwindOperation,projectionOperation,outOperation);
List<BasicDBObject> results = mongoTemplate.aggregate(aggregation,"rsc_installtion_status",BasicDBObject.class).getMappedResults();
The issue I have here is that it returns multiple records.
Found the solution. There may be other, better solutions, but this one worked for me.
Create a custom AggregationOperation class (found in SO answers and extended to match my needs):
public class CustomProjectAggregationOperation implements AggregationOperation {

    private String jsonOperation;

    public CustomProjectAggregationOperation(String jsonOperation) {
        this.jsonOperation = jsonOperation;
    }

    @Override
    public Document toDocument(AggregationOperationContext aggregationOperationContext) {
        return aggregationOperationContext.getMappedObject(Document.parse(jsonOperation));
    }
}
String lookupquery = "{$lookup :{from:\"vmdetails\",localField:\"hostname\",foreignField:\"hostname\",as:\"rscinstall\"}}";
String matchquery = "{ $match: { \"rscinstall\": { $eq: [] } }}";
String projectquery = "{$project:{\"rscinstall\":0}}";

AggregationOperation lookupOperation = new CustomProjectAggregationOperation(lookupquery);
AggregationOperation matchOperation = new CustomProjectAggregationOperation(matchquery);
AggregationOperation projectOperation = new CustomProjectAggregationOperation(projectquery);

Aggregation aggregation = Aggregation.newAggregation(lookupOperation, matchOperation, projectOperation);

ArrayList<Document> results1 = (ArrayList<Document>) mongoTemplate.aggregate(aggregation, "rsc_installtion_status", Document.class).getRawResults().get("result");

for (Document doc : results1) {
    // insert the orphaned record into the archive collection ...
    mongoTemplate.insert(doc, "RscInstallArchive");
    // ... and remove it from the source collection
    delete(doc.get("_id").toString());
}
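For reference, delete(...) above is my own helper, which is not shown; a minimal sketch of what such a helper might look like with mongoTemplate (hypothetical; the collection name and the ObjectId conversion are assumptions):

// Hypothetical helper: remove a document from the source collection by its id
private void delete(String id) {
    Query byId = new Query(Criteria.where("_id").is(new ObjectId(id)));
    mongoTemplate.remove(byId, "rsc_installtion_status");
}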
I am having trouble connecting relationships in Sequelize. This is the query I want to reproduce:
SELECT * from actors
JOIN "actorStatuses"
on "actorStatuses".actor_id = actors.id
JOIN movies
on movies.id = "actorStatuses".actor_id
WHERE movies.date = '7/8/2017';
Here you go :
model.Actor.findAll({ // change model names as per yours
    include : [
        {
            model : model.ActorStatuses,
            required : true
        },
        {
            model : model.Movies,
            required : true,
            where : { date : 'your_date' }
        }
    ]
});
This will create exactly the query/result you require: required : true produces the inner joins. Note that this assumes the associations between your Actor, ActorStatuses, and Movies models are already defined.
I have a question concerning using orientDB for JSON data. In my scenario,
I receive entities which are serialized into JSON. This JSON data should
be stored in orientDB. The relevant code part to create documents
in orientDB looks as follows:
//JSON string created by using Jackson
String jsonStr = "...";
//here some dummy name "test"
ODocument doc = new ODocument("test");
doc.fromJSON(jsonStr);
doc.save();
In the following, I give an example for classes I'm working with
(I left out constructors, getters and setters and other fields which
are not relevant for the example):
class AbstractEntity {
private String oidString;
}
class A extends AbstractEntity {
private List<B> bs;
}
class B extends AbstractEntity {
private List<C> cs;
}
class C extends AbstractEntity {
private List<D> ds;
}
class D extends AbstractEntity {
private int type;
}
As the classes use type List, Jackson needs to store
additional type information in the JSON representation, to be able
to deserialize the data properly.
{
"oidString" : "AAA_oid1",
"bs" : [ "java.util.ArrayList", [ {
"oidString" : "b_oid1",
"cs" : null
}, {
"oidString" : "b_oid2",
"cs" : [ "java.util.ArrayList", [ {
"oidString" : "c_oid1",
"ds" : [ "java.util.ArrayList", [ ] ]
}, {
"oidString" : "c_oid2",
"ds" : [ "java.util.ArrayList", [ {
"oidString" : "d_oid1",
"type" : 2
} ] ]
} ] ]
} ] ]
}
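For context, this wrapping is what Jackson's default typing produces; a minimal sketch of the kind of configuration that generates it (my actual setup may differ, and newer Jackson versions use activateDefaultTyping instead):

ObjectMapper mapper = new ObjectMapper();
// Wraps collection values as ["java.util.ArrayList", [ ... ]] so they can be
// deserialized back into their concrete types
mapper.enableDefaultTyping(ObjectMapper.DefaultTyping.NON_FINAL);
String jsonStr = mapper.writeValueAsString(a);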
However, I have problems querying such a document if I try to, for example, find all D instances that contain a certain type. I tried to simplify my query by first listing all D instances that can be found for a specific A:
OQuery<?> query = new OSQLSynchQuery<ODocument>(
"select bs.cs.ds from test where oidString = 'AAA_oid1'"
);
This returns: {"#type":"d","#rid":"#-2:0","#version":0,"bs":[null,null]}
The additional type information ("java.util.ArrayList") seems to cause problems for orientDB. If I rewrite my example to only use ArrayList directly, so that the additional type information is omitted, the query above returns the expected result.
Is there a general solution to this problem? I have to work with JSON data, and that JSON data will contain additional type information (it has to).
Can't orientDB deal with this situation?
I have a MongoDB collection containing history data with id and timestamp.
I want to delete data from the collection older than a specific
timestamp. But for every id at least one
document (the newest) must stay in the collection.
Suppose I have the following documents in my collection ...
{"id" : "11", "timestamp" : ISODate("2011-09-09T10:27:34.785Z")} //1
{"id" : "11", "timestamp" : ISODate("2011-09-08T10:27:34.785Z")} //2
{"id" : "22", "timestamp" : ISODate("2011-09-05T10:27:34.785Z")} //3
{"id" : "22", "timestamp" : ISODate("2011-09-01T10:27:34.785Z")} //4
... and I want to delete documents having a timestamp older than
2011-09-07 then
1 and 2 should not be deleted because they are newer.
4 should be deleted because it is older, but 3 should not be deleted
(although it is older) because
at least one document per id should stay in the collection.
Does anyone know how I can do this with casbah and/or on the mongo
console?
Regards,
Christian
I can think of a couple of ways. First, try this:
var cutoff = new ISODate("2011-09-07T00:00:00.000Z");
db.testdata.find().forEach(function(data) {
if (data.timestamp.valueOf() < cutoff.valueOf()) {
// A candidate for deletion
if (db.testdata.find({"id": data.id, "timestamp": { $gt: data.timestamp }}).count() > 0) {
db.testdata.remove({"_id" : data._id});
}
}
});
This does the job you want. Or you can use a MapReduce job to do it as well. Load this into a text file:
var map = function() {
emit(this.id, {
ref: this._id,
timestamp: this.timestamp
});
};
var reduce = function(key, values) {
    var cutoff = new ISODate("2011-09-07T00:00:00.000Z");
    var newest = null; // timestamp of the document we are currently keeping
    var ref = null;    // _id of the document we are currently keeping
    var i;
    for (i = 0; i < values.length; ++i) {
        if (ref == null) {
            ref = values[i].ref;
            newest = values[i].timestamp;
        } else if (values[i].timestamp.valueOf() > newest.valueOf()) {
            // This one is newer than the one we are currently keeping.
            // The kept one may only be deleted if it falls into the delete range.
            if (newest.valueOf() < cutoff.valueOf()) {
                db.testdata.remove({_id : ref});
            }
            ref = values[i].ref;
            newest = values[i].timestamp;
        } else if (values[i].timestamp.valueOf() < cutoff.valueOf()) {
            // This one is older than the kept one and falls into the delete range
            db.testdata.remove({_id : values[i].ref});
        }
        // Otherwise it is older than the kept one but newer than the cutoff: keep it.
    }
    return { ref: ref, timestamp: newest };
};
Load the above file into the shell: load("file.js");
Then run it: db.testdata.mapReduce(map, reduce, {out: "results"});
Then remove the mapReduce output: db.results.drop();