How to iterate a GremlinPipeline of Titan DB to get the vertex & its properties in the Java API - titan

Here is the sample code which I am using, but I don't know how to use this Pipeline object to get the vertices & their properties.
GremlinPipeline pipeline = new GremlinPipeline(vert)
    .out("LIVES_IN_CITY").in("LIVES_IN_CITY")
    .filter(new PipeFunction<Vertex, Boolean>() {
        public Boolean compute(Vertex v) {
            return v.getProperty("name").equals(city);
        }
    }).back(2).out("LIVES_IN_CITY");

A GremlinPipeline is just an Iterator, so treat it as such. At a low level, use a while loop to iterate, checking hasNext() to see if there are more items in the pipeline to extract and calling next() to pop off the very next item in the Iterator.
Pipeline also has toList() and fill() methods to work at a higher level of abstraction. You can see the API here.
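For example, a minimal sketch (using the pipeline and city variables from the question above) that pulls the vertices out one by one, or collects them all at once:

while (pipeline.hasNext()) {
    Vertex v = (Vertex) pipeline.next();
    System.out.println(v.getProperty("name"));
}

// or, at a higher level of abstraction:
List<Vertex> results = new ArrayList<Vertex>();
pipeline.fill(results);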

Related

Execute HTTP Post in a loop in Groovy (Jenkins)

I'm developing a shared library in Jenkins to help with interacting with an internal API. I can make a single call which starts a long-running process to create an object. I have to continue to query the API to check for the process' completion.
I'm trying to get this done using a simple loop, but I keep getting stuck. Here's my function to query the API until it's completed:
def url = new URL("http://myapi/endpoint")
HttpURLConnection http = (HttpURLConnection) url.openConnection()
http.setDoOutput(true)
http.setRequestMethod('POST')
http.setRequestProperty("Content-Type", "application/x-www-form-urlencoded")
def body = ["string", "anotherstring"].join('=')
OutputStreamWriter osw = new OutputStreamWriter(http.outputStream)
osw.write(body)
osw.flush()
osw.close()
for (int i = 0; i < 30; i++) {
    Integer counter = 0
    http.connect()
    response = http.content.text
    def status = new JsonSlurperClassic().parseText(response)
    // Code to check values here
}
When I run this through a pipeline, the first iteration through the loop works fine. The next iteration bombs with this error:
Caused: java.io.NotSerializableException: sun.net.www.protocol.http.HttpURLConnection
I just started in Groovy, so I feel like I'm trying to do this wrong. I've looked all over trying to find answers and tried several things without any luck.
Thanks in advance.
When running a pipeline job, Jenkins constantly saves the state of the execution so that it can be paused and later resumed. This means that Jenkins must be able to serialize the state of the script, and therefore all of the objects that you create in your pipeline must be serializable.
If an object is not serializable, you will get a NotSerializableException whenever Jenkins attempts to serialize your un-serializable object while saving the state.
To overcome this issue you can use the @NonCPS annotation, which will cause Jenkins to execute the function without trying to serialize it. Read more on this issue at pipeline-best-practice.
While normal Pipeline is restricted to serializable local variables (see appendix at bottom), @NonCPS functions can use more complex, nonserializable types internally (for example regex matchers, etc). Parameters and return types should still be Serializable, however.
There are, however, some limitations, so read the documentation carefully: for example, the return value types of @NonCPS methods must be serializable, and you can't use any pipeline steps or CPS-transformed methods inside a @NonCPS-annotated function. Additional info can be found here.
One last thing: to overcome all these issues you can also use the Jenkins HTTP Plugin, which includes all the HTTP abilities you will probably need, wrapped in an easy-to-use built-in interface.
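As a rough sketch (the method name and the use of a plain GET for the status check are illustrative, not from the question): keep the whole HTTP exchange inside one @NonCPS method so the non-serializable HttpURLConnection never has to be saved, and return only serializable data to the pipeline.

@NonCPS
def pollStatus(String endpoint) {
    def http = (HttpURLConnection) new URL(endpoint).openConnection()
    http.setRequestMethod('GET')
    // JsonSlurperClassic returns plain HashMaps/ArrayLists, which are serializable
    return new JsonSlurperClassic().parseText(http.inputStream.text)
}

The retry loop itself can then stay in regular (CPS-transformed) pipeline code, where steps like sleep are allowed, calling pollStatus() on each iteration and keeping only its serializable return value.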

Dynamically create flatmap function (keyed states) with values in a stream

I am writing a streaming Flink program to do feature extraction for our offline-trained model and was wondering about the design of the program. I want each piece of feature extraction logic to maintain its own state within its class, so that adding a new feature extraction would be equivalent to adding a new class.
The rough high-level design is as follows:
// data is the stream of relative paths to the feature extraction logic in our code, e.g. com.xxx.FeatureExtraction1
val data: DataStream[String] = ...
// based on the relative path, use reflection to instantiate the class
featureExtraction1 = method.getReflect("com.xxx.FeatureExtraction1")
data.keyBy(_).flatMap(featureExtraction1)
where each feature extraction logic has its own internal state tracking
class FeatureExtraction1 extends RichFlatMapFunction[String, Double] {
  private var mystate: MapState = _

  override def flatMap(input: String, out: Collector[Double]) = {
    // access the state value
  }

  override def open(parameters: Configuration): Unit = {
    mystate = xxx
  }
}
I could make this work: as soon as I add a new feature extraction class, e.g. com.xxx.FeatureExtraction2, I append it to the data stream like
data.keyBy(_).flatMap(featureExtraction1).flatMap(featureExtraction2)...flatMap(featureExtractionN)
However, I don't know Flink well enough to be sure whether featureExtraction1 through featureExtractionN will be executed concurrently (they should, in my head) if they are chained like this. Secondly, I want to write the code so that it automatically creates new feature extraction logic without me appending it to the stream. In my head, it might look like this:
data.keyBy(_).foreachValueIntheStream.flatmap(new FeatureExtractionX based on the Value)
If I can do this, adding a new feature would just mean adding a new feature extraction class with its own state tracking.
Please advise on my naive thinking. I am grateful for any guidance.
Flink can't dynamically add functions. But you could do something close, I think.
I'd use a broadcast stream for the feature paths, and a regular stream for the actual data to be processed. Connect them to create a connected stream, then run that into a CoFlatMapFunction. Inside this function you'd maintain a list of (dynamically generated) feature extraction functions that you apply to the incoming data. For state, use a Map<feature extraction function id, value>, so that each feature extraction function records its state in the same map.
You do have the typical issue of wanting to empty the broadcast stream before processing the first of the data elements - see the mailing list for discussions on how to do that.
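A rough Java sketch of that design (class and interface names here are illustrative, not from the question, and a production version would keep the per-extractor values in Flink managed state rather than plain fields):

import java.util.HashMap;
import java.util.Map;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

// Extraction logic loaded by class name; must be serializable.
interface FeatureLogic extends java.io.Serializable {
    Double extract(String element, Double previousState);
}

class DynamicFeatureExtractor implements CoFlatMapFunction<String, String, Double> {
    // id -> dynamically instantiated extraction logic
    private final Map<String, FeatureLogic> extractors = new HashMap<>();
    // id -> per-extractor state, kept in one map as suggested above
    private final Map<String, Double> state = new HashMap<>();

    @Override
    public void flatMap1(String dataElement, Collector<Double> out) {
        // data side: apply every registered extractor to the incoming element
        for (Map.Entry<String, FeatureLogic> e : extractors.entrySet()) {
            Double result = e.getValue().extract(dataElement, state.get(e.getKey()));
            state.put(e.getKey(), result);
            out.collect(result);
        }
    }

    @Override
    public void flatMap2(String className, Collector<Double> out) throws Exception {
        // broadcast side: instantiate the extraction logic reflectively and register it
        extractors.put(className, (FeatureLogic) Class.forName(className).newInstance());
    }
}

// wiring (sketch): data.connect(featurePaths.broadcast()).flatMap(new DynamicFeatureExtractor());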

Why is the map function inherently parallel?

I was reading the following presentation:
http://www.idt.mdh.se/kurser/DVA201/slides/parallel-4up.pdf
and the author claims that the map function is built very well for parallelism (specifically he supports his claim on page 3 or slides 9 and 10).
If one were given the problem of increasing each value of a list by +1, I can see how looping through the list imperatively would require an index value to change and hence cause potential race-condition problems. But I'm curious how the map function better allows a programmer to successfully code in parallel.
Is it due to the way map is recursively defined? So each function call can be thrown to a different thread?
I'm hoping someone can provide some specifics, thanks!
The map function applies the same pure function to n elements in a collection and aggregates the results. The order in which you apply the function to the members of the collection doesn't matter, because by definition the return value of the function depends purely on the input.
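As a quick illustration (a Java sketch, not taken from the slides): because each output element depends only on its own input element, the same map can be evaluated sequentially or in parallel and produce the same result.

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

List<Integer> xs = Arrays.asList(1, 2, 3, 4, 5);

// sequential
List<Integer> a = xs.stream().map(x -> x + 1).collect(Collectors.toList());
// parallel: same function, order of evaluation is irrelevant
List<Integer> b = xs.parallelStream().map(x -> x + 1).collect(Collectors.toList());
// a.equals(b) is true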
The others already explained that the standard map implementation isn't parallel.
But in Scala, since you tagged it, you can get the parallel version as simply as
val list = ... // some list
list.par.map(x => ...) // instead of list.map(x => ...)
See also Parallel Collections Overview and documentation for ParIterable and other types in the scala.collection.parallel package.
You can find the implementation of the parallel map in https://github.com/scala/scala/blob/v2.12.1/src/library/scala/collection/parallel/ParIterableLike.scala, if you want (look for def map and class Map). It requires very non-trivial infrastructure and certainly isn't just taking the recursive definition of sequential map and parallelizing it.
If one had defined map via a loop, how would that break down?
The slides give F# parallel arrays as the example at the end, and at https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/array.fs#L266 you can see that the non-parallel implementation there is a loop:
let inline map (mapping: 'T -> 'U) (array: 'T[]) =
    checkNonNull "array" array
    let res : 'U[] = Microsoft.FSharp.Primitives.Basics.Array.zeroCreateUnchecked array.Length
    for i = 0 to res.Length-1 do
        res.[i] <- mapping array.[i]
    res

How to delete elements from a transformed collection using a predicate?

If I have an ArrayList<Double> dblList and a Predicate<Double> IS_EVEN, I am able to remove all even elements from dblList using:
Collections2.filter(dblList, IS_EVEN).clear()
If dblList, however, is the result of a transformation like
dblList = Lists.transform(intList, TO_DOUBLE)
this does not work any more as the transformed list is immutable :-)
Any solution?
Lists.transform() accepts a List and helpfully returns a result that is a RandomAccess list. Iterables.transform() only accepts an Iterable, and the result is not RandomAccess. Finally, Iterables.removeIf (and as far as I can see, this is the only such method in Iterables) has an optimization for the case that the given argument is RandomAccess, the point of which is to make the algorithm linear instead of quadratic. For example, think what would happen if you had a big ArrayList (and not an ArrayDeque - that should be more popular) and kept removing elements from its start till it's empty.
But the optimization depends not on iterator remove(), but on List.set(), which cannot possibly be supported in a transformed list. If this were to be fixed, we would need another marker interface, to denote that "the optional set() actually works".
So the options you have are:
Call the Iterables.removeIf() version and run a quadratic algorithm (it won't matter if your list is small or you remove only a few elements)
Copy the List into another List that supports all optional operations, then call Iterables.removeIf().
The following approach should work, though I haven't tried it yet.
Collection<Double> dblCollection =
    Collections.checkedCollection(dblList, Double.class);
Collections2.filter(dblCollection, IS_EVEN).clear();
The checkedCollection() method generates a view of the list that doesn't implement List. [It would be cleaner, but more verbose, to create a ForwardingCollection instead.] Then Collections2.filter() won't call the unsupported set() method.
The library code could be made more robust. Iterables.removeIf() could generate a composed Predicate, as Michael D suggested, when passed a transformed list. However, we previously decided not to complicate the code by adding special-case logic of that sort.
Maybe:
Collection<Double> odds = Collections2.filter(dblList, Predicates.not(IS_EVEN));
or
dblList = Lists.newArrayList(Lists.transform(intList, TO_DOUBLE));
Collections2.filter(dblList, IS_EVEN).clear();
As long as you have no need for the intermediate collection, then you can just use Predicates.compose() to create a predicate that first transforms the item, then evaluates a predicate on the transformed item.
For example, suppose I have a List<Double> from which I want to remove all items where the Integer part is even. I already have a Function<Double,Integer> that gives me the Integer part, and a Predicate<Integer> that tells me if it is even.
I can use these to get a new predicate, INTEGER_PART_IS_EVEN
Predicate<Double> INTEGER_PART_IS_EVEN = Predicates.compose(IS_EVEN, DOUBLE_TO_INTEGER);
Collections2.filter(dblList, INTEGER_PART_IS_EVEN).clear();
After some tries, I think I've found it :)
final ArrayList<Integer> ints = Lists.newArrayList(1, 2, 3, 4, 5);
Iterables.removeIf(Iterables.transform(ints, intoDouble()), even());
System.out.println(ints);
[1,3,5]
I don't have a solution; instead, I found some kind of a problem with Iterables.removeIf() in combination with Lists.TransformingRandomAccessList.
The transformed list implements RandomAccess, thus Iterables.removeIf() delegates to Iterables.removeIfFromRandomAccessList() which depends on an unsupported List.set() operation.
Calling Iterators.removeIf() however would be successful, as the remove() operation IS supported by Lists.TransformingRandomAccessList.
see: Iterables: 147
Conclusion: instanceof RandomAccess does not guarantee List.set().
Addition:
In special situations calling removeIfFromRandomAccessList() even works:
if and only if the elements to erase form a compact group at the tail of the List or all elements are covered by the Predicate.
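A minimal sketch of that observation (intoDouble() and even() are the same hypothetical Function and Predicate used in the answer above): going through the transformed view's iterator directly also writes the removals back to the source list.

List<Integer> ints = Lists.newArrayList(1, 2, 3, 4, 5);
// The Lists.transform() view's iterator supports remove() by delegating to the source list
Iterators.removeIf(Lists.transform(ints, intoDouble()).iterator(), even());
System.out.println(ints); // [1, 3, 5]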

MapMaker and ReferenceMap - Google Collections

I understand ReferenceMap from the alpha version of Google Collections has been replaced by MapMaker.
I used this ReferenceMap constructor with the backing map:
public ReferenceMap(ReferenceType keyReferenceType, ReferenceType valueReferenceType,
        ConcurrentMap<Object, Object> backingMap) {
    this(keyReferenceType, valueReferenceType, backingMap, true);
}
My backing map is a ConcurrentMap with the ability to collect statistics (hits/misses, etc.).
What can I use in place of the above ReferenceMap constructor?
Thanks, Grace
We were not able to continue to offer the ability to pass your own backing map. MapMaker works using a customized map implementation of its own.
But, to gather hit/miss statistics, you can wrap the returned ConcurrentMap in a ForwardingConcurrentMap to count get() invocations (using an AtomicLong), and have your Function count misses in a similar way. (Hits being, of course, nearly equal to requests minus misses.)
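A rough sketch of that wrapping (the computing Function, the computeValue() helper, and the key/value types are placeholders, and makeComputingMap() refers to the old MapMaker API of that era):

final AtomicLong requests = new AtomicLong();
final AtomicLong misses = new AtomicLong();

// The computing function is only invoked on a miss, so count misses there.
Function<String, Object> countingLoader = new Function<String, Object>() {
    public Object apply(String key) {
        misses.incrementAndGet();
        return computeValue(key); // hypothetical expensive computation
    }
};

final ConcurrentMap<String, Object> backing =
        new MapMaker().softValues().makeComputingMap(countingLoader);

// Every lookup goes through get(), so count requests in a forwarding wrapper.
ConcurrentMap<String, Object> cache = new ForwardingConcurrentMap<String, Object>() {
    @Override protected ConcurrentMap<String, Object> delegate() {
        return backing;
    }
    @Override public Object get(Object key) {
        requests.incrementAndGet();
        return super.get(key);
    }
};

// hits are then roughly requests.get() - misses.get()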