ContentResolver openFile and ParcelFileDescriptor - android-contentresolver

The openFile method of a ContentResolver accepts a read/write ("rw") mode and returns a ParcelFileDescriptor (pfd). But ParcelFileDescriptor only provides the AutoCloseInputStream and AutoCloseOutputStream stream classes. In turn, these streams have a getChannel() method that returns a FileChannel, which can be used for random access to the underlying content.
What I'd like is a way to derive a FileChannel for a rw file obtained through the pfd returned by the openFile method of the ContentResolver. It almost seems this is supported (almost). For instance, a FileChannel can be obtained from a FileInputStream, a FileOutputStream, or a RandomAccessFile. Unfortunately, ParcelFileDescriptor only provides subclasses of FileInputStream and FileOutputStream. It would seem to make sense for ParcelFileDescriptor to also have an AutoCloseRandomAccessFile subclass; with that, I could get a FileChannel usable for rw operations. As it is, I need to dup the pfd, create both an input stream and an output stream, and use the respective streams for reading or writing.
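For illustration, that dup-based workaround might look roughly like the sketch below (written in Scala like the rest of this page; the function name readWriteChannels is just for illustration, and the pfd is assumed to come from an openFile call in "rw" mode):

import android.os.ParcelFileDescriptor
import java.nio.channels.FileChannel

// Rough sketch of the workaround described above (not a definitive implementation):
// duplicate the descriptor so the read side and the write side each own a stream.
def readWriteChannels(pfd: ParcelFileDescriptor): (FileChannel, FileChannel) = {
  val readCopy = pfd.dup()  // independent descriptor for the read side
  val readChannel  = new ParcelFileDescriptor.AutoCloseInputStream(readCopy).getChannel
  val writeChannel = new ParcelFileDescriptor.AutoCloseOutputStream(pfd).getChannel
  (readChannel, writeChannel)
}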
Is there a different way to accomplish what I'm trying to do? Has someone already extended a ParcelFileDescriptor with a RandomAccessFile?

Related

How to convert `fs2.Stream[IO, T]` to `Iterator[T]` in Scala

I need to fill in the next and hasNext methods while preserving laziness:
new Iterator[T] {
  val stream: fs2.Stream[IO, T] = ...
  def next(): T = ???
  def hasNext: Boolean = ???
}
But I cannot figure out how on earth to do this from an fs2.Stream. All the methods on a Stream (or on the "compiled" thing) are fairly useless.
If this is simply impossible to do in a reasonable amount of code, then that itself is a satisfactory answer and we will just rip out fs2.Stream from the codebase - just want to check first!
fs2.Stream, while similar in concept to Iterator, cannot be converted to one while preserving laziness. I'll try to elaborate on why...
Both represent a pull-based series of items, but the way in which they represent that series and implement the laziness differs too much.
As you already know, Iterator represents its pull in terms of the next() and hasNext methods, both of which are synchronous and blocking. To consume the iterator and return a value, you can directly call those methods e.g. in a loop, or use one of its many convenience methods.
fs2.Stream supports two capabilities that make it incompatible with that interface:
cats.effect.Resource can be included in the construction of a Stream. For example, you could construct a fs2.Stream[IO, Byte] representing the contents of a file. When consuming that stream, even if you abort early or do some strange flatMap, the underlying Resource is honored and your file handle is guaranteed to be closed. If you were trying to do the same thing with an Iterator, the "abort early" case would pose problems, forcing you to do something like Iterator[Byte] with Closeable and the caller would have to make sure to .close() it, or some other pattern.
Evaluation of "effects". In this context, effects are types like IO or Future, where the process of obtaining the value may perform some possibly-asynchronous action, and may perform side-effects. Asynchrony poses a problem when trying to force the process into a synchronous interface, since it forces you to block your current thread to wait for the asynchronous answer, which can cause deadlocks if you aren't careful. Libraries like cats-effect strongly discourage you from calling methods like unsafeRunSync.
fs2.Stream does allow for some special cases that rule out the inclusion of Resources and effects, via its Pure type alias, which you can use in place of IO. That gets you access to Stream.PureOps, but those only give you methods that consume the whole stream by building a collection; the laziness you want to preserve would be lost.
Side note: you can convert an Iterator to a Stream.
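For instance, a minimal sketch (assuming a reasonably recent fs2 version, where Stream.fromIterator takes an explicit chunk size):

import cats.effect.IO
import fs2.Stream

// Wrap an existing Iterator in a Stream; elements are pulled lazily, in chunks.
def iteratorToStream[A](it: Iterator[A]): Stream[IO, A] =
  Stream.fromIterator[IO](it, 16)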
The only way to "convert" a Stream to an Iterator is to consume it to some collection type via e.g. .compile.toList, which would get you an IO[List[T]], then .map(_.iterator) that to get an IO[Iterator[T]]. But ultimately that doesn't fit what you're asking for since it forces you to consume the stream to a buffer, breaking laziness.
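In code, that eager route looks roughly like this (a sketch, not a recommendation):

import cats.effect.IO
import fs2.Stream

// Consume the entire stream into a List, then expose it as an Iterator.
// Laziness is lost: nothing is available until the whole stream has run.
def streamToIterator[A](s: Stream[IO, A]): IO[Iterator[A]] =
  s.compile.toList.map(_.iterator)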
@Dima mentioned the "XY Problem", which was poorly received since they didn't really elaborate (initially) on the incompatibility, but they're right. It would be helpful to know why you're trying to make a Stream-to-Iterator conversion, in case there's some other approach that would serve your overall goal instead.

Losing path dependent type when extracting value from Try in scala

I'm working with scalax to generate a graph of my Spark operations, using a custom library that generates the graph. Let me show a sample:
val DAGWithoutGet = createGraphFromOps(ops)
val DAGWithGet = createGraphFromOps(ops).get
The return type of DAGWithoutGet is
scala.util.Try[scalax.collection.Graph[typeA, scalax.collection.GraphEdge.DiEdge]],
and, for DAGWithGet is
scalax.collection.Graph[typeA, scalax.collection.GraphEdge.DiEdge].
Here, typeA is a project-related class representing a single Spark operation, not relevant in the context of this question. (For context only: what my custom library essentially does is generate a map of dependencies between those operations, creating a big Map object and calling Graph(myBigMap: _*) to build the graph.)
As far as I know, calling .get at this point of my code or later should not make any difference, but that is not what I'm seeing.
Calling DAGWithoutGet.get.nodes has a return type of scalax.collection.Graph[typeA,DiEdge]#NodeSetT,
while calling DAGWithGet.nodes returns DAGWithGet.NodeSetT.
When I extract one of those nodes (using the .find method), I receive scalax.collection.Graph[typeA,DiEdge]#NodeT and DAGWithGet.NodeT types, respectively. Much to my dismay, even the methods available in each case are different - I cannot use pathTo (which happens to be what I want) or withSubgraph on the former, only on the latter.
My question, then, after this relatively complex example: what is going on here? Why does extracting the value from the Try at different moments lead to different types, one path-dependent and the other not? Or, if that isn't the right way to put it, what may I be missing here?
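For illustration only (this sketch is not from the original post): in Scala, path-dependent members need a stable prefix, so binding the extracted graph to a val first keeps the dag.NodeT / dag.NodeSetT types, whereas an expression such as DAGWithoutGet.get only gives the compiler the type projection Graph[typeA, DiEdge]#NodeT:

val dag = DAGWithoutGet.get            // stable identifier for the extracted graph
val nodes = dag.nodes                  // inferred as dag.NodeSetT
val node  = nodes.find(_ => true)      // Option[dag.NodeT], the same path-dependent flavour
                                       // on which pathTo / withSubgraph were observed to work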

How to get disk usage or sizes of files in a directory in a HDFS filesystem using Scala

I am trying to get the size of the files in an HDFS directory in Scala. I can do the following in the REPL:
Seq("/usr/bin/hdfs", "dfs", "-du", "-s", "/tmp/test").!
but I cannot store the result in a value. How can I get the size of the files in a directory in Scala?
The ! method you're using comes from ProcessBuilder.
(Seq[String] is being implicitly converted to a ProcessBuilder, thus granting you access to !).
/** Starts the process represented by this builder,
* blocks until it exits, and returns the exit code.
*/
abstract def !: Int
If you want the output, use a different method, like !!
/** Starts the process represented by this builder,
* blocks until it exits, and returns the output as a String.
*/
abstract def !!: String
I recommend checking out the other methods defined on ProcessBuilder. I'm sure at least one of them will suit your needs.
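For example, a minimal sketch putting that together with the command from the question (the parsing line assumes the usual hdfs dfs -du -s output, where the first whitespace-separated field is the size in bytes):

import scala.sys.process._

// Capture the command's stdout as a String rather than just the exit code.
val duOutput: String = Seq("/usr/bin/hdfs", "dfs", "-du", "-s", "/tmp/test").!!

// Assumed output format: "<size> ... <path>"; take the first field as the size in bytes.
val sizeInBytes: Long = duOutput.trim.split("\\s+").head.toLong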
I'd recommend the use of https://github.com/pathikrit/better-files
import better.files._
import java.io.{File => JFile}
val size = File("/usr/bin/hdfs").size
println(size)

How to create a stream with Scalaz-Stream?

It must be damn simple. But for some reason I cannot make it work.
If I do io.linesR(...), I get a stream of the lines of a file; that works.
If I do Process.emitAll(...), I get a stream of pre-defined values. That also works.
But what I actually need is to produce values for a scalaz-stream asynchronously (well, from an Akka actor).
I have tried:
async.unboundedQueue[String]
async.signal[String]
Then I called queue.enqueueOne(...).run or signal.set(...).run and listened to queue.dequeue or signal.discrete, just with .map and .to, using an example that had already been proven to work with another kind of stream (either with Process or with lines from a file).
What is the secret? What is the preferred way to create a channel to be streamed later? How to feed it with values from another context?
Thanks!
If the values are produced asynchronously but in a way that can be driven from the stream, I've found it easiest to use the "primitive" await method and construct the process "by hand". You need an indirectly recursive function:
// Emit the current value, then ask the actor for the next one and recurse.
def processStep(v: Int): Process[Future, Int] =
  Process.emit(v) ++ Process.await(myActor ? NextValuePlease())(w => processStep(w))
But if you need a truly async process, driven from elsewhere, I've never done that.

Why do we need to set the output key/value class explicitly in the Hadoop program?

In the "Hadoop : The Definitive Guide" book, there is a sample program with the below code.
JobConf conf = new JobConf(MaxTemperature.class);
conf.setJobName("Max temperature");
FileInputFormat.addInputPath(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
conf.setMapperClass(MaxTemperatureMapper.class);
conf.setReducerClass(MaxTemperatureReducer.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
The MR framework should be able to figure out the output key and value classes from the Mapper and Reducer classes being set on the JobConf. Why do we need to explicitly set the output key and value classes on the JobConf? Also, there is no similar API for the input key/value pair.
The reason is type erasure [1]. You specify the output K/V classes as generic type parameters, and by job setup time (which is run time, not compile time) these generics have been erased.
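As a generic illustration of erasure (a hypothetical snippet, not Hadoop-specific, written in Scala for consistency with the rest of this page): the type parameter of a generic class is simply gone at run time, which is why the job has to be told the concrete output classes explicitly.

// Two lists with different element types share exactly the same runtime class.
val keys:  List[String] = List("1950")
val temps: List[Int]    = List(22)
println(keys.getClass == temps.getClass)   // true: the type parameters were erased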
The input k/v classes can be read from the input file; in the case of SequenceFiles, the classes are stored in the header, and you can see them when opening a sequence file in an editor.
This header must be written, since every map output is a SequenceFile, so you need to provide the classes.
[1] http://download.oracle.com/javase/tutorial/java/generics/erasure.html