To read from a stream in Java I would do the usual:
byte[] buff = new byte[10];
int len = 0;
while ((len = inputStream.read(buff)) != -1) {
    // ...do something with the first len bytes of buff...
}
I know Scala offers things like Source.fromInputStream, but to be honest it seems a bit heavy. I also know the above won't work in Scala, because an assignment evaluates to Unit rather than to the assigned value. Is there a simple way to do this without using the library?
It is possible to close over mutable state and use Iterator.continually like so:
val buff = Array.ofDim[Byte](10)
Iterator.continually(inputStream.read(buff))
.takeWhile(_ != -1)
.foreach { len =>
// do something with buff and len
}
This is a more or less direct translation of the Java code. Depending on the task at hand, however, I'd reach for the libraries.
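A self-contained sketch of the Iterator.continually idiom above, using it to copy a whole stream. The object and method names (ReadLoop, readAll) and the 10-byte chunk size are illustrative assumptions; the point is that each element the iterator yields is the len returned by read(), while the data itself lands in the shared buff.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream}

object ReadLoop {
  // Copies a stream chunk by chunk using Iterator.continually.
  // `buff` is the shared mutable buffer the closure reads into; each
  // element produced by the iterator is the `len` returned by read().
  def readAll(in: InputStream, chunkSize: Int = 10): Array[Byte] = {
    val buff = new Array[Byte](chunkSize)
    val out = new ByteArrayOutputStream()
    Iterator
      .continually(in.read(buff)) // lazily re-evaluated on every step
      .takeWhile(_ != -1)         // -1 signals end of stream
      .foreach { len =>
        // only the first `len` bytes of buff are valid on this iteration
        out.write(buff, 0, len)
      }
    out.toByteArray
  }

  def main(args: Array[String]): Unit = {
    val data = Array.tabulate[Byte](25)(i => i.toByte)
    val copy = readAll(new ByteArrayInputStream(data))
    println(copy.sameElements(data)) // true
  }
}
```

Because the iterator is lazy, read() is only called when foreach asks for the next element, so the buffer contents are valid exactly for the duration of that iteration.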
I'd like to have a generator that terminates, like Python's generators, but I can't tell from the interface of ranges::views::generate whether this is supported.
You can roll it by hand easily enough: https://godbolt.org/z/xcGz6657r, although it's probably better to use a coroutine generator if you have one available.
You can return an optional from the generator, and stop taking elements when a std::nullopt is generated, using views::take_while:
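#include <optional>
#include <range/v3/view/generate.hpp>
#include <range/v3/view/take_while.hpp>

auto out = ranges::views::generate(
    [i = 0]() mutable -> std::optional<int>
    {
        if (i > 3)
            return std::nullopt; // signals the end of the sequence
        return { i++ };
    })
    | ranges::views::take_while([](auto opt){ return opt.has_value(); });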
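The same pattern, a stateful generator that signals its end with an empty optional plus a take_while that stops on it, can be sketched in Scala with Option. The names fromGenerator and counterUpTo are hypothetical; counterUpTo mirrors the lambda above that stops once i > 3.

```scala
object TerminatingGenerator {
  // Turns a stateful generator that returns None when exhausted into a
  // finite iterator, like views::generate | views::take_while above.
  def fromGenerator[A](gen: () => Option[A]): Iterator[A] =
    Iterator.continually(gen()).takeWhile(_.isDefined).map(_.get)

  // Hypothetical generator: yields 0, 1, ..., limit and then None.
  def counterUpTo(limit: Int): () => Option[Int] = {
    var i = 0
    () => if (i > limit) None else { val v = i; i += 1; Some(v) }
  }

  def main(args: Array[String]): Unit =
    println(fromGenerator(counterUpTo(3)).toList) // List(0, 1, 2, 3)
}
```

Unlike the C++ version, which still yields std::optional<int> elements, the map(_.get) step unwraps the values; it is safe because takeWhile has already stopped at the first None.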
I have an instance of ByteString. To read data from it, I should use its iterator() method.
I read some data and then decide that I need to create a view (a separate iterator over some chunk of the data).
I can't use slice() on the original iterator, because that would make the original unusable; the docs say:
After calling this method, one should discard the iterator it was called on, and use only the iterator that was returned. Using the old iterator is undefined, subject to change, and may result in changes to the new iterator as well.
So it seems I need to call slice() on the ByteString itself. But slice() takes from and until parameters, and I don't know from. I need something like this:
ByteString originalByteString = ...; // <-- This is my input data
ByteIterator originalIterator = originalByteString.iterator();
// ... read some data from originalIterator ...
int length = 100; // <-- Size of the view
int from = originalIterator.currentPosition(); // <-- I need this (hypothetical method)
int until = from + length;
ByteString viewOfOriginalByteString = originalByteString.slice(from, until);
ByteIterator iteratorForView = viewOfOriginalByteString.iterator(); // <-- This is my goal
Update:
Tried to do this with duplicate():
ByteIterator iteratorForView = originalIterator.duplicate()._2.take(length);
ByteIterator's from field is private, and none of the methods seems to simply return it. All I can suggest is to use originalIterator.duplicate to get a safe copy, or else to "cheat" by using reflection to read the from field, assuming reflection is available in your deployment environment.
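If you control every read, another option is to track the offset yourself instead of asking the iterator for it. The sketch below is hypothetical and not part of Akka: it wraps a plain Scala Iterator in a counting wrapper and uses an IndexedSeq[Byte] as a stand-in for the ByteString. With a real ByteIterator, any read that bypasses next() (for example a bulk read method) would not be counted, so this only works if all reads go through the wrapper.

```scala
object PositionTracking {
  // Hypothetical wrapper (not part of Akka): counts every element pulled
  // through next(), so `position` is the offset into the original data.
  // Reads that bypass next() would not be counted.
  final class CountingIterator[A](underlying: Iterator[A]) extends Iterator[A] {
    private var consumed = 0
    def position: Int = consumed
    def hasNext: Boolean = underlying.hasNext
    def next(): A = {
      val a = underlying.next()
      consumed += 1
      a
    }
  }

  // Slices the ORIGINAL sequence, leaving `it` untouched and still usable.
  def sliceFromCurrent[A](original: IndexedSeq[A], it: CountingIterator[A], length: Int): IndexedSeq[A] = {
    val from = it.position
    original.slice(from, from + length)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for the original ByteString.
    val original: IndexedSeq[Byte] = (0 until 20).map(_.toByte)
    val it = new CountingIterator(original.iterator)
    (1 to 5).foreach(_ => it.next()) // "read some data"
    println(sliceFromCurrent(original, it, 4).toList) // List(5, 6, 7, 8)
    println(it.next()) // 5
  }
}
```

The design choice is to slice the original collection rather than the iterator, which sidesteps the "discard the iterator after slice()" rule quoted in the question.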
Hello, I am looking for the fastest, but still rather high-level, way to work with a large data collection.
My task consists of two parts: reading a lot of large files into memory, and then doing some statistical calculations on them (the easiest way to work with the data for this is a random-access array).
My first approach was to use java.io.ByteArrayOutputStream, because it can resize its internal storage:
import java.io.{ByteArrayOutputStream, File, FileInputStream, FileNotFoundException}
import org.apache.commons.io.IOUtils

def packTo(buf: ByteArrayOutputStream, f: File): Unit = {
  try {
    val fs = new FileInputStream(f)
    try IOUtils.copy(fs, buf)
    finally fs.close()
  } catch {
    case _: FileNotFoundException => // skip missing files
  }
}

val buf = new ByteArrayOutputStream()
files.foreach(f => packTo(buf, f))
val size = buf.size()
println(size)
for (i <- 0 until size) {
  for (j <- 0 until size) {
    for (k <- 0 until size) {
      // Calculate something amazing using buf[i], buf[j], buf[k]
    }
  }
}
println("amazing = " + ???)
But ByteArrayOutputStream can only give me its data as a copy (via toByteArray()), and I cannot afford to have two copies of the data.
Have you tried scala-io? Should be as simple as Resource.fromFile(f).byteArray with it.
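Scala's built-in library already provides a nice API for this:
io.Source.fromFile("/file/path").mkString.getBytes
Note, however, that Source decodes the file as text and getBytes re-encodes it, so arbitrary binary data may fail to decode or come back altered; java.nio.file.Files.readAllBytes (which takes a Path) reads the raw bytes directly. Also, it's often not a good idea to load a whole file into memory as a byte array: make sure the largest possible file still fits comfortably in your JVM heap.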
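To avoid the second copy that ByteArrayOutputStream.toByteArray makes, one option is to size a single array up front from the file lengths and read each file straight into its slot. This is a minimal sketch, assuming the combined size fits in one JVM array and the files don't change size while being read; SingleBuffer and loadAll are hypothetical names.

```scala
import java.io.{DataInputStream, File, FileInputStream}

object SingleBuffer {
  // Reads every existing file into ONE pre-sized Array[Byte], so there is
  // never a second full copy of the data.
  // Assumes the combined size fits in a single JVM array and that the files
  // do not change size while they are being read.
  def loadAll(files: Seq[File]): Array[Byte] = {
    val existing = files.filter(_.isFile())
    val total = existing.map(_.length()).sum
    require(total <= Int.MaxValue - 8, s"$total bytes will not fit in one array")
    val buf = new Array[Byte](total.toInt)
    var offset = 0
    existing.foreach { f =>
      val len = f.length().toInt
      val in = new DataInputStream(new FileInputStream(f))
      try in.readFully(buf, offset, len) // fills exactly len bytes or throws
      finally in.close()
      offset += len
    }
    buf
  }
}
```

The result gives random access as buf(i). If the data outgrows a single array, memory-mapping the files with java.nio's FileChannel.map is the usual next step.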
From what I understood here, "V8 has a generational garbage collector. Moves objects around randomly. Node can't get a pointer to raw string data to write to socket." So I shouldn't store data that comes from a TCP stream in a string, especially if that string becomes bigger than Math.pow(2,16) bytes. (I hope I'm right so far.)
What, then, is the best way to handle all the data that's coming from a TCP socket? So far I've been trying to use _:_:_ as a delimiter, because I think it's fairly unique and won't clash with anything else.
A sample of the incoming data would be: something_:_:_maybe a large text_:_:_ maybe tons of lines_:_:_more and more data
This is what I tried to do:
net = require('net');
var server = net.createServer(function (socket) {
socket.on('connect',function() {
console.log('someone connected');
buf = new Buffer(Math.pow(2,16)); //new buffer with size 2^16
socket.on('data',function(data) {
if (data.toString().search('_:_:_') === -1) { // If there's no separator in the data that just arrived...
buf.write(data.toString()); // ... write it on the buffer. it's part of another message that will come.
} else { // if there is a separator in the data that arrived
parts = data.toString().split('_:_:_'); // the first part is the end of a previous message, the last part is the start of a message to be completed in the future. Parts between separators are independent messages
if (parts.length == 2) {
msg = buf.toString('utf-8',0,4) + parts[0];
console.log('MSG: '+ msg);
buf = (new Buffer(Math.pow(2,16))).write(parts[1]);
} else {
msg = buf.toString() + parts[0];
for (var i = 1; i <= parts.length -1; i++) {
if (i !== parts.length-1) {
msg = parts[i];
console.log('MSG: '+msg);
} else {
buf.write(parts[i]);
}
}
}
}
});
});
});
server.listen(9999);
Whenever I try console.log('MSG: ' + msg), it prints out the whole buffer, so it's useless for checking whether anything worked.
How can I handle this data properly? Would the lazy module work, even though this data is not line-oriented? Is there some other module for handling streams that are not line-oriented?
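The core of the delimiter approach in the question is keeping the unfinished tail of the stream between 'data' events and emitting only complete messages. A sketch of that bookkeeping, written in Scala to match the other examples here; Splitter is a hypothetical name. It assumes each chunk is already decoded text. When decoding raw bytes chunk by chunk, a multi-byte UTF-8 character can straddle two chunks, which Node's socket.setEncoding('utf8') handles for you.

```scala
object DelimiterFraming {
  // Hypothetical incremental splitter: feed it each chunk as it arrives and
  // it returns every message completed so far, keeping the unfinished tail
  // buffered until the next chunk. Assumes chunks are already decoded text.
  final class Splitter(delimiter: String) {
    private val pending = new java.lang.StringBuilder

    def feed(chunk: String): List[String] = {
      pending.append(chunk)
      val complete = List.newBuilder[String]
      var idx = pending.indexOf(delimiter)
      while (idx >= 0) {
        complete += pending.substring(0, idx)     // one full message
        pending.delete(0, idx + delimiter.length) // drop it plus the delimiter
        idx = pending.indexOf(delimiter)
      }
      complete.result()
    }
  }

  def main(args: Array[String]): Unit = {
    val s = new Splitter("_:_:_")
    println(s.feed("something_:_:_maybe a la"))       // List(something)
    println(s.feed("rge text_:_:_more and more data")) // List(maybe a large text)
  }
}
```

Note that the tail is carried over as-is and only complete messages are returned, so nothing is ever written into a fixed-size buffer that could be printed half-empty.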
It has indeed been said that there's extra work going on, because Node has to take that buffer and push it into V8 / convert it to a string. However, calling toString() on the buffer yourself isn't any better. As far as I know, there's no good solution to this right now, especially if your end goal is to get a string and work with it. It's one of the things Ryan mentioned at NodeConf as an area where work needs to be done.
As for the delimiter, you can choose whatever you want. A lot of binary protocols instead use a fixed-size header, which often includes the length of the payload. That way you can slice off the known header and learn about the rest of the data without having to scan the entire buffer. With a scheme like that, one can use a tool like:
node-binary - https://github.com/substack/node-binary
node-ctype - https://github.com/rmustacc/node-ctype
As an aside, buffers can be accessed via array syntax, and they can also be sliced apart with .slice().
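A minimal sketch of the fixed-header idea, in Scala to match the other examples. The wire format here, a 4-byte big-endian length followed by that many payload bytes, is an illustrative assumption and not any specific protocol; LengthPrefixFraming, encode and Decoder are hypothetical names.

```scala
import java.nio.ByteBuffer

object LengthPrefixFraming {
  // Assumed wire format (illustrative only): 4-byte big-endian length header
  // followed by exactly that many payload bytes.
  def encode(payload: Array[Byte]): Array[Byte] =
    ByteBuffer.allocate(4 + payload.length).putInt(payload.length).put(payload).array()

  // Accumulates raw chunks and emits complete payloads. The header says how
  // many bytes to wait for, so the payload itself is never scanned.
  final class Decoder {
    private var pending = Array.emptyByteArray

    def feed(chunk: Array[Byte]): List[Array[Byte]] = {
      pending = pending ++ chunk
      val out = List.newBuilder[Array[Byte]]
      var done = false
      while (!done) {
        if (pending.length < 4) done = true // header not complete yet
        else {
          val len = ByteBuffer.wrap(pending, 0, 4).getInt()
          if (pending.length < 4 + len) done = true // payload not complete yet
          else {
            out += pending.slice(4, 4 + len)
            pending = pending.drop(4 + len)
          }
        }
      }
      out.result()
    }
  }
}
```

A message may arrive split across several chunks or several messages may arrive in one chunk; the decoder handles both because it only acts once the header and the full payload are present.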
Lastly, check here: https://github.com/joyent/node/wiki/modules -- find a module that parses a simple tcp protocol and seems to do it well, and read some code.
You should use the new streams2 API: http://nodejs.org/api/stream.html
Here are some very useful examples: https://github.com/substack/stream-handbook
https://github.com/lvgithub/stick
I'll give some C-style "bracket" pseudo-code to show what I'd like to express in another way:
for (int i = 0; i < n; i++) {
if (i == 3 || i == 5 || i == 982) {
assertTrue( isCromulent(i) );
} else {
assertFalse( isCromulent(i) );
}
}
The for loop is not very important, that is not the point of my question: I'd like to know how I could rewrite what is inside the loop using Scala.
My goal is not to have the shortest code possible: it's because I'd like to understand what kind of manipulation can be done on method names (?) in Scala.
Can you do something like the following in Scala (following is still some kind of pseudo-code, not Scala code):
assert((i==3 || i==5 || i==982)?True:False)(isCromulent(i))
Or even something like this:
assertTrue( ((i==3 || i==5 || i==982) ? : ! ) isCromulent(i) )
Basically I'd like to know if the result of the test (i==3 || i==5 || i==982) can be used to dispatch between two methods or to add a "not" before an expression.
I don't know if this makes sense, so please be kind (look at my profile) :)
While pelotom's solution is much better for this case, you can also do this (which is a bit closer to what you asked originally):
(if (i==3||i==5||i==982) assertTrue(_: Boolean) else assertFalse(_: Boolean))(isCromulent(i))
Constructing names dynamically can be done via reflection, but this certainly won't be concise.
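A runnable sketch of the "choose a function, then apply it" idea from the answer above. Because JUnit may not be on the classpath, assertTrue, assertFalse and isCromulent are hypothetical local stand-ins here. Giving the if-expression an explicit function type lets Scala turn the method names into function values.

```scala
object DispatchSketch {
  // Hypothetical stand-ins for JUnit's assertTrue/assertFalse and the
  // question's isCromulent, so the sketch runs without a test framework.
  def assertTrue(b: Boolean): Unit = if (!b) throw new AssertionError("expected true")
  def assertFalse(b: Boolean): Unit = if (b) throw new AssertionError("expected false")
  def isCromulent(i: Int): Boolean = Set(3, 5, 982).contains(i)

  def check(i: Int): Unit = {
    // The `if` is an expression whose result is a function value of type
    // Boolean => Unit; it is chosen first and then applied.
    val assertion: Boolean => Unit =
      if (i == 3 || i == 5 || i == 982) assertTrue else assertFalse
    assertion(isCromulent(i))
  }

  def main(args: Array[String]): Unit = {
    (0 until 1000).foreach(check)
    println("all checks passed")
  }
}
```

With the real, overloaded JUnit methods, the explicit assertTrue(_: Boolean) form is needed to pick the one-argument overload.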
assertTrue(isCromulent(i) == (i==3||i==5||i==982))
Within the Scala type system, it isn't possible to dynamically create a method name based on a condition.
But it isn't at all necessary in this case.
val condition = i == 3 || i == 5 || i == 982
assertEquals(condition, isCromulent(i))
I hope nobody minds this response, which is an aside rather than a direct answer.
I found the question and the answers so far very interesting and spent a while looking for a pattern matching based alternative.
The following is an attempt to generalise on this (very specific) category of testing:
class MatchSet(s: Set[Int]) {def unapply(i: Int) = s.contains(i)}
object MatchSet {def apply(s: Int*) = new MatchSet(Set(s:_*))}
val cromulentSet = MatchSet(3, 5, 982)
0 until n foreach {
case i @ cromulentSet() => assertTrue(isCromulent(i))
case i => assertFalse(isCromulent(i))
}
The idea is to create ranges of values contained in MatchSet instances rather than use explicit matches.