I'm trying to read a CSV and gather two types of information.
I need to read it once to get the number of lines the CSV contains.
Then I need to go back to the top of the CSV and re-read it to retrieve the values.
I'm looking for something similar to Java's seek(0), but for BufferedSource.
I've tried closing and re-opening the buffer, but I think I'm overthinking it.
I also read through the Source library code, but nothing came close to what I'm looking for.
PS: I know this is not optimized; however, the first read basically lets me create an array (rather than a list) in which to store the values I'd retrieve from the second read.
I would take any clue you might have to offer !
Thanks a lot
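For reference, this is roughly what I have now: a minimal sketch of the close-and-reopen approach, with the file name made up.

import scala.io.Source

// First pass: count the lines (size consumes the iterator).
val countSource = Source.fromFile("data.csv")
val lineCount = countSource.getLines().size
countSource.close()

// Second pass: BufferedSource has no seek(0), so open a fresh Source.
val values = new Array[String](lineCount)
val readSource = Source.fromFile("data.csv")
for ((line, i) <- readSource.getLines().zipWithIndex)
  values(i) = line
readSource.close()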
First of all, apologies if this is a stupid question. I'm new to unit tests so I'm struggling a bit here.
I'm working on an app that queries an API, receives a JSON response and then processes that response to produce a series of complex data structures. Many of these data structures are daily time series, which means each of my functions produces a list (List<Datapoint>) containing hundreds of datapoint objects.
What I'm trying to test is that, for a given API response, each function produces the output it should.
For the input of each test I have already grabbed a sample, real JSON response from the API, and I've stored it inside a test_data folder within my root test folder.
However, for the expect part... how can I obtain a sample output from my function and store it somewhere in my test_data folder?
It would be straightforward if the output of my function were a string, but in this case we're talking about a list with hundreds of custom objects containing different values inside them. The only way to create those objects is through the function itself.
I tried running the debugger to check the value of the output at runtime, which I can do... but that doesn't help me copy it or store it anywhere as code.
Should I try to print the full contents of the output to a string at runtime and store that string? I don't think this would work: all I see in the console is a bunch of "Instance of ..." entries when I call functionOutput.toString(), so I would probably need to recursively print each of the fields inside those objects.
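To illustrate, here is a hypothetical sketch in Dart (the Datapoint fields and paths are made up): if each object exposed a toJson(), I could serialise the whole list once and keep the result as the expected output.

import 'dart:convert';
import 'dart:io';

class Datapoint {
  final DateTime date;
  final double value;
  Datapoint(this.date, this.value);
  // Made-up fields; jsonEncode calls toJson() on every element.
  Map<String, dynamic> toJson() =>
      {'date': date.toIso8601String(), 'value': value};
}

void main() {
  final List<Datapoint> output = [Datapoint(DateTime(2021, 1, 1), 42.0)];
  // Dumps real values instead of "Instance of 'Datapoint'".
  // Assumes test/test_data/ already exists.
  File('test/test_data/expected_output.json')
      .writeAsStringSync(jsonEncode(output));
}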
Please tell me I'm being stupid and there's a simpler way to do this :)
I am working on Data Factory and was wondering whether there are any activities that just "move files" without actually reading them, rather than "Copy data" (which seems to do a read operation)?
I am trying to move files, if any exist, from one folder to another, and when there are many files, the process is slow because Copy data reads each file.
Any suggestions? This is how my current data source looks, and all I want to do is: if any CSV file exists at the location, move it without reading it, per se.
So here is the Microsoft link I followed to move files:
https://learn.microsoft.com/en-us/azure/data-factory/solution-template-move-files
This tutorial is not very detailed when it comes to explaining everything; for example, it assumes that the user needs parameters. I did as it said, but my datasets already pointed to exactly where the files need to be picked up and where they should land, so I left the parameters empty. Neither debugging nor running the trigger moved a file; the solution didn't work.
I had to remove the parameters created in the template to make this work, in case it's helpful to someone. The file move started happening after that.
So, lesson learned: empty parameters won't work. If you don't need them, remove them.
Also, I watched this tutorial, in case it's helpful to someone:
https://www.youtube.com/watch?v=u_X_f4z8zoQ
I have a val dataset: Dataset[FeedData], where FeedData is something like case class FeedData(feed: String, data: XYZ).
I want to avoid post-processing the files, so I decided to call dataset.repartition($"feed").write.json("s3a://..."), so that each feed ends up in a different file. The problem is that the files are still named along the lines of part-XXXX, so I can't easily pick out the relevant file for a given feed without a) opening them all to check the values of feed inside, or b) post-processing the files to be friendlier.
I want the files to look like part-XXXX-{feed} instead of part-XXXX
Is it possible to dynamically name the partition files based on the value of the column feed used to partition the dataset?
Background:
I found this answer which mentions a saveAsNewAPIHadoopFile() method, where I can extend some relevant classes for my own file naming implementation.
Can anybody help me understand this method, how to access it from a Dataset, and tell me whether it's possible to project the required information (feed) into my implementation to dynamically name the partitions?
I was trying to do it the wrong way:
dataset.repartition($"colName").write.format("json").save(path)
The correct way to do this is:
dataset.write.partitionBy("colName").format("json").save(path)
The difference is that partitionBy is called on the DataFrameWriter (i.e., after .write) rather than on the Dataset itself, and it writes one directory per distinct value of the column. The resulting directories look like: colName=value/part-XXXX.
See here for more info.
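A minimal, self-contained sketch of the working version (the app name, paths, and sample values are made up, and data is simplified to a String):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("feeds").getOrCreate()
import spark.implicits._

case class FeedData(feed: String, data: String)
val dataset = Seq(FeedData("a", "x"), FeedData("b", "y")).toDS()

// Writes one directory per feed value:
//   s3a://bucket/out/feed=a/part-XXXX...
//   s3a://bucket/out/feed=b/part-XXXX...
dataset.write.partitionBy("feed").format("json").save("s3a://bucket/out")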
I am trying to save a variable in .mat format, updating and appending the new contents of this variable each time a loop finishes, to avoid memory blow-up. I have searched a bit and think the best way is structures, but I still think there should be a more direct way to do it. Can anybody give me an example of how to do that?
You could try using matfile to write without loading the file.
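A minimal sketch (the file and variable names are made up): this appends one block of rows per loop iteration without loading results.mat into memory.

m = matfile('results.mat', 'Writable', true);   % creates a v7.3 MAT-file
m.x = zeros(0, 1);                              % create the variable once
for k = 1:10
    newValues = rand(5, 1);                     % stand-in for this iteration's data
    n = size(m, 'x', 1);                        % rows currently on disk
    m.x(n+1:n+numel(newValues), 1) = newValues; % grow the on-disk variable
end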
I am a newbie learning SML, and the question I have been given involves I/O functions that I do not understand.
Here are the two questions that I really need help with to get me started. Please provide me with code and some explanation; I will be able to use trial and error with the given code for the other questions.
Q2) readlist(filename), which reads a list of filenames (each of which was produced by listdir in Q1) and combines them into one large list.
(It reads from the text file produced in Q1 and then puts the contents into one big list containing all the information.)
Thing is, my lecturer at school only covered the introduction section; there wasn't even a system input or output example shown, and not even the "use file" function was taught. If anyone who knows SML sees this, please help. Thanks to anyone who takes the effort to help me.
Thanks for the reply. Currently I am using SML/NJ to try to do this. Basically, Q1 requires me to list the files of the directory given in "directoryname" into a text file named "filename". Q2 requires me to read from the "filename" text file and then place the contents into one large list.
BTW, if you're reading this post, please feel free to ask questions as well. Currently I am stuck trying to read from the txt file and append it to a list; I am able to do it for a single line but am now trying to do it for the whole file:
fun readlist (infile : string) =
  let
    val ins = TextIO.openIn infile
    (* inputLine returns SOME line, or NONE at end of file,
       so this still reads only the first line *)
    fun listing () = TextIO.inputLine ins
    val line = listing ()
  in
    TextIO.closeIn ins;  (* actually close the stream *)
    line
  end;
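For example, in the SML/NJ REPL (assuming a file files.txt exists, and making up its first line):

- readlist "files.txt";
val it = SOME "first line of files.txt\n" : string option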
It is very hard for me to make out what questions you are trying to ask.
The functions you ask about are not part of the Standard Basis Library for ML. If you are supposed to write them, you are going to have a hard time without some kind of Posix module. You can tell your instructor I didn't care for this assignment.
Moscow ML contains a listDir function which is admirably simple:
- load "Mosml";
> val it = () : unit
- Mosml.listDir ".";
> val it = ["natural-semantics.djvu", "natural-semantics.pdf"] : string list
-
To get more help, please be a little clearer what you are asking.
EDIT: Since it's a homework question I shouldn't just give you the answer, but some useful functions include openDir, readDir, and closeDir from the OS.FileSys structure. These will tell you what's in the directory. Then, to read and write files, you'll want TextIO.
You'll find the Standard Basis Library documentation indispensable.
You sure i didn't teach u?
u owe me one chicken pie.