Scala file reading adding spaces - scala

I'm reading a file in scala using
def fileToString(that: String): String = {
  var x: String = ""
  for (line <- Source.fromFile(that).getLines) {
    x += line + "\n"
  }
  x
}
This works fine for a Scala file, but on a .txt file it adds spaces between every character. For example, I read in a .txt file and get this:
C a l l E v e n t L o g ( E r r o r $ , E r r N u m , E r r O b j )
' E n d E r r o r h a n d l i n g b l o c k .
E n d S u b
When I read in the Scala source file for the program, it comes out normally.
EDIT: It seems to have something to do with encoding. When I change it to UTF-16, it reads the .txt file correctly, but not the Scala file. Is there a way to make it work universally?

No, it can't work for all files. To read or interpret a file you need to know its format and encoding, unless you're treating it as a binary blob.
Either save all files in the usual Unicode encoding (UTF-8) or specify the encoding when reading the file.
fromFile takes an implicit codec, which you can also pass explicitly:
io.Source.fromFile("123.txt")(io.Codec("UTF-16"))
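A minimal sketch of why the "spaces" appear, written in Python purely for illustration (the sample text is an assumption, not the asker's file). UTF-16 stores each ASCII character as two bytes, one of which is zero, so decoding those bytes with a single-byte charset leaves a NUL character after every letter, and many consoles render that NUL as a space.

```python
# Hypothetical sample text; any ASCII string shows the same effect.
text = "End Sub"

# UTF-16LE stores each ASCII character as its byte followed by 0x00.
raw = text.encode("utf-16-le")

# Decoding with a single-byte charset keeps the zero bytes as NUL characters.
wrong = raw.decode("latin-1")

# Decoding with the encoding the bytes were written in recovers the text.
right = raw.decode("utf-16-le")

print(repr(wrong))
print(repr(right))
```

The wrongly decoded string is exactly twice as long as the original, which matches the symptom of one extra character between every pair of letters.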

In general, if you read from a file you need to know its encoding in order to read the characters correctly. As far as I know, Scala falls back to the JVM's default charset when no codec is given (UTF-8 on many systems, but not all), so you can either pass a Codec to fromFile or specify the encoding as a string:
io.Source.fromFile("file.txt", "utf-8")

It's hard to be sure, but it sounds like the two files were written with different encodings. On any Unix system (including Mac) you can use the od command to look at the actual bytes in the file.
UTF-8 is the standard for ordinary text files on most systems, but if you have a mix of UTF-8 and UTF-16, you'll have to know which encoding applies to which file and specify it correctly when reading.
Alternatively, be more careful when you create the files to ensure that they are all in the same format.
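One partial workaround, sketched here in Python under stated assumptions: many UTF-16 files start with a byte-order mark (BOM), so the first bytes can be inspected to pick the encoding. The helper names `sniff_encoding` and `decode_text` are hypothetical, the UTF-8 fallback is an assumption, and files without a BOM cannot be told apart this way, so this is a heuristic rather than a universal fix.

```python
import codecs


def sniff_encoding(data: bytes, default: str = "utf-8") -> str:
    """Guess an encoding from a byte-order mark, falling back to `default`."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and drops the BOM
    # UTF-32 is deliberately not handled in this sketch.
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"  # the codec reads the BOM to choose the byte order
    return default  # assumption: BOM-less files are UTF-8


def decode_text(data: bytes) -> str:
    """Decode raw file bytes using the sniffed encoding."""
    return data.decode(sniff_encoding(data))
```

Typical use would be to read the file in binary mode and pass the bytes to `decode_text`; when the encoding is known in advance, specifying it explicitly remains the more reliable option.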

Related

PowerShell: Read id3v2 from MP3, specifically ISRC

I'm ultimately aiming to get a hashtable of the path and ISRC of all the MP3 files in my music library for use in organising my library. Right now, I am having trouble getting the ISRC information out of the files. I have checked it is there using other software, but I particularly need to read it using powershell.
I've tried using a few Get-FileMetaData functions, but I think I was looking in the wrong place with that attempt.
Instead of reading it the 'proper' way, I attempted to just read the file as plain text with Get-Content and manipulate the string to isolate the ISRC, which I can find when viewing the file in Notepad. The difficulty I ran into is handling the way the text is encoded (if that is the right word). There are whitespace characters in between the characters when viewed in Notepad, which don't show up in PowerShell but still seem to count toward string length.
I would try to provide some code, but all I've had are dead ends, and I think the issue is in my understanding of what I'm working with. If I've skipped over any important information, please let me know. Tagged with unicode on a vague hunch that the string manipulation involves unicode.
So, how can I properly read the ID3v2 tags using PowerShell (by properly I mean without bodgy string manipulation)? Or how can I interpret the raw file contents using PowerShell, i.e. deal with the special characters and whitespace?
Thanks very much.
Raw content example: (Where the piece of interest is the text following 'TSRC')
ID3 >1TCON ) ÿþS i n g e r & S o n g w r i t r TRCK 1 TPOS 1 TIT2 ÿþv a l e n t i n e TPE1
ÿþD a f n a TXXX ÿþA R T I S T S ÿþD a f n a TALB ÿþv a l e n t i n e TPE2
ÿþD a f n a TLEN 151000TPUB # ÿþM a r g a l i t R e c o r d s TSRC ÿþQ Z 8 L D 1 9 8 6 2 3 3 TXXX - ÿþB A R C O D E ÿþ1 9 3 6 6 4 6 1 1 6 0 3 TYER 2019TDAT 0702APIC ‰ image/jpeg cover ÿØÿà JFIF H H ÿÛ C
Maybe this answer, "Access Music File Metadata in Powershell", which uses taglib.dll, can help you too.
Get-Content has an -Encoding parameter.
If you can work out the encoding of those files, just pass it to that parameter.
It's also worth checking your PowerShell version. I believe the default encoding behaviour changed between versions 5 and 6.
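For the "interpret the raw contents" route, here is a rough sketch of walking the ID3v2 frames directly, written in Python rather than PowerShell. It assumes an ID3v2.3 or ID3v2.4 tag with no extended header and no unsynchronisation, and the function names are hypothetical. In the raw dump above, the ÿþ before each string is the bytes FF FE, a UTF-16 byte-order mark, which is why every second byte of the text is zero.

```python
import struct


def _synchsafe(b: bytes) -> int:
    """Decode a 4-byte synchsafe integer (7 useful bits per byte)."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def _decode_text(payload: bytes) -> str:
    """Decode a text frame body; its first byte names the encoding."""
    encodings = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}
    if not payload:
        return ""
    codec = encodings.get(payload[0], "latin-1")
    return payload[1:].decode(codec, errors="replace").rstrip("\x00")


def read_text_frames(data: bytes) -> dict:
    """Return {frame_id: text} for the 'T...' frames of an ID3v2.3/2.4 tag."""
    if len(data) < 10 or data[:3] != b"ID3" or data[3] not in (3, 4):
        return {}
    major = data[3]
    end = min(10 + _synchsafe(data[6:10]), len(data))
    pos, frames = 10, {}
    while pos + 10 <= end:
        frame_id = data[pos:pos + 4]
        if frame_id == b"\x00\x00\x00\x00":  # padding reached
            break
        size_bytes = data[pos + 4:pos + 8]
        # v2.4 frame sizes are synchsafe; v2.3 uses a plain big-endian integer.
        size = _synchsafe(size_bytes) if major == 4 else struct.unpack(">I", size_bytes)[0]
        body = data[pos + 10:pos + 10 + size]
        # TXXX frames carry a description plus a value, so they are skipped here.
        if frame_id.startswith(b"T") and frame_id != b"TXXX":
            frames[frame_id.decode("ascii", errors="replace")] = _decode_text(body)
        pos += 10 + size
    return frames
```

Given the bytes of an MP3 file, `read_text_frames(data).get("TSRC")` would return the ISRC if such a frame is present. A maintained tag library is likely the safer choice for a whole music collection, since this sketch ignores several optional parts of the format.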

Trouble Importing Microsoft CSV(Historical-Search) into powershell

When I try to load the result of a Historical Search (through EAC) into my PowerShell script, I get whitespace between every letter in the result. For example, what looked like this in the original CSV:
Header1, Header2, Header3,
Content1, Content2, Content3
now looks like this:
" H e a d e r 1 ", " H e a d e r 2 ", " H e a d e r 3 ",
" C o n t en t 1", ...
I already tried re-downloading the files, creating a DataTable, and so on, but nothing works because the data itself is wrong.
When I open the CSV in Editor, the whitespace isn't there.
Select statements also don't work.
BTW, the same thing works for the traditional message trace CSV:
$trace = Import-Csv W:\Path.csv
If anybody knows what might cause this, I would love to know the fix, since it's driving me crazy.
Update: I checked the CSV in Notepad++ and these are not whitespace characters, but \0 values.
Any ideas how they got there and why they are there?
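A \0 after every character is what UTF-16 text looks like when it is read as a single-byte encoding. That the Historical Search export is UTF-16 is an assumption here, not something the thread confirms. Under that assumption, decoding with the right encoding before parsing removes the problem, as this Python sketch shows (the helper name `read_csv_bytes` is hypothetical). Import-Csv also accepts an -Encoding parameter, which would be the place to try the same idea in PowerShell.

```python
import csv
import io


def read_csv_bytes(data: bytes, encoding: str = "utf-16") -> list:
    """Decode CSV bytes with an explicit encoding and return the rows as lists."""
    text = data.decode(encoding)
    # skipinitialspace drops the blank that follows each comma in the sample.
    reader = csv.reader(io.StringIO(text, newline=""), skipinitialspace=True)
    return [row for row in reader]


# Hypothetical sample modelled on the headers quoted in the question.
sample = "Header1, Header2, Header3\r\nContent1, Content2, Content3\r\n".encode("utf-16")
rows = read_csv_bytes(sample)
print(rows)
```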

How do I export a matrix in MATLAB?

I'm trying to export a matrix f that is double. My data in f are real numbers in three columns. I want a txt file as an output with the columns separated by tabs. However, when I try the dlmwrite function, just the first column appears as output.
for k = 1:10
    f = [idx', firsttime', sectime'];
    filename = strcat(('/User/Detection_rerun/AF_TIMIT/1_state/mergedlabels_train/'),(files_train{k,1}),'.lab');
    dlmwrite(filename,f,'\t') ;
end
When I use dlmwrite(filename,f,'\t','newline','pc'); I keep getting the error "Invalid attribute tag: \t". I even tried 'tab' instead of '\t', but a similar error appears. Please let me know if you have any suggestions. Thank you.
This is because you are not calling dlmwrite properly. To specify the delimiter alongside other options, you must pass the 'delimiter' flag followed by the specific delimiter you want, which in your case is '\t'. In other words, you need to do this:
for k = 1:10
    f = [idx', firsttime', sectime'];
    filename = strcat(('/User/Detection_rerun/AF_TIMIT/1_state/mergedlabels_train/'),(files_train{k,1}),'.lab');
    dlmwrite(filename,f,'delimiter','\t') ;
end
BTW, you are using the newline flag with 'pc', which terminates each line with a carriage return plus line feed (\r\n), the Windows convention. I suggest you leave this out and let MATLAB choose the line terminator itself. Only force the newline characters if you know what you're doing.
FWIW, the MATLAB documentation is pretty clear about delimiters and other quirks about the function: http://www.mathworks.com/help/matlab/ref/dlmwrite.html

What's the most efficient way to decode a UTF16 binary?

As Rebol 3 supports Unicode, and UTF-16 is used internally when needed (a string containing only ASCII characters is stored as ASCII), it should be as simple as copying the memory content from the binary and setting up the REBVAL structure. However, the only way I have found is to iterate over the binary and convert each character individually.
Same question applies to encoding a string in UTF16.
OK, there doesn't seem to be an easy way to do it. So I just added two codecs UTF-16LE/BE for this purpose. See this commit: https://github.com/zsx/r3/commit/630945070eaa4ae4310f53d9dbf34c30db712a21
With this change, you can do:
>> b: encode 'utf-16le "hello"
== #{680065006C006C006F00}
>> s: decode 'utf-16le b
== "hello"
>> b: encode 'utf-16be "hello"
== #{00680065006C006C006F}
>> s: decode 'utf-16be b
== "hello"
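As a cross-check of the byte layouts shown above, the same round trip can be reproduced with Python's built-in codecs. This is only a comparison, not part of the Rebol change.

```python
# Encode the same string in both UTF-16 byte orders.
le = "hello".encode("utf-16-le")
be = "hello".encode("utf-16-be")

# Print the bytes as upper-case hex to compare with the Rebol binaries.
print(le.hex().upper())
print(be.hex().upper())

# Decoding with the matching byte order restores the original string.
print(le.decode("utf-16-le"), be.decode("utf-16-be"))
```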

Reading in a file in io programming language

I'm looking to read in a simple text file using the Io language and print it to the screen.
So far I have:
f := File with("test.txt")
f openForReading
but I have no idea how to print it or copy the contents into an object. If anyone knows anything or could point me in a good direction, it would be much appreciated.
Turns out it's very simple: just f contents. For future reference, to check which methods already exist for an object in Io, you can use protos, e.g. f protos
From the io> interactive shell, have you tried?
f print
or
doString(f)
See this blog
Use readLine to read one line to a string, and println to print.
f := File with(fileName)
f openForReading
l := f readLine
l println
Create a File object with your specified path:
fileName := "yourFileName.txt"
file := File with(fileName)
Open the file and read it into a variable:
file open
fileText := file readToEnd
Then close the file:
file close
You should then have the 'fileText' variable available for use.