How to avoid unicode string in python3? - unicode

I have read a csv file in which there are many values which have been read as unicode. While doing my data processing, I want to skip these values.
Existing Code:
if aString.encode('ascii', 'ignore') == b'':
continue
The same code is working when i am trying this in python environment. But it is not working in script(.py file).

Related

Keeping whitespace in csv headers (Matlab)

So I'm reading in a .csv file, and it all works as I want bar one thing. The headers of the data have spaces, which I want later for displaying data to the user. However, these spaces get stripped when the csv file is read in via readtable (as they get used as the variable names). Again, no problem with this per se, but I still need the unmodified strings as well.
Two additional notes:
I'm happy for the strings to be stored separately from the main table if that makes things easier.
The actual .csv file I'm reading in is reasonably large (about 2 million data points) so from a computational cost side of things, the less reading of the file the better
Example read in code:
File = 'example.csv';
Import_Options = detectImportOptions( File, 'NumHeaderLines', 0 );
Data = readtable( File )
Example csv file (example.csv):
"this","is","an","example test"
"1","1","2","3"
"3","1","4","1"
"hot","hot","cold","hot"
You can simply read the first line with fgetl, thus grabbing the headers, before reading the entire file with readtable.

Extracting data from complex output text file using perl and placing into new text file

The complete output text file is hundreds of lines long, with relevant nuclear cross sections and a plethora of other data that I do not need for this particular problem. I am trying to extract the columns of data under "BURNUP" and the first "K-INF" from the file I attached. I am trying to extract this data and place it into a separate file. I am a newbie, and have a similar perl script from a professor. I have tried to adapt it to the information I am looking for but the only result I am receiving are the 2 print statements. Any suggestions?

Xlsread returning zero values....?

I am getting zero values while using xlsread command in MATLAB.I am using a real world dataset taken from UCI repository which has got both integer and float values.
[Train,textData,rawData] = `xlsread('C:\Users\pooja\Documents\project\breastcancer.csv');`
I have tried with xls format too..
[Train,textData,rawData] = xlsread('C:\Users\pooja\Documents\project\breastcancer.xls');
Thanx in Advance..!
In the wide world of computers, there are a lot of data formats. You need to remember that data formats are different from each other. Generally software like Matlab allows you to open different types of data formats. Each one of course with its own function.
You can guess that the function xmlread is to read XML files. If you want to read csv files or any other type of file in the world, please (I think this is obvious) do not use xmlread!
Specifically to open csv files matlab has csvread. Please, do not use csv read to open files that are not CSV.....

Saving a string to HDFS creates line feeds for each character

I have a plain text file that I am reading from my local system that I am uploading to HDFS. I have Spark/Scala code that reads the file in, converts the file to a string, and then i use saveAsTextFile function to specify my HDFS path where I want the file to be saved. Note I am using the coalesce function because I want one file saved, rather than the file getting split.
import scala.io.Source
val fields = Source.fromFile("MyFile.txt").getLines
val lines = fields.mkString
sc.makeRDD(lines).coalesce(1, true).saveAsTextFile("hdfs://myhdfs:8520/user/alan/mysavedfile")
The code I have saves my text successfully to HDFS, unfortunately though, for some reason each character in my string has a line feed character after it.
Is there a way around this?
I wasn't able to get this working exactly as I wanted, but I did come up with a work around. I saved the device locally and then called a shell command through Scala to upload the completed file to HDFS. Pretty straightforward.
Would still appreciate if anyone could tell me how to copy a string directly to a file in HDFS though.

MATLAB: How to import multiple CSV files with mixed data types

I have just started learning MATLAB and have difficulties to import csv files to a 2-D array..
Here is a sample csv for my needs:(all the csv files are in the same format with fixed columns)
Date, Code, Number....
2012/1/1, 00020.x1, 10
2012/1/2, 00203.x1, 0300
...
As csvread() only works with integer numbers, should I import numeric data and text data separately or is there any quick way to import multiple csv files with mixed data types?
Thanks a lot!!
What you're looking for is maybe the function xlsread.
It opens any file recognized by Excel, and automatically separates text data from numerical data.
The problem is that the default delimiter for at least on my computer is ;, and not , (at least for my locale here in Brazil). So xlsread will try to separate the fields on the file with a ;, and not a comma as you'd like.
To change that you have to change your system locales to add the comma as the list separator. So if you feel like it, to do it in windows vista, click Start, Control Panel, Regional and Language Options, Customize this format, and change the List Separator from ';' to ','. On other windows the process should be almost the same.
After doing that, typing:
[num, txt, all] = xlsread('your_file.csv');
will return something like:
num =
10
300
txt =
'01/01/2012' ' 00020.x1'
'02/01/2012' ' 00203.x1'
all =
'01/01/2012' ' 00020.x1' [ 10]
'02/01/2012' ' 00203.x1' [300]
Notice that if your locale has already the list separator set to ',', you won't have to change anything on your system to make that work.
If you don't want to change your system just to use the xlsread function, then you could use the textscan function described here: http://www.mathworks.com/help/techdoc/ref/textscan.html
The problem is that it is not as simple as calling it, as you will have to open the file, iterate on the lines, and tell matlab explicitly the format of your file.
Best regards
I recently wrote a function that solves exactly this problem. See delimread.
It's worth noting that xlsread on csv files only works in windows. On Linux or Mac, xlsread works in 'basic' mode which cannot read csv files. It might not be a great idea in the longrun to use xlsread in case you need to migrate across platforms or automate code runs on Linux servers.
xlsread is also much slower than other text parsing functions since it opens an Excel session to read the file.