sed script for converting ascii data file?

sed script for converting ascii data file? - sed

I used to have a way to do this, but it's been lost to time... I have one program that outputs an undeliniated ascii data file, and another program that needs the data formatted it's own way. The output contains X,Y data points in the format:
X120207Y041009
X120107Y040071
etc. ...
where each of these ordinates represents a 2.4 data point. The input file needs it as such:
X 12.0207 Y 04.1009
X 12.0107 Y 04.0071
etc. ...
Not all of the lines in the file are data points, but the ones that are start with "X", have the exact same format, and contain nothing else on that line.
All my searching for a similar conversion points toward using sed as a quick, elegant way to do this, but I never learned sed. I think before I actually wrote a c program to do this conversion but that seems like the really hard way to recreate. If anyone could bail me out, I'll owe you a bagel!

You can make use of gawk's FIELDWIDTHS to handle fixed format text:
awk -v FIELDWIDTHS="1 6 1 6" '/^X/{for(x=2;x<=5;x+=2)sub(/../,"&.",$x)}7' file
Let's see an example:
kent$ cat f
X120207Y041009
X120107Y040071
we want to leave it, no matter what it is
kent$ awk -v FIELDWIDTHS="1 6 1 6" '/^X/{for(x=2;x<=5;x+=2)sub(/../,"&.",$x)}7' f
X 12.0207 Y 04.1009
X 12.0107 Y 04.0071
we want to leave it, no matter what it is

Related

Transform a matrix of integers (0 to 30) to a matrix of emojis

I am working on transforming an image into a set of emojis, depending on how many colors are there. The Maths part is done. I have the matrix of numbers from 0 to 30, but I specifically need to convert the numbers into symbols and I was thinking about emojis since they are so used nowadays.
My question is how am I supposed to read a matrix of integers from a file, transform the matrix of integers into a matrix with different emojis (eventually, from a list of my choice) and put the output in another text file, this time containing the emojis? Is that possible? I guess it should be, but how do I do that? Does anyone have any suggestions?
The problem that I face is actually with the emojis unicode, I don't seem to have success when it comes to receiving messages on the console in their case. I just get "? ?" instead of a smiley face. But that thing happens only for them, the ASCII characters seem to work a bit better. The problem with ASCII characters is that I need, again, expressive images instead of numbers or random pipe shapes.
There is the code:
%make sure you have the "1234567.jpg" in the same location as the .m file
imdata = imread('1234567.jpg');
[X_no_dither,map] = rgb2ind(imdata,30,'nodither');
imshow(X_no_dither,map)
% and there I try to put the output in a text file
dlmwrite('result.txt',X_no_dither,'delimiter',"\t");
Ok, and the output in the text file is:
0 0 0 0 26 26 ... etc.
And I wonder how am I supposed to write the code in such a way that I will get emojis instead of numbers.
🤔 🤔 🤔 🤔 💖 💖 ... etc.
That's how I'd want the output to be like. But, from what I tried yesterday, I cannot print them without getting warnings/errors.

What you need to do is create a table with your 30 emojis (this documentation page might be helpful), then index into that table. I'm using the compose function as indicated in the page above, it should also be possible to copy-paste emojis into your M-file. If you don't see the emojis in MATLAB's console, change the font you're using.
For example:
table = [compose("\xD83D\xDE0E"),
"B", % find the utf16 encoding for your emojis, or copy-paste them in
"C",
"D",
...
];
output = table(X_no_dither + 1);
f = fopen('result.txt', 'wt');
for ii = 1:size(output, 1)
fprintf(f, '%s', output(ii, :));
fprintf(f, '\n');
end
fclose(f);
This will write the file out in UTF16 format, which is what MATLAB uses. If you're on Windows this might work well for you. On other platforms you might want to save as UTF8 instead, which can be accomplished by opening the file in UTF8 mode:
f = fopen('result.txt', 'wt', 'native', 'UTF-8');
Note that, even if you don't manage to get the emojis shown in the MATLAB command window, opening the text file in an editor will show the emojis correctly.

Reading data from .txt file into Matlab

I have been trying in vain for days to do one seemingly simple thing--I want to read data from a .txt file that looks like this:
0.221351321
0.151351321
0.235165165
8.2254546 E-7
into Matlab. I've been able to load the data in the .txt file as a column vector using the fscanf command, like so:
U=fscanf(FileID, '%e')
provided that I go through the file first and remove the space before the 'E' wherever scientific notation occurs in the data set.
Since I have to generate a large number of such sets, it would be impractical to have to do a search-and-replace for every .txt file.
Is there a way for matlab to read the data as it appears, as in the above example (with the space preceding 'E'), and put it into a column vector?
For anyone who knows PARI-GP, an alternate fix would be to have the output devoid of spaces in the first place--but so far I haven't found a way to erase the space before 'E' in scientific notation, and I can't predict if a number in scientific notation will appear or not in the data set.
Thank you!

Thank you all for your help, I have found a solution. There is a way to eliminate the space from PARI-GP, so that the output .txt file has no spaces to begin with. I had the output set to "prettymatrix". One needs to enter the following:
? \o{0}
to change the output to "Raw," which eliminates the space before the "E" in scientific notation.
Thanks again for your help.

A simple way, may not be the best, is to read line by line, remove the space and convert back to floating point number.
For example,
x = []
tline = fgetl(FileID);
while ischar(tline)
x = [x str2num(tline(find(~isspace(tline))))]
tline = fgetl(FileID);
end

One liner:
data = str2double(strsplit(strrep(fileread('filename.txt'),' ',''), '\n'));
strrep removes all the spaces, strsplit takes each line as a separate string, and str2double coverts the strings to numbers.

Octave / Matlab - Reading fixed width file

I have a fixed width file format (original was input for a Fortran routine). Several lines of the file look like the below:
1078.0711005.481 932.978 861.159 788.103 716.076
How this actually should read:
1078.071 1005.481 932.978 861.159 788.103 716.076
I have tried various methods, textscan, fgetl, fscanf etc, however the problem I have is, as seen above, sometimes because of the fixed width of the original files there is no whitespace between some of the numbers. I cant seem to find a way to read them directly and I cant change the original format.
The best I have come up with so far is to use fgetl which reads the whole line in, then I reshape the result into an 8,6 array
A=fgetl
A=reshape(A,8,6)
which generates the following result
11
009877
703681
852186
......
049110
787507
118936
So now I have the above and thought I might be able to concatenate the rows of that array together to form each number, although that is seeming difficult as well having tried strcat, vertcat etc.
All of that seems a long way round so was hoping for some better suggestions.
Thanks.

If you can rely on three decimal numbers you can use a simple regular expression to generate the missing blanks:
s = '1078.0711005.481 932.978 861.159 788.103 716.076';
s = regexprep(s, '(\.\d\d\d)', '$1 ');
c = textscan(s, '%f');
Now c{1} contains your numbers. This will also work if s is in fact the whole file instead of one line.

You haven't mentioned which class of output you needed, but I guess you need to read doubles from the file to do some calculations. I assume you are able to read your file since you have results of reshape() function already. However, using reshape() function will not be efficient for your case since your variables are not fixed sized (i.e 1078.071 and 932.978).
If I did't misunderstand your problem:
Your data is squashed in some parts (i.e 1078.0711005.481 instead
of 1078.071 1005.481).
Fractional part of variables have 3 digits.
First of all we need to get rid of spaces from the string array:
A = A(~ismember(A,' '));
Then using the information that fractional parts are 3 digits:
iter = length(strfind(A, '.'));
for k=1:iter
[stat,ind] = ismember('.', A);
B(k)=str2double(A(1:ind+3));
A = A(ind+4:end);
end
B will be an array of doubles as a result.

Gnuplot from 3D datafile with strange format

Hi I am looking for a quick and dirty answer,
I try to make a surface or contourplot with gnuplot from a data file. The problem is the fileformat:
Its all in one row
line 1 till line 32 contain the values for the x-coordinates 1-32 and y=1,
line 33 is the value for x=1 and y=2 and so on...
I tried to do that with the "everyline" command but since they are not separated by a blank line this is not working.
Since this data file is an output file and from my program I get many of them, it is not practicable to modify each of them. It would be best if I find a way to do that directly with gnuplot.
I also tried it with "sed" but I am not yet further than extracing values from a specific line to a specific line.
If someone could help me with a quick applicable solution for this problem, it would be great.

You could use awk to add the needed columns, e.g.,
splot "<awk '{print int(NR/32),NR%32,$1}' datafile" u 1:2:3 with points
I may have switched the x and y columns around

How do I read the HITRAN2012 database into MATLAB?

The HITRAN database is a listing of molecular rotational-vibrational transitions. It is given in a text file where each line is 160 characters, with fixed width fields defining molecule, isotope, etc. The format is well documented, and there is even a program on the MathWorks File Exchange that will read in the database and simulate a portion of the spectrum. However, I need to read in a specific portion of the spectrum and then use it to do some fitting to a measured spectrum, so I need something much more custom.
As given in the comment section of that function, as well as elsewhere, the following line should read each line in properly:
database = which('HITRAN2012.par');
fid = fopen(database);
hitran = textscan(fid,'%2u%1u%12f%10f%10f%5f%5f%10f%4f%8f%15c%15c%15c%15c%6c%12c%1c%7f%7f','delimiter','','whitespace','');
fclose(fid);
The first two fields denote the molecule code, which runs from 1-47, and the isotope code which runs from 1-9.
Unfortunately, molecules 1-9 do not have a leading zero, and no matter what I do, it seems to silently confuse MATLAB. If I load in the entire database and then type
unique(hitran{1})
I do not get the numbers 1-47, but I get 10-92 with a few numbers missing. As far as I can figure, when MATLAB encounters a leading space, it shifts the line over and then pads the end, so that ' 12' becomes '12', but I'm not exactly sure. I have also tried
hitran = textscan(fid,'%160c','delimiter','\n','whitespace','');
and then tried to parse the resulting strings, but that also sometimes gets confused by the first space.
For instance, the first water line looks like
exampleHitranLine = ' 14 0.007002 1.165E-32 2.071E-14.05870.305 818.00670.590.000000 0 0 0 0 0 0 7 5 2 7 5 3 005540 02227 5 2 0 90.0 90.0';
The first bit of code comes across this line and returns '14' instead of ' 1' and '4'. If I just read in a subset that only contains molecule 1 (as in this example), then the second method of reading works fine. If I try to read in the entire database, however, the lines with molecule 1-9 are shifted the the left, which messes up all the other fields.
I should note that I've tried reading the numerical fields both as floats and as integers, and neither gives satisfactory results. The entire database in text form is nearly 700 MB, and so I need something that works as efficiently as possible.
What am I doing wrong?

I have a new file on the FileExchange that will read in HITRAN 2004+ format data. Please try it out and let me know if there are any issues with it.

I don't have an answer as to why this is happening, but I do have a solution. If anyone has an answer as to why, I'd be happy to accept it.
It is the leading space that is screwing things up. MATLAB is being a little too clever, and when textscan encounters a leading space, it decides that it's extra and discards it and moves on to the next two characters. To get it to properly read in the file, I had to go line by line and test whether the first character is a space and then replace it with a leading zero, like this:
database = which('HITRAN2012_First100Lines.par');
fileParams = dir(database);
K = fileParams.bytes/162;
hitran = cell(K,19);
fid = fopen(database);
for k = 1:K
hitranTemp = fgetl(fid);
if abs(hitranTemp(1)) == 32;
hitranTemp(1) = '0';
end
hitran(k,:) = deal(textscan(hitranTemp,'%2u%1u%12f%10f%10f%5f%5f%10f%4f%8f%15c%15c%15c%15c%6c%12c%1c%7f%7f','delimiter','','whitespace',''));
end
fclose(fid);
I'm working in MATLAB 2013a. Should I consider this to be a bug and report it? Is there some reason that the leading space should be gobbled up like this?
Update:
My workaround above was slow, but worked. Then I had to process the HITEMP database, which is several times larger, so I finally did submit a support ticket to MathWorks. The workaround suggested by MathWorks technical support is to read everything in as text and then convert. This saves a lot of disk reads and works.
fileParams = dir(database);
fid = fopen(database);
hitran = textscan(fid,'%2c%1c%12c%10c%10c%5c%5c%10c%4c%8c%15c%15c%15c%15c%6c%12c%1c%7c%7c','delimiter','','whitespace','');
fclose(fid);
moleculeNumber = uint8(str2num(hitran{1}));
isotopologueNumber = uint8(str2num(hitran{2});
vacuumWavenumber = str2num(hitran{3});
...
etc.
Depending on the application, for larger databases one would probably want to do this in chunks rather than all at once.
He also said he would forward the behavior to the development team for consideration in a future update.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse