MATLAB - read text file and extract data from a specific row

I have a text file which is an output from the command line of a diff program.
I now read the file using
fileID = fopen('runInfo.out','r');
file_dump = textscan(fileID, '%s', 'Delimiter', '\n');
% Find position of Min Cp to mark field of interest
Row = find(~cellfun('isempty',strfind(file_dump{1}, 'Minimum Viscous Cp')));
fclose(fileID);
I now have the row from which I need to extract data.
The format of the file from this location looks something like
.OPERv c>
u/Uinf = 1.0050 v/Uinf = -0.0029
q/Uinf = 1.0050 Cp = -0.0100
.OPERv c>
u/Uinf = 1.0088 v/Uinf = -0.0075
q/Uinf = 1.0088 Cp = -0.0177
.OPERv c>
u/Uinf = 1.0156 v/Uinf = -0.0281
q/Uinf = 1.0160 Cp = -0.0323
Since I already have this data in my cell array from textscan, what I could think of was this (pretty non-robust):
u_line = vertcat(file_dump{1,1}...
{find(~cellfun('isempty',strfind(file_dump{1}, 'u/Uinf'))),1})
v = str2num(u_line(:,end-5:end));
and then somehow extract the numbers from these returned cells?
In the end, I need the four values of u/Uinf, v/Uinf, q/Uinf and Cp.
Is there a simpler option that I am missing?

If the text file from which to extract the data is the one you posted in the second box, you can use an AWK script.
The following AWK script reads the text file and generates as output a MATLAB .m file containing the values extracted from the text file as arrays.
AWK script
BEGIN {
u_cnt=1;
v_cnt=1;
c_cnt=1;
q_cnt=1;
}
{
if($1 == "u/Uinf")
{
u_Uinf[u_cnt]=$3
u_cnt++
}
if($4 == "v/Uinf")
{
v_Uinf[v_cnt]=$6
v_cnt++
}
if($4 == "Cp")
{
cp[c_cnt]=$6
c_cnt++
}
if($1 == "q/Uinf")
{
q_Uinf[q_cnt]=$3
q_cnt++
}
}
END {
print "u_Uinf_data=[" > "txt_file_data.m"
for(i=1;i<u_cnt;i++)
print u_Uinf[i] > "txt_file_data.m"
print "];" > "txt_file_data.m"
print "v_Uinf_data=[" > "txt_file_data.m"
for(i=1;i<v_cnt;i++)
print v_Uinf[i] > "txt_file_data.m"
print "];" > "txt_file_data.m"
print "q_Uinf_data=[" > "txt_file_data.m"
for(i=1;i<q_cnt;i++)
print q_Uinf[i] > "txt_file_data.m"
print "];" > "txt_file_data.m"
print "cp_data=[" > "txt_file_data.m"
for(i=1;i<c_cnt;i++)
print cp[i] > "txt_file_data.m"
print "];" > "txt_file_data.m"
}
Output .m file
u_Uinf_data=[
1.0050
1.0088
1.0156
];
v_Uinf_data=[
-0.0029
-0.0075
-0.0281
];
q_Uinf_data=[
1.0050
1.0088
1.0160
];
cp_data=[
-0.0100
-0.0177
-0.0323
];
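If you would rather avoid AWK, the same extraction can be sketched in a few lines of Python. This is a hedged sketch, not part of the original answer: the inline `text` string stands in for the relevant section of runInfo.out, and the regex assumes the `name = value` layout shown in the question.

```python
import re

# Inline stand-in for the relevant section of runInfo.out (assumed layout).
text = """.OPERv c>
u/Uinf = 1.0050    v/Uinf = -0.0029
q/Uinf = 1.0050    Cp = -0.0100
.OPERv c>
u/Uinf = 1.0088    v/Uinf = -0.0075
q/Uinf = 1.0088    Cp = -0.0177
"""

def extract(name, text):
    # Match every "name = <float>" pair; re.escape because the name contains '/'.
    pat = re.compile(re.escape(name) + r"\s*=\s*(-?\d+\.\d+)")
    return [float(m) for m in pat.findall(text)]

u_Uinf = extract("u/Uinf", text)
v_Uinf = extract("v/Uinf", text)
q_Uinf = extract("q/Uinf", text)
cp = extract("Cp", text)
print(u_Uinf, v_Uinf, q_Uinf, cp)
```

The same `findall`-per-name approach also works directly on the file contents (`open('runInfo.out').read()`), so no intermediate .m file is needed.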

Related

Format of the date field gets changed through Spreadsheet::ParseExcel

I have an Excel Sheet (A.xls) which has following content:
Date,Value
10/1/2020,36.91
10/2/2020,36.060001
I got the following output using the same script with Perl v5.6.1 on Solaris 5.8:
>>./a4_test.pl
INFO>Excel File=A.xls,#WorkSheet=1,AuthorID=Sahoo, Ashish
DEBUG>row 2 - col 0:10-2-20
DEBUG>row 2 - col 1:36.060001
And I got a different output for the date field using the same script with Perl v5.26.3 on Solaris 5.11:
>>./a4_test.pl
INFO>Excel File=A.xls,#WorkSheet=1,AuthorID=Sahoo, Ashish
DEBUG>row 2 - col 0:2020-10-02
DEBUG>row 2 - col 1:36.060001
I used 0.2602 version of Spreadsheet::ParseExcel on Solaris 8 machine and 0.65 version on Solaris 11 machine.
Why am I getting different output while reading date field from
excel sheet through Spreadsheet::ParseExcel module?
#!/usr/perl/5.12/bin/perl -w
use Spreadsheet::ParseExcel;
my $srce_file = "a.xls";
my $oExcel = new Spreadsheet::ParseExcel;
my $oBook = $oExcel->Parse($srce_file);
my %hah_sheet = ();
my $header_row = 1;
my($iR, $iC, $oWkS, $oWkC);
my $book = $oBook->{File};
my $nsheet= $oBook->{SheetCount};
my $author= $oBook->{Author};
unless($nsheet){
print "ERR>No worksheet found for source file:$srce_file\n";
exit 0;
}
else{
print "INFO>Excel
File=$srce_file,#WorkSheet=$nsheet,AuthorID=$author\n";
}
for(my $iSheet=0; $iSheet < $oBook->{SheetCount} ; $iSheet++) {
next if($iSheet >0);
$oWkS = $oBook->{Worksheet}[$iSheet];
my $rows = 0;
for(my $iR = $oWkS->{MinRow}; defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ; $iR++) {
$rows++;
my $str_len = 0;
for(my $iC = $oWkS->{MinCol}; defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol}; $iC++) {
$oWkC = $oWkS->{Cells}[$iR][$iC];
next if ($iR <$header_row);
if (defined($oWkC)){
my $cell_value = $oWkC->Value;
$cell_value =~s/\n+//g; #removed newline inside the value
#
##if the first column at header row is null then skip. Column might be shifted
if($iR==$header_row && $iC == 0){
last unless($cell_value);
}
if($iR == $header_row){
$hah_sheet{$iR}{$iC} = uc($cell_value);
}else {
$hah_sheet{$iR}{$iC} = $cell_value;
$str_len += length($cell_value);
##View cell value by row/column
print "DEBUG>row ${iR} - col ${iC}:$cell_value\n";
}
}else{
$hah_sheet{$iR}{$iC} = ""; #keep position for NULL value
}
} # END of Column loop
} # END of Row loop
} # END of Worksheet
If you search for "date" in Changes, you see this:
0.33 2008.09.07
- Default format for formatted dates changed from 'm-d-yy' to 'yyyy-mm-dd'
This explains why you see different date formats between versions 0.2602 and 0.65 of Spreadsheet::ParseExcel.
If you always want your code to print the same format regardless of which version you are using, you could transform the date in your code. For example, if you always want to see yyyy-mm-dd:
$cell_value =~ s/^(\d+)-(\d+)-(\d+)$/sprintf '%04d-%02d-%02d', 2000+$3, $1, $2/e;
Or, vice versa:
$cell_value =~ s/^(\d+)-(\d+)-(\d+)$/sprintf '%0d-%0d-%02d', $2, $3, $1-2000/e;

How to make an input a file name?

I would like to take an input from the user and use this input as the name of a file which my code will then write to.
I would appreciate it if you could help me with this.
Thanks in advance!
MATLAB
filename = input('File name: ','s');
fileID = fopen(filename, 'w')
fprintf(fileID, 'That wasn''t so hard')
fclose(fileID)
input - Request user input
fopen - Open file, or obtain information about open files
fprintf - Write data to text file
Python
filename = raw_input('File name: ')  # Python 2; use input() in Python 3
with open(filename, 'w') as f:
f.write('That wasn\'t so hard')
C
#include <stdio.h>
int main() {
char filename[81];
FILE* f;
while (1) {
printf("File name: ");
if (scanf("%80s", filename) == 1) break;
}
f = fopen(filename, "w");
if (!f) {
perror("opening file");
return 1;
}
fprintf(f, "That wasn't so hard\n");
fclose(f);
return 0;
}
sh script
#!/bin/sh
printf "File name: "
read filename
echo "That wasn't so hard" > $filename

Compare two CSV files and show only the difference

I have two CSV files:
File1.csv
Time, Object_Name, Carrier_Name, Frequency, Longname
2013-08-05 00:00, Alpha, Aircel, 917.86, Aircel_Bhopal
2013-08-05 00:00, Alpha, Aircel, 915.13, Aircel_Indore
File2.csv
Time, Object_Name, Carrier_Name, Frequency, Longname
2013-08-05 00:00, Alpha, Aircel, 917.86, Aircel_Bhopal
2013-08-05 00:00, Alpha, Aircel, 815.13, Aircel_Indore
These are sample input files; the actual files will have many more headers and values, so I cannot hard-code them.
In my expected output I want to keep the first two columns and the last column as they are, since they won't change, and the comparison should happen for the rest of the columns and values.
Expected output:
Time, Object_Name, Frequency, Longname
2013-08-05 00:00, 815.13, Aircel_Indore
How can I do this?
Please look at the links below; there are some example scripts:
http://bytes.com/topic/perl/answers/647889-compare-two-csv-files-using-perl
Perl: Compare Two CSV Files and Print out differences
http://www.perlmonks.org/?node_id=705049
If you are not bound to Perl, here is a solution using AWK:
#!/bin/bash
awk -v FS="," '
function filter_columns()
{
return sprintf("%s, %s, %s, %s", $1, $2, $(NF-1), $NF);
}
NF !=0 && NR == FNR {
if (NR == 1) {
print filter_columns();
} else {
memory[line++] = filter_columns();
}
} NF != 0 && NR != FNR {
if (FNR == 1) {
line = 0;
} else {
new_line = filter_columns();
if (new_line != memory[line++]) {
print new_line;
}
}
}' File1.csv File2.csv
This outputs:
Time, Object_Name, Frequency, Longname
2013-08-05 00:00, Alpha, 815.13, Aircel_Indore
Here is the explanation:
#!/bin/bash
# FS = "," makes awk split each line in fields using
# the comma as separator
awk -v FS="," '
# this function selects the columns you want. NF is
# the number of fields. Therefore $NF is the content of
# the last column and $(NF-1) of the last but one.
function filter_columns()
{
return sprintf("%s, %s, %s, %s", $1, $2, $(NF-1), $NF);
}
# This block processes just the first file, this is the aim
# of the condition NR == FNR. The condition NF != 0 skips the
# empty lines you have in your file. The block prints the header
# and then saves all the other lines in the array memory.
NF !=0 && NR == FNR {
if (NR == 1) {
print filter_columns();
} else {
memory[line++] = filter_columns();
}
}
# This block processes just the second file (NR != FNR).
# Since the header has been already printed, it skips the first
# line of the second file (FNR == 1). The block compares each line
# against that one saved in the array memory (the corresponding
# line in the first file). The block prints just the lines
# that do not match.
NF != 0 && NR != FNR {
if (FNR == 1) {
line = 0;
} else {
new_line = filter_columns();
if (new_line != memory[line++]) {
print new_line;
}
}
}' File1.csv File2.csv
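If neither AWK nor Perl is a requirement, here is a hedged Python sketch of the same idea (keep columns 1, 2, last-but-one and last; remember File1's filtered rows; print File2's rows that differ). The inline strings stand in for File1.csv and File2.csv from the question.

```python
import csv
import io

# Inline stand-ins for File1.csv and File2.csv from the question.
file1 = """Time, Object_Name, Carrier_Name, Frequency, Longname
2013-08-05 00:00, Alpha, Aircel, 917.86, Aircel_Bhopal
2013-08-05 00:00, Alpha, Aircel, 915.13, Aircel_Indore
"""
file2 = """Time, Object_Name, Carrier_Name, Frequency, Longname
2013-08-05 00:00, Alpha, Aircel, 917.86, Aircel_Bhopal
2013-08-05 00:00, Alpha, Aircel, 815.13, Aircel_Indore
"""

def filtered_rows(text):
    # Keep columns 1, 2, last-but-one and last, like the AWK filter_columns().
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [[row[0], row[1], row[-2], row[-1]] for row in reader if row]

rows1 = filtered_rows(file1)
rows2 = filtered_rows(file2)

result = [rows2[0]]                      # header (identical in both files)
for old, new in zip(rows1[1:], rows2[1:]):
    if old != new:                       # keep only File2 rows that changed
        result.append(new)

for row in result:
    print(", ".join(row))
```

Like the AWK version, this compares rows positionally, so it assumes both files list the same records in the same order.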
Answering @IlmariKaronen's questions would clarify the problem much better, but meanwhile I made some assumptions and took a crack at the problem - mainly because I needed an excuse to learn a bit of Text::CSV.
Here's the code:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
use Array::Compare;
use feature 'say';
open my $in_file, '<', 'infile.csv';
open my $exp_file, '<', 'expectedfile.csv';
open my $out_diff_file, '>', 'differences.csv';
my $text_csv = Text::CSV->new({ allow_whitespace => 1, auto_diag => 1 });
my $line = readline($in_file);
my $exp_line = readline($exp_file);
die 'Different column headers' unless $line eq $exp_line;
$text_csv->parse($line);
my @headers = $text_csv->fields();
my %all_differing_indices;
# array-of-arrays containing lists of "expected" rows for differing lines
# only columns that differ from the input have values, others are empty
my @all_differing_rows;
my $array_comparer = Array::Compare->new(DefFull => 1);
while (defined($line = readline($in_file))) {
$exp_line = readline($exp_file);
if ($line ne $exp_line) {
$text_csv->parse($line);
my @in_fields = $text_csv->fields();
$text_csv->parse($exp_line);
my @exp_fields = $text_csv->fields();
my @differing_indices = $array_comparer->compare([@in_fields], [@exp_fields]);
@all_differing_indices{@differing_indices} = (1) x scalar(@differing_indices);
my @output_row = ('') x scalar(@exp_fields);
@output_row[0, 1, @differing_indices, $#exp_fields] = @exp_fields[0, 1, @differing_indices, $#exp_fields];
$all_differing_rows[$#all_differing_rows + 1] = [@output_row];
}
}
my @columns_needed = (0, 1, keys(%all_differing_indices), $#headers);
$text_csv->combine(@headers[@columns_needed]);
say $out_diff_file $text_csv->string();
for my $row_aref (@all_differing_rows) {
$text_csv->combine(@{$row_aref}[@columns_needed]);
say $out_diff_file $text_csv->string();
}
It works for the File1 and File2 given in the question and produces the Expected output (except that the Object_Name 'Alpha' is present in the data line - I'm assuming that's a typo in the question).
Time,Object_Name,Frequency,Longname
"2013-08-05 00:00",Alpha,815.13,Aircel_Indore
I've created a script for it with some very powerful Linux tools. Link here:
Linux / Unix - Compare Two CSV Files
This project is about comparing two CSV files.
Let's assume that csvFile1.csv has XX columns and csvFile2.csv has YY columns.
The script I wrote compares one (key) column from csvFile1.csv with another (key) column from csvFile2.csv. Each value from csvFile1.csv (a row of the key column) is compared against each value from csvFile2.csv.
If csvFile1.csv has 1,500 rows and csvFile2.csv has 15,000 rows, the total number of combinations (comparisons) will be 22,500,000. So this is a very helpful way to create an availability-report script which, for example, could compare an internal product database with an external (supplier's) product database.
Packages used:
csvcut (cut columns)
csvdiff (compare two csv files)
ssconvert (convert xlsx to csv)
iconv
curlftpfs
zip
unzip
ntpd
proFTPD
You can find more on my blog (plus an example script):
http://damian1baran.blogspot.sk/2014/01/linux-unix-compare-two-csv-files.html

Altering multiple text files using grep awk sed perl or something else

I have multiple text files named split01.txt, split02.txt, etc., with the data in the format below (this is what I have):
/tmp/audio_files/n000001.wav;
/tmp/audio_files/n000002.wav;
/tmp/audio_files/n000003.wav;
/tmp/audio_files/p000004.wav;
/tmp/audio_files/p000005.wav;
I would like to create another file with the data taken from split01.txt, split02.txt, etc., in the format below (this is the format I would like to see):
[playlist]
NumberOfEntries=5
File000001=n000001.wav
Title000001=n000001.wav
File000002=n000002.wav
Title000002=n000002.wav
File000003=n000003.wav
Title000003=n000003.wav
File000004=p000004.wav
Title000004=p000004.wav
File000005=p000005.wav
Title000005=p000005.wav
Version=2
Can this be done in one instance? The reason I ask is that I'm going to be running/calling the command (awk, grep, sed, etc.) from inside Octave/MATLAB after the initial process has finished creating the audio files.
Example of what I mean by one instance (MATLAB/Octave code):
system(strcat({'split --lines=3600 -d '},dirpathwaveformstmp,fileallplaylistStr,{' '},dirpathwaveformstmp,'allsplit'))
This splits a single file into multiple files named allsplit01, allsplit02, etc., where each file has at most 3600 lines.
For those who asked this is creating playlist files for audio files I create with octave/matlab.
Any suggestions?
Here's one way you could do it with awk:
parse.awk
BEGIN {
print "[playlist]"
print "NumberOfEntries=" len "\n"
i = 1
}
{
gsub(".*/|;", "")
printf "File%06d=%s\n" , i, $0
printf "Title%06d=%s\n\n", i, $0
i++
}
END {
print "Version=2"
}
Run it like this:
awk -v len=$(wc -l < infile) -f parse.awk infile
Output:
[playlist]
NumberOfEntries=5
File000001=n000001.wav
Title000001=n000001.wav
File000002=n000002.wav
Title000002=n000002.wav
File000003=n000003.wav
Title000003=n000003.wav
File000004=p000004.wav
Title000004=p000004.wav
File000005=p000005.wav
Title000005=p000005.wav
Version=2
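For comparison, the same transformation can also be sketched in Python. This is a hedged sketch, not one of the original answers: the inline `lines` string stands in for split01.txt, and `Version=2` follows the format in the question's sample.

```python
import os

# Inline stand-in for the contents of split01.txt from the question.
lines = """/tmp/audio_files/n000001.wav;
/tmp/audio_files/n000002.wav;
/tmp/audio_files/p000004.wav;
""".splitlines()

# Strip the trailing ';' and the directory part, keep just the file names.
entries = [os.path.basename(l.rstrip(";")) for l in lines if l.strip()]

out = ["[playlist]", f"NumberOfEntries={len(entries)}", ""]
for i, name in enumerate(entries, start=1):
    out.append(f"File{i:06d}={name}")
    out.append(f"Title{i:06d}={name}")
    out.append("")
out.append("Version=2")
print("\n".join(out))
```

Because `len(entries)` is computed after reading, this avoids the separate `wc -l` pass the AWK version needs to know `NumberOfEntries` up front.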
If you're writing your program in Octave, why don't you do it in Octave as well? The language is not limited to numerical analysis. What you're trying to do can be done quite easily with Octave functions.
filepath = "path for input file"
playlistpath = "path for output file"
## read file and prepare cell array for printing
files = strsplit (fileread (filepath), "\n");
if (isempty (files{end}))
files(end) = [];
endif
[~, names, exts] = cellfun (@fileparts, files, "UniformOutput", false);
files = strcat (names, exts);
files(2,:) = files(1,:);
files(4,:) = files(1,:);
files(1,:) = num2cell (1:columns(files))(:);
files(3,:) = num2cell (1:columns(files))(:);
## write playlist
[fid, msg] = fopen (playlistpath, "w");
if (fid < 0)
error ("Unable to fopen %s for writing: %s", playlistpath, msg);
endif
fprintf (fid, "[playlist]\n");
fprintf (fid, "NumberOfEntries=%i\n", columns (files));
fprintf (fid, "\n");
fprintf (fid, "File%06d=%s\nTitle%06d=%s\n\n", files{:});
fprintf (fid, "Version 2");
if (fclose (fid))
error ("Unable to fclose file %s with FID %i", playlistpath, fid);
endif

Renaming names in a file using another file without using loops

I have two files:
(one.txt) looks like this:
>ENST001
(((....)))
(((...)))
>ENST002
(((((((.......))))))
((((...)))
I have about 10,000 more ENSTs.
(two.txt) looks like this:
>ENST001 110
>ENST002 59
and so on for all the other ENSTs.
I basically would like to replace the ENSTs in (one.txt) with the combination of the two fields in (two.txt), so the result will look like this:
>ENST001_110
(((....)))
(((...)))
>ENST002_59
(((((((.......))))))
((((...)))
I wrote a MATLAB script to do so, but since it loops over all lines in (two.txt) it takes about 6 hours to finish, so I think that using awk, sed, grep, or even perl we could get the result in a few minutes. This is what I did in MATLAB:
frf = fopen('one.txt', 'r');
frp = fopen('two.txt', 'r');
fw = fopen('result.txt', 'w');
while feof(frf) == 0
line = fgetl(frf);
first_char = line(1);
if strcmp(first_char, '>') == 1 % if the line in one.txt starts with > it is the ID
id_fold = strrep(line, '>', ''); % Remove the > symbol
frewind(frp) % Rewind two.txt file after each loop
while feof(frp) == 0
raw = fgetl(frp);
scan = textscan(raw, '%s%s');
id_pos = scan{1}{1};
pos = scan{2}{1};
if strcmp(id_fold, id_pos) == 1 % if both ids are the same
id_new = ['>', id_fold, '_', pos];
fprintf(fw, '%s\n', id_new);
end
end
else
fprintf(fw, '%s\n', line); % if the line doesn't start with > print it to results
end
end
One way using awk. FNR == NR processes the first file in the arguments and saves each number. The second condition processes the second file and, when the first field matches a key in the array, modifies that line by appending the number.
awk '
FNR == NR {
data[ $1 ] = $2;
next
}
FNR < NR && data[ $1 ] {
$0 = $1 "_" data[ $1 ]
}
{ print }
' two.txt one.txt
Output:
>ENST001_110
(((....)))
(((...)))
>ENST002_59
(((((((.......))))))
((((...)))
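The same two-pass, hash-lookup idea can be sketched in Python for reference (a hedged sketch, not one of the original answers; the inline strings stand in for two.txt and one.txt):

```python
# Inline stand-ins for two.txt and one.txt from the question.
two_txt = """>ENST001 110
>ENST002 59
"""
one_txt = """>ENST001
(((....)))
>ENST002
(((((((.......))))))
"""

# Pass 1: build a lookup from ID to "ID_position" (read two.txt once).
lookup = {}
for line in two_txt.splitlines():
    if line.startswith(">"):
        ident, pos = line[1:].split()
        lookup[ident] = f"{ident}_{pos}"

# Pass 2: rewrite header lines, copy everything else through (read one.txt once).
result = []
for line in one_txt.splitlines():
    if line.startswith(">") and line[1:] in lookup:
        result.append(">" + lookup[line[1:]])
    else:
        result.append(line)
print("\n".join(result))
```

As with the awk answer, each file is read exactly once, so the quadratic rescanning that made the original MATLAB loop take hours is gone.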
With sed you can first run over two.txt alone to build a set of replacement commands, and then run them on one.txt:
First way
sed "$(sed -n '/>ENST/{s=.*\(ENST[0-9]\+\)\s\+\([0-9]\+\).*=s/\1/\1_\2/;=;p}' two.txt)" one.txt
Second way
If the files are huge you'll get a "too many arguments" error with the previous way. Therefore there is another way that avoids this error. You need to execute all three commands one by one:
sed -n '1i#!/bin/sed -f
/>ENST/{s=.*\(ENST[0-9]\+\)\s\+\([0-9]\+\).*=s/\1/\1_\2/;=;p}' two.txt > script.sed
chmod +x script.sed
./script.sed one.txt
The first command will form the sed script that will be able to modify one.txt as you want. chmod will make this new script executable. And the last command will execute it. So each file is read only once. There are no loops.
Note that the first command consists of two lines, but it is still one command. If you delete the newline character it will break the script, because of the i command in sed. You can find the details in the sed man page.
This Perl solution sends the modified one.txt file to STDOUT.
use strict;
use warnings;
open my $f2, '<', 'two.txt' or die $!;
my %ids;
while (<$f2>) {
$ids{$1} = "$1_$2" if /^>(\S+)\s+(\d+)/;
}
open my $f1, '<', 'one.txt' or die $!;
while (<$f1>) {
s/^>(\S+)\s*$/>$ids{$1}/;
print;
}
Turn the problem on its head. In Perl I would do something like this:
#!/usr/bin/perl
open(FH1, "one.txt");
open(FH2, "two.txt");
open(RESULT, ">result.txt");
my %data;
while (my $line = <FH2>)
{
chomp($line);
# Delete leading angle bracket
$line =~ s/^>//;
# split enst and pos
my ($enst, $pos) = split(/\s+/, $line);
# Store POS with ENST as key
$data{$enst} = $pos;
}
close(FH2);
while (my $line = <FH1>)
{
# Check line for ENST
if ($line =~ m/^>(ENST\d+)/)
{
my $enst = $1;
# Get pos for ENST
my $pos = $data{$enst};
# make new line
$line = ">" . $enst . "_" . $pos . "\n";
}
print RESULT $line;
}
close(FH1);
close(RESULT);
This might work for you (GNU sed):
sed -n '/^$/!s|^\(\S*\)\s*\(\S*\).*|s/^\1.*/\1_\2/|p' two.txt | sed -f - one.txt
Try this MATLAB solution (no loops):
%# read files as cell array of lines
fid = fopen('one.txt','rt');
C = textscan(fid, '%s', 'Delimiter','\n');
C1 = C{1};
fclose(fid);
fid = fopen('two.txt','rt');
C = textscan(fid, '%s', 'Delimiter','\n');
C2 = C{1};
fclose(fid);
%# use regexp to extract ENST numbers from both files
num = regexp(C1, '>ENST(\d+)', 'tokens', 'once');
idx1 = find(~cellfun(@isempty, num)); %# location of >ENST line
val1 = str2double([num{:}]); %# ENST numbers
num = regexp(C2, '>ENST(\d+)', 'tokens', 'once');
idx2 = find(~cellfun(@isempty, num));
val2 = str2double([num{:}]);
%# construct new header lines from file2
C2(idx2) = regexprep(C2(idx2), ' +','_');
%# replace headers lines in file1 with the new headers
[tf,loc] = ismember(val2,val1);
C1( idx1(loc(tf)) ) = C2( idx2(tf) );
%# write result
fid = fopen('three.txt','wt');
fprintf(fid, '%s\n',C1{:});
fclose(fid);