delete rows with character in cell array - matlab

I need some basic help. I have a cell array:
TITLE 13122423
NAME Bob
PROVIDER James
and many more rows with text...
234 456 234 345
324 346 234 345
344 454 462 435
and many MANY (>4000) more with only numbers
text
text
and more text and mixed entries
Now what I want is to delete all the rows where the first column contains a character, and end up with only the rows containing numbers (rows 44-46 in this example).
I tried to use
rawdataTruncated(strncmp(rawdataTruncated(:, 1), 'A', 1), :) = [];
but then I need to go through the whole alphabet, right?

Given data of the form:
C = {'FIRSTX' '350.0000' '' '' ; ...
'350.0000' '0.226885' '254.409' '0.755055'; ...
'349.9500' '0.214335' '254.41' '0.755073'; ...
'250.0000' 'LASTX' '' '' };
You can remove any row that has character strings containing letters using isstrprop, cellfun, and any like so:
index = ~any(cellfun(@any, isstrprop(C, 'alpha')), 2);
C = C(index, :)
C =
2×4 cell array
'350.0000' '0.226885' '254.409' '0.755055'
'349.9500' '0.214335' '254.41' '0.755073'

How to find all the rows that share the same value on two columns?

Dataset example:
name sex favourite_meal favourite_color age weight(kg)
Tom M pizza red 18 90
Jess F lasagna blue 20 43
Mark M pizza red 30 68
David M hamburger purple 25 70
Lucy F sushi green 18 47
How can I compare each row with the others and find which ones share, for example, the same (sex, favourite_meal) pair? The idea is to check, on a large dataset, which rows share the same values on two attributes (columns). In this example it would be Tom and Mark, who share (M, pizza); how do you do the same on a large dataset where you can't check by eye?
One awk option is to process the source data twice: first collect the counts of the unique (column 2, column 3) pairs into an array, then use those counts to filter the data:
awk 'NR==FNR {p[$2" "$3]++} FNR<NR {for(n in p) if (p[n]>1 && $2" "$3==n) { print}}' m.dat m.dat
Tom M pizza red 18 90
Mark M pizza red 30 68
You can use pandas to do this:
import pandas as pd

# Initialize data as a dict of Series.
d = {'Name': pd.Series(['Tom', 'Jess', 'Mark', 'David', 'Lucy']),
     'sex': pd.Series(['M', 'F', 'M', 'M', 'F']),
     'favorite_meal': pd.Series(['pizza', 'lasagna', 'pizza', 'hamburger', 'sushi']),
     'favorite_color': pd.Series(['red', 'blue', 'red', 'purple', 'green']),
     'age': pd.Series([18, 20, 30, 25, 18]),
     'weight(kg)': pd.Series([90, 43, 68, 70, 47])}
df = pd.DataFrame(d)

for key, group in df.groupby(['favorite_meal', 'sex']):
    print("....................")
    print(group.to_string(index=False, header=False))
In each iteration, the for loop is operating on a set of similar rows.
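If you only need the rows that actually share a pair (Tom and Mark here), rather than a printout of every group, pandas' duplicated can filter them directly; a small variation on the code above:

# keep=False marks every member of a (sex, favorite_meal) pair that occurs more than once.
shared = df[df.duplicated(subset=['sex', 'favorite_meal'], keep=False)]
print(shared)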

Regular expression for MS SQL

I want to arrange these numbers into the following categories according to their first two digits; I am confused because the pattern repeats itself.
Thank you for your help.
210690, 391910, 392490, 880390, 847321, 940290, 300420, 300410, 901890, 901890, 030269, 080530, 630399
1-5
6-14
15
16-24
25-27
28-38
39-40
41-43
44-46
47-49
50-63
64-67
68-70
71
72-83
84-85
86-89
90-92
93
94-96
97
98-99
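No answer is recorded for this question, and the tag asks for MS SQL; still, the lookup logic can be sketched in Python (the categorize helper and the range table are illustrative only, built from the list above, not a T-SQL answer):

# Category ranges from the list above; each entry is (low, high), inclusive.
RANGES = [(1, 5), (6, 14), (15, 15), (16, 24), (25, 27), (28, 38), (39, 40),
          (41, 43), (44, 46), (47, 49), (50, 63), (64, 67), (68, 70), (71, 71),
          (72, 83), (84, 85), (86, 89), (90, 92), (93, 93), (94, 96), (97, 97),
          (98, 99)]

def categorize(code):
    # First two digits of the six-digit code, e.g. 030269 -> 3, 210690 -> 21.
    chapter = int(str(code).zfill(6)[:2])
    for low, high in RANGES:
        if low <= chapter <= high:
            return f"{low}-{high}"
    return None

for code in ['210690', '391910', '030269', '630399']:
    print(code, categorize(code))

In T-SQL the same mapping would typically be written as a CASE expression over LEFT(column, 2), or as a join against a small table of range boundaries.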

How to convert a text file to a dictionary in python?

I have the following text file.
BREST:
Rennes 244
RENNES:
Breast 244
Caen 176
Nantes 107
Paris 348
CAEN:
Calais 120
Paris 241
Rennes 176
CALAIS:
Caen 120
Nancy 534
Paris 297
I am trying to convert this to a dictionary with the capitalized words as the keys. It should look like this:
roads = {'BREST': ['Rennes'],
'RENNES': ['Brest', 'Caen', 'Nantes', 'Paris'],
'CAEN': ['Calais', 'Paris', 'Rennes'],
'CALAIS': ['Caen', 'Nancy', 'Paris']
}
Assuming that you are reading from a file called input.txt, this produces the desired result.
from collections import defaultdict

d = defaultdict(list)
with open('input.txt', 'r') as f:
    for line in f.read().splitlines():
        if not line: continue
        if line.endswith(':'):
            name = line.strip(':')
        else:
            d[name].append(line.split()[0])
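A quick check is to print the plain-dict view of the result:

print(dict(d))

This mirrors the roads structure above, except that 'Breast' appears under 'RENNES' exactly as it is spelled in the input file.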
If you want to keep the numbers, you can create a dictionary for each entry in the file and store each contact with its associated number.
from collections import defaultdict

d = defaultdict(dict)
with open('input.txt', 'r') as f:
    for line in f.read().splitlines():
        if not line: continue
        if line.endswith(':'):
            name = line.strip(':')
        else:
            contact, number = line.split()
            d[name][contact] = int(number)
Which produces the following dictionary.
{'BREST': {'Rennes': 244},
'CAEN': {'Calais': 120, 'Paris': 241, 'Rennes': 176},
'CALAIS': {'Caen': 120, 'Nancy': 534, 'Paris': 297},
'RENNES': {'Breast': 244, 'Caen': 176, 'Nantes': 107, 'Paris': 348}}
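If you later want the names-only form shown first, it can be derived from this nested dictionary:

# Rebuild the {city: [destination, ...]} mapping from the nested dict.
roads = {city: list(dests) for city, dests in d.items()}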

How to sum values in a column grouped by values in the other

I have a large file consisting of data in two columns:
100 5
100 10
100 10
101 2
101 4
102 10
102 2
I want to sum the values in the 2nd column grouped by matching values in column 1. For this example, the output I'm expecting is:
100 25
101 6
102 12
I'm trying to do this with a bash script, preferably. Can someone explain how I can do this?
Using awk:
awk '{a[$1]+=$2}END{for(i in a){print i, a[i]}}' inputfile
For your input, it'd produce:
100 25
101 6
102 12
As a Perl one-liner (single quotes so the shell doesn't interpolate the Perl variables):
perl -lane '$s{$F[0]} += $F[1]; END { print "$_ $s{$_}" for keys %s }' file.txt
You can use an associative array. The first column is the index and the second becomes what you add to it.
#!/bin/bash
declare -A columns=()
while read -r -a line ; do
columns[${line[0]}]=$((${columns[${line[0]}]} + ${line[1]}))
done < "${1}"
for idx in "${!columns[@]}" ; do
echo "${idx} ${columns[${idx}]}"
done
Using awk while maintaining the input order:
awk '!($1 in a){a[$1]=$2; b[++i]=$1;next} {a[$1]+=$2} END{for (k=1; k<=i; k++) print b[k], a[b[k]]}' file
100 25
101 6
102 12
Python is my choice:
d = {}
with open('file.txt') as f:
    for line in f:
        key, value = line.split()
        if key not in d:
            d[key] = 0
        d[key] += int(value)
print(d)
Why would you want a bash script?

Matlab Code for Reading Text file with inconsistent rows

I am new to Matlab and have been working my way through using Google. But now I have hit a wall, it seems.
I have a text file which looks like following:
Information is for illustration reasons only
Aggregated Results
Date;$/Val1;Total $;Exp. Val1;Act. Val1
01-Oct-2008; -5.20; -1717; 330; 323
02-Oct-2008; -1.79; -595; 333; 324
03-Oct-2008; -2.29; -765; 334; 321
04-Oct-2008; -2.74; -917; 335; 317
Total Period; -0.80; -8612; 10748; 10276
Aggregated Results for location State PA
Date;$/Val1;Total $;Exp. Val1;Act. Val1
01-Oct-2008; -5.20; -1717; 330; 323
02-Oct-2008; -1.79; -595; 333; 324
03-Oct-2008; -2.29; -765; 334; 321
Total Period; -0.80; -8612; 10748; 10276
Results for account A1
Date;$/Val1;Total $;Exp. Val1;Act. Val1
01-Oct-2008; -7.59; -372; 49; 51
Total Period; -0.84; -1262; 1502; 1431
Results for account A2
Date;$/MWh;Total $;Exp. MWh;Act. MWh
01-Oct-2008; -8.00; -392; 49; 51
02-Oct-2008; 0.96; 47; 49; 51
03-Oct-2008; -0.75; -37; 50; 48
04-Oct-2008; 1.28; 53; 41; 40
Total Period; -0.36; -534; 1502; 1431
I want to extract the following information in a cell/matrix format so that I can later selectively do operations like averaging accounts A1 and A2, or averaging PA and A1, etc.
PA -0.8
A1 -0.84
A2 -0.36
I'd go this way:
fid = fopen(filename,'r');
A = textscan(fid,'%s','delimiter','\r');
fclose(fid);
A = A{:};
str_i = 'Total Period';
ix = find(strncmp(A,str_i,length(str_i)));
res = arrayfun(@(i) str2num(A{ix(i)}(length(str_i)+2:end)),1:numel(ix),'UniformOutput',false);
res = cat(2,res{:});
This way you'll get all the numerical values after the string 'Total Period' in a matrix, so that you can pick the values you need.
Similarly you may operate with strings PA, A1 and A2.
Matlab is not that nice when it comes to dealing with messy data. You may want to preprocess it a bit first.
However, here is an easy general way to import mixed numeric and non-numeric data in Matlab for a limited number of normal sized files.
Step 1: Copy the contents of the file into excel and save it as xls or xlsx
Step 2: Use xlsread
[NUM,TXT,RAW]=xlsread('test.xlsx')
From there the parsing should be manageable.
Hopefully they will add non-numeric support to csvread or dlmread in the future.
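Along the preprocessing line suggested above, here is a minimal Python sketch (illustrative only; it assumes the report text shown in the question is saved as report.txt) that pulls each block label and the first value on its 'Total Period' line:

import re

results = {}
label = None
with open('report.txt') as f:
    for line in f:
        line = line.strip()
        # 'Results for account A1' or 'Aggregated Results for location State PA'
        # opens a new block; remember its label.
        m = re.match(r'.*Results for (?:account|location State) (\w+)', line)
        if m:
            label = m.group(1)
        elif line.startswith('Total Period') and label:
            # The first field after the label is the $/Val1 total for the block.
            results[label] = float(line.split(';')[1])
            label = None

print(results)  # {'PA': -0.8, 'A1': -0.84, 'A2': -0.36}

From there the per-label values can be loaded into Matlab or averaged directly.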