I am trying to import data from text file named xMat.txt which has the data in the following format.
200 space separated elements in one line and some 767 lines.
This is how xMat.txt looks.
386.0 386.0 388.0 394.0 402.0 413.0 ... .0 800.0 799.0 796
801.0 799.0 799.0 802.0 802.0 80 ... 399.0 397.0 394.0 391
.
.
.
This is my file - for reference.
When I try to read the file using
file = fopen('xMat.txt','r')
c = textscan(file,'%f');
I get the output as:
> c = { [1,1] =
> 386
> 386
> 388
> 394
> 402
> 413
> 427
> 442
> 458
> 473
> 487
> 499
> 509
> 517
> 524 ... in column format
What I need is a matrix of size (767X200). How can I do this?
I wouldn't use textscan in this case because your text file is purely numeric. Your text file contains 767 rows of 200 numbers per row where each number is delimited by a space. You couldn't get it to be any better suited for use with dlmread (MATLAB doc, Octave doc). dlmread can do this for you in one go:
c = dlmread('xMat.txt');
c will contain a 767 x 200 array for you that contains the data stored in the text file xMat.txt. Hopefully you can dump textscan in this case because what you're really after is trying to read your data into Octave... and dlmread does the job for you quite nicely.
Related
I am trying to load a dataset for a classification task using pytorch, this is the code i use:
data_transforms = {
'train': transforms.Compose([
transforms.RandomRotation(2.8),
transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize((0.5), (0.5))
]),
'valid': transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize((0.5), (0.5))
])
}
print(os.listdir())
# TODO: Load the datasets with ImageFolder
image_datasets = {x: datasets.ImageFolder(os.path.join("/content/drive/MyDrive/DatasetPersonale", x),
data_transforms[x])
for x in ['train', 'valid']}
# TODO: Using the image datasets and the trainforms, define the dataloaders
batch_size = 32
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size,
shuffle=True, num_workers=4)
for x in ['train', 'valid']}
class_names = image_datasets['train'].classes
print(class_names)
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'valid']}
the code worked fine but as my dataset was in grayscale, I needed to convert it to RGB so I used this code:
rootdir = '/content/drive/MyDrive/DatasetPersonale/trainRGB'
print("Train")
for subdir, dirs, files in os.walk(rootdir):
for file in files:
filePath = os.path.join(subdir, file)
name = os.path.basename(filePath)
img=Image.open(filePath, mode="r")
print(img.mode)
if img.mode != "RGB":
RGBimg=img.convert("RGB")
RGBimg.save(filePath,format=jpeg)
now my images are still jpeg, but now they are RGB and not L. the problem is that if I go to rerun the code to load the dataset I get this error
FileNotFoundError Traceback (most recent call last)
<ipython-input-15-3dace4b0f21b> in <module>()
19 image_datasets = {x: datasets.ImageFolder(os.path.join("/content/drive/MyDrive/DatasetPersonale", x),
20 data_transforms[x])
---> 21 for x in ['trainRGB', 'validRGB']}
22
23 # TODO: Using the image datasets and the trainforms, define the dataloaders
4 frames
<ipython-input-15-3dace4b0f21b> in <dictcomp>(.0)
19 image_datasets = {x: datasets.ImageFolder(os.path.join("/content/drive/MyDrive/DatasetPersonale", x),
20 data_transforms[x])
---> 21 for x in ['trainRGB', 'validRGB']}
22
23 # TODO: Using the image datasets and the trainforms, define the dataloaders
/usr/local/lib/python3.7/dist-packages/torchvision/datasets/folder.py in __init__(self, root, transform, target_transform, loader, is_valid_file)
311 transform=transform,
312 target_transform=target_transform,
--> 313 is_valid_file=is_valid_file)
314 self.imgs = self.samples
/usr/local/lib/python3.7/dist-packages/torchvision/datasets/folder.py in __init__(self, root, loader, extensions, transform, target_transform, is_valid_file)
144 target_transform=target_transform)
145 classes, class_to_idx = self.find_classes(self.root)
--> 146 samples = self.make_dataset(self.root, class_to_idx, extensions, is_valid_file)
147
148 self.loader = loader
/usr/local/lib/python3.7/dist-packages/torchvision/datasets/folder.py in make_dataset(directory, class_to_idx, extensions, is_valid_file)
190 "The class_to_idx parameter cannot be None."
191 )
--> 192 return make_dataset(directory, class_to_idx, extensions=extensions, is_valid_file=is_valid_file)
193
194 def find_classes(self, directory: str) -> Tuple[List[str], Dict[str, int]]:
/usr/local/lib/python3.7/dist-packages/torchvision/datasets/folder.py in make_dataset(directory, class_to_idx, extensions, is_valid_file)
100 if extensions is not None:
101 msg += f"Supported extensions are: {', '.join(extensions)}"
--> 102 raise FileNotFoundError(msg)
103
104 return instances
FileNotFoundError: Found no valid file for the classes .ipynb_checkpoints. Supported extensions are: .jpg, .jpeg, .png, .ppm, .bmp, .pgm, .tif, .tiff, .webp
Does someone know why this error appears? I checked the extension of all the files and they are jpeg.
Thank you.
Problem: This is because of .ipynb_checkpoints folder inside the folder /content/drive/MyDrive/DatasetPersonale/trainRGB which contains files (invalid images) cannot be read as images that have valid extensions (.jpg, .jpeg, .png, .ppm, .bmp, .pgm, .tif, .tiff, .webp).
Solution: You can save all your images in a subfolder namely 'images' and then change your root folder to /content/drive/MyDrive/DatasetPersonale/trainRGB/images to avoid reading the .ipynb_checkpoints folder with your images.
In the recent VS Code release, they added this feature to view the active variables in the Jupyter Notebook and also, view the values in the variable with Data Viewer.
However, every time I am trying to view the values in Data Viewer, VS Code is throwing error below. It says that the reason is that the object of data type is Int64 and not string, but I am sure that should not be the reason to not show the variable. Anyone facing similar issues. I tried with a simple data frame and it's working fine.
Error: Failure during variable extraction:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-16-eae5f1f55b35> in <module>
97
98 # Transform this back into a string
---> 99 print(_VSCODE_json.dumps(_VSCODE_targetVariable))
100 del _VSCODE_targetVariable
101
~/anaconda3/lib/python3.7/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
229 cls is None and indent is None and separators is None and
230 default is None and not sort_keys and not kw):
--> 231 return _default_encoder.encode(obj)
232 if cls is None:
233 cls = JSONEncoder
~/anaconda3/lib/python3.7/json/encoder.py in encode(self, o)
197 # exceptions aren't as detailed. The list call should be roughly
198 # equivalent to the PySequence_Fast that ''.join() would do.
--> 199 chunks = self.iterencode(o, _one_shot=True)
200 if not isinstance(chunks, (list, tuple)):
201 chunks = list(chunks)
~/anaconda3/lib/python3.7/json/encoder.py in iterencode(self, o, _one_shot)
255 self.key_separator, self.item_separator, self.sort_keys,
256 self.skipkeys, _one_shot)
--> 257 return _iterencode(o, 0)
258
259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,
~/anaconda3/lib/python3.7/json/encoder.py in default(self, o)
177
178 """
--> 179 raise TypeError(f'Object of type {o.__class__.__name__} '
180 f'is not JSON serializable')
181
TypeError: Object of type int64 is not JSON serializable
I would like to understand one thing :
When I write : Propreties.Device.Time = Data.Device(lobo(:,2),1) - Data.Device(lobo(:,1),1) I obtain a stockage of all the differencies (it's great)
But when I write : Propreties.Device.Prop = sum(Data.Device(lobo(:,1) : lobo(:,2)),2)*dt I don't obtain a stockage of all results (that I would like to obtain as previous exemple) but an overwrite of each of them so at the end I have only one value :-/
Could someone explain to me what differs between the two examples and what I could implement to obtain in the second example the same type of result as in the first (ie a list of results and not a single result) (without making a loop) ?
(Matlab verison : R2017a)
Some data for exemple :
Data.Device = [1.86000000000000 675 0;1.87000000000000 685 0;1.88000000000000 695 0;1.89000000000000 705 0;1.90000000000000 710 5;1.91000000000000 715 50;1.92000000000000 700 120;1.93000000000000 685 180;1.94000000000000 655 235;1.95000000000000 620 285;1.96000000000000 565 305;1.97000000000000 505 315;1.98000000000000 435 335;1.99000000000000 360 345;2 285 355];
lobo = [1 5; 6 15];
dt = 0.01
Propreties.Device.Time = Data.Device(lobo(:,2),1) - Data.Device(lobo(:,1),1);
Propreties.Device.Prop = sum(Data.Device(lobo(:,1) : lobo(:,2)),2)*dt
I have a text output from a program with a set format. I need to parse ~200 of them to extract an information. I tried in MATLAB with 'textscan' but did not work. Following is the input:
MOTIFS SUMMARY:
1) TTATAGCCGC (GCGGCTATAA) 1.986
2) AAACCGCCTC (GAGGCGGTTT) 1.865
DETAILED RESULTS:
1) TTATAGCCGC (GCGGCTATAA) 1.986
Matrix: MAT1 TTATAGCCGC
A 0.1249 0.177 0.7364 0.1189 0.7072 0.1149 0.09858 0.1096
C 0.0899 0.07379 0.1136 0.1298 0.08662 0.1293 0.7528 0.721
G 0.06828 0.1284 0.07195 0.1031 0.1352 0.6708 0.05556 0.0713
T 0.7169 0.6209 0.07802 0.6482 0.07096 0.08492 0.09305 0.09804
OCCURRENCES:
>GENE_1 1 TTATAGCCGC 1 561 +
>GENE_2 24 TAATAGCCGC 0.928699 762 -
>GENE_3 10 ATATAGCCGC 0.904905 185 -
>GENE_1 7 TTATAGCAGC 0.901785 726 +
**********
2) AAACCGCCTC (GAGGCGGTTT) 1.865
Matrix: MAT2 AAACCGCCTC
A 0.653 0.7401 0.7763 0.1323 0.09619 0.09134 0.07033 0.1383
C 0.1163 0.07075 0.09441 0.749 0.6347 0.1132 0.6559 0.6982
G 0.09136 0.09402 0.07385 0.04209 0.1799 0.7332 0.1241 0.07568
T 0.1393 0.09518 0.05541 0.07659 0.08921 0.06234 0.1497 0.08786
OCCURRENCES:
>GENE_1 21 AAACCGCCTC 1 963 +
>GENE_2 14 AAACGGCCTC 0.928198 212 +
>GENE_2 8 AAACCGTCTC 0.92009 170 +
>GENE_4 3 TAACCGCCTC 0.918883 370 +
**********
I am trying to count the unique() occurrence under each motif and add it to the MOTIF SUMMARY and a final average of them. My expected output is:
MOTIFS SUMMARY:
1) TTATAGCCGC (GCGGCTATAA) 1.986 3
2) AAACCGCCTC (GAGGCGGTTT) 1.865 3
AVERAGE OCCURRENCE: 3
For motif 1, unique occurrence is 3 (GENE_1, GENE_2, GENE_3). Similarly for motif 2, it is again 3 (GENE_1, GENE_2, GENE_4)
How can I use OCCURRENCES and ****** as blocks ? so that, I can regexp GENE_x to store it and count.
Kindly help.
Thanks,
AP
You better try to change the original text file so that it will be legal matlab m file code, then just use 'eval' function to run it .
Most of the job will be to find where to insert '=' and '[' ']' and '%' for ignore parts.
If all files are identical in format than it will be easy.
I am new to Matlab and have been working my way through using Google. But now I have hit the wall it seems.
I have a text file which looks like following:
Information is for illustration reasons only
Aggregated Results
Date;$/Val1;Total $;Exp. Val1;Act. Val1
01-Oct-2008; -5.20; -1717; 330; 323
02-Oct-2008; -1.79; -595; 333; 324
03-Oct-2008; -2.29; -765; 334; 321
04-Oct-2008; -2.74; -917; 335; 317
Total Period; -0.80; -8612; 10748; 10276
Aggregated Results for location State PA
Date;$/Val1;Total $;Exp. Val1;Act. Val1
01-Oct-2008; -5.20; -1717; 330; 323
02-Oct-2008; -1.79; -595; 333; 324
03-Oct-2008; -2.29; -765; 334; 321
Total Period; -0.80; -8612; 10748; 10276
Results for account A1
Date;$/Val1;Total $;Exp. Val1;Act. Val1
01-Oct-2008; -7.59; -372; 49; 51
Total Period; -0.84; -1262; 1502; 1431
Results for account A2
Date;$/MWh;Total $;Exp. MWh;Act. MWh
01-Oct-2008; -8.00; -392; 49; 51
02-Oct-2008; 0.96; 47; 49; 51
03-Oct-2008; -0.75; -37; 50; 48
04-Oct-2008; 1.28; 53; 41; 40
Total Period; -0.36; -534; 1502; 1431
I want to extract following information in a cell/matrix format so that I can use it later to selectively do operations like average of accounts A1 and A2 or average of PA and A1, etc.
PA -0.8
A1 -0.84
A2 -0.036
I'd go this way:
fid = fopen(filename,'r');
A = textscan(fid,'%s','delimiter','\r');
A = A{:};
str_i = 'Total Period';
ix = find(strncmp(A,str_i,length(str_i)));
res = arrayfun(#(i) str2num(A{ix(i)}(length(str_i)+2:end)),1:numel(ix),'UniformOutput',false);
res = cat(2,res{:});
This way you'll get all the numerical values after a string 'Total Period' in a matrix, so that you may pick the values you need.
Similarly you may operate with strings PA, A1 and A2.
Matlab is not that nice when it comes to dealing with messy data. You may want to preprocess it a bit first.
However, here is an easy general way to import mixed numeric and non-numeric data in Matlab for a limited number of normal sized files.
Step 1: Copy the contents of the file into excel and save it as xls or xlsx
Step 2: Use xlsread
[NUM,TXT,RAW]=xlsread('test.xlsx')
From there the parsing should be maneagable.
Hopefully they will add non-numeric support to csvread or dlmread in the future.