How to save urls of images from a text file? - matlab

I have a text file with many URL's and have imported them into matlab via a cell with 1400 rows of URL's.
How do I take these urls of images and save them with a pre-determined size into a sub-folder?
urlsToImgs = importdata('ImageURLS.txt');
for i = 1:1
outfilename = websave(['PosImage: ', num2str(i)],['' urlsToImgs{i} '']);
end
This makes an error: The file name contains characters that are not contained in the filesystem encoding.
Certain operations may not work as expected.
Im guessing since I pasted it from the web it contains an invisible character, how can I delete this?

delete colon in filename. i.e.) change 'PosImage: ' to 'PosImage_'
you may not allowed to create file with special characters.

Related

Matlab dataimport

My matlab code for dataimport is giving me different results for what appear to be similar text files as input. Input1 gives me a normal cell with all lines from the text file as entries in the cell which i can reference using {i}.
Input2 gives me a scalar data structure where all numeric entries in my text file are converted to the input.data structure. I want all files to be converted to regular cell entries and I do not understand why for some files they are converted to scalar data structures.
Code: input = importdata(strcat(direct,'\',filename));
Input1 example: Correctly working dataimport, with text file on the right
File link: https://drive.google.com/open?id=1aHK4xivqEqJEmaA8Dv8Y0uW5giG-Bbip
Input2 example: Incorrectly working data import, with text file on the right FIle link: https://drive.google.com/open?id=1nzUj_wR1bNXFcGaSLGva6uVsxrk-R5vA
UTSL!
I'm guessing you are using GNU Octave although you are writing "Matlab" as topic of your question.
In importdata.m around line 178, the code tries to automatically detect the delimiter for your data:
delim = regexpi (row, '[-+\d.e*ij ]+([^-+\de.ij])[-+\de*.ij ]','tokens', 'once');
If you run this against W40A0060; you get A as delimiter because there is basically a number before and after it.
If you run this against W39E0016; you get {} as delimiter(empty) because the E could be part of a number in scientific notation and is therefore excluded.
Solution:
you really should add the correct delimiter to the importdata call and not trust that it's magically detected.
And if you just want the lines in a cell, use
strsplit (fileread ("W39E0016_Input2.txt"), "\n")
Analysis
This looks indeed strange!
EDIT: The cause for this strange looking behaviour has been deciphered by #Andy (See his solution).
When you use all outputs of importdata() function you can see what happens when reading the data:
[dat1,del1,headerrows1]=importdata('Input1.txt')
[dat2,del2,headerrows2]=importdata('Input2.txt')
For your first file it recognizes 69 header riws and no delimiter:
del1 = []
headerrows1 = 69
while in your second file only two header rows and a comma , delimiter is recognized
del2 = ','
headerrows2 = 2
I can not find an obvious reason in your files causing this different interpretation of data.
Suggestion
Your data format is rather complex. It is not a simple table like produced from excel. It has multiple lines with a different number of fields per line and varying data types. importdata() is not designed for this type of data. I suggest to write a specific import function for this kind of file. Have a look at textread() for a first guess. You can use it to read the lines of the files as text and later interpret it with sscanf() or use strsplit() to split the line contents into fields.

How to extract numbers in a filename from matlab?

I have to read out numbers and possibly some letters from large set of file names in a directory. The file names have a format as "aXXXX_bXX_XX_S.ext" where 'X' could be any number and 's' could be any letter or a string. How do I extract those numbers and string as separate cell array?
Thanks!
First you can read all the files inside your directory. Assuming the location of your folder is stored in the string path, use:
a=dir(mypath);
Now you have a structure a. File names are stored in a.name. Now you can work with it. Here is a very rough code. You loop over all the files, check if the first letter is a (there could be some hidden files, you don't need them). Then you extract the data you need from the eligible files.
n=0;
for i=1:numel(a)
if a(i).name(1)=='a'
n=n+1;
numbers{n}=strcat(a(i).name(2:5),a(i).name(8:9),a(i).name(11:12));
letters{n}=a(i).name(13:find(a(i).name=='.')-1);
end
end

How to create mat file containing video in it

I'm new to matlab programming.I have an image processing code which helps to load a mat file in it. the code accepts .mat file as input with video file in it.
filename=('C:\Users\HP\Desktop\Folder\Image\NVR_ch2_main_cut_35-41.asf');
s=load(filename);
s=struct2cell(s);
M=double(s{1});
if (length(size(M))==4)
M=squeeze(M(:,:,1,:));
end`
Error using load
Unknown text on line number 1 of ASCII file C:\Users\HP\Desktop\Folder\Image\NVR_ch2_main_cut_35-41.asf
"Seh".
Just use v = VideoReader(filename) instead of the load function.
For further information: http://ch.mathworks.com/help/matlab/ref/videoreader.html
Well obviously Matlab won't read your file because it contains things load won't accept.
Does your file comply to this: (from the Matlab reference , next time you should read this)
ASCII files must contain a rectangular table of numbers, with an equal
number of elements in each row. The file delimiter (the character
between elements in each row) can be a blank, comma, semicolon, or tab
character. The file can contain MATLAB comments (lines that begin with
a percent sign, %).
http://de.mathworks.com/help/matlab/ref/load.html#responsive_offcanvas
Read your first sentence. You say you want to load a .mat file. But filename ends with .asf which is some video format if I remember correctly.
You can't feed a video file into load.

How can I strip out "file separator" characters from CSS/text files?

My CSS files have become contaminated with "file separator" characters (AKA "INFORMATION SEPARATOR FOUR" or ALT/028 characters). How can I get rid of them?
This is the character:
http://www.fileformat.info/info/unicode/char/1c/index.htm
Background
I manage a number of .CSS text files that are fairly similar. Unfortunately a number of these file have somehow got "file separator" characters pasted into them. Although they do still seem to work in browsers any file that has one of these characters anywhere within it can not be indexed by my desktop search utility (X1 Search). And this is making them extremely hard for me to manage because I need to compare CSS files contantly.
[Bizarrely X1 Search ignores the character if the filename extension is .TXT but files to index the entire file if the filename extension is .CSS]
Worse this "file separator" character is almost invisible within my text editor (TextPad 7.2). The only way I can detect it is to make spaces and carriage returns visible and then it appears as blank space. Worse still it appears to be impossible to search for using text search.
To make it clear what I mean an example that I have pasted into this page. The "file separator" character is on LineB below
LineA
LineB
LineC
LineD
Is there any way to remove this character from multiple text (in this case CSS) files at once?
NB I do NOT want to remove the whole line, just the one character(!)
Thanks
J
P.S. I am running on Windows7 (x64). I am using TextPad 7.3.
I have eventually managed to answer my own question.
Text Crawler and the use of a regular expression of "\x1c" appears to be the answer.
Fwiw, both Agent Ransack and FileLocator Pro filter out any characters in the ASCII range 0-31 (excluding 0x09 - tab) from the input field.

updating line in large text file using scala

i've a large text file around 43GB in .ttl contains triples in the form :
<http://www.wikidata.org/entity/Q1001> <http://www.w3.org/2002/07/owl#sameAs> <http://la.dbpedia.org/resource/Mahatma_Gandhi> .
<http://www.wikidata.org/entity/Q1001> <http://www.w3.org/2002/07/owl#sameAs> <http://lad.dbpedia.org/resource/Mohandas_Gandhi> .
and i want to find the fastest way to update a specific line inside the file without rewriting all next text. either by updating it or deleting it and appending it to the end of the file
to access the specific line i use this code :
val lines = io.Source.fromFile("text.txt").getLines
val seventhLine = lines drop(10000000) next
If you want to use text files, consider a fixed length/record size for each line/record.
This way you can use a RandomAccessFile to seek to the exact position of each line by number: You just seek to line * LineSize, and then update it.
It will not really help, if you have to insert a new line. Other limitations are: The file size will grow (because of the fixed record length), and there will always be one record which is too big.
As for the initial conversion:
Get the maximum line length of the current file, then add 10% for example.
Now you have to convert the file once: Read a line from the text file, and convert it into a fixed-size record.
You could use a special character like | to separate the fields. If possible, use somthing like ;, so you get a .csv file
I suggest padding the remaining space it with spaces, so it still looks like a text file which you can parse with shell utilities.
You could use a \n to terminate the record.
For example
http://x.com|http://x.com|http://x.com|...\n
or
http://x.com;http://x.com;http://x.com;...\n
where each . at the end represents a space character. So it's still somehow compatible with a "normal" text file.
On the other hand, looking at your data, consider using a key-value data store like Redis: You could use the line number or the 1st URL as the key.