Does eof() look at lines or data? C++ - eof

Let's say I am trying to read in data line by line from a file called input.txt. There's about 20 lines and each line consists of 3 different data types. If I use this code:
while(!file.eof()){ ..... }
Does this function look at only one data type from each line per iteration, or does it look at all the data types at once for each line per iteration, so that the next iteration would look at the next line instead of the next data type?
Many thanks.

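.eof() only checks the end-of-file flag. That flag is set only after a read has tried to go past the end of the file, not when the last piece of data has been read. A loop of the form while(!file.eof()) therefore runs one extra iteration in which the read fails, which is not what you want.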
A great blog post on how this works and best practice can be found here.
Basically, use
std::string line;
while(getline(file, line)) { ... }
or
while (file >> some_data) { ... }
as both forms test the stream right after the read, so they stop on errors and at the end of the file at the correct time.

Related

MATLAB fwrite\fread issue: two variables are being concatenated

I am reading in a binary EDF file and I have to split it into multiple smaller EDF files at specific points and then adjust some of the values inside. Overall it works quite well, but when I read in the file it combines 2 character arrays with each other. Obviously everything afterwards gets corrupted as well. I am at a dead end and have no idea what I'm doing wrong.
The part of the code (writing) that has to contain the problem:
byt=fread(fid,8,'*char');
fwrite(tfid,byt,'*char');
fwrite(tfid,fread(fid,44));
%new number of records
s = records;
fwrite(tfid,s,'*char');
fseek(fid,8,0);
%test
fwrite(tfid,fread(fid,8,'*char'),'*char');
When I use the reader it combines the records (fwrite(tfid,s,'*char')) with the value of the next variable. All variables before this are displayed correctly. The relevant code of the reader:
hdr.bytes = str2double(fread(fid,8,'*char')');
reserved = fread(fid,44);%#ok
hdr.records = str2double(fread(fid,8,'*char')');
if hdr.records == -1
beep
disp('There appears to be a problem with this file; it returns an out-of-spec value of -1 for ''numberOfRecords.''')
disp('Attempting to read the file with ''edfReadUntilDone'' instead....');
[hdr, record] = edfreadUntilDone(fname, varargin);
return
end
hdr.duration = str2double(fread(fid,8,'*char')');
The likely problem is that your character array s does not have 8 characters in it, but you expect there to be 8 when you read it from the file. Whatever the number of characters in the array is, that's how many values fwrite will write out to the file. Anything less than 8 characters and you'll end up reading part of the next piece of data when you read from the file.
One fix would be to pad s with blanks before writing it:
s = [blanks(8-numel(records)) records];
In addition, the syntax '*char' is only valid when using fread: the * indicates that the output class should be 'char' as well. It's unnecessary when using fwrite.
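The padding idea is not specific to MATLAB. As a minimal sketch in Python (used here only for illustration; the helper names write_field and read_field and the sample values are made up for this example), every field is padded to a fixed width of 8 characters before writing, so the reader can always consume exactly 8 bytes per field:

```python
import io

FIELD_WIDTH = 8  # assumed field width, matching the 8-character reads in the question

def write_field(stream, text, width=FIELD_WIDTH):
    """Write text as exactly `width` ASCII bytes, left-padded with blanks."""
    if len(text) > width:
        raise ValueError(f"{text!r} does not fit in {width} characters")
    stream.write(text.rjust(width).encode("ascii"))

def read_field(stream, width=FIELD_WIDTH):
    """Read exactly `width` bytes and return them as text."""
    return stream.read(width).decode("ascii")

buf = io.BytesIO()            # in-memory stand-in for the output file
write_field(buf, "42")        # a record count shorter than 8 characters
write_field(buf, "1.5")       # the next field, e.g. a duration
buf.seek(0)
records = read_field(buf)
duration = read_field(buf)
print(repr(records), repr(duration))
```

Without the padding, the first read of 8 bytes would swallow part of the second field, which is the symptom described in the question.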

MATLAB simultaneous read and write the same file

I want to read and write the same file simultaneously. Here is a simplified code:
clc;
close all;
clearvars;
fd = fopen ('abcd.txt','r+'); %opening file abcd.txt given below
while ~feof(fd)
nline = fgetl(fd);
find1 = strfind(nline,'abcd'); %searching for matching string
chk1 = isempty(find1);
if(chk1==0)
write = '0000'; %in this case, matching pattern found
% then replace that line by 0000
fprintf(fd,'%s \n',write);
else
continue;
end
end
File abcd.txt
abcde
abcd23
abcd2
abcd355
abcd65
I want to find text abcd in string of each line and replace the entire line by 0000. However, there is no change in the text file abcd.txt. The program doesn't write anything in the text file.
Someone might suggest reading each line and writing a separate text file line by line. However, there is a problem with this approach. In the original problem, instead of one matching text abcd, there is an array of strings with thousands of elements. In that case, I want to read the file, parse it to find a matching string, replace the string as per the condition, go to the next iteration to search for the next matching string, and so on. So reading the original file line by line while simultaneously writing another file does not work.
Another approach could be reading the entire file into memory, replacing the strings, and iterating. But I am not sure how that would work. Another issue is memory usage.
Any comments?
What you are attempting to do is not possible in an efficient way. Replacing abcde with 0000, which should be done for the first line, would require moving all the remaining text forward, because the line becomes one character shorter.
Instead, solve this by reading from one file and writing to a second, then remove the original file and rename the new one.
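The read-one-file, write-a-second, then rename approach can be sketched as follows. This is a Python illustration rather than MATLAB, and the function name replace_matching_lines is hypothetical. Only one line is held in memory at a time, so memory usage stays small even for large files:

```python
import os
import tempfile

def replace_matching_lines(path, needle, replacement):
    """Rewrite `path`, replacing every line that contains `needle`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final
    # rename stays on the same file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as dst, open(path) as src:
            for line in src:
                if needle in line:
                    dst.write(replacement + "\n")   # replace the whole line
                else:
                    dst.write(line)                 # copy the line unchanged
        os.replace(tmp_path, path)  # swap the new file in for the original
    except BaseException:
        os.remove(tmp_path)         # clean up the temporary file on failure
        raise
```

Calling replace_matching_lines("abcd.txt", "abcd", "0000") on the file from the question would turn every line into 0000. With thousands of search strings, the `needle in line` test could be swapped for a loop over the strings or a single compiled regular expression.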

How to read different lines from a text file simultaneously

I need to read a file line by line such that it reads the first line, does something with it, then takes the second line, does something with it and so on.
I know how to read a text file line by line:
for(line <- Source.fromFile("file.txt").getLines())
{
insert(line) // use the first line of the file in this function
reverse(line) // use the second line of the file in this function
}
In the insert function, I first want to use the first line of the file, and in the reverse function I want to use the second line. Then, in the second iteration of the loop, I want to use the 3rd line in the insert function and the 4th line in the reverse function, and so on. How can I do that?
EDIT: This is just an example. I want a general thing, like suppose if I want to use the first line, second line, third line and then iterate the for loop, how to do that?
Lots of clever solutions. Here's a simple one using zipWithIndex that handles even cases with an uneven number of lines.
for((line,index) <- Source.fromFile("file.txt").getLines().zipWithIndex)
{
if (index % 2 == 0) insert(line)
else reverse(line)
}
One more approach, using grouped, which takes into account a (possibly) uneven number of lines,
Source.fromFile("file.txt")
.getLines
.grouped(2)
.map { xs => (xs.head, xs.last.reverse) }
Note that getLines gives an iterator for fetching one line at a time, sequentially, then grouped gives yet another iterator with paired lines for simultaneous processing. This is in contrast with reading multiple lines of a file at the same time.
Using sliding to group your lines into pairs of two.
for(pairs <- Source.fromFile("file.txt").getLines().sliding(2, 2)) {
insert(pairs.head)
reverse(pairs.last)
}
Obviously you'll need to handle the condition where you don't have a list of even length.
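One way to handle the leftover line is to pad the last pair with a placeholder. The sketch below shows the idea in Python rather than Scala; the helper name pairs and the sample list are made up for illustration, and the print calls stand in for the insert and reverse functions from the question:

```python
from itertools import zip_longest

def pairs(lines):
    """Return an iterator of (first, second) tuples; second is None for a leftover line."""
    it = iter(lines)
    # Passing the same iterator twice makes zip_longest consume two items per tuple.
    return zip_longest(it, it)

lines = ["a", "b", "c", "d", "e"]   # stand-in for the lines of file.txt
for first, second in pairs(lines):
    print("insert:", first)
    if second is not None:          # skip the second step when a line is missing
        print("reverse:", second)
```

For groups of three or more lines, the same iterator can be passed three or more times to zip_longest.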

Scheme read specific data from file

I have a txt file that looks like this:
1 17.3
2 18.2
3 18.6
I would like to make a variable (for example temp) which would store the first value (17.3). I would then compare this value with something else (< temp 20). The next step would be to store the second value in temp (18.2), so I could again compare values.
Any help would be appreciated!
In Matlab it would look like this:
A=importdata(...)
i=1;
while i<=size(A,1)
temp=A(i,2);
i=i+1;
if temp < 20
...
end
end
There are several ways to skin this cat in R6RS:
You can use read. read will read any Scheme datum, and since these are all numbers, each call to read will return the next number.
You can make your own parser. Read one char at a time, and when you hit a space or linefeed, pass the list of chars you have collected through list->string to get a string and then through string->number to get a number. This can also be done in two parts, reading lines and then parsing each line, or by slurping the whole file first and then processing the string.
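The "read lines, then parse each line" variant can be sketched as follows. This is Python rather than Scheme, shown only to illustrate the logic; the function name second_column is hypothetical, and the sample data is the three lines from the question:

```python
def second_column(lines):
    """Return the second whitespace-separated value of each line as a float."""
    values = []
    for line in lines:
        fields = line.split()           # split on spaces or tabs
        if len(fields) >= 2:            # skip blank or incomplete lines
            values.append(float(fields[1]))
    return values

sample = ["1 17.3", "2 18.2", "3 18.6"]   # the lines from the question
for temp in second_column(sample):
    print(temp, temp < 20)                # compare each value with 20 in turn
```

In Scheme, the split step corresponds to collecting chars up to a space, and float corresponds to string->number.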

Using a .fasta file to compute relative content of sequences

So me being the 'noob' that I am, being introduced to programming via Perl just recently, I'm still getting used to all of this. I have a .fasta file which I have to use, although I'm unsure if I'm able to open it, or if I have to work with it 'blindly', so to speak.
Anyway, the file that I have contains DNA sequences for three genes, written in this .fasta format.
Apparently it's something like this:
>label
sequence
>label
sequence
>label
sequence
My goal is to write a script to open and read the file, which I have gotten the hang of now, but I have to read each sequence, compute relative amounts of 'G' and 'C' within each sequence, and then I'm to write it to a TAB-delimited file the names of the genes, and their respective 'G' and 'C' content.
Would anyone be able to provide some guidance? I'm unsure what a TAB-delimited file is, and I'm still trying to figure out how to open a .fasta file to actually see the content. So far I've worked with .txt files which I can easily open, but not .fasta.
I apologise for sounding completely bewildered. I'd appreciate your patience. I'm not like you pros out there!!
I get that it's confusing, but you really should try to limit your question to one concrete problem, see https://stackoverflow.com/faq#questions
I have no idea what a ".fasta" file or 'G' and 'C' is.. but it probably doesn't matter.
Generally:
Open input file
Read and parse data. If it's in some strange format that you can't parse, go hunting on http://metacpan.org for a module to read it. If you're lucky someone has already done the hard part for you.
Compute whatever you're trying to compute
Print to screen (standard out) or another file.
A "TAB-delimited" file is a file with columns (think Excel) where each column is separated by the tab ("\t") character, as a quick Google or Stack Overflow search would tell you.
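The steps above can be sketched as follows, in Python rather than Perl and only as an illustration. A .fasta file is plain text, so it can be opened like any .txt file. The function names read_fasta and gc_table, the column headers, and the sample records are assumptions made for this example:

```python
def read_fasta(lines):
    """Return a list of (label, sequence) pairs from FASTA-formatted lines."""
    records = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            records.append([line[1:], ""])      # each ">" line starts a new record
        elif line and records:
            records[-1][1] += line.upper()      # a sequence may span several lines
    return [(label, seq) for label, seq in records]

def gc_table(lines):
    """Return tab-delimited rows: label, G fraction, C fraction."""
    rows = ["gene\tG\tC"]
    for label, seq in read_fasta(lines):
        total = len(seq) or 1                   # avoid dividing by zero on empty records
        rows.append(f"{label}\t{seq.count('G') / total:.3f}\t{seq.count('C') / total:.3f}")
    return "\n".join(rows)

sample = [">gene1", "GGCCAT", "at", ">gene2", "AAAA"]   # made-up records
print(gc_table(sample))
```

An open file object can be passed in place of the sample list, since iterating over a file yields its lines, and the returned string can be written to an output file as-is.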
Here is an approach using 'awk' utility which can be used from the command line. The following program is executed by specifying its path and using awk -f <path> <sequence file>
#NR>1 means only look at lines after line 1 (the label line), because you said the sequence starts on line 2
NR>1{
#this for-loop goes through all bases in the line and then performs operations below:
for (i=1;i<=length;i++)
#for each position encountered, the variable "total" is increased by 1 for total bases
total++
}
#the same NR>1 condition is needed here, otherwise letters in the label line are counted too
NR>1{
for (i=1;i<=length;i++)
#if the "substring" i.e. position in a line == c or g upper or lower (some bases are
#lowercase in some fasta files), it will carry out the following instructions:
if(substr($0,i,1)=="c" || substr($0,i,1)=="C")
#this increments the c count by one for every c or C encountered, the next if statement does
#the same thing for g and G:
c++; else
if(substr($0,i,1)=="g" || substr($0,i,1)=="G")
g++
}
END{
#this "END-block" prints the gene name and C, G content in percentage, separated by tabs
print "Gene name\tG content:\t"(100*g/total)"%\tC content:\t"(100*c/total)"%"
}