Extract several lines starting at a certain marker - command-line

I found (of course) lots of threads about extracting lines from a file, but I couldn't transfer it to my problem. Given a file with a structure like this:
Time = 0.1
bla bla
Time = 0.2
bla bla
**StartExtractionPart1Here**
data i want
in file 1
several lines of code
**endExtractionPart1**
**StartExtractionPart2Here**
data i want
in file 2
several lines of code
**endExtractionPart2**
Time = 0.3
bla bla
So the pseudocode would be:
find a given time
find a given start flag
write all lines into file 1 until the end flag is reached
do that for all parts
I can change the Syntax of the start- and endflags as I need them. The "Extraction Parts" are printed in every timestep but I just want those of a given time.

Related

PowerShell logic to search for new string occurrences in one log file

Good morning,
I am searching a remote "live" log file for the occurrence of a string, let's say "Invalid Data". If this string is found, it means a severe system issue has taken place or is taking place. The check runs every hour (I will be changing this to run more frequently), and if the string is found, I send an email alert. That part is fine: I'm using PowerShell to find the string with no problem.
The issue is that this log file is not time-sliced (and cannot be), so it is one big file that grows throughout the day (it gets a date appended at the end of the day). Let's say I find the string at 08:00: great, the email is sent. The problem is that once the issue is fixed, those occurrences are still in the log file and will remain there until the end of the day. My logic for checking whether the string is in the log file is therefore flawed once it finds the first occurrence of the day, as it will keep finding the same string.
I can't think of how to rectify this flaw. I was thinking of counting occurrences, but this check runs against hundreds of remote devices, so I'm struggling to see how I can use that.
The log file looks like (as an example):
17-05-2018 09:22:52:391 (07144) .................. bla bla bla
17-05-2018 09:22:52:391 (07144) .................. bla bla bla
17-05-2018 09:22:52:392 (07144) ..................
17-05-2018 09:22:52:393 (07144) .................. bla bla
17-05-2018 09:22:52:393 (07144) .................. LoadFileInfo,
17-05-2018 09:22:52:393 (07144) .................. Invalid Data <--- this being the error
So I'm looking for any tips on how to make my alert more relevant. The fact that the date and time are on the left-hand side of each log line leads me to believe they could be stored in a variable, with the alert triggering only if the string is found at a later time than that variable. But I'm new to PowerShell, so any pointers are greatly appreciated.
Thanks for your time and appreciate any feedback.
Cheers
Could you just run a while loop that checks the last line of the log file? The script stays in this loop until a match is made, at which point it continues.
Depending on how quickly lines are written to the log, you will want to adjust the Start-Sleep interval.
while ($null -eq (Get-Content -Path $LogFilePath -Tail 1 | Select-String -Pattern "Invalid Data" -SimpleMatch))
{
    # Stuck in this loop until the last line of the log matches
    Start-Sleep 1
}
# Send email and then loop back to the beginning
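Another way to avoid alerting on old occurrences is to remember how far into the file the previous check got and to scan only what was appended since then. The sketch below shows the idea in Python rather than PowerShell. The function name `find_new_matches` and the use of a byte offset are assumptions for illustration, and the offset would have to be stored per device between runs.

```python
import os

def find_new_matches(log_path, pattern, offset=0):
    """Scan log_path from byte `offset`; return (matching_lines, new_offset)."""
    size = os.path.getsize(log_path)
    if size < offset:
        # The file shrank, so it was probably rotated: start over.
        offset = 0
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Only consume complete lines; a partial last line is re-read next time.
    end = data.rfind(b"\n") + 1
    matches = []
    for line in data[:end].decode("utf-8", errors="replace").splitlines():
        if pattern in line:
            matches.append(line)
    return matches, offset + end
```

An alert would be sent only when the returned list is non-empty, and the returned offset would be passed in again on the next check.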

Select specific filenames from an array of filenames containing a date in the name

I have a group of .wav files and I'm trying to select files by date, for example only the files belonging to a given month, or only the daily or night-time files, in order to compute PSD (power spectral density) averages. How should I go about it? The following are the first few .wav file names from a .txt file that is read into my MATLAB code:
AMAR168.1.20150823T200235Z.wav
AMAR168.1.20150823T201040Z.wav
AMAR168.1.20150823T201845Z.wav
AMAR168.1.20150823T202650Z.wav
AMAR168.1.20150823T203455Z.wav
AMAR168.1.20150823T204300Z.wav
AMAR168.1.20150823T205105Z.wav
AMAR168.1.20150823T205910Z.wav
AMAR168.1.20150823T210715Z.wav
The timestamp part of each name has the format yyyymmddTHHMMSSZ, followed by .wav, which should help make sense of the file names.
Many thanks
You need to be more specific.
Do all files always start with "AMAR168.1." for instance?
Anyway, here's a general approach to get you started:
AllFilenames = fileread ('filenames.dat');
FileNames = strsplit (AllFilenames, '\n');
for i = FileNames
    if ~isempty (strfind (i{:}, '20150823'))
        disp (i{:});
    end
end
Your filename examples aren't very useful because they all have the same date, but, anyway, you get the point.
Alternatively, if the filenames always have the same format and size, you could do, e.g.:
AllFilenames = fileread ('filenames.dat');
AllFilenames = strvcat (strsplit (AllFilenames, '\n'));
LogicalIndices = categorical (cellstr (AllFilenames(:,15:16))) == '08';
to obtain all rows where the month is '08', for instance. This assumes that the month is always at positions 15 to 16 in the string.
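The same selection can be made without relying on fixed character positions by parsing the timestamp out of each name. The following Python sketch assumes every name ends in `.yyyymmddTHHMMSSZ.wav` as in the examples. The helper names and the definition of "night" as 20:00 to 06:00 are assumptions for illustration only.

```python
import re
from datetime import datetime

# Timestamp such as 20150823T200235 just before "Z.wav"
STAMP_RE = re.compile(r"\.(\d{8}T\d{6})Z\.wav$")

def parse_stamp(filename):
    """Return the datetime encoded in the file name, or None if there is none."""
    m = STAMP_RE.search(filename)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y%m%dT%H%M%S")

def select(filenames, month=None, night_only=False, night_start=20, night_end=6):
    """Keep names whose timestamp matches the requested month and/or night hours."""
    chosen = []
    for name in filenames:
        stamp = parse_stamp(name)
        if stamp is None:
            continue  # skip names without a recognizable timestamp
        if month is not None and stamp.month != month:
            continue
        # Night is assumed to wrap around midnight: hour >= start or hour < end
        if night_only and not (stamp.hour >= night_start or stamp.hour < night_end):
            continue
        chosen.append(name)
    return chosen
```

For example, `select(names, month=8)` keeps only the August files, and `select(names, night_only=True)` keeps only files recorded during the assumed night hours.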

Does eof() look at lines or data? C++

Let's say I am trying to read in data line by line from a file called input.txt. There's about 20 lines and each line consists of 3 different data types. If I use this code:
while (!file.eof()) { ..... }
Does this function look at only one data type from each line per iteration, or does it look at the all the data types at once for each line per iteration--so the next iteration would look at the next line instead of the next data type?
Many thanks.
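.eof() only checks the end-of-file flag. That flag is set only after a read has already tried to go past the end of the file, so a loop on !file.eof() typically runs one extra iteration with a failed read. This is not desirable.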
A great blog post on how this works and best practice can be found here.
Basically, use
std::string line;
while(getline(file, line)) { ... }
or
while (file >> some_data) { ... }
as it will notice errors and the end of the file at the correct time and act accordingly.

Loop through files in a folder in matlab

I have a set of days of log files that I need to parse and look at in matlab.
The log files look like this:
LOG_20120509_120002_002.csv
(year)(month)(day)_(hour)(minute)(second)_(log part number)
The logs increment hourly, but sometimes the seconds are one or two seconds off (per hour), which means I need to ignore them when loading the CSV files.
I also have another file:
LOG_DATA_20120509_120002.csv
which contains data for the whole hour (different data).
The overall objective is to:
loop through each day
    loop through each hour
        read in LOG_DATA for the whole hour
        loop through each segment
            read in LOG for each segment
compile a table of all the data
I guess the question is, then, how do I ignore the minutes of the day if they are different? I suspect it will be by looping through all the files in the folder, in which case how do I do that?
Looping through all the files in the folder is relatively easy:
files = dir('*.csv');
for file = files'
    csv = load(file.name);
    % Do some stuff
end
First, specify the path to the folder that contains your *.csv files:
folder = 'f:\project\dataset';
You can change it to match your system.
Then use the dir function:
files = dir (fullfile (folder, '*.csv'));
L = length (files);
for i = 1:L
    % read the i-th CSV file found in the folder
    data{i} = csvread (fullfile (folder, files(i).name));
    % process the data here
end
pwd can also be used to get the current folder.
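To ignore the minutes and seconds, the files can be grouped by day and hour taken from their names, so each hourly LOG_DATA file ends up next to its LOG segments. The Python sketch below assumes the two naming patterns shown in the question. The function name `group_by_hour` and the shape of the result are made up for illustration. It does not handle a timestamp that drifts across an hour boundary, such as 11:59:59.

```python
import re
from collections import defaultdict

# LOG_<yyyymmdd>_<HH><MMSS>_<part>.csv and LOG_DATA_<yyyymmdd>_<HH><MMSS>.csv
LOG_RE = re.compile(r"^LOG_(\d{8})_(\d{2})\d{4}_(\d+)\.csv$")
DATA_RE = re.compile(r"^LOG_DATA_(\d{8})_(\d{2})\d{4}\.csv$")

def group_by_hour(filenames):
    """Map (day, hour) -> {"data": name or None, "segments": [names by part number]}."""
    groups = defaultdict(lambda: {"data": None, "segments": []})
    for name in filenames:
        m = DATA_RE.match(name)
        if m:
            groups[(m.group(1), m.group(2))]["data"] = name
            continue
        m = LOG_RE.match(name)
        if m:
            # Keep the part number so segments can be sorted numerically.
            groups[(m.group(1), m.group(2))]["segments"].append((int(m.group(3)), name))
    result = {}
    for key in sorted(groups):
        entry = groups[key]
        result[key] = {
            "data": entry["data"],
            "segments": [n for _, n in sorted(entry["segments"])],
        }
    return result
```

Iterating over the result in key order gives the day-then-hour loop from the question, and the file names could come from `os.listdir` on the log folder.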

Using a .fasta file to compute relative content of sequences

So me being the 'noob' that I am, being introduced to programming via Perl just recently, I'm still getting used to all of this. I have a .fasta file which I have to use, although I'm unsure if I'm able to open it, or if I have to work with it 'blindly', so to speak.
Anyway, the file that I have contains DNA sequences for three genes, written in this .fasta format.
Apparently it's something like this:
>label
sequence
>label
sequence
>label
sequence
My goal is to write a script that opens and reads the file, which I have gotten the hang of now. But I have to read each sequence, compute the relative amounts of 'G' and 'C' within each sequence, and then write the names of the genes and their respective 'G' and 'C' content to a TAB-delimited file.
Would anyone be able to provide some guidance? I'm unsure what a TAB-delimited file is, and I'm still trying to figure out how to open a .fasta file to actually see the content. So far I've worked with .txt files which I can easily open, but not .fasta.
I apologise for sounding completely bewildered. I'd appreciate your patience. I'm not like you pros out there!!
I get that it's confusing, but you really should try to limit your question to one concrete problem, see https://stackoverflow.com/faq#questions
I have no idea what a ".fasta" file or 'G' and 'C' is.. but it probably doesn't matter.
Generally:
Open input file
Read and parse data. If it's in some strange format that you can't parse, go hunting on http://metacpan.org for a module to read it. If you're lucky someone has already done the hard part for you.
Compute whatever you're trying to compute
Print to screen (standard out) or another file.
A "TAB-delimited" file is a file with columns (think Excel) where the columns are separated by the tab ("\t") character, as a quick Google or Stack Overflow search would tell you.
Here is an approach using the awk utility, which can be run from the command line. Save the following program to a file and run it with awk -f <path> <sequence file>
# Lines starting with ">" are labels: report the previous sequence and start a new one
/^>/ {
    if (name != "") report()
    name = substr($0, 2)
    g = c = total = 0
    next
}
# Every other line is sequence data
{
    # "total" counts all bases of the current sequence
    total += length($0)
    for (i = 1; i <= length($0); i++) {
        # compare in upper case because some bases are lowercase in some fasta files
        base = toupper(substr($0, i, 1))
        if (base == "C") c++
        else if (base == "G") g++
    }
}
END {
    # report the last sequence in the file
    if (name != "") report()
}
# print the gene name and its G and C content in percent, separated by tabs
function report() {
    if (total > 0)
        print name "\tG content:\t" (100*g/total) "%\tC content:\t" (100*c/total) "%"
}
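For comparison, the general steps (open, parse, compute, print) could look like this in Python. This is a sketch under assumptions: each record starts with a `>label` line, a sequence may span several lines, and the function names and column headers are invented for illustration. A .fasta file is plain text, so it can be opened like a .txt file.

```python
import csv

def read_fasta(lines):
    """Yield (label, sequence) pairs; a sequence may span several lines."""
    label, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            label, chunks = line[1:].strip(), []
        elif label is not None:
            chunks.append(line)
    if label is not None:
        yield label, "".join(chunks)

def gc_fractions(sequence):
    """Return (G fraction, C fraction) of the sequence, ignoring letter case."""
    seq = sequence.upper()
    if not seq:
        return 0.0, 0.0
    return seq.count("G") / len(seq), seq.count("C") / len(seq)

def write_gc_table(fasta_lines, out_file):
    """Write one tab-separated row per gene: label, G fraction, C fraction."""
    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene", "G_fraction", "C_fraction"])
    for label, seq in read_fasta(fasta_lines):
        g, c = gc_fractions(seq)
        writer.writerow([label, f"{g:.4f}", f"{c:.4f}"])
```

With real files, `fasta_lines` would be an open input file and `out_file` an output file opened for writing. The result can be opened in a spreadsheet as a TAB-delimited table.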