Linux command to sort according to second word/character - perl

I have a linux file whose content is as below:
hey this
is just
sample file
I want to:
1. sort the three lines according to the second word, so the output should be:
sample file
is just
hey this
2. sort the three lines according to the second character of the second word, so the output would be:
hey this
sample file
is just
Is there any way I can run a Perl/Unix command on the command line (it doesn't matter if it uses pipes)?

I got the answers to both questions:
For sorting by the second word: sort -k2 myfile
For sorting by the second character of the second word: sort -k2.3 myfile
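If you would rather have a Perl one-liner than sort(1), something along these lines should work too (just a sketch, assuming the words are separated by single spaces and that myfile is the sample file above):
perl -e 'print sort { (split " ", $a)[1] cmp (split " ", $b)[1] } <>' myfile
perl -e 'print sort { substr((split " ", $a)[1], 1, 1) cmp substr((split " ", $b)[1], 1, 1) } <>' myfile
The first sorts on the whole second word, the second only on its second character.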

Related

If Line Number Exists

I'm trying to get code that will do something if a certain line number exists in a text file, for example "Test.txt".
Ex.
if line "x" exists in test.txt msg $chan working
thanks #denny for being the one to help me out.
if ($read(test.txt, n, x)) {
msg $chan working
}
You have a couple of options:
1. Check whether the line number is less than or equal to the total number of lines, e.g. $lines(filename).
2. Extract the line and test whether it is non-empty, e.g. $read(filename, LINE-NUMBER).
Notes for each method:
1. This only tells you that the line number exists, NOT whether there is anything on that line.
2. This only gives you the line if it exists; an empty line and a missing line both come back looking empty.
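For anyone doing the same check outside mIRC, here is a rough Perl sketch of option 2 (the file name and line number are placeholders):
use strict;
use warnings;

my ($file, $n) = ('test.txt', 3);   # hypothetical file and 1-based line number
open my $fh, '<', $file or die "Can't open $file: $!";
my $line;
while (<$fh>) {
    if ($. == $n) { chomp; $line = $_; last }   # $. is the current line number
}
close $fh;
print "working\n" if defined $line && length $line;   # line exists and is non-empty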

How to make number in first line of file as variable?

I have a file with a list of numbers like this:
10
15..135
140..433
444..598
600..783
800
The first and last lines are always a single number without "..". The problem is: how do I edit the first and last lines so the file looks like this:
1..10
15..135
140..433
444..598
600..783
800..900
For the first line, I need to put "1.." in front of the number IF THE NUMBER IS NOT 0. If the number is already 0, no "1.." needs to be there.
For the last line, I always want to append something (in this case "..900"). Can somebody give me some ideas?
perl -pi -e '($.==1)?s/^/1../:((eof)?s/$/..900/:1);' your_file
PS: This does an in-place replacement of the file.
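If you also need the "only when the number is not 0" rule for the first line, a slightly longer variation (an untested sketch, same in-place behaviour) could be:
perl -pi -e 'if ($. == 1) { s/^/1../ unless /^0$/ } elsif (eof) { s/$/..900/ }' your_file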

UNIX SA venturing into the world of Windows

I need to parse a CSV (well, more a tab-delimited file) and create a new file that keeps only the first line for each unique first field - i.e.
RC100 1st line
RC100 2nd line
RC100 3rd line
RC200 1st line
and have in the new resultant file
RC100 1st line
RC200 1st line
I was going to attempt this using VBScript, then wondered if I could accomplish it using PowerShell (sort with -unique), but obviously didn't achieve success.
Please point me in the right direction.
Thanks.
Ray
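Since this thread sits under the perl tag, here is a minimal Perl sketch (assuming the key is the first tab-separated field, you want to keep the first line seen for each key, and input.txt/output.txt are placeholder names; the quoting shown is for a Unix shell, so on Windows cmd.exe use double quotes around the program instead):
perl -F'\t' -ane 'print unless $seen{$F[0]}++;' input.txt > output.txt
The likely reason Sort-Object -Unique alone didn't help is that it compares whole lines, so "RC100 1st line" and "RC100 2nd line" still count as different.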

perl sequence extraction loop

I have an existing perl one-liner (from the Edwards lab) that works wonderfully to read a text file (named ids.file) that contains one column of IDs and searches a second, specially formatted text file (named fasta.file in this example - in "fasta" format for those who know bioinformatics) and returns sequences that match the ID from the first file. I was hoping to expand this script to do two additional things:
The current perl one-liner only seems to work if the ids.file contains one column of data. I would like it to work on a file that contains two columns (separated by spaces), and act on the second column of data (well, really any column of data, but I assume that it will be obvious enough to adapt it if someone can give an example using a second column)
I would like to append any results returned by the search as a third column, instead of just writing them to a new file.
If someone is kind enough to offer an example but only has time or inclination to work on one of these, I would prefer that you try to solve #2 - I have come close to solving #1 with a for loop that uses awk to only use the Perl code on the second column - I haven't gotten it yet, but am close, so #2 seems like the harder one to me.
The perl one liner is as follows:
perl -ne 'if(/^>(\S+)/){$c=$i{$1}}$c?print:chomp;$i{$_}=1 if @ARGV' ids.file fasta.file
I appreciate any help you can give!
Not quite sure but will this do?
perl -ne 'chomp; s/^>(\S+).*/$c=$i{$1}/e; print if $c;
$i{(/^\S*\s(\S*)$/)[0]}="$_ " if @ARGV'
ids.file fasta.file
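For context, the original one-liner works roughly like this expanded sketch (it relies on @ARGV still holding fasta.file while ids.file is being read); the answer above keeps the same structure but keys the hash on the second whitespace-separated column of ids.file (the capture in /^\S*\s(\S*)$/) and stores the whole ids line:
use strict;
use warnings;

# usage: perl extract.pl ids.file fasta.file
my (%ids, $printing);
while (<>) {
    if (@ARGV) {             # still reading ids.file (fasta.file has not been opened yet)
        chomp;
        $ids{$_} = 1;        # remember each ID
        next;
    }
    if (/^>(\S+)/) {         # FASTA header: decide whether to print this record
        $printing = $ids{$1};
    }
    print if $printing;      # print the header and sequence lines of wanted records
}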

Using a .fasta file to compute relative content of sequences

So me being the 'noob' that I am, being introduced to programming via Perl just recently, I'm still getting used to all of this. I have a .fasta file which I have to use, although I'm unsure if I'm able to open it, or if I have to work with it 'blindly', so to speak.
Anyway, the file that I have contains DNA sequences for three genes, written in this .fasta format.
Apparently it's something like this:
>label
sequence
>label
sequence
>label
sequence
My goal is to write a script to open and read the file, which I have gotten the hang of now. But I also have to read each sequence, compute the relative amounts of 'G' and 'C' within each sequence, and then write the names of the genes and their respective 'G' and 'C' content to a TAB-delimited file.
Would anyone be able to provide some guidance? I'm unsure what a TAB-delimited file is, and I'm still trying to figure out how to open a .fasta file to actually see the content. So far I've worked with .txt files which I can easily open, but not .fasta.
I apologise for sounding completely bewildered. I'd appreciate your patience. I'm not like you pros out there!!
I get that it's confusing, but you really should try to limit your question to one concrete problem, see https://stackoverflow.com/faq#questions
I have no idea what a ".fasta" file is, or what 'G' and 'C' are, but it probably doesn't matter.
Generally:
Open input file
Read and parse data. If it's in some strange format that you can't parse, go hunting on http://metacpan.org for a module to read it. If you're lucky someone has already done the hard part for you.
Compute whatever you're trying to compute
Print to screen (standard out) or another file.
A "TAB-delimite" file is a file with columns (think Excel) where each column is separated by the tab ("\t") character. As quick google or stackoverflow search would tell you..
Here is an approach using the awk utility, which can be run from the command line. The following program is executed by specifying its path and using awk -f <path> <sequence file>
#NR>1 means only look at lines after line 1, because you said the sequence starts on line 2
NR>1{
#this for-loop goes through all bases in the line and performs the operations below:
for (i=1;i<=length;i++) {
#for each position encountered, the variable "total" is increased by 1 for total bases
total++
#grab the single character at position i (some bases are lowercase in some fasta files)
base=substr($0,i,1)
#this increments the c count by one for every c or C encountered; the else-if does the same for g and G:
if (base=="c" || base=="C")
c++
else if (base=="g" || base=="G")
g++
}
}
END{
#this "END-block" prints a header label and the overall G and C content as percentages, separated by tabs
print "Gene name\tG content:\t"(100*g/total)"%\tC content:\t"(100*c/total)"%"
}