Print first 10 rows, followed by a string - sed

I have a text file, and I want to extract the first 10 lines from it and then a specific string, then output this.
That is:
Input text file -> print first 10 lines -> print string starting with 'N' -> output to text file

You can use awk for this:
awk 'NR<11 && /^N/' infile > outfile
This prints only those lines among the first 10 that start with N.
Here is a sed version too:
sed -n '1,10{/^N/p}' infile > outfile

awk 'NR<11{print;next} /^N/{print;exit}' file
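The two awk answers above read the question differently, which is easy to miss. The sketch below runs both against a small made-up input (the sample lines and the variable names are assumptions for illustration only), so the difference shows up in the output.

```shell
# Hypothetical sample: 13 lines, of which line 3 and line 12 start with N
tmp=$(mktemp)
printf '%s\n' a1 b2 N3 d4 e5 f6 g7 h8 i9 j10 k11 N12 m13 > "$tmp"

# Reading 1: only the N-lines found within the first 10 lines
only_n=$(awk 'NR<11 && /^N/' "$tmp")

# Reading 2: all of the first 10 lines, then the first later line starting with N
head_then_n=$(awk 'NR<11{print;next} /^N/{print;exit}' "$tmp")

printf '%s\n' "$only_n"
printf -- '---\n'
printf '%s\n' "$head_then_n"
rm -f "$tmp"
```

If the second reading is the intended one, the exit also stops awk from scanning the rest of a large file once the N-line has been found.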

copying every nth line to a new line

I have a txt file in which I need to copy the 1st line of every four lines and append it to the 3rd line of every four, then write the result to a new txt file.
e.g
#CR5SM:00004:00029
TTTTCTCTTTCTTTCTT
+
>>>/>#99419BAAABB
#CR5SM:00005:00026
ATTATAGAGGGATAG
+
;969999999-4;BB
change it to this:
#CR5SM:00004:00029
TTTTCTCTTTCTTTCTT
+CR5SM:00004:00029
>>>/>#99419BAAABB
#CR5SM:00005:00026
ATTATAGAGGGATAG
+CR5SM:00005:00026
;969999999-4;BB
I have tried using awk but can't seem to find the correct commands to do this.
Does anyone have any solutions? Thanks
Using awk:
$ awk '/^#/{a=substr($0,2)}/^\+/{$0=$0 a}1' file
#CR5SM:00004:00029
TTTTCTCTTTCTTTCTT
+CR5SM:00004:00029
>>>/>#99419BAAABB
#CR5SM:00005:00026
ATTATAGAGGGATAG
+CR5SM:00005:00026
;969999999-4;BB
You can redirect the output to another file by saying:
awk '/^#/{a=substr($0,2)}/^\+/{$0=$0 a}1' file > newfile
We use the substr function to capture the lines that start with # from second character onwards until the end of the line.
We look for lines that start with + (notice we escape it since it is a meta-character). Once we find that line, we append our captured line to the existing line.
The 1 at the end is an always-true pattern with no action, so awk applies its default action and prints every line.
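For readers who find the one-liner dense, here is the same logic spread over one rule per line and run on the first record from the question. This is only a sketch: the temporary file and the variable names are made up for the demonstration.

```shell
tmp=$(mktemp)
printf '%s\n' '#CR5SM:00004:00029' 'TTTTCTCTTTCTTTCTT' '+' '>>>/>#99419BAAABB' > "$tmp"

out=$(awk '
  /^#/  { id = substr($0, 2) }   # remember the header without its leading "#"
  /^\+/ { $0 = $0 id }           # append the remembered header to the "+" line
  1                              # always-true pattern: print the current line
' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

Note that this pattern-based approach assumes no other line in a record begins with # or +. If that can happen in your data, a version keyed on the line number within each group of four is the safer choice.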
Try:
awk '
(NR-1) % 4 == 0 { l=substr($0,2); print; next } # 1st line of every four: save it without the leading "#" (print & continue)
(NR-1) % 4 == 2 { print $0 l; next } # 3rd line of every four: append the saved text (print & continue)
{ print } # all other lines: print as is
' infile > outfile
This might work for you (GNU sed):
sed 'h;n;n;G;s/\n.//;n' file
Copy the first line to the hold space, then print the first and second lines. Append the held first line to the third and remove the newline plus the first character of the held line. Print the result, then the fourth line, and repeat.

How to print output with line breaks from command line

When I want to print an output like this
./myScript (prints some lines)
or
cat myFile
I want the output to be shown with line breaks, for example so that each line contains no more than 100 characters.
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaffffff
vbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbf
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Is there something I can add to the command line to get this result?
Thanks.
You can use sed if you want a , as the line terminator (-r is the GNU sed flag for extended regular expressions).
$ cat myfile
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaffffffvbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbfaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
$ sed -r 's/.{50}/&,\n/g' myfile
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaffffffvbbb,
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbfaaaaaa,
aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
fold is another utility, but it won't add a , at the end.
$ fold -w50 myfile
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaffffffvbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbfaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
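Because fold reads standard input when no file is given, it can also sit at the end of a pipeline, which covers the ./myScript case from the question. A small sketch, using a generated 120-character line as a stand-in for the script output (the input and variable names are made up):

```shell
# Build one 120-character line as a stand-in for the output of a script
long=$(awk 'BEGIN { for (i = 0; i < 120; i++) printf "a"; print "" }')

# Wrap whatever arrives on standard input at 100 characters per line
wrapped=$(printf '%s\n' "$long" | fold -w 100)

printf '%s\n' "$wrapped"
```

In practice this would be ./myScript | fold -w 100. For text containing spaces, fold -s breaks at the last blank before the width limit instead of in the middle of a word.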

sed/awk/cut/grep - Best way to extract string

I have a results.txt file that is structured in this format:
Uncharted 3: Javithaxx l Rampant l Graveyard l Team Deathmatch HD (D1VpWBaxR8c)
Matt Darey feat. Kate Louise Smith - See The Sun (Toby Hedges Remix) (EQHdC_gGnA0)
The Matrix State (SXP06Oax70o)
Above & Beyond - Group Therapy Radio 014 (guest Lange) (2013-02-08) (8aOdRACuXiU)
I want to create a new file by extracting the YouTube video ID given in the last characters of each line, like "8aOdRACuXiU".
I'm trying to build a URL like this in a new file:
http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
Note that I appended &hd=1 to the URL I am trying to build. I have tried using rev and cut, but rev munges my data. The hard part is that each line in my text file can contain several parenthesized entries, and I only care about the data between the last set of parentheses. Each line has a variable length, so that isn't helpful either. What about using grep and .$ for the end of the line?
In summary, I want to extract the youtube ID from results.txt and export it to a new file in the following format: http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
Using awk:
awk '{
v = substr( $NF, 2, length( $NF ) - 2 )
printf "%s%s%s\n", "http://www.youtube.com/watch?v=", v, "&hd=1"
}' infile
It yields:
http://www.youtube.com/watch?v=D1VpWBaxR8c&hd=1
http://www.youtube.com/watch?v=EQHdC_gGnA0&hd=1
http://www.youtube.com/watch?v=SXP06Oax70o&hd=1
http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
$ sed 's!.*(\(.*\))!http://www.youtube.com/watch?v=\1\&hd=1!' results.txt
http://www.youtube.com/watch?v=D1VpWBaxR8c&hd=1
http://www.youtube.com/watch?v=EQHdC_gGnA0&hd=1
http://www.youtube.com/watch?v=SXP06Oax70o&hd=1
http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
Here, .*(\(.*\)) looks for the last occurrence of a pair of parentheses, and captures the characters inside those parentheses. The captured group is then inserted into the URL using \1.
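To see the greedy match at work, the sketch below feeds the sed command the one sample line that contains three parenthesized groups. Only the last group ends up in the URL, because the leading .* consumes as much of the line as it can before the final opening parenthesis. The variable names are just for illustration.

```shell
line='Above & Beyond - Group Therapy Radio 014 (guest Lange) (2013-02-08) (8aOdRACuXiU)'

# The unescaped ( and ) are literal in a basic regex; \( \) form the capture group
url=$(printf '%s\n' "$line" | sed 's!.*(\(.*\))!http://www.youtube.com/watch?v=\1\&hd=1!')

printf '%s\n' "$url"
```

The & in the replacement is escaped because an unescaped & would insert the whole matched text.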
Using a perl one-liner:
perl -lne 'printf "http://www.youtube.com/watch?v=%s&hd=1\n", $& if /[^\(]+(?=\)$)/' file.txt
Or a multi-line version:
perl -lne '
printf(
"http://www.youtube.com/watch?v=%s&hd=1\n",
$&
) if /[^\(]+(?=\)$)/
' file.txt

Joining lines in order of different blocks in the same text file

I have a file split in blocks like the following:
AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTGGGG
AGGTAGTTATTATTTTTTTGGTTTTTAGTATTTAATTGAGTGTTT
ATGTAGGTGTTTATGTATTAGTTTTTTTTAGGTTTAGGGTGTTGT
ATTTAGGTTTTGTGTTTTGTGTATTATTGAATTTAATTAAAGTTA

AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTT
AGTTTTTTTTTATTTGTCGGGATATTTTAGTTGATTTTAGATTGC
TATATTTTTAGTTTCGATTCGTCGTAAGTTTTATTTTTTTTTAAT
GGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTTT
I've truncated/wrapped the lines for clarity's sake, but imagine very long lines. The point of my question is that I want a final file that looks like this:
AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTGGGGAGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTT
AGGTAGTTATTATTTTTTTGGTTTTTAGTATTTAATTGAGTGTTTAGTTTTTTTTTATTTGTCGGGATATTTTAGTTGATTTTAGATTGC
ATGTAGGTGTTTATGTATTAGTTTTTTTTAGGTTTAGGGTGTTGTTATATTTTTAGTTTCGATTCGTCGTAAGTTTTATTTTTTTTTAAT
ATTTAGGTTTTGTGTTTTGTGTATTATTGAATTTAATTAAAGTTAGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTTT
Where this new block has:
the same number of lines as the initial blocks,
each of the lines of the resulting block is a concatenation of the lines with the same line-number in the initial blocks.
this concatenation should be in order (i.e. "1st line of 1st block" + "1st line of 2nd block", etc.).
Is it possible to achieve this final block using sed and/or awk? Could you show me how it could be done?
In bash with paste:
$ paste <(head -4 file) <(tail -4 file) | tr -d '\t'
AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTGGGGAGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTT
AGGTAGTTATTATTTTTTTGGTTTTTAGTATTTAATTGAGTGTTTAGTTTTTTTTTATTTGTCGGGATATTTTAGTTGATTTTAGATTGC
ATGTAGGTGTTTATGTATTAGTTTTTTTTAGGTTTAGGGTGTTGTTATATTTTTAGTTTCGATTCGTCGTAAGTTTTATTTTTTTTTAAT
ATTTAGGTTTTGTGTTTTGTGTATTATTGAATTTAATTAAAGTTAGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTTT
Try this (it assumes two blocks of four lines separated by one blank line, hence the i+5 offset):
awk -vOFS="" '$0{a[NR]=$0}END{for(i=1;i<=NR/2;i++)print a[i],a[i+5]}' file
test with your example:
kent$ cat tmp.txt
AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTGGGG
AGGTAGTTATTATTTTTTTGGTTTTTAGTATTTAATTGAGTGTTT
ATGTAGGTGTTTATGTATTAGTTTTTTTTAGGTTTAGGGTGTTGT
ATTTAGGTTTTGTGTTTTGTGTATTATTGAATTTAATTAAAGTTA

AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTT
AGTTTTTTTTTATTTGTCGGGATATTTTAGTTGATTTTAGATTGC
TATATTTTTAGTTTCGATTCGTCGTAAGTTTTATTTTTTTTTAAT
GGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTTT
kent$ awk -vOFS="" '$0{a[NR]=$0}END{for(i=1;i<=NR/2;i++)print a[i],a[i+5]}' tmp.txt
AGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTGGGGAGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTT
AGGTAGTTATTATTTTTTTGGTTTTTAGTATTTAATTGAGTGTTTAGTTTTTTTTTATTTGTCGGGATATTTTAGTTGATTTTAGATTGC
ATGTAGGTGTTTATGTATTAGTTTTTTTTAGGTTTAGGGTGTTGTTATATTTTTAGTTTCGATTCGTCGTAAGTTTTATTTTTTTTTAAT
ATTTAGGTTTTGTGTTTTGTGTATTATTGAATTTAATTAAAGTTAGGATAGGTTTTGGTGTTTGAGGTTAATTTTGTTTTATTTTTTTTT
awk -F'\n' -v RS= '{for (i=1;i<=NF;i++) str[i] = str[i] $i} END {for (i=1;i<=NF;i++) print str[i]}' file
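The paragraph-mode approach (empty RS, newline as field separator) is not tied to two blocks of four lines. As a rough sketch, the version below joins three short made-up blocks; it also saves the field count in a variable n rather than reading NF inside END, which is a small defensive change of mine and not part of the original answer.

```shell
# Hypothetical input: three blocks of two lines each, separated by blank lines
tmp=$(mktemp)
printf '%s\n' AAA BBB '' CCC DDD '' EEE FFF > "$tmp"

joined=$(awk -F'\n' -v RS= '
  { for (i = 1; i <= NF; i++) str[i] = str[i] $i; n = NF }   # each record is one block, each field one line
  END { for (i = 1; i <= n; i++) print str[i] }              # emit the concatenated lines in order
' "$tmp")

printf '%s\n' "$joined"
rm -f "$tmp"
```

This relies on the blocks being separated by at least one blank line and on every block having the same number of lines.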

Add column to middle of tab-delimited file (sed/awk/whatever)

I'm trying to add a column (with the content '0') to the middle of a pre-existing tab-delimited text file. I imagine sed or awk will do what I want. I've seen various solutions online that do approximately this but they're not explained simply enough for me to modify!
I currently have this content:
Affx-11749850 1 555296 CC
I need this content
Affx-11749850 1 0 555296 CC
Using the command awk '{$3=0}1' filename messes up my formatting AND replaces column 3 with a 0, rather than adding a third column with a 0.
Any help (with explanation!) so I can solve this problem, and future similar problems, much appreciated.
Using the implicit { print } rule and appending the 0 to the second column:
awk '$2 = $2 FS "0"' file
Or with sed, assuming single space delimiters:
sed 's/ / 0 /2' file
Or perl:
perl -lane '$, = " "; $F[1] .= " 0"; print @F' file
awk '{$2=$2" "0; print }' your_file
tested below:
> echo "Affx-11749850 1 555296 CC"|awk '{$2=$2" "0;print}'
Affx-11749850 1 0 555296 CC
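The answers above split and rejoin on spaces. Since the question describes a tab-delimited file, it may help to see a variant that keeps tabs: assigning to any field makes awk rebuild the line using OFS, which is a single space by default, and that is why the formatting got "messed up" in the attempt from the question. The sketch below sets both FS and OFS to a tab; the sample row is the one from the question, assumed here to be tab-separated, and the variable names are made up.

```shell
# The row from the question, assumed to be tab-separated
row=$(printf 'Affx-11749850\t1\t555296\tCC')

# Split and rejoin on tabs, and append a new tab-separated "0" after column 2
fixed=$(printf '%s\n' "$row" | awk 'BEGIN { FS = OFS = "\t" } { $2 = $2 OFS "0" } 1')

printf '%s\n' "$fixed"
```

To insert after a different column, change $2 to that column; to insert a different value, change the "0".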