Modify a line in a particular section of a document with sed


I'd like to use sed to modify a Debian control file to change the dependencies of a particular package.
The file contains metadata for several packages, where each entry looks like this:
Package: linux-image-generic
Architecture: i386 amd64 armhf arm64 powerpc ppc64el s390x
Section: kernel
Provides: ${dkms:zfs-modules} ${dkms:virtualbox-guest-modules} ${dkms:wireguard-linux-compat-modules}
Depends: ${misc:Depends}, linux-image-${kernel-abi-version}-generic, linux-modules-extra-${kernel-abi-version}-generic [i386 amd64 arm64 powerpc ppc64el s390x], linux-firmware, intel-microcode [amd64 i386], amd64-microcode [amd64 i386]
Recommends: thermald [i386 amd64]
Description: Generic Linux kernel image
 This package will always depend on the latest generic kernel image
 available.

Package: linux-tools-generic
Architecture: i386 amd64 armhf arm64 powerpc ppc64el s390x
Section: kernel
Provides: linux-tools
Depends: ${misc:Depends}, linux-tools-${kernel-abi-version}-generic
Description: Generic Linux kernel tools
 This package will always depend on the latest generic kernel tools
 available.
I would like to find the line that matches Package: linux-image-generic, then modify the following line that matches Depends:, for instance by performing s/linux-image-/linux-image-unsigned-/.

Here's my solution that modifies the Depends: line, but only within the linux-image-generic section.
This solution is for GNU sed; a slight modification, noted below, makes it work for BSD sed.
sed '/^Package: linux-image-generic$/,/^$/{/^Depends:/ s/linux-image-/linux-image-unsigned-/}' debian/control
It starts with a range address that matches from the beginning of the package metadata up to the blank line after the package block.
/^Package: linux-image-generic$/,/^$/
Then it uses { } to apply a command only to the lines within this range:
/^Depends:/ s/linux-image-/linux-image-unsigned-/
The first part here, /^Depends:/, is a regular expression address that selects only the line(s) that begin with Depends:.
Lastly, the s command performs the substitution on the selected line.
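A minimal way to check the range-plus-block command, assuming GNU sed. The sample file is a trimmed version of the question's stanza plus a hypothetical linux-generic stanza, added only to show that a Depends: line mentioning linux-image- outside the range is left alone.

```shell
#!/bin/sh
# Assumes GNU sed (BSD sed rejects "}" directly after the s command).
# The linux-generic stanza is hypothetical: it shows the substitution does
# not leak past the blank line that ends the range.
ctl=$(mktemp)
cat > "$ctl" <<'EOF'
Package: linux-image-generic
Depends: ${misc:Depends}, linux-image-${kernel-abi-version}-generic, linux-firmware

Package: linux-generic
Depends: linux-image-generic
EOF

# Range address: the stanza header through the next empty line.
# Inside the range, only lines starting with "Depends:" are substituted.
out=$(sed '/^Package: linux-image-generic$/,/^$/{/^Depends:/ s/linux-image-/linux-image-unsigned-/}' "$ctl")
printf '%s\n' "$out"
rm -f "$ctl"
```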
BSD sed (on macOS, etc.) has an additional syntactic rule for function lists { ... }:
The terminating “}” must be preceded by a newline, and may also be preceded by white space.
We need to insert a newline before the }, for example by using the $'\n' ANSI-C string in Bash:
sed '/^Package: linux-image-generic$/,/^$/{/^Depends:/ s/linux-image-/linux-image-unsigned-/'$'\n''}' debian/control
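A variant that is not from the original answer: split the script across several -e arguments. Each -e argument becomes its own line of script, so the closing } is preceded by a newline without needing Bash's $'\n' quoting. That it satisfies BSD sed is an assumption based on the rule quoted above; the sample file, including the hypothetical linux-generic stanza, is illustrative only.

```shell
#!/bin/sh
# Portable form (assumed to work with both GNU and BSD sed): the "{",
# the command, and the "}" are passed as three separate script lines.
ctl=$(mktemp)
cat > "$ctl" <<'EOF'
Package: linux-image-generic
Depends: ${misc:Depends}, linux-image-${kernel-abi-version}-generic

Package: linux-generic
Depends: linux-image-generic
EOF

out=$(sed -e '/^Package: linux-image-generic$/,/^$/{' \
          -e '/^Depends:/ s/linux-image-/linux-image-unsigned-/' \
          -e '}' "$ctl")
printf '%s\n' "$out"
rm -f "$ctl"
```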
As an aside, the path to finding this solution was to research sed commands that operate on other file formats with similar syntaxes, like INI files.

This might work for you (GNU sed):
sed '/Package: linux-image-generic/{:a;n;/Depends:/!ba;s/linux-image-/&unsigned-/}' file
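Find the line containing Package: linux-image-generic, then keep reading lines until one contains Depends:, and substitute linux-image- with linux-image-unsigned-.
N.B. This assumes the package stanza contains a Depends: line. If it may not, use:
sed -E '/Package:/{h;:a;n;/Package:/{h;ba;};/Depends/!ba;G
s/(linux-image-)(.*)\n.*Package: linux-image-generic/\1unsigned-\2/;s/\n.*//}' file
This copies each Package: line to the hold space, appends it to the next Depends: line, and substitutes only when the appended header is linux-image-generic. The final s/\n.*// removes the appended header again, so the Depends: lines of other packages are printed unchanged.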
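A sketch of why the N.B. matters, assuming GNU sed and a hypothetical control file in which the linux-image-generic stanza has no Depends: line. The hold-space command here ends with s/\n.*// so that, when the substitution does not apply, the appended Package: line is removed instead of being printed a second time.

```shell
#!/bin/sh
# Assumes GNU sed (labels, -E). Hypothetical input: the linux-image-generic
# stanza deliberately has no Depends: line.
ctl=$(mktemp)
cat > "$ctl" <<'EOF'
Package: linux-image-generic
Section: kernel

Package: linux-generic
Depends: linux-image-generic
EOF

# Loop form: keeps reading until *any* Depends: line, so it edits the
# Depends: line of the following stanza.
loop_out=$(sed '/Package: linux-image-generic/{:a;n;/Depends:/!ba;s/linux-image-/&unsigned-/}' "$ctl")

# Hold-space form: remembers the current Package: line, appends it to the
# Depends: line, substitutes only if it names linux-image-generic, then
# drops the appended header.
hold_out=$(sed -E '/Package:/{h;:a;n;/Package:/{h;ba;};/Depends/!ba;G
s/(linux-image-)(.*)\n.*Package: linux-image-generic/\1unsigned-\2/;s/\n.*//}' "$ctl")

printf 'loop form:\n%s\n\nhold-space form:\n%s\n' "$loop_out" "$hold_out"
rm -f "$ctl"
```

The loop form rewrites linux-generic's dependency to linux-image-unsigned-generic; the hold-space form leaves the file unchanged.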

The sed solution you found is perfect. For completeness, with GNU awk instead of sed:
awk -v RS= -v ORS='\n\n' '/^Package: linux-image-generic/ {
$0 = gensub(/(\nDepends:[^\n]*linux-image-)/,"&unsigned-",1)} 1' file
If the input record separator (RS) is the empty string, records are separated by blank lines, so each section of your file is one record. gensub then performs the substitution on the matching record.
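A variant of the gawk command that swaps gensub() for POSIX sub(), so it does not need GNU awk. It assumes your awk treats \n inside a bracket expression as a newline (gawk does); the linux-generic stanza in the sample is hypothetical.

```shell
#!/bin/sh
# Paragraph mode (RS=) makes each blank-line-separated stanza one record.
ctl=$(mktemp)
cat > "$ctl" <<'EOF'
Package: linux-image-generic
Depends: ${misc:Depends}, linux-image-${kernel-abi-version}-generic, linux-firmware

Package: linux-generic
Depends: linux-image-generic
EOF

# sub() replaces the first match of "\nDepends:...linux-image-" in the
# record; "&" re-inserts the matched text, followed by "unsigned-".
out=$(awk -v RS= -v ORS='\n\n' '
/^Package: linux-image-generic/ { sub(/\nDepends:[^\n]*linux-image-/, "&unsigned-") }
1' "$ctl")
printf '%s\n' "$out"
rm -f "$ctl"
```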

Related

How do I run this Bismark Bisulfite Sequencing program?

I am very new to coding, so I'm not really sure how to approach this. I want to look at some data that we got and analyze it using Bismark. I already used Trim Galore to trim the reads, and now I want to get the data into Bismark. The documentation says it requires Perl, so I downloaded Perl along with the Bismark zip file from GitHub. I also downloaded the bowtie2 zip file and extracted both zip files into the same directory. I then opened the Perl command prompt and changed to the directory containing the extracted folders.
I put this line in:
> \bismark\bismark_genome_preparation --path_to_bowtie ^
C:\Users\sevro\Documents\Lab_Code\bowtie2-master --verbose ^
C:\Users\sevro\Documents\Lab_Code\genome
The system cannot find the path specified.
I also tried this after changing the directory to the Bismark folder:
> perl bismark
Failed to execute Bowtie 2 porperly (return code of 'bowtie2 --version' was 256).
Please install Bowtie 2 or HISAT2 first and make sure it is in the PATH,
or specify the path to the Bowtie 2 with --path_to_bowtie2 /path/to/bowtie2,
or --path_to_hisat2 /path/to/hisat2
I tried a few other things but all in all I am a bit confused on how exactly to approach this. Things I have downloaded right now:
Bismark zip file- https://github.com/FelixKrueger/Bismark
Bowtie2 zip file- https://github.com/BenLangmead/bowtie2
A genome assembly in .fa format
The data that I want to analyze in fasta format
Any insight would be helpful.

I think Bismark and bowtie2 only support Linux and macOS natively. If you want to use Bismark on Windows, you can try installing it through a *nix emulation system such as Cygwin or MSYS2, or simply use WSL. I tested this on Windows 11 with WSL and Ubuntu 20.04:
Downloaded bowtie2-2.4.4-linux-x86_64.zip and extracted it to the ~/bowtie2/bowtie2-2.4.4-linux-x86_64 folder.
Downloaded Bismark-0.23.1.zip and extracted it to ~/bismark/Bismark-0.23.1/
Tested installation:
$ perl --version
This is perl 5, version 30, subversion 0 (v5.30.0) built for x86_64-linux-gnu-thread-multi (with 50 registered patches, see perl -V for more detail)
$ perl bismark --path_to_bowtie2 ../../bowtie2/bowtie2-2.4.4-linux-x86_64/
Bowtie 2 seems to be working fine (tested command '../../bowtie2/bowtie2-2.4.4-linux-x86_64/bowtie2 --version' [2.4.4])
Output format is BAM (default)
Did not find Samtools on the system. Alignments will be compressed with GZIP instead (.sam.gz)
Genome folder was not specified!
DESCRIPTION
The following is a brief description of command line options and arguments to control the Bismark
bisulfite mapper and methylation caller. Bismark takes in FastA or FastQ files and aligns the
reads to a specified bisulfite genome. Sequence reads are transformed into a bisulfite converted forward strand
version (C->T conversion) or into a bisulfite treated reverse strand (G->A conversion of the forward strand).
Each of these reads are then aligned to bisulfite treated forward strand index of a reference genome
(C->T converted) and a bisulfite treated reverse strand index of the genome (G->A conversion of the
forward strand, by doing this alignments will produce the same positions). These 4 instances of Bowtie 2 or HISAT2
are run in parallel. The sequence file(s) are then read in again sequence by sequence to pull out the original
sequence from the genome and determine if there were any protected C's present or not.
The final output of Bismark is in BAM/SAM format by default, described in more detail below.
USAGE: bismark [options] <genome_folder> {-1 <mates1> -2 <mates2> | <singles>}
[...]

Using SED on MAC (zsh) to get first jpg after marker string

Please note: I found GNU implementations of this, but they don't seem to work on a Mac. This question is specifically about macOS running zsh.
I'm trying to pipe some output into SED and use it to find the first jpg after a marker string.
Here is my sample .sh file:
Phrase="where is \"frankenstien\" tonight.jpg with my hamburger tomorrow.jpg"
echo $Phrase | sed 's/.*\frankenstien" \(.*\)jpg/\1/'
The marker string is "frankenstien" (WITH quotes). I would like the output to be:
tonight.jpg
But instead it's
tonight.jpg with my hamburger tomorrow.
So obviously the expression passed to sed is wrong. How should I write it so that it stops after the first jpg AND includes the ".jpg" in it? I found many examples online of similar things, but they did not work on a Mac running zsh. Can the same code work on Macs running bash? If you can only get it to work in bash, that might be good enough.
Thanks!

If the first jpg immediately follows the frankenstien marker string, you can modify your regex as below. The following should work with any POSIX-compliant sed, as it does not use any GNU-specific constructs:
sed 's/.*\"frankenstien\" \([^ ]*\).*/\1/'
The regex above captures the text after the marker string up to the next space, and discards the rest of the line.
P.S. The shell you use plays no role in how your installed sed interprets the regex. sed is a standalone binary that ships with your OS (GNU sed on Linux, BSD sed on macOS). A few features are supported by one and not the other (GNU vs. BSD), but the shell itself does not come into the picture. For example, on macOS with zsh as the default shell, you can have both BSD sed (shipped by default) and GNU sed (installable with Homebrew).
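A small self-contained check of that command, assuming the marker is wrapped in plain ASCII double quotes. It uses printf instead of echo, and the backslashes before the quotes are dropped because inside a single-quoted script sed only needs a literal ". The variable names are illustrative.

```shell
#!/bin/sh
# Sample input from the question, with plain ASCII quotes around the marker.
phrase='where is "frankenstien" tonight.jpg with my hamburger tomorrow.jpg'

# POSIX BRE only: skip through the quoted marker and the following space,
# capture the next run of non-space characters, drop the rest of the line.
result=$(printf '%s\n' "$phrase" | sed 's/.*"frankenstien" \([^ ]*\).*/\1/')

printf '%s\n' "$result"   # prints: tonight.jpg
```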

how should I write it so that it stops after the first jpg AND includes the ".jpg" in it?
Match up until a space.
sed 's/.*frankenstien" \([^ ]*\) .*/\1/' <<<"$Phrase"
To handle tabs as well:
sed 's/.*frankenstien" \([^[:space:]]*\)[[:space:]].*/\1/' <<<"$Phrase"
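To see what the whitespace class buys you, here is a sketch with a hypothetical input that has a tab instead of a space after tonight.jpg. It pipes through printf rather than using the <<< here-string, so it also runs under a plain POSIX sh.

```shell
#!/bin/sh
# Hypothetical input: a tab (not a space) follows tonight.jpg.
# printf turns the \t in its format string into a real tab.
phrase=$(printf 'where is "frankenstien" tonight.jpg\twith my hamburger tomorrow.jpg')

# Space-only version: [^ ]* runs past the tab, so the capture includes it.
space_only=$(printf '%s\n' "$phrase" | sed 's/.*frankenstien" \([^ ]*\) .*/\1/')

# Whitespace-class version: the capture stops at the first space or tab.
any_space=$(printf '%s\n' "$phrase" | sed 's/.*frankenstien" \([^[:space:]]*\)[[:space:]].*/\1/')

printf 'space only: [%s]\nany space:  [%s]\n' "$space_only" "$any_space"
```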

rename command doesn't rename

This should work on my CentOS 6.6 but somehow the file name is not changed. What am I missing here?
rename -f 's/silly//' sillytest.zi
This should rename sillytest.zi to test.zi, but the name is not changed. Of course I could use mv, but I want to apply this to many files and patterns.

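There are two different rename utilities commonly used on GNU/Linux systems, and their syntax is incompatible. The command in the question uses the Perl version's syntax, while CentOS provides the util-linux version.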
util-linux version
On Red Hat-based systems (such as CentOS), rename is a compiled executable provided by the util-linux package. It’s a simple program with very simple usage (from the relevant man page):
rename from to file...
rename will rename the specified files by replacing the first occurrence of from in their name by to.
Newer versions also support a useful -v, --verbose option.
NB: If a file already exists whose name coincides with the new name of the file being renamed, this rename command will silently overwrite the pre-existing file.
Example
Fix the extension of HTML files so that all .htm files have a four-letter .html suffix:
rename .htm .html *.htm
Example from question
To rename sillytest.zi to test.zi, replace silly with an empty string:
rename silly '' sillytest.zi
Perl version
On Debian-based systems, rename is a Perl script, which is much more capable as you get the benefit of Perl’s rich set of regular expressions.
Its usage is (from its man page):
rename [ -v ] [ -n ] [ -f ] perlexpr [ files ]
rename renames the filenames supplied according to the rule specified as the first argument.
This rename command also includes a -v, --verbose option. Equally useful is its -n, --no-act option, which can be used as a dry run to see which files would be renamed. Also, it won’t overwrite pre-existing files unless the -f, --force option is used.
Example
Fix the extension of HTML files:
rename 's/\.htm$/.html/' *.htm
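If you cannot tell which rename a system provides, the util-linux behaviour (replace the first occurrence of from with to) can be approximated in plain POSIX sh with mv. This is a hypothetical fallback, not part of either utility; like the Perl version, it skips a file rather than overwriting an existing target. The variable names and the scratch directory are illustrative.

```shell
#!/bin/sh
# Hypothetical fallback: replace the first occurrence of $from with $to in
# each matching file name, using only POSIX parameter expansion and mv.
from=silly
to=

workdir=$(mktemp -d)          # scratch directory for the demo
cd "$workdir" || exit 1
touch sillytest.zi            # sample file name from the question

for f in *"$from"*; do
    [ -e "$f" ] || continue                  # glob matched nothing
    new="${f%%"$from"*}$to${f#*"$from"}"     # prefix + $to + text after first match
    if [ -e "$new" ]; then                   # never overwrite an existing file
        printf 'skipping %s: %s already exists\n' "$f" "$new" >&2
        continue
    fi
    mv -- "$f" "$new"
done
ls
```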

how to make cygwin tar output proper unicode letters instead of slashed values?

I have a *.tar.gz file that occasionally contains names with non-ASCII letters.
For example, when tar encounters a file containing the word naïve, it outputs na\303\257ve.
Is there any switch or tool to convert these escaped values back to proper letters?

http://www.gnu.org/software/tar/manual/tar.html
By default GNU tar attempts to unquote each file or member name, replacing escape sequences according to the following table: ...
This default behavior is controlled by the following command line option:
--unquote
Enable unquoting input file or member names (default).
--no-unquote
Disable unquoting input file or member names.
In other words, see if "--no-unquote" is an option for your version of Cygwin tar.
PS:
Which version of Cygwin tar are you using?

geninfo searches .da instead of .gcda

When I try to execute the following lcov command through Plink (I give Plink a text file as an argument containing the following command)
lcov --capture --directory . --output-file coverage.info
it results with
GNU gcov version 1.5
Capturing coverage data from .
Scanning . for .da files ...
gcov [-b] [-v] [-n] [-l] [-f] [-o OBJDIR] file
geninfo: Use of uninitialized value in pattern match (m//) at /home/myUser/lcov/lcov/usr/bin/geninfo line 1874.
gcov [-b] [-v] [-n] [-l] [-f] [-o OBJDIR] file
geninfo: Use of uninitialized value in pattern match (m//) at /home/myUser/lcov/lcov/usr/bin/geninfo line 3622.
geninfo: Use of uninitialized value in pattern match (m//) at /home/myUser/lcov/lcov/usr/bin/geninfo line 3622.
geninfo: ERROR: no .da files found in .!
It seems that geninfo looks for .da files instead of .gcda files.
When I execute the same command without Plink (in the same CWD), lcov runs fine and generates a valid .info file. It also runs fine when I execute it manually through PuTTY.
What might be the reason for this?

The problem was more general: Plink runs commands with different environment variables. The solution was to set the correct environment variables manually. In my case I run a Perl script, so I added this at the top of the file:
use Env;
$ENV{PATH} = "correct PATH variable";
A missing environment variable caused the code to pick up the wrong gcov version, so it searched for .da files instead of the .gcda files used by newer versions.

Upgrading lcov to the latest version (1.13) solved the issue. Older versions of lcov search for .da files instead of .gcda files.
