Emacs: How to remove duplicate lines that contain some text

Windows 10 (64 bit), Emacs 25.1
Suppose I have this text:
111111111 aaaaaaaa bbbbbbbbbb
222222 3333333333 44444444
111111111 aaaaaaaa bbbbbbbbbb
111111111 aaaaaaaa bbbbbbbbbb
44444444 666666666 777777777777
111111111 aaaaaaaa bbbbbbbbbb
So I want to remove duplicate lines that contain aaaaaaaa.
Result must be this:
111111111 aaaaaaaa bbbbbbbbbb
222222 3333333333 44444444
44444444 666666666 777777777777
I want to use the built-in capabilities of Emacs (without writing a custom elisp script).

If you wish to remove all duplicate lines from the buffer (regardless of whether or not they contain "aaaaaaaa"), then use:
C-x h
M-x delete-duplicate-lines RET
Note that your desired result includes the removal of all blank lines (rather than retaining one of them), so Emacs' result is different on that account.
If you want to remove all lines containing "aaaaaaaa" then use:
M-x flush-lines RET aaaaaaaa RET
If you issue this with point after the first instance, then it won't remove that first instance.
If you want the behaviour of delete-duplicate-lines but restricted to acting only on lines containing "aaaaaaaa", then I don't know of a standard command for that (although it would be a relatively simple enhancement to delete-duplicate-lines to introduce such a feature).
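To make that restricted behaviour concrete, here is a minimal sketch of the logic in Python rather than elisp. It is not an Emacs command, and the name dedupe_matching is made up for illustration: it keeps the first copy of each line containing the text, drops later copies, and leaves every other line alone.

```python
def dedupe_matching(lines, needle):
    """Drop repeated lines that contain `needle`; keep all other lines as-is."""
    seen = set()
    kept = []
    for line in lines:
        if needle in line:
            if line in seen:
                continue  # a later copy of a matching line: skip it
            seen.add(line)
        kept.append(line)
    return kept


text = """111111111 aaaaaaaa bbbbbbbbbb
222222 3333333333 44444444
111111111 aaaaaaaa bbbbbbbbbb
111111111 aaaaaaaa bbbbbbbbbb
44444444 666666666 777777777777
111111111 aaaaaaaa bbbbbbbbbb"""

result = dedupe_matching(text.splitlines(), "aaaaaaaa")
print("\n".join(result))
```

Lines that do not contain the text are never removed, even if they repeat, which is the difference from delete-duplicate-lines.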

Related

Generate all combinations from list of characters

I am busy implementing a lab for pen testers to create MD5 hashes from 4-letter words. I need the words to combine lowercase and uppercase letters as well as numeric and special characters, but I cannot work out how to combine any given characters in all orders. Currently I have this:
my $str = 'aaaa';
print $str++, $/ while $str le 'dddd';
Which will do:
aaaa
aaab
aaac
aaad
...
...
dddd
However, I cannot find a way to make it produce:
Aaaa
AAaa
aAaa
...
dddD
Not to mention adding numbers and special characters. What I really want is to build words from a given list of characters. So if I want to use abeDod##, it should create all combinations of those characters.
Edit to clarify.
Let's say I give it the characters aBc#. I also need to give it a count to say that each word has a maximum of 4 letters, built from combinations of all the given characters, like:
aBc#
Bac#
caB#
#Bca
...
I hope that clarifies the question.
Build a list of the ASCII codes of the characters you accept and sample from it using your favorite (pseudo-)random number generator. Then convert each code to its character using chr and concatenate them.
For example:
perl -wE'$rw .= chr( 32+(int rand 127-32) ) for 1..4; say $rw'
Notes:
I use a one-liner merely for easy copy-paste testing. Write this nicely in a script, please.
I use the sketchy rand, which is good for shuffling things a bit. Replace it with a better generator if needed.
Gluing together four (pseudo-)random draws does not give a well-distributed result overall, even though each letter on its own is well distributed. Still, four draws should satisfy most needs.
If not, I think that you'd need to produce a far longer list (the range of allowed characters repeated four times, perhaps), randomize it, and then draw four-letter subsequences. That is a lot more work.
I need to tap dance a little to produce (random-ish) integers from 32 to 126 using rand, since it takes only the upper end of the range. Also, this takes all characters from that range, which is likely not what you want, so specify subranges or the specific lists that you want to draw from.
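The last note, drawing from a specific list instead of the whole printable range, might look like this in Python. This is only a sketch: random_word and its parameters are names I made up, and the random module is no stronger than Perl's rand (random.SystemRandom can be passed as rng if that matters).

```python
import random


def random_word(allowed, length=4, rng=random):
    """Build a word by drawing `length` characters from `allowed`, with repetition."""
    return "".join(rng.choice(allowed) for _ in range(length))


# Printable ASCII, codes 32 to 126 inclusive, mirroring the one-liner's range.
printable = [chr(code) for code in range(32, 127)]

print(random_word(printable))   # any four printable characters
print(random_word("aBc#"))      # only characters from the given list
```

Because each position is drawn independently, characters can repeat within a word.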

How to copy last 13 characters of a string?

In Notepad++ I have a list of entries, and at the end of each entry is a phone number (with dashes, 12 characters total). How do I go about either removing all the text before the number or copying/cutting the number from the end of each entry, for multiple entries? Thanks!
For example:
1 $1,300 Deposit $1,300 Available 12/15/16 2050 Hurricane Shoals 678-790-0986
2 7 $1,400 Deposit $1,400 Available 12/22/16 1453 Alamein Dr  404-294-6441
3 $1,500 - $1,590 Not Income Based  /  Deposit $1,500 - $1,590 678-328-7952
Here is a way:
Ctrl+H
Find what: ^.*([\d-]{12})$
Replace with: $1
Replace all
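If you want to check the pattern outside Notepad++, the same regular expression can be tried in Python. This is just a sketch for testing; last_phone is a made-up helper name, and the entries are the ones from the question.

```python
import re

# Same pattern as in the Find what box: keep only the trailing 12 digits/dashes.
PHONE_AT_END = re.compile(r"^.*([\d-]{12})$")


def last_phone(entry):
    """Return the 12-character number at the end of `entry`, or None if absent."""
    match = PHONE_AT_END.match(entry)
    return match.group(1) if match else None


entries = [
    "1 $1,300 Deposit $1,300 Available 12/15/16 2050 Hurricane Shoals 678-790-0986",
    "2 7 $1,400 Deposit $1,400 Available 12/22/16 1453 Alamein Dr  404-294-6441",
    "3 $1,500 - $1,590 Not Income Based  /  Deposit $1,500 - $1,590 678-328-7952",
]
for entry in entries:
    print(last_phone(entry))
```

Note that the pattern requires the number to be the very last thing on the line, so trailing spaces would make it fail.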

sed: keep certain contents for matched lines

I have numerous sequences in one fasta file like the one below (downloaded from UniProtKB):
>sp|P00045|CYC7_YEAST Cytochrome c iso-2 OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) GN=CYC7 PE=1 SV=1
MAKESTGFKPGSAKKGATLFKTRCQQCHTIEEGGPNKVGPNLHGIFGRHSGQVKGYSYTD
ANINKNVKWDEDSMSEYLTNPKKYIPGTKMAFAGLKKEKDRNDLITYMTKAAK
Since they are all amino acid sequences for cytochrome c, I care only about the organism (i.e. Saccharomyces cerevisiae for the above entry). So I wish to modify headers of these sequences as below:
>Saccharomyces cerevisiae
MAKESTGFKPGSAKKGATLFKTRCQQCHTIEEGGPNKVGPNLHGIFGRHSGQVKGYSYTD
ANINKNVKWDEDSMSEYLTNPKKYIPGTKMAFAGLKKEKDRNDLITYMTKAAK
Organism names always come after "OS=" and stop when either one of:
space(.*) # strain information
space..=
is met.
So could anybody give me some clues on how to make it? Thx!
You can use this:
sed 's/.*OS=\([^(]*\).*/>\1/' input
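For comparison, here is a rough Python sketch that handles both stop conditions from the question (" (" and a space followed by a two-character key and "="). The helper name simplify_header is made up, and the second header in the usage lines is an invented example, not a real UniProt entry.

```python
import re

# Organism name: text after "OS=" up to " (", " XX=" (two-character key), or end of line.
ORGANISM = re.compile(r"OS=(.+?)(?= \(| \S\S=|$)")


def simplify_header(line):
    """Rewrite a FASTA header to '>Organism'; return other lines unchanged."""
    if not line.startswith(">"):
        return line  # sequence line
    match = ORGANISM.search(line)
    return ">" + match.group(1) if match else line


yeast = (">sp|P00045|CYC7_YEAST Cytochrome c iso-2 OS=Saccharomyces cerevisiae "
         "(strain ATCC 204508 / S288c) GN=CYC7 PE=1 SV=1")
made_up = ">sp|X00000|EXAMPLE Example protein OS=Homo sapiens GN=ABC PE=1 SV=1"

print(simplify_header(yeast))
print(simplify_header(made_up))
print(simplify_header("MAKESTGFKPGSAKKGATLFKTRCQQCHTIEEGGPNKVGPNLHGIFGRHSGQVKGYSYTD"))
```

Headers without "OS=" are left as they are rather than guessed at.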

use perl to extract specific output lines

I'm endeavoring to create a system to generalize rules from input text. I'm using reVerb to create my initial set of rules. Using the following command[*], for instance:
$ echo "Bananas are an excellent source of potassium." | ./reverb -q | tr '\t' '\n' | cat -n
To generate output of the form:
1 stdin
2 1
3 Bananas
4 are an excellent source of
5 potassium
6 0
7 1
8 1
9 6
10 6
11 7
12 0.9999999997341693
13 Bananas are an excellent source of potassium .
14 NNS VBP DT JJ NN IN NN .
15 B-NP B-VP B-NP I-NP I-NP I-NP I-NP O
16 bananas
17 be source of
18 potassium
I'm currently piping the output to a file, which includes the preceding white space and numbers as depicted above.
What I'm really after is just the simple rule at the end, i.e. lines 16, 17 & 18. I've been trying to create a script to extract just that component and put it in a new file in the form of a Prolog clause, i.e. be source of(bananas, potassium).
Is that feasible? Can Prolog rules contain white space like that?
I think I'm locked into getting all that output from reVerb so, what would be the best way to extract the desirable component? With a Perl script? Or maybe sed?
*Later I plan to replace this with a larger input file as opposed to just single sentences.
This seems wasteful. Why not leave the tabs as they are, and use:
$ echo "Bananas are an excellent source of potassium." \
| ./reverb -q | cut --fields=16,17,18
And yes, you can have rules like this in Prolog. See the answer by @mat. You need to know a bit of Prolog before you move on, I guess.
It is easier, however, to just make the string a valid name for a predicate:
be_source_of with underscores instead of spaces
or 'be source of' with spaces, and enclosed in single quotes.
You can probably use awk to do what you want with the three fields. See for example the printf command in awk. Or you can parse it again from Prolog directly. Both are beyond the scope of your current question, I feel.
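As a rough idea of that last step, here is a Python sketch (instead of awk) that takes one tab-separated reVerb line and builds a clause using the quoted-atom form. The function names are made up, and the sample line is assembled from the fields shown in the question.

```python
def quote_atom(text):
    """Wrap text in single quotes so Prolog reads it as one atom."""
    return "'" + text.replace("\\", "\\\\").replace("'", "''") + "'"


def reverb_to_clause(tsv_line):
    """Turn fields 16-18 of a tab-separated reVerb line into a Prolog fact."""
    fields = tsv_line.rstrip("\n").split("\t")
    if len(fields) < 18:
        raise ValueError("expected at least 18 tab-separated fields")
    arg1, relation, arg2 = fields[15:18]
    return f"{quote_atom(relation)}({quote_atom(arg1)}, {quote_atom(arg2)})."


sample = "\t".join([
    "stdin", "1", "Bananas", "are an excellent source of", "potassium",
    "0", "1", "1", "6", "6", "7", "0.9999999997341693",
    "Bananas are an excellent source of potassium .",
    "NNS VBP DT JJ NN IN NN .",
    "B-NP B-VP B-NP I-NP I-NP I-NP I-NP O",
    "bananas", "be source of", "potassium",
])
print(reverb_to_clause(sample))
```

Quoting every atom is a deliberate simplification: it stays valid even when an argument contains spaces or starts with an uppercase letter.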
sed -n ':cycle
$!{
N
4,$D
b cycle
}
s/\(.*\)\n\(.*\)\n\(.*\)/\2 (\1,\3)/p' YourFile
If the numbers are part of the output and not just there for reference, replace the last sed command with:
s/^ *[0-9]\{1,\} \{1,\}\(.*\)\n *[0-9]\{1,\} \{1,\}\(.*\)\n *[0-9]\{1,\} \{1,\}\(.*\)/\2 (\1,\3)/p
This assumes the last 3 lines are the source of your "rules".
Regarding the Prolog part of the question:
Yes, Prolog facts can contain whitespace like this, with suitable operator declarations present.
For example:
:- op(700, fx, be).
:- op(650, fx, source).
:- op(600, fx, of).
Example query and its result, to let you see the shape of terms that are created with this syntax:
?- write_canonical(be source of(a, b)).
be(source(of(a,b))).
Therefore, with these operator declarations, a fact like:
be source of(a, b).
is exactly the same as stating:
be(source(of(a,b))).
Depending on use cases and other definitions, it may even be an advantage to create facts of this kind (i.e., facts of the form be/1 instead of source_of/2). If this is the only kind of fact you need, you can simply write:
source_of(a, b).
This creates no redundant wrappers and is easier to use.
Or, as Boris suggested, you can use single quotes as in 'be source of'/2.

Sum up several files in emacs?

I have roughly a gazillion files of the same architecture. Is there a way to create a buffer which will present a summary of those files? Possibly with org-mode?
Each file is formatted as:
q val counts
1 0.05 2500
4 0.01 2500
10 0.002 2500
.
.
.
.
The files are each in their own folder:
prog
|
+fold1
| |
| ----file1
+fold2
| |
| ----file1
-fold3
|
----file1
I'm not certain what the summary should include. I think the first 3 lines plus the averages of each file.
Expanding on Juancho's comment: in org-mode, you can define a shell source block:
#+begin_src sh :results output raw replace
for i in /path/to/prog/*/file1; do
  head -n 3 "$i"
  # compute whatever average you want here, e.g. with awk
done
#+end_src
and execute it with C-c C-c (cursor being in the block).
You may want to read the manual concerning header arguments if you want the results to be formatted in some particular fashion. (And you can do that in any language org-mode supports.)
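Since any supported language will do, here is a sketch of the summary step in Python. It assumes the layout from the question (one file1 per folder under prog, a header line, then whitespace-separated numeric columns) and takes "first 3 lines" to mean the first three data rows. The function names are made up.

```python
from pathlib import Path
from statistics import mean


def summarize_text(text, head=3):
    """Return the first `head` data rows and the average of every column."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    names = lines[0].split()  # header line, e.g. "q val counts"
    rows = [[float(x) for x in ln.split()] for ln in lines[1:]]
    averages = {name: mean(col) for name, col in zip(names, zip(*rows))}
    return lines[1:1 + head], averages


def summarize_tree(root, filename="file1"):
    """Print a short summary for every <root>/<folder>/<filename>."""
    for path in sorted(Path(root).glob(f"*/{filename}")):
        first_rows, averages = summarize_text(path.read_text())
        print(path)
        for row in first_rows:
            print("  " + row)
        print("  averages:", averages)
```

Calling summarize_tree("/path/to/prog") prints one block per file. Rows that are not purely numeric, such as literal "." lines, would raise a ValueError and need to be filtered out first.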