Merge rows from two files when a number in a row of file 1 lies between two numbers in a row of file 2

I've been searching for a couple of hours (actually two days already), but I can't find an answer to my problem. I've tried sed and awk, but I can't get the parameters right.
Essentially, this is what I'm looking for:
FOR every line in file_1
IF [value in column 2 in file_1]
IS EQUAL TO [value in column 4 in some row in file_2]
OR IS EQUAL TO [value in column 5 in some row in file_2]
OR IS BETWEEN [value in column 4 and value in column 5 in some row in file_2]
THEN
ADD columns 3, 6 and 7 of that row of file_2 as columns 3, 4 and 5 of file_1
NB: The values that need to be compared are INTs; the values in columns 3, 6 and 7 (which only need to be copied) are STRINGs.
And this is the context, which is probably not necessary to read:
I have two files with genome data that I want to merge in a specific way (the columns are tab-separated).
The first file contains variants (only SNPs, for those interested), of which effectively only the second column is relevant. This column is a list of numbers (the position of each variant on the chromosome).
I have a structural annotation file that contains the following data:
Column 4 holds the begin position of a specific structure and column 5 holds the end position.
Columns 3, 7 and 9 contain information that describes the structure (name of a gene etc.).
I would like to annotate the variants in the first file with the data in the annotation file. Therefore, if the number in column 2 of the variants file is equal to column 4 or 5 of a specific row, or lies between those values, columns 3, 7 and 9 of that row in the annotation file need to be added.
Sample File 1
SOME_NON_RELEVANT_STRING 142
SOME_NON_RELEVANT_STRING 182
SOME_NON_RELEVANT_STRING 320
SOME_NON_RELEVANT_STRING 321
SOME_NON_RELEVANT_STRING 322
SOME_NON_RELEVANT_STRING 471
SOME_NON_RELEVANT_STRING 488
SOME_NON_RELEVANT_STRING 497
SOME_NON_RELEVANT_STRING 541
SOME_NON_RELEVANT_STRING 545
SOME_NON_RELEVANT_STRING 548
SOME_NON_RELEVANT_STRING 4105
SOME_NON_RELEVANT_STRING 15879
SOME_NON_RELEVANT_STRING 26534
SOME_NON_RELEVANT_STRING 30000
SOME_NON_RELEVANT_STRING 30001
SOME_NON_RELEVANT_STRING 40001
SOME_NON_RELEVANT_STRING 44752
SOME_NON_RELEVANT_STRING 50587
SOME_NON_RELEVANT_STRING 87512
SOME_NON_RELEVANT_STRING 96541
SOME_NON_RELEVANT_STRING 99541
SOME_NON_RELEVANT_STRING 99871
Sample File 2
SOME_NON_RELEVANT_STRING SOME_NON_RELEVANT_STRING A1 0 38 B1 C1
SOME_NON_RELEVANT_STRING SOME_NON_RELEVANT_STRING A2 40 2100 B2 C2
SOME_NON_RELEVANT_STRING SOME_NON_RELEVANT_STRING A3 2101 9999 B3 C3
SOME_NON_RELEVANT_STRING SOME_NON_RELEVANT_STRING A4 10000 15000 B4 C4
SOME_NON_RELEVANT_STRING SOME_NON_RELEVANT_STRING A5 15001 30000 B5 C5
SOME_NON_RELEVANT_STRING SOME_NON_RELEVANT_STRING A6 30001 40000 B6 C6
SOME_NON_RELEVANT_STRING SOME_NON_RELEVANT_STRING A7 40001 50001 B7 C7
SOME_NON_RELEVANT_STRING SOME_NON_RELEVANT_STRING A8 50001 50587 B8 C8
SOME_NON_RELEVANT_STRING SOME_NON_RELEVANT_STRING A9 50588 83054 B9 C9
SOME_NON_RELEVANT_STRING SOME_NON_RELEVANT_STRING A10 83055 98421 B10 C10
SOME_NON_RELEVANT_STRING SOME_NON_RELEVANT_STRING A11 98422 99999 B11 C11
Sample output file
142 A2 B2 C2
182 A2 B2 C2
320 A2 B2 C2
321 A2 B2 C2
322 A2 B2 C2
471 A2 B2 C2
488 A2 B2 C2
497 A2 B2 C2
541 A2 B2 C2
545 A2 B2 C2
548 A2 B2 C2
4105 A3 B3 C3
15879 A5 B5 C5
26534 A5 B5 C5
30000 A5 B5 C5
30001 A6 B6 C6
40001 A7 B7 C7
44752 A7 B7 C7
50587 A8 B8 C8
87512 A10 B10 C10
96541 A10 B10 C10
99541 A11 B11 C11
99871 A11 B11 C11

As a start, here's how to write the algorithm you posted in awk, assuming that by "ADD" you meant "append" and that all lines in file1 have unique values in the 2nd field (run against the sample input provided, with the output shown after the command):
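awk '
BEGIN { FS=OFS="\t"; startIdx=1 }
# First file (file1): store each position from column 2 and warn about duplicates.
NR==FNR {
    if ($2 in seen) {
        printf "%s on line %d, first seen on line %d\n", $2, NR, seen[$2] | "cat>&2"
    }
    else {
        f2s[++endIdx] = $2
        seen[$2] = NR
    }
    next
}
# Second file (file2): print every stored position that lies within [$4, $5].
# The early exit and the moving start index rely on both files being sorted by position.
{
    inBounds = 1
    for (idx=startIdx; (idx<=endIdx) && inBounds; idx++) {
        f2 = f2s[idx]
        if (f2 >= $4) {
            if (f2 <= $5) {
                print f2, $3, $6, $7
            }
            else {
                inBounds = 0    # past the end of this range, stop scanning
            }
        }
        else {
            startIdx = idx      # before this range, later rows start scanning here
        }
    }
}
' file1 file2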
142 A2 B2 C2
182 A2 B2 C2
320 A2 B2 C2
321 A2 B2 C2
322 A2 B2 C2
471 A2 B2 C2
488 A2 B2 C2
497 A2 B2 C2
541 A2 B2 C2
545 A2 B2 C2
548 A2 B2 C2
4105 A3 B3 C3
15879 A5 B5 C5
26534 A5 B5 C5
30000 A5 B5 C5
30001 A6 B6 C6
40001 A7 B7 C7
44752 A7 B7 C7
50587 A8 B8 C8
87512 A10 B10 C10
96541 A10 B10 C10
99541 A11 B11 C11
99871 A11 B11 C11
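For comparison, the same range lookup can be sketched in Python with a binary search over the range starts. This is only a minimal sketch under assumptions that are not in the original post: the function name `annotate` and the tuple layout are hypothetical, the annotation rows are assumed to be sorted by start position, and only the last range starting at or before a position is checked, so a position that sits in two overlapping ranges is reported once.

```python
from bisect import bisect_right

def annotate(positions, annotations):
    """Return (position, *labels) for each position that falls inside a range.

    annotations: list of (start, end, labels) tuples, sorted by start (assumed).
    """
    starts = [start for start, _end, _labels in annotations]
    matched = []
    for pos in positions:
        # Index of the last range whose start is <= pos (or -1 if there is none).
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos <= annotations[i][1]:
            matched.append((pos, *annotations[i][2]))
    return matched

# A few rows taken from the sample files above.
annotations = [
    (0, 38, ("A1", "B1", "C1")),
    (40, 2100, ("A2", "B2", "C2")),
    (2101, 9999, ("A3", "B3", "C3")),
]
for row in annotate([39, 142, 4105], annotations):
    print("\t".join(map(str, row)))
```

Position 39 lies in the gap between the first two ranges, so it produces no output line, while 142 and 4105 are matched to A2 and A3 respectively.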

AutoHotkey - limiting the number and type of characters taken from the string

I have an AHK script for an IRC client. After I enter nick!ident#host into the text field and press F4, it decrypts the ident, which is the encrypted form of the IP address:
F4::
Clipboard =
Send ^a^x
ClipWait, 0
If ErrorLevel
MsgBox, 48, Error, An error occurred while waiting for the clipboard. Aborting.
Else Clipboard := decode(SubStr(Clipboard, -15, -8))
Return
decode(str) {
Static code := " " "
( LTrim Join`s
00 0x 02 03 04 0z 06 01 08 09 0B 0b 0c 0d 0e 0H x0 xx x2 x3 x4 xz x6 x1 x8 x9 xB xb xc xd xe xH 20 2x
22 23 24 2z 26 21 28 29 2B 2b 2c 2d 2e 2H 30 3x 32 33 34 3z 36 31 38 39 3B 3b 3c 3d 3e 3H 40 4x 42 43
44 4z 46 41 48 49 4B 4b 4c 4d 4e 4H z0 zx z2 z3 z4 zz z6 z1 z8 z9 zB zb zc zd ze zH 60 6x 62 63 64 6z
66 61 68 69 6B 6b 6c 6d 6e 6H 10 1x 12 13 14 1z 16 11 18 19 1B 1b 1c 1d 1e 1H 80 8x 82 83 84 8z 86 81
88 89 8B 8b 8c 8d 8e 8H 90 9x 92 93 94 9z 96 91 98 99 9B 9b 9c 9d 9e 9H B0 Bx B2 B3 B4 Bz B6 B1 B8 B9
BB Bb Bc Bd Be BH b0 bx b2 b3 b4 bz b6 b1 b8 b9 bB bb bc bd be bH c0 cx c2 c3 c4 cz c6 c1 c8 c9 cB cb
cc cd ce cH d0 dx d2 d3 d4 dz d6 d1 d8 d9 dB db dc dd de dH e0 ex e2 e3 e4 ez e6 e1 e8 e9 eB eb ec ed
ee eH H0 Hx H2 H3 H4 Hz H6 H1 H8 H9 HB Hb Hc Hd He HH
)"
Loop, % StrLen(str) / 2
new .= "." Round((Instr(code, " " SubStr(str, 2 * A_Index - 1, 2), True) - 1) / 3)
Return SubStr(new, 2)
}
Decryption is performed according to the following key:
https://pastebin.com/raw/P8cQtH2v
For example, for the user data asdf!~z3040d4B#webchat, the script decrypts the ident z3040d4B as 83.4.13.75 and copies this value to the clipboard.
But there are cases where the encoded form of the IP (the ident) is longer or shorter than 8 characters, or contains characters that aren't in the decryption key. Then it's impossible to decode the IP correctly. So I would like the script to copy the decryption result to the clipboard only if the retrieved string (between ! and #, omitting the ~ sign if present) is 8 characters long and contains only characters from the key above. Otherwise, the script should clear the clipboard. How can I do this?
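A regex match, e.g. with the regex !~?[A-z\d]{8}#, is surely the most convenient approach. Note that [A-z] also matches the punctuation characters that sit between Z and a in ASCII; the decode() function below returns an empty string for any character pair that is not in the key: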
F4::
Clipboard := ""
SendInput, ^a^x
ClipWait, 0
if (ErrorLevel)
MsgBox, 48, Error, An error occurred while waiting for the clipboard. Aborting.
else if (Clipboard ~= "!~?[A-z\d]{8}#")
Clipboard := decode(SubStr(Clipboard, -15, -8))
else
Clipboard := ""
Return
decode(str)
{
static code := " " "
( LTrim Join`s
00 0x 02 03 04 0z 06 01 08 09 0B 0b 0c 0d 0e 0H x0 xx x2 x3 x4 xz x6 x1 x8 x9 xB xb xc xd xe xH 20 2x
22 23 24 2z 26 21 28 29 2B 2b 2c 2d 2e 2H 30 3x 32 33 34 3z 36 31 38 39 3B 3b 3c 3d 3e 3H 40 4x 42 43
44 4z 46 41 48 49 4B 4b 4c 4d 4e 4H z0 zx z2 z3 z4 zz z6 z1 z8 z9 zB zb zc zd ze zH 60 6x 62 63 64 6z
66 61 68 69 6B 6b 6c 6d 6e 6H 10 1x 12 13 14 1z 16 11 18 19 1B 1b 1c 1d 1e 1H 80 8x 82 83 84 8z 86 81
88 89 8B 8b 8c 8d 8e 8H 90 9x 92 93 94 9z 96 91 98 99 9B 9b 9c 9d 9e 9H B0 Bx B2 B3 B4 Bz B6 B1 B8 B9
BB Bb Bc Bd Be BH b0 bx b2 b3 b4 bz b6 b1 b8 b9 bB bb bc bd be bH c0 cx c2 c3 c4 cz c6 c1 c8 c9 cB cb
cc cd ce cH d0 dx d2 d3 d4 dz d6 d1 d8 d9 dB db dc dd de dH e0 ex e2 e3 e4 ez e6 e1 e8 e9 eB eb ec ed
ee eH H0 Hx H2 H3 H4 Hz H6 H1 H8 H9 HB Hb Hc Hd He HH
)"
Loop, % StrLen(str) / 2
{
if (!InStr(code, block := " " SubStr(str, 2 * A_Index - 1, 2), true))
return ""
new .= "." Round((InStr(code, block, true) - 1) / 3)
}
return SubStr(new, 2)
}
~= is shorthand for RegExMatch().
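The validate-then-decode idea can also be sketched in Python. This is a hedged illustration, not a port of the AHK script: it assumes the table posted in the question, in which the 16 symbols `0x234z6189BbcdeH` appear to act as base-16 digits, and the names `ALPHABET`, `IDENT_RE` and `decode_ident` are hypothetical. Unlike the `SubStr` call above, it captures the ident with a regex group, so it does not depend on the length of the host part.

```python
import re

# Symbol order read off the lookup table in the question (an assumption).
ALPHABET = "0x234z6189BbcdeH"

# Exactly 8 key symbols between "!" (optionally followed by "~") and "#".
IDENT_RE = re.compile(r"!~?([" + re.escape(ALPHABET) + r"]{8})#")

def decode_ident(line):
    """Return the dotted IP for a valid ident, or "" if the ident is invalid."""
    match = IDENT_RE.search(line)
    if match is None:
        return ""
    ident = match.group(1)
    octets = []
    for i in range(0, 8, 2):
        high = ALPHABET.index(ident[i])      # first symbol of the pair
        low = ALPHABET.index(ident[i + 1])   # second symbol of the pair
        octets.append(str(16 * high + low))
    return ".".join(octets)

print(decode_ident("nick!~00xxHH20#host"))   # valid: 8 symbols from the key
print(repr(decode_ident("nick!~00xx#host"))) # too short, so an empty string
```

Restricting the character class to the key symbols means a string that passes the regex can always be decoded, so no second validity check is needed inside the loop.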

How do I read a .gif image file in MATLAB? I am reading it with the imread command, but it does not show the same colors as the original image [duplicate]

This is my original image:
But when I load it up on MATLAB and use imshow() on it, this is how I see it:
This is the code I'm using:
I=imread('D:\Matty\pout.gif')
imshow(I)
Forget what I said earlier. It has to do with the colormap. The image seems to have a funky colormap. Generally you should be able to read the colormap with [X, map] = imread(...), but there is some clipping of the data that I don't fully understand.
I copied the colormap manually out of the raw data from a hexeditor and saved it as gif_colormap.txt
B1 B1 B1 AF AF AF AB AB AB A9 A9 A9 A7 A7 A7 A3 A3 A3 A1 A1 A1 9F 9F
9F 9D 9D 9D 9B 9B 9B 99 99 99 97 97 97 95 95 95 93 93 93 91 91 91 8F
8F 8F 8B 8B 8B 89 89 89 85 85 85 83 83 83 7F 7F 7F 7D 7D 7D 7B 7B 7B
79 79 79 77 77 77 75 75 75 71 71 71 6D 6D 6D 6B 6B 6B 69 69 69 67 67
67 65 65 65 63 63 63 61 61 61 5F 5F 5F 5D 5D 5D 5B 5B 5B 59 59 59 57
57 57 53 53 53 4D 4D 4D 4B 4B 4B E0 E0 E0 DC DC DC DA DA DA D6 D6 D6
D4 D4 D4 D2 D2 D2 D0 D0 D0 CE CE CE CC CC CC CA CA CA C8 C8 C8 C4 C4
C4 C2 C2 C2 C0 C0 C0 BE BE BE BA BA BA B8 B8 B8 B6 B6 B6 B4 B4 B4 B2
B2 B2 B0 B0 B0 AE AE AE AC AC AC AA AA AA A6 A6 A6 A4 A4 A4 A2 A2 A2
A0 A0 A0 9E 9E 9E 9C 9C 9C 9A 9A 9A 96 96 96 94 94 94 92 92 92 90 90
90 8E 8E 8E 8A 8A 8A 88 88 88 86 86 86 84 84 84 82 82 82 80 80 80 7E
7E 7E 7A 7A 7A 78 78 78 74 74 74 72 72 72 70 70 70 6E 6E 6E 6C 6C 6C
6A 6A 6A 66 66 66 62 62 62 5E 5E 5E 56 56 56 54 54 54 52 52 52 50 50
50 4E 4E 4E 4A 4A 4A DF DF DF DD DD DD DB DB DB D7 D7 D7 D5 D5 D5 D3
D3 D3 D1 D1 D1 CF CF CF CD CD CD C9 C9 C9 C7 C7 C7 C5 C5 C5 C3 C3 C3
C1 C1 C1 BD BD BD BB BB BB B9 B9 B9 B5 B5 B5 B3 B3 B3
Then I read in the new colormap and set it manually
fid = fopen('gif_colormap.txt', 'r')
A = fscanf(fid, '%x ');
fclose(fid);
my_map = reshape(A,3,121)'
im = imread('pout.gif');
%colormap has to be between 0 and 1
my_map = (my_map-min(my_map(:)))/max(my_map(:));
imshow(im,[])
%set colormap manually
colormap(my_map);
GIF is an indexed format, and each image can have its own colormap. So you need to read the colormap together with the image:
[I, Imap] = imread('D:\Matty\pout.gif');
imshow(I,Imap)
I've tested it on your image and it works very well. I don't understand what problem @Lucas described in his answer.
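To illustrate what "indexed" means here, the following plain-Python sketch (not MATLAB) shows how pixel values act as row numbers into a colormap. The function names `parse_palette` and `apply_palette` are hypothetical, and the scaling simply divides each 8-bit value by 255, which is a common way to get values between 0 and 1 and is not the min/max rescaling used in the earlier answer.

```python
def parse_palette(hex_text):
    """Turn whitespace-separated hex bytes into a list of (r, g, b) tuples in [0, 1]."""
    values = [int(token, 16) for token in hex_text.split()]
    if len(values) % 3 != 0:
        raise ValueError("palette length must be a multiple of 3")
    return [
        tuple(v / 255 for v in values[i:i + 3])
        for i in range(0, len(values), 3)
    ]

def apply_palette(indices, palette):
    """Replace each 0-based pixel index with its palette colour."""
    return [[palette[i] for i in row] for row in indices]

# First three entries of the colormap listed above.
palette = parse_palette("B1 B1 B1 AF AF AF AB AB AB")

# A tiny 2x2 "image" of palette indices.
image = apply_palette([[0, 1], [2, 0]], palette)
print(image[0][1])
```

Displaying the raw indices without the palette is what produces the wrong shades: the index says which colormap row to use, not how bright the pixel is.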

Scrolling Button On Scroll View with Middle Button Selected

Please guide me on how to create a scroll view of dynamically created buttons that can be scrolled, with the middle button always selected.
Like this:
b1 b2 b3 b4 b5(selected) b6 b7 b8 b9
If the user selects b4, it should look like this:
b1 b2 b3 b4(selected) b5 b6 b7 b8
b2 b3 b4 b5 b6(selected) b7 b8 b9
That is how it should behave. Please guide.

Translate unreadable Russian text

I'm trying to read documentation that I believe was written in Russian, but I'm not sure whether what I'm seeing is even encoded correctly. The text looks something like this:
Ãåíåðèðóåò ìàòðèöó ñëó÷àéíûõ ÷èñåë â äèàïàçîíå îò -1 äî 1
(appears as several special A's and o's)
when opened in Firefox. In other programs it looks like this:
���������� ������� ��������� ����� � ��������� �� -1 �� 1
(appears as several question marks)
Is there any hope to translate this?
Decode as CP1251.
>>> print u'Ãåíåðèðóåò ìàòðèöó ñëó÷àéíûõ ÷èñåë â äèàïàçîí'.encode('latin-1').decode('cp1251')
Генерирует матрицу случайных чисел в диапазон
You need to determine which of multiple possible Cyrillic codesets was used - the linked site lists more than a dozen possibilities, of which ISO 8859-5 and CP-1251 are perhaps the most likely.
You may be able to get one of the translation web sites (Babelfish or Google, and no doubt others) to help. However, you may have to translate from the original codeset to UTF-8 to get it to work -- simply copying the bytes above did not work.
When copying the original text to a Mac, it was encoded as UTF-8:
0x0000: C3 83 C3 A5 C3 AD C3 A5 C3 B0 C3 A8 C3 B0 C3 B3 ................
0x0010: C3 A5 C3 B2 20 C3 AC C3 A0 C3 B2 C3 B0 C3 A8 C3 .... ...........
0x0020: B6 C3 B3 20 C3 B1 C3 AB C3 B3 C3 B7 C3 A0 C3 A9 ... ............
0x0030: C3 AD C3 BB C3 B5 20 C3 B7 C3 A8 C3 B1 C3 A5 C3 ...... .........
0x0040: AB 20 C3 A2 20 C3 A4 C3 A8 C3 A0 C3 AF C3 A0 C3 . .. ...........
0x0050: A7 C3 AE C3 AD C3 A5 20 C3 AE C3 B2 20 2D 31 20 ....... .... -1
0x0060: C3 A4 C3 AE 20 31 0A .... 1.
0x0067:
So, to translate this with Perl, I used the Encode module first to convert the UTF-8 string back to Latin-1, and then I told Perl to treat the Latin-1 as if it was CP-1251 and convert that back to UTF-8:
#!/usr/bin/env perl
use Encode qw( from_to );
my $source = 'Ãåíåðèðóåò ìàòðèöó ñëó÷àéíûõ ÷èñåë â äèàïàçîíå îò -1 äî 1';
# from_to changes things 'in situ'
my $nbytes = from_to($source, "utf-8", "latin-1");
# print "$nbytes: $source\n";
$nbytes = from_to($source, "cp-1251", "utf-8");
print "$nbytes: $source\n";
The output is:
102: Генерирует матрицу случайных чисел в диапазоне от -1 до 1
Which Babelfish translates as:
102: It generates the matrix of random numbers in the range from -1 to 1
and Google translates as:
102: Generate a matrix of random numbers ranging from -1 to 1
The initial UTF-8 to Latin-1 translation was required because of the setup on my Mac (my terminal uses UTF-8 by default, etc): YMMV.
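The same two-step repair can be sketched in Python 3, starting from the first bytes of the hex dump above. This is a minimal sketch, assuming the text really was CP1251 that got misread as Latin-1; the function name `fix_mojibake` is hypothetical.

```python
def fix_mojibake(garbled):
    """Undo CP1251 text that was wrongly decoded as Latin-1."""
    # Step 1: recover the original bytes by re-encoding as Latin-1.
    raw = garbled.encode("latin-1")
    # Step 2: decode those bytes with the encoding they were written in.
    return raw.decode("cp1251")

# First ten characters of the hex dump (UTF-8 encoding of the garbled text).
utf8_bytes = bytes.fromhex(
    "C3 83 C3 A5 C3 AD C3 A5 C3 B0 C3 A8 C3 B0 C3 B3 C3 A5 C3 B2"
)
garbled = utf8_bytes.decode("utf-8")
print(garbled)
print(fix_mojibake(garbled))
```

If the text was written in a different Cyrillic codeset, such as ISO 8859-5, the second step would need that codec name instead of `cp1251`.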