compare & merge files-unix

cat file1.txt
Id leng sal mon
25671 34343 56565 5565
44888 56565 45554 6868
23343 23423 26226 6224
77765 88688 87464 6848
66776 23343 63463 4534
cat file2.txt
Id sp He Ho
25671 33 45 35
34353 64 75 33
77765 56 56 67
cat output.txt
Id leng sal sp He Ho
25671 34343 56565 33 45 35
77765 88688 87464 56 56 67
Compare file1.txt and file2.txt. Where column 1 has the same value in both files, write the merged row to a separate output file (output.txt), keeping only the matched rows and ignoring the 4th column of file1.txt when merging.
I tried cat file1.txt file2.txt | sort -u > output.txt, but it does not work. Any awk solution or trick using join is appreciated.

awk 'NR==FNR{ s[$1] = $2 " " $3 }
NR!=FNR{ if( $1 in s ) print $1, s[$1], $2,$3,$4}' file1.txt file2.txt

join -o 0 1.2 1.3 2.2 2.3 2.4 <(sort file1.txt) <(sort file2.txt) |sort -n | tr ' ' '\t'

This might work for you (GNU sed):
cat <<\! >file1.txt
> Id leng sal mon
> 25671 34343 56565 5565
> 44888 56565 45554 6868
> 23343 23423 26226 6224
> 77765 88688 87464 6848
> 66776 23343 63463 4534
> !
cat <<\! >file2.txt
> Id sp He Ho
> 25671 33 45 35
> 34353 64 75 33
> 77765 56 56 67
> !
sed 's|^\(\S*\)\s*\(.*\)|/^\1/s/\\(\\(\\S*\\s*\\)\\{3\\}\\).*/\\1\2/p|' file2.txt
/^Id/s/\(\(\S*\s*\)\{3\}\).*/\1sp He Ho/p
/^25671/s/\(\(\S*\s*\)\{3\}\).*/\133 45 35/p
/^34353/s/\(\(\S*\s*\)\{3\}\).*/\164 75 33/p
/^77765/s/\(\(\S*\s*\)\{3\}\).*/\156 56 67/p
sed 's|^\(\S*\)\s*\(.*\)|/^\1/s/\\(\\(\\S*\\s*\\)\\{3\\}\\).*/\\1\2/p|' file2.txt |
sed -nf - file1.txt
Id leng sal sp He Ho
25671 34343 56565 33 45 35
77765 88688 87464 56 56 67
Explanation:
Convert file2.txt into a sed script that transforms file1.txt into the required format.
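For comparison, the key-based merge that the awk and join answers perform can be sketched in Python. This is only an illustration, not part of the original answers: the function name merge_on_first_column and its left_keep parameter are made up for this example, and the sample data is copied from the question.

```python
def merge_on_first_column(left_lines, right_lines, left_keep=2):
    """Inner-join two whitespace-delimited tables on their first column.

    left_keep is the number of value columns kept from the left table
    (2 here, which drops the 4th column of file1.txt).
    """
    # Index the left table by its key column.
    left = {}
    for line in left_lines:
        fields = line.split()
        if fields:
            left[fields[0]] = fields[1:1 + left_keep]

    # Walk the right table in order and emit only keys present in both.
    merged = []
    for line in right_lines:
        fields = line.split()
        if fields and fields[0] in left:
            merged.append(" ".join([fields[0], *left[fields[0]], *fields[1:]]))
    return merged


file1 = """Id leng sal mon
25671 34343 56565 5565
44888 56565 45554 6868
23343 23423 26226 6224
77765 88688 87464 6848
66776 23343 63463 4534""".splitlines()

file2 = """Id sp He Ho
25671 33 45 35
34353 64 75 33
77765 56 56 67""".splitlines()

result = merge_on_first_column(file1, file2)
print("\n".join(result))
```

Because the header line "Id ..." shares its first field in both files, it is merged like any other row, which is the same reason the awk answer prints a header.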

Related

Extracting all rows containing a specific datetime value (MATLAB)

I have a table which looks like this:
Entry number  Timestamp         Value1  Value2  Value3  Value4
5758          28-06-2018 16:30  34      63      34.2    60.9
5759          28-06-2018 17:00  33.5    58      34.9    58.4
5760          28-06-2018 17:30  33      53      35.2    58.5
5761          28-06-2018 18:00  33      63      35      57.9
5762          28-06-2018 18:30  33      61      34.6    58.9
5763          28-06-2018 19:00  33      59      34.1    59.4
5764          28-06-2018 19:30  28      89      33.5    64.2
5765          28-06-2018 20:00  28      89      33      66.1
5766          28-06-2018 20:30  28      83      32.5    67
5767          28-06-2018 21:00  29      89      32.2    68.4
Here '28-06-2018 16:30' is a single value in the Timestamp column, so the table has 6 columns:
Entry number, Timestamp, Value1, Value2, Value3, Value4
I want to extract all rows that belong to '28-06-2018', i.e. all data for that day. The full table is too large to show here; its timestamps span a couple of months.

t=table([5758;5759],["28-06-2018 16:30";"29-06-2018 16:30"],[34;33.5],'VariableNames',{'Entry number','Timestamp','Value1'})
t =
2×3 table
Entry number Timestamp Value1
____________ __________________ ______
5758 "28-06-2018 16:30" 34
5759 "29-06-2018 16:30" 33.5
t(contains(t.('Timestamp'),"28-06"),:)
ans =
1×3 table
Entry number Timestamp Value1
____________ __________________ ______
5758 "28-06-2018 16:30" 34
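Note that matching the substring "28-06" would also match the same day in any other year. A minimal Python sketch of the alternative, comparing parsed dates instead of substrings, is shown below. It is not MATLAB and not part of the original answer; the helper name rows_on_day is hypothetical, and the rows are the two sample rows from the answer above.

```python
from datetime import date, datetime

# (entry number, timestamp, value1), as in the two-row sample table above.
rows = [
    (5758, "28-06-2018 16:30", 34.0),
    (5759, "29-06-2018 16:30", 33.5),
]


def rows_on_day(rows, day, fmt="%d-%m-%Y %H:%M"):
    """Return the rows whose timestamp falls on the given calendar day."""
    return [r for r in rows if datetime.strptime(r[1], fmt).date() == day]


selected = rows_on_day(rows, date(2018, 6, 28))
print(selected)
```

Parsing the timestamp first means the filter depends on the actual date rather than on how the string happens to be formatted.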

Add null to the columns which are empty

I am trying to put null into the columns that are empty, using perl or awk. The header's column count can be used to find the number of columns. I tried a solution using perl and a regex. The output is very close to the desired output, but on closer inspection the first row shows incorrect data.
Input data:
id name type foo-id zoo-id loo-id-1 moo-id-2
----- --------------- ----------- ------ ------ ------ ------
0 zoo123 soozoo 8 31 32
51 zoo213 soozoo 48 51
52 asz123 soozoo 47 52
53 asw122 soozoo 1003 53
54 fff123 soozoo 68 54
55 sss123 soozoo 75 55
56 ssd123 soozoo 76 56
Expected Output:
0 zoo123 soozoo 8 null 31 32
51 zoo213 soozoo 48 51 null null
52 asz123 soozoo 47 52 null null
53 asw122 soozoo 1003 53 null null
54 fff123 soozoo 68 54 null null
55 sss123 soozoo 75 55 null null
56 ssd123 soozoo 76 56 null null
My attempt, which is very close but shows incorrect data in the first row:
echo "$x"|grep -E '^[0-9]+' |perl -ne 'm/^([\d]+)(?:\s+([\w]+))?(?:\s+([-\w]+))?(?:\s+([\d]+))?(?:\s+([\d]+))?(?:\s+([\d]+))?(?:\s+([\d]+))?/;printf "%s %s %s %s %s %s %s\n", $1, $2//"null", $3//"null",$4//"null",$5//"null",$6//"null",$7//"null"' |column -t
0 zoo123 soozoo 8 31 32 null
51 zoo213 soozoo 48 51 null null
52 asz123 soozoo 47 52 null null
53 asw122 soozoo 1003 53 null null
54 fff123 soozoo 68 54 null null
55 sss123 soozoo 75 55 null null
56 ssd123 soozoo 76 56 null null

When you have a fixed-width string to parse, you'll find that unpack() is a better tool than regexes.
This should demonstrate how to do it. I'll leave it to you to convert it to a one-liner.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
while (<DATA>) {
  next if /^\D/; # Skip lines that don't start with a digit
  # I worked out the unpack() template by counting columns.
  my @data = map { /\S/ ? $_ : 'null' } unpack('A7A14A16A8A8A8A8');
  say join ' ', @data;
}
__DATA__
id name type foo-id zoo-id loo-id-1 moo-id-2
----- --------------- ----------- ------ ------ ------ ------
0 zoo123 soozoo 8 31 32
51 zoo213 soozoo 48 51
52 asz123 soozoo 47 52
53 asw122 soozoo 1003 53
54 fff123 soozoo 68 54
55 sss123 soozoo 75 55
56 ssd123 soozoo 76 56
Output:
$ perl unpack | column -t
0 zoo123 soozoo 8 null 31 32
51 zoo213 soozoo 48 51 null null
52 asz123 soozoo 47 52 null null
53 asw122 soozoo 1003 53 null null
54 fff123 soozoo 68 54 null null
55 sss123 soozoo 75 55 null null
56 ssd123 soozoo 76 56 null null

With GNU awk:
awk 'NR>2{ # ignore first and second row
NF=7 # fix number of columns
for(i=1; i<=NF; i++) # loop with all columns
if($i ~ /^ *$/){ # if empty or only spaces
$i="null"
}
print $0}' FIELDWIDTHS='7 14 16 8 8 10 8' OFS='|' file | column -s '|' -t
As one line:
awk 'NR>2{NF=7; for(i=1;i<=NF;i++) if($i ~ /^ *$/){$i="null"} print $0}' FIELDWIDTHS='7 14 16 8 8 10 8' OFS='|' file | column -s '|' -t
Output:
0 zoo123 soozoo 8 null 31 32
51 zoo213 soozoo 48 51 null null
52 asz123 soozoo 47 52 null null
53 asw122 soozoo 1003 53 null null
54 fff123 soozoo 68 54 null null
55 sss123 soozoo 75 55 null null
56 ssd123 soozoo 76 56 null null
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
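Both answers rely on the same idea: slice each line at fixed column widths and replace blank cells with "null". A rough Python sketch of that idea follows. The widths are an assumption taken from the unpack() template above (the awk answer uses slightly different widths), and because the page does not preserve the original spacing, the sample lines are rebuilt with a hypothetical make_line helper rather than copied from the question.

```python
# Assumed column widths, taken from the unpack() template 'A7A14A16A8A8A8A8'.
WIDTHS = (7, 14, 16, 8, 8, 8, 8)


def split_fixed_width(line, widths=WIDTHS):
    """Slice a line at fixed widths; blank or missing cells become 'null'."""
    fields, start = [], 0
    for width in widths:
        cell = line[start:start + width].strip()
        fields.append(cell if cell else "null")
        start += width
    return fields


def make_line(cells, widths=WIDTHS):
    """Hypothetical helper: build a fixed-width line for the demonstration."""
    padded = "".join(cell.ljust(width) for cell, width in zip(cells, widths))
    return padded.rstrip()  # trailing blanks are usually lost in real files


sample = [
    make_line(["0", "zoo123", "soozoo", "8", "", "31", "32"]),
    make_line(["51", "zoo213", "soozoo", "48", "51", "", ""]),
]

parsed = [split_fixed_width(line) for line in sample]
for fields in parsed:
    print(" ".join(fields))
```

Slicing past the end of a short line yields an empty string, so rows with missing trailing columns are padded with "null" without any special casing.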

Add unique rows for each group when similar group repeats after certain rows

Hi, can anyone help me get a unique group number?
I need to assign a new number to each group of consecutive rows with the same product, even when the same product reappears after other groups.
I have the following data:
id version product startdate enddate
123 0 2443 2010/09/01 2011/01/02
123 1 131 2011/01/03 2011/03/09
123 2 131 2011/08/10 2012/09/10
123 3 3009 2012/09/11 2014/03/31
123 4 668 2014/04/01 2014/04/30
123 5 668 2014/05/01 2016/01/01
123 6 668 2016/01/02 2017/09/08
123 7 131 2017/09/09 2017/10/10
123 8 131 2018/10/11 2019/01/01
123 9 550 2019/01/02 2099/01/01
select *,
dense_rank()over(partition by id order by id,product)
from table
Expected results:
id version product startdate enddate count
123 0 2443 2010/09/01 2011/01/02 1
123 1 131 2011/01/03 2011/03/09 2
123 2 131 2011/08/10 2012/09/10 2
123 3 3009 2012/09/11 2014/03/31 3
123 4 668 2014/04/01 2014/04/30 4
123 5 668 2014/05/01 2016/01/01 4
123 6 668 2016/01/02 2017/09/08 4
123 7 131 2017/09/09 2017/10/10 5
123 8 131 2018/10/11 2019/01/01 5
123 9 550 2019/01/02 2099/01/01 6

Try the following
SELECT
id,version,product,startdate,enddate,
1+SUM(v)OVER(PARTITION BY id ORDER BY version) n
FROM
(
SELECT
*,
IIF(LAG(product)OVER(PARTITION BY id ORDER BY version)<>product,1,0) v
FROM TestTable
) q
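The query above flags every row whose product differs from the previous row's product (per id, ordered by version) and then takes a running sum of those flags. The same logic can be sketched in Python to check it against the expected results. The function name number_consecutive_groups is made up for this illustration, and the rows carry only the id, version and product columns from the question.

```python
def number_consecutive_groups(rows):
    """Number runs of equal product per id, in version order.

    rows is an iterable of (id, version, product) tuples in any order.
    Returns (id, version, product, counter) tuples.
    """
    numbered = []
    state = {}  # id -> (product of the previous row, current counter)
    for id_, version, product in sorted(rows, key=lambda r: (r[0], r[1])):
        if id_ not in state:
            counter = 1  # first row of this id starts group 1
        else:
            last_product, counter = state[id_]
            if product != last_product:
                counter += 1  # product changed, so a new group starts
        state[id_] = (product, counter)
        numbered.append((id_, version, product, counter))
    return numbered


rows = [
    (123, 0, 2443), (123, 1, 131), (123, 2, 131), (123, 3, 3009),
    (123, 4, 668), (123, 5, 668), (123, 6, 668), (123, 7, 131),
    (123, 8, 131), (123, 9, 550),
]

counts = [row[3] for row in number_consecutive_groups(rows)]
print(counts)  # [1, 2, 2, 3, 4, 4, 4, 5, 5, 6]
```

Unlike dense_rank() ordered by product, this gives product 131 a new number (5) when it reappears at version 7, because only the immediately preceding row is compared.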

Select only those records which are twice in postgres

select distinct(msg_id),sub_id from programs where sub_id IN
(
select sub_id from programs group by sub_id having count(sub_id) = 2 limit 5
)
sub_id means subscriber ID.
The inner query returns the subscriber IDs that appear exactly 2 times in the programs table, and the main query returns those subscriber IDs with their distinct msg_id values.
This produces the following result:
msg_id sub_id
------|--------|
112 | 313
111 | 222
113 | 313
115 | 112
116 | 112
117 | 101
118 | 115
119 | 115
110 | 222
I want it to be:
msg_id sub_id
------|--------|
112 | 313
111 | 222
113 | 313
115 | 112
116 | 112
118 | 115
119 | 115
110 | 222
117 | 101 (this row should not be in the output because it appears only once)
I want only those records which appear twice.

I'm not sure, but are you just missing the second field in your in-list?
select distinct msg_id, sub_id, <presumably other fields>
from programs
where (sub_id, msg_id) IN
(
select sub_id, msg_id
from programs
group by sub_id, msg_id
having count(sub_id) = 2
)
If so, you can also do this with a windowing function:
with cte as (
select
msg_id, sub_id, <presumably other fields>,
count (*) over (partition by msg_id, sub_id) as cnt
from programs
)
select distinct
msg_id, sub_id, <presumably other fields>
from cte
where cnt = 2

Try this:
SELECT msg_id, MAX(sub_id)
FROM programs
GROUP BY msg_id
HAVING COUNT(sub_id) = 2 -- COUNT(sub_id) > 1 if you want all those that repeat more than once
ORDER BY msg_id
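To make the desired result concrete, here is a small Python sketch of one reading of the question: keep only the rows whose sub_id occurs exactly twice. This is an assumption about the intent, not a translation of either query above. The function name rows_with_key_count is hypothetical, and the rows are the (msg_id, sub_id) pairs from the question's output.

```python
from collections import Counter

# (msg_id, sub_id) pairs copied from the question.
rows = [
    (112, 313), (111, 222), (113, 313), (115, 112), (116, 112),
    (117, 101), (118, 115), (119, 115), (110, 222),
]


def rows_with_key_count(rows, wanted=2):
    """Keep rows whose sub_id appears exactly `wanted` times overall."""
    counts = Counter(sub_id for _, sub_id in rows)
    return [row for row in rows if counts[row[1]] == wanted]


kept = rows_with_key_count(rows)
for msg_id, sub_id in kept:
    print(msg_id, sub_id)
```

On this data the row for sub_id 101 is dropped because it occurs once, and the other eight rows are kept in their original order.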

postgres: select max returns 9 instead of 10

This is my script:
SELECT MAX(distinct TRIM(value, 'SAMPLE_VALUES_'))
FROM sample
WHERE id = 79;
My data is somehow like this:
id | value
-------------------
79 | SAMPLE_VALUES_6
79 | SAMPLE_VALUES_7
79 | SAMPLE_VALUES_7
79 | SAMPLE_VALUES_8
79 | SAMPLE_VALUES_8
79 | SAMPLE_VALUES_8
79 | SAMPLE_VALUES_9
79 | SAMPLE_VALUES_9
79 | SAMPLE_VALUES_10
79 | SAMPLE_VALUES_10
79 | SAMPLE_VALUES_10
But it always returns 9.
Is there something wrong with my script? Thanks for your help.

Cast into integer.
SELECT MAX(distinct cast(TRIM(value, 'SAMPLE_VALUES_') as integer))
FROM sample
WHERE id = 79

Try casting the trimmed result to an integer
SELECT MAX(distinct TRIM(value, 'SAMPLE_VALUES_')::int)
FROM sample
WHERE id = 79;
Your current query is calling MAX(text) rather than MAX(integer). The rules for comparing text strings are different from those for comparing numbers: strings are compared character by character, and the first character of '9' is greater than the first character of '10'.
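The difference between text and numeric comparison can be reproduced outside the database. The Python sketch below is only an illustration: Python compares strings by code point, while PostgreSQL uses the column's collation, but for strings of plain digits both put '9' after '10'. The values are the distinct entries from the question.

```python
PREFIX = "SAMPLE_VALUES_"

values = [
    "SAMPLE_VALUES_6", "SAMPLE_VALUES_7", "SAMPLE_VALUES_8",
    "SAMPLE_VALUES_9", "SAMPLE_VALUES_10",
]

# Strip the common prefix, leaving the numeric suffix as text.
suffixes = [value[len(PREFIX):] for value in values]

text_max = max(suffixes)                 # compared character by character
int_max = max(int(s) for s in suffixes)  # compared as numbers

print(text_max)  # 9
print(int_max)   # 10
```

Casting before taking the maximum, as both answers do, is what switches the comparison from text to numbers.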
