Why is sed -E find/replace showing unexpected output - sed

I'm trying to extract sha256:13b918c5a5eadfed53597146332889dc5e10d1a8edbcdc42f7a872531766aab8 from the following output. The output is in a file called d2.txt.
d2.txt:
The push refers to repository [...]
331ebf1e6bb7: Layer already exists
9bb0b3c0e55b: Layer already exists
9f59b9615f5e: Layer already exists
82621df65774: Layer already exists
3e123f0af898: Layer already exists
93defbb4091e: Layer already exists
bc21254008da: Layer already exists
53619ba80b4a: Layer already exists
18eb03bf3058: Layer already exists
daf4ddfb16e5: Layer already exists
b5639327d5be: Layer already exists
30ccd09e6f92: Layer already exists
167efff21776: Layer already exists
fee20f1b745d: Layer already exists
d0fe97fa8b8c: Layer already exists
v1.0: digest: sha256:13b918c5a5eadfed53597146332889dc5e10d1a8edbcdc42f7a872531766aab8 size: 3470
Using grep I can use the following to identify the line with the digest:
grep -E '^.*(sha256:[a-z0-9]{64}).*' d2.txt
which returns:
v1.0: digest: sha256:13b918c5a5eadfed53597146332889dc5e10d1a8edbcdc42f7a872531766aab8 size: 3470
Using the parentheses from this regex to define capture group 1 (the sha256:hash), I attempt to run this in sed. But instead of getting just the line with the digest, I get all the lines in d2.txt (with the correct capture group on the sha256 line!).
sed -E s/'^.*(sha256:[a-z0-9]{64}).*'/'\1'/g d2.txt
returns:
The push refers to repository [...]
331ebf1e6bb7: Layer already exists
9bb0b3c0e55b: Layer already exists
9f59b9615f5e: Layer already exists
82621df65774: Layer already exists
3e123f0af898: Layer already exists
93defbb4091e: Layer already exists
bc21254008da: Layer already exists
53619ba80b4a: Layer already exists
18eb03bf3058: Layer already exists
daf4ddfb16e5: Layer already exists
b5639327d5be: Layer already exists
30ccd09e6f92: Layer already exists
167efff21776: Layer already exists
fee20f1b745d: Layer already exists
d0fe97fa8b8c: Layer already exists
sha256:13b918c5a5eadfed53597146332889dc5e10d1a8edbcdc42f7a872531766aab8
So why does sed return the full text of all the lines where there is no regex match?

You can use grep with -o to print only the matched part.
grep -Eo 'sha256:[[:alnum:]]{64}' d2.txt
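sed prints every input line by default, whether or not the substitution matched, which is why you see the whole file. With sed you can suppress that default printing with -n and use the p flag to print only the lines where the substitution was made.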
sed -En 's/^.*(sha256:[a-z0-9]{64}).*/\1/p' d2.txt
Both will output
sha256:13b918c5a5eadfed53597146332889dc5e10d1a8edbcdc42f7a872531766aab8
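The difference between the two sed invocations can be seen on a two-line sample. This is only a small sketch: the variable names (sample, re, default_out, quiet_out) are made up for the illustration, and the sample reuses two lines from d2.txt.

```shell
# Two lines taken from d2.txt: one without a digest, one with it.
digest_line='v1.0: digest: sha256:13b918c5a5eadfed53597146332889dc5e10d1a8edbcdc42f7a872531766aab8 size: 3470'
sample=$(printf '%s\n' '331ebf1e6bb7: Layer already exists' "$digest_line")
re='^.*(sha256:[a-z0-9]{64}).*'

# Without -n, sed prints every line at the end of its cycle, matched or not.
default_out=$(printf '%s\n' "$sample" | sed -E "s/$re/\1/")

# With -n, nothing is printed unless the p flag fires on a successful substitution.
quiet_out=$(printf '%s\n' "$sample" | sed -En "s/$re/\1/p")

printf 'default:\n%s\n\nwith -n and p:\n%s\n' "$default_out" "$quiet_out"
```

The first result still contains the "Layer already exists" line, while the second contains only the digest.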

With your shown samples, you can try the following code, which uses awk's match function:
awk 'match($0,/sha256:[a-z0-9]{64}/){print substr($0,RSTART,RLENGTH)}' d2.txt
If there is ONLY ONE match in the whole file and you want to print it, add exit to the code above so that awk stops at the first match instead of reading the rest of the file:
awk 'match($0,/sha256:[a-z0-9]{64}/){print substr($0,RSTART,RLENGTH);exit}' d2.txt

Related

Filtering tshark output for .csv. Preventing errors from missing fields

I am trying to filter a pcap file in tshark with a Lua script and ultimately output it to a .csv. I am most of the way there but I am still running into a few issues.
This is what I have so far
tshark -nr -V -X lua_script:wireshark_dissector.lua -r myfile.pcap -T fields -e frame.time_epoch -e Something_UDP.field1 -e Something_UDP.field2 -e Something_UDP.field3 -e Something_UDP.field4 -e Something_UDP.field5 -e Something_UDP.field6 -e Something_UDP.field15 -e Something_UDP.field16 -e Something_UDP.field18 -e Something_UDP.field22 -E separator=,
Here is an example of what the frames look like, sort of.
frame 1
time: 1626806198.437893000
Something_UDP.field1: 0
Something_UDP.field2: 1
Something_UDP.field3:1
Something_UDP.field5:1
Something_UDP.field6:1
frame 2
time: 1626806198.439970000
Something_UDP.field8: 1
Something_UDP.field9: 0
Something_UDP.field13: 0
Something_UDP.field14: 0
frame 3
time: 1626806198.440052000
Something_UDP.field15: 1
Something_UDP.field16: 0
Something_UDP.field18: 1
Something_UDP.field19:1
Something_UDP.field20:1
Something_UDP.field22: 0
Something_UDP.field24: 0
The output I am looking for would be
1626806198.437893000,0,1,1,,1,1,1,,,,,
1626806198.440052000,,,,,,,,,1,0,,1,1,1,,0,0,,,,
That is, if the frame contains one of the fields I am looking for, it will output its value followed by a comma, but if that field isn't there it will output just a comma. One issue is that not every frame contains info that I am interested in, and I don't want those frames to be output. Part of the issue is that one of the fields I need is the epoch time, which is in every frame but is only important if the other fields are there. I could use awk or grep to do this, but I am wondering if it can all be done inside tshark. The other issue is that the requested fields will come from a text file, and there may be fields in the text file that don't actually exist in the pcap file; if that happens I get a "tshark: Some fields aren't valid:" error.
In short, I have 2 issues.
1: I need to print data only if the field names match, but not if the only match is epoch.
2: I need it to work even if one of the fields being requested doesn't exist.
I need to print data only if the field names match, but not if the only match is epoch.
Try using a display filter that mentions all the field names in which you're interested, with an "or" separating them, such as
-Y "Something_UDP.field1 or Something_UDP.field2 or Something_UDP.field3 or Something_UDP.field4 or Something_UDP.field5 or Something_UDP.field6 or Something_UDP.field15 or Something_UDP.field16 or Something_UDP.field18 or Something_UDP.field22"
so that only packets containing at least one of those fields will be processed.
I need it to work even if one of the fields being requested doesn't exist.
Then you will need to construct the command line on the fly, avoiding field names that aren't valid.
One way, in a script, to test whether a field is valid is to use the dftest command:
dftest Something_UDP.field1 >/dev/null 2>&1
will exit with a status of 0 if there's a field named "Something_UDP.field1" and will exit with a status of 2 if there isn't; if the scripting language you're using can check the exit status of a command to see if it succeeds, you can use that.
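A minimal sketch of that idea in sh, relying only on the dftest exit status described above. The helper names keep_valid_fields and join_or and the file name fields.txt are assumptions made for the illustration; they are not part of tshark.

```shell
# keep_valid_fields (hypothetical helper): reads field names, one per line,
# on stdin and prints only those for which dftest exits with status 0.
keep_valid_fields() {
    while IFS= read -r field; do
        [ -n "$field" ] || continue
        if dftest "$field" </dev/null >/dev/null 2>&1; then
            printf '%s\n' "$field"
        fi
    done
}

# join_or (hypothetical helper): joins the lines on stdin with " or ",
# giving a display filter suitable for -Y.
join_or() {
    filter=
    while IFS= read -r field; do
        filter=${filter:+$filter or }$field
    done
    printf '%s\n' "$filter"
}

# Possible usage, assuming the requested fields are listed in fields.txt:
#   fields=$(keep_valid_fields < fields.txt)
#   set -- -T fields -E separator=, -e frame.time_epoch
#   for f in $fields; do set -- "$@" -e "$f"; done
#   tshark -X lua_script:wireshark_dissector.lua -r myfile.pcap \
#       -Y "$(printf '%s\n' "$fields" | join_or)" "$@"
```

Building the argument list with set -- keeps each field as its own argument, so the same valid fields feed both the -e options and the display filter.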

sh: can't return one result after comparing 2 files

As an example I will use different inputs, both to keep my files private and to avoid long text. They have the following form:
INPUT1.cfg :
TC # aa # D317
TC # bb # D314
TC # cc # D315
TC # dd # D316
INPUT2.cfg
BL;nn;3
LY;ww;3
LO;xx;3
TC;vv;3
TC;dd;3
OD;pp;3
TC;aa;3
What I want to do is iterate over the names (column 2) in the rows of input1 and compare each with the names (column 2) in the rows of input2. If they match, the line from INPUT2 goes to an output file; otherwise it should report that the table is not found. Here is my attempt:
#!/bin/bash
input1="input1.cfg";
input2="input2.cfg"
cat $input1|while read line
do
TableNameIN=`echo $line|cut -d"#" -f2`
cat $input2| while read line
do
TableNameOUT=`echo $line|cut -d";" -f2`
if echo "$TableNameOUT" | grep -q $TableNameIN;
then echo "$line" >> output.txt
else
echo "Table $TableNameIN not found"
fi
done
done
This is what I get as a result:
Table bb not found
Table bb not found
Table bb not found
Table cc not found
Table cc not found
Table cc not found
I manage to write the matching lines, but the problem with my code is that it prints "table not found" for every row it compares, whereas I want it printed only once, after all the lines have been compared.
Here is the output I want to get:
Table bb not found
Table cc not found
Can anyone help me with this? PS: I don't want to use awk because this is just one part of my code and I already use sh.
Assumptions:
for file input2.cfg the 2nd column (table name) is unique
input2.cfg is not so large that we run the risk of using up all memory when storing input2.cfg in an associative array (otherwise we could store the table names from input1.cfg, assuming this is the smaller file, in the array and swap the processing order of the two files)
there are no explicit requirements for data to be sorted (otherwise we may need to add a sort or two)
a bash solution is sufficient (based on inclusion of the #!/bin/bash shebang in OP's current code)
There are many ways to slice-n-dice this one (awk being my preference, but OP doesn't want to use awk). For this particular answer I'll pull the awk steps out into separate bash commands.
NOTE: While we could use a set of nested loops (as in OP's code), I've opted to use an associative array to store input2.cfg, thus eliminating the need to repeatedly scan input2.cfg.
#!/usr/bin/bash
input1=input1.cfg
input2=input2.cfg
> output.txt # clear out the target file
# load ${input2} into an associative array
unset lines
typeset -A lines # associative array for storing contents of ${input2}
while read -r line
do
    x="${line%;*}" # use parameter expansion
    tabname="${x#*;}" # to parse out table name
    lines["${tabname}"]="${line}" # add to array
done < "${input2}"
# process ${input1}
while read -r c1 c2 tabname rest_of_line
do
    [[ -v lines["${tabname}"] ]] && # if tabname has an entry in our array
    echo "${lines[${tabname}]}" >> output.txt && # then dump the associated line (from ${input2}) to output.txt
    continue # process next line from ${input1}
    echo "Table ${tabname} not found" # otherwise print 'not found' message
done < "${input1}"
# display contents of output.txt
echo "++++++++++++++++ output.txt"
cat output.txt
echo "++++++++++++++++"
This generates the following:
Table bb not found
Table cc not found
++++++++++++++++ output.txt
TC;aa;3
TC;dd;3
++++++++++++++++
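For comparison, the nested-loop approach from the question can also produce a single message per table if a flag remembers whether any row of input2.cfg matched. This is only a sketch in plain sh under the same assumptions as above; the function name report_tables and its three arguments are made up for the illustration.

```shell
# report_tables INPUT1 INPUT2 OUTPUT (hypothetical helper)
# Writes matching INPUT2 lines to OUTPUT and prints one "not found"
# message per INPUT1 table that has no match.
report_tables() {
    in1=$1 in2=$2 out=$3
    : > "$out"                                   # clear out the target file
    while IFS='#' read -r c1 name rest; do
        name=$(printf '%s' "$name" | tr -d ' ')  # strip the padding around the name
        [ -n "$name" ] || continue
        found=0
        while IFS= read -r line2; do
            tab=${line2#*;}                      # drop column 1
            tab=${tab%%;*}                       # keep column 2 only
            if [ "$tab" = "$name" ]; then
                printf '%s\n' "$line2" >> "$out"
                found=1
            fi
        done < "$in2"
        # report once, after the whole of INPUT2 has been scanned
        [ "$found" -eq 1 ] || echo "Table $name not found"
    done < "$in1"
}

# Possible usage:
#   report_tables input1.cfg input2.cfg output.txt
```

Both loops read through redirection rather than through cat and a pipe, so the flag set inside the inner loop is still visible when the inner loop ends.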

Greenplum : Getting filenames processed via an external table

We are processing multiple files using an external table. Is there any way I can get the name of the file being processed by the external table and store it in a database table?
The only workaround I can find is appending the file name to every record in the flat file, which isn't ideal with a huge dataset and multiple files.
Can anyone help with this?
Thanks
No, the file name is simply never passed from the gpfdist daemon back to Greenplum. So you have to append the file name to each line; you can use a gpfdist transformation to do so.
I was struggling with this as well; here's my solution. Please note I'm not an expert in Linux, so there may be a one-liner solution.
So I wanted to add a filename column in front of my records.
That can be done with sed. I've created a transform.sh file with the following content:
#!/bin/sh
filename=$1
#echo $filename >> transform.txt
sed -e "s|^|$filename\v|" "$filename"
Please note that I was using a vertical tab, \v, as the delimiter. The filename may contain /, hence | as the sed delimiter. The sed expression has to be in double quotes so that $filename is expanded.
Test it, it looks good.
./transform.sh countersamples-2016-03-02--11-51-10.csv
countersamples-2016-03-02--11-51-10.csv
timestamp
machine
category
instance
name
value
countersamples-2016-03-02--11-51-10.csv
2016-03-02 11:51:10.064
DESKTOP-4PLQKVL
Memory
% Committed Bytes In Use
74.8485488891602
This part is done, so let's continue with gpfdist. We need a YAML file that can be passed to gpfdist; I named it transform.yaml.
Content:
---
VERSION: 1.0.0.1
TRANSFORMATIONS:
  add_filename:
    TYPE: input
    CONTENT: data
    COMMAND: /bin/bash transform.sh %filename%
Please note that we have the %filename% value here. It seems that gpfdist prefilters the files that need to be handled and passes them one by one to our transform.
Let's fire up gpfdist:
gpfdist -c transform.yaml -v
Now go into greenplum and create an external table such as:
CREATE READABLE EXTERNAL TABLE "ext_transform"
(
"filename" text,
"timestamp" timestamp without time zone ,
"machine" text ,
"category" text ,
"instance" text ,
"name" text ,
"value" double precision
)
LOCATION ('gpfdist://localhost:8080/*/countersamples*.csv#transform=add_filename')
FORMAT 'TEXT'
( HEADER DELIMITER '\013' NULL AS '\\N' ESCAPE AS '\\' )
And when we select data from it:
select * from "ext_transform";
We see:
I've created 2 folders to see how it reacts if the files are not in the same folder as the transform. This way I can distinguish between the 2 files, even if their data is identical.

Replace matches of one regex expression with matches from another, across two files

I am currently helping a friend reorganise several hundred images on a database driven website. I have generated a list of the new, reorganised image paths offline and would like to replace each matching image reference in the sql export of the database with the new paths.
EDIT: Here is an example of what I am trying to achieve
The new_paths_list.txt is a file that I generated using a batch script after I had organised all of the existing images into folders. Prior to this all of the images were in just a few folders. A sample of this generated list might be:
image/data/product_photos/telephones/snom/snom_xyz.jpg
image/data/product_photos/telephones/gigaset/giga_xyz.jpg
A sample of my_exported_db.sql (the database exported from the website) might be:
...
,(110,32,'data/phones/snom_xyz.jpg',3),(213,50,'data/telephones/giga_xyz.jpg',0),
...
The result I want is my_exported_db.sql to be:
...
,(110,32,'data/product_photos/telephones/snom/snom_xyz.jpg',3),(213,50,'data/product_photos/telephones/gigaset/giga_xyz.jpg',0),
...
Some pseudo code to illustrate:
1/ Find the first image name in my_exported_db.sql, such as 'snom_xyz.jpg'.
2/ Find the same image name in new_paths_list.txt
3/ If it is present, copy the whole line (the path and filename)
4/ Replace the whole path of this image in my_exported_db.sql with the copied line
5/ Repeat for all other image names in my_exported_db.sql
A regular expression that appears to match image names is:
([^)''"/])+\.(?:jpg|jpeg|gif|png)
and one to match image names, complete with path (for relative or absolute) is:
\bdata[^)''"\s]+\.(?:jpg|jpeg|gif|png)
I have looked around and have seen that Sed or Awk may be capable of doing this, but some pointers would be greatly appreciated. I understand that this will only work accurately if there are no duplicated filenames.
You can use sed to convert new_paths_list.txt into a set of sed replacement commands:
sed 's|\(.*\(/[^/]*$\)\)|s#data\2#\1#|' new_paths_list.txt > rules.sed
The file rules.sed will look like this:
s#data/snom_xyz.jpg#image/data/product_photos/telephones/snom/snom_xyz.jpg#
s#data/giga_xyz.jpg#image/data/product_photos/telephones/gigaset/giga_xyz.jpg#
Then use sed again to translate my_exported_db.sql:
sed -i -f rules.sed my_exported_db.sql
I think in some shells it's possible to combine these steps and do without rules.sed:
sed 's|\(.*\(/[^/]*$\)\)|s#data\2#\1#|' new_paths_list.txt | sed -i -f - my_exported_db.sql
but I'm not certain about that.
EDIT:
If the images are in several directories under data/, make this change:
sed "s|image/\(.*\(/[^/]*$\)\)|s#[^']*\2#\1#|" new_paths_list.txt > rules.sed
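To check the generated rules before touching the real export, the two steps can be tried on the sample lines from the question. This is a sketch only: the function name make_rules is made up, and the result is written to a new file (my_exported_db.new.sql, also an assumed name) instead of using -i, so the original export stays untouched.

```shell
# make_rules (hypothetical helper): turns each line of the new paths list
# into one sed substitution rule, using the command from the EDIT above.
make_rules() {
    sed "s|image/\(.*\(/[^/]*$\)\)|s#[^']*\2#\1#|" "$1"
}

# Possible usage:
#   make_rules new_paths_list.txt > rules.sed
#   sed -f rules.sed my_exported_db.sql > my_exported_db.new.sql
#   diff my_exported_db.sql my_exported_db.new.sql | less   # review the changes
```

Each generated rule replaces only the first occurrence on a line. If the same image can appear more than once on one line of the export, appending g to each generated rule (s#...#...#g) would cover that case.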

How do you search by DN in LDAP?

I'm pulling information about a user from LDAP. This includes directReports, which is in the full CN=cnBlah, OU=ouBlah, DC=dcBlah form. I'm trying to do another lookup to find info about the reportee.
So far the only way I've been able to actually find said user is to break out the CN= and set the remainder of the string as the base.
Is this the proper way of doing it? Or is there a way to search for an entry given the full DN?
Use the DN as the base object in the search and set the scope of the search to base.
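With the OpenLDAP ldapsearch client, that might look like the sketch below. The wrapper name lookup_dn is made up for the illustration, and hostname and authentication options are left out; -x selects simple authentication as in the other examples here.

```shell
# lookup_dn DN [ATTRIBUTE...] (hypothetical wrapper)
# Reads exactly one entry: the DN is the search base and the scope is "base",
# so no subtree search happens. (objectClass=*) matches any entry.
lookup_dn() {
    dn=$1
    shift
    ldapsearch -x -LLL -b "$dn" -s base '(objectClass=*)' "$@"
}

# Possible usage, with a DN taken from a directReports value:
#   lookup_dn 'CN=cnBlah,OU=ouBlah,DC=dcBlah' cn mail
```

Any attribute names given after the DN are passed through, so only those attributes are requested.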
Calling ldapsearch with the -f option would do pretty much what you want.
Save your first search results to a file, with only the value of the cn attribute. For example, your file would look like this :
users.txt:
user1
user2
cnBlah
john
jim
user883
Then call ldapsearch with a base that is high enough to encompass all users. This could be -b dc=users,dc=example,dc=com.
So if you saved your user list to a file named users.txt, your ldapsearch command line would look like this :
# I removed the hostname, port and authentication for clarity
ldapsearch -b "dc=users,dc=example,dc=com" -s sub "cn=%s" -f users.txt -LLL
Long lines will wrap at ~76 characters. Nothing that a pipe through perl -p00e 's/\r?\n //g' can't fix. (Or just add option -o ldif-wrap=no to your ldapsearch commandline.)
Closing the loop on this question, courtesy of https://www.openldap.org/lists/openldap-software/200503/msg00520.html
When you know the DN of an entry, there is no need to "search" for it at all, just retrieve the entry directly:
ldapsearch -x -LLL -b "uid=droy,ou=people,dc=eclipse,dc=org"
So that answers the "how do you use ldapsearch to lookup() an item rather than search for it"