Filtering tshark output for .csv. Preventing errors from missing fields - tshark

I am trying to filter a pcap file in tshark wit a lua script and ultimately output it to a .csv. I am most of the way there but I am still running into a few issues.
This is what I have so far
tshark -nr -V -X lua_script:wireshark_dissector.lua -r myfile.pcap -T fields -e frame.time_epoch -e Something_UDP.field1 -e Something_UDP.field2 -e Something_UDP.field3 -e Something_UDP.field4 -e Something_UDP.field5 -e Something_UDP.field6 -e Something_UDP.field15 -e Something_UDP.field16 -e Something_UDP.field18 -e Something_UDP.field22 -E separator=,
Here is an example of what the frames look like, sort of.
frame 1
time: 1626806198.437893000
Something_UDP.field1: 0
Something_UDP.field2: 1
Something_UDP.field3:1
Something_UDP.field5:1
Something_UDP.field6:1
frame 2
time: 1626806198.439970000
Something_UDP.field8: 1
Something_UDP.field9: 0
Something_UDP.field13: 0
Something_UDP.field14: 0
frame 3
time: 1626806198.440052000
Something_UDP.field15: 1
Something_UDP.field16: 0
Something_UDP.field18: 1
Something_UDP.field19:1
Something_UDP.field20:1
Something_UDP.field22: 0
Something_UDP.field24: 0
The output I am looking for would be
1626806198.437893000,0,1,1,,1,1,1,,,,,
1626806198.440052000,,,,,,,,,1,0,,1,1,1,,0,0,,,,
That is if the frame contains one of the fields I am looking for it will output its value followed by a comma but if that field isn't there it will output a comma. One issue is that not every frame contains info that I am interested in and I don't want them to be outputted. Part of the issue with that is that one of the fields I need is epoch time and that will be in every frame but that is only important if the other fields are there. I could use awk or grep to do this but wondering if it can all be done inside tshark. The other issue is that the fields being requested will com from a text file and there may be fields in the text file that don't actually exist in the pcap file and if that happens I get a "tshark: Some fields aren't valid:" error.
In short I have 2 issues.
1: I need to print data only it the fields names match but not if the only match is epoch.
2: I need it to work even if one of the fields being requested doesn't exist.

I need to print data only it the fields names match but not if the only match is epoch.
Try using a display filter that mentions all the field names in which you're interested, with an "or" separating them, such s
-Y "Something_UDP.field1 or Something_UDP.field2 or Something_UDP.field3 or Something_UDP.field4 or Something_UDP.field5 or Something_UDP.field6 or Something_UDP.field15 or Something_UDP.field16 or Something_UDP.field18 or Something_UDP.field22"
so that only packets containing at least one of those fields will be processed.
I need it to work even if one of the fields being requested doesn't exist.
Then you will need to construct the command line on the fly, avoiding field names that aren't valid.
One way, in a script, to test whether a field is valid is to use the dftest command:
dftest Something_UDP.field1 >/dev/null 2>&1
will exit with a status of 0 if there's a field named "Something_UDP.field1" and will exit with a status of 2 if there isn't; if the scripting language you're using can check the exit status of a command to see if it succeeds, you can use that.

Related

Iterating over output of jq with a for loop in bash [duplicate]

This question already has answers here:
How can I loop over the output of a shell command?
(4 answers)
Closed 2 years ago.
Hello fellow developers !
I'm quite having a hard time on this one, so your help would be good :)
I'm creating a script shell that will periodically (thanks to cron) do a synchronisation between an offline mongodb to firebase.
I use jq to treat the export of my mongodb, but this one is acting strangely on strings values containing whitespaces.
For example, I've got this exported document from a mongodb collection named users (using --jsonArray as mongoexport option), file named mycollectionexported.json:
[{"_id":{"$oid":"5fc253c493b9f7363c8d1011"},"emailVerified":true,"disabled":false,"isKickstarter":false,"isPremiun":false,"gender":"Male","nationality":"None","type":"admin","optMarketing":false,"optNewsletter":false,"cguValidated":true,"providers":[],"email":"test#reflect.com","displayName":"John Doe","phoneNumber":"0644080050","homeAddress":{"street":"rue Pasteur","number":24,"zipcode":94270,"city":"Le Kremlin-BicĂȘtre","country":"France"},"familyId":{"$oid":"5fc253c493b9f7363c8d1010"},"createdAt":{"$date":"2020-11-28T13:42:28.51Z"},"updatedAt":{"$date":"2020-11-28T13:42:28.51Z"},"__v":0}]
You can see that for keys like "displayName", "homeAddress.street" or "homeAddress.city", I've got whitespaces in the values.
When I try to store my newly exported collection into a classic variable using jq, and display it on the stdout, jq is acting quite strangely.
Script executed:
#!/bin/bash
documents=$(jq -c ".[]" ./mycollectionexported.json)
for document in $documents; do
echo ""
echo $document
done
Output:
{"_id":{"$oid":"5fc253c493b9f7363c8d1011"},"emailVerified":true,"disabled":false,"isKickstarter":false,"isPremiun":false,"gender":"Male","nationality":"None","type":"admin","optMarketing":false,"optNewsletter":false,"cguValidated":true,"providers":[],"email":"test#reflect.com","displayName":"John
Doe","phoneNumber":"0644080050","homeAddress":{"street":"rue
Pasteur","number":24,"zipcode":94270,"city":"Le
Kremlin-BicĂȘtre","country":"France"},"familyId":{"$oid":"5fc253c493b9f7363c8d1010"},"createdAt":{"$date":"2020-11-28T13:42:28.51Z"},"updatedAt":{"$date":"2020-11-28T13:42:28.51Z"},"__v":0}
Expected (no newlines):
{"_id":{"$oid":"5fc253c493b9f7363c8d1011"},"emailVerified":true,"disabled":false,"isKickstarter":false,"isPremiun":false,"gender":"Male","nationality":"None","type":"admin","optMarketing":false,"optNewsletter":false,"cguValidated":true,"providers":[],"email":"test#reflect.com","displayName":"John Doe","phoneNumber":"0644080050","homeAddress":{"street":"rue Pasteur","number":24,"zipcode":94270,"city":"Le Kremlin-BicĂȘtre","country":"France"},"familyId":{"$oid":"5fc253c493b9f7363c8d1010"},"createdAt":{"$date":"2020-11-28T13:42:28.51Z"},"updatedAt":{"$date":"2020-11-28T13:42:28.51Z"},"__v":0}
How is that possible ?
There is no such informations about that in the jq documentation.
When I do it with a json file containing no whitespaces in string values, it works as expected...
Thank you for your help and futures answers :)
If the output from jq is a single line, why not just:
echo "$documents"
Or, if the result of the call to jq is a stream that might have more than one item, and if you want each item in the stream to be available as a bash variable, then if your bash is sufficiently up-to-date, you could use mapfile (aka readarray); otherwise, you could consider using a bash while loop, e.g. along the lines of:
declare -a document
while read -r item ; do
document+=("$item")
done < <(jq -c ....)
The idea here is that invoking jq with the -c option ensures that the only raw newlines that jq emits will be the ones demarking the items in the stream.

Is there any way to encode Multiple columns in a csv using base64 in Shell?

I have a requirement to replace multiple columns of a csv file with its base64 encoding value which should be applied to some columns of the file but keep the first line unaffected as the first line contains the header of the file. I have tried out for 1 column as below but as I have given it to proceed after skipping the first line of the file it is not
gawk 'BEGIN { FS="|"; OFS="|" } NR >=2 { cmd="echo "$4" | base64 -w 0";cmd | getline x;close(cmd); print $1,$2,$3,x}' awktest
o/p:
12|A|B|Qw==
13|C|D|RQ==
36|Z|V|VQ==
Qs: It is not showing the header in the output. What should I do to make produce the header in the output? Also can I use any loop here to replace multiple columns?
input:
10|A|B|C|5|T|R
12|A|B|C|6|eee|ff
13|C|D|E|9|dr|xrdd
36|Z|V|U|7|xc|xd
Required output:
10|A|B|C|5|T|R
12|A|B|encodedvalue|6|encodedvalue|ff
13|C|D|encodedvalue|9|encodedvalue|xrdd
36|Z|V|encodedvalue|7|encodedvalue|xd
Is this possible? Have researched a lot but could not find a proper explanation. I am new to shell. Kindly help. Many thanks!!!!
It looks like you can just sequence conditionals. This may not be the best way of solving the header issue, but it's intuitive.
BEGIN { FS="|"; OFS="|" } NR ==1 {print} NR >=2 { cmd="echo "$4" | base64 -w 0";cmd | getline x;close(cmd); print $1,$2,$3,x}
As for using a loop to affect multiple columns... Loops in bash are hard. Awk is technically its own language, and may have a looping construct of it's own, IDK. But it's not clear you need a loop. If there's only a reasonable number of fields that need modifying, you can just parameterize the existing command (somehow) by the field index, and then pipe through however many instances of it. It won't be as performant as doing it all in a single pass of awk, but that's probably ok.

Any way to filter email dynamically by taking 'from' and matching it with database ? (Using procmail or Virtualmin or Webmin)

I basically want to check the incoming 'From' in the email received and then either
Keep it and make it deliver to the intended mailbox if the email matches a Specified MySQL/PostgreSQL
Database User (eg. select email from users where exists ('from email address') )
If the 'From' address is blank or it is not found in the database, the email should be discarded
Any way I can achieve this before the e-mail is delivered to the intended mailbox?
I am using Procmail + Virtualmin + Webmin + PostgreSQL
PS: I want to apply this filter not to the wole server but to some specified mailboxes/users (i'm assuming 1 user = 1 mailbox here)
Procmail can easily run an external command in a condition and react to its exit status. How exactly to make your particular SQL client set its exit code will depend on which one you are using; perhaps its man page will reveal an option to make it exit with an error when a query produces an empty result set, for example? Or else write a shell wrapper to look for empty output.
A complication is that Procmail (or rather, the companion utility formail) can easily extract a string from e.g. the From: header; but you want to reduce this to just the email terminus. This is a common enough task that it's easy to find a canned solution - generate a reply and then extract the To: address (sic!) from that.
FROM=`formail -rtzxTo:`
:0
* FROM ?? ^(one#example\.com|two#site\.example\.net|third#example\.org)$
{
:0
* ? yoursql --no-headers --fail-if-empty-result \
--batch --query databasename \
--eval "select yada yada where address = '$FROM'"
{ }
:0E
/dev/null
}
The first condition examines the variable and succeeds if it contains one of the addresses (my original answer simply had the regex . which matches if the string contains at least one character, any character; I'm not convinced this is actually necessary or useful; there should be no way for From: to be empty). If it is true, Procmail enters the braces; if not, they will be skipped.
The first recipe inside the braces runs an external command and examines its exit code. I'm imagining your SQL client is called yoursql and that it has options to turn off human-friendly formatting (table headers etc) and for running a query directly from the command line on a specific database. We use double quotes so that the shell will interpolate the variable FROM before running this command (maybe there is a safer way to pass string variables which might contain SQL injection attempts with something like --variable from="$FROM" and then use that variable in the query? See below.)
If there is no option to directly set the exit code, but you can make sure standard output is completely empty in the case of no result, piping the command to grep -q . will produce the correct exit code. In a more complex case, maybe write a simple Awk script to identify an empty result set and set its exit status accordingly.
Scraping together information from
https://www.postgresql.org/docs/current/app-psql.html,
How do you use script variables in psql?,
Making an empty output from psql,
and from your question, I end up with the following attempt to implement this in psql; but as I don't have a Postgres instance to test with, or any information about your database schema, this is still approximate at best.
* ? psql --no-align --tuples-only --quiet \
--dbname=databasename --username=something --no-password \
--variable=from="$FROM" \
--command="select email from users where email = :'from'" \
| grep -q .
(We still can't use single quotes around the SQL query, to completely protect it from the shell, because Postgres insists on single quotes around :'from', and the shell offers no facility for embedding literal single quotes inside single quotes.)
The surrounding Procmail code should be reasonably self-explanatory, but here goes anyway. In the first recipe inside the braces, if the condition is true, the empty braces in its action line are a no-op; the E flag on the next recipe is a condition which is true only if any of the conditions on the previous recipe failed. This is a common idiom to avoid having to use a lot of negations; perhaps look up "de Morgan's law". The net result is that we discard the message by delivering it to /dev/null if either condition in the first recipe failed; and otherwise, we simply pass it through, and Procmail will eventually deliver it to its default destination.
The recipe was refactored in response to updates to your question; perhaps now it would make more sense to just negate the exit code from psql with a ! in front:
FROM=`formail -rtzxTo:`
:0
* FROM ?? ^(one#example\.com|two#site\.example\.net|third#example\.org)$
* ! ? psql --no-align --tuples-only --quiet \
--dbname=databasename --username=something --no-password \
--variable=from="$FROM" \
--command="select email from users where email = :'from'" \
| grep -q .
/dev/null
Tangentially, perhaps notice how Procmail's syntax exploits the fact that a leading ? or a doubled ?? are not valid in regular expressions. So the parser can unambiguously tell that these conditions are not regular expressions; they compare a variable to the regex after ??, or examine the exit status of an external command, respectively. There are a few other special conditions like this in Procmail; arguably, all of them are rather obscure.
Newcomers to shell scripting should also notice that each shell command pipeline has two distinct results: whatever is being printed on standard output, and, completely separate from that, an exit code which reveals whether or not the command completed successfully. (Conventionally, a zero exit status signals success, and anything else is an error. For example, the exit status from grep is 0 if it finds at least one match, 1 if it doesn't, and usually some other nonzero exit code if you passed in an invalid regular expression, or you don't have permission to read the input file, etc.)
For further details, perhaps see also http://www.iki.fi/era/procmail/ which has an old "mini-FAQ" which covers several of the topics here, and a "quick reference" for looking up details of the syntax.
I'm not familiar with Virtualmin but https://docs.virtualmin.com/Webmin/PostgreSQL_Database_Server shows how to set up Postgres and as per https://docs.virtualmin.com/Webmin/Procmail_Mail_Filter I guess you will want to use the option to put this code in an include file.

Variable not being recognized after "read"

-- Edit : Resolved. See answer.
Background:
I'm writing a shell that will perform some extra actions required on our system when someone resizes a database.
The shell is written in ksh (requirement), the OS is Solaris 5.10 .
The problem is with one of the checks, which verifies there's enough free space on the underlying OS.
Problem:
The check reads the df -k line for root, which is what I check in this step, and prints it to a file. I then "read" the contents into variables which I use in calculations.
Unfortunately, when I try to run an arithmetic operation on one of the variables, I get an error indicating it is null. And a debug output line I've placed after that line verifies that it is null... It lost it's value...
I've tried every method of doing this I could find online, they work when I run it manually, but not inside the shell file.
(* The file does have #!/usr/bin/ksh)
Code:
df -k | grep "rpool/ROOT" > dftest.out
RPOOL_NAME=""; declare -i TOTAL_SIZE=0; USED_SPACE=0; AVAILABLE_SPACE=0; AVAILABLE_PERCENT=0; RSIGN=""
read RPOOL_NAME TOTAL_SIZE USED_SPACE AVAILABLE_SPACE AVAILABLE_PERCENT RSIGN < dftest.out
\rm dftest.out
echo $RPOOL_NAME $TOTAL_SIZE $USED_SPACE $AVAILABLE_SPACE $AVAILABLE_PERCENT $RSIGN
((TOTAL_SIZE=$TOTAL_SIZE/1024))
This is the result:
DBResize.sh[11]: TOTAL_SIZE=/1024: syntax error
I'm pulling hairs at this point, any help would be appreciated.
The code you posted cannot produce the output you posted. Most obviously, the error is signalled at line 11 but you posted fewer than 11 lines of code. The previous lines may matter. Always post complete code when you ask for help.
More concretely, the declare command doesn't exist in ksh, it's a bash thing. You can achieve the same result with typeset (declare is a bash equivalent to typeset, but not all options are the same). Either you're executing this script with bash, or there's another error message about declare, or you've defined some additional commands including declare which may change the behavior of this code.
None of this should have an impact on the particular problem that you're posting about, however. The variables created by read remain assigned until the end of the subshell, i.e. until the code hits a ), the end of a pipe (left-hand side of the pipe only in ksh), etc.
About the use of declare or typeset, note that you're only declaring TOTAL_SIZE as an integer. For the other variables, you're just assigning a value which happens to consist exclusively of digits. It doesn't matter for the code you posted, but it's probably not what you meant.
One thing that may be happening is that grep matches nothing, and therefore read reads an empty line. You should check for errors. Use set -e in scripts to exit at the first error. (There are cases where set -e doesn't catch errors, but it's a good start.)
Another thing that may be happening is that df is splitting its output onto multiple lines because the first column containing the filesystem name is too large. To prevent this splitting, pass the option -P.
Using a temporary file is fragile: the code may be executed in a read-only directory, another process may want to access the same file at the same time... Here a temporary file is useless. Just pipe directly into read. In ksh (unlike most other sh variants including bash), the right-hand side of a pipe runs in the main shell, so assignments to variables in the right-hand side of a pipe remain available in the following commands.
It doesn't matter in this particular script, but you can use a variable without $ in an arithmetic expression. Using $ substitutes a string which can have confusing results, e.g. a='1+2'; $((a*3)) expands to 7. Not using $ uses the numerical value (in ksh, a='1+2'; $((a*3)) expands to 9; in some sh implementations you get an error because a's value is not numeric).
#!/usr/bin/ksh
set -e
typeset -i TOTAL_SIZE=0 USED_SPACE=0 AVAILABLE_SPACE=0 AVAILABLE_PERCENT=0
df -Pk | grep "rpool/ROOT" | read RPOOL_NAME TOTAL_SIZE USED_SPACE AVAILABLE_SPACE AVAILABLE_PERCENT RSIGN
echo $RPOOL_NAME $TOTAL_SIZE $USED_SPACE $AVAILABLE_SPACE $AVAILABLE_PERCENT $RSIGN
((TOTAL_SIZE=TOTAL_SIZE/1024))
Strange...when I get rid of your "declare" line, your original code seems to work perfectly well (at least with ksh on Linux)
The code :
#!/bin/ksh
df -k | grep "/home" > dftest.out
read RPOOL_NAME TOTAL_SIZE USED_SPACE AVAILABLE_SPACE AVAILABLE_PERCENT RSIGN < dftest.out
\rm dftest.out
echo $RPOOL_NAME $TOTAL_SIZE $USED_SPACE $AVAILABLE_SPACE $AVAILABLE_PERCENT $RSIGN
((TOTAL_SIZE=$TOTAL_SIZE/1024))
print $TOTAL_SIZE
The result :
32962416 5732492 25552588 19% /home
5598
Which are the value a simple df -k is returning. The variables seem to last.
For those interested, I have figured out that it is not possible to use "read" the way I was using it.
The variable values assigned by "read" simply "do not last".
To remedy this, I have applied the less than ideal solution of using the standard "while read" format, and inside the loop, echo selected variables into a variable file.
Once said file was created, I just "loaded" it.
(pseudo code:)
LOOP START
echo "VAR_A="$VAR_A"; VAR_B="$VAR_B";" > somefile.out
LOOP END
. somefile.out

emulate SAS' datastep statement FIRST using linux command line tools

Let's say I have the first column of the following dataset in a file and I want to emulate the flag in the second column so I export only that row tied to a flag = 1 (dataset is pre-sorted by the target column):
1 1
1 0
1 0
2 1
2 0
2 0
I could run awk 'NR==1 {print; next} seen[$1]++ {print}' dataset but would run into a problem for very large files (seen keeps growing). Is there an alternative to handle this without tracking every single unique value of the target column (here column #1)? Thanks.
So you only have the first column? And would like to generate the second? I think a slightly different awk command could work
awk '{if (last==$1) {flag=0} else {last=$1; flag=1}; print $0,flag}' file.txt
Basically you just check if the first field matches the last one you've seen. Since it's sorted, you don't have to keep track of everything you've seen, only the last one to know if the value is different.
Seems like grep would be fine for this:
$ grep " 1" dataset