Check for Anti-Virus Artifacting with cksum - diff

I'm looking for a good way to see if Avira Anti-virus (www.avira.com) is leaving any traces behind after a scan. I am working in an environment where it is critical that nothing be modified and that the box stay disconnected from the network, per user specifications. The concept was to use cksum to checksum every file on the box, pipe the output to a text file, and diff the pre- and post-Avira checksums.
I have tried:
$ find . | xargs cksum | sort > cksum_A.txt
And
$ find . \! -type p -exec cksum {} \; > cksum_A.txt
I removed the entries for cksum_A.txt and cksum_B.txt themselves from both listings, since they would otherwise always show up as a difference.
In multiple cases without running the anti-virus in-between, './.local/share/gvfs-metadata...' and './.gconf/apps/nautilus...' were found to have been modified according to diff.
The question is: is there a better way to identify artifacting at the bit level? Or should I just disregard these files and move on?
Thanks!
Mason

You probably want to run the scan from single-user mode, or at least with the GUI switched off, since GUI applications and daemons may well write files in the meantime.
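A minimal sketch of the compare workflow, assuming the checksum listings are written somewhere outside the tree being checked (here /tmp) so they never appear as differences themselves, and with /path/to/watched/tree as a placeholder:

cd /path/to/watched/tree
find . -type f -exec cksum {} + | sort -k 3 > /tmp/cksum_pre.txt
# ... run the Avira scan ...
find . -type f -exec cksum {} + | sort -k 3 > /tmp/cksum_post.txt
diff /tmp/cksum_pre.txt /tmp/cksum_post.txt

Sorting on the third field (the file name) keeps the pre- and post-scan listings aligned, so each changed file shows up as one removed and one added line in the diff.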

Related

xdg-open multiple files without extension

I want to open multiple files with xdg-open using the following commands:
me#host:~/Downloads$ find . -type f -iregex "./[^.]*"
./3ed090f2dde306e5e9f7200f1022a2c3
./ebd9863a73a5ef22344550a650d169a1
./edbdb765d87586fda75c4287a1e9ea1e
./d9e39bfe0a907ffb580a975d8c8719d2
./2b9cc942c04a8063bd8d4d8fd98814d9
./f5938dd24367ffaf766ef99928660786
./a51accbbf14c8a05cb82caa7d8bec0c6
./0820fb50b412f8e40f63b3bea12e9fb5
./53ef22110569d46b445a1e908a7ae88f
./61ee21f83a33b91674926daf70c34947
Try to open them
me#host:~/Downloads$ find . -type f -iregex "./[^.]*" | xargs xdg-open
xdg-open: unexpected argument './ebd9863a73a5ef22344550a650d169a1'
Try 'xdg-open --help' for more information.
me#host:~/Downloads$ find . -type f -iregex "./[^.]*" -print0| xargs -0 xdg-open
xdg-open: unexpected argument './ebd9863a73a5ef22344550a650d169a1'
Try 'xdg-open --help' for more information.
What's the problem with my usage of xdg-open?
Your problem is that xdg-open does not accept more than one argument, so you can open only one file per invocation. This seems to be by design: different distributions use different underlying commands to open files, and some of those commands accept only one argument.
If you are writing a distribution-specific script, you might want to find out which command xdg-open invokes. In Ubuntu MATE 16.04 it is gvfs-open, which in turn accepts multiple arguments. I found this out by feeding a malformed file path to xdg-open when I (yet again) tried to open two files with it. The malformation was simply two file paths separated by a comma with no spaces; xdg-open accepted it, but gvfs-open complained in return, exposing itself.
If you are writing a distribution-independent script, you may want to look for a solution here: https://askubuntu.com/questions/356650/how-to-open-multiple-files-with-the-default-program-from-terminal/
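A distribution-independent workaround is simply to invoke xdg-open once per file; a minimal sketch reusing the find command from the question:

find . -type f -iregex "./[^.]*" -exec xdg-open {} \;

or, keeping the xargs pipeline:

find . -type f -iregex "./[^.]*" -print0 | xargs -0 -n 1 xdg-open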

Sorting and removing duplicates from single or multiple large files

I have a 70 GB file with 400 million+ lines (JSON). My end goal is to remove duplicate lines so I have a fully "de-duped" version of the file. I am doing this on a machine with 8 cores and 64 GB of RAM.
I am also expanding on this thread, 'how to sort out duplicates from a massive list'.
Things i have tried:
Neek (JavaScript) - quickly runs out of memory
Using Awk (doesn't seem to work for this)
Using Perl (perl -ne 'print unless $dup{$_}++;') - again, runs out of memory
sort -u largefile > targetfile
does not seem to work. I think the file is too large.
Current approach:
Split the files into chunks of 5million lines each.
Sort/Uniq each of the files
for X in *; do sort -u --parallel=6 $X > sorted/s-$X; done
Now I have 80 individually sorted files. I am trying to re-merge/de-dupe them using sort -m. This seems to do nothing, as the file size and line count end up the same.
Since sort -m does not seem to work, i am currently trying this:
cat *.json | sort > big-sorted.json
then I will try to run uniq with
uniq big-sorted.json > unique-sorted.json
Based on past experience, I do not believe this will work.
What is the best approach here? How do I re-merge the files and remove any duplicate lines at this point?
Update 1
As I suspected, cat * | sort > bigfile did not work. It just copied everything to a single file the way it was previously sorted (in individual files).
Update 2:
I also tried the following code:
cat *.json | sort --parallel=6 -m > big-sorted.json
The result was the same as the previous update.
I am fresh out of ideas.
Thanks!
After some trial and error, I found the solution:
sort -us -o out.json infile.json
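A note on the earlier chunked approach: sort -m only merges when the pre-sorted chunks are passed as separate file arguments; piping them through cat hands it a single stream, and merging a single input is just a copy. A sketch, assuming the sorted chunks live in sorted/:

sort -m -u sorted/s-* > big-sorted.json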

How to read a file without checking out in perforce

I'm writing a syntax check tool to parse several files on different branches.
Is there a way for me to read the contents without checking out the file?
The tool is written in Perl.
`p4 print //depot/path/to/file`;
(Usual requirements for running a p4 command apply -- make sure the p4 executable is in your PATH, make sure you're authenticated with p4 login, make sure you're connecting to the right server, etc.)
See p4 help print for more info on the print command -- you might find the -q and/or -o flags helpful depending on what exactly you need to do with the output.
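For example (a sketch; the depot path is a placeholder): -q suppresses the one-line file header that p4 print normally emits, and -o writes the content to a local file instead of standard output.

p4 print -q //depot/path/to/file
p4 print -o /tmp/file.copy //depot/path/to/file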

Why does grep hang when run against the / directory?

My question is in two parts:
1) Why does grep hang when I grep all files under "/"?
For example:
grep -r 'h' ./
(Note: right before the hang/crash, I see some "no such device or address" messages regarding sockets.)
Of course, I know that grep shouldn't run against a socket, but I would think that since sockets are just files in Unix, it should return a negative result, rather than crashing.
2) Now, my follow-up question: In any case -- how can I grep the whole filesystem? Are there certain *NIX directories which we should leave out when doing this? In particular, I'm looking for all recently written log files.
As @ninjalj said, if you don't use -D skip, grep will try to read all your device files, socket files, and FIFO files. In particular, on a Linux system (and many Unix systems), it will try to read /dev/zero, which appears to be infinitely long.
You'll be waiting for a while.
If you're looking for a system log, starting from /var/log is probably the best approach.
If you're looking for something that really could be anywhere in your file system, you can do something like this:
find / -xdev -type f -print0 | xargs -0 grep -H pattern
The -xdev argument to find tells it to stay within a single filesystem; this will avoid /proc and /dev (as well as any mounted filesystems). -type f limits the search to ordinary files. -print0 prints the file names separated by null characters rather than newlines; this avoids problems with files having spaces or other funny characters in their names.
xargs reads a list of file names (or anything else) on its standard input and invokes the specified command on everything in the list. The -0 option works with find's -print0.
The -H option to grep tells it to prefix each match with the file name. By default, grep does this only if there are two or more file names on its command line. Since xargs splits its arguments into batches, it's possible that the last batch will have just one file, which would give you inconsistent results.
Consider using find ... -name '*.log' to limit the search to files with names ending in .log (assuming your log files have such names), and/or using grep -I ... to skip binary files.
Note that all this depends on GNU-specific features. Some of these options might not be available on MacOS (which is based on BSD) or on other Unix systems. Consult your local documentation, and consider installing GNU findutils (for find and xargs) and/or GNU grep.
Before trying any of this, use df to see just how big your root filesystem is. Mine is currently 268 gigabytes; searching all of it would probably take several hours. A few minutes spent (a) restricting the files you search and (b) making sure the command is correct will be well worth the time you spend.
By default, grep tries to read every file. Use -D skip to skip device files, socket files and FIFO files.
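For example, re-running the search from the question with special files skipped (GNU grep):

grep -r -D skip 'h' ./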
If you keep seeing error messages, then grep is not hanging. Keep iotop open in a second window to see how hard your system is working to pull all the contents off its storage media into main memory, piece by piece. Expect this to be slow unless you have a very bare-bones system.
Now, my follow-up question: In any case -- how can I grep the whole filesystem? Are there certain *NIX directories which we should leave out when doing this? In particular, I'm looking for all recently written log files.
Grepping the whole FS is very rarely a good idea. Try grepping the directory where the log files should have been written; likely /var/log. Even better, if you know anything about the names of the files you're looking for (say, they have the extension .log), then do a find or locate and grep the files reported by those programs.
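For instance, if the goal is recently written log files, something like the following (a sketch; adjust the name pattern and time window) limits the grep to files under /var/log modified in the last hour:

find /var/log -type f -name '*.log' -mmin -60 -exec grep -H 'pattern' {} +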

How to find untracked files in a Perforce tree? (analogue of svn status)

Anybody have a script or alias to find untracked (really: unadded) files in a Perforce tree?
EDIT: I updated the accepted answer on this one since it looks like P4V added support for this in the January 2009 release.
EDIT: Please use p4 status now. There is no need for jumping through hoops anymore. See @ColonelPanic's answer.
In the Jan 2009 version of P4V, you can right-click on any folder in your workspace tree and click "reconcile offline work..."
This will do a little processing then bring up a split-tree view of files that are not checked out but have differences from the depot version, or not checked in at all. There may even be a few other categories it brings up.
You can right-click on files in this view and check them out, add them, or even revert them.
It's a very handy tool that's saved my ass a few times.
EDIT: ah the question asked about scripts specifically, but I'll leave this answer here just in case.
On linux, or if you have gnu-tools installed on windows:
find . -type f -print0 | xargs -0 p4 fstat >/dev/null
This will show an error message for every unaccounted file. If you want to capture that output:
find . -type f -print0 | xargs -0 p4 fstat >/dev/null 2>mylogfile
Under Unix:
find -type f ! -name '*~' -print0| xargs -0 p4 fstat 2>&1|awk '/no such file/{print $1}'
This will print out a list of files that are not added in your client or the Perforce depot. I've used ! -name '*~' to exclude files ending with ~.
Ahh, one of the Perforce classics :) Yes, it really sucks that there is STILL no easy way to do this built into the default commands.
The easiest way is to run a command to find all files under your client's root, and then attempt to add them to the depot. You'll end up with a changelist of all new files; existing files are ignored.
E.g. dir /s /b /A-D | p4 -x - add
(use 'find . -type f -print' from a *nix command line).
If you want a physical list (in the console or a file), you can pipe out the results of a diff (or of an add, if you also want them in a changelist).
If you're running this within P4Win you can use $r to substitute the client root of the current workspace.
Is there an analogue of svn status or git status?
Yes, BUT.
As of Perforce version 2012.1, there's the command p4 status and in P4V 'reconcile offline work'. However, they're both very slow. To exclude irrelevant files you'll need to write a p4ignore.txt file per https://stackoverflow.com/a/13126496/284795
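A minimal sketch of such an ignore file (the file name and patterns are illustrative) and of pointing Perforce at it via the P4IGNORE environment variable:

# p4ignore.txt -- one glob-style pattern per line, similar to .gitignore
*.o
*.tmp
build/

export P4IGNORE=p4ignore.txt    # or: p4 set P4IGNORE=p4ignore.txt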
2021-07-16: THIS ANSWER MAY BE OBSOLETE.
I am reasonably sure that it was accurate in 2016, for whatever version of Perforce I was using then (which was not necessarily the most current). But it seems that this problem or design limitation has been remedied in subsequent releases of Perforce. I do not know what the Stack Overflow etiquette for this is -- should this answer be removed?
2016 ANSWER
I feel impelled to add an answer, since the accepted answer, and some of the others, have what I think is a significant problem: they do not understand the difference between a read-only query command, and a command that makes changes.
I don't expect any credit for this answer, but I hope that it will help others avoid wasting time and making mistakes by following the accepted but IMHO incorrect answer.
---+ BRIEF
Probably the most convenient way to find all untracked files in a perforce workspace is p4 reconcile -na.
-a says "give me files that are not in the repository, i.e. that should be added".
-n says "make no changes" - i.e. a dry-run. (Although the messages may say "opened for add", mentally you must interpret that as "would be opened for add if not -n")
Probably the most convenient way to find all local changes made while offline - not just files that might need to be added, but also files that might need to be deleted, or which have been changed without being opened for editing via p4 edit, is p4 reconcile -n.
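A sketch of both forms, run from inside the workspace (Perforce's ... wildcard means "this directory and below"):

p4 reconcile -n -a ...    # preview files that would be opened for add
p4 reconcile -n ...       # preview adds, edits, and deletes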
Several answers provided scripts, often involving p4 fstat. While I have not verified all of those scripts, I often use similar scripts to make up for the deficiencies of perforce commands such as p4 reconcile -n - e.g. often I find that I want local paths rather than Perforce depot paths or workspace paths.
---+ WARNING
p4 status is NOT the counterpart to the status commands on other version control systems.
p4 status is NOT a read-only query. p4 status actually finds the same sort of changes that p4 reconcile does and opens them in your default changelist. p4 status does not seem to have a -n dry-run option like p4 reconcile does.
If you do p4 status, look at the files and think "Oh, I don't need those", then you will have to p4 revert them if you want to continue editing in the same workspace. Otherwise the changes that p4 status added to your changelist will be checked in the next time you submit.
There seems to be little or no reason to use p4 status rather than p4 reconcile -n, except for some details of local workspace vs depot pathname.
I can only imagine that whoever chose 'status' for a non-read-only command had limited command of the English language and other version control tools.
---+ P4V GUI
In the GUI p4v, the reconcile command finds local changes that may need to be added, deleted, or opened for editing. Fortunately it does not add them to a changelist by default; but you still may want to be careful to close the reconcile window after inspecting it, if you don't want to commit the changes.
Alternatively, from P4Win, use the "Local Files not in Depot" option on the left-hand view panel.
I don't use P4V much, but I think the equivalent is to select "Hide Local Workspace Files" in the filter dropdown of the Workspace view tab.
In P4V 2015.1 you'll find these options under the filter button.
I use the following in my tool that backs up any files in the workspace that differ from the repository (for Windows). It handles some odd cases that Perforce doesn't like much, like embedded blanks, stars, percents, and hashmarks:
dir /S /B /A-D | sed -e "s/%/%25/g" -e "s/@/%40/g" -e "s/#/%23/g" -e "s/\*/%2A/g" | p4 -x- have 1>NUL:
"dir /S /B /A-D" lists all files at or below this folder (/S) in "bare" format (/B) excluding directories (/A-D). The "sed" changes dangerous characters to their "%xx" form (a la HTML), and the "p4 have" command checks this list ("-x-") against the server discarding anything about files it actually locates in the repository ("1>NUL:"). The result is a bunch of lines like:
Z:\No_Backup\Workspaces\full\depot\Projects\Archerfish\Portal\Main\admin\html\images\nav\navxx_background.gif - file(s) not on client.
Et voilà!
Quick 'n Dirty: In p4v right-click on the folder in question and add all files underneath it to a new changelist. The changelist will now contain all files which are not currently part of the depot.
The following commands produce status-like output, but none is quite equivalent to svn status or git status, providing a one-line summary of the status of each file:
p4 status
p4 opened
p4 diff -ds
I don't have enough reputation points to comment, but Ross' solution also lists files that are open for add. You probably do not want to use his answer to clean your workspace.
The following uses p4 fstat (thanks Mark Harrison) instead of p4 have, and lists the files that aren't in the depot and aren't open for add.
dir /S /B /A-D | sed -e "s/%/%25/g" -e "s/@/%40/g" -e "s/#/%23/g" -e "s/\*/%2A/g" | p4 -x- fstat 2>&1 | sed -n -e "s/ - no such file[(]s[)]\.$//gp"
===Jac
Fast method, but a little unorthodox. If the codebase doesn't add new files or change the client view too often, you could create a local git repository out of your checkout. From a clean Perforce sync: git init, then add and commit all files locally. git status is fast and will show files not previously committed.
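A sketch of that setup, assuming git is installed and you're at the root of a freshly synced workspace:

git init
git add .
git commit -m "baseline: clean p4 sync"
# ... later, after working ...
git status --short    # untracked files show up with a leading ??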
The p4 fstat command lets you test whether a file exists in the workspace; combine it with find to locate the files to check, as in the following Perl example:
use strict;
use warnings;

# Path for the captured p4 fstat errors (any writable location will do).
my $outputFile = "p4_fstat_output.txt";

# Throw the error output of p4 fstat into the output file.
# find:
#   -type f  : only look at regular files
#   -print0  : terminate names with \0 to support filenames with spaces
# xargs:
#   groups its input into command lines
#   -0       : read input strings terminated with \0
# p4:
#   fstat    : fetch workspace status for each file
my $status = system "(find . -type f -print0 | xargs -0 p4 fstat > /dev/null) 2> $outputFile";

# Read the output file.
open F1, $outputFile or die "$!\n";
# Iterate over all the lines in F1.
while (<F1>) {
    # Remove the trailing newline.
    chomp $_;
    # Keep only lines containing 'no such file' or 'not in client'.
    if ($_ =~ m/no such file/ || $_ =~ m/not in client/) {
        # Remove everything from ' - ' onward, leaving just the path.
        $_ =~ s/-\s.*//g;
        # Optional: strip the leading './' that find adds.
        $_ =~ s/^.\///g;
        print "$_\n";
    }
}
close F1;
Or you can use p4 reconcile -n -m ...
If it is 'opened for delete' then it has been removed from the workspace. Note that the above command is running in preview mode (-n).
I needed something that would work in either Linux, Mac or Windows. So I wrote a Python script for it. The basic idea is to iterate through files and execute p4 fstat on each. (of course ignoring dependencies and tmp folders)
You can find it here: https://gist.github.com/givanse/8c69f55f8243733702cf7bcb0e9290a9
This command can give you a list of files that need to be added, edited or removed:
p4 status -aed ...
You can also use the flags separately:
p4 status -a ...
p4 status -e ...
p4 status -d ...
In P4V, under the "View" menu item choose "Files in Folder" which brings up a new tab in the right pane.
To the far right of the tabs there is a little icon that brings up a window called "Files in Folder" with 2 icons.
Select the left icon that looks like a funnel and you will see several options. Choose "Show items not in Depot" and all the files in the folder will show up.
Then just right-click on the file you want to add and choose "Mark for Add...". You can verify it is there in the "Pending" tab.
Just submit as normal (Ctrl+S).