How to pull partition value from HDFS path using sed/grep/awk when partition value is dynamic

I am trying to save the partition value from an HDFS path to a file, for several different tables.
I tried using sed to pull the last 8 characters, but the partition value varies in length (sometimes it is YYYYMMDD, sometimes YYYYMM), so I would like to extract the data_dt value from the HDFS path with grep instead of relying on a fixed width.
Code used:
hadoop fs -ls <hdfs_path> | sort -k6,7 | tail -2 > partition_info.txt
partitions=$(sed -e 's,.*\(.\{8\}\)$,\1,' partition_info.txt)
echo "$partitions" > partition_tables.txt
Desired output example:
20200531
202005
202004
20200601
The hadoop fs -ls output looks like this:
drwxr-xr-x - kmedgel kmedgego 0 2020-05-30 09:33 /km/gold/edge_gold/otsd_cmpl/data_dt=20200530
drwxr-xr-x - kmedgel kmedgego 0 2020-05-31 09:33 /km/gold/edge_gold/otsd_cmpl/data_dt=20200531
drwxr-xr-x - kmedgel kmedgego 0 2020-06-01 09:34 /km/gold/edge_gold/otsd_cmpl/data_dt=20200601
drwxr-xr-x - kmedgel kmedgego 0 2020-06-02 09:34 /km/gold/edge_gold/otsd_cmpl/data_dt=20200602
drwxr-xr-x - kmedgel kmedgego 0 2020-06-03 09:55 /km/gold/edge_gold/otsd_cmpl/data_dt=20200603

The while loop below splits each line on "=": everything before it is read into notNeed, and the field we are looking for, the data_dt value, is read into data_dt.
Answer
while IFS="=" read -r notNeed data_dt
do
    echo "$data_dt"
done < partition_info.txt
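If the goal is only the value after data_dt=, a pattern that anchors on the key rather than on a fixed width handles both YYYYMMDD and YYYYMM. A minimal sketch, assuming the partition directory is the last path component and its value is all digits; the function name extract_data_dt is made up for illustration:

```shell
# Hypothetical helper: reads `hadoop fs -ls` style lines on stdin and prints
# the digits that follow "data_dt=" at the end of each line.
# Lines without a trailing data_dt partition are skipped (sed -n ... p).
extract_data_dt() {
    sed -n 's/.*data_dt=\([0-9][0-9]*\)$/\1/p'
}

# Example use (needs a Hadoop client; <hdfs_path> is left as in the question):
#   hadoop fs -ls <hdfs_path> | sort -k6,7 | tail -2 | extract_data_dt > partition_tables.txt
```

Because the match is tied to data_dt= and the end of the line, header lines such as "Found N items" produce no output.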

How to reset VSCode extensions for a workspace

I was having issues with a workspace, so I tried disabling all of the extensions for it. After that, the only option offered was to enable all extensions for the workspace. I have extensions configured globally to be off or on, but I'm not sure how to get the workspace to fall back to the global extension settings now.
I know I can open a new window and manually enable or disable the extensions in my workspace to match the fresh window. The problem is that the workspace would then have its own extension settings defined, so if I toggled an extension at the global level, the workspace override would still apply.
I also tried deleting the .vscode folder in the workspace, but that doesn't seem to change the extensions for the workspace.
It's not a great method, but I guess this works with a lot of manual legwork:
Enable all extensions for the workspace.
Open a new VSCode window.
Go to Extensions and filter by @disabled.
(You can also filter by @enabled, and disable all extensions in the first step instead, if you have more disabled than enabled.)
Now you have a list you can target. Just click Disable instead of Disable for Workspace.
I believe this should get your workspace back to normal, though there has to be an easier way. You may also have to toggle the enabled extensions off and back on in case the workspace has an override for them. I didn't check whether an extension that is disabled from a new window would nevertheless stay enabled in the workspace.
I've found a way to reset all settings for a given workspace.
First, navigate to ~/.config/Code/User/workspaceStorage. Inside there, you'll find a lot of folders with seemingly random names. Each folder seems to represent one workspace.
# cd ~/.config/Code/User/workspaceStorage
# ls -la
total 156
drwxr-xr-x 39 micael micael 4096 Feb 18 15:00 .
drwxr-xr-x 6 micael micael 4096 Nov 15 13:05 ..
drwxr-xr-x 2 micael micael 4096 Jan 18 16:09 0cf549e23d37c32c70a7e30998ade1fe
drwxr-xr-x 3 micael micael 4096 Dec 29 17:42 1b637cb30f6c3acc9273df20c84be7aa
drwxr-xr-x 2 micael micael 4096 Jan 17 22:09 2010f3fb6dbb2574f12a5ba614b3b136
drwxr-xr-x 2 micael micael 4096 Feb 18 15:01 a09a9ab934662794ac48730fc950a654
...
Inside each folder, there's a workspace.json file that contains the path of the folder your workspace was created from.
We can use grep to quickly search all those folders for a matching string:
grep -r 'myfolder' ., where myfolder should be the name of the folder your workspace is using.
In this example, my workspace was created from a folder called python-playground:
# grep -r 'python-playground' .
./a09a9ab934662794ac48730fc950a654/workspace.json: "folder": "file:///home/micael/projects/python-playground"
grep: ./a09a9ab934662794ac48730fc950a654/state.vscdb: binary file matches
grep: ./a09a9ab934662794ac48730fc950a654/state.vscdb.backup: binary file matches
In my case, the folder I'm looking for is ./a09a9ab934662794ac48730fc950a654.
You'll likely get a few results, all inside the same folder. Let's first take a look inside the folder:
# ls -la a09a9ab934662794ac48730fc950a654
total 64
drwxr-xr-x 2 micael micael 4096 Feb 18 15:01 .
drwxr-xr-x 39 micael micael 4096 Feb 18 15:00 ..
-rw-r--r-- 1 micael micael 28672 Feb 18 15:01 state.vscdb
-rw-r--r-- 1 micael micael 24576 Feb 18 15:01 state.vscdb.backup
-rw-r--r-- 1 micael micael 64 Feb 18 15:00 workspace.json
# cat a09a9ab934662794ac48730fc950a654/workspace.json
{
"folder": "file:///home/micael/projects/python-playground"
}
Now, we can either remove this folder or move it elsewhere. Make sure to close VSCode before doing so.
# rm -rf a09a9ab934662794ac48730fc950a654
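The lookup and the cleanup can be wrapped in a small script. This is only a sketch: the function names are made up, it assumes the Linux storage path shown above, and it moves the folder aside rather than deleting it so the step can be undone.

```shell
# Hypothetical helper: print every workspaceStorage subfolder whose
# workspace.json mentions the given folder name.
#   $1 = storage root, $2 = folder name to look for
find_workspace_storage() {
    for json in "$1"/*/workspace.json; do
        [ -f "$json" ] || continue
        if grep -q -F -- "$2" "$json"; then
            dirname "$json"
        fi
    done
}

# Hypothetical helper: move a storage folder into a backup directory
# instead of removing it. Close VSCode before calling this.
#   $1 = storage folder, $2 = existing backup directory
park_workspace_storage() {
    mv -- "$1" "$2"/
}

# Example use:
#   root=~/.config/Code/User/workspaceStorage
#   find_workspace_storage "$root" python-playground
#   mkdir -p ~/vscode-storage-backup
#   park_workspace_storage "$root/a09a9ab934662794ac48730fc950a654" ~/vscode-storage-backup
```

Only workspace.json is searched, so the binary state.vscdb matches seen in the grep -r output do not show up.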

How to retain the file attributes uid and gid when copying a file from a remote host using the Net::SFTP::Foreign module?

I am using the Net::SFTP::Foreign module to copy files from a remote machine to the local one. After the copy operation I am able to retain the file's timestamp and permissions, but not the gid and uid.
After copy operation I have below attributes:
root#system # ls -n /dest/files
-rw-r--r-- 1 0 0 4424 Jun 10 04:45 /dest/files/file.txt
While at the source the attributes are:
root#source # ls -n /source/files
-rw-r--r-- 1 1001 1002 4424 Jun 10 04:45 /source/files/file.txt
I used the code below for the SFTP operation:
use Net::SFTP::Foreign;

my $sftp = Net::SFTP::Foreign->new(
    host => $host
);
$sftp->get( $file, $dest, copy_perm => 1 );
I have not found any option in the Net::SFTP::Foreign documentation for retaining the uid and gid.
Does anybody have any idea?

Perl script that checks rotated log files

There is a web server running multiple websites, with daily log file rotation configured.
Task: create a Perl script that takes the list of current log files in the source directory and compares it with the list of rotated log files in another directory. The script must print the name of a log file if yesterday's log for it was not rotated.
Source dir example:
ls -l /var/log/httpd/logs/*log
-rw-r--r-- 1 root root 0 May 20 00:01 /var/log/httpd/logs/access.log
-rw-r--r-- 1 root root 483652 May 20 12:54 /var/log/httpd/logs/othersite.com_80-access.log
-rw-r--r-- 1 root root 305 May 20 11:51 /var/log/httpd/logs/othersite.com_80-error.log
-rw-r--r-- 1 root root 0 May 20 00:01 /var/log/httpd/logs/error.log
-rw-r--r-- 1 root root 46222 May 20 12:45 /var/log/httpd/logs/www.site.com_8880-access.log
-rw-r--r-- 1 root root 0 May 20 00:01 /var/log/httpd/logs/www.site.com_8880-error.log
Directory with the rotated logs:
ls -l /var/log/httpd/logs/completed/|grep 2014-05-19
-rw-r--r-- 1 root root 20 May 20 00:01 access.log.2014-05-19.gz
-rw-r--r-- 1 root root 107244 May 20 00:01 othersite.com_80-access.log.2014-05-19.gz
-rw-r--r-- 1 root root 9991 May 20 00:01 www.site.com_8880-access.log.2014-05-19.gz
-rw-r--r-- 1 root root 20 May 20 00:01 www.site.com_8880-error.log.2014-05-19.gz
In this case two of yesterday's log files are absent (were not rotated):
-rw-r--r-- 1 root root 483652 May 20 12:54 /var/log/httpd/logs/othersite.com_80-access.log
-rw-r--r-- 1 root root 305 May 20 11:51 /var/log/httpd/logs/othersite.com_80-error.log
Looking forward to any suggestions!
Do a glob <> on the source directory and put each file name in a hash as the key, with the size or date of the log as the value. Then go to the destination directory and do the same thing: read the files and, as you loop, check whether each file is in your hash. You can compare the size and date as well. If you don't have it, copy it.
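The same existence check can be sketched in shell. This is not the Perl script the question asks for, only an illustration of the comparison logic; the function name unrotated_logs is made up, and the NAME.DATE.gz naming is taken from the listing in the question. It only reports missing files and does not compare sizes or dates.

```shell
# Hypothetical helper: print the name of every current log that has no
# rotated counterpart for the given day.
#   $1 = directory with current logs
#   $2 = directory with rotated logs
#   $3 = date stamp used in rotated names, e.g. 2014-05-19
unrotated_logs() {
    src_dir=$1
    done_dir=$2
    day=$3
    for f in "$src_dir"/*log; do
        [ -f "$f" ] || continue          # skip directories and unmatched globs
        name=${f##*/}                    # strip the directory part
        [ -e "$done_dir/$name.$day.gz" ] || printf '%s\n' "$name"
    done
}

# Example use:
#   unrotated_logs /var/log/httpd/logs /var/log/httpd/logs/completed 2014-05-19
```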

top CPU consumers using ps command

What is the difference between these two commands? Please help me understand.
ps -ef|sort +6|tail
oracle 55676 1 0 03:06:16 - 0:36 oracleprod (LOCAL=NO)
oracle 24876 1 0 02:52:56 - 0:40 oracleprod (LOCAL=NO)
oracle 41616 1 0 07:00:59 - 0:44 oracleprod (LOCAL=NO)
oracle 43460 1 0 02:45:05 - 0:53 oracleprod (LOCAL=NO)
oracle 25754 1 0 08:10:03 - 1:01 oracleprod (LOCAL=NO)
ps -ef|sort +5|tail
root 5440 2094 0 Nov 21 - 0:47 /usr/sbin/syslogd
root 9244 1 0 Nov 21 - 3:26 ./pcimapsvr.ip -D0
root 10782 1 0 Nov 21 - 4:41 ./pciconsvr.ip -D0
Why do the two commands show different processes? If I keep changing the value (for example to 'sort +3', or lower), the processes listed keep changing. What exactly does the command do? Please help to explain.
You are sorting on the wrong columns, using both an obsolete syntax (sort +N) and the wrong method (a textual instead of a numeric comparison). It is no surprise that seemingly random processes show up.
You'll get the top consumers this way:
ps -ef | sort -n -k8 | tail
-n means sort numerically
-k8 means sort on the eighth column (cumulative execution time)
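A fixed column number is fragile here, because STIME can be one word (03:06:16) or two (Nov 21), which shifts the TIME column by one. One way around that, sketched below, is to ask ps for just the columns needed and convert TIME to seconds before sorting. The function name top_cpu is made up; pid, time and comm are the standard -o output specifiers, but the exact TIME format can differ between systems, so treat the [[dd-]hh:]mm:ss parsing as an assumption to check on yours.

```shell
# Hypothetical helper: reads "PID TIME COMMAND" lines on stdin, prefixes each
# with its CPU time in seconds, and prints the N largest (default 10),
# smallest first.
top_cpu() {
    awk '{
        t = $2; days = 0
        # optional "dd-" prefix
        if (split(t, d, "-") == 2) { days = d[1]; t = d[2] }
        # remaining part is mm:ss or hh:mm:ss
        n = split(t, p, ":"); secs = 0
        for (i = 1; i <= n; i++) secs = secs * 60 + p[i]
        print days * 86400 + secs, $0
    }' | sort -k1,1n | tail -n "${1:-10}"
}

# Example use:
#   ps -e -o pid= -o time= -o comm= | top_cpu 5
```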

Sort files in dired by full path

I am using find-name-dired to find multiple instances of files that all have the same name (call it foo.txt) but in different directories. I want the files listed by alphabetical order of file path. However, they're listed in what looks like a random order. Neither dired-sort-menu nor dired-sort-chiesa will sort the output of find-name-dired, even though it will work on other dired buffers (whose format looks very similar). If I write the contents of the dired buffer to a file, I'm able to open a shell and submit the file to a sort command in the shell that uses the 9th field (the path) as a key. This produces output that looks right, but of course it's no longer a dired buffer.
Is there a way that I can
read in that externally sorted file and open it in dired "mode" (analogous to compilation mode),
sort the output of find-name-dired while still in dired mode, or
produce output from find-name-dired that's sorted the way I want from the beginning?
UPDATE:
Just to make things a bit more concrete, here's the current buffer:
/home/afrankel/Documents/emacs_test/:
find . \( -iname foo.txt \) -exec ls -ld \{\} \;
-rw-r--r-- 1 afrankel users 4 Nov 30 16:59 a/foo.txt
-rw-r--r-- 1 afrankel users 4 Nov 30 16:59 b/foo.txt
-rw-r--r-- 1 afrankel users 4 Nov 30 16:59 d/foo.txt
-rw-r--r-- 1 afrankel users 4 Nov 30 16:59 c/z/foo.txt
-rw-r--r-- 1 afrankel users 4 Nov 30 16:59 c/foo.txt
-rw-r--r-- 1 afrankel users 4 Nov 30 16:59 f/foo.txt
-rw-r--r-- 1 afrankel users 4 Nov 30 16:59 e/foo.txt
find finished at Fri Nov 30 17:00:41
Pressing "s" (which would sort most dired buffers) gives the error "Cannot sort this dired buffer".
I want the buffer to look like this:
/home/afrankel/Documents/emacs_test/:
find . \( -iname foo.txt \) -exec ls -ld \{\} \;
-rw-r--r-- 1 afrankel users 4 Nov 30 16:59 a/foo.txt
-rw-r--r-- 1 afrankel users 4 Nov 30 16:59 b/foo.txt
-rw-r--r-- 1 afrankel users 4 Nov 30 16:59 c/foo.txt
-rw-r--r-- 1 afrankel users 4 Nov 30 16:59 c/z/foo.txt
-rw-r--r-- 1 afrankel users 4 Nov 30 16:59 d/foo.txt
-rw-r--r-- 1 afrankel users 4 Nov 30 16:59 e/foo.txt
-rw-r--r-- 1 afrankel users 4 Nov 30 16:59 f/foo.txt
find finished at Fri Nov 30 17:00:41
When you type s in a "normal" Dired buffer, Dired doesn't actually sort the buffer. What it does is to change the value of dired-actual-switches so that it does (or doesn't) contain the -t option ("sort by modification time") and then call revert-buffer which re-runs ls with the new options. This obviously doesn't work in a Dired buffer produced by running find.
What you need to do instead is to arrange to run find with the -s option (a BSD find option, as on macOS and FreeBSD; GNU find does not have it):
-s Cause find to traverse the file hierarchies in lexicographical
order, i.e., alphabetical order within each directory.
which you can do (for all find-dired commands) by evaluating
(setq find-program "find -s")
Okay, I figured out how to do it using defadvice to automatically change the value of find-ls-option while I'm executing my new wrapper function (find-name-dired-sorted) and then to change it back to its original value.
(defadvice find-name-dired (around find-name-dired-around)
  "Advice: Sort output by path name."
  (let ((find-ls-option (list "-exec ls -ld {} \\; |sort --key=9")))
    ad-do-it))

(defun find-name-dired-sorted (dir pattern)
  "Sort the output of find-name-dired by path name."
  (interactive
   "DFind-name (directory): \nsFind-name (filename wildcard): ")
  (ad-activate 'find-name-dired)
  (find-name-dired dir pattern)
  (ad-deactivate 'find-name-dired))
Here's one way to do it manually via a temporary change to the configuration:
Run M-x customize-group find-dired.
Change the contents of the field "Find Ls Option". It should initially read "-exec ls -ld {} \;". Append text to make it read "-exec ls -ld {} \; |sort --key=9". (In other words, sort by field 9, which is the full path treated as a single string.)
Set the option for the current session only.
UPDATE: It's better to use defadvice, as I did in my other (later) answer.
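To see what the appended sort --key=9 stage does on its own, the listing from the question can be piped through it outside Emacs. A small sketch; sort_by_path is a made-up name, -k9 is the short form of --key=9, and it assumes that the owner, group and date fields contain no extra spaces, so that the path really starts at field 9:

```shell
# Hypothetical helper: sort `ls -ld` style lines on stdin by field 9 onward
# (the path). LC_ALL=C makes the ordering plain byte order, independent of
# the user's locale.
sort_by_path() {
    LC_ALL=C sort -k9
}

# Example use:
#   find . \( -iname foo.txt \) -exec ls -ld {} \; | sort_by_path
```

With the sample buffer above, c/foo.txt sorts before c/z/foo.txt because the key runs from field 9 to the end of the line and is compared as one string.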