Deleting unnecessary characters from txt document

I am facing a bit of a problem; I am not good at programming yet.
I have a text that looks like this:
D28151373 15-04 040 028230457 01-01 015 D28250305 01-08 048 D28250661 03-01 032 028151376 12-01 057 028230460 01-01 001 D28250305 01-09 049 D28250663 03-01 025 028151377 12-01 057 028230462 01-01 014
only about a million times longer.
What I need to do is delete the first character, keep the next 11 characters (including spaces), then delete the next 9 characters, keep 11 characters, delete 9 characters, and on and on and on...
There must be a simple way to make a script do this automatically, but I simply can't figure out how. (BTW, I am good at understanding code, but I am lost when I have to start from scratch myself.) Also, what is the best program to do this simple task in? I was thinking about Notepad++ or C++.

Ctrl+H
Find what: .(.{11}).{8}
Replace with: $1 <-- there is a space after $1
Replace all
Explanation:
. : 1 character
(.{11}) : group 1, 11 characters
.{8} : 8 characters
Replacement:
$1 : group 1 and a space
Result for given example:
28151373 15 28230457 01 28250305 01 28250661 03 28151376 12 28230460 01 28250305 01 28250663 03 28151377 12 28230462 01
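A note on the counts: each record in the sample is 20 characters long (1 dropped + 11 kept + 8 dropped), so the "next 9" in the question covers the last 8 characters of one record plus the first character of the next. The same fixed-width idea can be sketched outside Notepad++. The Python below is only an illustration; the function name and the 20-character record width are assumptions read off the sample line, not something stated in the question.

```python
def trim_records(text, record_len=20, skip=1, keep=11):
    """Keep `keep` characters after the first `skip` of every fixed-width record."""
    # Assumption: every record is exactly `record_len` characters wide,
    # counting the space that separates it from the next record.
    records = (text[i:i + record_len] for i in range(0, len(text), record_len))
    return " ".join(rec[skip:skip + keep] for rec in records)


sample = "D28151373 15-04 040 028230457 01-01 015 D28250305 01-08 048"
print(trim_records(sample))  # 28151373 15 28230457 01 28250305 01
```

Slicing also copes with a final record that has no trailing space, which the `.{8}` part of the regex would fail to match. Strip any trailing newline from the text before calling the function.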

You may also use a Notepad++ macro:
Macro → Start Recording
Del Ctrl+→ → → Del Del Del Del Del Del Del →
Macro → Stop Recording
Macro → Run a Macro Multiple Times...

Related

Polling a constantly updating text file to cut selected information

I have a log tail running in real time; using the 'script' command, its output is also saved in real time to a text file named 'test.txt' in /home/pi/. Now I want to set up a process that constantly polls that text file for changes and cuts out a specific recurring bit of data. For example, a section of the log would look like:
Feb 9 11:43:24 dnsmasq[887]: query[A] captive.g.aaplimg.com from 192.168.178.21
Feb 9 11:43:24 dnsmasq[887]: forwarded captive.g.aaplimg.com to 8.8.4.4
Feb 9 11:43:24 dnsmasq[887]: reply captive.g.aaplimg.com is 17.253.55.202
Feb 9 11:43:24 dnsmasq[887]: reply captive.g.aaplimg.com is 17.253.57.211
Feb 9 11:43:54 dnsmasq[887]: query[A] captive.g.aaplimg.com from 192.168.178.21
And I want to cut info only from the lines with query[A] (assuming that could be used as a marker) so that the output text looks like:
11:43 captive.g.aaplimg.com
But the problem is that different URLs appear on these lines, so for example a line with 'query[A]' could also look like:
Feb 9 11:49:56 dnsmasq[887]: query[A] www.googleapis.com from 192.168.178.21
Then I would want the output to be:
11:49 www.googleapis.com
But it needs to happen in real time as the text file/log is updating, because I want this text file to be constantly polled and sent to a printer, also in real time (a long story).
I have been looking at awk + sed to cut out the info I need, but they're new to me so I find the format a bit confusing, and I find it especially hard to figure out how to run it so it happens in real time.
Running on debian buster on pi.
Would love some help! Thanks
I assume you're looking for something like this:
tail -f my.log | perl -nle 'print"$1$2" if /(\d\d:\d\d):\d\d.*query\[A\]( \S+)/' > test.txt
The -f makes tail keep outputting new lines as the file my.log grows. It feeds the lines into the little Perl one-liner, which looks for query[A] (escaping the [ and ] chars with \ since they otherwise have special meaning in regexes) and, when found, outputs the time in hours and minutes and the domain name, captured by the regex into $1 and $2. Note that Perl buffers its output when it is redirected to a file, so lines may reach test.txt with a delay; setting $|=1 at the start of the one-liner turns on autoflush.
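For comparison, the same filter can be sketched in Python. This is an illustration only, not part of the answer above: the function names `extract_query` and `follow` and the polling interval are made up here, and `follow` is a simplified stand-in for `tail -f` that assumes the log is only ever appended to.

```python
import re
import time

# Same pattern as the Perl one-liner: HH:MM, the seconds, then "query[A] <name>".
QUERY_RE = re.compile(r"(\d\d:\d\d):\d\d.*query\[A\] (\S+)")


def extract_query(line):
    """Return 'HH:MM name' for a query[A] line, or None for any other line."""
    match = QUERY_RE.search(line)
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2)}"


def follow(path, poll_seconds=0.5):
    """Yield lines appended to `path`, polling for new data (like tail -f)."""
    with open(path, "r") as log:
        log.seek(0, 2)  # jump to the current end of the file
        while True:
            line = log.readline()
            if not line:
                time.sleep(poll_seconds)  # nothing new yet, wait and retry
                continue
            yield line
```

A loop such as `for line in follow("my.log"):` can then call `extract_query(line)` and print each non-None result with `print(result, flush=True)`, so that every line is written out immediately.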

Can't get the written file content in q?

I've copied the exact example from Q for Mortals as follows:
q)h:hopen `:D:/q4m/raw
q)h[42]
548i
q)h 10 20 30
548i
q)hclose h
q)get `:D:/q4m/raw
'D:/q4m/raw
[0] get `:D:/q4m/raw
Looking in the directory, I can see the file was created there. Why can't I get it?
Instead, if I do:
q)h:hopen `:D:/q4m/L
q)h[42]
628i
q)h[10 20 30]
628i
q)hclose h
q)get `:D:/q4m/L
0 1 2 3 4 42 10 20 30
Then everything works as expected. Why?
After testing the given code, I believe your issue may be in how you initialise the file.
I assume that in the code that works you ran some variation of
`:D:/q4m/L set til 5
before.
However this is not done for
`:D:/q4m/raw
If you were to use
`:D:/q4m/raw set til 5
or alternatively
.[`:D:/q4m/raw;();:;()]
beforehand then the first set of code will work.
Additionally, we can look at the binary content of each file using
read1 `:D:/q4m/raw
and
read1 `:D:/q4m/L
If the output does not include 07 near the beginning, the file is not recognised as a proper kdb list. That is, hopen simply appends to the binary file instead of amending it. (Notice that the 05 byte, which indicates the length of the list, does not increase when you add via the handle.)
For example, with the first method you get
q)read1 `:D:/q4m/raw
0x2a000000000000000a0000000000000014000000000000001e00000000000000
which doesn't really mean anything in q.
The second method gives
q)read1 `:D:/q4m/L
0xfe2007000000000005000000000000000000000000000000010000000000000002000000000..
which is a proper kdb list (notice the 07 which indicates type).
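The first dump can be checked by hand. As a rough illustration in Python rather than q (assuming the values are little-endian 8-byte longs, which is consistent with the dump), the file written only through the handle decodes to exactly the four appended values, with no type or length header in front of them:

```python
import struct

# Bytes shown by read1 for the file that was written only through the handle.
raw = bytes.fromhex(
    "2a00000000000000"
    "0a00000000000000"
    "1400000000000000"
    "1e00000000000000"
)

# Interpret the content as consecutive little-endian signed 64-bit integers.
values = struct.unpack(f"<{len(raw) // 8}q", raw)
print(values)  # (42, 10, 20, 30)
```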
If you instead just wish to read in /q4m/raw, then I suggest setting an empty list, opening a handle to that list with hopen, and passing it the contents of `:D:/q4m/raw as follows
q)`:empty set 0#0
`:empty
q)h:hopen `:empty
q)h read1 `:D:/q4m/raw
3i
q)get `:empty
42 10 20 30
This will only work if all entries are the same type.

How to copy last 13 characters of a string?

In Notepad++ I have a list of entries and at the end of each entry is a phone number (with dashes, 12 characters total). How do I go about either removing all the text before the number or copy/cut the number from the end of the entry for multiple entries? Thanks!
For example:
1 $1,300 Deposit $1,300 Available 12/15/16 2050 Hurricane Shoals 678-790-0986
2 7 $1,400 Deposit $1,400 Available 12/22/16 1453 Alamein Dr  404-294-6441
3 $1,500 - $1,590 Not Income Based  /  Deposit $1,500 - $1,590 678-328-7952
Here is a way:
Ctrl+H
Find what: ^.*([\d-]{12})$
Replace with: $1
Replace all
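If the list ever needs to be processed outside Notepad++, the same pattern carries over to a script. Below is a minimal sketch in Python; the function name `phone_at_end` is made up for this example, and it assumes each entry ends with a 12-character number made of digits and dashes, as in the question.

```python
import re

# Same regex as above: anything, then 12 digits/dashes at the end of the line.
PHONE_AT_END = re.compile(r"^.*([\d-]{12})$")


def phone_at_end(entry):
    """Return the trailing 12-character phone number, or None if there is none."""
    match = PHONE_AT_END.match(entry.rstrip())  # ignore trailing whitespace
    return match.group(1) if match else None


print(phone_at_end("3 $1,500 - $1,590 Not Income Based  /  Deposit $1,500 - $1,590 678-328-7952"))
# 678-328-7952
```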

finding matching pattern, performing calculations and shifting the columns of file

I need help formatting the file below, with a calculation performed on the rows that match a particular pattern:
hnt 1 454 454
gft 10 8844 8853
step 2 23 24
str 10 Check sum(00244-00240) 420 434
dert 03 14 16
ghh 33 Check sum(12366-12361) 8008 8046
I need a four-column file, produced by performing a subtraction on the rows containing the text "Check sum".
I wish to remove the text "Check sum" and then subtract the numbers given in ( ). E.g. (00244-00240) is subtracted to give the value '4', and this '4' is added to the column on the left, which has the value '10',
so this value becomes '14'. After this calculation, the other values on that row shift left column-wise, making a four-column table instead of a six-column one.
The desired output is
hnt 1 454 454
gft 10 8844 8853
step 2 23 24
str 14 420 434
dert 03 14 16
ghh 38 8008 8046
I am new to shell scripting and would appreciate your help getting the desired output above using awk or sed or both. I am also OK if this can be achieved without awk and sed, using other commands in a Unix shell script.
Perl can do the parsing and the addition:
perl -pe 's/(\d+)\s+Check sum\((\d+)([+-]\d+)\)/$1+$2+$3/e' file
hnt 1 454 454
gft 10 8844 8853
step 2 23 24
str 14 420 434
dert 03 14 16
ghh 38 8008 8046
If you want the output to look prettier, pipe the result through column -t
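The substitution computes column 2 plus the difference inside the parentheses, e.g. 10 + 244 - 240 = 14. The same steps can be sketched in Python for readers who find the Perl dense. This is only an illustration: the function name `fold_check_sum` is invented here, and unlike the Perl regex it handles only the minus form shown in the sample data.

```python
import re

# A number, whitespace, then "Check sum(a-b)", as in the sample rows.
CHECK_SUM = re.compile(r"(\d+)\s+Check sum\((\d+)-(\d+)\)")


def fold_check_sum(line):
    """Replace 'N Check sum(a-b)' with the single value N + (a - b)."""
    def add_difference(match):
        base, left, right = (int(group) for group in match.groups())
        return str(base + left - right)

    return CHECK_SUM.sub(add_difference, line)


print(fold_check_sum("str 10 Check sum(00244-00240) 420 434"))  # str 14 420 434
```

Rows without "Check sum" pass through unchanged, so the function can be applied to every line of the file.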
You can try this one liner:
sed "s/Check//;s/sum//;s/(//;s/)//" filename|awk '/-/{sub(/-/," ");calc=$3-$4;$2=$2+calc;$3="";$4=""}1'
Note:
This answer does not preserve the original whitespace on the rows where the calculation is performed.

Using sed to copy data between two numerical patterns to a new file

I'm running a bunch (~320) computational chemistry experiments and I need to pull a small amount of the data out of each of the files so that I can do some work on it in MatLab.
I'm pretty sure I can use sed to make this work, but try as I might I don't seem to be able to do so.
I need all of the data starting at the line beginning with "1 1" and ending with the line starting with "33 33".
I J FI(I,J) k(I,J) K(I,J)
1 1 -337.13279 -0.06697 -0.00430
2 2 3804.89120 8.52972 0.54787
3 3 3195.69653 6.01702 0.38648
4 4 3189.18684 5.99253 0.38490
5 5 3183.73262 5.97205 0.38359
6 6 3174.47525 5.93737 0.38136
7 7 3167.88746 5.91275 0.37978
8 8 1628.80868 1.56311 0.10040
9 9 1623.56055 1.55306 0.09975
10 10 1518.21620 1.35806 0.08723
11 11 1476.93012 1.28520 0.08255
12 12 1341.24087 1.05990 0.06808
13 13 1312.30373 1.01466 0.06517
14 14 1264.73004 0.94242 0.06053
15 15 1185.62592 0.82822 0.05320
16 16 1175.54013 0.81419 0.05230
17 17 1170.41211 0.80710 0.05184
18 18 1090.20196 0.70027 0.04498
19 19 1039.29190 0.63639 0.04088
20 20 1015.00116 0.60699 0.03899
21 21 1005.05773 0.59516 0.03823
22 22 986.55965 0.57345 0.03683
23 23 917.65537 0.49615 0.03187
24 24 842.93089 0.41863 0.02689
25 25 819.00146 0.39520 0.02538
26 26 758.39720 0.33888 0.02177
27 27 697.11173 0.28632 0.01839
28 28 628.75684 0.23292 0.01496
29 29 534.75856 0.16849 0.01082
30 30 499.35579 0.14692 0.00944
31 31 422.01320 0.10493 0.00674
32 32 409.30255 0.09870 0.00634
33 33 227.12411 0.03039 0.00195
33 2nd derivatives larger than 0.371D-04 over 561
MatLab is not a fan of text, so I'd like to not use text delimiters (though there are some in the header of this data section) and keep the data contained to only the numeric lines.
The data files contain a lot of other numbers as well, so I need to match the occurrence of "1 1" at the start of the line and "33 33" as the end of the copy. These 'indices' exist only in this block of info.
I attempted to use
% sed -n /"1 1"/,/"33 33"/p input.file > output.file
But I get a WHOLE BUNCH of data in the output file as it copies everything that shows up between any "1" and "33"
Is there any way to do what I'm looking for?
Also, I'm using the tcsh as that is what my servers run.
How about using awk
awk '$1=="1"&&$2=="1"{t=1};t;$1=="33"&&$2=="33"{t=0}' file
As recommended by @mklement0, if there is only one block, you can avoid processing the remainder of the file by changing the command to:
awk '$1=="1"&&$2=="1"{t=1};t;$1=="33"&&$2=="33"{exit}' file
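The awk logic is a small state machine: start printing when the first two fields are "1 1", stop after the line whose first two fields are "33 33". Comparing whole fields, rather than matching a substring, is what keeps lines such as "11 11" from triggering it. Here is a sketch of the same idea in Python; the function name `extract_block` and its default arguments are assumptions made for this example.

```python
def extract_block(lines, start=("1", "1"), end=("33", "33")):
    """Return the lines from the `start` row through the `end` row, inclusive."""
    block = []
    inside = False
    for line in lines:
        key = tuple(line.split()[:2])  # first two whitespace-separated fields
        if key == start:
            inside = True
        if inside:
            block.append(line)
            if key == end:
                break  # like awk's exit: skip the rest of the input
    return block


sample = [
    "I J FI(I,J) k(I,J) K(I,J)",
    "1 1 -337.13279 -0.06697 -0.00430",
    "2 2 3804.89120 8.52972 0.54787",
    "33 33 227.12411 0.03039 0.00195",
    "33 2nd derivatives larger than 0.371D-04 over 561",
]
print(len(extract_block(sample)))  # 3
```

Passing an open file object as `lines` works as well, since the function only iterates over it.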
Your problem is twofold. First, there are two blanks between the ones, but your regex only allows for one (judging from the now indented code). Second, you are probably not precise enough; the /1 1/ pattern matches 11 11, for example, and 111 111 and so on.
So, you should consider:
sed -n -e '/^ *1  *1 /,/^33  *33 /p' -e '/^33  *33 /q' input.file > output.file
The patterns are anchored to the start of line by the ^ (caret). The numbers are separated by one or more blanks (there are other, longer-winded ways of writing that in standard sed; the + option is not standard sed but is widely available). And the numbers are terminated by a blank. The chances are that the first expression alone will give you what you want. The second expression terminates the search early when it recognizes the 33 33 input line, which can save a significant amount of file I/O and hence processing time if the input file is big enough.
If the lines with ID numbers in the hundreds have some different format, then it should be fairly straight-forward to tweak the regexes to match what is used. If the data contains tabs instead of (or as well as) blanks, you can tweak the regexes to manage that, too.
If your data is all formatted exactly the same as this file, then you can use sed to just read the 3rd through the 35th line (rows 1 1 - 33 33). This is a lot easier than parsing the values, but does require that the files have a standard format:
sed -n 3,35p data.txt
Another cheap way would be to grep for only numeric lines, and take only the first 33:
grep "^[0-9 ][0-9 .-]*$" data.txt | head -n 33