How can I extract only the mobile number from this expression in Notepad++?

103.199.145.50 - - [02/Jan/2020:04:42:34 +0530]
"GET /mspProducerM/sendSMS?user=userjesus&pwd=jcallshttp&sender=JCALLS&mobile=9159525910&msg=Promise%20for%20Jan,%2002%20The%20Lord%20blesses%20His%20people%20with%20peace.-%20Psalm%2029:11.%20For%20Prayers%20and%20Queries%20call%20044-45999000&mt=0
HTTP/1.1" 200 42
I need to extract just the number, 9159525910.

You may try the following find and replace in regex mode, with dot-all (the ". matches newline" option) enabled:
Find: \b\d+\.\d+\.\d+\.\d+ - - \[[^]]+\].*?".*?\bmobile=(\d+).*?" \d+ \d+
Replace: $1
The approach here is to match the entire log entry while capturing the mobile number in a group. The replacement is just that captured group, so the whole entry is replaced by the phone number.
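The same match-and-capture idea can be tried outside Notepad++. The following is a minimal Python sketch, assuming the query string contains `mobile=` followed by digits; the sample entry is shortened, and the names `LOG_ENTRY`, `PATTERN` and `extract_mobile` are hypothetical. `re.DOTALL` plays the role of the ". matches newline" option.

```python
import re

# Shortened sample entry spread over three lines, as in the question.
LOG_ENTRY = (
    '103.199.145.50 - - [02/Jan/2020:04:42:34 +0530]\n'
    '"GET /mspProducerM/sendSMS?user=userjesus&pwd=jcallshttp&sender=JCALLS'
    '&mobile=9159525910&msg=Promise%20for%20Jan&mt=0\n'
    'HTTP/1.1" 200 42'
)

# Same pattern as in the answer; DOTALL lets "." cross line breaks.
PATTERN = re.compile(
    r'\b\d+\.\d+\.\d+\.\d+ - - \[[^]]+\].*?".*?\bmobile=(\d+).*?" \d+ \d+',
    re.DOTALL,
)


def extract_mobile(entry):
    """Replace each matched log entry with its captured mobile number."""
    return PATTERN.sub(r'\1', entry)


print(extract_mobile(LOG_ENTRY))
```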

Related

Regex in Tableau pattern

I have the following string.
p3266 -- MR2015 - Conversion Task 5 - TRF for PLM 2016-08-13.fixTrfWorkOrder -- TRF Header Work Order to WBS conversion 2016-08-25. LCI# PYF
I'm trying to extract just the last part (LCI# PYF) with a regex.
This is what I have so far:
REGEXP_EXTRACT([\n\r].[*LCI\s*]([^\n\r]*)
However, this string is not always the same. Sometimes there is a space between the # and the XXX, sometimes not. I mainly need just the three letters (PYF) as the returned value.
You may try:
REGEXP_EXTRACT('p3266 -- MR2015 - Conversion Task 5 - TRF for PLM 2016-08-13.fixTrfWorkOrder -- TRF Header Work Order to WBS conversion 2016-08-25. LCI# PYF',
'LCI#\s*(\S+)')
This matches LCI#, skips any optional whitespace after it, and captures the run of non-whitespace characters that follows.
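To check the pattern against both variants (with and without the space), here is a small Python sketch. It uses Python's `re` module rather than Tableau, and the helper name `extract_lci` is hypothetical.

```python
import re

# \s* allows zero or more whitespace characters after "LCI#";
# (\S+) captures the non-whitespace code that follows.
LCI_PATTERN = re.compile(r'LCI#\s*(\S+)')


def extract_lci(text):
    """Return the code after 'LCI#', or None when there is no match."""
    match = LCI_PATTERN.search(text)
    return match.group(1) if match else None


print(extract_lci('WBS conversion 2016-08-25. LCI# PYF'))
print(extract_lci('WBS conversion 2016-08-25. LCI#PYF'))
```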

Regex for Apache

I would like to create a rule for Apache to block mass login attempts based on this type of log entry:
93.176.51.15 - - [21/Nov/2019:00:02:40 +0100] "GET /wordpress/wp-login.php HTTP/1.1" 200 5485
What exact regex do I need? I currently use this:
^.+?:\d+ <HOST> -.*"(GET|POST|HEAD) .*/wp-login.php.*$
Thanks in advance
A more "precise" regex would look like:
^<HOST> \S+ \S+ [^"]*"[A-Z]{3,15}\s+\S*/wp-login\.php\b
It is anchored (^.+ is not an anchor), has no catch-alls (such as .*, and especially non-greedy ones such as .+?), and covers all HTTP methods as well as a user name supplied by the intruder (in case no auth is expected and the web server logs it instead of -).
And if you have fail2ban >= 0.10, use <ADDR> instead of <HOST> (faster, safer and more precise if only IPs are logged).
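To try the pattern outside fail2ban, the sketch below swaps the `<HOST>` tag for a simplified IPv4 capture group. That substitution is an assumption made only for this illustration (fail2ban expands `<HOST>` itself, and more broadly), and the names `FAILREGEX` and `offending_host` are hypothetical.

```python
import re

# <HOST> replaced by a simplified IPv4 group; the rest is the answer's regex.
FAILREGEX = re.compile(
    r'^(?P<host>\d{1,3}(?:\.\d{1,3}){3}) \S+ \S+ [^"]*"[A-Z]{3,15}\s+\S*/wp-login\.php\b'
)


def offending_host(line):
    """Return the client IP if the line is a wp-login.php request, else None."""
    match = FAILREGEX.match(line)
    return match.group('host') if match else None


hit = '93.176.51.15 - - [21/Nov/2019:00:02:40 +0100] "GET /wordpress/wp-login.php HTTP/1.1" 200 5485'
miss = '93.176.51.15 - - [21/Nov/2019:00:02:41 +0100] "GET /index.php HTTP/1.1" 200 1200'
print(offending_host(hit), offending_host(miss))
```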

Postgres query to get particular text from given text

I have text like this in different rows in a column
xxxxxxxxxxx ab_88_2018 xxxxxx
ab_88_2018 xxxxxx
AB_88_2018 xxxxxx
ab_2018_88 XXXXXX
So I want to extract only the 88 from the text into another column.
What would the query be?
It's not always 88, but it is always two digits in that position.
Is the 88 always a 2-digit number? If so, this works for me in Postgres and Redshift, and I believe it gets you what you want:
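SELECT
    CASE
        -- e.g. ab_88_2018: the digits are the 2nd '_'-separated part
        WHEN LOWER(your_column) ~ '.*[a-z]{2}\_[0-9]{2}\_[0-9]{4}.*'
            THEN SPLIT_PART(your_column, '_', 2)
        -- e.g. ab_2018_88: the digits start the 3rd '_'-separated part
        WHEN LOWER(your_column) ~ '[a-z]{2}\_[0-9]{4}\_[0-9]{2}.*'
            THEN LEFT(SPLIT_PART(your_column, '_', 3), 2)
    END AS get_two_digit_number
FROM your_table;  -- your_column and your_table are placeholder names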
The ~ (tilde) operator is similar to LIKE but matches against a regular expression. Paste your examples and the pattern between the quotes into regexr.com to see what it matches.
SPLIT_PART takes the string that matched the pattern and splits it on a delimiter of my choosing, here '_'. The last argument is the number of the part to return.
Using 'xxxxxxxxxxx ab_88_2018 xxxxxx' as an example, SPLIT_PART('xxxxxxxxxxx ab_88_2018 xxxxxx','_',2) will return '88', as 88 is the second part when splitting on '_'. If you entered 1, it would return everything before the first '_'.
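The CASE logic can be checked against the four sample rows with a short Python sketch. This is only an illustration of the same two-pattern approach using Python's `re` and `str.split`, not the SQL itself, and the function name `two_digit_part` is hypothetical.

```python
import re


def two_digit_part(value):
    """Mirror the CASE expression: pick the 2-digit part by layout."""
    lowered = value.lower()
    # Layout ab_88_2018: the digits are the 2nd '_'-separated part.
    if re.search(r'[a-z]{2}_[0-9]{2}_[0-9]{4}', lowered):
        return value.split('_')[1]
    # Layout ab_2018_88: the digits start the 3rd '_'-separated part.
    if re.search(r'[a-z]{2}_[0-9]{4}_[0-9]{2}', lowered):
        return value.split('_')[2][:2]
    return None


rows = [
    'xxxxxxxxxxx ab_88_2018 xxxxxx',
    'ab_88_2018 xxxxxx',
    'AB_88_2018 xxxxxx',
    'ab_2018_88 XXXXXX',
]
print([two_digit_part(row) for row in rows])
```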

Using AT commands: encoding of service responses and reading Chinese or Arabic on Nokia phones

I am developing an application for GSM modems using AT commands. I have a problem reading Unicode messages or USSD responses, for example ones where dcs=17 rather than 7, 15 or 72.
I have been looking for a solution for two years, to no avail. I found a partial solution by using a Chinese phone, which can read the Chinese encoding. But Nokia phones do not support the Arabic or Chinese encoding, and the service responses appear as incomprehensible symbols.
Example:
+CUSD: 0,"ar??c
?J <10???#d#??? #0#??#D? ?Z?xb
# $#?#?#Z##?? #-#H?#???#b##$? #3#h?P???#??(??",17
But when I use the Chinese phone, the service response is displayed 100% correctly.
How do I handle this encoding with Nokia or other phones?
The character set used for strings in AT commands is controlled by AT+CSCS. The default value is "GSM", which cannot represent anything outside a relatively limited set of characters.
In your case, to read Arabic or Chinese, "UTF-8" is probably the best choice, although "UCS2" can also be used (it will require a little post-processing, though).
Below you can see how the selected character set affects strings. I have kept the phone number of my Chinese teacher from when I lived in Taiwan, stored as "teacher" in Chinese (lǎo shī). The actual phone number is stripped out here, but otherwise the following is a verbatim copy of the responses from my phone:
$ echo at+cscs? | atinout - /dev/ttyACM0 -
+CSCS: "GSM"
OK
$ echo at+cpbr=403 | atinout - /dev/ttyACM0 -
+CPBR: 403,"",145,"??/M"
OK
$ echo at+cscs=? | atinout - /dev/ttyACM0 -
+CSCS: ("GSM","IRA","8859-1","UTF-8","UCS2")
OK
$ echo 'at+cscs="UTF-8"' | atinout - /dev/ttyACM0 -
OK
$ echo at+cscs? | atinout - /dev/ttyACM0 -
+CSCS: "UTF-8"
OK
$ echo at+cpbr=403 | atinout - /dev/ttyACM0 -
+CPBR: 403,"",145,"老師/M"
OK
$ echo 'at+cscs="UCS2"; +cpbr=403' | atinout - /dev/ttyACM0 -
+CPBR: 403,"",145,"80015E2B002F004D"
OK
$ echo 'at+cscs=?' | atinout - /dev/ttyACM0 -
+CSCS: ("00470053004D","004900520041","0038003800350039002D0031","005500540046002D0038","0055004300530032")
OK
$ echo 'at+cscs="005500540046002D0038"' | atinout - /dev/ttyACM0 -
OK
$ echo 'at+cscs=?' | atinout - /dev/ttyACM0 -
+CSCS: ("GSM","IRA","8859-1","UTF-8","UCS2")
OK
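The post-processing needed for "UCS2" is small: each character arrives as four hexadecimal digits. A minimal Python sketch, assuming the strings contain only Basic Multilingual Plane characters (where UCS-2 and big-endian UTF-16 coincide); the helper name `decode_ucs2_hex` is hypothetical.

```python
def decode_ucs2_hex(hex_string):
    """Decode a +CSCS="UCS2" style hex string into text."""
    # Two hex digits per byte, two bytes per character, big-endian.
    return bytes.fromhex(hex_string).decode('utf-16-be')


# The phone book entry and one of the character set names from the transcript.
print(decode_ucs2_hex('80015E2B002F004D'))
print(decode_ucs2_hex('005500540046002D0038'))
```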
Update: upon checking 27.007, the string in the +CUSD: <m>[,<str>,<dcs>] unsolicited result code is not a regular string, but has its own encoding:
<str>: string type USSD-string (when <str> parameter is not given,
    network is not interrogated):
    - if <dcs> indicates that 3GPP TS 23.038 [25] 7 bit default alphabet is used:
        - if TE character set other than "HEX" (refer command Select TE Character
          Set +CSCS): MT/TA converts GSM alphabet into current TE character set
          according to rules of 3GPP TS 27.005 [24] Annex A
        - if TE character set is "HEX": MT/TA converts each 7-bit character of GSM
          alphabet into two IRA character long hexadecimal number (e.g. character
          Π (GSM 23) is presented as 17 (IRA 49 and 55))
    - if <dcs> indicates that 8-bit data coding scheme is used: MT/TA converts each
      8-bit octet into two IRA character long hexadecimal number (e.g. octet with
      integer value 42 is presented to TE as two characters 2A (IRA 50 and 65))
<dcs>: 3GPP TS 23.038 [25] Cell Broadcast Data Coding Scheme in integer format
    (default 0)
You therefore have to first determine if dcs is 7 or 8 bit, and then decode according to the above.
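The hexadecimal presentation quoted above for the 8-bit case can be reproduced in a few lines. This Python sketch covers only the octet-to-hex conversion and its reverse; it does not interpret the dcs value, and the function names are hypothetical.

```python
def octets_to_hex(data):
    """Present each octet as two hexadecimal characters, as the MT/TA does."""
    return data.hex().upper()


def hex_to_octets(hex_string):
    """Reverse step on the TE side: recover the raw octets."""
    return bytes.fromhex(hex_string)


presented = octets_to_hex(bytes([42]))
# The quoted example: 42 is shown as "2A", i.e. IRA characters 50 and 65.
print(presented, [ord(ch) for ch in presented])
print(hex_to_octets(presented))
```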
PS: the "UCS2 0x81" format is described here, although it should not behave differently from plain UCS2 in this particular case.

Google Calculator Thousands Separator Special Character

NOTE: For more answers related to this, please see
Special Characters in Google Calculator
I noticed when grabbing the return value for a Google Calculator calculation, the thousands place is separated by a rather odd character. It is not simply a space.
Let's take the example of converting $4,000 USD to GBP.
If you visit the following Google link:
http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp
You'll note that the response is:
{lhs: "4000 U.S. dollars",rhs: "2 497.81441 British pounds",error: "",icc: true}
This looks reasonable, and the thousands place appears to be separated by a whitespace character.
However, if you enter the following into your command line:
curl -s "http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp"
You'll note that the response is:
{lhs: "4000 U.S. dollars",rhs: "2?498.28243 British pounds",error: "",icc: true}
That question mark (?) is a replacement character. What is going on?
AppleScript returns a different replacement character:
{lhs: "4000 U.S. dollars",rhs: "2†498.28243 British pounds",error: "",icc: true}
I am also getting from other sources:
{lhs: "4000 U.S. dollars",rhs: "2�498.28243 British pounds",error: "",icc: true}
It turns out that � is the proper Unicode replacement character 65533.
Can anyone give me insight into what Google is passing me?
It's a non-breaking space, U+00A0. It's to ensure that the number won't get broken at the end of a line.
Google returns the correct encoding (UTF-8) however:
Content-Type: text/html; charset=UTF-8
so ...
- If it comes out as a normal space (U+0020) instead (Firefox does that when copying, stupidly enough), then the application performs conversion of certain characters to lookalikes, maybe to fit in some sort of restricted code page (ASCII perhaps).
- If there is a question mark, then it was correctly read as Unicode but some part in processing uses a legacy character set that doesn't contain that character so it gets converted.
- If there is a replacement character � (U+FFFD) then it was likely read as UTF-8, converted into a legacy character set that contains the character (e.g. Latin 1) and then re-interpreted as UTF-8.
- If there is a totally different character, such as your dagger (†), then I'd guess the response is read correctly as Unicode, gets converted to a character set that contains the character and re-interpreted in another character set. A quick look at the Mac Roman codepage reveals that A0 indeed maps to †.
Needless to say, some parts of whatever you use to process that response seem to be horribly broken with regard to Unicode. That is something I'd hope wouldn't happen that often in this millennium, but apparently it still does.
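Three of these cases can be reproduced with Python's built-in codecs. This is a sketch of the conversions described above, not of what any particular tool does internally; the variable names are hypothetical.

```python
NBSP = '\u00a0'  # the non-breaking space in the response

# Forced into a character set without NBSP (ASCII): it becomes "?".
as_ascii = NBSP.encode('ascii', errors='replace').decode('ascii')

# Written as Latin-1 (a single byte 0xA0) and then re-read as UTF-8:
# the lone byte is invalid UTF-8, so it becomes U+FFFD.
latin1_bytes = NBSP.encode('latin-1')
as_misread_utf8 = latin1_bytes.decode('utf-8', errors='replace')

# The same byte re-read as Mac Roman: 0xA0 is the dagger there.
as_mac_roman = latin1_bytes.decode('mac_roman')

print(repr(as_ascii), repr(as_misread_utf8), repr(as_mac_roman))
```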
I figured out what it was by fiddling around in PowerShell a bit:
PS Home:\> $wc = new-object net.webclient
PS Home:\> $x = $wc.downloadstring('http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp')
PS Home:\> [char[]]$x|%{"$_ - " + +$_}
...
" - 34
2 - 50
  - 160
4 - 52
9 - 57
8 - 56
. - 46
2 - 50
8 - 56
2 - 50
4 - 52
...
Also a quick look at the response headers revealed that the encoding is set correctly.
According to my tests with curl in the Terminal on OS X, where I changed the character encoding in the Terminal preferences, the response is encoded as ISO Latin 1.
When I set the encoding to UTF-8, I get "2?498.28243".
When I set the encoding to MacRoman, I get "2†498.28243".
First solution: use the user agent of any browser (Safari on OS X 10.6.8 in this example):
curl -s -A 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.48 (KHTML, like Gecko) Version/5.1 Safari/534.48' 'http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp'
Second solution: use iconv:
curl -s 'http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp' | iconv -t utf8 -f iso-8859-1
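The iconv step can be mirrored in Python. The sketch below works on a hard-coded byte string modelled on the response shown in the question, assumed to be ISO-8859-1 encoded as this answer reports, instead of fetching the URL; `latin1_to_utf8` and `parse_rhs_amount` are hypothetical helpers.

```python
import re

# Sample response bytes with the separator as the single byte 0xA0.
raw = b'{lhs: "4000 U.S. dollars",rhs: "2\xa0498.28243 British pounds",error: "",icc: true}'


def latin1_to_utf8(data):
    """Equivalent of: iconv -f iso-8859-1 -t utf8."""
    return data.decode('iso-8859-1').encode('utf-8')


def parse_rhs_amount(text):
    """Pull the number out of rhs, dropping the non-breaking space."""
    match = re.search('rhs: "([0-9\u00a0.]+)', text)
    return float(match.group(1).replace('\u00a0', ''))


utf8_bytes = latin1_to_utf8(raw)
print(parse_rhs_amount(utf8_bytes.decode('utf-8')))
```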
Try
set myUrl to quoted form of "http://www.google.com/ig/calculator?hl=en&q=4000%20usd%20to%20gbp"
set xxx to do shell script "curl " & myUrl & " | sed 's/[†]/,/'"