Splunk: match country code against a CSV lookup

I'm using Splunk but am having trouble trying to match the first 2 or 3 digits of the number in this sample:
messageId=9492947, to=61410428007
My csv looks like this:
to, Country
93, Afghanistan
355, Albania
213, Algeria
61, Australia
I'm trying to match the fields against a CSV and have it tell me which country they matched.
I think I need a regex or something, but I already have the interesting field extracted in Splunk, which is "to".

This is one of those messy ones, but it does not require regular expressions. Let's say you have a CSV file with a header - something like:
code,country
93,Afghanistan
355,Albania
213,Algeria
61,Australia
44,United Kingdom
1,United States
You need the header for this. I created an example file, but the source can come from anywhere just as long as you have the to field extracted properly.
source="/opt/testfiles/test-country-code.log"
| eval lOne=substr(to,1,1)
| eval lTwo=substr(to,1,2)
| eval lThree=substr(to,1,3)
| lookup countries.csv code as lOne OUTPUT country as cOne
| lookup countries.csv code as lTwo OUTPUT country as cTwo
| lookup countries.csv code as lThree OUTPUT country as cThree
| eval country=coalesce(cOne,cTwo,cThree)
| table to,country
The substr calls extract the first one, two, or three characters of the string. The lookups convert each of those prefixes into a country name using the lookup table, and coalesce takes the first of them that has a value.
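The same prefix-matching logic can be sketched outside Splunk; here is a minimal Python illustration (the dictionary holds only the sample codes above, not a complete list):

```python
# Country-code lookup table, mirroring the sample CSV rows
codes = {"93": "Afghanistan", "355": "Albania", "213": "Algeria", "61": "Australia"}

def match_country(to):
    """Try 1-, 2-, then 3-digit prefixes and return the first hit,
    mirroring coalesce(cOne, cTwo, cThree) in the search above."""
    for n in (1, 2, 3):
        country = codes.get(to[:n])
        if country:
            return country
    return None

print(match_country("61410428007"))  # Australia
```

Note that this (like the Splunk search) prefers the shortest matching prefix; with a full ITU code list you may instead want the longest match, by iterating `(3, 2, 1)`.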

Related

How to remove multiple characters between 2 special characters in a column in SSIS expression

I want to remove the characters from '#' through ';' in a derived column expression in SSIS.
For example, my input column values and the desired outputs were posted as images.
Note: the length after '#' is not fixed.
I've already tried this in SQL, but I want to do it via an SSIS derived column expression.
First of all: please do not post pictures. We prefer copy-and-pastable sample data, ideally a minimal, complete and reproducible example, best served as DDL, INSERT and code, as I do here for you.
And just to mention this: if you control the input, you should not mix information within one string. If this is unavoidable, use a "real" text container like XML or JSON.
SQL Server is not meant for string manipulation. There is no regex or repeated/nested pattern matching, so we would normally need a recursive / procedural / looping approach. But, if performance is not critical, you can use an XML hack.
--DDL and INSERT
DECLARE #tbl TABLE(ID INT IDENTITY,YourString VARCHAR(1000));
INSERT INTO #tbl VALUES('Here is one without')
,('One#some comment;in here')
,('Two comments#some comment;in here#here is the second;and some more text')
--The query
SELECT t.ID
,t.YourString
,CAST(REPLACE(REPLACE((SELECT t.YourString AS [*] FOR XML PATH('')),'#','<!--'),';','--> ') AS XML) SeeTheIntermediateXML
,CAST(REPLACE(REPLACE((SELECT t.YourString AS [*] FOR XML PATH('')),'#','<!--'),';','--> ') AS XML).value('.','nvarchar(max)') CleanedValue
FROM #tbl t
The result
+----+-------------------------------------------------------------------------+-----------------------------------------+
| ID | YourString | CleanedValue |
+----+-------------------------------------------------------------------------+-----------------------------------------+
| 1 | Here is one without | Here is one without |
+----+-------------------------------------------------------------------------+-----------------------------------------+
| 2 | One#some comment;in here | One in here |
+----+-------------------------------------------------------------------------+-----------------------------------------+
| 3 | Two comments#some comment;in here#here is the second;and some more text | Two comments in here and some more text |
+----+-------------------------------------------------------------------------+-----------------------------------------+
The idea in short:
Using some string methods we can wrap your unwanted text in XML comments.
Look at this
Two comments<!--some comment--> in here<!--here is the second--> and some more text
Reading this XML with .value(), the content is returned without the comments.
Hint 1: Use '-->;' in your replacement to keep the semi-colon as delimiter.
Hint 2: If there might be a semi-colon ; somewhere else in your string, you would see the --> in the result. In this case you'd need a third REPLACE() against the resulting string.
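For comparison, the removal of each '#'-to-';' span is a one-line regex in a language that has regex support; a quick Python sketch of the transformation the XML hack emulates:

```python
import re

samples = [
    "Here is one without",
    "One#some comment;in here",
    "Two comments#some comment;in here#here is the second;and some more text",
]

# Replace each '#'-to-';' span with a single space,
# just like wrapping it in an XML comment and reading the value
for s in samples:
    print(re.sub(r"#[^;]*;", " ", s))
```

This prints the same CleanedValue column as the result table above.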

Is there a way to see raw string values using SQL / presto SQL / athena?

Edit after asked to better specify my need:
TL;DR: How can I show escaped whitespace characters (such as \r) in the Athena console when performing a query? So this: "abcdef\r" instead of this: "abcdef ".
I have a dataset with a column that contains some strings of variable length, all of them with a trailing whitespace.
Now, since I had analyzed this data before using Python, I know that this whitespace is a \r; however, if I SELECT my_column in Athena, it obviously doesn't show the escaped character.
Essentially, what I'm trying to achieve:
my_column | ..
----------+--------
abcdef\r | ..
ghijkl\r | ..
What I'm getting instead:
my_column | ..
----------+--------
abcdef | ..
ghijkl | ..
If you're asking why I would want that: it's just to avoid having to parse this data through Python if I ever run into this situation again, so that I can immediately tell whether there are any weird escaped characters in my strings.
Any help is much appreciated.
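One way to surface hidden characters in Presto/Athena is to hex-encode the column's bytes, e.g. `SELECT to_hex(to_utf8(my_column))` (assuming those Presto built-ins are available in your Athena engine version); a trailing `0D` reveals the carriage return. The same byte-level check sketched in Python:

```python
def reveal(s):
    """Hex-encode a string's UTF-8 bytes, like Presto's to_hex(to_utf8(col))."""
    return s.encode("utf-8").hex().upper()

print(reveal("abcdef\r"))  # 6162636465660D -- the trailing 0D is the \r
```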

Parsing a text file or an html file to create a table

I have a simple issue with a .msg file from Outlook. Code someone helped me with stopped working, because the HTML body of the .msg varies between emails even though they come from the same source. So my next option was to save the email as a .txt and an .html file. Since I have no knowledge of HTML, I have no idea how to grab the table from the HTML structure, but in the text file I found something easier. For example, this is data from one table:
Summary
Date
Good mail
Rule matches
Spam
Malware
2019-10-22
4927
4519
2078
0
2019-10-23
4783
4113
1934
0
This is in the text file. "Summary" is the keyword; after it, the next 5 lines are the column headers of the table, and each following group of 5 lines is a row. This goes up to 7 rows in total, so headers and then 7 rows.
Now I want to create a table from this text using the first 5 lines after "Summary" as my columns. Since each .msg is different, these 5 columns can change order randomly in each file, so I want to handle that. My best attempt was to use ConvertFrom-String to create a table, but I have little idea how to format it under the conditions above.
The email contains more data after the table, so I also need to stop parsing there and grab only this part, which should be easy.
How can I use ConvertFrom-String to create the table using those 5 columns, set the delimiter to a newline, and use the first 5 lines as the column headers?
I think trying to make this work with ConvertFrom-StringData is adding more work than necessary. But here is an alternative that works with your sample set.
$text = Get-Content -Path File.txt

$formattedText = if ($text[0] -match '^Summary') {
    for ($i = 1; $i -lt $text.Count; $i += 5) {
        $text[$i..($i+4)] -join ','
    }
}

$formattedText | ConvertFrom-Csv | ConvertTo-Html
Explanation:
If we assume your text data is in File.txt, Get-Content is used to read the data as an array ($text). If the first line begins with Summary, the file will be parsed.
The for loop is used to skip 5 lines during each iteration until the end of the file. The for loop begins with $text values (indexes 1, 2, 3, 4, and 5) joined together by a ,. Then the index increment ($i) is increased by 5 and the next five index values are joined together. Each increment will create a new line of comma separated values. The reason for the , join is just to use the simple ConvertFrom-Csv later.
ConvertFrom-Csv converts the CSV data into an array of objects ($formattedText) with the first row becoming those objects' properties.
Finally, the array is piped to ConvertTo-Html, which will output all of the objects in a table.
Note: If you want to resize or add extra format to the table, you may need to do that after the code is generated. If your data has commas, you will need a different delimiter when joining the strings. You will then need to add the -Delimiter parameter to the ConvertFrom-Csv with the delimiter you choose.
Adaptation:
The code is fairly flexible. If you need to work with more than five properties, the $i+=5 will need to reflect the number of properties you need to cycle through. The same change needs to apply to $text[$i..($i+4)]. You want the .. to separate two values that differ by your property number.
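The chunk-by-five idea is language-agnostic; here is a small Python sketch of the same parse, using the sample data above (assuming, as the answer does, a leading `Summary` line and exactly five columns):

```python
lines = [
    "Summary",
    "Date", "Good mail", "Rule matches", "Spam", "Malware",
    "2019-10-22", "4927", "4519", "2078", "0",
    "2019-10-23", "4783", "4113", "1934", "0",
]

rows = []
if lines[0].startswith("Summary"):
    body = lines[1:]
    headers = body[:5]          # first five lines after "Summary" are the columns
    for i in range(5, len(body), 5):
        # every following group of five lines is one table row
        rows.append(dict(zip(headers, body[i:i + 5])))

print(rows[0]["Good mail"])  # 4927
```

Keying each row by the header names means the column order in any given file no longer matters.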

Using Perl to parse text from blocks

I have a file with multiple blocks of text. For EACH block, I want to extract what is in the square brackets, the line containing the FIRST instance of the word "area", and what is to the right of the square brackets. Everything will be a string. Essentially I want to store each string in a hash so I can print it into a 3-column CSV file.
Here's a sample of what the file looks like:
Student-[K-6] Exceptional in Math
/home/area/kinder/mathadvance.txt, 12
Students in grade K-12 shown to be exceptional in math.
Placed into special after school program.
See /home/area/overall/performance.txt, 200
Student-[Junior] Weak Performance
Students with overall weak performance.
Summer program services offered as shown in
"/home/area/services/summer.txt", 212
Student-[K-6] Physical Exercise Time Slots
/home/area/pe/schedule.txt, 303
Assigned time slots for PE based on student's grade level. Make reference to
/home/area/overall/classtimes.txt, 90
I want to to have a final csv file that looks like:
Grade,Topic,Path
K-6, Exceptional in Math, /home/area/kinder/mathadvance.txt, 12
K-6, Physical Exercise Time Slots, /home/area/pe/schedule.txt, 303
Junior, Weak Performance, "/home/area/services/summer.txt", 212
Since it's a csv file, I know it will also separate at the line number when exporting into excel but I'm fine with that.
I started off by putting the grade type into an array because I want to be able to add more strings to it for different grade levels.
My program looks like this so far:
#!/usr/bin/perl
use strict;
use warnings;
my @grades = ("K-6", "Junior", "Community-College", "PreK");
I was thinking I would need some sort of system sed command to grab what is in the brackets and store it in a variable. Then I would grab everything to the right of the bracket on that line and store it in a variable, and then grep for a line containing "area" to get the path and store that as a string too, put these in a hash, and print them as CSV. I'm not sure I'm thinking about this the right way. Also, I have NO IDEA how to do this for each BLOCK of text in the file. I need it per block, because each block has its own corresponding grade, topic, and path.
perl -000 -ne '($grade, $topic) = /\[(.*)\] (.*)/;
($path) = m{(.*/area/.*)};
print "$grade, $topic, $path\n"' -- file.txt
-000 turns on paragraph mode: with it, -n reads paragraph by paragraph instead of line by line
/\[(.*)\] (.*)/ matches the square brackets and whatever follows them up to a newline. The inside of the square brackets and the following text are captured using the parentheses.
m{(.*/area/.*)} captures the line containing "area". It uses the m{} syntax instead of // so we don't have to backslash the slashes (avoiding so called "leaning toothpick syndrome")
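The same paragraph-mode extraction can be sketched in Python, splitting on blank lines as Perl's -000 does; the two regexes mirror the one-liner above (the sample text here covers just two of the blocks):

```python
import re

text = """Student-[K-6] Exceptional in Math
/home/area/kinder/mathadvance.txt, 12

Student-[Junior] Weak Performance
"/home/area/services/summer.txt", 212"""

print("Grade,Topic,Path")
for block in text.split("\n\n"):              # paragraph mode, like perl -000
    m = re.search(r"\[(.*)\] (.*)", block)    # grade in brackets, topic after
    p = re.search(r"(.*/area/.*)", block)     # first line containing /area/
    if m and p:
        print(f"{m.group(1)}, {m.group(2)}, {p.group(1)}")
```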

Extract Data From Second Line of Output

I have a table that contains message_content. It looks like this:
message_content | WFUS54 ABNT 080344\r\r
| TORLCH\r\r
| TXC245-361-080415-\r
How would I extract only the 2nd line of that output (TORLCH)? I've tried shortening the output to a certain number of characters, but that doesn't give what I want. I've also tried removing carriage returns and newlines. I'm outputting my results to a CSV that I could manipulate with Python, but I was wondering if there's a way to do it in the query first.
Based on other examples, it seems like I could use a regular expression to maybe do this? Not sure where to start with learning that though.
You can split the line into an array, then take the second element:
(string_to_array(message_content, e'\r\r'))[2]
Online example: https://rextester.com/MDYLXB40812
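The split-and-index logic is easy to sanity-check outside the database (assuming, as the answer does, that the lines are delimited by `\r\r`):

```python
message_content = "WFUS54 ABNT 080344\r\rTORLCH\r\rTXC245-361-080415-\r"

# Equivalent of (string_to_array(message_content, e'\r\r'))[2] --
# note that SQL arrays are 1-based while Python lists are 0-based
parts = message_content.split("\r\r")
print(parts[1])  # TORLCH
```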