Splunk Enterprise pie-chart with count from different search criteria - pie-chart

I'm trying to create a pie chart from search result sets that come from different conditions and different sources, but I'm not able to combine the results into one pie chart.
index=A sourcetype=B host=C | rex "pattern1" | chart count(field1) AS result1
index=A sourcetype=B host=C | rex "pattern2" | chart count(field2) AS result2
index=A sourcetype=B host=D | rex "pattern3" | chart count(field3) AS result3
I'm able to get data for pattern1 and pattern2 as they have the same index/sourcetype/host, but I can't join in the data from the third search.
The pie chart should represent result1, result2, and others (result3 - result1 - result2) out of result3.

You need to get all of the results into the same result set before pulling them together into the chart. How easy this is will depend on the regexes, or on whether you can use normal eval conditions to split the categories.
index=A sourcetype=B host IN (C, D)
| eval result = case(
    host = "C" AND like(_raw, "%pattern1%"), "result1",
    host = "C" AND like(_raw, "%pattern2%"), "result2",
    host = "D" AND like(_raw, "%pattern3%"), "other",
    true(), "NA"
  )
| where result != "NA"
| eval field = coalesce(field1, field2, field3)
| chart count(field) by result
Failing that, you can use |append to combine your existing searches into the same result set with a common field, e.g. result, and then do your |chart count by result. Something like the below.
index=A sourcetype=B host=C | rex "pattern1" | stats count(field1) AS result | eval type="type1"
|append [index=A sourcetype=B host=C | rex "pattern2" | stats count(field2) AS result | eval type = "type2"]
|append [index=A sourcetype=B host=D | rex "pattern3" | stats count(field3) AS result | eval type="other"]
| table type result

Related

Scala - Selecting minimum values by group

When looking at my input data frame below, what I'm hoping to do is select the timeframe for each month where Diff_from_50 is the lowest. If there are any ties in this value, it should look at the AvgWindSpeed and select whichever has the lowest wind speed.
What would be the best way to do this in Scala? I've been working with the following code, but when I group by Month I lose my other columns. I'm also not exactly sure how to approach comparing the differences in temperature and then selecting the one with the lowest WindSpeed if there are ties.
Any suggestions/tips would be appreciated.
Current Code:
val oshdata = osh.select(col("TemperatureF"),col("Wind SpeedMPH"), concat(format_string("%02d",col("Month")),lit("/"),format_string("%02d",col("Day")),lit("/"),col("Year"),lit(" "),col("TimeCST")).as("Date")).withColumn("TemperatureF",when(col("TemperatureF").equalTo(-9999),null).otherwise(col("TemperatureF"))).withColumn("Wind SpeedMPH",when(col("Wind SpeedMPH").equalTo(-9999),null).otherwise(col("Wind SpeedMPH"))).withColumn("WindSpeed",when($"Wind SpeedMPH" === "Calm",0).otherwise($"Wind SpeedMPH"))
val ts = to_timestamp($"Date","MM/dd/yyyy hh:mm a")
val Oshmydata=oshdata.withColumn("ts",ts)
val OshgroupByWindow = Oshmydata.groupBy(window(col("ts"), "1 hour")).agg(avg("TemperatureF").as("avgTemp"),avg("WindSpeed").as("AvgWindSpeed")).select("window.start", "window.end", "avgTemp","AvgWindSpeed")
val Oshdaily = OshgroupByWindow.withColumn("_tmp",split($"start"," ")).select($"_tmp".getItem(0).as("Date"),date_format($"_tmp".getItem(1),"hh:mm:ss a").as("startTime"),$"end",$"avgTemp",$"AvgWindSpeed").withColumn("_tmp2",split($"end"," ")).select($"Date",$"StartTime",date_format($"_tmp2".getItem(1),"hh:mm:ss a").as("EndTime"),$"avgTemp",$"AvgWindSpeed").withColumn("Diff_From_50",abs($"avgTemp"-50))
val OshfinalData = Oshdaily.select(col("*"),month(col("Date")).as("Month")).orderBy($"Month",$"StartTime")
OshfinalData.createOrReplaceTempView("oshView")
val testing = OshfinalData.select(col("*")).groupBy($"Month",$"StartTime").agg(avg($"avgTemp").as("avgTemp"),avg($"AvgWindSpeed").as("AvgWindSpeed"))
val withDiff = testing.withColumn("Diff_from_50",abs($"avgTemp"-50))
withDiff.select(col("*")).groupBy($"Month").agg(min("Diff_from_50")).show()
Input Data Frame:
+-----+-----------+------------------+------------------+-------------------+
|Month| StartTime| avgTemp| AvgWindSpeed| Diff_from_50|
+-----+-----------+------------------+------------------+-------------------+
| 1|01:00:00 AM|17.375469072164957| 8.336983230663929| 32.62453092783504|
| 1|01:00:00 PM| 23.70729813664597|10.294075601374567| 26.29270186335403|
| 1|02:00:00 AM| 17.17661058638331| 8.332715559474817| 32.823389413616695|
| 1|02:00:00 PM| 23.78028142954523|10.131929492774708| 26.21971857045477|
| 1|03:00:00 AM|16.979751170960192| 8.305847424684158| 33.02024882903981|
| 1|03:00:00 PM| 23.78028142954523|11.131929492774708| 26.21971857045477|
| 2|01:00:00 AM| 18.19221536796537| 8.104439935064937| 31.80778463203463|
| 2|01:00:00 PM|25.602093162953263|10.756156072520753| 24.397906837046737|
| 2|02:00:00 AM| 17.7650265755505| 8.142266514806375| 32.2349734244495|
| 2|02:00:00 PM|25.602093162953263|11.756156072520753| 24.397906837046737|
+-----+-----------+------------------+------------------+-------------------+
Expected output:
+-----+-----------+------------------+------------------+-------------------+
|Month| StartTime| avgTemp| AvgWindSpeed| Diff_from_50|
+-----+-----------+------------------+------------------+-------------------+
| 1|02:00:00 PM| 23.78028142954523|10.131929492774708| 26.21971857045477|
| 2|01:00:00 PM|25.602093162953263|10.756156072520753| 24.397906837046737|
+-----+-----------+------------------+------------------+-------------------+
You can use a Window function:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number
// assumes `import spark.implicits._` for the ' and $ column syntax

val monthsLowest = Window
  .partitionBy('Month)
  .orderBy('Diff_from_50.asc, 'AvgWindSpeed.asc)

df.withColumn("rn", row_number over monthsLowest)
.where($"rn" === 1)
.drop("rn")
.show()
It will give you the same expected output.
For more information about window functions in Spark, there is a great guide: https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-functions-windows.html
You can also take a look at this answer:
How to select the first row of each group?
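An alternative that avoids window functions is to take the minimum of a struct per group, since min on a struct compares its fields left to right. A minimal sketch of that idea, assuming the same df and column names as above (my own illustration, not code from the answer):
import org.apache.spark.sql.functions.{min, struct}
// assumes `import spark.implicits._` for the $ column syntax

// Putting Diff_from_50 first and AvgWindSpeed second reproduces the
// tie-breaking order: lowest Diff_from_50, then lowest AvgWindSpeed.
val perMonth = df
  .groupBy($"Month")
  .agg(min(struct($"Diff_from_50", $"AvgWindSpeed", $"StartTime", $"avgTemp")).as("best"))
  .select($"Month", $"best.StartTime", $"best.avgTemp", $"best.AvgWindSpeed", $"best.Diff_from_50")

perMonth.show()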

Check if a set of a field values is mapped against single value of another field in dataframe

Consider the below dataframe with store and books available:
+-----------+------+-------+
| storename | book | price |
+-----------+------+-------+
| S1 | B11 | 10$ | <<
| S2 | B11 | 11$ |
| S1 | B15 | 29$ | <<
| S2 | B10 | 25$ |
| S2 | B16 | 30$ |
| S1 | B09 | 21$ | <
| S3 | B15 | 22$ |
+-----------+------+-------+
Suppose we need to find the stores which have two particular books, namely B11 and B15. Here, the answer is S1, as it stocks both books.
One way of doing it is to find the intersection of the stores having book B11 with the stores having book B15, using the command below:
val df_select = df.filter($"book" === "B11").select("storename")
.join(df.filter($"book" === "B15").select("storename"), Seq("storename"), "inner")
which contains the names of the stores having both.
But instead I want a table
+-----------+------+-------+
| storename | book | price |
+-----------+------+-------+
| S1 | B11 | 10$ | <<
| S1 | B15 | 29$ | <<
| S1 | B09 | 21$ | <
+-----------+------+-------+
which contains all records related to the store that fulfils the condition. Note that B09 is not left out. (Use case: the user can explore some other books in the same store as well.)
We can do this with another join of the above result against the original dataframe:
df_select.join(df, Seq("storename"), "inner")
But I see a scalability and readability issue with step 1, as I have to keep joining one dataframe to another if the number of books is more than 2. That is a lot of pain and it's error-prone too. Is there a more elegant way to do the same? Something like:
val storewise = Window.partitionBy("storename")
df.filter($"book".contains{"B11", "B15"}.over(storewise))
Found a simple solution using the array_except function.
Add the required set of field values as an array in a new column, req_books.
Add a column, all_books, storing all the books stocked by a store, using a Window.
Using the above two columns, find whether the store misses any required book, and filter it out if it does.
Drop the extra columns created.
Code:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{array, array_except, collect_set, lit, size}
// assumes `import spark.implicits._` for the ' column syntax

val df1 = df.withColumn("req_books", array(lit("B11"), lit("B15")))
  .withColumn("all_books", collect_set('book).over(Window.partitionBy('storename)))

df1.withColumn("missing_books", array_except('req_books, 'all_books))
  .filter(size('missing_books) === 0)
  .drop('missing_books).drop('all_books).drop('req_books).show
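As a side note (my own sketch, not part of the answer above): if the required books come in a Seq rather than being hard-coded, the req_books array column can be built from that list directly, so a single bookList value can drive this approach too.
import org.apache.spark.sql.functions.{array, lit}

val bookList = Seq("B11", "B15")            // the required set of books
val reqBooks = array(bookList.map(lit): _*) // equivalent to array(lit("B11"), lit("B15"))
// drop-in replacement for the hard-coded column above:
// df.withColumn("req_books", reqBooks)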
Using window functions to create an array of all the values and checking whether it contains all the necessary values:
import scala.collection.mutable.WrappedArray
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{collect_set, udf}

val bookList = List("B11", "B15") // list of books to search for
// true when every required book appears in the store's collected set of books
def arrayContainsMultiple(bookList: Seq[String]) = udf((allBooks: WrappedArray[String]) => allBooks.intersect(bookList).sorted.equals(bookList.sorted))
val filteredDF = input
  .withColumn("allBooks", collect_set($"book").over(Window.partitionBy($"storename")))
  .filter(arrayContainsMultiple(bookList)($"allBooks"))
  .drop($"allBooks")

Postgresql - Chain multiple regex_replace functions in single query?

Using PostgreSQL 11.6. I have values in tab_a.sysdescr that I want to convert using regexp_replace, and I want to write those converted values into tab_b.os_type.
Here is table tab_a, which contains the source string in sysdescr:
hostname | sysdescr |
-------------+-----------------+
wifiap01 | foo HiveOS bar |
switch01 | foo JUNOS bar |
router01 | foo IOS XR bar |
Here is table tab_b, which is the target for my update, in column os_type:
hostname | mgmt_ip | os_type
-------------+--------------+---------
wifiap01 | 10.20.30.40 |
switch01 | 20.30.40.50 |
router01 | 30.40.50.60 |
This is an example of the desired state for tab_b:
hostname | mgmt_ip | os_type
-------------+--------------+---------
wifiap01 | 10.20.30.40 | hiveos
switch01 | 20.30.40.50 | junos
router01 | 30.40.50.60 | iosxr
I have a query that works against a single os_type. In this example, HiveOS:
UPDATE tab_b
SET os_type = (
SELECT REGEXP_REPLACE(sysdescr, '.*HiveOS.*', 'hiveos')
FROM tab_a
WHERE tab_a.hostname = tab_b.hostname
)
WHERE EXISTS (
SELECT sysdescr
FROM tab_a
WHERE tab_a.hostname = tab_b.hostname
);
What I can't figure out is how I can "chain" multiple regexp_replace functions together in a single query, or via nested sub-queries. Adding 'OR' after that SELECT REGEXP_REPLACE line doesn't work, and I haven't been able to find examples online of something like this.
The end goal is a single query that will replace the strings as specified, updating the replaced string on all rows in tab_b. I was hoping to avoid having to delve into PL/Python, but if that is the best way to solve this, that's okay. Ideally, I could define a third table that contains the pattern and replacement-string arguments, and could iterate over that somehow.
Edit: Example of what I am trying to accomplish
This is not valid code, but it hopefully demonstrates what I am trying to accomplish: a single query that can be executed once and will translate/transform every sysdescr in a table into the proper value for os_type in a new table.
UPDATE tab_b
SET os_type = (
SELECT REGEXP_REPLACE(sysdescr, '.*HiveOS.*', 'hiveos') OR
SELECT REGEXP_REPLACE(sysdescr, '.*JUNOS.*', 'junos') OR
SELECT REGEXP_REPLACE(sysdescr, '.*IOS XR.*', 'iosxr')
FROM tab_a
WHERE tab_a.hostname = tab_b.hostname
)
WHERE EXISTS (
SELECT sysdescr
FROM tab_a
WHERE tab_a.hostname = tab_b.hostname
);
If foo and bar are consistent in all rows (as indicated in your example), then this should work:
postgres=# SELECT lower(replace(regexp_replace('foo IOS XR bar','foo (.*) bar','\1'),' ',''));
lower
-------
iosxr
(1 row)
In short, this does the following:
Trim off foo and bar from the front and back with regexp_replace()
Remove the spaces with replace()
Lower-case the text with lower()
If you need to do anything further to remove foo and bar, you can nest the string functions as demonstrated above.
I was able to solve this using a third table (lookup table). It contains two columns, one holding the match string and one holding the return string.
New table tab_lookup:
id | match_str | return_str
----+-----------------------------------------------+------------
1 | HiveOS | hiveos
2 | IOS XR | iosxr
3 | JUNOS | junos
5 | armv | opengear
6 | NX-OS | nxos
7 | Adaptive Security Appliance | asa
17 | NetScreen | netscreen
19 | Cisco Internetwork Operating System Software | ios
18 | Cisco IOS Software | ios
20 | ProCurve | hp
21 | AX Series Advanced Traffic Manager | a10
22 | SSG | netscreen
23 | M13, Software Version | m13
24 | WS-C2948 | catos
25 | Application Control Engine Appliance | ace
Using this query I can update tab_b.os_type with the appropriate value from tab_lookup.return_str:
UPDATE tab_b
SET os_type = (
SELECT return_str
FROM tab_lookup
WHERE EXISTS (
SELECT regexp_matches(sysdescr, match_str)
FROM tab_a
WHERE tab_a.hostname = tab_b.hostname
)
);
The only catch I have encountered is that there must be only one match against a given row. But this is easily handled with more specific match_str values. For example, don't use 'IOS'; instead use 'Cisco IOS Software'.
All in all, I'm very happy with this solution, since it provides an easy way to update the lookup values as more device types are added to the network.

Spark Scala Dataframe - replace/join column values with values from another dataframe (but is transposed)

I have a table with ~300 columns filled with characters (stored as String):
valuesDF:
| FavouriteBeer | FavouriteCheese | ...
|---------------|-----------------|--------
| U | C | ...
| U | E | ...
| I | B | ...
| C | U | ...
| ... | ... | ...
I have a Data Summary, which maps the characters onto their actual meaning. It is in this form:
summaryDF:
| Field | Value | ValueDesc |
|------------------|-------|---------------|
| FavouriteBeer | U | Unknown |
| FavouriteBeer | C | Carlsberg |
| FavouriteBeer | I | InnisAndGunn |
| FavouriteBeer | D | DoomBar |
| FavouriteCheese | C | Cheddar |
| FavouriteCheese | E | Emmental |
| FavouriteCheese | B | Brie |
| FavouriteCheese | U | Unknown |
| ... | ... | ... |
I want to programmatically replace the character values of each column in valuesDF with the Value Descriptions from summaryDF. This is the result I'm looking for:
finalDF:
| FavouriteBeer | FavouriteCheese | ...
|---------------|-----------------|--------
| Unknown | Cheddar | ...
| Unknown | Emmental | ...
| InnisAndGunn | Brie | ...
| Carlsberg | Unknown | ...
| ... | ... | ...
As there are ~300 columns, I'm not keen to type out withColumn methods for each one.
Unfortunately I'm a bit of a novice when it comes to programming for Spark, although I've picked up enough to get by over the last 2 months.
What I'm pretty sure I need to do is something along the lines of:
valuesDF.columns.foreach { col => ...... } to iterate over each column
Filter summaryDF on Field using col String value
Left join summaryDF onto valuesDF based on current column
withColumn to replace the original character code column from valuesDF with new description column
Assign new DF as a var
Continue loop
However, trying this gave me a Cartesian product error (I made sure to define the join as "left").
I tried and failed to pivot summaryDF (as there are no aggregations to do?) and then join the two dataframes together.
This is the sort of thing I've tried, and I always get a NullPointerException. I know this is really not the right way to do this, and I can see why I'm getting the NullPointerException... but I'm really stuck and reverting to old, silly and bad Python habits in desperation.
var valuesDF = sourceDF
// I converted summaryDF to a broadcasted RDD
// because its small and a "constant" lookup table
summaryBroadcast
.value
.foreach{ x =>
// searchValue = Value (e.g. `U`),
// replaceValue = ValueDescription (e.g. `Unknown`),
val field = x(0).toString
val searchValue = x(1).toString
val replaceValue = x(2).toString
// error catching as summary data does not exactly mapping onto field names
// the joys of business people working in Excel...
try {
// I'm using regexp_replace because I'm lazy
valuesDF = valuesDF
  .withColumn(field, regexp_replace(col(field), searchValue, replaceValue))
}
catch {case _: Exception =>
null
}
}
Any ideas? Advice? Thanks.
First, we'll need a function that executes a join of valuesDf with summaryDf by Value and the respective pair of Favourite* and Field:
private def joinByColumn(colName: String, sourceDf: DataFrame): DataFrame = {
sourceDf.as("src") // alias it to help selecting appropriate columns in the result
// the join
.join(summaryDf, $"Value" === col(colName) && $"Field" === colName, "left")
// we do not need the original `Favourite*` column, so drop it
.drop(colName)
// select all previous columns, plus the one that contains the match
.select("src.*", "ValueDesc")
// rename the resulting column to have the name of the source one
.withColumnRenamed("ValueDesc", colName)
}
Now, to produce the target result we can iterate on the names of the columns to match:
val result = Seq("FavouriteBeer",
"FavouriteCheese").foldLeft(valuesDF) {
case(df, colName) => joinByColumn(colName, df)
}
result.show()
+-------------+---------------+
|FavouriteBeer|FavouriteCheese|
+-------------+---------------+
| Unknown| Cheddar|
| Unknown| Emmental|
| InnisAndGunn| Brie|
| Carlsberg| Unknown|
+-------------+---------------+
In case a value from valuesDf does not match anything in summaryDf, the resulting cell in this solution will contain null. If you just want to replace it with the value Unknown, then instead of the .select and .withColumnRenamed lines above, use:
.withColumn(colName, when($"ValueDesc".isNotNull, $"ValueDesc").otherwise(lit("Unknown")))
.select("src.*", colName)
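For completeness, here is a sketch of how joinByColumn reads with that substitution applied (same assumptions as above: summaryDf is in scope and spark.implicits._ is imported; this is just the two snippets combined, not a separately tested variant):
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit, when}

private def joinByColumn(colName: String, sourceDf: DataFrame): DataFrame = {
  sourceDf.as("src")
    .join(summaryDf, $"Value" === col(colName) && $"Field" === colName, "left")
    .drop(colName)
    // unmatched rows get "Unknown" instead of null
    .withColumn(colName, when($"ValueDesc".isNotNull, $"ValueDesc").otherwise(lit("Unknown")))
    .select("src.*", colName)
}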

Querying partial value from a field - SQL SERVER 2008

I need to return only a portion of the value in a given field.
Example:
A given field returns something like 'AB-1X3.4567', but the desired value is only the '1X3.4567' portion. So for this example I need to remove anything that precedes the pattern of
[0-9,A-Z][0-9,A-Z][0-9,A-Z][.][0-9,A-Z][0-9,A-Z][0-9,A-Z][0-9,A-Z].
What query could I write to do this?
using stuff() and patindex():
create table t (val varchar(32))
insert into t values
('AB-1X3.4567') -- given example
,('1X3.4567AB-1X3.4567') --extra junk on the end
,('1X3.4567') -- goldy locks
,('X3.4567') -- too short
,('AB-1X#.4567') -- # is not [0-9A-Z]
select
val
, str = stuff(val,1,patindex('%[0-9A-Z][0-9A-Z][0-9A-Z][.][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',val)-1,'')
from t
rextester demo: http://rextester.com/ITUJ68634
returns:
+---------------------+---------------------+
| val | str |
+---------------------+---------------------+
| AB-1X3.4567 | 1X3.4567 |
| 1X3.4567AB-1X3.4567 | 1X3.4567AB-1X3.4567 |
| 1X3.4567 | 1X3.4567 |
| X3.4567 | NULL |
| AB-1X#.4567 | NULL |
+---------------------+---------------------+
Your pattern describes anything of the form XXX.XXXX, where X is any single digit or letter. In that case we can use RIGHT() and LEN():
DECLARE @value VARCHAR(4000)='AB-1X3.4567'
SELECT RIGHT(@value,LEN(@value) - 3)