Splunk: How to get two searches in one timechart/graph? - append

I have two queries which look like this:
source="/log/ABCD/cABCDXYZ/xyz.log" doSomeTasks| timechart partial=f span=1h count as "#XYZ doSomeTasks" | fillnull
source="/log/ABCD/cABCDXYZ/xyz.log" doOtherTasks| timechart partial=f span=1h count as "#XYZ doOtherTasks" | fillnull
I now want to get these two searches into one graph (I do not want to sum the numbers from each search into a single value).
I saw that there is the possibility to use appendcols, but my attempts with this command were not successful.
I tried this but it did not work:
source="/log/ABCD/cABCDXYZ/xyz.log" doSomeTasks|timechart partial=f span=1h count as "#XYZ doSomeTasks" appendcols [doOtherTasks| timechart partial=f span=1h count as "#XYZ doOtherTasks" | fillnull]

Thanks to PM 77-1 the issue is solved.
This command works:
source="/log/ABCD/cABCDXYZ/xyz.log" doSomeTasks|timechart partial=f span=1h count as "#XYZ doSomeTasks" | appendcols[search source="/log/ABCD/cABCDXYZ/xyz.log" doOtherTasks| timechart partial=f span=1h count as "#XYZ doOtherTasks" | fillnull]
Note: You do not have to mention the source in the second search command if it is the same source as the first one.

General solution
Generate each data column by using a subsearch query in the following form:
|appendcols[search (myquery) |timechart count]
Additional steps
The list of one-or-more query columns needs to be preceded by a generated column which establishes the timechart rows (and gives appendcols something to append to).
|makeresults |timechart count |eval count=0
Note: It isn't strictly required to start with a generated column, but I've found this to be a clean and robust approach. Notably, it avoids problems that can occur in the special case of "No results found", which otherwise can confuse the visualization rendering. It is also more uniform and, as a result, easier to work with.
Finally, specify each of the fields to be charted, with _time as the x-axis:
|fields _time, myvar1, myvar2, myvar3
Complete example
|makeresults |timechart span=5m count |eval count=0
|appendcols[search (myquery1) |timechart span=5m count as myvar1]
|appendcols[search (myquery2) |timechart span=5m count as myvar2]
|appendcols[search (myquery3) |timechart span=5m count as myvar3]
|fields _time, myvar1, myvar2, myvar3
Be careful to use the same span throughout.
Other hints
When comparing disparate data on the same chart, perhaps to evaluate their relative timing, it's common to have differences in type or scale that can render the overlaid result nearly useless. For cases like this, don't neglect the 'Log' format option for the Y-Axis.
In some cases, it may even be worthwhile to employ data hacks with eval to massage the values into a visually comparable state. For example, appending |eval myvar1=if(myvar1=0,0,1) flattens the counts into a 0/1 indicator when used following timechart count. Here are some relevant docs:
Mathematical functions
Comparison and Conditional functions

Related

Aggregating Wildcards in Sumologic

I'm trying to aggregate the API logs based on the different endpoints I have. There are a total of 4 endpoints:
1: /v1/vehicle_locations
2: /v1/vehicle_locations/id
3: /v1/driver_locations
4: /v1/driver_locations/id
The way I'm currently doing this is:
_sourceCategory=production | keyvalue auto | where (path matches "/v1/driver_locations" OR path matches "/v1/driver_locations/*" or path matches "/v1/vehicle_locations" or path matches "/v1/vehicle_locations/*") | count by path
The problem with this is that while I get the correct aggregate for /v1/vehicle_locations and /v1/driver_locations, I get individual results for /v1/driver_locations/id and /v1/vehicle_locations/id since the id is a wildcard. Is there a way I can aggregate these wildcards as well?
There are several ways to achieve what you ask. I think the most straightforward and recommended one is to use the | parse operator, so that you can treat the top-most element of your path as a field, e.g.
_sourceCategory=production
| keyvalue auto
| parse field=path "*/*" as topmost, rest
| where (topmost = "vehicle_locations" or topmost = "driver_locations")
| count by topmost
Note that by default the | parse operator works on the raw message (i.e. the original log line), but you can make it parse a field instead by using the field= syntax, which is what is used above.
You might want to tweak the parse expression or use a regex depending on the actual paths you encounter.
(Disclaimer: I am currently employed by Sumo Logic)
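For what it's worth, the grouping being asked for can be sketched in plain Python (illustrative only, not Sumo Logic syntax): group each request path by its top-most element so that the /id suffixes collapse into one bucket.
import re
from collections import Counter

# Sample paths shaped like the four endpoints in the question (made up for illustration).
paths = [
    '/v1/vehicle_locations',
    '/v1/vehicle_locations/42',
    '/v1/driver_locations',
    '/v1/driver_locations/7',
]

def topmost(path):
    # '/v1/driver_locations/7' -> 'driver_locations'
    m = re.match(r'/v1/([^/]+)', path)
    return m.group(1) if m else path

print(Counter(topmost(p) for p in paths))
# Counter({'vehicle_locations': 2, 'driver_locations': 2})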

Exclude non-matching atoms from list

I would like to collect a list of atoms and pass the list to an object or abstraction that will pass through only the matching atoms, without modifying the order of the list or removing duplicates.
(hello how are you)
|
[desiredobject how are you]
|
[print]
Ideally this would print (how are you). If I were to put in (how how how) I would get back the same message. But if I were to put in (jfj jfj jfj) I would get nothing.
[zl] is useful but I am looking for the inverse behaviour of [zl filter].
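For clarity, the behavior being asked for, keep each atom of the incoming list only if it also appears in the reference list, preserving order and duplicates, can be sketched outside the patch. A minimal Python illustration (the function name is made up; this is not Pd/Max code):
# Keep atoms from the incoming list that also occur in the reference list,
# preserving order and duplicates (hypothetical helper, for illustration only).
def pass_matching(incoming, reference):
    allowed = set(reference)
    return [atom for atom in incoming if atom in allowed]

print(pass_matching(["hello", "how", "are", "you"], ["how", "are", "you"]))  # ['how', 'are', 'you']
print(pass_matching(["how", "how", "how"], ["how", "are", "you"]))           # ['how', 'how', 'how']
print(pass_matching(["jfj", "jfj", "jfj"], ["how", "are", "you"]))           # []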
EDIT:
I came up with the following solution, which works equally well as the solution mattijs posted. My solution uses [uzi] to give the indices of the symbols in a list. The indices output of [zl filter] is fed to [zl unique] in order to strip out the undesired indices. This new list is fed to [zl lookup] in order to convert back to the symbols. The (fake) message is inserted in case [zl filter] filters everything, in which case [zl unique] would otherwise have no output.
----------begin_max5_patcher----------
1385.3oc0Z0scipBE95jmBVdclYo3Owbt67bz0r5hnzDZMfGEa6zYMu6GDvD
v+hoMM0dQzHHvlu89aCrc+mkKb1xdEW5.9GvcfEK9yxEKjEUWvB8yKbNfdMI
CUJeMGJ9E11GcVophiekKKNGjTvJKKvOfKvzDbyKPqNvp3YXtr0Pcoph3+NG
qFZGmUhefeoq9AFkWRdSVoG7mtm5KBscWki3I6Izc2WfS3pdKHVzDP755qdt
t02fhqG6dRpTjESie3CcLFSJ5fbLc92BBJywbDvEZLQCJhFPxvOiKJILpTNW
oKGkmaT7BilTijOxjcTzpiEQnph7NVTA9YRS6cOVJpPLO4hIYUgRHeMNxQU4
eW1L3m.gvvn5IdPP800wxaQvfSvfPQuKik7DN0bbbX4XJglWfKwTNh2RLbRw
Ofpx322uJxt9GPI3AabuX8BmcEjTFsVHrZYcwMC2c.uPopMzbxHeCJJumFWJ
lGUkaQE0v51Lrg4ivBlwxrq5nlTPDPTxADGyIJgE5drSIGxKHTt0.goHQeru
TPExxr5JUMO2SMoBkcB9ERJeuruLgRwqSxaTANGwnTxNbI2tLNZWocIVLaSq
PSFtU4sX5RtVS20kK6YTSW97QEijAYXMYvS87kby.o1zUewpgVLdWydrKqGF
IswCkWggMWs5OCler4LKgc3.lpgzlh+6xkM+Y06DBa5Wig5nGyBbNKuJSXcA
pnj+qBCdgv2Cd.8DFH34G.DJHAUhABGMbbAXOpDPY.ATmWY0iYDJNgUoDe+A
0Wlv2.rWYUE61ZaH1QQ3MthvGpb45uQwRkJjf0CqJhbZIZs7MbEUHWOa5Kwz
bS3kXY5cKrLsVF21v7sLPlvC5ffDbTPZUafZDas9VPeHLzaizbxeiu5V6U06
rx9synRABNSh4cIDI8TN.FOMCG3svv4yZxp3HSdtF9ESR3fLv1O.GYKht68w
SNmQSjlZH29mO7mgiySh7tcDkC3xRzNbu.Z85dWEGyWI+Mv0wFqh4K853EOB
N5d6vwwcdq1Mwby+she66IMKCOq66P+4BbtqfUkOOQSswYn+YQS3r.M+Av6c
Rwkmz5yCGcgSyYY37fjW8FYXbzeLbTtrypS2+bwU8ZQAQmy9LXybgsOC24qF
KUGV97a7MX8rAKYrmlqtN8UNMCiOKbFLWfyl3.vdAfJvfeyplcmQSu2S8I+8
BNK39NVW5TH+pC7w.Q3RJb002OpWxpJRZfCczf.1RYJtjSnGC.5cmDXfm0Kt
mjlZGvQUL5Jqi3mJ1pSxD4RE4dDk9k45sRMSj45CaLcb1cVHy9STjmOR7jA4
fIZXbfjlyDaUPS17bCp2njm2Z02VPQ1MdvW8EX7B7qeO4S1hy7ABV+sSq0CG
5KyETvDk4vuehr+7QjWOUi4Me+j4P3GyGTnbWwwJ+MlO.g5sMGKOTWmXIdCv
fZ0AbJXf6rQu0inLhdalHyvoJyyGNcn+krF87PlibuDbdFsgyorI4nqEGD7q
klojgSI5Yb58hgPbJo6QbdAYaEWcrEyrI4il2DmOuLtfjInWgXjAenAcWFaK
JSm+LGaaO4rvxSfn7pR0OkzQx9SJb57xcOkb+gabjulvGMMj7TAWPsbz5n1e
.AqubW7HogzUDUd7gGA5eeUnBLxSFSPIpD3OJpr91fJ01J5eeY1J5HkprXFG
UbutnxPIzmL6l5ENbuf4r24RSOOn9Hiikld2H5wdbVFCLCbcnYGJ.xMbTygo
AMxlZGwLER0dEUMd0YkTnUHxF5PSV6lu6JncW8rilaxxy5oJOt2D4o0P0u7D
eyDG37Bcftelxix3tUh2VKJsR31VIaa2DscnjrULN+c4+C3uv.MC
-----------end_max5_patcher-----------
So basically you want to compare each item in list 1 to each item in list 2 and only pass through an item in list 1 if it is also in list 2?
It's possible this can be simplified btw, just a quick solution:
----------begin_max5_patcher----------
1198.3oc4Z1saihCEG+ZxSAhq6NBayWYuXk1mipQUFvIwcH1HvosyLpu6KXC
IlDRpIkkwilKBowewe94y4Xr64mqb7R4uQp8b+a2Gccb94JGGYQsE3z8aGu8
32xJv0xl4wHuxSe16AUUBxaBYwktYU755JxFREgkQ5avFNSTS+AosQ.3W76J
lcXOkUPDxwDdpP9AQeoftRKwhrcT11mpHYBkRQnlAxMtc3bA99seAat590tt
nFEw2KIp164crJZtTtMOB+EBpKRFdur0d+aEEW3oeyIUc7nCHMcfVPdgTUS4
LMo53gKK0J1QqKsT7YtbfBe3XQTlpHvwhpHuP66+ZUguupe3OQ.4yr7RTf7q
fDM.zLGtsfm8Mh7g0uuvbxFyGAdIgMVq6uLrkTVYEolvDXQm30uq3CEhmF2R
XX8avYjq14Qmgb71VQy4rVQLnmsE2e6dzEDJsUB00srELb4HcVv4Eo3p1ohz
Bxfo3FuCLitGKHBpROP+i8iturhxDCFKBC2LF6pa7OJJFLTpZdYjZxarBxHu
RyE6ji0IZMveU29R2ucP42x+cnOrvM0sPulqLkcgCLXXEZNwPsZFwQNzWyEF
A+R3PG4wclSwrsdOn4TetiML57GhyLc5q78Uq5+iGlebREMANlUZBtMMiWKo
IREPL97vhePnwKnXnUPw+w0+dfH7dgH.FnLFWOEaRoS+0.YfU.xsMgrVTRFo
.YnzPDE7YMGQVAEEtEKbLRkWMLNZJ1i2L9HHwJP4OJbYMquMqljFwRDLQ90Z
CMIuIKisEVtshenbQoYHPhwDokYHZNnYjsPyJx1eAVlvP4dZB7mCVFZIALS+
0DvbZKf+guTIHvVLMqOjtnllqUd1qk7L.NGlln+nMMApXkymoIb4v4dRcMdK
YTdti+pKth39c9gE8EM6np5K06IARt6WzD.9Sd2iPezrr6wELZo7Y26Z7v+d
4g5rI5VJNr+58t6k05BLiueOQcLQd+FPB.JZFQAxeIPgROdS2Kw2j24MHJ5F
rP+gELkG1SmMYAkcsi4SJp15GmB07CUY8SB8qN3NTf4jZAkc7nZe7TfuyZ3N
Zd9viFUcr04kbJSzIP2uZz71T0cfoxNvpjMvXcGZW5t0LAXptA1EuMR2H6R2
ISQ2VjcBxT66X6i2FqaKi2lXm.sq3fPSiCBsL663ona6w9FZreYjcw6PS0ch
cYeartg1ktiLMdhk89I.C4cf8g6eCcKa2vpYgAsKc2t6Ry2siE85IlZmflyc
o0UXeFF4Uiegj+TysoY+tOgEhJZ5AgZin5o+z1BdJtnK2nNdvNGyHmQyjm+u
R+mOJ8ht1YOonf7pZ9yjjea3oedqS9bRI9ldtSocVCPUlb.Rfik3Ulmtbf.U
9xgF+7QMLc4fI2Hc4lQV97lmc69LmrLAoR0nOIKgAp+O7xQSw06gkwKCKasK
69LirDr1eVXIHL3DKStaV5Our7ZI3pLw9l.DAihC+OJqVA.z34Ki9y7B4Jti
TTvc+jA2lXV81QA+v60bvLzH65viYUQpyW3tiWW9BR5KCes0pG1nKVmd70nu
XxyXI4agZJ1B0ThoZBrXZBYh8Txm.SM+38U+GN+kvWC
-----------end_max5_patcher-----------
Hope that helps,
Mattijs

dataFrame keying using pandas groupby method

I'm new to pandas and trying to learn how to work with it. I'm having a problem when trying to use, on my own data, an example I saw in one of Wes's videos and notebooks. I have a csv file that looks like this:
filePath,vp,score
E:\Audio\7168965711_5601_4.wav,Cust_9709495726,-2
E:\Audio\7168965711_5601_4.wav,Cust_9708568031,-80
E:\Audio\7168965711_5601_4.wav,Cust_9702445777,-2
E:\Audio\7168965711_5601_4.wav,Cust_7023544759,-35
E:\Audio\7168965711_5601_4.wav,Cust_9702229339,-77
E:\Audio\7168965711_5601_4.wav,Cust_9513243289,25
E:\Audio\7168965711_5601_4.wav,Cust_2102513187,18
E:\Audio\7168965711_5601_4.wav,Cust_6625625104,-56
E:\Audio\7168965711_5601_4.wav,Cust_6073165338,-40
E:\Audio\7168965711_5601_4.wav,Cust_5105831247,-30
E:\Audio\7168965711_5601_4.wav,Cust_9513082770,-55
E:\Audio\7168965711_5601_4.wav,Cust_5753907026,-79
E:\Audio\7168965711_5601_4.wav,Cust_7403410322,11
E:\Audio\7168965711_5601_4.wav,Cust_4062144116,-70
I load it into a data frame and then group it by "filePath" and "vp"; the code is:
res = df.groupby(['filePath','vp']).size()
res.index
and the output is:
[E:\Audio\7168965711_5601_4.wav Cust_2102513187,
Cust_4062144116, Cust_5105831247,
Cust_5753907026, Cust_6073165338,
Cust_6625625104, Cust_7023544759,
Cust_7403410322, Cust_9513082770,
Cust_9513243289, Cust_9702229339,
Cust_9702445777, Cust_9708568031,
Cust_9709495726]
Now I'm trying to access the index like a dict, as I saw in examples, but when I'm doing
res['Cust_4062144116']
I get an error:
KeyError: 'Cust_4062144116'
I do succeed in getting a result when I put in the filepath, but as I understand it and saw in previous examples, I should be able to use the vp keys as well, isn't that so?
Sorry if it's a trivial one, I just can't understand why it works in one example but not in the other.
Rutger, you are not correct. It is possible to partially index a MultiIndex Series. I simply did it the wrong way.
The first level of the index is the file name (e.g. E:\Audio\7168965711_5601_4.wav above) and the second level is vp. Meaning, for each file name I have multiple vps.
Now, this is correct:
res['E:\Audio\7168965711_5601_4.wav']
and will return:
Cust_2102513187 2
Cust_4062144116 8
....
but trying to index by the inner index (the Cust_ indexes) will fail.
You group by two columns and therefore get a MultiIndex in return. This means you also have to slice using those two columns, not with a single index value.
Your .size() on the groupby object converts it into a Series. If you force it into a DataFrame you can use the .xs method to slice a single level:
res = pd.DataFrame(df.groupby(['filePath','vp']).size())
res.xs('Cust_4062144116', level=1)
That works. If you want to keep it as a Series, boolean indexing can help, something like:
res[res.index.get_level_values(1) == 'Cust_4062144116']
The last option is a bit less readable, but sometimes also more flexible; you could, for example, test for multiple values at once:
res[res.index.get_level_values(1).isin(['Cust_4062144116', 'Cust_6073165338'])]
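Putting those pieces together, here is a minimal self-contained sketch (the sample data is made up, just to show both slicing options):
import pandas as pd

# Small made-up sample in the same shape as the CSV above (illustrative only).
df = pd.DataFrame({
    'filePath': ['a.wav', 'a.wav', 'b.wav'],
    'vp': ['Cust_1', 'Cust_2', 'Cust_1'],
    'score': [-2, -80, 25],
})

# Force the grouped sizes into a DataFrame and slice one value of the second level ...
res = pd.DataFrame(df.groupby(['filePath', 'vp']).size())
print(res.xs('Cust_1', level=1))

# ... or keep it as a Series and use boolean indexing on the index level.
s = df.groupby(['filePath', 'vp']).size()
print(s[s.index.get_level_values(1) == 'Cust_1'])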

How can I add weight to a keyword for Thinking Sphinx?

I did a search like this:
` Comment.search "aabbb "`
and I want to get the results which contain "ab" too.
So I did it this way:
` Comment.search "aabbb ab"`
but I found the results for aabbb and ab are mixed. In fact, I want the results which match aabbb to show before ab, in other words, to have a higher priority.
I know Sphinx can add weight to the fields of the table, for example 10 to the comment's name and 20 to the comment's content, but is it possible to add weight to the query words?
Unfortunately this is not possible with Sphinx yet, but you can get similar behavior in a query by repeating the keyword you want to weight.
For example:
"aabbb | aabbb | ab"
Here aabbb is twice as important as ab.
Sphinx has no ability to weight certain search phrases, I'm afraid - so what you're trying to do is not possible.
It's also worth noting that Sphinx uses AND logic by default - if you want results that match either aabbb OR ab, you'll probably want to use the :any match mode:
Comment.search "aabbb ab", :match_mode => :any

Generate unique 3 letter/number code and compare to existing ones in PHP/MySQL

I'm making a code generation script for the UN/LOCODE system, and the database has unique 3 letter/number codes for every country. So for example the database contains "EE TLL", EE being the country (Estonia) and TLL the unique code inside Estonia; "AR TLL" can also exist (the country code and the 3 letter/number code are stored separately). Codes are in capital letters.
The database is fairly big and already contains a huge number of locations; the user also has the possibility of entering the 3 letter/number code him/herself (which will be checked against the database automatically before submission).
Finally, neither 0 nor 1 may be used (possible confusion with O and I).
What I'm searching for is the most efficient way to pick the next available code when none is provided.
What I've come up with:
I'd check from AAA through 999, but then each code would require a new query (slow?).
I could store all the 40000 possibilities in an array and subtract all the used codes that are already in the database... but that uses too much memory IMO (not sure what I'm talking about here actually, maybe 40000 isn't such a big number).
Generate a random code, hope it doesn't exist yet, and check whether it does; if it does, start over again. That's just risk taking.
Is there some magic MySQL query/PHP script that can get me the next available code?
I would go with number 2; it is simple and 40000 is not a big number.
To make it more efficient, you can store a number representing each 3-letter code. The conversion should be trivial because you have a total of 34 (A-Z, 2-9) characters.
I would go for option 1 (i.e. do a sequential search), adding a table that gives the last assigned code per country (i.e. such that AAA..code are all assigned already). When assigning a new code through the sequential scan, that table gets updated; for user-assigned codes, it remains unmodified.
If you don't want to issue repeated queries, you can also write this scan as a stored routine.
To simplify iteration, it might be better to treat the three-letter codes as numbers (as Shawn Hsiao suggests), i.e. give a meaning to A-Z = 0..25, and 2..9 = 26..33. Then, XYZ is the number X*34^2+Y*34+Z == 23*1156+24*34+25 == 27429. This should be doable using standard MySQL functions, in particular using CONV.
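As a quick illustration of that code-to-number mapping (a Python sketch, not part of either answer; the function names are made up):
# 34-character alphabet: A-Z map to 0..25, 2-9 map to 26..33.
ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ23456789'

def code_to_number(code):
    # 'XYZ' -> 23*34**2 + 24*34 + 25 == 27429
    n = 0
    for ch in code:
        n = n * 34 + ALPHABET.index(ch)
    return n

def number_to_code(n):
    chars = []
    for _ in range(3):
        n, rem = divmod(n, 34)
        chars.append(ALPHABET[rem])
    return ''.join(reversed(chars))

print(code_to_number('XYZ'))   # 27429
print(number_to_code(27430))   # 'XY2', the code that follows 'XYZ' in this ordering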
I went with the 2nd option. I was also able to make a script that will try to match the name as closely as possible; for example, for Tartu it will try to match T**, then TA*, and if possible TAR; if not, it will try TAT, as T is the next letter after R in Tartu.
The code is quite extensive, I'll just post the part that takes the first possible code:
$allowed = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ23456789';
$length = strlen($allowed);
$codes = array();
// store all possibilities in a huge array
for ($i = 0; $i < $length; $i++)
    for ($j = 0; $j < $length; $j++)
        for ($k = 0; $k < $length; $k++)
            $codes[] = substr($allowed, $i, 1).substr($allowed, $j, 1).substr($allowed, $k, 1);
// collect the codes already used in this country
$used = array();
$query = mysql_query("SELECT code FROM location WHERE country = '$country'");
while ($result = mysql_fetch_array($query))
    $used[] = $result['code'];
// array_diff keeps the original keys, so take the first remaining element
// with reset() instead of assuming index 0 still exists
$remaining = array_diff($codes, $used);
$code = reset($remaining);
Thanks for your opinion, this will be the key to transport codes all over the world :)