I have two queries which look like this:
source="/log/ABCD/cABCDXYZ/xyz.log" doSomeTasks| timechart partial=f span=1h count as "#XYZ doSomeTasks" | fillnull
source="/log/ABCD/cABCDXYZ/xyz.log" doOtherTasks| timechart partial=f span=1h count as "#XYZ doOtherTasks" | fillnull
I now want to show these two searches in one graph (I do not want to sum the counts from the two searches into a single value).
I saw that appendcols can be used for this, but my attempts to use the command were not successful.
I tried this but it did not work:
source="/log/ABCD/cABCDXYZ/xyz.log" doSomeTasks|timechart partial=f span=1h count as "#XYZ doSomeTasks" appendcols [doOtherTasks| timechart partial=f span=1h count as "#XYZ doOtherTasks" | fillnull]
Thanks to PM 77-1 the issue is solved.
This command works:
source="/log/ABCD/cABCDXYZ/xyz.log" doSomeTasks|timechart partial=f span=1h count as "#XYZ doSomeTasks" | appendcols[search source="/log/ABCD/cABCDXYZ/xyz.log" doOtherTasks| timechart partial=f span=1h count as "#XYZ doOtherTasks" | fillnull]
Note: The source can be left out of the second search, but the subsearch runs independently of the first one, so without it doOtherTasks is matched in every source searched by default rather than only in this one.
General solution
Generate each data column by using a subsearch query in the following form:
|appendcols[search (myquery) |timechart count]
Additional steps
The list of one-or-more query columns needs to be preceded by a generated column which establishes the timechart rows (and gives appendcols something to append to).
|makeresults |timechart count |eval count=0
Note: It isn't strictly required to start with a generated column, but I've found this to be a clean and robust approach. Notably, it avoids problems that may occur in the special case of "No results found", which can otherwise confuse the visualization rendering. It is also more uniform and, as a result, easier to work with.
Finally, specify each of the fields to be charted, with _time as the x-axis:
|fields _time, myvar1, myvar2, myvar3
Complete example
|makeresults |timechart span=5m count |eval count=0
|appendcols[search (myquery1) |timechart span=5m count as myvar1]
|appendcols[search (myquery2) |timechart span=5m count as myvar2]
|appendcols[search (myquery3) |timechart span=5m count as myvar3]
|fields _time, myvar1, myvar2, myvar3
Be careful to use the same span throughout.
Other hints
When comparing disparate data on the same chart, perhaps to evaluate their relative timing, it's common to have differences in type or scale that can render the overlaid result nearly useless. For cases like this, don't neglect the 'Log' format option for the Y-Axis.
In some cases, it may even be worthwhile to employ data hacks with eval to massage the values into a visually comparable state. For example, appending |eval myvar1=if(myvar1=0,0,1) after timechart count reduces each bucket to 0 or 1 (no events or at least one event). Here are some relevant docs:
Mathematical functions
Comparison and Conditional functions
I'm trying to aggregate the API logs based on the different endpoints I have. There are a total of 4 endpoints:
1: /v1/vehicle_locations
2: /v1/vehicle_locations/id
3: /v1/driver_locations
4: /v1/driver_locations/id
The way I'm currently doing this is:
_sourceCategory=production | keyvalue auto | where (path matches "/v1/driver_locations" OR path matches "/v1/driver_locations/*" or path matches "/v1/vehicle_locations" or path matches "/v1/vehicle_locations/*") | count by path
The problem with this is that while I get the correct aggregate for /v1/vehicle_locations and /v1/driver_locations, I get individual results for /v1/driver_locations/id and /v1/vehicle_locations/id since the id is a wildcard. Is there a way I can aggregate these wildcards as well?
There are several ways to achieve what you ask. I think the most straightforward one, and the one I would suggest, is to use the parse operator so that you can treat the top-most element of your path as a field, e.g.
| keyvalue auto
| parse field=path "*/*" as topmost, rest
| where (topmost = "vehicle_locations" or topmost = "driver_locations")
| count by topmost
Note that by default the parse operator works on the raw message (i.e. the original log line), but you can make it parse a field instead by using the field= syntax, which is what is used above.
You might want to tweak the parse expression or use a regex depending on the actual paths you encounter.
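To make the regex idea concrete, here is a minimal Python sketch of the normalization step (it is not Sumo Logic query syntax). It collapses any single id segment into a fixed /id placeholder so that all four endpoints aggregate cleanly. The pattern, the helper names normalize_path and count_by_endpoint, and the sample paths are assumptions for illustration only.

```python
import re
from collections import Counter

# Hypothetical pattern: one of the two collections, optionally followed by one id segment.
ENDPOINT = re.compile(r"^/v1/(vehicle_locations|driver_locations)(/[^/]+)?/?$")


def normalize_path(path):
    """Map a concrete path to its endpoint template, or None if it is not one of the four."""
    match = ENDPOINT.match(path)
    if match is None:
        return None
    suffix = "/id" if match.group(2) else ""
    return "/v1/" + match.group(1) + suffix


def count_by_endpoint(paths):
    """Count requests per endpoint template, ignoring unrelated paths."""
    counts = Counter()
    for path in paths:
        key = normalize_path(path)
        if key is not None:
            counts[key] += 1
    return counts


# Made-up sample data standing in for the parsed path field.
sample = [
    "/v1/vehicle_locations",
    "/v1/vehicle_locations/42",
    "/v1/vehicle_locations/7",
    "/v1/driver_locations/abc",
    "/v1/other",
]
counts = count_by_endpoint(sample)
```

The same idea carries over to a query language: extract the collection name and whether an id segment is present, then count by that pair instead of by the raw path.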
(Disclaimer: I am currently employed by Sumo Logic)
I would like to collect a list of atoms and pass a list to an object or abstraction that will pass through the matching atoms without modifying the order of the list, or removing duplicates.
(hello how are you)
[desiredobject how are you]
Ideally this would print (how are you). If I were to put in (how how how) I would get back the same message. But if I were to put in (jfj jfj jfj) I would get nothing.
[zl] is useful but I am looking for the inverse behaviour of [zl filter].
I came up with the following solution, which works as well as the solution #mattijs posted. My solution uses [uzi] to give indices of the symbols in a list. The indices output of [zl filter] is fed to [zl unique] in order to strip out undesired indices. This new list is fed to [zl lookup] in order to convert back to the symbols. The (fake) message is inserted in case [zl filter] filters everything, in which case [zl unique] would have no output.
So basically you want to compare each item in list 1 to each item in list 2 and only pass through an item in list 1 if it is also in list 2?
It's possible this can be simplified btw, just a quick solution:
Hope that helps,
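For reference, the behaviour asked for in the question (keep every incoming atom that also appears in the argument list, preserving order and duplicates) can be written down outside Max in a few lines. This is only a Python sketch of the intended semantics, not a Max patch; the function name pass_matching is made up for illustration.

```python
def pass_matching(incoming, allowed):
    """Return the atoms of `incoming` that also occur in `allowed`.

    Order is preserved and duplicates are kept, so the output is the
    incoming list with the non-matching atoms dropped.
    """
    allowed_set = set(allowed)  # membership test only; order comes from `incoming`
    return [atom for atom in incoming if atom in allowed_set]


allowed = ["how", "are", "you"]
first = pass_matching("hello how are you".split(), allowed)
second = pass_matching("how how how".split(), allowed)
third = pass_matching("jfj jfj jfj".split(), allowed)
```

An empty result corresponds to the "I would get nothing" case from the question.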
I'm new to pandas and trying to learn how to work with it. I'm having a problem applying an example I saw in one of Wes's videos and notebooks to my own data. I have a csv file that looks like this:
I load it into a DataFrame and then group it by "filePath" and "vp". The code is:
res = df.groupby(['filePath','vp']).size()
and the output is:
[E:\Audio\7168965711_5601_4.wav Cust_2102513187,
Cust_4062144116, Cust_5105831247,
Cust_5753907026, Cust_6073165338,
Cust_6625625104, Cust_7023544759,
Cust_7403410322, Cust_9513082770,
Cust_9513243289, Cust_9702229339,
Cust_9702445777, Cust_9708568031,
Now I'm trying to access the index like a dict, as I saw in examples, but when I do
I get an error:
KeyError: 'Cust_4062144116'
I do get a result when I use the file path, but as I understand it, and as I saw in previous examples, I should be able to use the vp keys as well. Isn't that so?
Sorry if it's a trivial one, I just can't understand why it works in one example but not in the other.
Rutger, you are not correct. It is possible to partially index a MultiIndex Series. I simply did it the wrong way.
The first index level is the file name (e.g. E:\Audio\7168965711_5601_4.wav above) and the second level is vp. That means that for each file name I have multiple vps.
Now, this is correct:
and will return:
Cust_2102513187 2
Cust_4062144116 8
but trying to index by the inner index (the Cust_ indexes) will fail.
You group by two columns and therefore get a MultiIndex in return. This means you also have to slice using those two columns, not with a single index value.
Your .size() on the groupby object converts it into a Series. If you force it into a DataFrame you can use the .xs method to slice a single level:
res = pd.DataFrame(df.groupby(['filePath','vp']).size())
res.xs('Cust_4062144116', level=1)
That works. If you want to keep it as a series, boolean indexing can help, something like:
res[res.index.get_level_values(1) == 'Cust_4062144116']
The last option is a bit less readable, but sometimes also more flexible. For example, you could test for multiple values at once:
res[res.index.get_level_values(1).isin(['Cust_4062144116', 'Cust_6073165338'])]
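The options above can be tried end to end on a small example. This is a minimal sketch with made-up rows in place of the original CSV (the file names and Cust_ values are placeholders). It also assumes a reasonably recent pandas, where a Series has .xs as well, so the conversion to a DataFrame is not needed.

```python
import pandas as pd

# Made-up rows standing in for the CSV; only the two grouping columns matter here.
df = pd.DataFrame({
    "filePath": ["a.wav", "a.wav", "a.wav", "b.wav", "b.wav"],
    "vp": ["Cust_1", "Cust_1", "Cust_2", "Cust_2", "Cust_3"],
})

# Series with a two-level MultiIndex (filePath, vp)
res = df.groupby(["filePath", "vp"]).size()

# Outer level: plain indexing works and returns the inner level as the index.
by_file = res["a.wav"]

# Inner level: take a cross-section by level name; the result is indexed by filePath.
by_vp = res.xs("Cust_2", level="vp")

# Inner level via a boolean mask: same rows, but both index levels are kept.
mask = res.index.get_level_values("vp") == "Cust_2"
by_vp_full = res[mask]
```

Indexing with res["Cust_2"] directly raises a KeyError, because a bare key is looked up in the outer level only.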
I did such search,
` Comment.search "aabbb "`
and I also want to get the results which contain "ab".
So I did it this way:
` Comment.search "aabbb ab"`
but I found that the results for aabbb and ab are mixed. What I want is for the results that match aabbb to be shown before those that match ab, in other words to have a higher priority.
I know Sphinx can add weights to the fields of a table, for example 10 for a comment's name and 20 for a comment's content. But is it possible to add weights to the query words?
Unfortunately this is not possible with Sphinx yet, but you can get similar behavior for a query by repeating the keyword you want to weight.
For example:
"aabbb | aabbb | ab"
Here aabbb counts twice as much as ab.
Sphinx has no ability to weight certain search phrases, I'm afraid - so what you're trying to do is not possible.
It's also worth noting that Sphinx uses AND logic by default - if you want results that match either aabbb OR ab, you'll probably want to use the :any match mode:
Comment.search "aabbb ab", :match_mode => :any
I'm making a code generation script for UN/LOCODE system and the database has unique 3 letter/number codes in every country. So for example the database contains "EE TLL", EE being the country (Estonia) and TLL the unique code inside Estonia, "AR TLL" can also exist (the country code and the 3 letter/number code are stored separately). Codes are in capital letters.
The database is fairly big and already contains a huge number of locations. The user can also enter the 3 letter/number code him/herself (it is checked against the database automatically before submission).
Finally, neither 0 nor 1 may be used (possible confusion with O and I).
What I'm searching for is the most efficient way to pick the next available code when none is provided.
What I've come up with:
1. Check every code in order from AAA to 999, but that would require a new query for each code (slow?).
2. Store all the 40000 possibilities in an array and subtract all the used codes that are already in the database... but that uses too much memory IMO (not sure what I'm talking about here actually, maybe 40000 isn't such a big number).
3. Generate a random code, check whether it already exists, and start over if it does. That's just risk taking.
Is there some magic MySQL query/PHP script that can get me the next available code?
I will go with number 2, it is simple and 40000 is not a big number.
To make it more efficient, you can store a number representing each 3-letter code. The conversion should be trivial because you have a total of 34 characters (A-Z, 2-9).
I would go for option 1 (i.e. do a sequential search), adding a table that gives the last assigned code per country (i.e. such that AAA..code are all assigned already). When assigning a new code through sequential scan, that table gets updated; for user-assigned codes, it remains unmodified.
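As a rough illustration of option 2, here is a minimal Python sketch (not the PHP/MySQL code the question asks about). It walks the 34 x 34 x 34 = 39304 candidates in order and returns the first one that is not in the set of used codes. The name next_available and the idea of passing the used codes in as a plain list are assumptions. Unlike the array approach it generates candidates lazily instead of storing them all.

```python
from itertools import product

# 34 allowed characters: A-Z followed by 2-9 (0 and 1 are excluded).
ALLOWED = "ABCDEFGHIJKLMNOPQRSTUVWXYZ23456789"


def next_available(used):
    """Return the first 3-character code (AAA, AAB, ..., 999) not in `used`.

    Returns None when every code is already taken.
    """
    used_set = set(used)  # one query's worth of codes, looked up in O(1)
    for chars in product(ALLOWED, repeat=3):
        code = "".join(chars)
        if code not in used_set:
            return code
    return None


example = next_available(["AAA", "AAB", "AAD"])
```

Only one database query per country is needed to fill `used`; the rest happens in memory.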
If you don't want to issue repeated queries, you can also write this scan as a stored routine.
To simplify iteration, it might be better to treat the three-letter codes as numbers (as Shawn Hsiao suggests), i.e. give a meaning to A-Z = 0..25, and 2..9 = 26..33. Then, XYZ is the number X*34^2+Y*34+Z == 23*1156+24*34+25 == 27429. This should be doable using standard MySQL functions, in particular using CONV.
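The base-34 mapping described above can be checked with a short Python sketch. The function names code_to_number and number_to_code are made up for illustration; the digit order (A-Z = 0..25, 2-9 = 26..33) is the one given in the paragraph.

```python
# Digit order from the text: A-Z map to 0..25, 2-9 map to 26..33.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ23456789"
BASE = len(ALPHABET)  # 34


def code_to_number(code):
    """Interpret a code such as 'XYZ' as a base-34 number."""
    number = 0
    for ch in code:
        # str.index raises ValueError for characters outside the alphabet (e.g. '0').
        number = number * BASE + ALPHABET.index(ch)
    return number


def number_to_code(number, width=3):
    """Convert a number back to a fixed-width base-34 code."""
    chars = []
    for _ in range(width):
        number, digit = divmod(number, BASE)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


xyz = code_to_number("XYZ")                         # 23*34**2 + 24*34 + 25
following = number_to_code(code_to_number("AAZ") + 1)  # the code right after AAZ
```

With this mapping, "the next code" is simply the current number plus one, which makes a sequential scan easy to express.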
I went with the 2nd option. I was also able to make a script that tries to match the location name as closely as possible. For example, for Tartu it will try to match T** then TA* and, if possible, TAR; if not, it will try TAT, as T is the next letter after R in Tartu.
The code is quite extensive, I'll just post the part that takes the first possible code:
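$allowed = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ23456789'; // assumed definition: A-Z and 2-9, no 0 or 1
$length = strlen($allowed);
$codes = array();
// store all possibilities in a huge array
for ($i = 0; $i < $length; $i++)
    for ($j = 0; $j < $length; $j++)
        for ($k = 0; $k < $length; $k++)
            $codes[] = substr($allowed, $i, 1).substr($allowed, $j, 1).substr($allowed, $k, 1);
$used = array();
$query = mysql_query("SELECT code FROM location WHERE country = '$country'");
while ($result = mysql_fetch_array($query))
    $used[] = $result['code'];
$remaining = array_diff($codes, $used);
// array_diff() preserves keys, so index 0 is missing once AAA is used; reset() returns the first remaining element
$code = reset($remaining);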
Thanks for your opinion, this will be the key to transport codes all over the world :)