Issues with "QUERY(IMPORTRANGE)" - import

Here's my first question on this forum, though I've read through a lot of good answers here.
Can anyone tell me what I'm doing wrong with my attempt to do a query import from one sheet to a column in another?
Here's the formula I've tried, but all my adjustments still get me a parsing error.
=QUERY(IMPORTRANGE("https://docs.google.com/spreadsheets/d/1yGPdI0eBRNltMQ3Wr8E2cw-wNlysZd-XY3mtAnEyLLY/edit#gid=163356401","Master Treatment Log (Responses)!V2:V")"WHERE Col8="'&B2&'")")

Note that importrange is only needed for imports between spreadsheets. If you only import from one sheet into another within the same spreadsheet, I would suggest using filter() or query().
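For example, within the same spreadsheet a filter() version could look something like this (just a sketch; I'm assuming the data sits on a sheet named Master Treatment Log (Responses) and that Col8 corresponds to its column H):
=FILTER('Master Treatment Log (Responses)'!V2:V, 'Master Treatment Log (Responses)'!H2:H = B2)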
Assuming the value in B2 is actually a string (and not a number), you can try
=QUERY(IMPORTRANGE("https://docs.google.com/spreadsheets/d/1yGPdI0eBRNltMQ3Wr8E2cw-wNlysZd-XY3mtAnEyLLY/edit#gid=163356401","Master Treatment Log (Responses)!V2:V"), "WHERE Col8='"&B2&"'", 0)
Note the added comma before "WHERE" and the placement of the single and double quotes around &B2&. If you want to import a header row, change 0 to 1.
See if that helps? If not, please share a copy of your spreadsheet (sensitive data erased).

Pyspark adding columns to existing dataframe

I am trying to add multiple columns to the right. How can I do that?
Attributes = ["RequestTypePesId","AgentId","UpdatedBy","CauseType","OriginatingSystem"]
for i in Attributes:
    a = df2load.select(i).distinct()
    b = a.join(b, a.select(i) == b.select(i), "fullouter")
The expected output was shown as an image in the original post (not reproduced here).
Check out this example:
https://stackoverflow.com/a/71966176/9658895
and since you’re new, you might find this article useful:
“Hello World” of PySpark for Python & Pandas User [Pandas Vs PySpark]
Lastly, before you ask your next question, please make sure that you have followed the standard protocol for asking a question. This video might help you.
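In case the links above go stale, here is a rough, untested sketch of one way to read the question: line up the distinct values of each attribute side by side via full outer joins on a row number. df2load is your existing DataFrame; the _rn helper column and this overall interpretation are my assumptions.

from pyspark.sql import functions as F
from pyspark.sql.window import Window

attributes = ["RequestTypePesId", "AgentId", "UpdatedBy", "CauseType", "OriginatingSystem"]

wide = None
for col_name in attributes:
    # distinct values of one attribute, numbered so the columns can be aligned
    # (Window.orderBy without partitionBy warns, but is fine for small distinct sets)
    col_vals = (
        df2load.select(col_name).distinct()
        .withColumn("_rn", F.row_number().over(Window.orderBy(col_name)))
    )
    wide = col_vals if wide is None else wide.join(col_vals, on="_rn", how="full_outer")

wide.orderBy("_rn").drop("_rn").show()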

Grok filter for a time counter HH:MM

I'm quite new to ELK and Grok-filtering, and I'm struggling with parsing this particular pattern in my grok filter.
I've used the grok debugger to try and solve this, but although I like the tool, I just get confused by the custom patterns.
Eventually, I hope to parse lots of log files sent by filebeat to logstash, then send the parsed logs to elasticsearch and display with kibana or some similar visualization tool.
The lines that I need to parse follow the following pattern:
1310 2017-01-01 16:48:54 [325:51] [326:49] [359:57] Some log info text
The first four digits are a log type identifier, and will be used for grouping. I've called the field "LogLineID".
The date is formatted YYYY-MM-DD HH:MM:SS, and is parsed ok. I called the field "LogDate".
But now the problem begins. Within the square brackets, I have counters, formatted as MM:SS if you like. I cannot for the life of me find a way to sort these out, but I need to compare these times, hence I want to store them as minutes and seconds, not just numbers.
The first is a counter "TimeSpent",
the second is a counter "TimeStarted" and
the third is a counter "TimeSinceDown".
Then, last, comes the info text, which I've managed to grok with simply applying %{GREEDYDATA:LogInfo}.
I notice that the number of minutes can be far higher than the standard 60 minutes within an hour, so I may be barking up the wrong tree trying to parse it with date patterns such as TIMESTAMP_ISO8601, but then, I don't really know how else to do this.
So, I came this far:
%{NUMBER:LogLineID} %{TIMESTAMP_ISO8601:LogDate}
and, as mentioned, I was able (by cutting away the square-bracket parts) to parse the log info text with
%{GREEDYDATA:LogInfo}
to create a field LogInfo.
But that's where I'm stuck. Could someone please help me figure out the rest?
Massive thanks in advance.
PS! I also found %{NUMBER:duration}, but as far as I could tell it only parses timestamps with a dot, not a colon.
A grok regex expression can help you solve the problem.
But first I want to make sure: do you mean that [325:51] [326:49] [359:57] are the three components you want to fetch, and that it should return a result like:
TimeSpent: 325:51
TimeStarted: 326:49
TimeSinceDown: 359:57
If I've understood that correctly, you can use one of the following suggestions:
define your own custom pattern file and add the pattern to it, or
just use the expression in the filter part of your logstash conf file.
Hope it helps.
Ah, there was a space.. Actually, I was misleading myself and everybody in my question, as it was not actually that log line that was causing problems. I just took the first one, not realizing where the problem really was, but the one causing problems had a space within the brackets, like this: [ 42:31]. There are also some parts with two spaces, so the way I managed to solve this was to include a %{SPACE} between the \[ and the %{NUMBER}:
%{NUMBER:LogLineID} %{TIMESTAMP_ISO8601:LogDate} \[%{SPACE}%{NUMBER:TimeSpentMinutes}\:%{NUMBER:TimeSpentSeconds}\] \[%{SPACE}%{NUMBER:TimeStartedMinutes}\:%{NUMBER:TimeStartedSeconds}\] \[%{SPACE}%{NUMBER:TimeSinceDownMinutes}\:%{NUMBER:TimeSinceDownSeconds}\] %{GREEDYDATA:LogText}
I still haven't solved the merging of minutes and seconds, but this I can also handle in a later stage.
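For what it's worth, one possible way to do that merging later in the pipeline is a ruby filter in the logstash conf (an untested sketch; the *SecondsTotal field names are just placeholders I made up, the source fields come from the grok pattern above):
ruby {
  code => "
    # convert the MM:SS pairs from the grok pattern into total seconds
    event.set('TimeSpentSecondsTotal', event.get('TimeSpentMinutes').to_i * 60 + event.get('TimeSpentSeconds').to_i)
    event.set('TimeStartedSecondsTotal', event.get('TimeStartedMinutes').to_i * 60 + event.get('TimeStartedSeconds').to_i)
    event.set('TimeSinceDownSecondsTotal', event.get('TimeSinceDownMinutes').to_i * 60 + event.get('TimeSinceDownSeconds').to_i)
  "
}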
Thanks to Lin Don for showing an interest in my problem, and sorry for not replying sooner.
Hope the solution will help others (or even myself) if they're stuck on the same kind of problem.
Note to myself: Read the logs more carefully before grok'ing.. :)

Extract text from a Wordnik API URL in Google Sheets

Can anyone explain how to import/extract a particular field from the following url into Google Sheets:
Wordnik URL
I'm guessing there is an IMPORTXML query that could do it, but the page doesn't have the nodes that IMPORTXML usually relies on. Instead, the content looks like this:
[{"mi":6.720745180909532,"gram1":"pretty","gram2":"much","wlmi":18.953166108085608},{"mi":6.650496643050408,"gram1":"pretty","gram2":"good","wlmi":18.469078820531266},{"mi":9.839004198061549,"gram1":"pretty","gram2":"darn","wlmi":17.298435816698845},{"mi":7.515791105774376,"gram1":"pretty","gram2":"cool","wlmi":15.515791105774376},{"mi":8.233704272151307,"gram1":"pretty","gram2":"impressive","wlmi":15.210984195651225}]
So if cell A2 has the URL that returns this, how do I get B2 to give me the text after "gram2" (in this case, "good", "darn", "cool" and "impressive")?
Thanks
Tardy
I've come up with a workaround but it is kind of messy. I'd still like an answer to the question, but for reference and in case it's of use to anyone I'll post it here:
Using =IMPORTDATA(E10) (where E10 is the cell with the URL in it) gave me an array (I guess based on .csv principles) that I could then manipulate using some other tools, like REGEXREPLACE, to get at the relevant bits of text.
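Building on that workaround, something along these lines might pull all the gram2 values into one cell (an untested sketch; it assumes A2 holds the URL and that IMPORTDATA keeps each "gram2":"..." fragment recognisable when it splits the response on commas):
=TEXTJOIN(", ", TRUE, ARRAYFORMULA(IFERROR(REGEXEXTRACT(FLATTEN(IMPORTDATA(A2)), "gram2\D+(\w+)"))))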

dataFrame keying using pandas groupby method

I'm new to pandas and trying to learn how to work with it. I'm having a problem when trying to use an example I saw in one of Wes's videos and notebooks on my data. I have a csv file that looks like this:
filePath,vp,score
E:\Audio\7168965711_5601_4.wav,Cust_9709495726,-2
E:\Audio\7168965711_5601_4.wav,Cust_9708568031,-80
E:\Audio\7168965711_5601_4.wav,Cust_9702445777,-2
E:\Audio\7168965711_5601_4.wav,Cust_7023544759,-35
E:\Audio\7168965711_5601_4.wav,Cust_9702229339,-77
E:\Audio\7168965711_5601_4.wav,Cust_9513243289,25
E:\Audio\7168965711_5601_4.wav,Cust_2102513187,18
E:\Audio\7168965711_5601_4.wav,Cust_6625625104,-56
E:\Audio\7168965711_5601_4.wav,Cust_6073165338,-40
E:\Audio\7168965711_5601_4.wav,Cust_5105831247,-30
E:\Audio\7168965711_5601_4.wav,Cust_9513082770,-55
E:\Audio\7168965711_5601_4.wav,Cust_5753907026,-79
E:\Audio\7168965711_5601_4.wav,Cust_7403410322,11
E:\Audio\7168965711_5601_4.wav,Cust_4062144116,-70
I load it into a DataFrame and then group it by "filePath" and "vp". The code is:
res = df.groupby(['filePath','vp']).size()
res.index
and the output is:
[E:\Audio\7168965711_5601_4.wav Cust_2102513187,
Cust_4062144116, Cust_5105831247,
Cust_5753907026, Cust_6073165338,
Cust_6625625104, Cust_7023544759,
Cust_7403410322, Cust_9513082770,
Cust_9513243289, Cust_9702229339,
Cust_9702445777, Cust_9708568031,
Cust_9709495726]
Now I'm trying to access the index like a dict, as I saw in examples, but when I do
res['Cust_4062144116']
I get an error:
KeyError: 'Cust_4062144116'
I do succeed in getting a result when I use the filepath, but as I understand it, and saw in previous examples, I should be able to use the vp keys as well, shouldn't I?
Sorry if it's a trivial one, I just can't understand why it works in one example but not in the other.
Rutger, you are not correct. It is possible to partially index a MultiIndex Series. I simply did it the wrong way.
The first index level is the file name (e.g. E:\Audio\7168965711_5601_4.wav above) and the second level is vp. Meaning, for each file name I have multiple vps.
Now, this is correct:
res['E:\Audio\7168965711_5601_4.wav']
and will return:
Cust_2102513187 2
Cust_4062144116 8
....
but trying to index by the inner index (the Cust_ indexes) will fail.
You group by two columns and therefore get a MultiIndex in return. This means you also have to slice using those two columns, not with a single index value.
Your .size() on the groupby object converts it into a Series. If you force it into a DataFrame you can use the .xs method to slice a single level:
res = pd.DataFrame(df.groupby(['filePath','vp']).size())
res.xs('Cust_4062144116', level=1)
That works. If you want to keep it as a series, boolean indexing can help, something like:
res[res.index.get_level_values(1) == 'Cust_4062144116']
The last option is a bit less readable, but sometimes also more flexible; you could test for multiple values at once, for example:
res[res.index.get_level_values(1).isin(['Cust_4062144116', 'Cust_6073165338'])]
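For completeness, here is a small self-contained version of the whole flow with made-up toy data (just to illustrate the two ways of slicing; the column and level names match the csv above):

import pandas as pd

# toy data mirroring the csv structure above
df = pd.DataFrame({
    "filePath": [r"E:\Audio\7168965711_5601_4.wav"] * 4,
    "vp": ["Cust_9709495726", "Cust_9708568031", "Cust_4062144116", "Cust_4062144116"],
    "score": [-2, -80, -70, 11],
})

res = df.groupby(["filePath", "vp"]).size()

# slice on the inner ("vp") level of the MultiIndex
print(res.xs("Cust_4062144116", level="vp"))
print(res[res.index.get_level_values("vp") == "Cust_4062144116"])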

Crystal Formula to exclude if value does not contain a certain format

I'm trying to exclude any value from a certain field (table.value) that does not match this format: AA#####A. For example, if they entered APT12345T, PT12345PT, or no value, then I want to exclude it from the report. It needs to match the format of, for example, AP12345P. What selection formula can I use to accomplish this?
Any help is greatly appreciated.
Thanks in advance.
Try reading Crystal's help topics on the Mid() and IsNumeric() functions.
Here's an example from the help file:
The following example is applicable to both Basic and Crystal syntax:
Mid("abcdef", 3, 2)
Returns "cd".
So, in your case, you want to split your value into three pieces:
mid({table.value},1,2)
mid({table.value},3,5)
mid({table.value},8,1)
and build up a three-part boolean condition where:
the first piece is not numeric, or is between 'AA' and 'ZZ', or however else you want to test for letters,
the second piece is numeric, and
the third piece passes the same test as the first.
Where are you getting stuck?
something like this:
not isnumeric(mid({table.field},1,2)) and
isnumeric(mid({table.field},3,5)) and
not isnumeric(mid({table.field},8,1))
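And if you want the letter positions checked strictly (plain "not isnumeric" also lets through values like "A1" in the first two positions), a stricter variant of the same idea, using the 'between AA and ZZ' style test mentioned above plus a length guard, could look like this (untested sketch in Crystal syntax):
Length({table.field}) = 8 and
Mid({table.field}, 1, 2) in "AA" to "ZZ" and
IsNumeric(Mid({table.field}, 3, 5)) and
Mid({table.field}, 8, 1) in "A" to "Z"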