Looking for a way to calculate time lapse in openrefine - date

This is the signature given for the GREL function in OpenRefine:
diff(date d1, date d2, optional string timeUnit)
For dates, returns the difference in given time units.
So the question is how to access the values of both columns, which the documentation does not make clear.
Thanks

The formula for accessing another column is:
cells.YourColumnName.value
If your column name contains spaces or non-ASCII characters:
cells['Your Column Name'].value
So, assuming your two columns are named "date1" and "date2", and you want the difference in days, the GREL formula is as follows:
diff(cells.date1.value, cells.date2.value, "days")
or
diff(cells['date1'].value, cells['date2'].value, "days")

I found a way myself. Here is an example of the working command; the GREL documentation is not very explicit about this procedure.
Here is the command I used. I multiplied the result by -1 to make it positive.
diff(cells["DATA_COMPRA"].value, cells["DATA_VENCIMENTO"].value, "days") * -1
Hope that helps. I may have to come back here from time to time to get this script again.
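For readers who want to check the sign convention outside OpenRefine, here is a minimal Python sketch of the same calculation. The sample dates and the helper name days_between are made up for illustration. It assumes, based on the negative result described above, that the difference is taken as first argument minus second.

```python
from datetime import date

def days_between(d1: date, d2: date) -> int:
    """Signed difference d1 - d2 in whole days (hypothetical helper)."""
    return (d1 - d2).days

purchase = date(2022, 1, 5)   # stands in for DATA_COMPRA (made-up value)
due = date(2022, 2, 4)        # stands in for DATA_VENCIMENTO (made-up value)

signed = days_between(purchase, due)   # negative, because purchase precedes due
elapsed = abs(signed)                  # order-independent alternative to "* -1"
print(signed, elapsed)
```

Taking the absolute value rather than multiplying by -1 keeps the result positive even if some rows have the two dates in the opposite order.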

How to transform date in Stata?

I've looked for help on the internet for the following, but I could not find a satisfying answer: for an assignment, I need to plot the time series of a certain variable (the term spread in percentages), with years on the x-axis.
However, we use daily data. Does anybody know a convenient way in which this can be done? The 'date' variable that I've got is formulated in the following way: 20111017 represents the 17th of October 2011.
I tried to extract the first 4 numbers of the variable 'date', by using the substr(date, 1, 4) command, but the message 'type mismatch' popped up. Also, I'm not quite sure if it gives the right information if I only use the years to plot daily data (over the years). It now gives the following graph, which doesn't look that nice.
Answering the question in your title.
The date() function expects a string. If your variable with value 20111017 is in a numeric format you can convert it like this: tostring datenum , gen(datestr).
Then when using the date() function you must provide a mask that tells Stata what format the date string is in. Below is a reproducible example you can run to see how this works.
* Example generated by -dataex-. For more info, type help dataex
clear
input float datenum
20111016
end
* Convert numeric variable to string
tostring datenum , gen(datestr)
* Convert string to date
gen date = date(datestr, "YMD")
* Display date as date
format date %td
If this does not help you, try to provide a reproducible example.
This adds some details to the helpful answer by @TheIceBear.
As he indicates, one way to get a Stata daily date from your run-together date variable is to convert it to a string first. But tostring is just one way to do that and not essential. (I have nothing against tostring, as its original author, but it is better suited to other tasks.)
Here I use daily() not date(): the results are identical, but it's a good idea to use daily(): date() is all too often misunderstood as a generic date function, whereas all it does is produce daily dates (or missings).
To get a numeric year variable, just divide by 10000 and round down. You could convert to a string, extract the first 4 characters, and then convert to numeric, but that's more operations.
clear
set obs 1
gen long date = 20111017
format date %8.0f
gen ddate = daily(strofreal(date, "%8.0f"), "YMD")
format %td ddate
gen year = floor(date/10000)
list
     +-----------------------------+
     |     date       ddate   year |
     |-----------------------------|
  1. | 20111017   17oct2011   2011 |
     +-----------------------------+
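The two routes described above (going through a string to get a real date, and plain arithmetic to get the year) can be sketched in Python for comparison. This is only an illustration of the logic, not Stata code; the variable names are arbitrary.

```python
from datetime import datetime

raw = 20111017  # run-together numeric date, as in the question

# Route 1: convert to a string, then parse with an explicit year-month-day mask
parsed = datetime.strptime(str(raw), "%Y%m%d").date()

# Route 2: plain arithmetic for the year, no string conversion needed
year = raw // 10000   # integer division plays the role of floor(date/10000)

print(parsed.isoformat(), year)
```

For plotting daily data, the parsed date is the value to put on the x-axis; the year on its own collapses all observations within a year onto one point.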

Azure Data Factory - Dynamic Skip Lines Expression

I am attempting to import a CSV into ADF however the file header is not the first line of the file. It is dynamic therefore I need to match it based on the first column (e.g "TestID,") which is a string.
Example Data (Header is on Line 4)
Date:,01/05/2022
Time:,00:30:25
Test Temperature:,25C
TestID,StartTime,EndTime,Result
TID12345-01,00:45:30,00:47:12,Pass
TID12345-02,00:46:50,00:49:12,Fail
TID12345-03,00:48:20,00:52:17,Pass
TID12345-04,00:49:12,00:49:45,Pass
TID12345-05,00:50:22,00:51:55,Fail
I found this article which addresses this issue however I am struggling to rewrite the expression from using an integer to using a string.
https://kromerbigdata.com/2019/09/28/adf-dynamic-skip-lines-find-data-with-variable-headers
First Expression
iif(!isNull(toInteger(left(toString(byPosition(1)),1))),toInteger(rownum),toInteger(0))
As the article states, this expression looks at the first character of each row and, if it is an integer, returns the row number (rownum).
How do I perform this action for a string (e.g. "TestID,")?
Many Thanks
Jonny
I think you want to treat the first line that starts with a string as your header, while preceding lines that start with numbers should not be treated as the header. You can use the isNan function to check whether the first character is not a number (i.e. a string), as in the modified expression below:
iif(isNan(left(toString(byPosition(1)),1))
,toInteger(rownum)
,toInteger(0)
)
The following is a breakdown of the above expression:
left(toString(byPosition(1)),1): gets the first character from the left side of the first column.
isNan: checks whether that character is "not a number".
iif: if the character is not a number, returns rownum; otherwise returns 0.
Alternatively, you can use functions like isInteger() to check whether the first character is an integer and act accordingly.
Later on, as explained in the cited article, you need to find the minimum rownum to know how many lines to skip.
Hope it helps.
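To make the "find the first header row, then skip everything before it" idea concrete, here is a small Python sketch that runs on the sample data from the question. It is not an ADF expression. It matches the first column against the literal header name "TestID" (an assumption taken from the question) instead of testing whether the first character is a number, and the function name find_header_index is made up.

```python
import csv
import io

sample = """Date:,01/05/2022
Time:,00:30:25
Test Temperature:,25C
TestID,StartTime,EndTime,Result
TID12345-01,00:45:30,00:47:12,Pass
TID12345-02,00:46:50,00:49:12,Fail
TID12345-03,00:48:20,00:52:17,Pass
TID12345-04,00:49:12,00:49:45,Pass
TID12345-05,00:50:22,00:51:55,Fail
"""

def find_header_index(lines, marker="TestID"):
    """Return the 0-based index of the first line whose first column equals marker."""
    for index, line in enumerate(lines):
        if line.split(",", 1)[0] == marker:
            return index
    raise ValueError(f"no line starts with {marker!r}")

lines = sample.splitlines()
skip = find_header_index(lines)          # number of lines to skip before the header
rows = list(csv.DictReader(io.StringIO("\n".join(lines[skip:]))))
print(skip, len(rows), rows[0]["TestID"])
```

Returning the first match corresponds to taking the minimum matching row number in the data flow approach.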

How to produce a formatted date string in Q/KDB?

How can one produce an ISO date string "yyyy-MM-dd" from a Q date type? I looked at concatenating the various parts but am not even able to get the day/month, e.g. d:2015.12.01;d.month prints 2015.12, i.e. more than just the month.
If you plan to do it on a large scale (i.e. a large vector/list of dates or a column in a table) and you're sure your dates are always well-formed, then you could use a dot-amend:
q)update .[;(::;4 7);:;"-"]string date from ([] date:2#.z.D)
date
------------
"2016-01-04"
"2016-01-04"
This way you wouldn't have to apply to "each" entry of the vector/list, it works on the vector/list itself.
q)"-" sv "." vs string[2015.12.01]
"2015-12-01"
vs ("vector from scalar") splits the string on "." above;
sv ("scalar from vector") joins the pieces with "-" above.
Remember that a string is just a char array, so you can grab each part as you require with indexing. But the above is useful because vs gives you a 3-element vector that you can manipulate any way you like.
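The split-then-join idea is not specific to q. As a rough Python analogue (an illustration only, with arbitrary variable names), the dotted form is split on "." and the parts are joined back with "-", which also leaves the individual year, month and day pieces available:

```python
from datetime import date

d = date(2015, 12, 1)

dotted = d.strftime("%Y.%m.%d")   # mimics the q string form "2015.12.01"
parts = dotted.split(".")         # split on ".", analogous to vs
iso = "-".join(parts)             # join with "-", analogous to sv

year, month, day = (int(p) for p in parts)   # each part is available separately
print(iso, year, month, day)
```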
I believe the shortest (and cleanest) option for ISO8601 UTC timestamp available since at least kdb v3.4 would be to use .h.iso8601 builtin
i.e.
q).h.iso8601 .z.p
"2020-11-09T15:42:19.292301000"
Or, if you just need milliseconds similar to what JS toISOString() does, use:
q).isotime:{(23#.h.iso8601 x),"Z"}
q).isotime[.z.p]
"2020-11-09T16:02:02.601Z"
q).isotime[2015.12.01]
"2015-12-01T00:00:00.000Z"
Note that using .z.p (UTC) matters here: .h.iso8601 .z.P would silently give you local time with no timezone offset (+0100 etc.), so a compliant ISO 8601 parser would still interpret it as UTC :(
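The same two concerns (truncating to milliseconds, and making sure the value really is UTC before appending "Z") can be sketched in Python. The helper name iso_millis is made up, and the sketch assumes the input is a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone

def iso_millis(ts: datetime) -> str:
    """Format an aware datetime as UTC with millisecond precision and a 'Z' suffix."""
    utc = ts.astimezone(timezone.utc)            # convert to UTC before labelling as Z
    millis = utc.microsecond // 1000             # truncate, do not round
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"

utc_stamp = datetime(2020, 11, 9, 16, 2, 2, 601234, tzinfo=timezone.utc)
local_stamp = datetime(2020, 11, 9, 17, 2, 2, 601234,
                       tzinfo=timezone(timedelta(hours=1)))  # same instant at +01:00

print(iso_millis(utc_stamp))
print(iso_millis(local_stamp))
```

Both calls produce the same string because the local value is converted to UTC first, which is exactly the step that is lost when a local time is formatted without its offset.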
Check out this GitHub library for datetime formatting. It supports Excel-style date and time format strings. It might not be the right fit for formatting a large number of objects.
q).dtf.format["yyyy-mm-dd"; 2018.06.08T01:02:03.456]
"2018-06-08"
time formatting :
q).dtf.format["yyyy-mmmm-dd hh:uu AM/PM"; 2018.01.08T01:02:03.456]
"2018-January-08 01:02 AM"
I am using something like this:
q)ymd:{[x;s](4#d),s,(2#-5#d),s,-2#d:string[x]}
q)ymd[.z.D;"-"]
"2016-01-25"
q)ymd[.z.D;"/"]
"2016/01/25"
q)ymd[.z.D;""]
"20160125"
Or for tables:
q)t:([]a:5#1;5#.z.d)
q)update s:ymd[;"-"] each d from t
a d s
-------------------------
1 2016.01.26 "2016-01-26"
1 2016.01.26 "2016-01-26"
1 2016.01.26 "2016-01-26"
1 2016.01.26 "2016-01-26"
1 2016.01.26 "2016-01-26"
Change the separator (e.g. - or /) in the update statement as needed:
update s:{ssr[string x;".";y]}'[d;"-"] from ([]a:5#1;5?.z.d)
a d s
-------------------------
1 2010.12.31 "2010-12-31"
1 2012.08.24 "2012-08-24"
1 2004.12.05 "2004-12-05"
1 2000.10.02 "2000-10-02"
1 2006.09.10 "2006-09-10"

Checking the format of a string in Matlab

So I'm reading multiple text files in Matlab whose first column is a column of "times". These times are either in the format 'MM:SS.milliseconds' (sorry if that's not the proper way to express it), where for example the string '29:59.9' would be (29*60)+(59)+(.9) = 1799.9 seconds, or in the format of straight seconds.milliseconds, where '29.9' would mean 29.9 seconds. The format is the same within a single file, but varies across files. Since I would like the times to be in the second format, I would like to check whether the strings match the first format. If they do, convert them; otherwise, continue. The code below is my code to convert, so my question is how to approach checking the format of the string. In other words, I need some condition for an if statement that checks which format the string is in.
%% Modify the textdata to convert time to seconds
timearray = textdata(2:end, 1);
if (timearray(1, 1) %{has format 'MM.SS.millisecond}%)
datev = datevec(timearray);
newtime = (datev(:, 5)*60) + (datev(:, 6));
elseif(timearray(1, 1) %{has format 'SS.millisecond}%)
newtime = timearray;
You can use regular expressions to help you out. Regular expressions are methods of specifying how to search for particular patterns in strings. As such, you want to find if a string follows the formats of either:
xx:xx.x
or:
xx.x
The regular expression syntax for each of these is defined as the following:
^[0-9]+:[0-9]+\.[0-9]+
^[0-9]+\.[0-9]+
Let's step through how each of these work.
For the first one, the ^[0-9]+ means that the string should start with any number (^[0-9]) and the + means that there should be at least one number. As such, 1, 2, ... 10, ... 20, ... etc. is valid syntax for this beginning. After the number should be separated by a :, followed by another sequence of numbers of at least one or more. After, there is a . that separates them, then this is followed by another sequence of numbers. Notice how I used \. to specify the . character. Using . by itself means that the character is a wildcard. This is obviously not what you want, so if you want to specify the actual . character, you need to prepend a \ to the ..
For the second one, it's almost the same as the first one. However, there is no : delimiter, and we only have the . to work with.
To invoke regular expressions, use the regexp command in MATLAB. It is done using:
ind = regexp(str, expression);
str represents the string you want to check, and expression is a regular expression like the ones we talked about above. You need to make sure you enclose your expression in single quotes, since the regular expression is taken in as a string. ind would then be the starting index in your string where the match was found. As such, when we search for a particular format, ind should either be 1, indicating that we found the pattern at the beginning of the string, or empty ([]) if no match was found. Here's a reproducible example for you:
B = {'29:59.9', '29.9', '45:56.8', '24.5'};
for k = 1 : numel(B)
if (regexp(B{k}, '^[0-9]+:[0-9]+\.[0-9]+') == 1)
disp('I''m the first case!');
elseif (regexp(B{k}, '^[0-9]+\.[0-9]+') == 1)
disp('I''m the second case!');
end
end
The code should print out I'm the first case! if the string follows the first format, and I'm the second case! if it follows the second format. By running this code, we get:
I'm the first case!
I'm the second case!
I'm the first case!
I'm the second case!
Without knowing how your strings are formatted, I can't do the rest of it for you, but this should be a good start for you.
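As a rough illustration of how the format check and the conversion could fit together, here is the same idea in Python using the patterns from above. Two things are assumptions of this sketch and not part of the answer: the patterns are also anchored at the end of the string with $, and the function name to_seconds is made up.

```python
import re

# Same patterns as above, additionally anchored at the end of the string
MIN_SEC = re.compile(r"^([0-9]+):([0-9]+\.[0-9]+)$")   # e.g. 29:59.9
SEC_ONLY = re.compile(r"^[0-9]+\.[0-9]+$")             # e.g. 29.9

def to_seconds(text: str) -> float:
    """Convert 'MM:SS.fraction' or 'SS.fraction' to seconds."""
    match = MIN_SEC.match(text)
    if match:
        return int(match.group(1)) * 60 + float(match.group(2))
    if SEC_ONLY.match(text):
        return float(text)
    raise ValueError(f"unrecognised time format: {text!r}")

times = ["29:59.9", "29.9", "45:56.8", "24.5"]
seconds = [to_seconds(t) for t in times]
print([round(s, 1) for s in seconds])
```

Since the format is constant within a file, checking only the first entry and then converting the whole column in one go would also work.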

Crystal report issue with int to string conversion

I want to convert int to string and then concatenate dot with it. Here is the formula
totext({#SrNo})+ "."
It works, but not the way I want. I want to show it as
1.
but it shows up like this:
1.00.
This means that when I convert the int to a string, it is converted to a number with two decimal places. Can someone tell me how I can show it in the proper format? For information, SrNo is a running total.
You can use the ToText(x, y, z, w) function, where:
x=The number to convert to text
y=The number of decimal places to include in result (optional). The value will be rounded to that decimal place.
z=The character to use as the thousands separator. If you don’t specify one, it will use your application default. (Optional.)
w=The character to use as the decimal separator. If you don’t specify one, it will use your application default. (Optional.)
Examples
ToText(12345.678) => "12345.678"
ToText(12345.678,2) => "12345.67"
ToText(12345.678,0) => "12345"
You can try this :
totext({fieldname},0)
Oh, I got the answer; it was simple.
totext takes up to 4 parameters:
The first parameter is the value to be converted.
The second parameter is the number of decimal places.
The third parameter is the thousands separator; in 1,432.12 the comma (,) is the thousands separator.
The fourth parameter is the decimal separator; in 1,432.12 the dot (.) is the decimal separator.
Example{
totext(1432.1234, 2) results 1,432.12
totext(1432.1234, 2, ':') results 1:432.12
totext(1432.1234, 2, ':', ',') results 1:432,12
}
This example may not be the best, but I just want to give the idea. That is for number conversion; for dates, totext takes 2 parameters:
the value to be converted and the date format.
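To see how the number of decimal places and the two separators interact, here is a small Python stand-in with the same four-parameter shape (value, decimals, thousands separator, decimal separator). It is not Crystal Reports code. The function name to_text and its defaults are assumptions, chosen so that the default call reproduces the 1.00 seen in the question; the real defaults depend on the application settings.

```python
def to_text(value: float, decimals: int = 2,
            thousands: str = ",", decimal: str = ".") -> str:
    """Hypothetical stand-in: format a number with chosen decimals and separators."""
    formatted = f"{value:,.{decimals}f}"   # Python's own separators: "," and "."
    # Swap separators through a placeholder so the two replacements cannot collide
    return (formatted.replace(",", "\0")
                     .replace(".", decimal)
                     .replace("\0", thousands))

serial = 1
default_label = to_text(serial) + "."      # two decimals by default
wanted_label = to_text(serial, 0) + "."    # zero decimals, then the trailing dot
print(default_label, wanted_label)
print(to_text(1432.1234, 2, ":", ","))
```

Passing 0 as the second argument is what removes the unwanted decimals; the separators only matter for larger or fractional values.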