boto3 - losing date format

I'm trying to read a parquet file using boto3. The original file has dates with the following format:
2016-12-07 23:00:00.000
And they are stored as timestamps.
My code in SageMaker is:
import boto3
import pandas as pd
from io import StringIO

boto_s3 = boto3.client('s3')
r = boto_s3.select_object_content(
    Bucket='bucket_name',
    Key='path/file.gz.parquet',
    ExpressionType='SQL',
    Expression="select fecha_instalacion,pais from s3object s",
    InputSerialization={'Parquet': {}},
    OutputSerialization={'CSV': {}},
)
rl0 = list(r['Payload'])[0]
string_csv = rl0['Records']['Payload'].decode('ISO-8859-1')
csv = StringIO(string_csv)
pd.read_csv(csv, names=['fecha_instalacion', 'pais'])
But instead of the date I get:
fecha_instalacion pais
45352962065516692798029824 ESPAÃA
I looked for dates with only one day in between, and only the first 6 digits are the same. As an example:
45337153205849123712294912--> 2016-12-09 23:00:00.000
45337116312360976293191680--> 2016-12-07 23:00:00.000
I need to get the correctly formatted date and avoid the special characters.
Thanks.

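The problem is the format. That Parquet file uses Int96 numbers to represent timestamps: the bits above the lowest 64 hold the Julian day number, and the lowest 64 bits hold the nanoseconds within that day.
Here is a function to convert an Int96 timestamp to a Python datetime:
import datetime

def dateFromInt96Timestamp(int96Timestamp):
    # Bits above the lowest 64: Julian day number
    julianCalendarDays = int96Timestamp >> 8*8
    # Lowest 64 bits: nanoseconds within the day, converted to microseconds
    time = (int96Timestamp & 0xFFFFFFFFFFFFFFFF) // 1_000
    # Julian day number of 1970-01-01
    linuxEpoch = 2_440_588
    return datetime.datetime(1970, 1, 1) + datetime.timedelta(days=julianCalendarDays - linuxEpoch, microseconds=time)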
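As a rough sketch of how this conversion could be applied to the CSV text returned by S3 Select, the example below parses the timestamp column as a Python int (a float would drop the low-order digits) and decodes the payload as UTF-8, on the assumption that the garbled "ESPAÃA" comes from decoding UTF-8 bytes as ISO-8859-1. The names int96_to_datetime and parse_rows and the sample payload are made up for illustration.

```python
import csv
import datetime
import io

JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # Julian day number of 1970-01-01
MASK_64 = 0xFFFFFFFFFFFFFFFF


def int96_to_datetime(value):
    """Convert an Int96 timestamp (int or digit string) to a naive datetime."""
    value = int(value)  # int() keeps every digit; float() would round
    julian_day = value >> 64
    nanos_of_day = value & MASK_64
    return datetime.datetime(1970, 1, 1) + datetime.timedelta(
        days=julian_day - JULIAN_DAY_OF_UNIX_EPOCH,
        microseconds=nanos_of_day // 1_000,
    )


def parse_rows(payload_bytes):
    """Decode a two-column CSV payload as UTF-8 and convert the first column."""
    text = payload_bytes.decode("utf-8")
    return [(int96_to_datetime(ts), pais) for ts, pais in csv.reader(io.StringIO(text))]


# Hypothetical payload built for this demo: 2016-12-07 23:00:00 encoded as Int96.
days = (datetime.date(2016, 12, 7) - datetime.date(1970, 1, 1)).days
sample = ((days + JULIAN_DAY_OF_UNIX_EPOCH) << 64) | (23 * 3600 * 10**9)
payload = f"{sample},ESPAÑA\n".encode("utf-8")
rows = parse_rows(payload)
print(rows)
```

If pandas is preferred, reading the timestamp column as text first and then applying the conversion should avoid the same loss of precision.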

Related

read files from current date minus 90 days in spark

I am reading files one by one from a directory structure like YY=18/MM=12/DD=10, and I need to read only the files from the current date back 60 days. A file is created for every day, but on some days no file may be created, in which case the folder for that day will not exist either.
I wrote the code below, but it is not working.
var datecalculate = {
  var days = 0
  do {
    val start = DateTime.now
    var start1 = DateTime.now.minusDays(days)
    days = days + 1
    var start2 = start1.toString
    datecalculatenow(start2)
  } while (days <= 90)
}
def datecalculatenow(start2:String):String={
  var YY:String = start2.toString.substring(0,4)
  var MM:String = start2.toString.substring(5,7)
  var DD:String = start2.toString.substring(8,10)
  var datepath = "YYYY=" + YY +"/MM=" +MM +"/DD=" +DD
  var datepath1 = datepath.toString
  org.apache.spark.sql.SparkSession.read.option("delimiter","|").
    option("header","true").option("inferSchema","true").
    csv("/Table/Files" + datepath1 )
}
I expect to read every file from the current date back 60 days, stored in the YY/MM/DD directory structure.
With Spark SQL you can use the following in a select statement to subtract 90 days:
date_sub(CAST(current_timestamp() as DATE), 90)
Since a DataFrame can be created from a list of paths, why not generate the list of paths first? Here is a simple and concise way to read data from multiple paths:
// DateTime is assumed to be Joda-Time's, as in the question
import org.joda.time.DateTime

val paths = (0 until 90).map { days =>
  val tmpDate = DateTime.now.minusDays(days).toString()
  val year = tmpDate.substring(0, 4)
  val month = tmpDate.substring(5, 7)
  val opdate = tmpDate.substring(8, 10)
  s"basepath/YY=$year/MM=$month/DD=$opdate"
}.toList

val df = spark.read
  .option("delimiter", "|")
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(paths: _*)
While generating the paths, you can filter out the ones that do not exist. I've reused some of your code with a few modifications. I haven't tested it on my local setup, but the idea is the same. Hopefully it helps.
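The filtering step mentioned above could look roughly like the following Python sketch. It assumes the two-digit YY=18/MM=12/DD=10 layout from the question and a local file system check by default; the function name partition_paths, the exists parameter, and the demo paths are hypothetical, and HDFS or S3 would need a different existence check.

```python
import datetime
import os


def partition_paths(base, today, days_back, exists=os.path.isdir):
    """Return the YY=/MM=/DD= folders of the last `days_back` days that exist."""
    paths = []
    for offset in range(days_back):
        day = today - datetime.timedelta(days=offset)
        candidate = f"{base}/YY={day:%y}/MM={day:%m}/DD={day:%d}"
        if exists(candidate):  # skip days for which no folder was created
            paths.append(candidate)
    return paths


# Demo with a stand-in existence check instead of a real file system.
existing = {"/Table/Files/YY=18/MM=12/DD=10", "/Table/Files/YY=18/MM=12/DD=08"}
paths = partition_paths(
    "/Table/Files", datetime.date(2018, 12, 10), 60, exists=existing.__contains__
)
print(paths)
```

In PySpark, a list like this can be passed directly to spark.read.csv, which accepts a list of paths.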

How to change the date format in MATLAB?

I have
num4 = xlsread('dat.xlsx', 1, 'A:B');
dnum4=datetime(num4(:,1),1,1) + caldays(num4(:,2));
dnum4=
16-Jul-2008
18-Jul-2008
06-Aug-2008
08-Aug-2008
13-Aug-2008
15-Aug-2008
20-Aug-2008
22-Aug-2008
30-Oct-2008
I want to change the output format from dd-mmm-yyyy to yyyy-mm-dd.
How can I do that?
If you look at the documentation you'll see that datetime objects have a Format property that controls the display format:
>> t = datetime('now','TimeZone','local','Format','d-MMM-y HH:mm:ss Z')
t =
datetime
25-May-2017 10:26:46 -0400
>> t.Format = 'yyyy-MM-dd'
t =
datetime
2017-05-25
Another way is to convert to a datenum and then format it with datestr, which returns text rather than a datetime:
newFmt = datestr(datenum(dnum4, 'dd-mmm-yyyy'), 'yyyy-mm-dd')

How to convert 9 digit number into a particular date format?

I want to convert 9 digit number into a particular date format.
Example -
Number - 000007547 ===> Date - 2016/10/05
Number - 000007550 ===> Date - 2016/10/08
Number - 000007559 ===> Date - 2016/10/17
I already have these numbers and their dates, but I'm unable to find the logic behind the conversion. Is anyone aware of this 9-digit date-time format?
The number appears to be a count of days since 1996-02-06.
This Python script checks that for your last example:
from datetime import date
d0 = date(1996, 2, 6)
d1 = date(2016, 10, 17)
delta = d1 - d0
print(delta.days)  # 7559
To convert a number in this format to a date, you can use this script:
from datetime import date, timedelta
d0 = date(1996, 2, 6)
result = d0 + timedelta(days=7559)
print(result)
Output: 2016-10-17
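As a small sketch, both directions can be wrapped in helper functions and checked against all three examples from the question. The start date 1996-02-06 is inferred from those examples rather than from any documented format, and the names number_to_date and date_to_number are made up for illustration.

```python
from datetime import date, timedelta

# Start date inferred from the three examples in the question (an assumption).
EPOCH = date(1996, 2, 6)


def number_to_date(number):
    """Convert a zero-padded day count such as '000007547' to a date."""
    return EPOCH + timedelta(days=int(number))


def date_to_number(d, width=9):
    """Convert a date back to the zero-padded day count."""
    return str((d - EPOCH).days).zfill(width)


for raw in ("000007547", "000007550", "000007559"):
    print(raw, "->", number_to_date(raw).strftime("%Y/%m/%d"))
```

This prints 2016/10/05, 2016/10/08 and 2016/10/17, matching the dates given in the question.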
It is the timestamp.
A timestamp is encoded information, which indicates the date and time at which a particular event has occurred.
import datetime
value=1258094605
date_obj=datetime.date.fromtimestamp(value)
print(date_obj)

How to convert hours to days in a NetCDF file using Scala

Is there a method to convert the unit from hours to days in this dataset?
double time(time) ;
time:units = "hours since 1800-01-01 00:00:0.0" ;
time:long_name = "Time" ;
time:delta_t = "0000-01-00 00:00:00" ;
time:avg_period = "0000-01-00 00:00:00" ;
time:standard_name = "time" ;
time:axis = "T" ;
time:actual_range = 1569072., 1895592. ;
If you can use Python, it's an easy process:
The first step is to convert the numeric dates to a datetime object using netCDF4 num2date.
The second step is to compute the number of days between each datetime object and the reference date given in the units of the time variable (i.e. 1800-01-01).
import netCDF4
import datetime
ncfile = netCDF4.Dataset('./precip.mon.mean.nc', 'r')
time = ncfile.variables['time']
# Convert from numeric times to datetime objects
dates = netCDF4.num2date(time[:], time.units)
# Compute number of days since the original date
orig_date = datetime.datetime(1800,1,1)
days_since = [(t - orig_date).days for t in dates]
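Since the unit is "hours since 1800-01-01", the conversion itself is just a division by 24. A minimal sketch without the netCDF4 dependency, using the two actual_range values from the header as sample input (the function names are hypothetical):

```python
import datetime

# Reference date taken from the units string "hours since 1800-01-01 00:00:0.0".
REFERENCE = datetime.datetime(1800, 1, 1)


def hours_to_days(hours):
    """Convert 'hours since reference' to 'days since reference'."""
    return hours / 24.0


def hours_to_datetime(hours):
    """Convert 'hours since reference' to a calendar datetime."""
    return REFERENCE + datetime.timedelta(hours=hours)


for hours in (1569072.0, 1895592.0):  # the actual_range values from the header
    print(hours_to_days(hours), hours_to_datetime(hours).date())
```

For these two values the result is 65378.0 days (1979-01-01) and 78983.0 days (2016-04-01). The same arithmetic carries over directly to Scala.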

Google Bookmark Export date format?

I have been working on parsing bookmarks out of an export file generated by Google Bookmarks. This file contains the following date attributes:
ADD_DATE="1231721701079000"
ADD_DATE="1227217588219000"
These are not standard Unix-style timestamps. Can someone point me in the right direction here? I'll be parsing them using C#, if you feel like really helping me out.
Chrome uses a modified form of the Windows Time format (“Windows epoch”) for its timestamps, both in the Bookmarks file and the history files. The Windows Time format is the number of 100ns-es since January 1, 1601. The Chrome format is the number of microseconds since the same date, and thus 1/10 as large.
To convert a Chrome timestamp to and from the Unix epoch, you must convert to seconds and compensate for the difference between the two base date-times (11644473600 seconds).
Here’s the conversion formulas for Unix, JavaScript (Unix in milliseconds), Windows, and Chrome timestamps (you can rearrange the +/× and -/÷, but you’ll lose a little precision):
u : Unix timestamp eg: 1378615325
j : JavaScript timestamp eg: 1378615325177
c : Chrome timestamp eg: 13023088925177000
w : Windows timestamp eg: 130230889251770000
u = (j / 1000)
u = (c - 11644473600000000) / 1000000
u = (w - 116444736000000000) / 10000000
j = (u * 1000)
j = (c - 11644473600000000) / 1000
j = (w - 116444736000000000) / 10000
c = (u * 1000000) + 11644473600000000
c = (j * 1000) + 11644473600000000
c = (w / 10)
w = (u * 10000000) + 116444736000000000
w = (j * 10000) + 116444736000000000
w = (c * 10)
Note that these are pretty big numbers, so you’ll need to use 64-bit numbers or else handle them as strings like with PHP’s BC-math module.
In JavaScript, the code looks like this:
function chromeDtToDate(st_dt) {
    var microseconds = parseInt(st_dt, 10);
    var millis = microseconds / 1000;
    // Use UTC so the result does not depend on the local time zone
    var past = Date.UTC(1601, 0, 1);
    return new Date(past + millis);
}
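For comparison, here is a short Python sketch of the same idea: a Chrome timestamp is treated as microseconds since 1601-01-01 UTC, and the 11644473600-second offset converts between it and Unix time. The function names are made up for illustration, and the sample value is derived from the Unix timestamp 1378615325 used above.

```python
import datetime

WEBKIT_EPOCH = datetime.datetime(1601, 1, 1, tzinfo=datetime.timezone.utc)
EPOCH_DIFF_SECONDS = 11_644_473_600  # seconds between 1601-01-01 and 1970-01-01


def chrome_to_datetime(chrome_us):
    """Microseconds since 1601-01-01 UTC -> timezone-aware datetime."""
    return WEBKIT_EPOCH + datetime.timedelta(microseconds=int(chrome_us))


def chrome_to_unix(chrome_us):
    """Microseconds since 1601-01-01 UTC -> whole seconds since 1970-01-01 UTC."""
    return int(chrome_us) // 1_000_000 - EPOCH_DIFF_SECONDS


def unix_to_chrome(unix_seconds):
    """Seconds since 1970-01-01 UTC -> microseconds since 1601-01-01 UTC."""
    return (int(unix_seconds) + EPOCH_DIFF_SECONDS) * 1_000_000


chrome_value = unix_to_chrome(1378615325)
print(chrome_value, chrome_to_datetime(chrome_value).isoformat())
```

Python integers are arbitrary precision, so the 64-bit concern mentioned above does not arise here.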
1231721701079000 looks suspiciously like time since Jan 1st, 1970 in microseconds.
perl -wle 'print scalar gmtime(1231721701079000/1_000_000)'
Mon Jan 12 00:55:01 2009
I'd make some bookmarks at known times and try it out to confirm.
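The same check can be sketched in Python, treating ADD_DATE as microseconds since 1970-01-01 UTC, which is this answer's hypothesis rather than a documented guarantee. The function name add_date_to_datetime is hypothetical; the two input values are the ones from the question.

```python
import datetime

UNIX_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def add_date_to_datetime(add_date):
    """Interpret ADD_DATE as microseconds since the Unix epoch (assumed)."""
    # Integer microseconds avoid the rounding a float division could introduce.
    return UNIX_EPOCH + datetime.timedelta(microseconds=int(add_date))


for raw in ("1231721701079000", "1227217588219000"):
    print(raw, "->", add_date_to_datetime(raw).isoformat())
```

The first value comes out as 2009-01-12T00:55:01.079000+00:00, in line with the Perl output above.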
Eureka! I remembered having read the ADD_DATE’s meaning at some website, but until today, I could not find it again.
http://MSDN.Microsoft.com/en-us/library/aa753582(v=vs.85).aspx
offers this explanation as a “Note” just before the heading “Exports and Imports”:
“Throughout this file[-]format definition, {date} is a decimal integer that represents the number of seconds elapsed since midnight January 1, 1970.”
Before that, examples of {date} were shown:
<DT><H3 FOLDED ADD_DATE="{date}">{title}</H3>
…
and
<DT>{title}
…
Someday, I will write a VBA macro to convert these to recognizable dates, but not today!
If someone else writes a conversion script first, please share it. Thanks.
As of the newest Chrome Version 73.0.3683.86 (Official Build) (64-bit):
When I export bookmark, I got an html file like "bookmarks_3_22_19.html".
Each item has an 'add_date' attribute that contains a date string. For example, the exported link titled "Stack Overflow" has add_date="1553220774".
This timestamp is actually seconds (not microseconds) since Jan 1st, 1970, so we can parse it in JavaScript with the following code:
function ChromeTimeToDate(timestamp) {
    var seconds = parseInt(timestamp, 10);
    var dt = new Date();
    dt.setTime(seconds * 1000);
    return dt;
}
For the example link above, we can call ChromeTimeToDate('1553220774') to get a Date:
ChromeTimeToDate('1553220774')
Fri Mar 22 2019 10:12:54 GMT+0800 (Australian Western Standard Time)
At first glance, it looks like chopping off the last 6 digits gives a reasonable Unix date when run through an online converter:
1231721701 = Mon, 12 Jan 2009 00:55:01 GMT
1227217588 = Thu, 20 Nov 2008 21:46:28 GMT
The extra 6 digits could be formatting related or some kind of extended attributes.
There is some sample code for the conversion of Unix Timestamps if that is in fact what it is.
look here for code samples: http://www.epochconverter.com/#code
// my Groovy (Java) code finally came out as:
def convertDate(def epoch)
{
    long dv = epoch / 1000; // divide by 1,000 to convert microseconds to milliseconds
    String dt = new java.text.SimpleDateFormat("dd/MMM/yyyy HH:mm:ss").format(new java.util.Date (dv));
    // to get epoch date:
    //long epoch = new java.text.SimpleDateFormat("MM/dd/yyyy HH:mm:ss").parse("01/01/1970 01:00:00").getTime() * 1000;
    return dt;
} // end of def
So a Firefox bookmark date exported as JSON gave me:
json.lastModified :1366313580447014
convert from epoch date:18/Apr/2013 21:33:00
from :
println "convert from epoch date:"+convertDate(json.lastModified)
function ConvertToDateTime(srcChromeBookmarkDate) {
    //Hp --> The base date which google chrome considers while adding bookmarks
    var baseDate = new Date(1601, 0, 1);
    //Hp --> Total number of seconds in a day.
    var totalSecondsPerDay = 86400;
    //Hp --> Read total number of days and seconds from source chrome bookmark date.
    var quotient = Math.floor(srcChromeBookmarkDate / 1000000);
    var totalNoOfDays = Math.floor(quotient / totalSecondsPerDay);
    var totalNoOfSeconds = quotient % totalSecondsPerDay;
    //Hp --> Add total number of days to base google chrome date.
    var targetDate = new Date(baseDate.setDate(baseDate.getDate() + totalNoOfDays));
    //Hp --> Add total number of seconds to target date.
    return new Date(targetDate.setSeconds(targetDate.getSeconds() + totalNoOfSeconds));
}
var myDate = ConvertToDateTime(13236951113528894);
alert(myDate);
//Thu Jun 18 2020 10:51:53 GMT+0100 (Irish Standard Time)
#Python program
import time
d = 1630352263 #for example put here, if (ADD_DATE="1630352263")
print(time.ctime(d)) #Mon Aug 30 22:37:43 2021 - you will see