PySpark output file with timestamp name

I need to name my output file with a timestamp, but I'm getting an error. Not sure what I'm doing wrong:
timestamp = spark.sql("select string(date_format(current_timestamp,'yyyy/MM/dd_HH:mm:ss'))").collect()[0][0]
print(timestamp)
Error: ADLException: Error getting info for file
/06/05_13:14:01
There is no error if I use the current date instead of the timestamp, but I need the timestamp.

Some characters are not allowed in file names:
#L1234_ABC123_2020/06/05_13:14:01 is not valid, because the slashes are treated as directory separators and colons : are not allowed at all. Try something like #L1234_ABC123_20200605_131401 instead, separating the parts with underscores _.
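A minimal sketch of a filesystem-safe version: switch the format string to one without slashes or colons (the /mnt/output base path and file name here are made up for illustration):
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# yyyyMMdd_HHmmss contains no slashes or colons, so it is safe in a file name
timestamp = spark.sql(
    "select date_format(current_timestamp(), 'yyyyMMdd_HHmmss')"
).collect()[0][0]

output_path = f"/mnt/output/report_{timestamp}.csv"  # hypothetical base path
print(output_path)  # e.g. /mnt/output/report_20200605_131401.csv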

Related

Problems with a data type in Cloud DB (HMS)

I have a problem with Cloud DB.
Message: {"defaultName":"AGCError","name":"database-server","errorCode":{"code":"2052","message":"the input object is invalid."}}
I don't know what the reason could be.
As per the Huawei documentation, error code 2052 is described as "invalid input object", so please check your input value or object.
The causes below are likely. Please check:
Check whether any field you declared as String has an input value that is too long. The String data type holds at most 200 characters; if a string contains more than 200 characters, you are advised to use the Text type instead. Refer to:
https://developer.huawei.com/consumer/en/doc/development/AppGallery-connect-Guides/agc-clouddb-data-type-0000001080815898#EN-US_TOPIC_0000001176121166__en-us_topic_0000001127251477_table2376546172218
Check the date field format. The date format should be yyyy-MM-dd HH:mm:ss sss.
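As a rough illustration of these two checks, here is a small client-side validation sketch in Python; the 200-character limit and the date format come from the answer above, while the function and field names are made up:
from datetime import datetime

MAX_STRING_LEN = 200  # String fields hold at most 200 characters

def validate_record(record):
    # Reject over-long strings before the write, so the server never
    # has a chance to answer with error code 2052.
    for key, value in record.items():
        if isinstance(value, str) and len(value) > MAX_STRING_LEN:
            raise ValueError(f"field '{key}' exceeds {MAX_STRING_LEN} chars; use the Text type")

# Format a date the way the answer describes: yyyy-MM-dd HH:mm:ss sss.
# strftime's %f yields microseconds, so the last three digits are cut off.
created_at = datetime.now().strftime("%Y-%m-%d %H:%M:%S %f")[:-3]
validate_record({"name": "example", "createdAt": created_at})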

Changing dd/mm/yyyy hh:mm:ss format to yyyymm in Hive

I'm using Hive at the moment. I have a column (column A) of strings in the following format: 11/9/2009 0:00:00. I'd like to extract the yyyyMM part, i.e. I'd like the above string to become 200909. I've tried to convert the string using two different methods, and neither of them worked:
concat(year(Column A),lpad(month(Column A),2,0))
convert(datetime, Column A)
For the first line of code I'm receiving NULL in all rows.
For the second one I'm receiving:
Encountered: DATETIME Expected: ALL, CASE, CAST, DEFAULT, DISTINCT,
EXISTS, FALSE, IF, INTERVAL, NOT, NULL, REPLACE, TRUNCATE, TRUE,
IDENTIFIER CAUSED BY: Exception: Syntax error
Use unix_timestamp(string date, string pattern) to convert the given date format to seconds elapsed since 1970-01-01, then use from_unixtime() to convert to the required format:
select from_unixtime(unix_timestamp( '11/9/2009 0:00:00','dd/MM/yyyy HH:mm:ss'), 'yyyyMM');
Result:
200909
Read also: Impala data and time functions and Hive date functions.
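If you are reaching Hive through Spark, the same round trip is available from the DataFrame API. A minimal sketch (the column name is made up; the one-letter pattern fields M and H are used because Spark 3's stricter parser needs them for single-digit month and hour):
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("11/9/2009 0:00:00",)], ["column_a"])

# Parse with the pattern the string actually has, then re-format as yyyyMM.
df.select(
    F.from_unixtime(F.unix_timestamp("column_a", "dd/M/yyyy H:mm:ss"), "yyyyMM").alias("yyyymm")
).show()  # prints 200909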
One more solution that works in Hive:
select concat(regexp_extract('11/9/2009 0:00:00','(\\d{1,2})/(\\d{1,2})/(\\d{4})',3), lpad(regexp_extract('11/9/2009 0:00:00','(\\d{1,2})/(\\d{1,2})/(\\d{4})',2),2,'0'))
Since I'm trying to turn strings into yyyyMM, I had to use the below, which worked for me (note this assumes the month is zero-padded, e.g. 11/09/2009):
concat(substr(columnA, instr(columnA, ' ') - 4, 4), substr(columnA, instr(columnA, '/') + 1, 2))

How to use Airflow macros with nodash as a suffix to a table name

I would like to suffix my final table name with the macro date in nodash format.
I am using the macro below:
if ds = 2018-05-09, {{macros.ds_add(ds, -4)}}
to get the current date minus 4 days, giving output like 2018-05-05. The expected output would be 20180505.
I tried
{{{{macros.ds_add(ds, -4)}}_nodash}}
and I'm getting
jinja2.exceptions.TemplateSyntaxError: expected token ':', got '}'
Please assist me in resolving this issue.
You can use airflow.macros.ds_format to format the dates as you want. For example:
airflow.macros.ds_format(airflow.macros.ds_add('2018-05-09',-4),'%Y-%m-%d','%Y%m%d')
More details: http://airflow.incubator.apache.org/code.html?highlight=macro#airflow.macros.ds_format
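Inside a templated operator field it could look like the sketch below; the task and the my_table_ prefix are made up for illustration, and older Airflow versions import BashOperator from airflow.operators.bash_operator instead:
from airflow.operators.bash import BashOperator

# Jinja renders the templated bash_command at run time.
create_table = BashOperator(
    task_id="create_table",
    bash_command=(
        "echo my_table_"
        "{{ macros.ds_format(macros.ds_add(ds, -4), '%Y-%m-%d', '%Y%m%d') }}"
    ),
)
# With ds = 2018-05-09 this renders as: my_table_20180505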

Get a DateTime with a specific pattern with nscala-time

I am trying to get this pattern dd-MM-yyyy from a variable of type DateTime:
#{DateTimeFormat.forPattern("dd-MM-YYYY").parseDateTime(user.birthday.toString)}
But I am getting this error:
java.lang.IllegalArgumentException: Invalid format: "2015-12-10T00:00:00.000Z" is malformed at "15-12-10T00:00:00.000Z"
Is there a way to do this with nscala-time?
Does it make a difference if I am using UTC?
UPDATE
For the moment I am converting to Date and doing this:
#{Dates.format(user.birthday.toDate, "dd-MM-YYYY")}
But maybe there is a better way without the conversion.
Thank you.
So, if I understood your question correctly, you are trying to achieve the following:
1. Parse the date from a string using one date format.
2. Print/display the date in another format.
Try the below:
#{Dates.format(DateTimeFormat.forPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ").parseDateTime(user.birthday.toString), "dd-MM-YYYY")}
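The same two-step idea, sketched in plain Python for comparison (Python 3.7+, where %z accepts the trailing Z):
from datetime import datetime

# Step 1: parse with the format the string actually has.
parsed = datetime.strptime("2015-12-10T00:00:00.000Z", "%Y-%m-%dT%H:%M:%S.%f%z")
# Step 2: re-format with the pattern you want to display.
print(parsed.strftime("%d-%m-%Y"))  # 10-12-2015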

Odd Logstash date parsing error

I'm getting the following error from Logstash:
{:timestamp=>"2013-12-30T17:05:01.968000-0800", :message=>"Failed parsing date from field", :field=>"message", :value=>"2013-12-30 17:04:59,539.539 INFO 14282:140418951137024 [foo.lib.base.onResults:152] -- /1.12/media - \"getMediaStoreUrl\": , 10.101.AA.BB, 10.101.19.254 took 0.170675992966, returning https://foo.s3.amazonaws.com/foo/customerMedia/1009238911/23883995/image?Signature=%2BfXqEdNWtWdhwzi%&*YEGJSDDdDFF%3D&Expires=1388455499&AWSAccessKeyId=NOIMNOTTHATSTUPID>, , >>>", :exception=>java.lang.IllegalArgumentException: Invalid format: "2013-12-30 17:04:59,539.539 INFO 14282:140418951137024..." is malformed at ".539 INFO 14282:140418951137024...", :level=>:warn}
The error is obviously about the date format, which comes to me as:
2013-12-30 17:04:59,539.539 INFO 14282:140418951137024...
And my pattern is as follows:
date {
  match => ["message", "yyyy-MM-dd HH:mm:ss,SSS"]
}
I read up on the Joda-Time library and I think I've got the format correct above. It's odd to me that the error message contains the doubled milliseconds portion ",539.539" (our logs output that way for some reason). I deliberately didn't put the second portion ".539" in my pattern because I want it ignored.
I am also successfully using the following pattern in another filter:
(?<pylonsdate>%{DATESTAMP}\.[0-9]+)
I'm just not exactly sure where this error is coming from. Any ideas what I need to do to correct this? Do I need to mutate @timestamp? Any help is appreciated!
The error occurs because the other information in the "message" field (e.g. INFO 14282:140418951137024...) makes the date filter fail to parse. You can use the grok filter to extract the date first, and then use the date filter:
grok {
  match => ["message", "%{DATESTAMP:logtime}\.[0-9]+"]
}
date {
  match => ["logtime", "YY-MM-dd HH:mm:ss,SSS"]
}
I have tried this configuration with your log and it works for me. Hope this helps.
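The same extract-then-parse idea, sketched in plain Python (the regex and log line are shortened for illustration):
import re
from datetime import datetime

line = "2013-12-30 17:04:59,539.539 INFO 14282:140418951137024 [foo.lib.base.onResults:152] -- ..."

# Mirror the grok step: capture the timestamp and drop the trailing ".539".
m = re.match(r"(?P<logtime>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\.\d+", line)
if m:
    # Mirror the date step: parse only the captured field (%f reads the ms).
    print(datetime.strptime(m.group("logtime"), "%Y-%m-%d %H:%M:%S,%f"))
    # 2013-12-30 17:04:59.539000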
FWIW, it may be due to this bug in the Logstash date filter.
I have a much simpler date filter that generates the same error, and came across your question while searching for the answer.