I need to filter an input file by date (filter out rows with a date after a certain point).
I feed a column of type "Date" into tFilterRow and try to filter with this code:
TalendDate.compareDate(row1.contact_date, TalendDate.parseDate("yyyy-MM-dd HH:mm:ss","2016-08-01 00:00:00"))
I get this error message:
The method matches(boolean, String) in the type Operator_tFilterRow_1 is not applicable for the arguments (int, String).
I am sure I pass the correct types to the function (Date and Date), so where does this error come from? How can I solve it, or filter my file another way?
Turns out the function requires a boolean, not an int, so I had to add " > 0 " at the end.
Full condition is:
TalendDate.compareDate(row1.contact_date, TalendDate.parseDate("yyyy-MM-dd HH:mm:ss","2016-08-01 00:00:00")) > 0
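For anyone wanting to sanity-check the logic outside Talend, the same pattern applies to any three-way comparator: it returns an int (-1, 0, or 1), and a filter needs a boolean, hence the `> 0`. A minimal plain-Python sketch with made-up dates (not Talend code):

```python
from datetime import datetime

def compare_date(d1, d2):
    # Three-way comparison like TalendDate.compareDate: returns -1, 0, or 1
    return (d1 > d2) - (d1 < d2)

cutoff = datetime.strptime("2016-08-01 00:00:00", "%Y-%m-%d %H:%M:%S")
contact_dates = [datetime(2016, 7, 15), datetime(2016, 8, 2), datetime(2016, 9, 1)]

# compare_date(...) alone is an int; the filter condition needs a boolean,
# which is exactly what the trailing "> 0" provides
after_cutoff = [d for d in contact_dates if compare_date(d, cutoff) > 0]
```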
I am using a JSON data file "order_items" and the data looks like:
{"order_item_id":1,"order_item_order_id":1,"order_item_product_id":957,"order_item_quantity":1,"order_item_subtotal":299.98,"order_item_product_price":299.98}
{"order_item_id":2,"order_item_order_id":2,"order_item_product_id":1073,"order_item_quantity":1,"order_item_subtotal":199.99,"order_item_product_price":199.99}
{"order_item_id":3,"order_item_order_id":2,"order_item_product_id":502,"order_item_quantity":5,"order_item_subtotal":250.0,"order_item_product_price":50.0}
{"order_item_id":4,"order_item_order_id":2,"order_item_product_id":403,"order_item_quantity":1,"order_item_subtotal":129.99,"order_item_product_price":129.99}
orders = spark.read.json("/user/data/retail_db_json/order_items")
I am getting an error while running the following command:
orders.where("order_item_order_id in (2,4,5,6,7,8,9,10)").groupby("order_item_order_id").agg(sum("order_item_subtotal"), count()).orderBy("order_item_order_id").show()
TypeError: unsupported operand type(s) for +: 'int' and 'str'
I am not sure why I am getting this; all column values are strings. Any suggestions?
Cast the column to int type. You can't apply aggregation methods to string types.
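The same point can be seen outside Spark: aggregating raw strings fails, while aggregating after a numeric cast works. A plain-Python analogue of the cast-then-aggregate idea (hypothetical rows, not the PySpark fix itself):

```python
from collections import defaultdict

# String-typed values, as read from the JSON in the question
rows = [
    {"order_item_order_id": "2", "order_item_subtotal": "199.99"},
    {"order_item_order_id": "2", "order_item_subtotal": "250.0"},
    {"order_item_order_id": "4", "order_item_subtotal": "129.99"},
]

# Summing the raw strings would raise the same kind of TypeError as the question:
#   sum(r["order_item_subtotal"] for r in rows)  ->  'int' and 'str'

# Cast to numeric types before aggregating
totals = defaultdict(float)
for r in rows:
    totals[int(r["order_item_order_id"])] += float(r["order_item_subtotal"])
```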
I'm using Pig to read a huge CSV file (29,000+ lines) that looks like this.
What I'm interested in are begin and end, which are dates.
I'm trying to find items that were active in 1930. So first I loaded the file using this statement:
stations = LOAD '/mytp/isd-history.csv'
USING PigStorage(',')
AS
(
id:int,
wban:long,
name:chararray,
country:chararray,
state:chararray,
icao:chararray,
lat:double,
lon:double,
ele:double,
begin:chararray,
end:chararray
);
Then I used this query to FILTER by date
items_active_1930 = FILTER stations
BY ToDate(begin,'yyyy-MM-dd') >= ToDate('1930-01-01')
AND ToDate(end,'yyyy-MM-dd') <= ToDate('1930-12-31');
When I try to dump, the job fails with the following result:
Unable to open iterator for alias items_active_1930. Backend error : Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.ToDate2ARGS)[datetime] - scope-172 Operator Key: scope-172) children: null at []]: java.lang.IllegalArgumentException: Invalid format: "begin"
I would like to know if it's possible, in the FILTER, to first check whether both begin and end are valid dates matching the specified format, so that no errors occur in ToDate().
Specify the format for 1930-01-01 and 1930-12-31
items_active_1930 = FILTER stations
BY (datetime)ToDate(begin,'yyyy-MM-dd') >= (datetime)ToDate('1930-01-01','yyyy-MM-dd')
AND (datetime)ToDate(end,'yyyy-MM-dd') <= (datetime)ToDate('1930-12-31','yyyy-MM-dd');
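On the validity check the asker wanted: the usual trick is "try to parse, reject on failure", which in Pig would typically mean a UDF. Here is the idea as a hedged Python sketch (hypothetical rows, mirroring the source's begin/end comparisons; a header row like "begin" is filtered out instead of crashing the parse):

```python
from datetime import datetime

def is_valid_date(s, fmt="%Y-%m-%d"):
    # True only if s parses with the given format, so values such as the
    # literal header string "begin" are rejected rather than raising
    try:
        datetime.strptime(s, fmt)
        return True
    except (ValueError, TypeError):
        return False

rows = [("begin", "end"),                    # CSV header row that broke ToDate
        ("1930-03-01", "1930-11-01"),
        ("1950-01-01", "1960-01-01")]

active_1930 = [
    (b, e) for b, e in rows
    if is_valid_date(b) and is_valid_date(e)
    and datetime.strptime(b, "%Y-%m-%d") >= datetime(1930, 1, 1)
    and datetime.strptime(e, "%Y-%m-%d") <= datetime(1930, 12, 31)
]
```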
I have a Spark DataFrame where I need to add a number of days to the existing date column based on a condition.
My code is something like below
F.date_add(df.transDate,
F.when(F.col('txn_dt') == '2016-01-11', 9999).otherwise(10)
)
Since date_add() accepts an int as its second argument, but my code passes a Column, it throws an error.
How do I collect the value from the case-when condition?
pyspark.sql.functions.when() returns a Column, which is why your code is producing the TypeError: 'Column' object is not callable
You can get the desired result by moving the when to the outside, like this:
F.when(
F.col('txn_dt') == '2016-01-11',
F.date_add(df.transDate, 9999)
).otherwise(F.date_add(df.transDate, 10))
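The restructuring amounts to selecting between two complete results rather than producing a bare int for date_add. A plain-datetime analogue of the accepted answer (hypothetical function, not the DataFrame API):

```python
from datetime import date, timedelta

def shifted_trans_date(trans_date, txn_dt):
    # Mirror of the fix: the condition chooses between two fully computed
    # date-additions, instead of yielding an int that date_add can't accept
    if txn_dt == date(2016, 1, 11):
        return trans_date + timedelta(days=9999)
    return trans_date + timedelta(days=10)
```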
I am joining the two tables using the query below:
update campaign_items
set last_modified = evt.event_time
from (
select max(event_time) event_time
,result
from events
where request = '/campaignitem/add'
group by result
) evt
where evt.result = campaign_items.id
where the result column is of type character varying and id is of type integer.
The data in the result column contains only digits (e.g. 12345).
How can I run this query while converting the type of result (character) to match id (integer)?
Well, you don't need to, because PostgreSQL will do implicit type conversion in this situation. For example, you can try
select ' 12 ' = 12
You will see that it returns true even though there is extra whitespace in the string version. Nevertheless, if you need an explicit conversion:
where evt.result::int = campaign_items.id
According to your comment you have values like convRepeatDelay; these obviously cannot be converted to int. What you should do then is convert your int to text instead:
where evt.result = campaign_items.id::text
(::char would truncate to a single character, so use ::text or ::varchar here.)
There are several solutions. You can use the cast operator :: to cast a value from a given type into another type:
WHERE evt.result::int = campaign_items.id
You can also use the CAST function, which is more portable:
WHERE CAST(evt.result AS int) = campaign_items.id
Note that to improve performance, you can add an index on the casting expression (note the mandatory double parentheses), but then you have to use GROUP BY result::int instead of GROUP BY result to take advantage of the index:
CREATE INDEX i_events_result ON events ((result::int));
By the way the best option is maybe to change the result column type to int if you know that it will only contain integers ;-)
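The cast-in-the-join-condition approach can be tried on any SQL engine. A small SQLite sketch (stdlib sqlite3, made-up tables mirroring the question; SQLite syntax, not PostgreSQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (result TEXT, event_time TEXT)")
con.execute("CREATE TABLE campaign_items (id INTEGER, last_modified TEXT)")
con.execute("INSERT INTO events VALUES ('12345', '2016-08-01')")
con.execute("INSERT INTO campaign_items VALUES (12345, NULL)")

# Join on the casted text column, as in CAST(evt.result AS int) = campaign_items.id
matched = con.execute(
    "SELECT count(*) FROM events e "
    "JOIN campaign_items c ON CAST(e.result AS INTEGER) = c.id"
).fetchone()[0]
```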
I have a query with the below WHERE clauses
WHERE
I.new_outstandingamount = 70
AND ISNUMERIC(SUBSTRING(RA.new_stampernumber,7, 4)) = 1
AND (DATEDIFF(M,T.new_commencementdate, SUBSTRING(RA.new_stampernumber,7, 10)) >= 1)
AND RA.new_applicationstatusname = 'Complete'
AND I.new_feereceived > 0
AND RA.new_stampernumber IS NOT NULL
AND T.new_commencementdate IS NOT NULL
RA.new_stampernumber is a string value which contains three concatenated pieces of information of uniform length. The middle piece of info in this string is a date in the format yyyy-MM-dd.
In order to filter out any rows where the date in this string is not formatted as expected, I check whether its first 4 characters (the year) are numeric using the ISNUMERIC function.
When I run the query I get an error message saying
The conversion of a nvarchar data type to a datetime data type resulted in an out-of-range value.
The line that is causing this error to occur is
AND (DATEDIFF(M,T.new_commencementdate, SUBSTRING(RA.new_stampernumber,7, 10)) >= 1)
When I comment out this line I don't get an error.
What is strange is that if I replace
AND ISNUMERIC(SUBSTRING(RA.new_stampernumber,7, 4)) = 1
with
AND SUBSTRING(RA.new_stampernumber,7, 4) IN ('2003','2004','2005','2006','2007','2008','2009','2010', '2011', '2012','2013','2014','2015'))
the query runs successfully.
What's even more strange is that if I replace the above working line with this
AND SUBSTRING(RA.new_stampernumber,11, 1) = '-'
I get the error message again. But if I replace the equals sign with a LIKE comparison it works:
AND SUBSTRING(RA.new_stampernumber,11, 1) LIKE '-'
When I remove the DATEDIFF function and compare the results of each of these queries they all return the same resultset so it is not being caused by different data being returned by the different clauses.
Can anyone explain to me what could be causing the out-of-range error to be thrown for some clauses and not for others if the data being returned is in fact the same for each clause?
Different execution plans.
There is no guarantee that the WHERE clauses are processed in particular order. Presumably when it works it happens to filter out erroring rows before attempting the cast to date.
Also ISNUMERIC itself isn't very reliable for what you want. I'd change the DATEDIFF expression to something like the below
AND (DATEDIFF(M, T.new_commencementdate,
         CASE
             WHEN RA.new_stampernumber LIKE
                  '______[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]%'
             THEN SUBSTRING(RA.new_stampernumber, 7, 10)
         END) >= 1)
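The guard-before-parse idea translates directly to other languages: validate the substring against the expected pattern before treating it as a date, and yield a null-like value otherwise. A Python sketch with a hypothetical stamper value (positions 7-16 hold the yyyy-MM-dd piece, as in the question):

```python
import re
from datetime import datetime

# Skip the 6-character prefix, then require a yyyy-MM-dd shaped date,
# like the LIKE pattern in the answer's CASE expression
DATE_IN_STAMPER = re.compile(r"^.{6}(\d{4}-\d{2}-\d{2})")

def months_since(commencement, stamper):
    # Month-boundary difference, only when the stamper matches the pattern;
    # otherwise None, like the CASE expression yielding NULL
    m = DATE_IN_STAMPER.match(stamper)
    if not m:
        return None
    d = datetime.strptime(m.group(1), "%Y-%m-%d")
    return (d.year - commencement.year) * 12 + (d.month - commencement.month)
```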