QlikSense not getting longer strings from Hive

We have a few dashboards wherein we source data from Apache Hive (CDH); the data is sourced through the Cloudera Hive ODBC connection.
What we are observing is that longer string values from the source (e.g. longer than 275-280 characters) show up as null in QlikSense.
Wondering if anyone else has faced similar issues? How was it handled?

Found it.
The Cloudera Hive driver sets the default length of strings to 255. This is due to limitations in Hive and its metadata. From the driver guide:
In the Default String Column Length field, type the default string column length to use. Note: Hive does not provide the length for String columns in its column metadata. This option allows you to tune the length of String columns.
In the driver configuration, go to Advanced settings, where you can edit the default length for string columns. When I set the length to 99999, all my relevant attributes flowed through.
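On Linux, the same setting can be applied in the DSN definition. A minimal sketch, assuming the Cloudera Hive ODBC driver's DefaultStringColumnLength key; the DSN name, host, and driver path below are hypothetical:

[CDH-Hive]
# Hypothetical driver path and host; adjust to your installation
Driver=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
Host=hive-host.example.com
Port=10000
# Raise the assumed maximum length for Hive STRING columns (driver default: 255)
DefaultStringColumnLength=99999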
Hope it helps others.

Related

Talend - Output data to Snowflake table with spaces in Field Names

I have a very specific requirement to output data to a Snowflake table, but the field names must have spaces in them. Snowflake appears to handle this okay, but I'm unsure how Talend will, as I understand Java doesn't allow it. Can anyone help?
Also, are there other tools that won't handle spaces in field names (e.g. R or Python), such that we would be restricting use of the warehouse if we did that?

Default value in Talend ETL Tool

I'm importing data from MS Excel to a MySQL database using the Talend ETL tool. A few columns might not have data, in which case I need to apply a default value.
How do I set a default value in the Talend ETL tool?
The simple way should look like this:
row1.yourField == null ? "defaultValue" : row1.yourField
You'll have to do this for each concerned field with the appropriate default values.
A default value on the database side will not work for the fields that participate in the query, so you have to transform the value on the client side.
Hope this helps.
TRF
You can achieve this by writing an expression in the tMap component of Talend to map the nulls to some default value:
row1.column == null ? "Default value" : row1.column
This needs to be done for every field where the mapping to a default value is needed.
You may also refer to the link below for a similar issue:
https://www.talendforge.org/forum/viewtopic.php?id=33623

Talend Data Integration: Avoid nulls coming out of tExtractXMLField?

I have this simple flow in Talend DI 6 (simplified for posting on SO):
[Screenshot of the job flow]
The last step crashes with a NullPointerException, because missing XML attributes are returned as null.
Is there a way to get empty string values instead of nulls?
For now I'm using a tReplace step to remove nulls as a work-around, but it's tedious and adds to the cost of maintenance by creating one more place where the list of attributes needs to be maintained.
In Talend DI 5.6.2 it is possible to add default values to the schema. The column in the schema is called "Default". If you expect strings, you can set an empty string, which is used when the column value is null:
[Screenshot: Talend schema view with the Default column]
This works for other data types as well. Talend DI 6 should still be able to do this, although the field might be renamed.

SSIS Convert Between Unicode and Non-Unicode Error

I have an SSIS package where I am using an OLE DB source linked to a SQL Server 2005 table. All columns except a date column are NVARCHAR(255). I am using an Excel destination and a SQL statement to create the sheet in the Excel workbook; the SQL is in the Excel connection manager (effectively a CREATE TABLE statement that creates a sheet) and is derived from the mapping of the columns from the DB.
No matter what I have done I keep getting this Unicode to non-Unicode conversion error between my source and destination. I tried a conversion to string [DT_STR] between source and destination, removed it, changed the SQL table from VARCHAR to NVARCHAR, and still get this flippin' error.
Because I am creating the sheet in Excel with a SQL statement, I do not see any way to pre-define what the data types of the columns in the Excel sheet will be. I imagine it falls back to default metadata, but I do not know.
So, between my SQL table source and the creation of my Excel sheet with this SSIS SQL statement, how can I stop this error from coming up?
My error is:
Error at Data Flow Task [OLE DB Source [1]]: Column "MyColumn" cannot convert between unicode and non-unicode string data types.
And the same for all NVARCHAR columns.
Appreciate any help
Thanks
Andrew
The steps below worked for me:
Right click on the source task.
Click on "Show Advanced Editor".
Go to the "Input and Output Properties" tab.
Select the output column for which you are getting the error.
Its data type will be "string [DT_STR]".
Change that data type to "Unicode string [DT_WSTR]".
Save and close.
Add Data Conversion transformations to convert string columns from non-Unicode (DT_STR) to Unicode (DT_WSTR) strings.
You need to do this for all the string columns...
The missing piece here is a Data Conversion object. It should sit between the OLE DB source and destination objects.
First, add a Data Conversion block to your data flow diagram.
Open the Data Conversion block and tick the column for which the error is showing. Below it, change the data type to Unicode string [DT_WSTR] (or whatever data type is expected) and save.
Go to the destination block, go to its mappings, map the newly created element to its corresponding destination column, and save.
Right click your project in the Solution Explorer and select Properties. Select Configuration Properties, then Debugging, and set the Run64BitRuntime option to False (as Excel does not handle 64-bit applications very well).
Instead of adding the earlier suggested Data Conversion, you can cast the NVARCHAR column to a VARCHAR column in the source query. This spares you an unnecessary step and performs better than the alternative.
In the SELECT of your SQL statement, replace date with CAST(date AS varchar([size])). For some reason this does not yet change the output data type. To do that, do the following:
Right click your OLE DB Source step and open the advanced editor.
Go to Input and Output Properties
Select Output Columns
Select your column
Under Data Type Properties change DataType to string [DT_STR]
Change Length to the length you specified in your CAST statement
After doing this, your source data will be output as VARCHAR and your error will disappear.
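As a sketch, using the column name from the error above with a hypothetical table name and size:

-- Cast the NVARCHAR source column to VARCHAR directly in the source query
SELECT CAST(MyColumn AS varchar(255)) AS MyColumn
FROM dbo.MyTable;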
I had been having the same issue and tried everything written here, but it was still giving me the same error.
It turned out to be a NULL value in the column I was trying to convert.
Removing the NULL values solved my issue.
Cheers,
Ahmed
No one seems to mention this, but converting VARCHAR to NVARCHAR in the source query also solves the issue.
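For example, a minimal sketch with hypothetical table and column names:

-- Convert the non-Unicode column to NVARCHAR so it matches the Unicode destination
SELECT CAST(MyColumn AS nvarchar(255)) AS MyColumn
FROM dbo.MyTable;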
In the above example I kept losing the values; I think delaying the validation allows the new data types to be saved as part of the metadata.
On the connection manager for 'Excel Connection Manager', set Delay Validation to False from the Properties.
Then, on the data flow destination task for Excel, set ValidateExternalMetadata to False, again from the Properties.
This will now allow you to right click on the Excel destination task and go to Advanced Editor for Excel Destination, then the far right tab, Input and Output Properties. In the External Columns folder section you will now be able to change the data types and length values of the problematic columns, and this can be saved.
Good Luck!
I experienced this condition when I had the Oracle 12 32-bit client installed, connected to an Oracle 12 server running on Windows.
Although both the Oracle source and the SQL Server destination are NOT Unicode, I kept getting this message, as if the Oracle columns were Unicode.
I solved the problem by inserting a Data Conversion box, selecting type DT_STR (non-Unicode) for VARCHAR2 fields and DT_WSTR (Unicode) for numeric fields, and then dropping the 'Copy of' prefix from the output field names.
Note that I kept getting the error because I had connected the source box arrow to the conversion box BEFORE setting the conversion types, so I had to reconnect the source box, and this cleared all the errors in the destination box.
When creating the table in SQL Server, make your table columns NVARCHAR instead of VARCHAR.
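A minimal sketch with hypothetical table and column names:

-- Declaring string columns as NVARCHAR up front avoids the Unicode conversion entirely
CREATE TABLE dbo.MyTable (
    MyColumn NVARCHAR(255),
    MyDate DATETIME
);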
I think people are missing this. In my case I had 100 character columns to convert between Oracle and MS SQL. All this stuff about Data Conversion and the Advanced Editor is incredibly tedious if you have 100 separate character columns to assign. Plus, SSIS being SSIS, it will sometimes reset all your 100 Advanced Editor changes even if you set ValidateExternalMetadata to False, which is incredibly obnoxious. I wouldn't mind doing the Data Conversion if there were some value to it, but 20 years ago ETL tools used to take Oracle characters to MS SQL characters without fussing.
What Bakalolo and Zafer say is the answer if you have a lot of character columns and you can live with NVARCHAR: just declare all your output MS SQL columns as NVARCHAR, and your data task will automatically assign your Oracle fields to the MS SQL fields with no manual overrides. I have also found that the new Oracle Source (2021) doesn't complain about a Unicode conversion to VARCHAR in MS SQL. A colleague just told me that the SSIS wizard (it may be only in VS 2019+) to assign Oracle characters to MS SQL VARCHAR will do the assignments automatically with no overrides, but I haven't tried that personally.
2022 update: I think this applies only to packages created in VS 2019 and later. An ADO.NET task reading a VARCHAR MS SQL table and writing to an OLE DB (and, I think, ADO.NET) MS SQL VARCHAR destination will throw the Unicode error. If you switch the input task to OLE DB reading the MS SQL VARCHAR table, you won't have to do the Advanced Editor overrides for the VARCHAR fields. If you don't want to do Advanced Editor overrides (who does?), try different tasks, and prefer OLE DB tasks.
I just encountered the same issue and solved it in my SQL query by using CONVERT directly:
CONVERT(NVARCHAR(50), '') AS MyVarName
I needed to put an empty (or fixed-size) string into the Excel file. Converting forces the type of MyVarName from DT_STR to DT_WSTR (Unicode).
I know this is a very old post, but I ran into the same issue and found that I had to manually select the conversion component's output alias as the mapping in the Excel destination component. Since the column names from the OLE DB source match the Excel column names, it was mapping to the OLE DB columns and not to the output aliases, such as the SourceID column from the OLE DB component being named "Copy of SourceID" after conversion. I don't see the original question saying they specifically selected the new alias names, just that they mapped to DB columns. Serge Voloshenko's post comes the closest, but it also does not mention making sure the mapping happens. A new SSIS user might overlook this.

Cassandra CLI: RowKey with RandomPartitioner = Show Key in Cleartext?

Using the Cassandra CLI gives me the following output:
RowKey: 31307c32333239
=> (super_column=3f18d800-fed5-17cf-b91a-0016e6df0376,
(column=date, value=1312289229287, timestamp=1312289229647000)
I am using RandomPartitioner. Is it somehow possible to get the RowKey (from the CLI) in cleartext? Ideally in the CLI, but if there is a helper class to convert it back into a String, this would also be OK.
I know the key is somehow hashed. If the key cannot be "retrieved" (which I have to assume), is there a helper class exposed in Cassandra that I can use to generate the key based on my original String, so I can compare them?
My problem: I have stored records in Cassandra, but using a key like "user1|order1" I am not able to retrieve the records. Cassandra does not find any records. I assume that somehow my keys are wrong, and I need to compare them to find out where the problem is...
Thanks very much!! Jens
This question is highly relevant: Cassandra cli: Convert hex values into a human-readable format
The only difference is that in Cassandra 0.8, there is now a "key_validation_class" attribute per column family, and it defaults to BytesType. When the CLI sees that it's BytesType, it represents the key in hex.
Cassandra is treating your RowKey as hex bytes: 31 30 7c 32 33 32 39 in hex is 10|2329 in ASCII, which I guess is your key.
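If you want to do the conversion programmatically, here is a minimal sketch in plain Java (a hypothetical helper, not part of Cassandra's API):

public class HexKeyDecoder {
    // Decode a CLI-style hex row key (e.g. "31307c32333239") back to an ASCII string
    public static String decode(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < hex.length(); i += 2) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("31307c32333239")); // prints: 10|2329
    }
}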
If you want to see your plaintext keys in the CLI, you need to give the command
assume MyColumnFamily keys as ascii;
which will make the CLI automatically translate between ASCII and hex for you in the row keys.
Or, when you create the column family, use
create column family MyColumnFamily with key_validation_class = AsciiType;
which will set the preference permanently.
The hashing occurs at a lower level and doesn't affect your problem.