Problem writing column header values to CSV in Robot Framework - encoding

When exporting column headers from a web menu to CSV in Robot Framework, the text (which is in Polish) shows up as unknown characters. How do I encode it correctly?

I don't think the problem you are seeing above has to do with encoding. The result from Get column headers from CSV file isn't a list, which is what your error is pointing to: List Should Contain Sub List expects two lists as arguments.
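A minimal sketch of a Python keyword library, assuming the headers are stored as UTF-8. The function name `get_column_headers_from_csv_file` is hypothetical (Robot Framework would expose it as Get Column Headers From CSV File), and `write_headers_to_csv` is an illustrative helper. The point is that the keyword returns a real Python list, which is what List Should Contain Sub List needs, and that the encoding is stated explicitly instead of being left to the platform default.

```python
import csv


def write_headers_to_csv(path: str, headers: list[str]) -> None:
    """Write a single header row to a CSV file (illustrative helper)."""
    # "utf-8-sig" prepends a byte order mark so Excel detects UTF-8 and
    # displays Polish characters such as ą, ę, ł correctly.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerow(headers)


def get_column_headers_from_csv_file(path: str) -> list[str]:
    """Return the first row of a CSV file as a Python list (hypothetical keyword)."""
    # "utf-8-sig" also reads plain UTF-8; it only strips a leading BOM if present.
    with open(path, newline="", encoding="utf-8-sig") as f:
        # next() with a default returns an empty list for an empty file.
        return next(csv.reader(f), [])
```

In a test you might then call `${headers}=    Get Column Headers From CSV File    ${path}` followed by `List Should Contain Sub List    ${headers}    ${expected}`, with the Collections library imported.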


COPY HEADER available only in CSV mode

When I try to use the COPY command with the HEADER option and the text format to export a table in PostgreSQL, I get the following error:
COPY HEADER available only in CSV mode
I understand that we can use format CSV with a delimiter other than , to generate a different file format, but I am wondering why HEADER is not allowed with the text format.
The default text format of COPY is proprietary to PostgreSQL and not very useful for data exchange with other software. For example, a NULL value is represented as \N.
Since nobody saw a need for having header data in this format, it didn't get implemented.
Use the csv format for data exchange.

Is it possible to have multiple row separators in Talend?

I'm facing a challenge on one of my first projects as a junior dev. I'm using Talend to open some metadata files that contain a series of "key=value" pairs. I eventually need to transform the metadata and write it as a new row in an Excel file.
The metadata file looks something like this:
DOCTYPE=some_data
DOCNBR=some_data
DOCREV=some_data
DOCBASE=some_data
DOCNAME=some_data
RELEASE=some_data
DWG=TYPE=2;NAME=some_data;SIZE=some_data
DESCRIPTION=some_data
Line 7 of the example above (DWG=TYPE=2;NAME=some_data;SIZE=some_data) is what I'm stuck on. I'm attempting to create a new delimited metadata file, using "=" as the field separator and "\n" as the row separator.
Is there a way to have multiple row separators, including ";", so that the other items on line 7 end up on their own rows?
Yes, you can.
Write a regex that includes both \n and ; and enter it in the field delimiter setting.
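Talend's own components are not shown here. As an illustration of the regex idea only, this Python sketch treats `\n` and `;` as alternative separators, then splits each resulting row on its first "=" (a design choice assumed here so that DWG=TYPE=2 yields the key DWG and the value TYPE=2). The function name `parse_metadata` is hypothetical.

```python
import re

# Row separators: a newline or a semicolon, as alternatives in one regex.
ROW_SEPARATOR = re.compile(r"\n|;")


def parse_metadata(text: str) -> list[tuple[str, str]]:
    """Split key=value metadata into (key, value) rows."""
    rows = []
    for row in ROW_SEPARATOR.split(text):
        row = row.strip()  # also drops a stray "\r" from CRLF files
        if not row:
            continue  # skip blank rows, e.g. from a trailing newline
        # Split on the first "=" only, so "DWG=TYPE=2" -> ("DWG", "TYPE=2").
        key, _, value = row.partition("=")
        rows.append((key, value))
    return rows
```

With "=" as a plain field separator instead, DWG=TYPE=2 would produce three fields, so decide which behaviour your Excel output needs.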

I am trying to read the time and message value fields from the data shown below and write them to an Excel file.

Sample data and required Excel image:
Also, read the Time section as shown in the file and populate the Excel file with that data in a column with the header Time, as shown above. Likewise, read the message value as shown in the .asc file, convert the numbers from hexadecimal to decimal, and populate them in the Excel file in columns named Data1, Data2, Data3,…
If your '.asc' file consists of tab-delimited ASCII text, then Excel will allow you to import it into an Excel worksheet.
The following explainer comes from Microsoft's Office support site:
There are two ways to import data from a text file by using Microsoft Excel: You can open the text file in Excel, or you can import the text file as an external data range. To export data from Excel to a text file, use the Save As command.
There are two commonly used text file formats:
Delimited text files (.txt), in which the TAB character (ASCII character code 009) typically separates each field of text.
Comma separated values text files (.csv), in which the comma character (,) typically separates each field of text.
You can change the separator character that is used in both delimited and .csv text files. This may be necessary to make sure that the import or export operation works the way that you want it to.
If neither of those methods works for you and your '.asc' file was generated by MATLAB, then you may be able to use MATLAB to export directly to an Excel worksheet. MATLAB has a function, xlswrite, that you can use to write directly to a Microsoft Excel spreadsheet.
Another option, if you're comfortable writing some MATLAB code, is to use MATLAB's textscan function to parse your '.asc' file.
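As a swapped-in alternative to Excel's import or MATLAB, here is a Python sketch that converts the file to a CSV that Excel can open, with a Time column and Data1, Data2, … columns holding the bytes converted from hex to decimal. It assumes a Vector-style CAN ASC layout in which each frame line starts with the timestamp and the token d is followed by the data length and then that many hex bytes, for example `0.010000 1  123  Rx  d 8 01 0A FF 10 00 00 00 7F`. The question only showed the file as an image, so adjust the parsing to the real layout. The function names are hypothetical.

```python
import csv


def parse_asc_line(line: str):
    """Return (time, data_bytes) for a frame line, or None for other lines.

    Assumed layout: "<time> <channel> <id> <dir> d <dlc> <byte1> ... <byteN>".
    """
    tokens = line.split()
    if "d" not in tokens:
        return None  # header, comment, or event lines without data
    try:
        time = float(tokens[0])
        i = tokens.index("d")
        dlc = int(tokens[i + 1])
        # Convert each hexadecimal byte ("0A", "FF", ...) to a decimal int.
        data = [int(b, 16) for b in tokens[i + 2:i + 2 + dlc]]
    except (ValueError, IndexError):
        return None
    return time, data


def asc_to_csv(asc_path: str, csv_path: str) -> None:
    """Write a Time column plus Data1..DataN columns (decimal values)."""
    rows = []
    with open(asc_path, encoding="ascii", errors="replace") as f:
        for line in f:
            parsed = parse_asc_line(line)
            if parsed is not None:
                rows.append(parsed)
    # Use as many Data columns as the longest frame needs.
    width = max((len(data) for _, data in rows), default=0)
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["Time"] + [f"Data{n}" for n in range(1, width + 1)])
        for time, data in rows:
            writer.writerow([time] + data)
```

Writing CSV keeps the sketch free of third-party packages; Excel opens the result directly, and you can save it as .xlsx from there.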

Spark: Split CSV with newlines in octet-stream field

I am using Scala to parse CSV files. Some of these files have fields which are non-textual data like images or octet-streams. I would like to use Apache Spark's textFile() method to split up the CSV into rows, and
split(",[ ]*(?=([^\"]*\"[^\"]*\")*[^\"]*$)")
to split the row into fields. Unfortunately, this does not work with files that have these binary fields. There are two problems: 1) The octet-streams can contain newlines which make textFile() split rows which should be one, and 2) The octet-streams contain commas and/or double quotes which are not escaped and mess up my schema.
The files are usually big, from a couple of MB up to a few hundred MB. I have to take the CSVs as they are, although I could preprocess them.
All I want to achieve is a working split function so that I can ignore the field with the octet-stream. A great bonus, though, would be extracting the textual information in the octet-stream.
So how should I go about solving these problems?
Edit: Here is a typical record obtained with cat (shortened). The newlines are from the file, not added for cosmetic purposes:
7,url,user,02/24/2015 02:29:00 AM,03/22/2015 03:12:36 PM,octet-stream,27156,"MSCF^#^#^#^#�,^#^#^#^#^#^#D^#^#^#^#^#^#^#^C^A^A^#^C^#^D^#^#^#^#^#^T^#^#^#^#^#^P^#�,^#^#^X=^#^#^#^#^#^#^#^#^#^#�^#^#^#^E^#^A^#��^A^#^#^#^#^#^#^#WF6�!^#Info.txt^#=^B^#^#��^A^#^#^#WF7�^#^#List.xml^#^�^#^#��^A^#^#^#WF:�^#^#Filename.txt^#��>��
^#�CK�]�r��^Q��T�^O�^#�-�j�]��FI�Ky��Ei�Je^K""!�^Qx #�*^U^?�^_�;��ħ�^LI^#$(�^Q���b��\N����t�����+������ȷgvM�^L̽�LǴL�^L��^ER��w^Ui^M��^X�Kޓ�^QJȧ��^N~��&�x�bB��D]1�^B|^G���g^SyG�����:����^_P�^T�^_�����U�|B�gH=��%Z^NY���,^U�^VI{��^S�^U�!�^Lpw�T���+�a�z�l������b����w^K��or��pH� ��ܞ�l��z�^\i=�z�:^C�^S!_ESCW��ESC""��g^NY2��s�� u���X^?�^R^R+��b^]^Ro�r���^AR�h�^D��^X^M�^]ޫ���ܰ�^]���0^?��^]�92^GhCx�DN^?
mY<{��L^Zk�^\���M�^V^HE���-Ե�$f�f����^D�e�^R:�u����� ^E^A�Ȑ�^B�^E�sZ���Yo��8Eސ�}��&JY���^A9^P������^P����~Jʭy��`�^9«�""�U� �:�}3���6�Hߧ�v���A7^Xi^L^]�sA�^Q�7�5d�^Xo˛�tY
Bp��4�Y���7DkV_���\^_q~�w�|�a�s̆���#�g�ӳu�^�!W}�n��Rgż_2�]�p�2}��b�G9�M^Q
�����:�X����bR[ԳZV!^G����^U�tq�&�Y6b��GR���s#mn6Z=^ZH^]�b��R^G�C�0R��{r1��4�#�
=r/X2�^O�����r^M�Rȕ�goG^X-����}���P+˥Qf�#��^C�Բ�z1�I�j����6�^Np���ܯ^P�[�^Tzԏ���^F2�e��\�E�߻6c�%���$�:E�*�*©t�y�J�,�S�2U�S�^X}ME�]��]�i��G�su�""��!�-��!r'ܷe_et Y^K^?0���l^A��^^�m�1/q����|�_r�5$�%�([x��W^E�G^^y���#����Z2^?ڠ�^_��^AҶ�OO��^]�vq%:j�^?�jX��\�]����^S�^^n�^C��>.^CY^O-� �_�\K����:p�<7Sֺnj���-Yk�r���^Q^M�n�J^B��^Z0^?�(^C��^W³!�g�Z�~R�A^M�^O^^�%;��Ԗ�p^S�w���*m^S���jڒ|�����<�^S�;Z^^Fc�1���^O�G_o����8��CS���w��^?��n�2~��m���G;��rx4�(�]�'��^E���eƧ�x��.�w�9WO�^^�י3��0,�y��H�Y�.H�x�""'���h}灢^T�Gm;^XE�̼�J��c�^^񾠫;�^A�qZ1ׁBZ^Q�^A^FB�^QbQ�_�3|ƺ�EvZ���^S�w���^P���9^MT��ǩY[+�+�9�Ԩ�^O�^Q���Fy(+�9p�^^Mj�2��Y^?��ڞ��^Ķb�^Z�ψMр}�ڣ�^^S�^?��^U�^Wڻ����z�^#��uk��k^^�>^O�^W�ݤO�h�^G�����Kˇ�.�R|�)-��e^G�^]�/J����U�ϴ�a���i5HO�^L�ESCg�R'���.����d���+~�}��ڝ^Y5]l�3jg54M�������2t�5^Y}�q)��^O;�X\�q^Ox~Vۗ�t�^\f� >k;^G�K5��,��X�t/�ǧ^G""5��4^MiΟ�n��^B^]�|�����V��ߌ֗Q~�H���8��t��5��ܗ�
�Z�^c�6N�ESCG����^_��>��t^L^R�^:�x���^]v�{^#+KM��qԎ�.^S�%&��=^W-�=�^S�����^CI���&^]_�s�˞�y�z�Jc^W�kڠ�^\��^]j�����^O��;�oY^^�^V59;�c��^B��T�nb����^C��^N��s�x�<{�9-�F�T�^N�5�^Se-���^T�Y[���`^ZsL��v�բ<C�+�~�^ۚ��""�Yκ2^_�^VxT�>��/ݳ^U�m�^#���3^Ge�n^Vc�V�^#�NVn�,�q��^^^]gy�R�S��Ȃ$���>A�d����xg�^GB3�M�J�^QJ^]�^\�{.�D��碎�^W�8a����qޠl?,'^R�^X�Cgy�P[����mڞ��H�Z�s�SD&蠤�s�E��nu�O#O<��3wj`C-%w�W�J�^WP^T�^]r^NT�TC�Lq�Z�f�!�;�l�Y��Gb��>�ud�hx�Ԭ^N)9�^N!k�҉s�35v������.�""^]��~4������۴�Z^]u�^Ti^^�i:�)K��P᳕!�#�^?�>��EE^VE-u�^SgV^L��<��^D�O<�+�J.�c�Z#>�.l����^S�
ESC��(��E�j�π쬖���2{^U&b\��P^S�`^O^XdL�^ 6bu��FD��^#^#^#^#","field_x, data",field_y,field_z
The expected output would be an array:
("7","url","user","02/24/2015 02:29:00 AM","03/22/2015 03:12:36 PM","octet-stream","27156","field_x, data","field_y","field_z")
Or (though this is probably a separate question) an array like this, similar to running strings on the octet-stream field:
("7","url","user","02/24/2015 02:29:00 AM","03/22/2015 03:12:36 PM","octet-stream","27156","Info.txt List.xml Filename.txt","field_x, data","field_y","field_z")
Edit 2: Every file that has a binary field also contains a length field for it. So instead of splitting directly, I can walk left to right through my record and extract the fields. This is certainly a great improvement over my current situation, but problem 1) still persists. How can I split those files into records reliably?
I took a closer look at the files and a header looks like this:
RecordId, Field_A, Content_Type, Content_Length, Content, Field_B
(Content_Type can be "octet-stream", Content_Length is the number of bytes in the Content field, and Content is, obviously, the data.) Luckily, the value of Field_B is predictable; let's assume that for a certain file it's always "Hello World".
So instead of using Spark's default behaviour of splitting on newlines, how can I make Spark split only on newlines that follow "Hello World"? (I also edited the question title, since the focus of the question changed.)
As answered in Spark: Reading files using different delimiter than new line, I used textinputformat.record.delimiter to split on "Hello World\n" because I am a bit lucky that the last column always contains the same value. After that I simply walk left to right through the record and when I reach the length field I skip the next n bytes. Everything works now. Thanks for pointing me in the right direction.
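The approach described here can be sketched outside Spark in Python, for illustration only. It assumes the column layout from the edit above (RecordId, Field_A, Content_Type, Content_Length, Content, Field_B), a constant Field_B of "Hello World", and that Content_Length counts the bytes of Content as stored in the file, excluding any enclosing double quotes; the question does not confirm that last point. `split_records` mimics what textinputformat.record.delimiter does, and `parse_record` walks left to right and skips exactly Content_Length bytes. Both names are hypothetical.

```python
RECORD_DELIMITER = b"Hello World\n"  # assumed constant last column + newline


def split_records(data: bytes) -> list[bytes]:
    """Split raw file bytes into records on the constant Field_B value."""
    # Like textinputformat.record.delimiter, the delimiter is removed, so each
    # record still ends with the comma that preceded Field_B.
    return [rec for rec in data.split(RECORD_DELIMITER) if rec]


def read_field(record: bytes, pos: int) -> tuple[bytes, int]:
    """Read an unquoted field up to the next comma; return it and the next position."""
    end = record.index(b",", pos)
    return record[pos:end], end + 1


def parse_record(record: bytes) -> dict:
    """Walk left to right, taking exactly Content_Length bytes of Content."""
    pos = 0
    record_id, pos = read_field(record, pos)
    field_a, pos = read_field(record, pos)
    content_type, pos = read_field(record, pos)
    length_raw, pos = read_field(record, pos)
    length = int(length_raw)
    # Assumption: the length excludes optional enclosing double quotes.
    quoted = record[pos:pos + 1] == b'"'
    if quoted:
        pos += 1
    content = record[pos:pos + length]
    pos += length
    if quoted:
        if record[pos:pos + 1] != b'"':
            raise ValueError("missing closing quote after Content")
        pos += 1
    if record[pos:pos + 1] != b",":
        raise ValueError("Content_Length does not match the record layout")
    return {
        "record_id": record_id.decode(),
        "field_a": field_a.decode(),
        "content_type": content_type.decode(),
        "content": content,
    }
```

In Spark, the record splitting would be done by the Hadoop input format as described above, and a walk like `parse_record` would then run on each record.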
There are two problems: 1) The octet-streams can contain newlines which make textFile() split rows which should be one, and 2) The octet-streams contain commas and/or double quotes which are not escaped and mess up my schema.
Well, actually that CSV file is properly escaped:
the multiline field is enclosed in double quotes: "MSCF^# .. ^#^#" (which also handles possible separators inside the field);
double quotes inside the field are escaped with another double quote, as they should be: Je^K""!
Of course, a simple split will not work in this case (and should never be used on CSV data), but any CSV reader that can handle multiline fields should parse that data correctly.
Also keep in mind that the double quotes inside the octet-stream have to be unescaped, or that data won't be valid (another reason to use a CSV reader that handles this instead of split).
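To illustrate the point about using a real CSV reader, here is a Python sketch with the standard csv module, which handles quoted fields that span lines and turns doubled quotes back into single ones. Decoding the raw bytes as latin-1 is an assumption made here so that every byte maps to exactly one character; the binary field can then be recovered unchanged with .encode("latin-1"). Older Python versions' csv module raises an error on NUL bytes, which binary fields may contain. In Spark, the built-in CSV source offers a multiLine option, and setting its escape option to a double quote should make it accept doubled quotes; check the documentation for your Spark version.

```python
import csv
import io


def parse_csv_bytes(raw: bytes) -> list[list[str]]:
    """Parse CSV bytes with a real CSV reader instead of a regex split."""
    # latin-1 maps each byte 0-255 to one code point, so nothing is lost.
    text = raw.decode("latin-1")
    # newline="" keeps embedded newlines inside quoted fields intact.
    return list(csv.reader(io.StringIO(text, newline="")))


def field_bytes(field: str) -> bytes:
    """Recover the original bytes of a field decoded with latin-1."""
    return field.encode("latin-1")
```

Note that the doubled quotes come back unescaped, so the recovered bytes are the real binary content.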

SSIS unicode flat file issue "Character not in code page"

I have a text file created in Java using UTF-16 encoding.
When I try to import it, I get a validation failure/error on the flat file source before it even begins to move data. The error says a character is not in the specified code page.
[Flat File Source [908]] Error: Data conversion failed. The data conversion for column "ACTIVE_INGREDIENT" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.
In my Flat File connection, I don't have Unicode selected (as that struggles to find my CR LF line terminators), but I have set the code page to 65001 (UTF-8).
In my flat file data source, I have changed all Internal and External Columns to DT_WSTR in the advanced editor (it seems I can't change the code page; it's stuck on 0 with this option).
I am not doing a data conversion, as I am mapping to NVARCHAR tables (the SSIS job isn't even getting far enough to try to transfer data).
I can't even redirect the rows to a text file to identify them, as I have the same issue trying to output to a flat file destination.
Any help appreciated.