I'm creating an OCR line for our remits that our scanner will read. The scanner doesn't allow the '.' in the field - it assumes the last 2 digits are the decimal place values. I'm converting the field to text but not sure how to remove the '.' and keep the decimal place values.
The simplest solution would be to create a Formula Field and use the Replace() function. The formula for your Formula Field would look like this:
StringVar myVariable;
myVariable := Replace({table.column}, ".", "");
myVariable;
This will search {table.column} and replace every occurrence of a decimal point with an empty string (Replace() is not limited to the first match). Because the scanner treats the last 2 digits as the decimal place values, make sure the text value has exactly two decimal places before the '.' is removed.
However, if your intent is to barcode the value, there may be a UFL available that could also do this for you. When creating barcodes, User Function Libraries are usually preferred because they have functions specifically designed to encode your barcode values. They aren't required though, and you can always choose to encode barcode values manually with Formula Fields.
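As an illustration of the same idea outside Crystal Reports, here is a minimal Python sketch. The function name to_ocr_amount is made up for this example; the point it shows is that the value is first fixed to two decimal places and only then has its '.' removed, so that 123.4 becomes 12340 rather than 1234.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_ocr_amount(amount):
    """Format an amount with exactly two decimals, then drop the '.'.

    Hypothetical helper: the name and the rounding mode are assumptions,
    not something the scanner or Crystal Reports prescribes.
    """
    fixed = Decimal(str(amount)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return f"{fixed:.2f}".replace(".", "")


print(to_ocr_amount("123.45"))  # 12345
print(to_ocr_amount("123.4"))   # 12340
print(to_ocr_amount("50"))      # 5000
```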
I'm trying to change the value of nulls to something else that can be used to filter. This data comes from a QVD file. The field is null where no action has been taken on an item yet (it will eventually change to something else once an action has been taken). I found this link, which was very informative, but I tried multiple solutions from the document to no avail.
What I don't quite understand is that whenever I make a new field (in the script or as an expression), the formula is not evaluated for the records that are null; it shows " - ". For instance, the expression isNull(ActionTaken) returns false for a record that is not null, but only " - " for records that are null. If I export the table to Excel, the " - " is exported. When I copy this cell into a text analyzer, the UTF-8 encoding is \x2D\x0A\x0A; I'm not sure if that's an artifact of the export process.
I also tried using the NullAsValue statement but no luck. Using a combination of Len & Trim = 0 will return the same result as above. This is only one table, no other tables are involved.
Thanks in advance.
I had a similar case a few years ago where the field looked empty but was actually filled with a character that only looked empty. Trimming the field also didn't work as expected in this case, because the character code was different from that of a normal space.
What I suggest is to check whether the character code returned for the "empty" value really corresponds to an empty string. You can use the Ord() function to get the character code for the empty-looking values. Once you have the code, you can use it to replace that character with whatever you want (for example, an empty string).
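The check described above can be sketched in Python, whose ord() plays the same role as the Ord() function mentioned in the answer. Everything here is an assumption for illustration: the helper names are invented, and the non-breaking space (code 160) is only an example of a character that looks empty, not a claim about what the QVD actually contains.

```python
def char_codes(value):
    """Return the code point of every character in value."""
    return [ord(ch) for ch in value]


# Codes treated as "looks empty": space, non-breaking space, tab, LF, CR.
# This set is an assumption; extend it with whatever code you actually find.
BLANK_CODES = frozenset({0x20, 0xA0, 0x09, 0x0A, 0x0D})


def normalize_blank(value):
    """Return '' when value consists only of blank-looking characters."""
    if all(ord(ch) in BLANK_CODES for ch in value):
        return ""
    return value


suspicious = "\u00a0"  # looks empty, but is not a plain space
print(char_codes(suspicious))             # [160]
print(repr(normalize_blank(suspicious)))  # ''
```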
As stated in the title, I have two tables I'm attempting to link. Both strings appear to be a match, however Crystal Reports is not picking it up. The only thing I can think of is that the length of the field is different, even though the strings are the same. Could that cause a discrepancy? If so, how can I correct for it? Thank you
A difference in string length will prevent a match. If you are using the Trim(string) function, that only removes spaces found at the beginning or end of your string, so the two strings could still be of different lengths after using this function. You will need to use another function to capture a substring of the original string. To do this you can use the Left(string, length) function to ensure both strings are the same length.
If they still do not match then you may have non-printable characters in one or both of your strings. Carriage Return and Line Feed tend to be the most commonly found non-printable characters. A Carriage Return is represented as Chr(13), while a Line Feed is represented as Chr(10). These are Built In Constants similar to those found in VBA and Visual Basic.
You can use a find and replace to remove them with the following formula. It's not a bad idea to also include the trim and left functions in this as well to ensure you get the best match possible.
Replace(Replace(Left(Trim({YourStringField}), 10),Chr(10), ""),Chr(13), "")
There are a few additional Built In Constants you may need to check for if this doesn't work. A Tab is represented as Chr(9), for example. It's very rare for strings to contain the other Built In Constants though. In most cases Carriage Return and Line Feed are the only ones found in Plain Text. Tabs and the other constants are normally found only in Rich Text and are very rare in string data.
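To make the clean-up steps concrete, here is a small Python sketch of the same normalization. It is an illustration, not Crystal syntax: clean_key and the sample values are invented, and the length of 10 is simply carried over from the formula above. Unlike that formula, this sketch removes the control characters before trimming and cutting, so a trailing line break cannot use up part of the fixed length.

```python
# Map carriage return (13), line feed (10) and tab (9) to "delete".
CONTROL_CHARS = {13: None, 10: None, 9: None}


def clean_key(value, length=10):
    """Drop CR/LF/tab, trim surrounding spaces, then keep `length` characters."""
    stripped = value.translate(CONTROL_CHARS).strip()
    return stripped[:length]


a = "ABC123\r\n"   # hypothetical value with a trailing line break
b = "  ABC123"     # hypothetical value with leading spaces
print(a == b)                        # False
print(clean_key(a) == clean_key(b))  # True
```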
I am using Scala to parse CSV files. Some of these files have fields which are non-textual data like images or octet-streams. I would like to use Apache Spark's textFile() method to split up the CSV into rows, and
split(",[ ]*(?=([^\"]*\"[^\"]*\")*[^\"]*$)")
to split the row into fields. Unfortunately this does not work with files that have these binary fields. There are two problems: 1) The octet-streams can contain newlines, which make textFile() split rows that should be one, and 2) The octet-streams contain commas and/or double quotes which are not escaped and mess up my schema.
The files are usually big, from a couple of MB up to a couple of hundred MB. I have to take the CSVs as they are, although I could preprocess them.
All I want to achieve is a working split function so I can ignore the field with the octet-stream. Nevertheless, a great bonus would be to extract the textual information in the octet-stream.
So how would I go forward to solve my problems?
Edit: A typical record obtained with cat, the newlines are from the file, not for cosmetic purposes (shortened):
7,url,user,02/24/2015 02:29:00 AM,03/22/2015 03:12:36 PM,octet-stream,27156,"MSCF^#^#^#^#�,^#^#^#^#^#^#D^#^#^#^#^#^#^#^C^A^A^#^C^#^D^#^#^#^#^#^T^#^#^#^#^#^P^#�,^#^#^X=^#^#^#^#^#^#^#^#^#^#�^#^#^#^E^#^A^#��^A^#^#^#^#^#^#^#WF6�!^#Info.txt^#=^B^#^#��^A^#^#^#WF7�^#^#List.xml^#^�^#^#��^A^#^#^#WF:�^#^#Filename.txt^#��>��
^#�CK�]�r��^Q��T�^O�^#�-�j�]��FI�Ky��Ei�Je^K""!�^Qx #�*^U^?�^_�;��ħ�^LI^#$(�^Q���b��\N����t�����+������ȷgvM�^L̽�LǴL�^L��^ER��w^Ui^M��^X�Kޓ�^QJȧ��^N~��&�x�bB��D]1�^B|^G���g^SyG�����:����^_P�^T�^_�����U�|B�gH=��%Z^NY���,^U�^VI{��^S�^U�!�^Lpw�T���+�a�z�l������b����w^K��or��pH� ��ܞ�l��z�^\i=�z�:^C�^S!_ESCW��ESC""��g^NY2��s�� u���X^?�^R^R+��b^]^Ro�r���^AR�h�^D��^X^M�^]ޫ���ܰ�^]���0^?��^]�92^GhCx�DN^?
mY<{��L^Zk�^\���M�^V^HE���-Ե�$f�f����^D�e�^R:�u����� ^E^A�Ȑ�^B�^E�sZ���Yo��8Eސ�}��&JY���^A9^P������^P����~Jʭy��`�^9«�""�U� �:�}3���6�Hߧ�v���A7^Xi^L^]�sA�^Q�7�5d�^Xo˛�tY
Bp��4�Y���7DkV_���\^_q~�w�|�a�s̆���#�g�ӳu�^�!W}�n��Rgż_2�]�p�2}��b�G9�M^Q
�����:�X����bR[ԳZV!^G����^U�tq�&�Y6b��GR���s#mn6Z=^ZH^]�b��R^G�C�0R��{r1��4�#�
=r/X2�^O�����r^M�Rȕ�goG^X-����}���P+˥Qf�#��^C�Բ�z1�I�j����6�^Np���ܯ^P�[�^Tzԏ���^F2�e��\�E�6c�%���$�:E�*�*©t�y�J�,�S�2U�S�^X}ME�]��]�i��G�su�""��!�-��!r'ܷe_et Y^K^?0���l^A��^^�m�1/q����|�_r�5$�%�([x��W^E�G^^y���#����Z2^?ڠ�^_��^AҶ�OO��^]�vq%:j�^?�jX��\�]����^S�^^n�^C��>.^CY^O-� �_�\K����:p�<7Sֺnj���-Yk�r���^Q^M�n�J^B��^Z0^?�(^C��^W³!�g�Z�~R�A^M�^O^^�%;��Ԗ�p^S�w���*m^S���jڒ|�����<�^S�;Z^^Fc�1���^O�G_o����8��CS���w��^?��n�2~��m���G;��rx4�(�]�'��^E���eƧ�x��.�w�9WO�^^�י3��0,�y��H�Y�.H�x�""'���h}灢^T�Gm;^XE�̼�J��c�^^;�^A�qZ1ׁBZ^Q�^A^FB�^QbQ�_�3|ƺ�EvZ���^S�w���^P���9^MT��ǩY[+�+�9�Ԩ�^O�^Q���Fy(+�9p�^^Mj�2��Y^?��ڞ��^Ķb�^Z�ψMр}�ڣ�^^S�^?��^U�^Wڻ����z�^#��uk��k^^�>^O�^W�ݤO�h�^G�����Kˇ�.�R|�)-��e^G�^]�/J����U�ϴ�a���i5HO�^L�ESCg�R'���.����d���+~�}��ڝ^Y5]l�3jg54M�������2t�5^Y}�q)��^O;�X\�q^Ox~Vۗ�t�^\f� >k;^G�K5��,��X�t/�ǧ^G""5��4^MiΟ�n��^B^]�|�����V��ߌ֗Q~�H���8��t��5��ܗ�
�Z�^c�6N�ESCG����^_��>��t^L^R�^:�x���^]v�{^#+KM��qԎ�.^S�%&��=^W-�=�^S�����^CI���&^]_�s�˞�y�z�Jc^W�kڠ�^\��^]j�����^O��;�oY^^�^V59;�c��^B��T�nb����^C��^N��s�x�<{�9-�F�T�^N�5�^Se-���^T�Y[���`^ZsL��v�բ<C�+�~�^ۚ��""�Yκ2^_�^VxT�>��/ݳ^U�m�^#���3^Ge�n^Vc�V�^#�NVn�,�q��^^^]gy�R�S��Ȃ$���>A�d����xg�^GB3�M�J�^QJ^]�^\�{.�D��碎�^W�8a����qޠl?,'^R�^X�Cgy�P[����mڞ��H�Z�s�SD&蠤�s�E��nu�O#O<��3wj`C-%w�W�J�^WP^T�^]r^NT�TC�Lq�Z�f�!�;�l�Y��Gb��>�ud�hx�Ԭ^N)9�^N!k�҉s�35v������.�""^]��~4������۴�Z^]u�^Ti^^�i:�)K��P᳕!�#�^?�>��EE^VE-u�^SgV^L��<��^D�O<�+�J.�c�Z#>�.l����^S�
ESC��(��E�j�π쬖���2{^U&b\��P^S�`^O^XdL�^ 6bu��FD��^#^#^#^#","field_x, data",field_y,field_z
Expected output would be an array
("7","url","user","02/24/2015 02:29:00 AM","03/22/2015 03:12:36 PM","octet-stream","27156","field_x, data","field_y","field_z")
Or, but this is probably another question, such an array (like running strings on the octet-stream field):
("7","url","user","02/24/2015 02:29:00 AM","03/22/2015 03:12:36 PM","octet-stream","27156","Info.txt List.xml Filename.txt","field_x, data","field_y","field_z")
Edit 2: Every file that has a binary field also contains a length field for it. So instead of splitting directly I can walk left to right through my record and extract the fields. This is certainly a great improvement of my current situation but problem 1) still persists. How can I split those files reliably?
I took a closer look at the files and a header looks like this:
RecordId, Field_A, Content_Type, Content_Length, Content, Field_B
(Where Content_Type can be "octet-stream", Content_Length the number of bytes in the Content field, and Content obviously the data). And good for me, the value of Field_B is predictable, let's assume for a certain file it's always "Hello World".
So instead of using Spark's default behaviour splitting on newlines, how can I achieve that Spark is only splitting on newlines following "Hello World"? (I also edited the question title since the focus of the question changed)
As answered in Spark: Reading files using different delimiter than new line, I used textinputformat.record.delimiter to split on "Hello World\n" because I am a bit lucky that the last column always contains the same value. After that I simply walk left to right through the record and when I reach the length field I skip the next n bytes. Everything works now. Thanks for pointing me in the right direction.
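The approach described above (split on the known terminator, then walk left to right and skip the number of bytes given by the length field) can be sketched in plain Python without Spark. This is only an illustration under several assumptions: the function names are invented, the column layout is the simplified header from the question, the fields before Content contain no commas, Content_Length counts the raw bytes exactly as they appear in the file, and the content never contains the terminator itself.

```python
def split_records(data, terminator=b"Hello World\n"):
    """Split raw bytes on the known last-column value plus newline."""
    last_value = terminator.rstrip(b"\n")
    # Re-attach the last column that the split consumed.
    return [chunk + last_value for chunk in data.split(terminator) if chunk]


def parse_record(record):
    """Walk left to right; use Content_Length to skip over the binary field."""
    record_id, field_a, content_type, rest = record.split(b",", 3)
    length_text, rest = rest.split(b",", 1)
    length = int(length_text)
    content = rest[:length]       # exactly `length` bytes, whatever they contain
    field_b = rest[length + 1:]   # +1 skips the comma after the content
    return [record_id, field_a, content_type, length_text, content, field_b]


# Made-up sample: the first content field holds a comma, a newline and a quote.
sample = (
    b'1,a,octet-stream,5,x,\ny",Hello World\n'
    b'2,b,octet-stream,3,abc,Hello World\n'
)
for rec in split_records(sample):
    print(parse_record(rec))
```

With Spark, the first step corresponds to setting textinputformat.record.delimiter as mentioned above; the second function is the kind of logic that would then run on each record.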
There are two problems: 1) The octet-streams can contain newlines
which make textFile() split rows which should be one, and 2) The
octet-streams contain commas and/or double quotes which are not
escaped and mess up my schema.
Well, actually that csv file is properly escaped:
the multiline field is enclosed in double quotes: "MSCF^# .. ^#^#" (which also handles possible separators inside the field)
double quotes inside the field are escaped with another double quote as it should be: Je^K""!
Of course a simple split will not work in this case (and should never be used on csv data), but any csv reader able to handle multiline fields should parse that data correctly.
Also keep in mind that the double quotes inside the octet-stream have to be unescaped, or that data won't be valid (another reason not to use split, but a csv reader that handles this).
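As a small illustration of this point, the following Python sketch uses the standard csv module on a made-up record with a quoted field that contains a newline, a comma and doubled double quotes. The sample text is invented and contains no real binary data; an actual octet-stream would additionally need a byte-preserving way of reading the file, which is not covered here.

```python
import csv
import io

# One logical record: the third field spans two lines and contains
# an embedded comma plus escaped ("" doubled) double quotes.
raw = '7,url,"line one\nline ""two"", with comma","field_x, data",field_y\n'

# newline="" leaves line endings untouched so the reader can handle them.
rows = list(csv.reader(io.StringIO(raw, newline="")))

print(len(rows))  # 1
print(rows[0])
```

The reader returns a single row of five fields, and the doubled quotes come back unescaped as single quotes inside the third field.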
I've developed a workaround since crystal reports doesn't seem to have a substring function with the following formula:
right({_v_hardware.groupname},
truncate(instr(replace({_v_hardware.groupname},".",
","), ",")))
What I'm trying to do is search for the period (".") in a string and replace it with a comma. Then find the comma position in the string and print all characters following after the comma. This is assuming the string will only have 1 period in the entire string.
Now when I attempt to do this, I get some weird characters which look like wingdings. Any ideas?
thanks in advance.
I don't know everything you are trying to accomplish, but for this question alone, the step of replacing the period with a comma seems to be unnecessary. If you know that there is only one period in the string and you only want the characters to the right of the period, then you should be able to do something like the following (this is #first_formula):
right({_v_hardware.groupname}, len({_v_hardware.groupname}) - instr({_v_hardware.groupname},"."))
If for some reason you want to show the comma then I'd do that in a separate formula. If you need the entire string with the period replaced by a comma then just do:
replace({_v_hardware.groupname},".",",")
And if you need the comma included in front of the extracted string then it might just be easier to do something like:
"," + {#first_formula}
Hope this helps.