TYPO3 Flux field text - csv data - typo3

I'm using fluidcontent and inside a flux text field I'm saving csv data.
How can I split the string on every newline? It looks like the \n is not being saved.
The following code does not work.
<v:iterator.explode content="{settings.csv}" glue="\n" as="lines">
The split between columns works well with
<v:iterator.explode content="{settings.csv}" glue=";" as="elements">
I'm thankful for every help.
Cheers

You could also use the CommaSeparatedValueProcessor in the FLUIDTEMPLATE object to generate an array with all the lines and fields.

To split on newline the glue argument needs a value of "constant:LF" as described in the documentation
<v:iterator.explode content="{settings.csv}" glue="constant:LF" as="line">
{line}
</v:iterator.explode>

Related

Text data source supports only a single column, and you have 8 columns

This is the error I got when I tried to save a data frame to text:
org.apache.spark.sql.AnalysisException: Text data source supports only a single column, and you have 8 columns
This is the code:
df.write.text("/tmp/wt")
What I m doing wrong?
In Spark 1.6, the easiest solution is to use databricks' library and write:
df.write.format("com.databricks.spark.csv").save("pathToFile.csv")
If you do not want to use it, you can simply convert the rows of your dataframe into csv lines like this:
df.rdd
.map(_.toSeq.map(_+"").reduce(_+";"+_))
.saveAsTextFile("pathToFile.csv")
Note that if your fields can contain separators or quotes, you will have to add enclosing quotes and escape existing quotes, things that the library would do for you transparently.

Export data records from filemaker pro 16 with all strings quoted with ""

Is it possible to export data records from filemaker and have all strings be in a "" quote? With string I mean type = text. Numbers shouldn't change though.
There’s no option I'm aware of that provides this behavior when exporting.
The fieldType() function may be helpful.
https://fmhelp.filemaker.com/help/16/fmp/en/index.html#page/FMP_Help/fieldtype.html
Or, just create a single calc field "row" with all of the fields you want included - with appending quotes around the text fields - and do a .tab export of "rows" instead. That's generally how I handle it.
If you export as CSV, this is the default behaviour as long as you do not export the "Apply current layout’s data formatting..." setting enabled.
I have the same issue.
To strip the quotes from the .csv, I open the .csv in Excel and save the file in Excel as a .csv.
This strips the double quotes.

Is it possible to have multiple row separators in Talend?

I'm facing a challenge for one of my first projects as a junior dev. I'm using Talend to open some metadata files that have a series of "key=value" pairs within the files. I eventually need to transform the metadata and write it as a new row in an Excel file.
The metadata file looks something like this:
DOCTYPE=some_data
DOCNBR=some_data
DOCREV=some_data
DOCBASE=some_data
DOCNAME=some_data
RELEASE=some_data
DWG=TYPE=2;NAME=some_data;SIZE=some_data
DESCRIPTION=some_data
Line 7 of the example above (DWG=TYPE=2;NAME=some_data;SIZE=some_data) is what I'm stuck on when I'm attempting to create a new delimited metadata file, using "=" as the field separator and "\n" as the row separator.
Is there a way to have multiple row separators to include ";" so that I could have the other items on line 7 on their own rows?
Yes you can.
Write a regex which include \n and ; both and give it to the field delimiter field

Spark: Split CSV with newlines in octet-stream field

I am using Scala to parse CSV files. Some of these files have fields which are non-textual data like images or octet-streams. I would like to use Apache Spark's textFile() method to split up the CSV into rows, and
split(",[ ]*(?=([^\"]*\"[^\"]*\")*[^\"]*$)")
to split the row into fields. Unfortunatly this does not work with files that have these mentioned binary fields. There are two problems: 1) The octet-streams can contain newlines which make textFile() split rows which should be one, and 2) The octet-streams contain commas and/or double quotes which are not escaped and mess up my schema.
The files are usually big, couple of MBs up to couple of 100MBs. I have to take the CSV's as they are, although I could preprocess them.
All I want to achieve is a working split function so I can ignore the field with the octet-stream. Nevertheless, a great bonus would be to extract the textual information in the octet-stream.
So how would I go forward to solve my problems?
Edit: A typical record obtained with cat, the newlines are from the file, not for cosmetic purposes (shortened):
7,url,user,02/24/2015 02:29:00 AM,03/22/2015 03:12:36 PM,octet-stream,27156,"MSCF^#^#^#^#�,^#^#^#^#^#^#D^#^#^#^#^#^#^#^C^A^A^#^C^#^D^#^#^#^#^#^T^#^#^#^#^#^P^#�,^#^#^X=^#^#^#^#^#^#^#^#^#^#�^#^#^#^E^#^A^#��^A^#^#^#^#^#^#^#WF6�!^#Info.txt^#=^B^#^#��^A^#^#^#WF7�^#^#List.xml^#^�^#^#��^A^#^#^#WF:�^#^#Filename.txt^#��>��
^#�CK�]�r��^Q��T�^O�^#�-�j�]��FI�Ky��Ei�Je^K""!�^Qx #�*^U^?�^_�;��ħ�^LI^#$(�^Q���b��\N����t�����+������ȷgvM�^L̽�LǴL�^L��^ER��w^Ui^M��^X�Kޓ�^QJȧ��^N~��&�x�bB��D]1�^B|^G���g^SyG�����:����^_P�^T�^_�����U�|B�gH=��%Z^NY���,^U�^VI{��^S�^U�!�^Lpw�T���+�a�z�l������b����w^K��or��pH� ��ܞ�l��z�^\i=�z�:^C�^S!_ESCW��ESC""��g^NY2��s�� u���X^?�^R^R+��b^]^Ro�r���^AR�h�^D��^X^M�^]ޫ���ܰ�^]���0^?��^]�92^GhCx�DN^?
mY<{��L^Zk�^\���M�^V^HE���-Ե�$f�f����^D�e�^R:�u����� ^E^A�Ȑ�^B�^E�sZ���Yo��8Eސ�}��&JY���^A9^P������^P����~Jʭy��`�^9«�""�U� �:�}3���6�Hߧ�v���A7^Xi^L^]�sA�^Q�7�5d�^Xo˛�tY
Bp��4�Y���7DkV_���\^_q~�w�|�a�s̆���#�g�ӳu�^�!W}�n��Rgż_2�]�p�2}��b�G9�M^Q
�����:�X����bR[ԳZV!^G����^U�tq�&�Y6b��GR���s#mn6Z=^ZH^]�b��R^G�C�0R��{r1��4�#�
=r/X2�^O�����r^M�Rȕ�goG^X-����}���P+˥Qf�#��^C�Բ�z1�I�j����6�^Np���ܯ^P�[�^Tzԏ���^F2�e��\�E�߻6c�%���$�:E�*�*©t�y�J�,�S�2U�S�^X}ME�]��]�i��G�su�""��!�-��!r'ܷe_et Y^K^?0���l^A��^^�m�1/q����|�_r�5$�%�([x��W^E�G^^y���#����Z2^?ڠ�^_��^AҶ�OO��^]�vq%:j�^?�jX��\�]����^S�^^n�^C��>.^CY^O-� �_�\K����:p�<7Sֺnj���-Yk�r���^Q^M�n�J^B��^Z0^?�(^C��^W³!�g�Z�~R�A^M�^O^^�%;��Ԗ�p^S�w���*m^S���jڒ|�����<�^S�;Z^^Fc�1���^O�G_o����8��CS���w��^?��n�2~��m���G;��rx4�(�]�'��^E���eƧ�x��.�w�9WO�^^�י3��0,�y��H�Y�.H�x�""'���h}灢^T�Gm;^XE�̼�J��c�^^񾠫;�^A�qZ1ׁBZ^Q�^A^FB�^QbQ�_�3|ƺ�EvZ���^S�w���^P���9^MT��ǩY[+�+�9�Ԩ�^O�^Q���Fy(+�9p�^^Mj�2��Y^?��ڞ��^Ķb�^Z�ψMр}�ڣ�^^S�^?��^U�^Wڻ����z�^#��uk��k^^�>^O�^W�ݤO�h�^G�����Kˇ�.�R|�)-��e^G�^]�/J����U�ϴ�a���i5HO�^L�ESCg�R'���.����d���+~�}��ڝ^Y5]l�3jg54M�������2t�5^Y}�q)��^O;�X\�q^Ox~Vۗ�t�^\f� >k;^G�K5��,��X�t/�ǧ^G""5��4^MiΟ�n��^B^]�|�����V��ߌ֗Q~�H���8��t��5��ܗ�
�Z�^c�6N�ESCG����^_��>��t^L^R�^:�x���^]v�{^#+KM��qԎ�.^S�%&��=^W-�=�^S�����^CI���&^]_�s�˞�y�z�Jc^W�kڠ�^\��^]j�����^O��;�oY^^�^V59;�c��^B��T�nb����^C��^N��s�x�<{�9-�F�T�^N�5�^Se-���^T�Y[���`^ZsL��v�բ<C�+�~�^ۚ��""�Yκ2^_�^VxT�>��/ݳ^U�m�^#���3^Ge�n^Vc�V�^#�NVn�,�q��^^^]gy�R�S��Ȃ$���>A�d����xg�^GB3�M�J�^QJ^]�^\�{.�D��碎�^W�8a����qޠl?,'^R�^X�Cgy�P[����mڞ��H�Z�s�SD&蠤�s�E��nu�O#O<��3wj`C-%w�W�J�^WP^T�^]r^NT�TC�Lq�Z�f�!�;�l�Y��Gb��>�ud�hx�Ԭ^N)9�^N!k�҉s�35v������.�""^]��~4������۴�Z^]u�^Ti^^�i:�)K��P᳕!�#�^?�>��EE^VE-u�^SgV^L��<��^D�O<�+�J.�c�Z#>�.l����^S�
ESC��(��E�j�π쬖���2{^U&b\��P^S�`^O^XdL�^ 6bu��FD��^#^#^#^#","field_x, data",field_y,field_z
Expected output would be an array
("7","url","user","02/24/2015 02:29:00 AM","03/22/2015 03:12:36 PM","octet-stream","27156","field_x, data",field_y",field_z")
Or, but this is probably another question, such an array (like running strings on the octet-stream field):
("7","url","user","02/24/2015 02:29:00 AM","03/22/2015 03:12:36 PM","octet-stream","27156","Info.txt List.xml Filename.txt","field_x, data",field_y",field_z")
Edit 2: Every file that has a binary field also contains a length field for it. So instead of splitting directly I can walk left to right through my record and extract the fields. This is certainly a great improvement of my current situation but problem 1) still persists. How can I split those files reliably?
I took a closer look at the files and a header looks like this:
RecordId, Field_A, Content_Type, Content_Length, Content, Field_B
(Where Content_Type can be "octet-stream", Content_Length the number of bytes in the Content field, and Content obviously the data). And good for me, the value of Field_B is predictable, let's assume for a certain file it's always "Hello World".
So instead of using Spark's default behaviour splitting on newlines, how can I achieve that Spark is only splitting on newlines following "Hello World"? (I also edited the question title since the focus of the question changed)
As answered in Spark: Reading files using different delimiter than new line, I used textinputformat.record.delimiter to split on "Hello World\n" because I am a bit lucky that the last column always contains the same value. After that I simply walk left to right through the record and when I reach the length field I skip the next n bytes. Everything works now. Thanks for pointing me in the right direction.
There are two problems: 1) The octet-streams can contain newlines
which make textFile() split rows which should be one, and 2) The
octet-streams contain commas and/or double quotes which are not
escaped and mess up my schema.
Well, actually that csv file is properly escaped:
the multiline field is enclosed in double quotes: "MSCF^# .. ^#^#" (which also handles possible separators inside the field)
double quotes inside the field are escaped with another double quote as it should be: Je^K""!
Of course a simple split will not work in this case (and should never be used on csv data), but any csv reader able to handle multiline fields should parse that data correctly.
Also keep in mind that the double quotes inside the octet-stream have to be unescaped, or that data won't be valid (another reason not to use split, but a csv reader that handles this).

formatting text in a csv export

I'm having trouble with a .csv export which is being uploaded to a website. There are must be some hidden or illegal characters in a description field I have in the database. I'm having a tough time getting the text to format correctly and not break a php script.
If I use the GetAs(css) function in a calculation, the text works fine. Obviously this won't work as a working file but it at least validates there's something in the formatting of the description field that's breaking the export. I did use the excel clean(text) calculation and that fixes the issue as well. Just need to find a way in Filemaker to do this.
Any suggestions?? Maybe a custom function that strips out bad characters?
You can filter invalid characters out of text using the filter function. If you only want a minimal set of ASCII characters, use it like
filter(mytable::myfield; "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789.!?")