Talend: Split a data file into two flows/streams (header_info, data_rows)

I don't need a complete solution, just a few hints on how to approach this. Here is the problem I am tackling:
I have a file (a Bloomberg answer file) which is built as follows:
First there is a header part (I am only interested in START-OF-FIELDS[...]END-OF-FIELDS; the number of fields varies!).
Then there is the data part, START-OF-DATA[...]END-OF-DATA, where each row looks like: unique_id|some_val|some_val|EXCH_CODE|ID_BB_GLOBAL|NAME|SECURITY_TYP|TICKER\n
Shortened example file:
START-OF-FILE
RUNDATE=20150921
PROGRAMFLAG=oneshot
DATEFORMAT=yyyymmdd_sep
FIRMNAME=dl111111
FILETYPE=pc
REPLYFILENAME=r150921020044_20426_01_00
SECMASTER=yes
DERIVED=yes
CREDITRISK=yes
USERNUMBER=1111111
WS=0
SN=111111
CLOSINGVALUES=yes
SECID=BB_GLOBAL
PROGRAMNAME=getdata
START-OF-FIELDS
EXCH_CODE
ID_BB_GLOBAL
NAME
SECURITY_TYP
TICKER
END-OF-FIELDS
TIMESTARTED=Mon Sep 21 01:01:18 BST 2015
START-OF-DATA
BBG004C5BLW2|0|5|LABUAN INTL FIN|BBG004C5BLW2|1MDB GLOBAL INVESTMENTS|EURO-DOLLAR|OGIMK|
BBG000MGZ064|0|5|HK|BBG000MGZ064|361 DEGREES INTERNATIONAL|Common Stock|1361|
BBG000QVRHX9|0|5|AV|BBG000QVRHX9|3BG EMCORE CONVRT GLB-A|Open-End Fund|EMBDGCA|
BBG000BP52R2|0|5|US|BBG000BP52R2|3M CO|Common Stock|MMM|
BBG0068TPTD9|0|5|TRACE|BBG0068TPTD9|51JOB INC|US DOMESTIC|JOBS|
BBG0069D1BR3|0|5|NOT LISTED|BBG0069D1BR3|51JOB INC|EURO-DOLLAR|JOBS|
BBG000BJD1D4|0|5|US|BBG000BJD1D4|51JOB INC-ADR|ADR|JOBS|
BBG008CTTWK1|0|5|FRANKFURT|BBG008CTTWK1|AABAR INVESTMENTS PJSC|EURO MTN|AABAR|
BBG008D4J9S9|0|5|FRANKFURT|BBG008D4J9S9|AABAR INVESTMENTS PJSC|EURO MTN|AABAR|
BBG008B2BXH2|0|5|SIX|BBG008B2BXH2|AARGAUISCHE KANTONALBANK|DOMESTIC|KBAARG|
BBG0016WJL30|0|5|LX|BBG0016WJL30|AB-AMERICAN INCOME PT-ATEURH|Open-End Fund|ABAATEH|
BBG006F3D598|0|5|BH|BBG006F3D598|ABBEY CAPITAL DAILY FUTURE-B|Fund of Funds|ABBDFUB|
END-OF-DATA
TIMEFINISHED=Mon Sep 21 01:03:22 BST 2015
END-OF-FILE
And now my question:
How can I split this file into 2 flows (field_names; data_rows)?
What I have tried so far:
The regex component only works at row level...
tFileInputMSDelimited gets me nowhere...
I don't want to start parsing the file by hand (tJava)... or do I have to?
Thanks in advance for any hints,
Marco

No need for Java code; check out this very simple job:
Generally, the header has a fixed row count, so we only need to play with row numbers:
tFileInputDelimited1: header 11 and limit 5
tFileInputDelimited2: header 20 and footer 3
This works fine. If the row positions are dynamic, find those positions first, save them in variables, then drive the same job from those variables. You can also refer to my answer here.
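For the dynamic case, the positions can be derived from the marker lines. The following is only a rough Python sketch of that arithmetic, not a Talend job: `find_offsets` and the dictionary keys are made-up names, and it assumes each of the four markers occurs exactly once on a line of its own. In Talend the resulting numbers would still have to be stored in variables and fed to the Header/Limit/Footer settings.

```python
def find_offsets(lines):
    """Derive header/limit/footer row counts from the marker positions."""
    idx = {
        marker: lines.index(marker)
        for marker in ("START-OF-FIELDS", "END-OF-FIELDS",
                       "START-OF-DATA", "END-OF-DATA")
    }
    return {
        # rows to skip before the first field name (marker line included)
        "fields_header": idx["START-OF-FIELDS"] + 1,
        # number of field-name rows between the two field markers
        "fields_limit": idx["END-OF-FIELDS"] - idx["START-OF-FIELDS"] - 1,
        # rows to skip before the first data row (marker line included)
        "data_header": idx["START-OF-DATA"] + 1,
        # rows to drop at the end of the file, END-OF-DATA line included
        "data_footer": len(lines) - idx["END-OF-DATA"],
    }


# Tiny stand-in for the answer file, already split into lines.
sample = [
    "START-OF-FILE", "RUNDATE=20150921",
    "START-OF-FIELDS", "EXCH_CODE", "TICKER", "END-OF-FIELDS",
    "START-OF-DATA", "BBG000BP52R2|0|5|US|MMM|", "END-OF-DATA",
    "TIMEFINISHED=Mon Sep 21 01:03:22 BST 2015", "END-OF-FILE",
]
offsets = find_offsets(sample)
print(offsets)
```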

I'd use tJavaFlex and some Java code. If you look at the actual code, it's not that hard to understand how it works, even if you don't really know Java.
Begin:
boolean header = false;
boolean data = false;
String headerData = "";
String line;
Main:
line = input_row.line;
if (line.equalsIgnoreCase("START-OF-FIELDS")) { header = true; }
if (line.equalsIgnoreCase("END-OF-FIELDS")) { header = false; }
if (line.equalsIgnoreCase("START-OF-DATA")) { data = true; }
if (line.equalsIgnoreCase("END-OF-DATA")) { data = false; }
if (header && !line.equalsIgnoreCase("START-OF-FIELDS")) {
    headerData += line + "|";
}
if (data) {
    if (line.equalsIgnoreCase("START-OF-DATA")) {
        output_row.line = headerData.substring(0, headerData.length() - 1); // remove the trailing delimiter
    } else {
        output_row.line = line;
    }
} else {
    continue; // go to the next line
}
End:
// if you want to handle the header separately:
globalMap.put("headerData", headerData);
Hope this helps.
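The same marker-driven idea can be tried outside Talend to check the logic. This is a hedged Python sketch, not the tJavaFlex code: `split_flows` is a hypothetical helper that returns the two flows as separate lists instead of emitting the header as the first output row, and it assumes at most one trailing "|" per data row.

```python
def split_flows(lines):
    """Split answer-file lines into (field_names, data_rows)."""
    field_names, data_rows = [], []
    section = None  # None, "fields" or "data"
    for raw in lines:
        line = raw.strip()
        if line == "START-OF-FIELDS":
            section = "fields"
        elif line == "START-OF-DATA":
            section = "data"
        elif line in ("END-OF-FIELDS", "END-OF-DATA"):
            section = None
        elif section == "fields" and line:
            field_names.append(line)
        elif section == "data" and line:
            # drop the single trailing delimiter before splitting
            if line.endswith("|"):
                line = line[:-1]
            data_rows.append(line.split("|"))
    return field_names, data_rows


sample = """START-OF-FILE
RUNDATE=20150921
START-OF-FIELDS
EXCH_CODE
ID_BB_GLOBAL
NAME
SECURITY_TYP
TICKER
END-OF-FIELDS
START-OF-DATA
BBG000BP52R2|0|5|US|BBG000BP52R2|3M CO|Common Stock|MMM|
END-OF-DATA
END-OF-FILE""".splitlines()

field_names, data_rows = split_flows(sample)
print(field_names)
print(data_rows)
```

Each data row carries three leading columns (the id and two status values) before the requested fields, so the field names line up with the row values from index 3 onwards.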

Related

How to Extract Strings from Text

Let's say this is my text. I want to extract all 4 variables from it separately:
"ScanCode=? scanMsg= ? ItemName=? ID= ?\n"
Please help, I need this in Dart/Flutter.
The solution I developed first splits the data according to the space character. It then uses the GetValue() method to sequentially read the data from each piece. The next step will be to use the data by transforming it accordingly.
This example prints the following output to the console:
[ScanCode=1234, ScanMessage=Test, ItemName=First, ID=1]
[1234, Test, First, 1]
The solution I developed is available below:
void main()
{
  String text = "ScanCode=1234 ScanMessage=Test ItemName=First ID=1";
  List<String> original = text.split(' ');
  List<String> result = [];
  GetValue(original, result);
  print(original);
  print(result);
}
void GetValue(List<String> original, List<String> result)
{
  for(int i = 0 ; i < original.length ; ++i)
  {
    result.insert(i, original[i].split('=')[1]);
  }
}
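Splitting on single spaces breaks down when the text has a space after the "=" sign, as in the "scanMsg= ?" and "ID= ?" parts of the question. A regular expression tolerates that. The sketch below is in Python rather than Dart and only illustrates the pattern; `extract_pairs` is a made-up name, and it assumes values never contain whitespace.

```python
import re


def extract_pairs(text):
    """Return a dict of key/value pairs found as `key = value` in text."""
    # \w+ is the key, optional spaces around '=', \S+ is the value
    return dict(re.findall(r"(\w+)\s*=\s*(\S+)", text))


pairs = extract_pairs("ScanCode=1234 ScanMessage= Test ItemName=First ID= 1\n")
print(pairs)
print(list(pairs.values()))
```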

Insert multiple lines of text into a Rich Text content control with OpenXML

I'm having difficulty getting a content control to follow multi-line formatting. It seems to interpret everything I'm giving it literally. I am new to OpenXML and I feel like I must be missing something simple.
I am converting my multi-line string using this function.
private static void parseTextForOpenXML(Run run, string text)
{
string[] newLineArray = { Environment.NewLine, "<br/>", "<br />", "\r\n" };
string[] textArray = text.Split(newLineArray, StringSplitOptions.None);
bool first = true;
foreach (string line in textArray)
{
if (!first)
{
run.Append(new Break());
}
first = false;
Text txt = new Text { Text = line };
run.Append(txt);
}
}
I insert it into the control with this
public static WordprocessingDocument InsertText(this WordprocessingDocument doc, string contentControlTag, string text)
{
SdtElement element = doc.MainDocumentPart.Document.Body.Descendants<SdtElement>().FirstOrDefault(sdt => sdt.SdtProperties.GetFirstChild<Tag>().Val == contentControlTag);
if (element == null)
throw new ArgumentException("ContentControlTag " + contentControlTag + " doesn't exist.");
element.Descendants<Text>().First().Text = text;
element.Descendants<Text>().Skip(1).ToList().ForEach(t => t.Remove());
return doc;
}
I call it with something like...
doc.InsertText("Primary", primaryRun.InnerText);
I've tried InnerXml and OuterXml as well. The results look something like
Example AttnExample CompanyExample AddressNew York, NY 12345 or
<w:r xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:t>Example Attn</w:t><w:br /><w:t>Example Company</w:t><w:br /><w:t>Example Address</w:t><w:br /><w:t>New York, NY 12345</w:t></w:r>
The method works fine for simple text insertion. It's just when I need it to interpret the XML that it doesn't work for me.
I feel like I must be super close to getting what I need, but my fiddling is getting me nowhere. Any thoughts? Thank you.
I believe the way I was trying to do it was doomed to fail. Setting the Text property of an element is, it seems, always interpreted as literal text to be displayed. I ended up taking a slightly different tack and created a new insert method.
public static WordprocessingDocument InsertText(this WordprocessingDocument doc, string contentControlTag, Paragraph paragraph)
{
SdtElement element = doc.MainDocumentPart.Document.Body.Descendants<SdtElement>().FirstOrDefault(sdt => sdt.SdtProperties.GetFirstChild<Tag>().Val == contentControlTag);
if (element == null)
throw new ArgumentException("ContentControlTag " + contentControlTag + " doesn't exist.");
OpenXmlElement cc = element.Descendants<Text>().First().Parent;
cc.RemoveAllChildren();
cc.Append(paragraph);
return doc;
}
It starts the same way, finding the content control by searching for its Tag. But then I get the parent of its first Text element, remove the children that were there, and replace them with a Paragraph element.
It's not exactly what I had envisioned, but it seems to work for my needs.
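The underlying point is that a line break in WordprocessingML is an element (w:br) sitting between text elements (w:t), never characters inside a text element. The Python sketch below builds that structure with the standard library's ElementTree, purely to show its shape. It does not use the Open XML SDK, and `build_run` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def build_run(text):
    """Build a w:r element with one w:t per line and w:br between lines."""
    run = ET.Element(f"{{{W}}}r")
    lines = re.split(r"\r\n|\n|<br\s*/>", text)
    for i, line in enumerate(lines):
        if i > 0:
            # the break is its own element, not part of any w:t text
            ET.SubElement(run, f"{{{W}}}br")
        t = ET.SubElement(run, f"{{{W}}}t")
        t.text = line
    return run


run = build_run("Example Attn\nExample Company\r\nNew York, NY 12345")
breaks = run.findall(f"{{{W}}}br")
texts = [t.text for t in run.findall(f"{{{W}}}t")]
print(ET.tostring(run, encoding="unicode"))
```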

Google form that turns on and off each day automatically

I love Google Forms; I can play with them for hours. I have spent days trying to solve this one and searching for an answer, but it is very much over my head. I have seen similar questions, but none that helped me get to an answer.
We have a café where I work, and I created a pre-order form on Google Forms. That was the easy part. The café can only accept pre-orders up to 10:30am, so I want the form to open at 7am and close at 10:30am every day, to stop people pre-ordering when the café isn't able to deal with their order.
I used the very helpful tutorial from http://labnol.org/?p=20707 to start me off. I have added to it, messed it up, and managed to get back to the version below, which is how it currently looks. It doesn't work and I can't get my head around it. At one point I managed to turn the form off, but I couldn't turn it back on! I'm finding it very frustrating, and any help in solving this would be amazing. To me it seems very simple, as it just needs to turn on and off at a certain time every day. Can someone please help?
FORM_OPEN_DATE = "7:00";
FORM_CLOSE_DATE = "10:30";
RESPONSE_COUNT = "";
/* Initialize the form, setup time based triggers */
function Initialize() {
deleteTriggers_();
if ((FORM_OPEN_DATE !== "7:00") &&
((new Date()).getTime("7:00") < parseDate_(FORM_OPEN_DATE).getTime ("7:00"))) {
closeForm("10:30");
ScriptApp.newTrigger("openForm")
.timeBased("7:00")
.at(parseDate_(FORM_OPEN_DATE))
.create(); }
if (FORM_CLOSE_DATE !== "10:30") {
ScriptApp.newTrigger("closeForm")
.timeBased("10:30")
.at(parseDate_(FORM_CLOSE_DATE))
.create(); }
if (RESPONSE_COUNT !== "") {
ScriptApp.newTrigger("checkLimit")
.forForm(FormApp.getActiveForm())
.onFormSubmit()
.create(); } }
/* Delete all existing Script Triggers */
function deleteTriggers_() {
var triggers = ScriptApp.getProjectTriggers();
for (var i in triggers) {
ScriptApp.deleteTrigger(triggers[i]);
}
}
/* Allow Google Form to Accept Responses */
function openForm() {
var form = FormApp.getActiveForm();
form.setAcceptingResponses(true);
informUser_("Your Google Form is now accepting responses");
}
/* Close the Google Form, Stop Accepting Reponses */
function closeForm() {
var form = FormApp.getActiveForm();
form.setAcceptingResponses(false);
deleteTriggers_();
informUser_("Your Google Form is no longer accepting responses");
}
/* If Total # of Form Responses >= Limit, Close Form */
function checkLimit() {
if (FormApp.getActiveForm().getResponses().length >= RESPONSE_COUNT ) {
closeForm();
}
}
/* Parse the Date for creating Time-Based Triggers */
function parseDate_(d) {
return new Date(d.substr(0,4), d.substr(5,2)-1,
d.substr(8,2), d.substr(11,2), d.substr(14,2));
}
I don't think you can use .timeBased("7:00"). It is also good to check that you don't already have a trigger before you create a new one, so I like to do it the way formsOnOff() does. You can only specify that you want a trigger at a certain hour, say 7; the actual run time is then picked randomly somewhere between 7 and 8. So you can't really pick 10:30 either: it has to be either 10 or 11. If you want more precision, you may have to fire your daily triggers early and then count some 5-minute triggers to get closer to the mark. You'll have to wait and see where the daily triggers land within the hour first. Once they're set, they don't change.
I've actually played around with the daily timers, logging them and creating new ones until I get one that's close enough to my desired time; then I turn the others off and keep that one. You have to be patient. As long as you identify the trigger by the function name in the log, you can change the function and keep the timer going.
Oh, and I generally create the log file with Drive Notepad and then open it whenever I want to view the log.
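function formsOnOff()
{
  if(!isTrigger('openForm'))
  {
    // everyDays(1) makes the trigger recur daily at roughly that hour
    ScriptApp.newTrigger('openForm').timeBased().atHour(7).everyDays(1).create();
  }
  if(!isTrigger('closeForm'))
  {
    // without create() the trigger is only configured, never installed
    ScriptApp.newTrigger('closeForm').timeBased().atHour(11).everyDays(1).create();
  }
}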
function isTrigger(funcName)
{
var r=false;
if(funcName)
{
var allTriggers=ScriptApp.getProjectTriggers();
var allHandlers=[];
for(var i=0;i<allTriggers.length;i++)
{
allHandlers.push(allTriggers[i].getHandlerFunction());
}
if(allHandlers.indexOf(funcName)>-1)
{
r=true;
}
}
return r;
}
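Because an hourly trigger cannot hit 10:30 exactly, one option in the spirit of the "5 minute triggers" idea is to run a frequent check that compares the current time with the opening window and sets the form state accordingly. The sketch below shows only that comparison, in Python rather than Apps Script. `should_accept` and the two constants are made-up names, and wiring the result to setAcceptingResponses() from a recurring trigger is left out.

```python
from datetime import datetime, time

OPEN_AT = time(7, 0)     # form opens at 07:00
CLOSE_AT = time(10, 30)  # form closes at 10:30


def should_accept(now):
    """True while `now` falls inside the daily pre-order window."""
    return OPEN_AT <= now.time() < CLOSE_AT


print(should_accept(datetime(2015, 9, 21, 9, 15)))
print(should_accept(datetime(2015, 9, 21, 10, 30)))
```

With a check every five minutes, the form would flip state at most five minutes after the boundary, and the check is idempotent, so a missed or late run corrects itself on the next one.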
I sometimes run a log entry on my timers so that I can figure out exactly when they're happening.
function logEntry(entry,file)
{
var file = (typeof(file) != 'undefined')?file:'eventlog.txt';
var entry = (typeof(entry) != 'undefined')?entry:'No entry string provided.';
if(entry)
{
var ts = Utilities.formatDate(new Date(), "GMT-6", "yyyy-MM-dd' 'hh:mm:ss a");
var s = ts + ' - ' + entry + '\n';
myUtilities.saveFile(s, file, true);//this is part of a library that I created. But any save file function will do as long as you're appending.
}
}
This is my utilities save file function. You have to provide DefaultFileName and DataFolderID.
function saveFile(datstr,filename,append)
{
var append = (typeof(append) !== 'undefined')? append : false;
var filename = (typeof(filename) !== 'undefined')? filename : DefaultFileName;
var datstr = (typeof(datstr) !== 'undefined')? datstr : '';
var folderID = (typeof(folderID) !== 'undefined')? folderID : DataFolderID;
var fldr = DriveApp.getFolderById(folderID);
var file = fldr.getFilesByName(filename);
var targetFound = false;
while(file.hasNext())
{
var fi = file.next();
var target = fi.getName();
if(target == filename)
{
if(append)
{
datstr = fi.getBlob().getDataAsString() + datstr;
}
targetFound = true;
fi.setContent(datstr);
}
}
if(!targetFound)
{
var create = fldr.createFile(filename, datstr);
if(create)
{
targetFound = true;
}
}
return targetFound;
}

How to ignore the unknown characters at the end of a JSON string

{
education = (
{
school = {
id = 108102169223234;
name = psss;
};
type = College;
year = {
id = 142833822398097;
name = 2010;
};
}
);
}
<!-- 1.2398s -->
The above gives me the error "NSLocalizedDescription=Unrecognised leading character".
That is not even close to valid JSON; check it with http://www.jsonlint.com/
Are you in charge of generating the feed? If so, I would think it a lot better to fix the problem at the source than to refactor your code to accommodate whatever it is that is being returned.
Are you using a JSON framework in Xcode to parse that string?
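If the raw response really is valid JSON followed by stray trailing text such as a timing comment, one general technique is to parse only the leading JSON value and keep whatever follows separately. The sketch below shows this in Python with the standard library's `JSONDecoder.raw_decode`. `parse_leading_json` is a made-up name and the sample payload is an assumption. It will not help with the text shown in the question, which is not JSON at all.

```python
import json


def parse_leading_json(text):
    """Parse the JSON value at the start of text; return (value, remainder)."""
    stripped = text.lstrip()  # raw_decode does not skip leading whitespace
    value, end = json.JSONDecoder().raw_decode(stripped)
    return value, stripped[end:]


payload = '{"education": [{"type": "College", "year": {"name": "2010"}}]}\n<!-- 1.2398s -->'
value, rest = parse_leading_json(payload)
print(value)
print(repr(rest))
```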

In KRL, how do I detect if a variable is an array or hash?

In KRL, I'd like to detect whether a variable is an array or hash so that I know if I need to use the decode or encode operator on it. Is that possible?
I'd like to do something like this:
my_var = var.is_array => var.decode() | my_var
Update
The best way to do this is with the typeof() operator. It was added after the original answer was written, and because variables are now interpreted early, the old approach listed in the answer no longer works.
Another useful operator for examining your data is isnull().
myHash.typeof() => "hash"
myArray.typeof() => "array"
...
The only way that I have figured out how to detect the data structure type is by coercing to a string and then checking to see if the resulting pointer string contains the word 'array' or 'hash'.
'One liner'
myHashIsHash = "#{myHash}".match(re/hash/gi);
myHashIsHash will be true/1
Example app built to demonstrate concept
ruleset a60x547 {
meta {
name "detect-array-or-hash"
description <<
detect-array-or-hash
>>
author "Mike Grace"
logging on
}
global {
myHash = {
"asking":"Mike Farmer",
"question":"detect type"
};
myArray = [0,1,2,3];
}
rule detect_types {
select when pageview ".*"
pre {
myHashIsArray = "#{myHash}".match(re/array/gi);
myHashIsHash = "#{myHash}".match(re/hash/gi);
myArrayIsArray = "#{myArray}".match(re/array/gi);
myArrayIsHash = "#{myArray}".match(re/hash/gi);
hashAsString = "#{myHash}";
arrayAsString = "#{myArray}";
}
{
notify("hash as string",hashAsString) with sticky = true;
notify("array as string",arrayAsString) with sticky = true;
notify("hash is array",myHashIsArray) with sticky = true;
notify("hash is hash",myHashIsHash) with sticky = true;
notify("array is array",myArrayIsArray) with sticky = true;
notify("array is hash",myArrayIsHash) with sticky = true;
}
}
}