Convert List to JSON object in Talend

I'm trying to convert the list result from a tJava component to a JSON object.
The result from the tJava component is below.
[{run_id=5d0753d58d93b71a1d12cc22_, parent_run_id=null, pipe_invoker=scheduled, path_id=shared, count=33, plex_path=null, invoker=abc.com, nested_pipeline=true, duration=355, start_time=2020-11-20T11:17:32.298000+00:00, lable=MP_SQS, state=Completed, key=57694b41ee, root_ruuid=2_ba32ea346}, {run_id=5bd4c6ea346, parent_run_id=null, pipe_invoker=scheduled, path_id=shared, count=33, plex_path=null, invoker=wwr.com, nested_pipeline=true, duration=355, start_time=2020-11-20T11:17:32.298000+00:00, lable=Summary_MP_SQS, state=Completed, key=55dfff4f, root_ruuid=1246d2-8bdc-1846}]
I tried using tConvertType, and also converting to a String in tMap and applying a replace-all function multiple times before storing the result in a JSON file, but nothing works as expected.
The expected end result is a JSON file built from the output above.

I don't think there is an easy way to do that with a Talend component (but I may be wrong :-) ). You would need to:
Put quotes around the keys/values
Replace "=" with ":"
But it can be done with regular Java code; here is an example:
String input =
    "[{run_id=1111, parent_run_id=null1, pipe_invoker=scheduled1, path_id=shared, count=33, plex_path=null}," +
    " {run_id=2222, parent_run_id=null2, pipe_invoker=scheduled2, path_id=shared, count=33, plex_path=null}]";
StringBuilder output = new StringBuilder();
input = input.substring(2, input.length() - 2); // remove the outer "[{" ... "}]"
output.append("[{");
String[] step1 = input.split("\\}, \\{"); // one entry per record, split on "}, {"
int j = 1;
for (String record : step1) {
    String[] step2 = record.split(","); // one entry per key/value pair, split on ","
    int i = 1;
    for (String keyValue : step2) {
        // key=value --> "key":"value"
        output.append("\"" + keyValue.split("=")[0].trim() + "\":\"" + keyValue.split("=")[1].trim() + "\"");
        if (i++ < step2.length) {
            output.append(",");
        }
    }
    if (j++ < step1.length) {
        output.append("} , {");
    }
}
output.append("}]");
/*
output :
[{"run_id":"1111","parent_run_id":"null1","pipe_invoker":"scheduled1","path_id":"shared","count":"33","plex_path":"null"} , {"run_id":"2222","parent_run_id":"null2","pipe_invoker":"scheduled2","path_id":"shared","count":"33","plex_path":"null"}]
*/
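If the tJava result really is a java.util.List of Maps (rather than just its toString() output), a more robust alternative is to serialize it directly with a JSON library instead of splitting strings. A minimal sketch using Jackson (an assumption - you would load the jar into the job, e.g. with tLibraryLoad); the record contents are placeholders:
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ListToJsonSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder record standing in for one element of the tJava result
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("run_id", "1111");
        record.put("parent_run_id", null);
        record.put("count", 33);

        List<Map<String, Object>> records = new ArrayList<>();
        records.add(record);

        // Jackson serializes the list of maps straight to a JSON array
        String json = new ObjectMapper().writeValueAsString(records);
        System.out.println(json); // [{"run_id":"1111","parent_run_id":null,"count":33}]
    }
}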

Related

How to append CSV file with PostgreSQL query export in Datagrip?

I want to append to an existing CSV file (a template for Google Ads Offline Conversion Tracking) the results of a query from PostgreSQL, using DataGrip.
Execute to file works just fine to export the query results as a CSV file. But obviously that just creates a new file with only the results of the query.
I was thinking of loading the existing template in the Groovy script that DataGrip provides, and then adding its content to the top of the created file, but I can't make any headway on how to do this.
I found (maybe?) how to load the file, but I have no clue how to actually use this and put it at the top of the file so the results of the query can be added below it.
fh = new File("C:/1.csv")
def csv_content = fh.getText('utf-8')
That's all I have for loading the file, with no clue how to use it going forward.
Alternatively, I considered adding the template info by hand to the script, so that it doesn't have to load another file but just takes that info and then appends the query results afterwards. A way to directly add rows like that would also be very helpful.
The Groovy script file that is included in DataGrip is as follows:
/*
* Available context bindings:
* COLUMNS List<DataColumn>
* ROWS Iterable<DataRow>
* OUT { append() }
* FORMATTER { format(row, col); formatValue(Object, col); getTypeName(Object, col); isStringLiteral(Object, col); }
* TRANSPOSED Boolean
* plus ALL_COLUMNS, TABLE, DIALECT
*
* where:
* DataRow { rowNumber(); first(); last(); data(): List<Object>; value(column): Object }
* DataColumn { columnNumber(), name() }
*/
SEPARATOR = ","
QUOTE = "\""
NEWLINE = System.getProperty("line.separator")
def printRow = { values, valueToString ->
values.eachWithIndex { value, idx ->
def str = valueToString(value)
def q = str.contains(SEPARATOR) || str.contains(QUOTE) || str.contains(NEWLINE)
OUT.append(q ? QUOTE : "")
.append(str.replace(QUOTE, QUOTE + QUOTE))
.append(q ? QUOTE : "")
.append(idx != values.size() - 1 ? SEPARATOR : NEWLINE)
}
}
if (!TRANSPOSED) {
ROWS.each { row -> printRow(COLUMNS, { FORMATTER.format(row, it) }) }
}
else {
def values = COLUMNS.collect { new ArrayList<String>() }
ROWS.each { row -> COLUMNS.eachWithIndex { col, i -> values[i].add(FORMATTER.format(row, col)) } }
values.each { printRow(it, { it }) }
}
You need to pass the content of your file to the OUT variable, like this:
fh = new File("C:/1.csv")
def csv_content = fh.getText('utf-8')
OUT.append(csv_content).append("\n")
Everything that is passed to OUT.append will be written to the new file.
Just add it at the beginning of the script:
/*
* Available context bindings:
* COLUMNS List<DataColumn>
* ROWS Iterable<DataRow>
* OUT { append() }
* FORMATTER { format(row, col); formatValue(Object, col); getTypeName(Object, col); isStringLiteral(Object, col); }
* TRANSPOSED Boolean
* plus ALL_COLUMNS, TABLE, DIALECT
*
* where:
* DataRow { rowNumber(); first(); last(); data(): List<Object>; value(column): Object }
* DataColumn { columnNumber(), name() }
*/
SEPARATOR = ","
QUOTE = "\""
NEWLINE = System.getProperty("line.separator")
fh = new File("C:/1.csv")
def csv_content = fh.getText('utf-8')
OUT.append(csv_content).append("\n")
def printRow = { values, valueToString ->
values.eachWithIndex { value, idx ->
def str = valueToString(value)
def q = str.contains(SEPARATOR) || str.contains(QUOTE) || str.contains(NEWLINE)
OUT.append(q ? QUOTE : "")
.append(str.replace(QUOTE, QUOTE + QUOTE))
.append(q ? QUOTE : "")
.append(idx != values.size() - 1 ? SEPARATOR : NEWLINE)
}
}
if (!TRANSPOSED) {
ROWS.each { row -> printRow(COLUMNS, { FORMATTER.format(row, it) }) }
}
else {
def values = COLUMNS.collect { new ArrayList<String>() }
ROWS.each { row -> COLUMNS.eachWithIndex { col, i -> values[i].add(FORMATTER.format(row, col)) } }
values.each { printRow(it, { it }) }
}
Note that you can copy CSV-Groovy.csv.groovy to a new file in the same directory (e.g. MY-CSV.csv.groovy) and modify that file. The new extractor will then appear in the combobox alongside the other extractors.
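For the alternative you mention (hard-coding the template rows instead of loading another file), you can append literal rows the same way, placed right after the NEWLINE definition. A sketch - the rows below are placeholders, not the real Google Ads template:
// Hypothetical template rows; replace with the actual header lines
// from the Google Ads offline-conversion template
def templateRows = [
    'Parameters:TimeZone=+0000,,,',
    'Google Click ID,Conversion Name,Conversion Time,Conversion Value'
]
templateRows.each { OUT.append(it).append(NEWLINE) }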

ServiceStack Ormlite Deserialize Array for In Clause

I am storing some query criteria in the db via a ToJson() on the object that contains all the criteria. A simplified example would be:
{"FirstName" :[ {Operator: "=", Value: "John"}, { Operator: "in", Value:" ["Smith", "Jones"]"}], "SomeId": [Operator: "in", Value: "[1,2,3]" }]}
The lists are either string, int, decimal or date. These all map to the same class/table so it is easy via reflection to get FirstName or SomeId's type.
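For concreteness, the stored criteria could be modeled like this (a sketch; everything beyond the Operator and Value names shown above is an assumption):
// Hypothetical shape of one stored criterion
public class Criterion
{
    public string Operator { get; set; } // "=", "in", ...
    public string Value { get; set; }    // a scalar, or a JSON-encoded list for "in"
}
// The stored object then deserializes to a Dictionary<string, List<Criterion>>
// keyed by column name (e.g. "FirstName", "SomeId").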
I'm trying to create a where clause based on this information:
if (critKey.Operator == "in")
{
    wb.Values.Add(keySave + i, (object)ConvertList<Members>(key, (string)critKey.Value));
    wb.WhereClause = wb.WhereClause + " And {0} {1} (#{2})".Fmt(critKey.Column, critKey.Operator, keySave + i);
}
else
{
    wb.Values.Add(keySave + i, (object)critKey.Value);
    wb.WhereClause = wb.WhereClause + " And {0} {1} #{2}".Fmt(critKey.Column, critKey.Operator, keySave + i);
}
It generates something like this (example from my tests, yes I know the storenumber part is stupid):
Email = #Email0 And StoreNumber = #StoreNumber0 And StoreNumber in (#StoreNumber1)
I'm running into an issue with the lists. Is there a nice way to do this with any of the OrmLite tools instead of doing it all by hand? The where clause generates fine except when dealing with lists. I'm trying to make it generic but am having a hard time with that part.
A second, maybe related, question: I can't seem to find how to use parameters with in. Coming from NPoco, you can do (column in #0, somearray), but I can't seem to find how to do this without using Sql.In.
I ended up having to write my own parser, as it seems OrmLite doesn't have the same support for query parameters with lists that NPoco does. Basically, I'd prefer to be able to write
Where("SomeId in #Ids") and pass in a parameter, but I ended up with this code:
listObject = ConvertListObject<Members>(key, (string)critKey.Value);
wb.WhereClause = wb.WhereClause + " And {0} {1} ({2})"
    .Fmt(critKey.Column, critKey.Operator, listObject.EscapedList(ColumnType<Members>(key)));

public static string EscapedList(this List<object> val, Type t)
{
    var escapedList = "";
    if (t == typeof(int) || t == typeof(float) || t == typeof(decimal))
    {
        escapedList = String.Join(",", val.Select(x => x.ToString()));
    }
    else
    {
        // quote string-like values, doubling any embedded single quotes
        escapedList = String.Join(",", val.Select(x => "'" + x.ToString().Replace("'", "''") + "'"));
    }
    return escapedList;
}
I'd like to see other answers, especially if I'm missing something in OrmLite.
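One alternative that keeps everything parameterized is to expand the list into one named parameter per element, instead of escaping values into the SQL. A sketch of a helper for that (nothing here is an OrmLite API; the #name placeholder style and the parameter dictionary mirror the question's own WhereBuilder):
using System.Collections.Generic;

public static class InClauseHelper
{
    // Emits " And SomeId in (#SomeId_0,#SomeId_1,...)" and binds one
    // value per list element, so no manual escaping is needed.
    public static string ExpandIn(string column, string baseName,
        IEnumerable<object> values, IDictionary<string, object> paramBag)
    {
        var names = new List<string>();
        int n = 0;
        foreach (var item in values)
        {
            var name = baseName + "_" + n++;
            paramBag[name] = item;    // bind the value
            names.Add("#" + name);    // reference it in the SQL
        }
        return string.Format(" And {0} in ({1})", column, string.Join(",", names));
    }
}
With the question's names, the call would look something like wb.WhereClause += InClauseHelper.ExpandIn(critKey.Column, keySave + i, listObject, wb.Values);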
When dealing with lists you can use the following example:
var storeNumbers = new[] { "store1", "store2", "store3" };
var ev = Db.From<MyClass>()
           .Where(p => storeNumbers.Contains(p.StoreNumber));
var result = Db.Select(ev);
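For reference, the Sql.In form mentioned in the question looks roughly like this (a sketch; Members and SomeId are the question's names):
// Typed OrmLite expression using Sql.In with a list of values
var ids = new[] { 1, 2, 3 };
var rows = Db.Select<Members>(q => Sql.In(q.SomeId, ids));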

Blank results while using Tokens Regex rules to identify Named Entities

I am struggling to write a correct rule involving macros to identify organizations in a text.
To identify Matrix Inc. in:
With its rising share prices, Matrix Inc. has come out a winner this quarter.
I am trying to check for words like Inc within the entity, and thus defined a macro and rule as below:
$ORGANIZATION_TITLES = "/pharmaceuticals?|group|corp|corporation|international|co.?|inc.?|incorporated|holdings|motors|ventures|parters|llc|limited liability corporation|pvt.? ltd.?/"
ENV.defaults["stage"] = 1
{
ruleType: "tokens",
pattern: ([$ORGANIZATION_TITLES]),
action: ( Annotate($0, ner, "ORGANIZATION") )
}
ENV.defaults["stage"] = 2
{ ( [{tag:NNP}]+? ($ORGANIZATION_TITLES)) => ORGANIZATION }
I also tried using bindings and then applying the rule.
env.bind("$ORGANIZATION_TITLES", TokenSequencePattern.compile(env,"/pharmaceuticals?|group|corp|corporation|international|co.?|inc.?|incorporated|holdings|motors|ventures|parters|llc|limited liability corporation|pvt.? ltd.?/"));
Nothing seems to be working. I need to define more complex pattern rules involving macros like:
pattern: ( [ { ner:PERSON } ]+ /,/*? ($TITLES_CORPORATE_PREFIXES)*? $TITLES_CORPORATE+? /,/*? /of|for/? /,/*? [ { ner:ORGANIZATION } ]+ )
where $TITLES_CORPORATE_PREFIXES and $TITLES_CORPORATE are macros similar to $ORGANIZATION_TITLES.
What am I doing wrong?
EDIT
Here's my code:
public static void main(String[] args)
{
    String rulesFile = "D:\\Workspace\\resource\\NERRulesFile.txt";
    String dataFile = "D:\\Workspace\\resource\\GoldSetSentences.txt";
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // pipeline.addAnnotator(new TokensRegexAnnotator(rulesFile));
    String inputText = "Bill Edelman , CEO and Chairman , for Paragonix commented on the Supply Agreement with Essential Pharmaceuticals .";
    Annotation document = new Annotation(inputText.toLowerCase());
    pipeline.annotate(document);
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);
    CoreMapExpressionExtractor extractor = CoreMapExpressionExtractor.createExtractorFromFiles(TokenSequencePattern.getNewEnv(), rulesFile);
    /* Next we can go over the annotated sentences and extract the annotated words,
       using the CoreLabel object */
    for (CoreMap sentence : sentences)
    {
        List<MatchedExpression> matched = extractor.extractExpressions(sentence);
        for (MatchedExpression phrase : matched) {
            // Print out matched text and value
            System.out.println("matched: " + phrase.getText() + " with value " + phrase.getValue());
            // Print out token information
            CoreMap cm = phrase.getAnnotation();
            for (CoreLabel token : cm.get(TokensAnnotation.class))
            {
                String word = token.get(TextAnnotation.class);
                String lemma = token.get(LemmaAnnotation.class);
                String pos = token.get(PartOfSpeechAnnotation.class);
                String ne = token.get(NamedEntityTagAnnotation.class);
                System.out.println("matched token: " + "word=" + word + ", lemma=" + lemma + ", pos=" + pos + ", ne=" + ne);
            }
        }
    }
}
Here is a rules file that should work:
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }
$ORGANIZATION_TITLES = "/inc\.|corp\./"
{ pattern: ([{pos: NNP}]+ $ORGANIZATION_TITLES), action: ( Annotate($0, ner, "RULE_FOUND_ORG") ) }
I have made some changes to our code base to make the TokensRegexAnnotator more easily accessible. You will need to get the latest version from GitHub: https://github.com/stanfordnlp/CoreNLP
If you run this command (or the equivalent Java API call) it should work:
java -Xmx8g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,tokensregex -tokensregex.rules organization.rules -file samples.txt -outputFormat text -tokensregex.caseInsensitive
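The equivalent Java API call would look roughly like this (a sketch; the property names mirror the CLI flags above):
import java.util.Properties;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

public class TokensRegexSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("annotators", "tokenize,ssplit,pos,lemma,ner,tokensregex");
        props.put("tokensregex.rules", "organization.rules");
        props.put("tokensregex.caseInsensitive", "true");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Tokens matching the rules file end up with ner = RULE_FOUND_ORG
        Annotation document = new Annotation(
            "With its rising share prices Matrix Inc. has come out a winner this quarter.");
        pipeline.annotate(document);
    }
}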

How to seek position in DataTable before writing to the file

I have a DataTable that is read from a CSV file. It is iterated through by row and column, and each value is appended before being written to the destination CSV file. I want to split the data of one column on the special character ("/") into two columns. For example, the 'Type' column of the DataTable holds "women/shoes and handbags/guess". There is another column, 'SubType', so I want to split the one column into two columns in the DataTable before writing, ignoring the third part ("guess"). Is there a way to find the value between the first "/" and the second "/" and insert it into the 'SubType' column of the DataTable?
foreach (DataRow dRow in dtSor.Rows)
{
    for (int i = 0; i < dtSor.Columns.Count; i++)
    {
        if (dRow[i].ToString().Contains(","))
        {
            dest_csv.Append("\"" + dRow[i].ToString() + "\"" + ",");
        }
        else if (dRow[i].ToString() == "")
        {
            dest_csv.Append("NULL" + ",");
        }
        else
        {
            dest_csv.Append(dRow[i].ToString() + ",");
        }
    }
    dest_csv.Remove(dest_csv.Length - 1, 1); // drop the trailing comma
    dest_csv.Append(Environment.NewLine);
}
File.WriteAllText(destination_file, dest_csv.ToString(), Encoding.Default);
First check whether the special character is in the field with IndexOf, then split the field on that character into an array and take only the parts you're interested in,
like so:
const char special = '/';
string maintype = "";
string subtype = "";
string field = dRow[i].ToString();
if (field.IndexOf(special) > -1)
{
    string[] splitted = field.Split(special);
    maintype = splitted[0];
    subtype = splitted[1];
}
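To write the pieces back into the row before the CSV text is built, a sketch (the column names 'Type' and 'SubType' come from the question):
// Split "women/shoes and handbags/guess" and keep only the first two parts
string[] parts = dRow["Type"].ToString().Split('/');
if (parts.Length >= 2)
{
    dRow["Type"] = parts[0];    // "women"
    dRow["SubType"] = parts[1]; // "shoes and handbags"
    // parts[2] ("guess") is deliberately ignored, per the question
}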

' ', hexadecimal value 0x1F, is an invalid character. Line 1, position 1

I am trying to read an XML file from the web and parse it with XDocument. It normally works fine, but sometimes it gives me this error for days:
**' ', hexadecimal value 0x1F, is an invalid character. Line 1, position 1**
I have tried some solutions from Google, but they don't work for VS 2010 Express / Windows Phone 7.
There is a solution that replaces the 0x1F character with string.Empty, but my code returns a stream, which doesn't have a Replace method:
s = s.Replace(Convert.ToString((byte)0x1F), string.Empty);
Here is my code:
void webClient_OpenReadCompleted(object sender, OpenReadCompletedEventArgs e)
{
    using (var reader = new StreamReader(e.Result))
    {
        int[] counter = { 1 };
        string s = reader.ReadToEnd();
        Stream str = e.Result;
        // s = s.Replace(Convert.ToString((byte)0x1F), string.Empty);
        // byte[] str = Convert.FromBase64String(s);
        // Stream memStream = new MemoryStream(str);
        str.Position = 0;
        XDocument xdoc = XDocument.Load(str);
        var data = from query in xdoc.Descendants("user")
                   select new mobion
                   {
                       index = counter[0]++,
                       avlink = (string)query.Element("user_info").Element("avlink"),
                       nickname = (string)query.Element("user_info").Element("nickname"),
                       track = (string)query.Element("track"),
                       artist = (string)query.Element("artist"),
                   };
        listBox.ItemsSource = data;
    }
}
XML file:
http://music.mobion.vn/api/v1/music/userstop?devid=
0x1F is an ASCII control character (unit separator). It is not valid in XML. Your best bet is to replace it.
Instead of using reader.ReadToEnd() (which, by the way, can use up a lot of memory for a large file, though you can definitely use it), why not try something like:
string input;
var sb = new StringBuilder();
while ((input = reader.ReadLine()) != null)
{
    // strip the 0x1F control character from each line
    sb.AppendLine(input.Replace((char)0x1F, ' '));
}
string s = sb.ToString();
You can re-convert it into a stream if you'd like, to then use as you please:
byte[] byteArray = Encoding.ASCII.GetBytes(s);
MemoryStream stream = new MemoryStream(byteArray);
Or else you could keep doing ReadToEnd(), then clean the resulting string of illegal characters and convert it back to a stream.
Here's a good resource for cleaning illegal characters from your XML - chances are you'll have others as well:
https://seattlesoftware.wordpress.com/tag/hexadecimal-value-0x-is-an-invalid-character/
What could be happening is that the content is compressed, in which case you need to decompress it.
With HttpClientHandler you can do this the following way:
var client = new HttpClient(new HttpClientHandler
{
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
});
With the "old" WebClient you have to derive your own class to achieve the similar effect:
class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        HttpWebRequest request = base.GetWebRequest(address) as HttpWebRequest;
        request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        return request;
    }
}
Above taken from here
To use the two you would do something like this:
HttpClient
using (var client = new HttpClient(new HttpClientHandler { AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate }))
{
    using (var stream = client.GetStreamAsync(url))
    {
        using (var sr = new StreamReader(stream.Result))
        {
            using (var reader = XmlReader.Create(sr))
            {
                var feed = System.ServiceModel.Syndication.SyndicationFeed.Load(reader);
                foreach (var item in feed.Items)
                {
                    Console.WriteLine(item.Title.Text);
                }
            }
        }
    }
}
WebClient
using (var stream = new MyWebClient().OpenRead("http://myrss.url"))
{
    using (var sr = new StreamReader(stream))
    {
        using (var reader = XmlReader.Create(sr))
        {
            var feed = System.ServiceModel.Syndication.SyndicationFeed.Load(reader);
            foreach (var item in feed.Items)
            {
                Console.WriteLine(item.Title.Text);
            }
        }
    }
}
This way you also receive the benefit of not having to call .ReadToEnd(), since you are working with the stream instead.
Consider using System.Web.HttpUtility.HtmlDecode if you're decoding content read from the web.
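A minimal usage sketch (s is assumed to be the raw response text):
// Decode HTML/XML character entities before any further processing
string decoded = System.Web.HttpUtility.HtmlDecode(s);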
If you are having issues replacing the character:
For me there were some issues when replacing with a string instead of a char. I suggest trying some test values with both to see what they turn up. How you reference the character also has an effect.
var a = x.IndexOf('\u001f');                      // 513 (found)
var b = x.IndexOf(Convert.ToString((byte)0x1F));  // -1 (Convert.ToString yields "31", not the control char)
x = x.Replace(Convert.ToChar((byte)0x1F), ' ');   // works
x = x.Replace(Convert.ToString((byte)0x1F), " "); // fails
I blagged this
I had the same issue and found that the problem was an &#x1F; character reference embedded in the XML.
The solution was:
s = s.Replace("&#x1F;", " ")
I'd guess it's probably an encoding issue, but without seeing the XML I can't say for sure.
As for your plan to simply replace the character: since you have a stream rather than text, simply read the stream into a string and then remove the characters you don't want.
Works for me (in VB, where Chr(31) is the 0x1F character):
s = s.Replace(Chr(31), "")
I used XmlSerializer to parse XML and faced the same exception.
The problem was that the XML string contained HTML codes of invalid characters (numeric character references).
This method removes all invalid character references from a string (based on this thread - https://forums.asp.net/t/1483793.aspx?Need+a+method+that+removes+illegal+XML+characters+from+a+String):
public static string RemoveInvalidXmlSubstrs(string xmlStr)
{
    string pattern = "&#((\\d+)|(x\\S+));";
    Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
    if (regex.IsMatch(xmlStr))
    {
        xmlStr = regex.Replace(xmlStr, new MatchEvaluator(m =>
        {
            string s = m.Value;
            string unicodeNumStr = s.Substring(2, s.Length - 3);
            int unicodeNum = unicodeNumStr.StartsWith("x")
                ? Convert.ToInt32(unicodeNumStr.Substring(1), 16)
                : Convert.ToInt32(unicodeNumStr);

            // according to https://www.w3.org/TR/xml/#charsets
            if ((unicodeNum == 0x9 || unicodeNum == 0xA || unicodeNum == 0xD) ||
                ((unicodeNum >= 0x20) && (unicodeNum <= 0xD7FF)) ||
                ((unicodeNum >= 0xE000) && (unicodeNum <= 0xFFFD)) ||
                ((unicodeNum >= 0x10000) && (unicodeNum <= 0x10FFFF)))
            {
                return s;
            }
            else
            {
                return String.Empty;
            }
        }));
    }
    return xmlStr;
}
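A usage sketch (rawXml stands in for the string read from the response):
// Strip invalid character references, then parse as usual
string cleaned = RemoveInvalidXmlSubstrs(rawXml);
XDocument xdoc = XDocument.Parse(cleaned);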
Nobody can answer if you don't show the relevant info - I mean the XML content.
As general advice, I would put a breakpoint after the ReadToEnd() call. Then you can do a couple of things:
Reveal the XML content to this forum.
Test it using the VS XML visualizer.
Copy-paste the string into a txt file and investigate it offline.