Change date column to integer - date

I have a large csv file as below:
DATE status code value value2
2014-12-13 Shipped 105732491-20091002165230 0.000803398 0.702892835
2014-12-14 Shipped 105732491-20091002165231 0.012925206 1.93748834
2014-12-15 Shipped 105732491-20091002165232 0.000191278 0.004772389
2014-12-16 Shipped 105732491-20091002165233 0.007493046 0.44883348
2014-12-17 Shipped 105732491-20091002165234 0.022015049 3.081006137
2014-12-18 Shipped 105732491-20091002165235 0.001894693 0.227268466
2014-12-19 Shipped 105732491-20091002165236 0.000312871 0.003113062
2014-12-20 Shipped 105732491-20091002165237 0.001754068 0.105016053
2014-12-21 Shipped 105732491-20091002165238 0.009773315 0.585910214
:
:
What i need to do is remove the header and change the date format to an integer yyyymmdd (eg. 20141217)
I am using opencsv to read and write the file.
Is there a way where i can change all the dates at once without parsing them one by one?
Below is my code to remove the header and create a new file:
void formatCsvFile(String fileToChange) throws Exception {
CSVReader reader = new CSVReader(new FileReader(new File(fileToChange)), CSVParser.DEFAULT_SEPARATOR, CSVParser.NULL_CHARACTER, CSVParser.NULL_CHARACTER, 1)
info "Read all rows at once"
List<String[]> allRows = reader.readAll();
CSVWriter writer = new CSVWriter(new FileWriter(fileToChange), CSVWriter.DEFAULT_SEPARATOR, CSVWriter.NO_QUOTE_CHARACTER)
info "Write all rows at once"
writer.writeAll(allRows)
writer.close()
}
Please can some one help?
Thanks

You don't need to parse the dates, but you do need to process each line in the file and convert the data on each line you want to convert. Java/Groovy doesn't have anything like awk where you can work with file data as columns, for example, the first 10 "columns" (characters usually) in every line in a file. Java/Groovy only deals with "rows" of data in a file, not "columns".
You could try something like this: (in Groovy)
reader.eachLine { String theLine ->
int idx = theLine.indexOf(' ')
String oldDate = theLine.subString(0, idx)
String newDate = oldDate.replaceAll('-', '')
String newLine = newDate + theLine.subString(idx);
writer.writeLine(newline);
}
Edit:
If your CSVReader class is not derived from File, then you can't use Groovy's eachLine method on it. And if the CSVReader class's readAll() method really returns a List of String arrays, then the above code could change to this:
allRows.each { String[] theLine ->
String newDate = theLine[0].replaceAll('-', '')
writer.writeLine(newDate + theLine[1..-1])
}

Ignore the first line (the header):
List<String[]> allRows = reader.readAll()[1..-1];
and replace the '-' in the dates by splitting each row and editting the first:
allrows = allrows.collect{
row -> row.split(',')[0].replace(',','') // the date
+ row.split(',')[1..-1] // the rest
}
I don't know what you mean by "all dates at once". For me can only be iterated.

Related

Salesforce trigger-Not able to understand

Below is the code written by my collegue who doesnt work in the firm anymore. I am inserting records in object with data loader and I can see success message but I do not see any records in my object. I am not able to understand what below trigger is doing.Please someone help me understand as I am new to salesforce.
trigger DataLoggingTrigger on QMBDataLogging__c (after insert) {
Map<string,Schema.RecordTypeInfo> recordTypeInfo = Schema.SObjectType.QMB_Initial_Letter__c.getRecordTypeInfosByName();
List<QMBDataLogging__c> logList = (List<QMBDataLogging__c>)Trigger.new;
List<Sobject> sobjList = (List<Sobject>)Type.forName('List<'+'QMB_Initial_Letter__c'+'>').newInstance();
Map<string, QMBLetteTypeToVfPage__c> QMBLetteTypeToVfPage = QMBLetteTypeToVfPage__c.getAll();
Map<String,QMBLetteTypeToVfPage__c> mapofLetterTypeRec = new Map<String,QMBLetteTypeToVfPage__c>();
set<Id>processdIds = new set<Id>();
for(string key : QMBLetteTypeToVfPage.keyset())
{
if(!mapofLetterTypeRec.containsKey(key)) mapofLetterTypeRec.put(QMBLetteTypeToVfPage.get(Key).Letter_Type__c, QMBLetteTypeToVfPage.get(Key));
}
for(QMBDataLogging__c log : logList)
{
Sobject logRecord = (sobject)log;
Sobject QMBLetterRecord = new QMB_Initial_Letter__c();
if(mapofLetterTypeRec.containskey(log.Field1__c))
{
string recordTypeId = recordTypeInfo.get(mapofLetterTypeRec.get(log.Field1__c).RecordType__c).isAvailable() ? recordTypeInfo.get(mapofLetterTypeRec.get(log.Field1__c).RecordType__c).getRecordTypeId() : recordTypeInfo.get('Master').getRecordTypeId();
string fieldApiNames = mapofLetterTypeRec.containskey(log.Field1__c) ? mapofLetterTypeRec.get(log.Field1__c).FieldAPINames__c : '';
//QMBLetterRecord.put('Letter_Type__c',log.Name);
QMBLetterRecord.put('RecordTypeId',tgh);
processdIds.add(log.Id);
if(string.isNotBlank(fieldApiNames) && fieldApiNames.contains(','))
{
Integer i = 1;
for(string fieldApiName : fieldApiNames.split(','))
{
string logFieldApiName = 'Field'+i+'__c';
fieldApiName = fieldApiName.trim();
system.debug('fieldApiName=='+fieldApiName);
Schema.DisplayType fielddataType = getFieldType('QMB_Initial_Letter__c',fieldApiName);
if(fielddataType == Schema.DisplayType.Date)
{
Date dateValue = Date.parse(string.valueof(logRecord.get(logFieldApiName)));
QMBLetterRecord.put(fieldApiName,dateValue);
}
else if(fielddataType == Schema.DisplayType.DOUBLE)
{
string value = (string)logRecord.get(logFieldApiName);
Double dec = Double.valueOf(value.replace(',',''));
QMBLetterRecord.put(fieldApiName,dec);
}
else if(fielddataType == Schema.DisplayType.CURRENCY)
{
Decimal decimalValue = Decimal.valueOf((string)logRecord.get(logFieldApiName));
QMBLetterRecord.put(fieldApiName,decimalValue);
}
else if(fielddataType == Schema.DisplayType.INTEGER)
{
string value = (string)logRecord.get(logFieldApiName);
Integer integerValue = Integer.valueOf(value.replace(',',''));
QMBLetterRecord.put(fieldApiName,integerValue);
}
else if(fielddataType == Schema.DisplayType.DATETIME)
{
DateTime dateTimeValue = DateTime.valueOf(logRecord.get(logFieldApiName));
QMBLetterRecord.put(fieldApiName,dateTimeValue);
}
else
{
QMBLetterRecord.put(fieldApiName,logRecord.get(logFieldApiName));
}
i++;
}
}
}
sobjList.add(QMBLetterRecord);
}
if(!sobjList.isEmpty())
{
insert sobjList;
if(!processdIds.isEmpty()) DeleteDoAsLoggingRecords.deleteTheProcessRecords(processdIds);
}
Public static Schema.DisplayType getFieldType(string objectName,string fieldName)
{
SObjectType r = ((SObject)(Type.forName('Schema.'+objectName).newInstance())).getSObjectType();
DescribeSObjectResult d = r.getDescribe();
return(d.fields.getMap().get(fieldName).getDescribe().getType());
}
}
You might be looking in the wrong place. Check if there's an unit test written for this thing (there should be one, especially if it's deployed to production), it should help you understand how it's supposed to be used.
You're inserting records of QMBDataLogging__c but then it seems they're immediately deleted in DeleteDoAsLoggingRecords.deleteTheProcessRecords(processdIds). Whether whatever this thing was supposed to do succeeds or not.
This seems to be some poor man's CSV parser or generic "upload anything"... that takes data stored in QMBDataLogging__c and creates QMB_Initial_Letter__c out of it.
QMBLetteTypeToVfPage__c.getAll() suggests you could go to Setup -> Custom Settings, try to find this thing and examine. Maybe it has some values in production but in your sandbox it's empty and that's why essentially nothing works? Or maybe some values that are there are outdated?
There's some comparison if what you upload into Field1__c can be matched to what's in that custom setting. I guess you load some kind of subtype of your QMB_Initial_Letter__c in there. Record Type name and list of fields to read from your log record is also fetched from custom setting based on that match.
Then this thing takes what you pasted, looks at the list of fields in from the custom setting and parses it.
Let's say the custom setting contains something like
Name = XYZ, FieldAPINames__c = 'Name,SomePicklist__c,SomeDate__c,IsActive__c'
This thing will look at first record you inserted, let's say you have the CSV like that
Field1__c,Field2__c,Field3__c,Field4__c
XYZ,Closed,2022-09-15,true
This thing will try to parse and map it so eventually you create record that a "normal" apex code would express as
new QMB_Initial_Letter__c(
Name = 'XYZ',
SomePicklist__c = 'Closed',
SomeDate__c = Date.parse('2022-09-15'),
IsActive__c = true
);
It's pretty fragile, as you probably already know. And because parsing CSV is an art - I expect it to absolutely crash and burn when text with commas in it shows up (some text,"text, with commas in it, should be quoted",more text).
In theory admin can change mapping in setup - but then they'd need to add new field anyway to the loaded file. Overcomplicated. I guess somebody did it to solve issue with Record Type Ids - but there are better ways to achieve that and still have normal CSV file with normal columns and strong type matching, not just chucking everything in as strings.
In theory this lets you have "jagged" csv files (row 1 having 5 fields, row 2 having different record type and 17 fields? no problem)
Your call whether it's salvageable or you'd rather ditch it and try normal loading of QMB_Initial_Letter__c records. (get back to your business people and ask for requirements?) If you do have variable number of columns at source - you'd need to standardise it or group the data so only 1 "type" of records (well, whatever's in that "Field1__c") goes into each file.

In Flutter, how can I combine data into a string very quickly?

I am gathering accelerometer data from my phone using the sensors package, adding that data to a List<AccelerometerEvent>, and then combining that data into a (csv) String so I can use file.writeAsString() to save this data as a csv file. The problem I am having is that it takes too long to combine the data into a string.
For example:
List length : 28645
Milliseconds to combine into csv string: 113580
Code:
for (AccelerometerEvent event in history) {
dataString = dataString + '${event.timestamp},${event.x},${event.y},${event.z}\n';
}
What would be a more efficient way to do this?
Should I even combine the data into a string, or is there a better way to save this data to a file?
Thanks
Create a file object
write first line with column names, and after that each row (after \n) will be an event
See: FileMode.append
Will add new strings without replacing existing string in file
File file = File('events.csv');
file.writeAsStringSync('TIMESTAMP, X, Y, Z\n', mode: FileMode.append);
for (AccelerometerEvent event in history) {
final x = event.x;
final y = event.y;
final z = event.z;
final timestamp = event.timestamp;
String data = '$timestamp, $x, $y, $z';
file.writeAsStringSync('$data\n', mode: FileMode.append);
}

XLSX file via OpenXml SDK Both Valid and Invalid

I have a program which exports a System.Data.DataTable to an XLSX / OpenXml Spreadsheet. Finally have it mostly working. However when opening the Spreadsheet in Excel, Excel complains about the file being invalid, and needing repair, giving this message...
We found a problem with some content in . Do you want us to
try to recover as much as we can? If you trust the source of the
workbook, clik Yes.
If I click Yes, it comes back with this message...
Clicking the log file and opening that, just shows this...
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<logFileName>error268360_01.xml</logFileName>
<summary>Errors were detected in file 'C:\Users\aabdi\AppData\Local\Temp\data.20190814.152538.xlsx'</summary>
<repairedRecords>
<repairedRecord>Repaired Records: Cell information from /xl/worksheets/sheet1.xml part</repairedRecord>
</repairedRecords>
</recoveryLog>
Obviously, we don't want to deploy this into a production environment like this. So I've been trying to figure out how to fix this. I threw together a quick little sample to validate the XML and show the errors, based on this link from MSDN. But when I run the program and load the exact same XLSX document that Excel complains about, the Validator comes back saying that the file is perfectly Valid. So I'm not sure where else to go from there.
Any better tools for trying to validate my XLSX XML? Following is the complete code I'm using to generate the XLSX file. (Yes, it's in VB.NET, it's a legacy app.)
If I comment out the line in the For Each dr As DataRow loop, then the XLSX file opens fine in Excel, (just without any data). So it's something with the individual cells, but I'm not really DOING much with them. Setting a value and data type, and that's it.
I also tried replacing the For Each loop in ConstructDataRow with the following, but it still outputs the same "bad" XML...
rv.Append(
(From dc In dr.Table.Columns
Select ConstructCell(
NVL(dr(dc.Ordinal), String.Empty),
MapSystemTypeToCellType(dc.DataType)
)
).ToArray()
)
Also tried replacing the call to Append with AppendChild for each cell too, but that didn't help either.
The zipped up XLSX file (erroring, with dummy data) is available here:
https://drive.google.com/open?id=1KVVWEqH7VHMxwbRA-Pn807SXHZ32oJWR
Full DataTable to Excel XLSX Code
#Region " ToExcel "
<Extension>
Public Function ToExcel(ByVal target As DataTable) As Attachment
Dim filename = Path.GetTempFileName()
Using doc As SpreadsheetDocument = SpreadsheetDocument.Create(filename, DocumentFormat.OpenXml.SpreadsheetDocumentType.Workbook)
Dim data = New SheetData()
Dim wbp = doc.AddWorkbookPart()
wbp.Workbook = New Workbook()
Dim wsp = wbp.AddNewPart(Of WorksheetPart)()
wsp.Worksheet = New Worksheet(data)
Dim sheets = wbp.Workbook.AppendChild(New Sheets())
Dim sheet = New Sheet() With {.Id = wbp.GetIdOfPart(wsp), .SheetId = 1, .Name = "Data"}
sheets.Append(sheet)
data.AppendChild(ConstructHeaderRow(target))
For Each dr As DataRow In target.Rows
data.AppendChild(ConstructDataRow(dr)) '// THIS LINE YIELDS THE BAD PARTS
Next
wbp.Workbook.Save()
End Using
Dim attachmentname As String = Path.Combine(Path.GetDirectoryName(filename), $"data.{Now.ToString("yyyyMMdd.HHmmss")}.xlsx")
File.Move(filename, attachmentname)
Return New Attachment(attachmentname, "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
End Function
Private Function ConstructHeaderRow(dt As DataTable) As Row
Dim rv = New Row()
For Each dc As DataColumn In dt.Columns
rv.Append(ConstructCell(dc.ColumnName, CellValues.String))
Next
Return rv
End Function
Private Function ConstructDataRow(dr As DataRow) As Row
Dim rv = New Row()
For Each dc As DataColumn In dr.Table.Columns
rv.Append(ConstructCell(NVL(dr(dc.Ordinal), String.Empty), MapSystemTypeToCellType(dc.DataType)))
Next
Return rv
End Function
Private Function ConstructCell(value As String, datatype As CellValues) As Cell
Return New Cell() With {
.CellValue = New CellValue(value),
.DataType = datatype
}
End Function
Private Function MapSystemTypeToCellType(t As System.Type) As CellValues
Dim rv As CellValues
Select Case True
Case t Is GetType(String)
rv = CellValues.String
Case t Is GetType(Date)
rv = CellValues.Date
Case t Is GetType(Boolean)
rv = CellValues.Boolean
Case IsNumericType(t)
rv = CellValues.Number
Case Else
rv = CellValues.String
End Select
Return rv
End Function
#End Region
For anyone else coming in and finding this, I finally tracked this down to the Cell.DataType
Setting a value of CellValues.Date will cause Excel to want to "fix" the document.
(apparently for dates, the DataType should be NULL, and Date was only used in Office 2010).
Also, if you specify a DataType of CellValues.Boolean, then the CellValue needs to be either 0 or 1. "true" / "false" will also cause Excel to want to "fix" your spreadsheet.
Also, Microsoft has a better validator tool already built for download here:
https://www.microsoft.com/en-us/download/details.aspx?id=30425

How to export an csv file to a bigqery table using java dataflow?

I want to read an csv file from the cloud bucket and write it to a bigquery table with columns using dataflow in java. How can I set the headers to the csv file while writing to bigquery?
There are two issues to solve here
Skipping the header when reading the data, and
Using the header to correctly populate teh bigquery table columns.
For (1) this is, as of June 2019, not implemented natively, though you could try the options listed at Skipping header rows - is it possible with Cloud DataFlow?. For (2) the easiest would be to read the first line of your CSV in your main program, and pass the list of column names in the constructor to a DoFn that converts CSV lines into TableRow objects ready to write to Bigquery.
Your final program would look something like
public void CsvToBigquery(csvInputPattern, bigqueryTable) {
final String[] columns = readAndSplitFirstLineOfFirstFile(csvInputPattern);
Pipeline p = new Pipeline.create(...);
p
.apply(TextIO.read().from(csvInputPattern)
.apply(Filter.by(new MatchIfNonHeader())
.apply(ParDo.of(new DoFn<String, TableRow>() {
... // use columns here to TableRows
})
.apply(BigtableIO.write().withTableId(bigqueryTable)...);
}
I've done a similar task and used Apache Common library in ParDo function to extract the data from CSV files and then converted them to Table Row Objects for BQ.
String fileData = c.element();
BufferedReader fileReader = new BufferedReader(new InputStreamReader(
new ByteArrayInputStream(fileData.getBytes("UTF-8")), "UTF-8"));
CSVParser csvParser = new CSVParser(fileReader,CSVFormat.DEFAULT.withFirstRecordAsHeader().withIgnoreHeaderCase().withTrim());
Iterable<CSVRecord> csvRecords = csvParser.getRecords();
for (CSVRecord csvRecord : csvRecords) {
TableRow row = new TableRow();
checkAndConvertIntoBqDataType(csvRecord.toMap());
c.output(row);
}

reading files and folders in order with apache beam

I have a folder structure of the type year/month/day/hour/*, and I'd like the beam to read this as an unbounded source in chronological order. Specifically, this means reading in all the files in the first hour on record and adding their contents for processing. Then, add the file contents of the next hour for processing, up until the current time where it waits for new files to arrive in the latest year/month/day/hour folder.
Is it possible to do this with apache beam?
So what I would do is to add timestamps to each element according to the file path. As a test I used the following example.
First of all, as explained in this answer, you can use FileIO to match continuously a file pattern. This will help as, per your use case, once you have finished with the backfill you want to keep reading new arriving files within the same job. In this case I provide gs://BUCKET_NAME/data/** because my files will be like gs://BUCKET_NAME/data/year/month/day/hour/filename.extension:
p
.apply(FileIO.match()
.filepattern(inputPath)
.continuously(
// Check for new files every minute
Duration.standardMinutes(1),
// Never stop checking for new files
Watch.Growth.<String>never()))
.apply(FileIO.readMatches())
Watch frequency and timeout can be adjusted at will.
Then, in the next step we'll receive the matched file. I will use ReadableFile.getMetadata().resourceId() to get the full path and split it by "/" to build the corresponding timestamp. I round it to the hour and do not account for timezone correction here. With readFullyAsUTF8String we'll read the whole file (be careful if the whole file does not fit into memory, it is recommended to shard your input if needed) and split it into lines. With ProcessContext.outputWithTimestamp we'll emit downstream a KV of filename and line (the filename is not needed anymore but it will help to see where each file comes from) and the timestamp derived from the path. Note that we're shifting timestamps "back in time" so this can mess up with the watermark heuristics and you will get a message such as:
Cannot output with timestamp 2019-03-17T00:00:00.000Z. Output timestamps must be no earlier than the timestamp of the current input (2019-06-05T15:41:29.645Z) minus the allowed skew (0 milliseconds). See the DoFn#getAllowedTimestampSkew() Javadoc for details on changing the allowed skew.
To overcome this I set getAllowedTimestampSkew to Long.MAX_VALUE but take into account that this is deprecated. ParDo code:
.apply("Add Timestamps", ParDo.of(new DoFn<ReadableFile, KV<String, String>>() {
#Override
public Duration getAllowedTimestampSkew() {
return new Duration(Long.MAX_VALUE);
}
#ProcessElement
public void processElement(ProcessContext c) {
ReadableFile file = c.element();
String fileName = file.getMetadata().resourceId().toString();
String lines[];
String[] dateFields = fileName.split("/");
Integer numElements = dateFields.length;
String hour = dateFields[numElements - 2];
String day = dateFields[numElements - 3];
String month = dateFields[numElements - 4];
String year = dateFields[numElements - 5];
String ts = String.format("%s-%s-%s %s:00:00", year, month, day, hour);
Log.info(ts);
try{
lines = file.readFullyAsUTF8String().split("\n");
for (String line : lines) {
c.outputWithTimestamp(KV.of(fileName, line), new Instant(dateTimeFormat.parseMillis(ts)));
}
}
catch(IOException e){
Log.info("failed");
}
}}))
Finally, I window into 1-hour FixedWindows and log the results:
.apply(Window
.<KV<String,String>>into(FixedWindows.of(Duration.standardHours(1)))
.triggering(AfterWatermark.pastEndOfWindow())
.discardingFiredPanes()
.withAllowedLateness(Duration.ZERO))
.apply("Log results", ParDo.of(new DoFn<KV<String, String>, Void>() {
#ProcessElement
public void processElement(ProcessContext c, BoundedWindow window) {
String file = c.element().getKey();
String value = c.element().getValue();
String eventTime = c.timestamp().toString();
String logString = String.format("File=%s, Line=%s, Event Time=%s, Window=%s", file, value, eventTime, window.toString());
Log.info(logString);
}
}));
For me it worked with .withAllowedLateness(Duration.ZERO) but depending on the order you might need to set it. Keep in mind that a value too high will cause windows to be open for longer and use more persistent storage.
I set the $BUCKET and $PROJECT variables and I just upload two files:
gsutil cp file1 gs://$BUCKET/data/2019/03/17/00/
gsutil cp file2 gs://$BUCKET/data/2019/03/18/22/
And run the job with:
mvn -Pdataflow-runner compile -e exec:java \
-Dexec.mainClass=com.dataflow.samples.ChronologicalOrder \
-Dexec.args="--project=$PROJECT \
--path=gs://$BUCKET/data/** \
--stagingLocation=gs://$BUCKET/staging/ \
--runner=DataflowRunner"
Results:
Full code
Let me know how this works. This was just an example to get started and you might need to adjust windowing and triggering strategies, lateness, etc to suit your use case