How to read a pipe ("|") delimited file in Spring Batch - spring-batch

I am trying to read a .dat file containing text that is delimited by the pipe symbol: |
How can I use Spring Batch to read a file delimited with this character?

You can find an example in the getting started guide: https://spring.io/guides/gs/batch-processing/
To set a custom delimiter, use the delimiter method: https://docs.spring.io/spring-batch/4.0.x/api/org/springframework/batch/item/file/builder/FlatFileItemReaderBuilder.DelimitedBuilder.html#delimiter-java.lang.String-

Java config for defining a custom delimiter:
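import java.net.MalformedURLException;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.UrlResource;

@Bean
public FlatFileItemReader<POJO> reader() throws MalformedURLException {
    return new FlatFileItemReaderBuilder<POJO>()
            .name("Reader")
            .resource(new UrlResource(filename)) // UrlResource(String) throws MalformedURLException
            .delimited()
            .delimiter("|")                      // the delimiter is matched literally, not as a regex
            .names("Your File Header")           // replace with one name per column, matching POJO properties
            .targetType(POJO.class)              // maps each line's fields onto POJO properties by name
            .build();
}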
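Spring Batch's DelimitedLineTokenizer treats the delimiter as a literal string, so .delimiter("|") works as-is. If you ever split such lines by hand in plain Java, though, note that String.split interprets its argument as a regular expression, and a bare "|" matches the empty string, which splits between every character. A minimal sketch of a literal pipe split (the class name PipeLineSplitter is hypothetical):

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: splits one pipe-delimited line into its fields.
class PipeLineSplitter {
    // Pattern.quote turns "|" into a literal match instead of the regex "empty OR empty"
    private static final String PIPE = Pattern.quote("|");

    static List<String> split(String line) {
        // limit -1 keeps trailing empty fields, e.g. "2|Bob|" has three fields
        return Arrays.asList(line.split(PIPE, -1));
    }

    public static void main(String[] args) {
        System.out.println(split("1|Alice|42")); // [1, Alice, 42]
        System.out.println(split("2|Bob|"));     // [2, Bob, ]
    }
}
```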


Saving to yml file using Spigot

I'm attempting to produce a Message.yml file using Spigot's YamlConfiguration.
This is my code:
public static void create() {
    // Only write the defaults the first time, when the file does not exist yet
    if (messagesFile.exists()) return;
    try {
        messagesFile.createNewFile();
        messages.options().copyDefaults(true);
        messages.addDefault("MESSAGES.PREFIX", "&c[YourServer] ");
        messages.addDefault("MESSAGES.DESIGN", "§8§l- ");
        // "You don't have permission to do that!"
        messages.addDefault("MESSAGES.NOPERMS", "§c§lDazu hast du keine Rechte!");
        // "Please use /addmap [mapname]!"
        messages.addDefault("MESSAGES.ADDMAP.USAGE", "§c§lBitte nutze /addmap [mapname]!");
        messages.save(messagesFile);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
However, the config.yml file I received after running it read as follows:
MESSAGES:
  PREFIX: '&c[YourServer] '
  DESIGN: "\xa78\xa7l- "
  NOPERMS: "\xa7c\xa7lDazu hast du keine Rechte!"
  ADDMAP:
    USAGE: "\xa7c\xa7lBitte nutze /addmap [mapname]!"
Is there any way to fix it?
It thinks the text is a string and not a standalone character.
https://www.spigotmc.org/threads/special-characters-in-config.298138/
Yes, you use a special character to store the color, but it's a String. Don't put your color codes in the config; just save the plain String. When you want to send the text from the config, add the color at that point, for example:
player.sendMessage(ChatColor.RED + config.getString("MESSAGES.PREFIX"));
This is just an example.
Like @Minecraft said in his answer, the issue is that Java is recognizing the § as part of your string and translating it to Unicode.
What I would do is have your custom config file stored in your plugin resources directory with all the default values you want it to have already defined.
Then when you want to use the custom message, get it from the config file using getConfig()'s returned value's methods. Then, if you want to support color codes, you should use message = ChatColor.translateAlternateColorCodes('&', yourMessage); or something along those lines. Should be plenty to get you going.
Also, be sure to use a single symbol for these color codes (the default is &); you can set it in the aforementioned method translateAlternateColorCodes(). You seem to be using both & and §; I would stick to &.
Sources:
https://www.spigotmc.org/wiki/config-files/#using-custom-configurations
https://hub.spigotmc.org/javadocs/bukkit/org/bukkit/ChatColor.html#translateAlternateColorCodes(char,java.lang.String)
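Storing "&" codes in the config and translating them only when sending keeps the YAML file plain ASCII. To illustrate what the translation step does, here is a simplified, hypothetical stand-in for ChatColor.translateAlternateColorCodes (in a real plugin, call the Bukkit method itself, e.g. ChatColor.translateAlternateColorCodes('&', config.getString("MESSAGES.PREFIX"))). The set of accepted code characters below is an assumption covering the classic color and format codes:

```java
// Hypothetical sketch of the idea behind Bukkit's translateAlternateColorCodes:
// an alternate code character followed by a valid code becomes the section sign.
class ColorCodes {
    static final char SECTION_SIGN = '\u00A7'; // the "§" Minecraft uses for formatting
    // Assumed set: color codes 0-9/a-f plus format codes k-o and reset r
    private static final String VALID_CODES = "0123456789AaBbCcDdEeFfKkLlMmNnOoRr";

    static String translate(char altChar, String text) {
        char[] chars = text.toCharArray();
        for (int i = 0; i < chars.length - 1; i++) {
            if (chars[i] == altChar && VALID_CODES.indexOf(chars[i + 1]) >= 0) {
                chars[i] = SECTION_SIGN;
                chars[i + 1] = Character.toLowerCase(chars[i + 1]);
            }
        }
        return new String(chars);
    }
}
```

For example, ColorCodes.translate('&', "&c[YourServer] ") yields "§c[YourServer] ", while an "&" that is not followed by a valid code (as in "Tom & Jerry") is left untouched.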

Is there a way to read an Excel file using Dataflow

Is there a way to read an Excel file stored in a GCS bucket using Dataflow?
I would also like to know whether we can access the metadata of an object in GCS using Dataflow, and if so, how.
Excel data is often exported to CSV for this kind of processing. CSV files can be split and read line by line, so they are ideal for Dataflow. You can use TextIO.Read to pull in each line of the file, then parse each line as CSV.
If you want to use a native binary Excel format instead, I believe you would need to read in the entire file and use a library to parse it. I recommend using CSV files if you can.
As for reading the GCS metadata, I don't think you can do this with TextIO, but you could call the GCS API directly to access the metadata. If you only do this for a few files at the start of your program, it will work and won't be too expensive. If you need to read many files like this, you'll be adding an extra RPC for each file.
Be careful not to read the same file multiple times. I suggest reading each file's metadata once and then writing the metadata out to a side input. Then, in one of your ParDos, you can access the side input for each file.
Useful links:
ETL & Parsing CSV files in Cloud Dataflow
https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Read
https://cloud.google.com/dataflow/model/par-do#side-inputs
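The per-line parsing step suggested above (read lines with TextIO, then parse each one as CSV) could look roughly like this minimal sketch. It handles quoted fields and doubled quotes but, as a stated limitation, not quoted fields that contain line breaks, since line-by-line reading would already have split those records. In a pipeline you would call it from a DoFn; the class name CsvLineParser is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: parses one CSV line (RFC 4180-style quoting) into fields.
class CsvLineParser {
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // "" inside quotes is an escaped quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);        // start the next field
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field (possibly empty)
        return fields;
    }
}
```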
import com.google.cloud.ReadChannel;
import com.google.cloud.storage.Storage;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

// Reads an Excel workbook (.xls or .xlsx) from a GCS bucket and writes each sheet
// to a local CSV file. Assumes Apache POI 4.x, where getCellType() returns CellType.
// Values are not quoted, so cells containing commas will break the CSV.
private static void printBlob(Storage storage, String bucketName, String blobPath) throws IOException {
    try (ReadChannel reader = storage.reader(bucketName, blobPath);
         InputStream inputStream = Channels.newInputStream(reader);
         Workbook wb = WorkbookFactory.create(inputStream)) {
        for (int i = 0; i < wb.getNumberOfSheets(); i++) {
            Sheet sheet = wb.getSheetAt(i);
            // Local output directory; one CSV file per sheet
            File outputFile = new File("D:\\excel\\" + sheet.getSheetName() + ".csv");
            StringBuilder data = new StringBuilder();
            for (Row row : sheet) {
                String separator = "";
                // Iterate through each defined cell of the row
                for (Cell cell : row) {
                    data.append(separator);
                    separator = ",";
                    switch (cell.getCellType()) {
                        case NUMERIC:
                            data.append(cell.getNumericCellValue());
                            break;
                        case STRING:
                            data.append(cell.getStringCellValue());
                            break;
                        case BOOLEAN:
                            data.append(cell.getBooleanCellValue());
                            break;
                        case BLANK:
                            break;
                        default:
                            data.append(cell); // e.g. formulas: falls back to toString()
                    }
                }
                data.append('\n');
            }
            // try-with-resources closes the file stream after each sheet
            try (OutputStream fos = new FileOutputStream(outputFile)) {
                fos.write(data.toString().getBytes(StandardCharsets.UTF_8));
            }
        }
    }
}
You should be able to read the metadata of a GCS file by using the GCS API. However, you would need the filenames. You can get them by applying a ParDo or other transform over a PCollection<String> that holds the filenames.
We don't have any default readers for Excel files. You can parse a CSV file by using a text input (see ETL & Parsing CSV files in Cloud Dataflow).
I'm not very knowledgeable about Excel and how its file format is stored. If you want to process one file at a time, you can use a PCollection<String> of file names and then use a library to parse each Excel file.
If an Excel file can be split into easily parallelizable parts, I'd suggest you take a look at this doc (https://beam.apache.org/documentation/io/authoring-overview/). (If you are still using the Dataflow SDK, it should be similar.) It may be worth splitting the file into smaller chunks before reading to get more parallelism out of your pipeline. In this case you could use IOChannelFactory to read from the file.

How to print the name of the files that are processed?

I am using MultiResourceItemReader to read multiple CSV files from a directory. I want to log each file name when reading of its records starts. I tried having my POJO implement ResourceAware and printing resource.getFilename(), but that method gets invoked for every record.
Is there a way to have the fileName only once when the read starts ?
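I would extend MultiResourceItemReader and override setResources():
import org.springframework.core.io.Resource;

@Override
public void setResources(Resource[] resources) {
    // Called once with all input files, so this logs every name up front
    for (Resource resource : resources) {
        System.out.println("Input file: " + resource.getFilename());
    }
    super.setResources(resources);
}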

How to change binary file into RDD or Dataframe?

http://spark.apache.org/docs/latest/sql-programming-guide.html#interoperating-with-rdds
The link shows how to convert a text file into an RDD and then into a DataFrame.
So how do I deal with a binary file?
Could someone provide an example? Thank you very much.
There is a similar question without an answer here: reading binary data into (py) spark DataFrame
To be more specific, I don't know how to parse the binary file. For example, I can parse a text file into lines or words like this:
JavaRDD<Person> people = sc.textFile("examples/src/main/resources/people.txt").map(
    new Function<String, Person>() {
        public Person call(String line) throws Exception {
            String[] parts = line.split(",");
            Person person = new Person();
            person.setName(parts[0]);
            person.setAge(Integer.parseInt(parts[1].trim()));
            return person;
        }
    });
It seems I just need an API that can parse a binary file or binary stream, like this:
JavaRDD<Person> people = sc.textFile("examples/src/main/resources/people.bin").map(
    new Function<String, Person>() {
        public Person call(/*stream or binary file*/) throws Exception {
            /*code to construct every row*/
            return person;
        }
    });
EDIT:
The binary file contains structured data (a relational database table; the database is a self-made one), and I know the metadata of the structured data. I plan to convert the structured data into an RDD[Row].
I can control everything about the binary file's format, since I use the FileSystem API (http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html) to write the binary stream into HDFS, and the binary file is splittable. I just have no idea how to parse the binary file like in the example code above, so I haven't been able to try anything so far.
There is a binary record reader already available in Spark (I believe it is available in 1.3.1, at least in the Scala API):
sc.binaryRecords(path: String, recordLength: Int, conf: Configuration)
It's up to you, though, to convert those binary records (each one an Array[Byte]) into a format suitable for processing.
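Since the question's author controls the file layout, the conversion step can be a plain function from one fixed-length byte[] record to an object. A minimal sketch, assuming a hypothetical 24-byte layout (a 20-byte zero-padded UTF-8 name followed by a 4-byte big-endian int age); the class PersonRecord and this layout are illustrative, not taken from the question:

```java
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical fixed-length layout (RECORD_LENGTH = 24 bytes):
//   bytes 0-19  name, UTF-8, right-padded with zero bytes
//   bytes 20-23 age, 32-bit big-endian int
class PersonRecord implements Serializable {
    static final int NAME_LENGTH = 20;
    static final int RECORD_LENGTH = NAME_LENGTH + 4;

    final String name;
    final int age;

    PersonRecord(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // Decodes one record, e.g. one element of the RDD returned by binaryRecords
    static PersonRecord parse(byte[] record) {
        if (record.length != RECORD_LENGTH) {
            throw new IllegalArgumentException("expected " + RECORD_LENGTH + " bytes, got " + record.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(record).order(ByteOrder.BIG_ENDIAN);
        byte[] nameBytes = new byte[NAME_LENGTH];
        buf.get(nameBytes);
        int end = 0;
        while (end < NAME_LENGTH && nameBytes[end] != 0) end++; // strip zero padding
        String name = new String(nameBytes, 0, end, StandardCharsets.UTF_8);
        int age = buf.getInt();
        return new PersonRecord(name, age);
    }

    // Encodes a record in the same layout (useful for writing test data)
    static byte[] encode(String name, int age) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        if (nameBytes.length > NAME_LENGTH) throw new IllegalArgumentException("name too long");
        ByteBuffer buf = ByteBuffer.allocate(RECORD_LENGTH).order(ByteOrder.BIG_ENDIAN);
        buf.put(nameBytes);
        buf.position(NAME_LENGTH); // unused name bytes stay zero
        buf.putInt(age);
        return buf.array();
    }
}
```

In a Java Spark job this would then be applied with something like jsc.binaryRecords(path, PersonRecord.RECORD_LENGTH).map(PersonRecord::parse), where jsc is a JavaSparkContext. The class implements Serializable so Spark can move the parsed objects between executors and the driver.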

excel to xml conversion using ado

How do I convert Excel data into an XML file using ADO.NET?
You can use the Microsoft Jet OLEDB 4.0 Data Provider to read the Excel file. Information about how to establish a connection to an Excel file can be found here.
This article explains how to read an Excel file using the provider. Once you have read the data, you can compose your XML document using LINQ to XML or the System.Xml classes.
In Excel, you can save the file to XML by using the File menu and changing the saved file type to XML spreadsheet.
If you want to read an Excel XML file with ADO.NET, try the XmlReader.
Or see this step-by-step example from Microsoft.
I haven't used ADO.NET, but I've used XQuery very successfully for this. Use Excel's XML export to create an XML file, then write XQuery/XPath expressions to convert it as you want. Excel's XML export format is pretty gnarly, but it does the job. Use Oxygen's 30-day evaluation license to make debugging the XQuery easier.
Use this code:
using System.Data;
using System.Data.OleDb;

public static DataSet exceldata(string filelocation)
{
    DataSet ds = new DataSet();
    // Jet 4.0 reads legacy .xls (Excel 97-2003) workbooks
    string excelConnStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + filelocation +
                          ";Extended Properties=Excel 8.0;";
    using (OleDbConnection excelConn = new OleDbConnection(excelConnStr))
    using (OleDbCommand excelCommand = new OleDbCommand(
        "SELECT UUID, `PATTERN` AS PATTERN, `PLAN` AS PLAN FROM [PATTERNS$]", excelConn))
    using (OleDbDataAdapter excelDataAdapter = new OleDbDataAdapter(excelCommand))
    {
        // Fill opens the connection if needed and closes it again afterwards
        DataTable dtPatterns = new DataTable("Patterns");
        excelDataAdapter.Fill(dtPatterns);
        ds.Tables.Add(dtPatterns);
    }
    return ds;
}
Then convert the returned data to XML with DataTable.WriteXml() (DataSet.WriteXml() also works).