I have been using Spark-excel (https://github.com/crealytics/spark-excel) to write output to a single sheet of an Excel file. However, I am unable to write output to different sheets (tabs) of the same file.
Can anyone suggest any alternative?
Thanks,
Sai
I would suggest splitting the problem into two phases:
save the data as multiple CSV files using multiple Spark flows
write an application that converts the CSV files into a single Excel workbook, using e.g. this Java library: http://poi.apache.org/ (a sketch of the second phase follows below)
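If you are working in Python rather than Java, here is a minimal sketch of the second phase using pandas and openpyxl instead of POI. It assumes the Spark flows wrote one CSV per intended sheet into an output/ directory; the paths and workbook name are hypothetical:

```python
import glob
import os

import pandas as pd

# Combine several CSV files into one Excel workbook, one sheet per file.
with pd.ExcelWriter("combined.xlsx", engine="openpyxl") as writer:
    for path in sorted(glob.glob("output/*.csv")):
        # Excel caps sheet names at 31 characters.
        sheet_name = os.path.splitext(os.path.basename(path))[0][:31]
        pd.read_csv(path).to_excel(writer, sheet_name=sheet_name, index=False)
```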
How can I convert an uploaded CSV to a dataframe in Foundry using a Code Workbook? Should I use the @transform decorator with spark.read.... (not sure of the exact syntax)?
Thanks!
CSV is a "special format" where Foundry can infer the schema of the CSV and automatically convert it to a dataset. In this example, I uploaded a CSV with hail data from NOAA:
If you hit Edit Schema on the main page, you can use a front-end tool to set certain configurations for the CSV (delimiter, column name row, column types, etc.).
Once it's a dataset, you can import the dataset in Code Workbooks and it should work out of the box.
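As a minimal, hedged illustration of the Code Workbook case (the dataset alias hail_data is hypothetical): a Python transform in a Code Workbook receives each imported dataset as a PySpark DataFrame argument named after its alias.

```python
# In a Code Workbook Python transform, the imported dataset arrives as a
# PySpark DataFrame whose parameter name matches the input alias.
def hail_preview(hail_data):
    # No spark.read call is needed; Foundry has already parsed the CSV.
    return hail_data.limit(10)
```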
If you wanted to parse the raw file yourself, I would recommend Code Authoring rather than Code Workbooks: it is better suited for production-level code, has less overhead, and is more efficient for this type of workflow (parsing raw files). If you really wanted to use Code Workbooks, change the type of your input using the input helper bar, or in the Inputs tab.
Once you finish iterating, please move this to a Code Authoring repository and repartition your data. File reads in a Code Workbook can substantially slow down your whole pipeline. Code Authoring offers a preview of raw files now, so it is just as fast to develop with as Code Workbooks.
Note: only imported datasets and persisted datasets can be read in as Python transform inputs; transforms that are not saved as a dataset cannot. A dataset with no schema should be read in as a raw transform input automatically.
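For the raw-file case, here is a minimal, hedged sketch of a Python transform in Code Authoring (the dataset paths are hypothetical, and this assumes the transforms.api filesystem interface for schemaless inputs):

```python
import csv

from transforms.api import transform, Input, Output

@transform(
    out=Output("/Project/datasets/hail_parsed"),      # hypothetical output path
    raw=Input("/Project/datasets/hail_csv_upload"),   # hypothetical schemaless upload
)
def parse_raw_csv(ctx, out, raw):
    fs = raw.filesystem()
    rows = []
    # Iterate over the raw CSV files in the input dataset and parse them.
    for status in fs.ls(glob="*.csv"):
        with fs.open(status.path) as fh:
            rows.extend(csv.DictReader(fh))
    out.write_dataframe(ctx.spark_session.createDataFrame(rows))
```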
I have a Tableau workbook using Excel as a data source. The Excel file uses lots of formulae internally and takes input parameters via cells on a particular worksheet. The issue is that I want to input these parameters through Tableau, get the Excel file refreshed on the back end, and show its output in Tableau. Any suggestions on how I can accomplish this? Thanks much in advance.
You can create parameters in Tableau. The Excel calculations using these parameters would also need to move into Tableau as Calculated Fields. Basically you would pull the raw Excel data into Tableau and perform calculations there using your Tableau parameters.
I have the following scenario: several CSV files contain different columns of the same table. Can I fill a Redshift table from them somehow, ideally with the help of Data Pipeline? I couldn't find a way to achieve this. Can anyone help with a solution, or perhaps a simple example, if it's possible?
You can do it by converting your CSV files into JSON format prior to loading. Then, when a particular JSON key is not present in a file, COPY will simply skip it and leave the corresponding column NULL instead of failing.
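A minimal sketch of that conversion step in Python (the part*.csv file names are hypothetical; each output line becomes one JSON record that Redshift's COPY ... FORMAT AS JSON 'auto' can load, matching keys to columns by name):

```python
import csv
import glob
import json

# Convert each CSV into newline-delimited JSON. Columns missing from a given
# file are simply absent from its records, so
#     COPY my_table FROM 's3://bucket/prefix' ... FORMAT AS JSON 'auto';
# leaves them NULL rather than erroring.
for path in glob.glob("part*.csv"):
    with open(path, newline="") as src, open(path + ".json", "w") as dst:
        for row in csv.DictReader(src):
            dst.write(json.dumps(row) + "\n")
```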
Is it possible to run multiple PostgreSQL queries and, using pgAdmin3, have each export to a separate tab of an XLSX file?
Along the same lines, is it possible to run one PostgreSQL query that exports to multiple tabs based on some criteria?
You'll want to use an external tool for this. PostgreSQL knows nothing about the XLSX format, nor about OpenDocument or any of that.
I suggest writing a script that exports a bunch of individual CSV files with COPY, then using an external tool to convert them and assemble them into sheets of a single XLSX document (a sketch of the idea follows).
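A minimal sketch of such a script in Python, querying directly and building the workbook with openpyxl (the connection string, tab names, and SQL are hypothetical; this assumes the psycopg2 and openpyxl packages):

```python
import psycopg2
from openpyxl import Workbook

# One query per desired tab; the names and SQL here are placeholders.
queries = {
    "orders": "SELECT * FROM orders",
    "customers": "SELECT * FROM customers",
}

wb = Workbook()
wb.remove(wb.active)  # drop the default empty sheet
with psycopg2.connect("dbname=mydb") as conn:
    with conn.cursor() as cur:
        for sheet_name, sql in queries.items():
            cur.execute(sql)
            ws = wb.create_sheet(sheet_name)
            ws.append([desc[0] for desc in cur.description])  # header row
            for row in cur:
                ws.append(list(row))
wb.save("report.xlsx")
```

For the second question, one query could feed multiple tabs by grouping rows on a key column inside the same loop.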
It's possible that ETL tools like CloverETL, Pentaho Kettle, or Talend Studio may do what you want too. I haven't checked this specific functionality.
Can someone please give me a technical design overview of how I should implement this scenario:
I am using Spring Batch to import data from CSV files into different tables, and once they are imported I run some validations on those tables. Now I need to write the data from three different tables into three different sheets of a single Excel file. Can someone please help me with how I should use ItemReaders and ItemWriters to solve this problem?
If it were me, I would implement it as follows: create the XLS file up front, either from your launching code or in a first step (e.g. a method-invoking tasklet) that creates the file, and pass its location through the job parameters.
The next step would do chunk-oriented reading from table 1, and its ItemWriter would be a custom writer that uses POI to write to the first sheet.
The step after that would chunk-read from table 2 and write to the second sheet, and likewise for table 3.
Since you have a single file, you can never take advantage of Spring Batch performance features like multithreading or partitioning. It would be better to write to separate files in independent steps instead.