I have a Jenkins job that runs Selenium tests. The results are stored in a CSV file and then fed to Cassandra. My requirement is to create a JIRA issue when a test fails, either by analyzing the CSV file or by querying Cassandra. Please suggest possible approaches.
Jira API + CSV Reader or Cassandra API
https://docs.atlassian.com/jira/REST/latest/
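For illustration, a minimal sketch of the CSV-reader route (not tested against a live instance; the base URL, credentials, project key, and CSV column names are placeholders you would replace with your own):

import csv
import requests

# Assumed values -- substitute your own Jira base URL, credentials, and project key.
JIRA_URL = "https://jira.example.com"
JIRA_AUTH = ("jenkins-bot", "api-token-or-password")
PROJECT_KEY = "QA"

def create_issue(summary, description):
    """Create a Bug via POST /rest/api/2/issue (see the REST docs linked above)."""
    payload = {
        "fields": {
            "project": {"key": PROJECT_KEY},
            "summary": summary,
            "description": description,
            "issuetype": {"name": "Bug"},
        }
    }
    resp = requests.post(f"{JIRA_URL}/rest/api/2/issue", json=payload, auth=JIRA_AUTH)
    resp.raise_for_status()
    return resp.json()["key"]

# Assumed CSV layout: one row per test with "test_name", "status", "message" columns.
with open("selenium_results.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row["status"].upper() in ("FAIL", "FAILED"):
            key = create_issue(
                summary=f"Selenium test failed: {row['test_name']}",
                description=row.get("message", ""),
            )
            print(f"Created {key} for {row['test_name']}")

The same create_issue call would work for the Cassandra route; you would query the results table with a Cassandra driver instead of reading the CSV.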
The premise is simple: two ADLS Gen 2 accounts, both accessed as abfss://.
Source account with directory format yyyy/mm/dd; records stored in jsonl format.
I need to read this account recursively starting at any directory.
Transform the data.
Write the data to the target account in parquet with the layout Year=yyyy/Month=mm/Day=dd.
The source account is an ADLS Gen 2 account with private endpoints that is not a part of Synapse Analytics.
The target account is the default ADLS Gen 2 for Azure Synapse Analytics.
Using a Spark notebook within Synapse Analytics with a managed virtual network.
Source storage account has private endpoints.
Code written in pyspark.
Using linked services for both ADLS Gen 2 accounts, set up with private endpoints.
from pyspark.sql.functions import col, substring
spark.conf.set("spark.storage.synapse.linkedServiceName", psourceLinkedServiceName)
spark.conf.set("fs.azure.account.auth.type", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type", "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedSASProvider")
#read the data into a data frame
df = spark.read.option("recursiveFileLookup","true").schema(inputSchema).json(sourceFile)
#perform the transformations to flatten this structure
#create the partition columns for delta lake
dfDelta = df.withColumn('ProductSold_OrganizationId',col('ProductSold.OrganizationId'))\
.withColumn('ProductSold_ProductCategory',col('ProductSold.ProductCategory'))\
.withColumn('ProductSold_ProductId',col('ProductSold.ProductId'))\
.withColumn('ProductSold_ProductLocale',col('ProductSold.ProductLocale'))\
.withColumn('ProductSold_ProductName',col('ProductSold.ProductName'))\
.withColumn('ProductSold_ProductType',col('ProductSold.ProductType'))\
.withColumn('Year',substring(col('CreateDate'),1,4))\
.withColumn('Month',substring(col('CreateDate'),6,2))\
.withColumn('Day',substring(col('CreateDate'),9,2))\
.drop('ProductSold')
#dfDelta.show()
spark.conf.set("spark.storage.synapse.linkedServiceName", psinkLinkedServiceName)
spark.conf.set("fs.azure.account.auth.type", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type", "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedSASProvider")
dfDelta.write.partitionBy("Year","Month","Day").mode('append').format("parquet").save(targetFile)
When trying to access a single file, you get the error message:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (vm-19231616 executor 2): java.nio.file.AccessDeniedException: Operation failed: "Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.", 403, HEAD,
The error indicates that it cannot authenticate to the source file. However, here is where it gets strange.
If I uncomment the line
#dfDelta.show()
then it works.
However, if I put the comment back and run it again, it continues to work. The only way to see the failure again is to completely stop the Spark session and then restart it.
OK, more strangeness: if I change the source file path to something like 2022/06, which should read multiple files, I get the same error regardless of whether the dfDelta.show() statement is uncommented.
The only method I have found to get this to work is to process one file at a time within the Spark notebook; option("recursiveFileLookup","true") only works when there is a single file to process.
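For clarity, the one-file-at-a-time workaround looks roughly like this (the paths are placeholders; in the real notebook they are built from the yyyy/mm/dd folder structure):

# process each file individually -- only workaround I have found so far
single_files = [
    "abfss://<container>@<source-account>.dfs.core.windows.net/2022/06/01/file1.jsonl",
    "abfss://<container>@<source-account>.dfs.core.windows.net/2022/06/02/file1.jsonl",
]
for sourceFile in single_files:
    df = spark.read.option("recursiveFileLookup", "true").schema(inputSchema).json(sourceFile)
    # ...same transformations and write as shown above...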
Finally, what have I tried:
I have tried creating another Spark session.
Spark session 1 reads the data and puts it into a view.
Spark session 2 reads the data in the view and attempts to write it.
The configuration for Spark session 2 uses the token library for the sink file.
This results in the same error message.
My best guess: this has something to do with the Spark cores processing multiple files, and when I change the config they get confused as to how to read from the source file.
I had this working perfectly before I changed the Synapse Analytics workspace to use a managed virtual network. But in that case I accessed the source storage account using a managed identity linked service and had no issues writing to the default Synapse Analytics ADLS Gen 2 account.
I also tried option("forwardSparkAzureStorageCredentials", "true") on both the read and the write DataFrames.
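For what it is worth, I have been wondering whether scoping the settings to each storage account (instead of setting them globally and swapping) would avoid the conflict. A rough, untested sketch of what I mean, assuming the account-scoped forms of these keys are supported by the token library (the account names are placeholders):

# scope the SAS/linked-service settings to each storage account instead of setting them globally
source_acct = "<source-account>.dfs.core.windows.net"
target_acct = "<target-account>.dfs.core.windows.net"

# source account only
spark.conf.set(f"spark.storage.synapse.{source_acct}.linkedServiceName", psourceLinkedServiceName)
spark.conf.set(f"fs.azure.account.auth.type.{source_acct}", "SAS")
spark.conf.set(f"fs.azure.sas.token.provider.type.{source_acct}",
               "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedSASProvider")

# target (sink) account only
spark.conf.set(f"spark.storage.synapse.{target_acct}.linkedServiceName", psinkLinkedServiceName)
spark.conf.set(f"fs.azure.account.auth.type.{target_acct}", "SAS")
spark.conf.set(f"fs.azure.sas.token.provider.type.{target_acct}",
               "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedSASProvider")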
Any suggestions on how to get this to work with multiple files would be appreciated.
I have an SAP BW Open Hub source in Data Factory and an Azure Data Lake Gen2 sink, and I am using a copy activity to move the data.
I am attempting to transfer the data to the lake and split it into numerous files, with 200,000 rows per file. I would also like to be able to prefix all of the filenames, e.g. 'cust_', so the files would be something along the lines of cust_1, cust_2, cust_3, etc.
This only seems to be an issue when using SAP BW Open Hub as a source (it works fine when using SQL Server as a source); please see the warning message below. After checking with our internal SAP BW team, they assure me that the data is in a tabular format and no explicit partitioning is enabled, so there shouldn't be an issue.
When executing the copy activity, the files are transferred to the lake, but the file name prefix setting is ignored and the filenames are instead set automatically, as below (the name seems to be made up of the SAP BW Open Hub table and the request ID):
Here is the source config:
All other properties on the other tabs are left at their defaults and are unchanged.
QUESTION: without using a data flow, is there any way to split the files when pulling from SAP BW Open Hub and also dictate the filenames in the lake?
I tried to reproduce the issue, and it works fine with a workaround. Instead of splitting the data while copying from SAP BW to Azure Data Lake Storage, you can simply copy the entire data set (without partitioning) into an Azure SQL Database. Please follow Copy data from SAP Business Warehouse by using Azure Data Factory (make sure to use Azure SQL Database as the sink).
Now that the data is in your Azure SQL Database, you can simply use a copy activity to copy it to Azure Data Lake Storage.
In the source configuration, keep "Partition option" set to None.
Source Config:
Sink config:
Output:
I am very new to Apache NiFi. I am trying to migrate data from Oracle to MongoDB in Apache NiFi as per the screenshot. I am failing with the reported error. Please help.
Up to PutFile I think it is working fine, as I can see the JSON-format file below in my local directory.
A simple setup direct from Oracle Database to MongoDB without SSL or username and password (not recommended for production):
Just keep tinkering with the PutMongoRecord processor until you resolve all outstanding issues and the exclamation mark is cleared.
I first use an ExecuteSQL processor, which returns the dataset in Avro; I need the final data in JSON. For the DBCPConnectionPool service, you need to create a controller service with the credentials of your Oracle database. After that I use SplitAvro and then TransformXML to convert the data into JSON; in TransformXML you need to supply an XSLT file. Finally, I use the PutMongo processor to ingest the JSON, which gets automatically converted to BSON.
I have a Spring Batch Boot app which takes a flat file as input. I converted the app into a cloud task and deployed it to a local Spring Cloud Data Flow server. Next, I created a stream starting with File Source -> tasklaunchrequest-transform -> task-launcher-local, which starts my batch cloud task app.
It looks like the file does not make it into the batch app; I do not see anything in the logs to indicate that it does.
I checked the docs at https://github.com/spring-cloud-stream-app-starters/tasklaunchrequest-transform/tree/master/spring-cloud-starter-stream-processor-tasklaunchrequest-transform
It says
Any input type. (payload and header are discarded)
My question is: how do I pass the file as the payload from the File Source to the batch app? This seems like a very basic feature.
Any help is very much appreciated.
You'll need to write your own transformer that takes the data from the source and packages it up so your task can consume it.
I am new to Spring Batch; I need a few clarifications regarding Spring Batch Admin.
Can I keep job-configuration-related information in a database instead of uploading XML-based configuration?