Adding an additional column to data saved in Cosmos DB by Azure Data Factory's copy activity

I am using Azure Data Factory's copy activity to copy data from a CSV file in Blob storage to Cosmos DB (SQL API). If I do not import any schema in the sink's dataset, the copy activity reads the headers from the CSV on execution and saves the data as JSON documents in Cosmos DB. Up to this point it works fine.
I need to add a batch id column (a GUID or the pipeline run ID) to the data being written to Cosmos DB, so that I can track which documents were copied as one batch.
How can I keep all my source columns, add my batch id column, and save the result to Cosmos DB?
The schema is not fixed and can change on each ADF pipeline trigger, so I cannot import a schema and do one-to-one column mapping in the copy activity.

As far as I know, you can't add a custom column when you transfer data from CSV to Cosmos DB with the copy activity. As a workaround, I suggest using an Azure Functions Cosmos DB trigger to add the batchId when the document is created in the database:
#r "Microsoft.Azure.Documents.Client"
#r "Newtonsoft.Json"
#r "Microsoft.Azure.DocumentDB.Core"
using System;
using System.Collections.Generic;
using Microsoft.Azure.Documents;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using Microsoft.Azure.Documents.Client;a
public static void Run(IReadOnlyList<Document> documents, TraceWriter log)
{
if (documents != null && documents.Count > 0)
{
private static readonly string endpointUrl = "https://***.documents.azure.com:443/";
private static readonly string authorizationKey = "***";
private static readonly string databaseId = "db";
private static readonly string collectionId = "coll";
private static DocumentClient client;
documents[0].SetPropertyValue("batchId","123");
var document = client.ReplaceDocumentAsync(UriFactory.CreateDocumentUri(databaseId, collectionId, documents[0].id), documents[0]).Result.Resource;
log.Verbose("document Id " + documents[0].Id);
}
}
However, with this approach you need to generate the batchId yourself inside the function, so it can't easily be matched to the pipeline run ID in Azure Data Factory.
Hope it helps you.

Related

Persist String variable to Lob with Micronaut Data JDBC

I'm using Micronaut Data JDBC and I have an issue.
I have a @MappedEntity for JDBC with a content field that is a String, used in a JPA context as follows:
@Lob
@Column(name = "content")
private String content;
I need to migrate this code to JDBC, and I need this content to be persisted as a LOB in PostgreSQL as well.
With the current code, I'm only able to store the content as a String.
Any idea how I could achieve that?
To persist the String content as a LOB, what needs to be done is to annotate the field as follows:
@ColumnTransformer(write = "lo_from_bytea(0, ?::bytea)")
@Column(name = "content")
private String content;
By doing this, PostgreSQL stores an ID in the content column and the data is persisted in the pg_largeobject table, as desired.
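For context, here is a minimal sketch of the full entity; the class name is illustrative, and it assumes Micronaut Data's @ColumnTransformer (io.micronaut.data.annotation.sql, package may vary by version) applied alongside the JPA @Column from the question:
import io.micronaut.data.annotation.GeneratedValue;
import io.micronaut.data.annotation.Id;
import io.micronaut.data.annotation.MappedEntity;
import io.micronaut.data.annotation.sql.ColumnTransformer;
import javax.persistence.Column;

@MappedEntity("document")
public class Document {

    @Id
    @GeneratedValue
    private Long id;

    // PostgreSQL stores a large-object id in this column; the actual
    // bytes live in the pg_largeobject table.
    @ColumnTransformer(write = "lo_from_bytea(0, ?::bytea)")
    @Column(name = "content")
    private String content;

    // getters and setters omitted for brevity
}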

Connecting to multiple database and selecting tables at runtime spring boot java

I have a use case where I need to connect to two different databases (Postgres and Oracle). Postgres is already configured with JPA. I need to add one more database (Oracle). In the Oracle database I need to choose tables at runtime for insertion and deletion (since the tables are not fixed). Currently I'm passing the tables in my properties file as a list:
oracle:
  deletion:
    table:
      -
        tableName: user
        primaryKey: userId
        emailField: emailId
        deleteTableName: user_delete
      -
        tableName: customer
        primaryKey: customerId
        emailField: emailAddress
        deleteTableName: customer_delete
I've created a bean that reads all these properties and puts them in a list:
@Bean("oracleTables")
@ConfigurationProperties("oracle.deletion.table")
List<Table> getAllTables() {
    return new ArrayList<>();
}
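where Table is a simple POJO whose fields match the YAML keys (sketch):
public class Table {
    private String tableName;
    private String primaryKey;
    private String emailField;
    private String deleteTableName;

    // getters and setters are required for the property binding
    public String getTableName() { return tableName; }
    public void setTableName(String tableName) { this.tableName = tableName; }
    public String getPrimaryKey() { return primaryKey; }
    public void setPrimaryKey(String primaryKey) { this.primaryKey = primaryKey; }
    public String getEmailField() { return emailField; }
    public void setEmailField(String emailField) { this.emailField = emailField; }
    public String getDeleteTableName() { return deleteTableName; }
    public void setDeleteTableName(String deleteTableName) { this.deleteTableName = deleteTableName; }
}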
I have a list of email addresses with me. For each of these tables I need to fetch the primary keys based on the email address from the parent table (value in tableName) and insert the data into the corresponding delete table (value in deleteTableName). Once that is done, I need to delete the data from the actual table (value in tableName) based on the email address.
I'm planning to loop through the list of tables in my bean and perform the fetch, insert and delete.
Sample snippet:
@Autowired
@Qualifier("oracleTables")
List<Table> tables;

public boolean processDelete(List<String> emails) {
    for (Table table : tables) {
        // fetch all the primary keys for the given emails from the main table (value in tableName)
        // insert into the corresponding delete table
        // delete from the main table
    }
    return true;
}
But the question I have is: should I go with JdbcTemplate or JpaRepository/Hibernate? Some help with the implementation, with a small sample or link, would also be appreciated.
The reasons for this question are:
1) The tables in my case are not fixed.
2) I need transaction management to roll back in case of failure in fetching, inserting or deleting.
3) I need to configure two databases.
Should I go with JdbcTemplate or JpaRepository/Hibernate?
Most definitely JdbcTemplate. JPA does not easily allow dynamic tables.
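For illustration, here is a minimal sketch of that loop with JdbcTemplate; the Table getters, the bean names, and the SQL shape (the delete table mirroring the main table's columns) are assumptions based on the question:
import java.util.Collections;
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.transaction.annotation.Transactional;

public class TableCleanupService {

    private final JdbcTemplate oracleJdbcTemplate; // built on the Oracle DataSource
    private final List<Table> tables;              // the "oracleTables" bean

    public TableCleanupService(JdbcTemplate oracleJdbcTemplate, List<Table> tables) {
        this.oracleJdbcTemplate = oracleJdbcTemplate;
        this.tables = tables;
    }

    @Transactional(transactionManager = "oracleTransactionManager")
    public void processDelete(List<String> emails) {
        // One "?" placeholder per email for the IN clause. Email values are
        // bound as parameters; table/column identifiers come from trusted
        // configuration, since identifiers cannot be bound as parameters.
        String in = String.join(",", Collections.nCopies(emails.size(), "?"));
        Object[] params = emails.toArray();

        for (Table table : tables) {
            // Copy the matching rows into the delete table...
            oracleJdbcTemplate.update(String.format(
                    "INSERT INTO %s SELECT * FROM %s WHERE %s IN (%s)",
                    table.getDeleteTableName(), table.getTableName(),
                    table.getEmailField(), in), params);
            // ...then remove them from the main table; a failure in either
            // statement rolls back the whole method.
            oracleJdbcTemplate.update(String.format(
                    "DELETE FROM %s WHERE %s IN (%s)",
                    table.getTableName(), table.getEmailField(), in), params);
        }
    }
}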
I need transaction management to roll back in case of failure in fetching, inserting or deleting.
If you need transactions, you'll also need to define two separate transaction managers:
@Bean
public TransactionManager oracleTransactionManager() {
    var result = new DataSourceTransactionManager();
    ...
    result.setDataSource(oracleDataSource());
    return result;
}

@Bean
public TransactionManager postgresTransactionManager() {
    ...
}
Then, if you want declarative transactions, you need to specify the manager with which to run a given method:
@Transactional(transactionManager = "oracleTransactionManager")
public void doWorkInOracleDb() {
    ...
}
I need to configure two databases
Just configure two separate DataSource beans. Of course, you will actually need two separate JdbcTemplate beans as well.
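A sketch of that configuration (the property prefixes are assumptions; adjust them to your application.yml):
import javax.sql.DataSource;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.boot.jdbc.DataSourceBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;

@Configuration
public class DataSourceConfig {

    @Bean
    @ConfigurationProperties("app.datasource.oracle")
    public DataSource oracleDataSource() {
        return DataSourceBuilder.create().build();
    }

    @Bean
    @ConfigurationProperties("app.datasource.postgres")
    public DataSource postgresDataSource() {
        return DataSourceBuilder.create().build();
    }

    // One JdbcTemplate per DataSource; inject by qualifier where needed.
    @Bean
    public JdbcTemplate oracleJdbcTemplate() {
        return new JdbcTemplate(oracleDataSource());
    }

    @Bean
    public JdbcTemplate postgresJdbcTemplate() {
        return new JdbcTemplate(postgresDataSource());
    }
}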

How to retrieve/set SQL query for an ItemReader from database?

I have a Spring Batch program which reads data from the DB, processes it, and inserts it (using an ItemWriter) into another table in the database. I am using a bunch of SQL queries for the ItemReader, ItemProcessor and ItemWriter.
My requirement is to store all these queries in a table in parameter-and-value format, retrieve them with a single DB call, and pass them to the ItemReader, ItemProcessor or ItemWriter. That way, if any of the queries change in the future, we only have to do DB updates and the code stays untouched.
I tried to do this in the beforeJob section, but I get the error "java.lang.IllegalArgumentException: The SQL query must be provided". I can do it successfully by making a DB call inside the ItemReader method, but I'm trying to avoid that approach because I would need to make a DB call for each of the ItemReader, ItemProcessor and ItemWriter. Please let me know how to achieve this.
You can create a step with a tasklet that reads the query from the database and adds it to the execution context under some key, then configure the reader of your chunk-oriented step with the query from the execution context. Here is a quick example:
1. Retrieve the query and put it in the execution context:
@Bean
public Tasklet queryRetrievalTasklet() {
    return (contribution, chunkContext) -> {
        String query = ""; // retrieve query from db (using a JdbcTemplate for example)
        // getJobExecutionContext() returns an unmodifiable copy, so write
        // through the JobExecution's ExecutionContext instead
        chunkContext.getStepContext().getStepExecution()
                .getJobExecution().getExecutionContext().put("query", query);
        return RepeatStatus.FINISHED;
    };
}
2. Configure the reader with the query from the execution context
@Bean
@StepScope
public ItemReader<Integer> itemReader(@Value("#{jobExecutionContext['query']}") String query) {
    // return your reader configured with the query
    return null;
}
Hope this helps.
In my opinion, such configuration is usually done by storing the queries in properties rather than in the database. For example:
batch.query.unload=SELECT ...
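With that approach, a reader bean can simply inject the query from the properties (a sketch using a JdbcCursorItemReader; the property name follows the example above):
import javax.sql.DataSource;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.jdbc.core.SingleColumnRowMapper;

@Bean
public JdbcCursorItemReader<Integer> itemReader(DataSource dataSource,
        @Value("${batch.query.unload}") String query) {
    // The SQL comes from the properties file, so changing the query
    // requires no code change, only a configuration update.
    return new JdbcCursorItemReaderBuilder<Integer>()
            .name("itemReader")
            .dataSource(dataSource)
            .sql(query)
            .rowMapper(new SingleColumnRowMapper<>(Integer.class))
            .build();
}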

Generating and accessing stored procedures using Entity framework core

I am implementing an ASP.NET Core Web API with Entity Framework Core, using the database-first approach in Visual Studio 2017. I have managed to generate the context and class files based on an existing database. I now need to access stored procedures through my context. In earlier versions of Entity Framework this was simple: select the stored procedure objects in the wizard and generate an edmx containing them, then access the stored procedures via the complex types exposed by Entity Framework. How do I do a similar thing in Entity Framework Core? An example would help.
The database-first approach with edmx files is not available in EF Core. Instead you have to use Scaffold-DbContext.
Install the NuGet packages Microsoft.EntityFrameworkCore.Tools and Microsoft.EntityFrameworkCore.SqlServer.Design, then run:
Scaffold-DbContext "Server=(localdb)\mssqllocaldb;Database=Blogging;Trusted_Connection=True;" Microsoft.EntityFrameworkCore.SqlServer -OutputDir Models
but that will not scaffold your stored procedures. That support is still in the works; see tracking issue #245.
But to execute the stored procedures, use the FromSql method, which executes raw SQL queries, e.g.
var products = context.Products
    .FromSql("EXECUTE dbo.GetProducts")
    .ToList();
To use it with parameters:
var productCategory = "Electronics";
var product = context.Products
    .FromSql("EXECUTE dbo.GetProductByCategory {0}", productCategory)
    .ToList();
or
var productCategory = new SqlParameter("productCategory", "Electronics");
var product = context.Product
    .FromSql("EXECUTE dbo.GetProductByName @productCategory", productCategory)
    .ToList();
There are certain limitations when executing raw SQL queries or stored procedures: you can't use FromSql for INSERT/UPDATE/DELETE. If you want to execute INSERT, UPDATE or DELETE queries, use ExecuteSqlCommand:
var categoryName = "Electronics";
dataContext.Database
    .ExecuteSqlCommand("dbo.InsertCategory @p0", categoryName);
The above examples work fine when executing a stored procedure whose result set matches an entity you have already defined. But what if you want a result set that is not supported? According to the developers of EF Core 2, this is a feature that will come, but there is already an easy solution today.
Create the model you want to use for your output. This model represents the output, not a table in the database:
namespace Example.EF.Model
{
    public class Sample
    {
        public int SampleID { get; set; }
        public string SampleName { get; set; }
    }
}
Then add a new DbSet with your model to your context:
public virtual DbSet<Sample> Sample { get; set; }
And then do as above, using your model for the output:
var products = _samplecontext.Sample
    .FromSql($"EXEC ReturnAllSamples {id}, {startdate}, {enddate}").ToList();
I hope this helps anyone out.
My original post - https://stackoverflow.com/a/57224037/1979465
To call a stored procedure and get the result into a list of models in EF Core, we have to follow 3 steps.
Step 1.
You need to add a new class just like your entity class, with properties for all the columns in your SP. For example, if your SP returns two columns called Id and Name, your new class should be something like:
public class MySPModel
{
    public int Id { get; set; }
    public string Name { get; set; }
}
Step 2.
Then you have to add a DbQuery property for your SP to your DbContext class:
public partial class Sonar_Health_AppointmentsContext : DbContext
{
    public virtual DbSet<Booking> Booking { get; set; } // your existing DbSets
    ...
    public virtual DbQuery<MySPModel> MySP { get; set; } // your new DbQuery
    ...
}
Step 3.
Now you will be able to call your SP and get its result from your DbContext:
var result = await _context.Query<MySPModel>()
    .AsNoTracking()
    .FromSql(string.Format("EXEC {0} {1}", functionName, parameter))
    .ToListAsync();
I am using a generic UnitOfWork & Repository, so my function to execute the SP is:
/// <summary>
/// Execute function. Be extra careful when using this function, as there is a risk of SQL injection.
/// </summary>
public async Task<IEnumerable<T>> ExecuteFunction<T>(string functionName, string parameter) where T : class
{
    return await _context.Query<T>().AsNoTracking()
        .FromSql(string.Format("EXEC {0} {1}", functionName, parameter))
        .ToListAsync();
}
Hope it will be helpful for someone!
The workaround we use in EF Core to execute stored procedures and get the data back is the FromSql method; you can execute a stored procedure this way:
List<Employee> employees = dbcontext.Employee
    .FromSql("GetAllEmployees").ToList();
But for create, update and delete, we use ExecuteSqlCommand, which lives on the Database property:
var employee = "Harold Javier";
dbcontext.Database
    .ExecuteSqlCommand("InsertEmployee {0}", employee);
The solutions Rohith, Harold Javier and Sami provided work. I would like to add that you can create a separate EF6 project to generate the C# classes for the result sets and then copy the files to your EF Core project. If you change a stored proc, you can update the result file using the methods discussed here: Stored Procedures and updating EDMX
If you need corresponding TypeScript interfaces, you can install this VS2017 TypeScript definition generator extension: https://marketplace.visualstudio.com/items?itemName=MadsKristensen.TypeScriptDefinitionGenerator
There is still a bit of copying involved, but it is less tedious than creating the classes manually.
Edit: there is a VS2017 extension for generating the DbContext: https://marketplace.visualstudio.com/items?itemName=ErikEJ.EFCorePowerTools. It does not handle stored procedures, but it provides a right-click menu item in the VS project instead of the command-line Scaffold-DbContext.
If you need to execute a stored procedure in a MySQL database from Entity Framework Core, the following code should work:
var blogTagId = 1;
var tags = await _dbContext.BlogTags.FromSqlRaw("CALL SP_GetBlogTags({0})", blogTagId).ToListAsync();

Save form data to multi-value field on MongoDB using Play2.2.2

I'm experimenting with Play (v2.2.2), and I have it connected to MongoDB (v2.4.6) using Jackson.
I have a models.Role class with the following attributes:
@Id
@ObjectId
public String id;
public String name;
public ArrayList<String> permissions;
On the template (roles.scala.html) I can easily get the list of permissions printed in the HTML, but when I try to add a new role, passing a single permission as a string in the form via an @InputText field, it does not get recorded in MongoDB. I suppose this is because Play/Scala is trying to assign a simple String to an ArrayList<String>.
Any ideas on the proper approach? Maybe I should add some logic to the create() method in the Role class?
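If the create() route sounds reasonable, here is a minimal sketch (the form field name "permission" and the persistence step are assumptions) that wraps the single submitted value into the list before saving:
import java.util.ArrayList;
import play.data.Form;

public static Role create(Form<Role> filledForm) {
    Role role = filledForm.get();
    // The form submits a single permission string, so initialize the
    // multi-value field explicitly before persisting the document.
    String submitted = filledForm.data().get("permission"); // hypothetical field name
    role.permissions = new ArrayList<>();
    if (submitted != null && !submitted.isEmpty()) {
        role.permissions.add(submitted);
    }
    // persist with your Jackson/MongoDB mapper here
    return role;
}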