Druid: Firehose to import records from database

The default examples provide firehoses for importing rows from CSV, TSV, and similar files. Do we have one that can import records from a database and insert them into Druid? Any thoughts?
Here is what I was thinking:
"firehose": {
"type" : "database",
"datasource" : {
"connectURI" : "jdbc:mysql://localhost:3306/test",
"user" : "druid",
"password" : "xyz123"
},
"query" : "select * from table"
"frequency" : "P1M"
}
We could possibly extend it to obtain the connection via a JNDI data source, among other options. Does this sort of implementation have any issues?
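For context, a spec like that could be deserialized into a small Jackson-annotated config class along these lines. This is only a rough sketch; the class and field names are hypothetical, not existing Druid code:

import com.fasterxml.jackson.annotation.JsonProperty;

// Hypothetical holder for the proposed "database" firehose spec above.
public class DatabaseFirehoseConfig
{
  @JsonProperty("datasource")
  private DatabaseConnectionConfig datasource;

  @JsonProperty("query")
  private String query;       // e.g. "select * from table"

  @JsonProperty("frequency")
  private String frequency;   // ISO-8601 period, e.g. "P1M"

  // Connection details nested under "datasource".
  public static class DatabaseConnectionConfig
  {
    @JsonProperty("connectURI")
    private String connectURI;

    @JsonProperty("user")
    private String user;

    @JsonProperty("password")
    private String password;
  }
}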

How about this idea? It is a custom firehose for JDBC ingestion; in this form it only supports one-time query ingestion.
https://github.com/sirpkt/druid/tree/jdbc_firehose/extensions-contrib/jdbc-firehose
Here is the code snippet. It uses the DBI library to fetch a result set from the existing database server.
public Firehose connect(final MapInputRowParser parser) throws IOException, ParseException, IllegalArgumentException
{
  if (columns != null) {
    verifyParserSpec(parser.getParseSpec(), columns);
  }
  final Handle handle = new DBI(
      connectorConfig.getConnectURI(),
      connectorConfig.getUser(),
      connectorConfig.getPassword()
  ).open();
  final String query = makeQuery(columns);
  final ResultIterator<InputRow> rowIterator = handle
      .createQuery(query)
      .map(
          new ResultSetMapper<InputRow>()
          {
            List<String> queryColumns = (columns == null) ? Lists.<String>newArrayList() : columns;

            @Override
            public InputRow map(
                final int index,
                final ResultSet r,
                final StatementContext ctx
            ) throws SQLException
            {
              try {
                if (queryColumns.size() == 0) {
                  // No explicit column list: derive it from the result set metadata.
                  ResultSetMetaData metadata = r.getMetaData();
                  for (int idx = 1; idx <= metadata.getColumnCount(); idx++) {
                    queryColumns.add(metadata.getColumnName(idx));
                  }
                  Preconditions.checkArgument(
                      queryColumns.size() > 0,
                      String.format("No column in table [%s]", table)
                  );
                  verifyParserSpec(parser.getParseSpec(), queryColumns);
                }
                ImmutableMap.Builder<String, Object> builder = new ImmutableMap.Builder<>();
                for (String column : queryColumns) {
                  builder.put(column, r.getObject(column));
                }
                return parser.parse(builder.build());
              }
              catch (IllegalArgumentException e) {
                throw new SQLException(e);
              }
            }
          }
      ).iterator();
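The snippet stops at the iterator. In the linked branch the result iterator is then wrapped in a Firehose whose methods delegate to it; a rough sketch of that wrapping based on the Firehose interface of that Druid version (the exact code in the branch may differ):

  return new Firehose()
  {
    @Override
    public boolean hasMore()
    {
      return rowIterator.hasNext();
    }

    @Override
    public InputRow nextRow()
    {
      return rowIterator.next();
    }

    @Override
    public Runnable commit()
    {
      // Nothing to commit for a one-time query; return a no-op.
      return () -> {};
    }

    @Override
    public void close() throws IOException
    {
      // Release both the JDBI result iterator and the handle/connection.
      rowIterator.close();
      handle.close();
    }
  };
}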

Related

Liquibase update throwing error when running in parallel

I am trying to generate DDL scripts for my JPA entities, and that works just fine.
I am adding a unit test for the same, which gives me this error:
Caused by: liquibase.exception.ServiceNotFoundException: Could not find unique implementation of liquibase.executor.Executor. Found 0 implementations
at liquibase.servicelocator.ServiceLocator.findClass(ServiceLocator.java:185)
at liquibase.servicelocator.ServiceLocator.newInstance(ServiceLocator.java:211)
... 21 more
My TestNG test looks like this:
@DataProvider(parallel = true)
public Object[][] listToArrays() {
    File f = new File(PropertyReader.getInstance().getProperty("path"));
    File[] files = f.listFiles();
    List<MetaData> list = prepare(files);
    Object[][] array = new Object[list.size()][1];
    for (int i = 0; i < list.size(); i++) {
        array[i][0] = list.get(i);
    }
    return array;
}

@Test(dataProvider = "listToArrays")
public void test(MetaData s) throws ScriptGenerationException,
        UnsupportedEncodingException, LiquibaseException, IOException {
    String[] rdbmsTypes = new String[] { "MSSQL", "POSTGRES", "ORACLE", "MYSQL" };
    for (String rdbmsType : rdbmsTypes) {
        FileSystemResourceAccessor fsOpener = new FileSystemResourceAccessor();
        CommandLineResourceAccessor clOpener = new CommandLineResourceAccessor(this.getClass().getClassLoader());
        CompositeResourceAccessor fileOpener = new CompositeResourceAccessor(new ResourceAccessor[] { fsOpener, clOpener });
        Database database = CommandLineUtils.createDatabaseObject(fileOpener, this.url, this.username, this.password, this.driver,
                this.defaultCatalogName, this.defaultSchemaName, Boolean.parseBoolean(this.outputDefaultCatalog),
                Boolean.parseBoolean(this.outputDefaultSchema), this.databaseClass,
                this.driverPropertiesFile, this.propertyProviderClass, this.liquibaseCatalogName,
                this.liquibaseSchemaName, this.databaseChangeLogTableName, this.databaseChangeLogLockTableName);
        Liquibase liquibase = new Liquibase(d, null, database); // 'd' is the changelog (defined elsewhere in the test class)
        Writer w = getOutputWriter();
        liquibase.update(new Contexts(this.contexts), new LabelExpression(this.labels), w);
        w.close();
    }
}
What could be the cause of the problem?
Note: I am only using Liquibase in offline mode.
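If parallel execution turns out to be the trigger, one workaround sketch is to serialize just the Liquibase call while keeping the rest of the test parallel. This is only an assumption about the failure mode, not a confirmed root cause:

// Hypothetical guard: only one thread runs a Liquibase update at a time.
private static final Object LIQUIBASE_LOCK = new Object();

// inside the loop over rdbmsTypes, replacing the plain update call:
synchronized (LIQUIBASE_LOCK) {
    liquibase.update(new Contexts(this.contexts), new LabelExpression(this.labels), w);
}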

Dynamic JPQL query with JOIN

I had to write a JPQL query; based on the input I need to add AND conditions, and for some inputs I also need JOIN queries.
@Override
public List<IncidentHdr> fetchIncidents(IncidentHdrDto incidentHdrDto) {
    StringBuilder query = new StringBuilder();
    query.append(ReposJPQL.GET_INCIDENT_DETAILS);
    Map<String, Object> parameters = new HashMap<String, Object>();
    List<String> criteria = new ArrayList<String>();
    if (incidentHdrDto.getIncidentId() > 0) {
        criteria.add("inc.incidentId = :incidentId");
        parameters.put("incidentId", incidentHdrDto.getIncidentId());
    }
    if (incidentHdrDto.getCatCode() > 0) {
        criteria.add("inc.catCode = :catCode");
        parameters.put("catCode", incidentHdrDto.getCatCode());
    }
    if (incidentHdrDto.getType() != null) {
        // here I need to generate a join query, equivalent to:
        // SELECT * FROM INCIDENT JOIN CATEGORY_MAST ON (INCIDENT.CAT_CODE = CATEGORY_MAST.CAT_CODE)
        // WHERE CATEGORY_MAST.TYPE_CODE = 16
    }
    Query q = em.createQuery(query.toString());
    logger.info("Get Incidents Query : " + query.toString());
    for (Entry<String, Object> entry : parameters.entrySet()) {
        q.setParameter(entry.getKey(), entry.getValue());
    }
    List<IncidentHdr> incidentHdrs = q.getResultList();
    return incidentHdrs;
}
whereas ReposJPQL holds the base query, which has a WHERE condition:
public interface ReposJPQL {
    public String GET_INCIDENT_DETAILS = "SELECT inc FROM IncidentHdr inc WHERE 1 = 1";
}
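For what it's worth, the criteria collected above are never appended to the query in the snippet. A minimal sketch of how that could look, with the type filter expressed as a JPQL subquery; the CategoryMast entity and its catCode/typeCode fields are assumed mappings, not taken from the original code:

// Append the dynamically collected AND conditions (hypothetical continuation of fetchIncidents).
for (String condition : criteria) {
    query.append(" AND ").append(condition);
}

// One way to express the join condition in JPQL, assuming a CategoryMast entity mapping:
if (incidentHdrDto.getType() != null) {
    query.append(" AND EXISTS (SELECT cat FROM CategoryMast cat")
         .append(" WHERE cat.catCode = inc.catCode AND cat.typeCode = :typeCode)");
    parameters.put("typeCode", incidentHdrDto.getType());
}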

Camel mongodb - MongoDbProducer multiple inserts

I am trying to do a multiple-document insert using the Camel MongoDB component.
My POJO representation is:
Person {
    String firstName;
    String lastName;
}
I have a processor which constructs a valid List of Person POJOs, which is also a valid JSON structure.
When this list of Person objects is sent to the MongoDB producer, the type conversion to BasicDBObject fails when createDoInsert is invoked. The piece of code below looks to be the problem: should it have more fallbacks/checks in place to attempt the list conversion further down, since it fails on the very first cast itself? Debugging the MongoDbProducer, the exchange body being received is a DBList, which extends DBObject. This causes the singleInsert flag to remain true, which fails the insertion below because we get a DBList instead of a BasicDBObject:
if (singleInsert) {
    BasicDBObject insertObjects = (BasicDBObject) insert;
    dbCol.insertOne(insertObjects);
    exchange1.getIn().setHeader("CamelMongoOid", insertObjects.get("_id"));
}
The Camel MongoDbProducer code fragment
private Function<Exchange, Object> createDoInsert() {
    return (exchange1) -> {
        MongoCollection dbCol = this.calculateCollection(exchange1);
        boolean singleInsert = true;
        Object insert = exchange1.getIn().getBody(DBObject.class);
        if (insert == null) {
            insert = exchange1.getIn().getBody(List.class);
            if (insert == null) {
                throw new CamelMongoDbException("MongoDB operation = insert, Body is not conversible to type DBObject nor List<DBObject>");
            }
            singleInsert = false;
            insert = this.attemptConvertToList((List) insert, exchange1);
        }
        if (singleInsert) {
            BasicDBObject insertObjects = (BasicDBObject) insert;
            dbCol.insertOne(insertObjects);
            exchange1.getIn().setHeader("CamelMongoOid", insertObjects.get("_id"));
        } else {
            List insertObjects1 = (List) insert;
            dbCol.insertMany(insertObjects1);
            ArrayList objectIdentification = new ArrayList(insertObjects1.size());
            objectIdentification.addAll((Collection) insertObjects1.stream().map((insertObject) -> {
                return insertObject.get("_id");
            }).collect(Collectors.toList()));
            exchange1.getIn().setHeader("CamelMongoOid", objectIdentification);
        }
        return insert;
    };
}
My route is as below:
<route id="uploadFile">
    <from uri="jetty://http://0.0.0.0:9886/test"/>
    <process ref="fileProcessor"/>
    <unmarshal>
        <csv>
            <header>fname</header>
            <header>lname</header>
        </csv>
    </unmarshal>
    <process ref="mongodbProcessor" />
    <to uri="mongodb:mongoBean?database=axs175&amp;collection=insurance&amp;operation=insert" />
</route>
and the MongodbProcessor constructing the List of Person POJOs:
@Component
public class MongodbProcessor implements Processor {

    @Override
    public void process(Exchange exchange) throws Exception {
        ArrayList<List<String>> personlist = (ArrayList) exchange.getIn().getBody();
        ArrayList<Person> persons = new ArrayList<>();
        for (List<String> records : personlist) {
            Person person = new Person();
            person.setFname(records.get(0));
            person.setLname(records.get(1));
            persons.add(person);
        }
        exchange.getIn().setBody(persons);
    }
}
I also requested information here: http://camel.465427.n5.nabble.com/Problems-with-MongoDbProducer-multiple-inserts-tc5792644.html
This issue is now fixed via https://issues.apache.org/jira/browse/CAMEL-10728
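Before that fix, one workaround was to hand the producer a body it can insert without further conversion, for example by emitting DBObjects from the processor. A sketch against the older MongoDB driver API; the field names simply follow the CSV headers in the route:

// Hypothetical variant of the processor body: build DBObjects directly so the
// producer's list branch receives documents it can pass straight to insertMany.
List<DBObject> docs = new ArrayList<>();
for (List<String> record : personlist) {
    docs.add(new BasicDBObject("fname", record.get(0)).append("lname", record.get(1)));
}
exchange.getIn().setBody(docs);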

How to select shard for search query?

I recently implemented ShardIdentifierProvider and it is working fine, but how can I ensure it uses only one shard for a query?
public class SiteIdAsShardIdProvider extends ShardIdentifierProviderTemplate {

    @Override
    protected Set<String> loadInitialShardNames(Properties properties, BuildContext buildContext) {
        ServiceManager serviceManager = buildContext.getServiceManager();
        SessionFactory sessionFactory = serviceManager.requestService(HibernateSessionFactoryServiceProvider.class, buildContext);
        Session session = sessionFactory.openSession();
        try {
            @SuppressWarnings("unchecked")
            List<String> ids = session.createSQLQuery("select cast(id as CHAR(3)) from website").list();
            return new HashSet<>(ids);
        } finally {
            session.close();
        }
    }

    @Override
    public String getShardIdentifier(Class<?> entityType, Serializable id, String idAsString, Document document) {
        return document.getFieldable("siteId").stringValue();
    }
}
Creating your own custom filter and overriding getShardIdentifiersForQuery should do the trick. Here is something that does approximately the same as what's in the documentation, but with a ShardIdentifierProviderTemplate:
@Override
public Set<String> getShardIdentifiersForQuery(FullTextFilterImplementor[] filters) {
    FullTextFilter filter = getFilterByName( filters, "customer" );
    if ( filter == null ) {
        return getAllShardIdentifiers();
    }
    else {
        Set<String> result = new HashSet<>();
        result.add( (String) filter.getParameter( "customerID" ) );
        return result;
    }
}

private FullTextFilter getFilterByName(FullTextFilterImplementor[] filters, String name) {
    for ( FullTextFilterImplementor filter : filters ) {
        if ( filter.getName().equals( name ) ) {
            return filter;
        }
    }
    return null;
}
I created a ticket to update the documentation: https://hibernate.atlassian.net/browse/HSEARCH-2513
The shard selection at query time is controlled by using a custom Filter.
See "5.3.1. Using filters in a sharded environment" for details and examples.

How to get table metadata from camel-sql component

I'm looking for a way to get all the column metadata for a given table name using the camel-sql component.
Though it uses spring-jdbc behind the scenes, I do not see a way to get the ResultSetMetaData.
I couldn't find a direct way to get the column details from the camel-sql component. For now I managed to get the information using a Spring JdbcTemplate and the data source:
public List<String> getColumnNamesFromTable(final TableData tableData) throws MetaDataAccessException {
    final List<String> columnNames = new ArrayList<String>();
    JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
    StringBuilder query = new StringBuilder();
    query.append("SELECT * FROM ").append(SINGLE_BLANK_SPACE);
    query.append(tableData.getSchemaName()).append(".");
    query.append(tableData.getTableName()).append(SINGLE_BLANK_SPACE);
    query.append("WHERE rownum < 0");
    jdbcTemplate.query(query.toString(), new ResultSetExtractor<Integer>() {
        @Override
        public Integer extractData(ResultSet rs) throws SQLException, DataAccessException {
            ResultSetMetaData rsmd = rs.getMetaData();
            int columnCount = rsmd.getColumnCount();
            for (int i = 1; i <= columnCount; i++) {
                columnNames.add(rsmd.getColumnName(i).toUpperCase());
            }
            return columnCount;
        }
    });
    return columnNames;
}
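As an alternative sketch, the same column list can usually be read from plain JDBC metadata on the same DataSource, without issuing a query at all. This uses the standard java.sql API; depending on the database, schema and table names may need to be upper- or lower-cased for the lookup to match:

// Read column names for a table via DatabaseMetaData (no SELECT executed).
public List<String> getColumnNamesViaMetaData(String schema, String table) throws SQLException {
    List<String> columnNames = new ArrayList<String>();
    try (Connection connection = dataSource.getConnection();
         ResultSet columns = connection.getMetaData().getColumns(null, schema, table, null)) {
        while (columns.next()) {
            columnNames.add(columns.getString("COLUMN_NAME").toUpperCase());
        }
    }
    return columnNames;
}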