How to write a nested query in Druid?

I am new to Druid; so far I have worked with MySQL databases. How do I write the nested MySQL query below as a Druid query?
select distinct(a.userId) as userIds
from transaction as a
where a.transaction_type = 1
and a.userId IN (
    select distinct(b.userId) from transaction as b where b.transaction_type = 2
)
I really appreciate your help.

There are a couple of things you might want to know, since you are new to Druid.
Druid supports SQL now. It does not support every fancy and complex feature that a full SQL database does, but it does cover many standard SQL constructs. It also provides a way to submit a SQL query through Druid's JSON API.
There is more detail on that, with examples, here:
http://druid.io/docs/latest/querying/sql
Your query is simple enough that you can use the Druid SQL feature, as below:
{
  "query" : "<your_sql_query>",
  "resultFormat" : "object"
}
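For example, posting the corrected query from above to the broker's standard SQL endpoint (POST /druid/v2/sql) would look roughly like this, with the table and column names taken from the question:
{
  "query" : "SELECT DISTINCT a.userId AS userIds FROM transaction a WHERE a.transaction_type = 1 AND a.userId IN (SELECT DISTINCT b.userId FROM transaction b WHERE b.transaction_type = 2)",
  "resultFormat" : "object"
}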
If you want the equivalent native JSON query for the query above and don't want to write the entire big JSON by hand, there is a handy trick: ask the broker to explain the SQL, and it will print the native JSON query for you, which you can then use and modify as necessary. One way to do that is to prefix the SQL with EXPLAIN PLAN FOR and post it to the broker's SQL endpoint:
curl -X POST '<queryable_host>:<port>/druid/v2/sql' -H 'Content-Type: application/json' -d '{"query" : "EXPLAIN PLAN FOR <druid_sql_query>"}'
In addition to this, you can also use the DruidDry library, which provides support for writing fancier Druid queries in Java.

Related

Use TableProvider to generate a table and run an SQL query in Apache Beam

I want to generate an unbounded collection of rows and run an SQL query on it using the Apache Beam Calcite SQL dialect and the Apache Flink runner. Based on the source code and documentation of Apache Beam, one can do something like this using a table provider: GenerateSequenceTableProvider. But I don't understand how to use it outside of the Beam SQL CLI. I'd like to use it in my regular Java code.
I was trying to do something like this:
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
Pipeline pipeline = Pipeline.create(options);
GenerateSequenceTableProvider tableProvider = new GenerateSequenceTableProvider();
tableProvider.createTable(Table.builder()
    .name("sequence")
    .schema(Schema.of(
        Schema.Field.of("sequence", Schema.FieldType.INT64),
        Schema.Field.of("event_time", Schema.FieldType.DATETIME)))
    .type(tableProvider.getTableType())
    .build());
PCollection<Row> res = PCollectionTuple.empty(pipeline)
    .apply(SqlTransform.query("select * from sequenceSchema.sequence limit 5")
        .withTableProvider("sequenceSchema", tableProvider));
pipeline.run().waitUntilFinish();
But I'm getting Object 'sequence' not found within 'sequenceSchema' errors, so I guess I'm not actually creating the table. So how do I create the table? If I understand correctly, the values should be provided automatically by the table provider.
Basically, how do I use Beam SQL table providers if I want to execute queries on the tables that these providers are (I think?) supposed to generate?
The TableProvider interface is a bit difficult to work with directly. The problem you're running into is that GenerateSequenceTableProvider, like many other TableProviders, doesn't have any way to store table metadata on its own, so calling its createTable method is actually a no-op! What you'll want to do is wrap it in an InMemoryMetaStore, something like this:
GenerateSequenceTableProvider tableProvider = new GenerateSequenceTableProvider();
InMemoryMetaStore metaStore = new InMemoryMetaStore();
metaStore.registerProvider(tableProvider);
metaStore.createTable(Table.builder()
    .name("sequence")
    .schema(Schema.of(
        Schema.Field.of("sequence", Schema.FieldType.INT64),
        Schema.Field.of("event_time", Schema.FieldType.DATETIME)))
    .type(tableProvider.getTableType())
    .build());
PCollection<Row> res = PCollectionTuple.empty(pipeline)
    .apply(SqlTransform.query("select * from sequenceSchema.sequence limit 5")
        .withTableProvider("sequenceSchema", metaStore));
(Note: I haven't tested this, but I think something like it should work.)
As robertwb pointed out, another option would be to just avoid the TableProvider interface and use GenerateSequence directly. You'd just need to make sure that your PCollection has a schema. Then you could process it with SqlTransform, like this:
pc.apply(SqlTransform.query("select * from PCOLLECTION limit 5"))
If you can't get TableProviders to work, you could read this as an ordinary PCollection and then apply a SqlTransform to the result.
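For reference, here is a minimal, untested sketch of that GenerateSequence-plus-schema route; the bounded range (0 to 100) and the single INT64 field are illustrative choices, not requirements:
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.sdk.values.TypeDescriptors;

Schema schema = Schema.of(Schema.Field.of("sequence", Schema.FieldType.INT64));

PCollection<Row> rows = pipeline
    .apply(GenerateSequence.from(0).to(100))          // plain PCollection<Long>
    .apply(MapElements.into(TypeDescriptors.rows())
        .via(n -> Row.withSchema(schema).addValues(n).build()))
    .setRowSchema(schema);                            // SqlTransform requires a schema

PCollection<Row> res = rows.apply(
    SqlTransform.query("select * from PCOLLECTION limit 5"));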

Executing the query using bq command line in Google Big Query

I execute a query using the Python script below and the table gets populated with 2,564,691 rows. When I run the same query in the Google BigQuery console, it returns 17,379,353 rows (the query is exactly the same). I was wondering whether there is some issue with the script below. I am not sure whether --replace in bq query replaces the previous result set instead of appending to it.
Any help would be appreciated.
import time

dateToday = time.strftime("%Y/%m/%d")
dateToday1 = dateToday.replace('/', '')
# Raw string so the backslashes in the Windows path are not treated as escapes
commandStr = r"type C:\Users\query.txt | bq query --allow_large_results --replace --destination_table table:dataset1_%s -n 1" % (dateToday1)
In the Web UI you can use the Query History option to navigate to the respective queries.
After you locate them, you can expand the entries and see exactly which query was executed.
I am fairly sure that just comparing the query texts will show you the source of the "discrepancy" right away!
Added: in Query History you can see not only the query text but also all of the configuration properties that were used for the respective query, such as the write preference. So even if the query text is the same, a difference in configuration may give you a clue.
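The same inspection also works from the command line, if that is more convenient (the job ID is a placeholder):
bq ls -j -n 10
bq show --format=prettyjson -j <job_id>
The first command lists your ten most recent jobs; the second prints the full job configuration, including the query text and the write disposition used.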

Is it possible to run a SQL query with EntityFramework that joins three tables between two databases?

So I've got a SQL query that is called from an API that I'm trying to write an integration test for. I have the method that prepares the data working completely, but I realized that I don't know how to actually execute the query to check that data (and run the test). Here is what the query looks like (slightly redacted to protect confidential data):
SELECT HeaderQuery.[headerid],
kaq.[applicationname],
HeaderQuery.[usersession],
HeaderQuery.[username],
HeaderQuery.[referringurl],
HeaderQuery.[route],
HeaderQuery.[method],
HeaderQuery.[logdate],
HeaderQuery.[logtype],
HeaderQuery.[statuscode],
HeaderQuery.[statusdescription],
DetailQuery.[detailid],
DetailQuery.[name],
DetailQuery.[value]
FROM [DATABASE1].[dbo].[apilogheader] HeaderQuery
LEFT JOIN [DATABASE1].[dbo].[apilogdetails] DetailQuery
ON HeaderQuery.[headerid] = DetailQuery.[headerid]
INNER JOIN [DATABASE2].[dbo].[apps] kaq
ON HeaderQuery.[applicationid] = kaq.[applicationid]
WHERE HeaderQuery.[applicationid] = @applicationid1
AND HeaderQuery.[logdate] >= @logdate2
AND HeaderQuery.[logdate] <= @logdate3
For the sake of the test, and considering I already have the SQL script, I was hoping to be able to just execute the script above (providing the WHERE clause values programmatically) using context.Database.SqlQuery<string>(QUERY), but since I have two different contexts, I'm not sure how to do that.
The short answer is no, EF doesn't support cross-database queries. However, there are a few things you can try:
- Use two different database contexts (one for each database), run your respective queries, and then merge/massage the data after the queries return.
- Create a database view and query the view through EF.
- Use a SYNONYM: https://rachel53461.wordpress.com/2011/05/22/tricking-ef-to-span-multiple-databases/
- If the databases are on the same server, you can try using a DbCommandInterceptor.
I've had this requirement before and personally like the view option.
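For the integration test specifically, there may be a shortcut: because the script already uses fully qualified [DATABASE].[dbo].[table] names, it can be run as raw SQL through a single context's connection, provided one server hosts both databases. A rough sketch, in which the DTO and the variables (QUERY holds the script above as a string) are hypothetical:
using System.Data.SqlClient;
using System.Linq;

// Hypothetical DTO with one property per column the script returns
public class ApiLogRow
{
    public int HeaderId { get; set; }
    public string ApplicationName { get; set; }
    public string UserName { get; set; }
    // ...remaining columns...
}

// EF maps result columns to DTO properties by name;
// the parameter names match the placeholders in the script
var rows = context.Database
    .SqlQuery<ApiLogRow>(QUERY,
        new SqlParameter("@applicationid1", applicationId),
        new SqlParameter("@logdate2", fromDate),
        new SqlParameter("@logdate3", toDate))
    .ToList();
This bypasses EF's LINQ layer entirely, so it only verifies the SQL itself, but that is all the test needs.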

How to modify parser in PostgreSQL to handle new keyword and parse it

I am working on implementing a selectivity hints feature in PostgreSQL 9.3.4, purely for use in my academic research. I have decided to supply selectivity information per relation as part of the query, as shown below.
select * from lineitem, orders where l_extendedprice <= 2400 and l_orderkey = o_orderkey selectivity(lineitem, 0.3) selectivity(orders, 0.7)
I tried separating the selectivity hint portion of the query before Postgres parses it, but it became very clumsy. Modifying the Postgres parser to handle this case seemed complex, which is why I have not gotten into the Postgres grammar and parser. How should I split these selectivity hints from the normal query and populate them into my own data structures?
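For what it's worth, here is a rough illustration of the pre-parse string-splitting idea described above. This is plain C with no PostgreSQL internals, and every name in it is hypothetical:
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one hint; not a PostgreSQL structure. */
typedef struct {
    char relname[64];
    double selectivity;
} SelHint;

/*
 * Scan the query text for selectivity(<relation>, <value>) suffixes,
 * record them in hints[], then truncate the query at the first hint so
 * the remaining text can be handed to the normal parser unchanged.
 * Returns the number of hints found.
 */
static int extract_selectivity_hints(char *query, SelHint *hints, int max_hints)
{
    int n = 0;
    char *first = strstr(query, "selectivity(");
    char *p = first;

    while (p != NULL && n < max_hints) {
        if (sscanf(p, "selectivity(%63[^,], %lf)",
                   hints[n].relname, &hints[n].selectivity) == 2)
            n++;
        p = strstr(p + 12, "selectivity(");   /* 12 = strlen("selectivity(") */
    }
    if (first != NULL)
        *first = '\0';   /* leave only the plain SQL for the parser */
    return n;
}
For the example query this yields (lineitem, 0.3) and (orders, 0.7) and leaves the plain select for the regular parser; a real implementation would also need to handle whitespace, quoting, and hints that do not sit at the end of the statement.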

How to build a select using Zend with DISTINCT on a specific column?

I'm using Zend Framework for my website and I'd like to retrieve some data from my PostgreSQL database.
I have a query like:
SELECT DISTINCT ON(e.id) e.*, f.*, g.* FROM e, f, g
WHERE e.id = f.id_e AND f.id = g.id_f
This query works well, but I don't know how to express the DISTINCT ON (e.id) with Zend.
It seems that I can get DISTINCT rows, but not DISTINCT on a specific column.
$select->distinct()->from("e")->join("f", "e.id = f.id_e")
->join("g", "f.id = g.id_f");
Any idea how to build a select with a distinct column?
Thanks for the help.
You probably can't do this with Zend Framework, since DISTINCT ON is not part of the SQL standard (see the end of the page in the Postgres documentation). Although Postgres supports it, I would assume it's not part of Zend Framework because you could in theory configure another database connection that does not support it.
If you know in advance that you're developing for a specific database (Postgres in this case), you could use manually written statements instead. You'll gain more flexibility within the queries and better performance, at the cost of no longer being able to switch databases.
You would then instantiate a Zend_Db_Adapter for Postgres. There are various methods available for getting results from SQL queries, described in the framework's documentation starting at the section Reading Query Results. If you choose to go this route, I'd recommend creating your own subclass of the Zend_Db_Adapter_Pgsql class, so that you can convert data types and throw exceptions in case of errors instead of returning ambiguous null values and hiding the causes of errors.
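A minimal sketch of that manual route, using the query from the question (the connection parameters are placeholders):
// Hypothetical connection settings; substitute your own
$db = Zend_Db::factory('Pdo_Pgsql', array(
    'host'     => 'localhost',
    'username' => 'user',
    'password' => 'secret',
    'dbname'   => 'mydb',
));

// Raw SQL is passed through verbatim, so DISTINCT ON works as-is
$sql = 'SELECT DISTINCT ON (e.id) e.*, f.*, g.* '
     . 'FROM e, f, g '
     . 'WHERE e.id = f.id_e AND f.id = g.id_f';

$rows = $db->fetchAll($sql);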