I execute a query using the Python script below and the table gets populated with 2,564,691 rows. When I run the same query using the Google BigQuery console, it returns 17,379,353 rows (the query is used as-is). I am wondering whether there is some issue with the script below. I am not sure whether --replace in bq query replaces the past result set instead of appending to it.
Any help would be appreciated.
import os, time

dateToday = time.strftime("%Y/%m/%d")
dateToday1 = dateToday.replace('/', '')
commandStr = r"type C:\Users\query.txt | bq query --allow_large_results --replace --destination_table table:dataset1_%s -n 1" % (dateToday1)  # raw string keeps the Windows backslashes intact
os.system(commandStr)
In the Web UI you can use the Query History option to navigate to the respective queries.
After you locate them, you can expand the respective entries and see exactly which query was executed.
I am more than sure that just by comparing the query texts you will see the source of the "discrepancy" right away!
Added:
In Query History you can see not only the query text but also all the configuration properties that were used for the respective query, like the write preference, for example, and others. So even if the query text is the same, you can see potential differences in configuration that will give you a clue.
Related:
I've looked at this resource, but it's not quite what I need. This question is what I want to accomplish, but I want to run it in the BQ terminal.
For instance, in the past I've exported table information as a .json in bq command-line as so:
bq show --schema --format=prettyjson Dataset.TableView > /home/directory/Dataset.TableView.json
This gives a pretty-printed JSON of table information for a specified dataset in a given project. I would like to just have a .csv (or any kind of list) of all the dataset names in the project, but I can't figure out how to change that command line to output what I want.
In order to further contribute to the community, here is an alternative to @DanielZagales' answer using the bq command line. According to the documentation, you can use bq ls to list all the datasets in a project, as follows:
bq ls -a --format=pretty --project_id your-project-id
The flag -a is short for --all, which guarantees that all the datasets will be included in the list. The flag --format=pretty outputs the list in a table format; you can use other formats, as described here. Furthermore, you can also filter the datasets that match an expression with --filter labels.key:value, or set the maximum number of results with --max_results or -n.
Note: you can also list all the tables within a dataset, such as described here.
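Additionally, since the goal is a .csv of the dataset names, the same listing can be emitted as CSV (one of the formats mentioned above) and redirected to a file; a small variant of the command above:
bq ls -a --format=csv --project_id your-project-id > datasets.csv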
You should be able to query the information schema to get the results you want.
Example:
select * from `project_id.INFORMATION_SCHEMA.SCHEMATA`;
You can then pass that to the bq command like:
bq query --format=csv 'select * from `project_id.INFORMATION_SCHEMA.SCHEMATA`;'
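If you only need the dataset names rather than every column, a narrower variant of the same idea can be redirected straight to a file (schema_name is the dataset name column in SCHEMATA); note that INFORMATION_SCHEMA requires standard SQL, so depending on your bq defaults you may also need the --nouse_legacy_sql flag:
bq query --format=csv --nouse_legacy_sql 'select schema_name from `project_id.INFORMATION_SCHEMA.SCHEMATA`;' > datasets.csv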
I am trying to create a filter for a field that contains over 5000 unique values. However, the filter's query is automatically setting a limit of 1000 rows, meaning that the majority of the values do not get displayed in the filter dropdown.
I updated the config.py file inside the 'anaconda3/lib/python3.7/site-packages' directory by increasing DEFAULT_SQLLAB_LIMIT and QUERY_SEARCH_LIMIT to 6000; however, this did not work.
Is there any other config that I need to update?
P.S. The code snippet below shows the JSON representation of the filter where the issue seems to be coming from.
"query": "SELECT casenumber AS casenumber\nFROM pa_permits_2019\nGROUP BY casenumber\nORDER BY COUNT(*) DESC\nLIMIT 1000\nOFFSET 0"
After using the grep command to find all files containing the text '1000', I found out that the filter limit can be configured through filter_row_limit in viz.py.
I am pretty new to Pentaho, so my question might sound very novice.
I have written a transformation in which I am using a CSV file input step and a Table input step.
Steps I followed:
Initially, I created a parameter in the transformation properties. The parameter birthdate doesn't have any default value set.
I have used this parameter in the PostgreSQL query in the Table input step in the following manner:
select * from person where EXTRACT(YEAR FROM birthdate) > ${birthdate};
I am reading the CSV file using the CSV file input step. How do I assign the birthdate value that is present in my CSV file to the parameter which I created in the transformation?
(OR)
Could you guide me through the process of assigning the CSV field value directly to the SQL query used in the Table input step, without the use of a parameter?
TL;DR:
I recommend using a "database join" step like in my third suggestion below.
See the last image for reference
First idea - Using Table Input as originally asked
Well, you don't need any parameter for that, unless you are going to provide the value for that parameter when you run the transformation. If you need to read data from a CSV, you can do that with this approach.
First, read your CSV and make sure your rows are ok.
After that, use a Select values step to keep only the columns to be used as parameters.
In the Table input, use a placeholder (?) to determine where to place the data, and set it to run for each row that it receives from the source step.
Just keep in mind that the order of the columns received by the Table input (the columns out of the Select values step) is the order in which they will be used for the placeholders (?). This should not be a problem for your question, which uses only one placeholder, but keep it in mind as you ramp up using Pentaho.
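For illustration, here is a sketch of the SQL you would put in the Table input with this approach, based on the query from the question; the single placeholder is filled, for each incoming row, with the field coming out of the Select values step:

-- ? is replaced, per incoming row, by the value read from the CSV
SELECT * FROM person WHERE EXTRACT(YEAR FROM birthdate) > ?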
Second idea, using a Database Lookup
This is another approach where you can't personalize the query made to the database, but you may experience better performance because you can set the "Enable cache" flag. If you don't need to use a function in your WHERE clause, this is really recommended.
Third idea, using a Database Join
That is my recommended approach if you need a function in your WHERE clause. It looks a lot like the Table input approach, but you can skip the Select values step and choose which columns to use, repeat the same column a bunch of times, and enable an "outer join" flag that also returns the rows for which the query found no result.
Pro tip: if you feel the transformation is running too slowly, try using multiple copies of the step (documentation here), and obviously make sure the table has the appropriate indexes in place.
Yes, there's a way of assigning the value directly without the use of a parameter. Do as follows.
Use a 'Block this step until steps finish' step to halt the Table input step until the CSV file input step completes.
The following is how you configure each step.
Note:
The Postgres query should be: select * from person where EXTRACT(YEAR FROM birthdate) > ?::integer
Check 'Execute for each row' and 'Replace variables' in the Table input step.
Select only the birthdate column in the CSV file input step.
What I'm trying to do is: if a field is blank, use another field, within WRKQRY (Query/400), in Define Result Fields. Is this possible?
You can create an SQL view using the RUNSQLSTM command and then run a query over the view.
CREATE VIEW QTEMP/MYVIEW AS
SELECT F1, CASE WHEN F2 <> ' ' THEN F2 ELSE F3 END AS FX FROM MYLIB/MYFILE
Then tie it all together with a CL program.
PGM
DLTF FILE(QTEMP/MYVIEW)
MONMSG MSGID(CPF0000)
RUNSQLSTM SRCFILE(MYLIB/MYSRC) MBR(MYMBR)
RUNQRY QRY(MYLIB/MYQRY)
ENDPGM
Query/400 is obsolete, and should be considered deprecated. It was replaced about 2 decades ago by Query Management. Query/400 queries run under the old database optimizer (CQE) and cannot benefit from newer faster optimization techniques employed by the new optimizer (SQE). It is recommended to migrate Query/400 queries to QM Query or to DB2 Web Query.
Fortunately, Query Management Queries can be created in a prompted mode which should be very familiar to Query/400 users. Prompted-mode queries can be converted to the more powerful SQL-mode.
You can use the RTVQMQRY command to generate SQL source from the Query/400 query you have asked about. Once you have the source, you can then use the CASE ... END expression given by @Mike. Create the QM query with the CRTQMQRY command, and run it with STRQMQRY.
If you still need to do this, I can show you how to do it in 3 passes of Query/400.
Yeah, I know that's not efficient but it can be done.
Take a look at CASE; that should work for you.
CASE field
WHEN ' ' THEN newfield
ELSE field
END as myfield
I have a complicated dynamic query in TSQL that I want to export to Excel.
[The result table contains fields with text longer than 255 chars, if it matters]
I know I can export the results using the Management Studio menus, but I want to do it automatically from code. Do you know how?
Thanks in advance.
You could have a look at sp_send_dbmail. This allows you to send an email from your query after it's run, containing an attached CSV of the resultset. Obviously the viability of this method would be dependent on how big your resultset is.
Example from the linked document:
EXEC msdb.dbo.sp_send_dbmail
    @profile_name = 'AdventureWorks2008R2 Administrator',
    @recipients = 'danw@Adventure-Works.com',
    @query = 'SELECT COUNT(*) FROM AdventureWorks2008R2.Production.WorkOrder
              WHERE DueDate > ''2006-04-30''
              AND DATEDIFF(dd, ''2006-04-30'', DueDate) < 2',
    @subject = 'Work Order Count',
    @attach_query_result_as_file = 1;
One way is to use bcp which you can call from the command line - check out the examples in that reference, and in particular see the info on the -t argument which you can use to set the field terminator (for CSV). There's this linked reference on Specifying Field and Row Terminators.
Or, directly using TSQL you could use OPENROWSET as explained here by Pinal Dave.
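For reference, the OPENROWSET approach described in that article looks roughly like this. This is only a sketch: it assumes the Microsoft ACE OLE DB provider is installed, 'Ad Hoc Distributed Queries' is enabled, and an .xlsx file with matching column headers already exists; the file path, sheet and column names below are placeholders.

-- Push query results into an existing Excel sheet via the ACE OLE DB provider
INSERT INTO OPENROWSET('Microsoft.ACE.OLEDB.12.0',
                       'Excel 12.0;Database=C:\Export\Results.xlsx;',
                       'SELECT Col1, Col2 FROM [Sheet1$]')
SELECT Col1, Col2
FROM dbo.MyResults;  -- replace with your own query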
Update:
Re: 2008 64-bit & OPENROWSET - I wasn't aware of that; a quick dig throws up this on the MSDN forums, with a link given. Any help?
Aside from that, other options include writing an SSIS package, or using SQL CLR to write an export procedure in .NET to call directly from SQL. Or, you could call bcp from T-SQL via xp_cmdshell - you have to enable it first, though, which will widen the potential "attack surface" of SQL Server. I suggest checking out this discussion.
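As an illustration of that xp_cmdshell route, here is a sketch; it assumes xp_cmdshell has been enabled, uses a trusted connection, and the server, database and path names are placeholders:

-- Write the query results to a comma-separated file using bcp
EXEC master..xp_cmdshell
    'bcp "SELECT * FROM MyDatabase.dbo.MyResults" queryout "C:\Export\Results.csv" -c -t, -T -S MYSERVER';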
Some approaches here: SQL Server Excel Workbench
I needed to accept a dynamic query and save the results to disk so I could download them through the web application.
The 'insert into a data source' approach didn't work out for me because of the continued effort needed to get it to work.
Eventually I went with sending the query to PowerShell from SSMS.
Read my post here
How do I create a document on the server by running an existing stored procedure or the SQL statement of that procedure on an R2008 SQL Server
Single quotes, however, were a problem, and at first I didn't trim my query and write it on one line, so it had line breaks in SSMS, which actually matters.