Dataprep sort in reverse order does not work, any solutions? - google-cloud-dataprep

I've come across a very annoying bug in Google Dataprep.
According to this page: https://cloud.google.com/dataprep/docs/html/Window-Transform_57344658, it should be possible to reverse the order of sorting by adding a dash in front of the column name.
However, although the preview shows that the data is correctly sorted, the output will always be sorted in ascending order.
I have tested it in various ways and I'm sure it is a bug in the system.
The formula I'm trying to use is a PREV(column_name, 1) function, which is not grouped but is ordered by column_name and -date.
I then use the resulting column to deduplicate the dataset with IF(window == column_name).
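For clarity, here is the logic I'm trying to reproduce, sketched in pandas; this is just an illustration of the intent, not the actual Dataprep recipe, and the small DataFrame below is made-up sample data:

```python
import pandas as pd

# Made-up sample data standing in for the real dataset.
df = pd.DataFrame({
    "column_name": ["a", "a", "b", "b", "b"],
    "date": pd.to_datetime(
        ["2019-01-01", "2019-02-01", "2019-01-15", "2019-03-01", "2019-02-10"]
    ),
})

# Order by column_name ascending and date descending (the "-date" part).
df = df.sort_values(["column_name", "date"], ascending=[True, False])

# PREV(column_name, 1) over that ordering, with no grouping.
df["window"] = df["column_name"].shift(1)

# Keep only the first (most recent) row per column_name value:
# rows where the previous value equals the current one are duplicates.
deduped = df[df["window"] != df["column_name"]]
print(deduped)
```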
Hopefully this will be fixed soon, but in the meantime I need a workaround. Does anyone know an elegant solution?

I have been able to reproduce the issue you reported when using a Window function ordered in reverse with a dash (-OrderColumn).
I see that you have already created a Dataprep issue in the Public Issue Tracker, and another member of the GCP Support team was also able to reproduce this behavior and reported it to the Dataprep team. Cloud Dataprep is a third-party product developed and managed by a company called Trifacta, so this issue requires support from their product team. The Trifacta Dataprep team is working on the reported issue, and their first response is that, while they work on it, a quick workaround is to apply the sort in a separate step.
As a final note, next time please consider sharing screenshots in your question to make the issue easier to understand.

Related

Regarding non-availability of Project Management Chart

Our company has recently introduced Azure DevOps to streamline our project management process. Currently, 140 projects have been created under our organization in Azure DevOps. As and when a requirement comes in from a client for a specific project, we create tasks/bugs for different developers under that project. Currently we use only two Work Item types: Bug and Task.
Now the issue is that the company's management wants to see the project-wise number of "New/Open", "Active" and "Closed" Tasks and Bugs in a SINGLE chart. That means the single chart must fit the consolidated data of 140 projects. Anyone viewing that chart should immediately get an idea such as: Project 1 has 2 new/open work items, 2 active work items and 2 closed work items; Project 2 has 1 new/open work item, 10 active work items and 3 closed work items; and so on. This is so that management can understand at a glance which projects are lagging behind on customer delivery and assign more manpower to those projects accordingly.
I have tried to create various charts and widgets with different queries in Azure DevOps. I used the Burnup and Burndown chart widgets, but they give data for the tasks of a single project only. Also, when we add multiple projects, they show the sum of completed/remaining tasks across those projects, NOT a project-by-project breakdown of completed/remaining tasks.
I also tried the "Chart for work items" widget, but it only breaks counts down by Assignee, State and Work Item type; it does not give a count per project name.
I don't want to navigate through 140 project pages to see each project's open, active and closed tasks. So please help me out with ideas on how I can build a single chart that shows all this data. I will be forever grateful for your answers.
Thank you!
You could create a query across projects, select the Team Project column in Column options, and save the query as a shared query.
Then add a chart widget to a dashboard, select Pivot table, and set Team Project and State as the Rows and Columns.
You can expand the view to see more details.
https://learn.microsoft.com/en-us/azure/devops/report/dashboards/charts?view=azure-devops#add-a-chart-widget-to-a-dashboard
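If you'd rather pull the same numbers programmatically, a rough sketch against the Azure DevOps REST API could look like the following; the organisation name and personal access token are placeholders, and the WIQL is just an example cross-project query for Bugs and Tasks:

```python
import base64
import requests

ORG = "your-organisation"           # placeholder
PAT = "your-personal-access-token"  # placeholder

# Basic auth header for a personal access token.
auth = base64.b64encode(f":{PAT}".encode()).decode()
headers = {"Authorization": f"Basic {auth}", "Content-Type": "application/json"}

# Cross-project WIQL: all Bugs and Tasks in the organisation.
wiql = {
    "query": (
        "SELECT [System.Id] FROM WorkItems "
        "WHERE [System.WorkItemType] IN ('Bug', 'Task')"
    )
}
resp = requests.post(
    f"https://dev.azure.com/{ORG}/_apis/wit/wiql?api-version=7.0",
    json=wiql,
    headers=headers,
)
ids = [str(wi["id"]) for wi in resp.json()["workItems"]]

# Fetch project and state for each work item (batched, 200 ids per call).
counts = {}
for i in range(0, len(ids), 200):
    batch = ",".join(ids[i : i + 200])
    url = (
        f"https://dev.azure.com/{ORG}/_apis/wit/workitems"
        f"?ids={batch}&fields=System.TeamProject,System.State&api-version=7.0"
    )
    for item in requests.get(url, headers=headers).json()["value"]:
        key = (item["fields"]["System.TeamProject"], item["fields"]["System.State"])
        counts[key] = counts.get(key, 0) + 1

# Per-project, per-state counts for all 140 projects in one place.
for (project, state), n in sorted(counts.items()):
    print(f"{project:30s} {state:10s} {n}")
```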
I'd be slightly careful based on what you've put down because I don't think your management team quite understand what they need, and DevOps can only do so much. I'd be challenging them around the setup of your DevOps process personally because I don't think it's advisable to not have user stories as part of your setup. Although it simplifies some aspects of DevOps, our experience has been that people have been able to group things together better with user stories as well as tasks.
I appreciate it's a good idea to be able to see what's going on across all projects, but I think there are probably further criteria to think about. E.g. do you want to see estimates instead of, or as well as, the count of items, since that will give a better reflection of the effort required? In terms of completed items, and in fact probably everything you're displaying, again it depends on your project process, but is management genuinely interested in everything? For example, do they need to know that something was closed 6 months ago, or are they just interested in the last month?
I suppose what I'm getting at is that you probably need a bit more information from management about what they want to use the report for, so you can give them what they need rather than what they want. There's a temptation to ask for everything when you don't understand the capabilities of the solution or what you're going to use it for, and my recommendation would be to challenge them on this so you can better present things.
In terms of what you're looking to do, I'll openly admit I'm not clued up on everything DevOps related, but I doubt you'll be able to report at a project level within DevOps. I think what you'd need to do is set up your query, which would look across all projects in your organisation, and then export the results to Excel. From there I'd create a pivot table (or perhaps more than one) with the data that you need: have project names down the left side (row headers), and bring in whatever else you need as columns.
I think that's probably a good quick win to get something in front of your management team, and then you could challenge from there, almost picking holes in it so that they realise that the business decisions they'd make from this may not be fully informed, and suggesting some changes. From experience, it's probably better to consider it almost as a prototype and not get bogged down with a solution at this stage, because you may be asked for changes once they can visualise what they've originally asked for. Once management is happy you could look at other solutions to provide the report, but Excel is typically a good starting point I've found in the past when working on something new like this.
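To make the pivot-table step concrete, here is roughly the same thing done in pandas instead of Excel; the CSV file name and column headers are assumptions based on a typical query export that includes Team Project, State and Work Item Type:

```python
import pandas as pd

# Query results exported from Azure DevOps (hypothetical file name; the export
# is assumed to include Team Project, State and Work Item Type columns).
items = pd.read_csv("work_items_export.csv")

# Count work items per project and state: projects as rows, states as columns.
pivot = pd.pivot_table(
    items,
    index="Team Project",
    columns="State",
    values="Work Item Type",
    aggfunc="count",
    fill_value=0,
)
print(pivot)
```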

Google Fit Rest Api Step Counts inconsistent and different from Fit App

This seems to be a common enough problem that there are a lot of entries when one googles for help but nothing has helped me yet.
I am finding that the results provided by the REST API for estimated_steps are wildly different from those that appear in the device app.
I am running a fetch task for users via cron job on a PHP/Laravel app.
I'm using this https://developers.google.com/fit/scenarios/read-daily-step-total - estimated_steps to retrieve the step count.
Some days the data is correct; some days it's wildly different. For instance, on one given day the REST API gives a step count of 5,661 while the app shows 11,108. Then there are six or seven days when the stream is correct.
Has anyone faced this sort of behavior? I've tested for timezone differences and logged and analyzed the response JSON to see if I'm making some obvious mistake, but nope.
You may check the "How do I get the same step count as the Google Fit app?" documentation. Note that even when using the right data source, your step count may still be different from that of the Google Fit app.
This could be due to one of the following reasons:
On Wear, the Fit MicroApp when connected will display step counts queried on the phone and transferred over via the Wearable APIs. Other MicroApps accessing local-only data will only get watch steps. We are working on making this easier for developers.
Sometimes the step calculation code for the Google Fit app is updated with bug fixes before we are able to release the fixes to developers (which requires a Google Play Services release). We are also working on making it possible for developers to access fixes at the same time.
The Fit app uses a specific data source for steps and it adds some functionality (which can be seen on the documentation) on top of the default merged steps stream.
You can access the "estimated" steps stream as shown here:
derived:com.google.step_count.delta:com.google.android.gms:estimated_steps
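For reference, a minimal call against that data source, following the daily-step-total scenario linked in the question, looks roughly like this (shown in Python rather than PHP; the access token is a placeholder you would obtain through your app's OAuth flow):

```python
import time
import requests

ACCESS_TOKEN = "ya29.your-oauth2-access-token"  # placeholder

# Aggregate the estimated_steps stream into one bucket per day,
# here for the last 7 days.
end_ms = int(time.time() * 1000)
start_ms = end_ms - 7 * 24 * 60 * 60 * 1000

body = {
    "aggregateBy": [{
        "dataTypeName": "com.google.step_count.delta",
        "dataSourceId": (
            "derived:com.google.step_count.delta:"
            "com.google.android.gms:estimated_steps"
        ),
    }],
    "bucketByTime": {"durationMillis": 86400000},  # one-day buckets
    "startTimeMillis": start_ms,
    "endTimeMillis": end_ms,
}

resp = requests.post(
    "https://www.googleapis.com/fitness/v1/users/me/dataset:aggregate",
    json=body,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)

# Sum the step deltas in each daily bucket.
for bucket in resp.json().get("bucket", []):
    for dataset in bucket["dataset"]:
        steps = sum(
            point["value"][0]["intVal"]
            for point in dataset.get("point", [])
        )
        print(bucket["startTimeMillis"], steps)
```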
Hope this helps!

Submit user form only if it's unique and has not been submitted before

I spent hours researching the best way to do this but so far have no answer. Hope someone can help.
Basically, here is the issue we are trying to solve:
We need to collect information about our products in the company. To do this, we were planning to use some kind of form (for example, Google Forms). The idea is that one product can be submitted only once. How can I do that? Google Forms only allows me to limit the number of responses from one IP, which is NOT what I need...
Do you have to use a third-party service?
I think you may need to code this logic yourself. You can solve the problem in the usual way, for example by keying on the user_id & product_id.
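As a rough sketch of that approach, assuming you build a small backend of your own instead of Google Forms (the table and column names below are made up), you could let the database enforce the one-submission-per-product rule with a unique constraint:

```python
import sqlite3

# Hypothetical schema: one row per submitted product, with product_id unique
# so a second submission for the same product is rejected by the database.
conn = sqlite3.connect("submissions.db")
with conn:
    conn.execute("""
        CREATE TABLE IF NOT EXISTS submissions (
            product_id TEXT NOT NULL UNIQUE,
            user_id    TEXT NOT NULL,
            details    TEXT
        )
    """)

def submit(product_id: str, user_id: str, details: str) -> bool:
    """Insert the form data; return False if the product was already submitted."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO submissions (product_id, user_id, details) VALUES (?, ?, ?)",
                (product_id, user_id, details),
            )
        return True
    except sqlite3.IntegrityError:
        return False

print(submit("product-42", "user-1", "first submission"))  # True
print(submit("product-42", "user-2", "duplicate attempt"))  # False
```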

Getting issues before a time from github

The GitHub Issues API provides a way to get all the issues that were updated at or after a particular time, but is there a way I can get all the issues before a particular time?
If the Issues API isn't a good fit, you can fall back on the Issues Search API.
That would at least allow you to search from oldest to most recent and filter out the ones that are past the time specified initially.
Example
Let’s say you want to find the oldest unresolved Python bugs on Windows. Your query might look something like this.
https://api.github.com/search/issues?q=windows+label:bug+language:python+state:open&sort=created&order=asc
Filter the result of that query and what is left is all the issues before a certain date.
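A rough sketch of that filtering step in Python, reusing the example query above and an arbitrary cutoff date:

```python
from datetime import datetime, timezone

import requests

# The example query from above: open Python bugs on Windows, oldest first.
url = "https://api.github.com/search/issues"
params = {
    "q": "windows label:bug language:python state:open",
    "sort": "created",
    "order": "asc",
    "per_page": 100,
}

cutoff = datetime(2020, 1, 1, tzinfo=timezone.utc)  # arbitrary example cutoff

resp = requests.get(url, params=params)
resp.raise_for_status()

# Keep only the issues created before the cutoff; because results are sorted
# oldest-first, everything after the first too-new issue can be ignored.
before_cutoff = [
    issue
    for issue in resp.json()["items"]
    if datetime.fromisoformat(issue["created_at"].replace("Z", "+00:00")) < cutoff
]

for issue in before_cutoff:
    print(issue["created_at"], issue["html_url"])
```

You can also push the cutoff into the query itself with the created:<YYYY-MM-DD search qualifier and skip the client-side filtering.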

Automatically generated test data to a DB from a schema?

I have a discussion DB, and I need a large amount of test data for different-sized samples. Please see the ready SELECT, JOIN and CREATE queries (scroll down in the link).
How can I automatically generate test data for the db?
How can I generate test data in different-sized samples?
Is there a ready-made tool?
Here are a couple of suggestions for free tools that generate test data:
Databene Benerator: supports many JDBC-capable database brands, uses XML format compatible with DbUnit, GPL license.
Super Smack: originally a load-test tool for MySQL, it also supports PostgreSQL and it includes a generator of mock data.
A current version of Super Smack appears to be available here
I asked a similar question here on StackOverflow in February, and the two choices above seemed like the best options.
I know this question is super dated, but I was looking for the answer to this exact question today and I came across this:
http://wiki.postgresql.org/wiki/Sample_Databases
Out of the options listed (including built-in tools like pgbench), pgFoundry has several compelling options that work perfectly for the test cases I am working on.
I thought it might help someone like me, so there it is.
I'm not sure how to get automatically generated data and insert it into the database (I'm sure you could pull it off with a Python script or something), but if you're just looking for endless blabbering to stick into a db, this should be helpful.
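To make the "Python script or something" part concrete, here is a rough sketch that fills a hypothetical messages table with random rows via psycopg2; the connection string, table and columns are placeholders, and the sample size is a parameter so you can produce different-sized data sets:

```python
import random
import string

import psycopg2  # pip install psycopg2-binary

def random_text(length: int) -> str:
    """Endless blabbering, one string at a time."""
    return "".join(random.choices(string.ascii_lowercase + " ", k=length))

def generate(sample_size: int) -> None:
    # Placeholder connection details and schema.
    conn = psycopg2.connect("dbname=discussion_db user=postgres")
    with conn, conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO messages (author_id, body) VALUES (%s, %s)",
            [
                (random.randint(1, 1000), random_text(200))
                for _ in range(sample_size)
            ],
        )
    conn.close()

# Different-sized samples: just change the parameter.
generate(10_000)
```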
I'm not a Postgres person, but in many other DBs I've used, a simple mechanism for generating large quantities of test data is a cross join, which multiplies the row counts of the joined tables so a small seed produces a huge result set.
Here's a nice blog post on it (SQL Server specific though).
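For Postgres specifically, a minimal version of the cross-join trick could look like the sketch below (again via psycopg2, with the same placeholder table): two generate_series calls cross-joined to turn almost nothing into a million rows.

```python
import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect("dbname=discussion_db user=postgres")  # placeholder
with conn, conn.cursor() as cur:
    # Cross-joining two 1,000-row series yields 1,000,000 combinations,
    # each turned into one test row.
    cur.execute("""
        INSERT INTO messages (author_id, body)
        SELECT a, 'test message ' || a || '-' || b
        FROM generate_series(1, 1000) AS a
        CROSS JOIN generate_series(1, 1000) AS b
    """)
conn.close()
```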